
Catch Reward Hacking Before It Costs You

Stop RL Agents from Exploiting Your Reward Function

RewardGuard catches misaligned incentives, training stagnation, and reward hacking in your RL training runs before they derail your model.

rewardguard — python3

import rewardguard as rg

# Initialize with expected reward distribution
monitor = rg.Monitor(
    expected={"task": 0.7, "safety": 0.3},
    tolerance=5.0,
)

From raw logs to aligned AI in 4 steps

Integrate RewardGuard into your existing workflow in minutes — no infrastructure changes needed.

1

Install the Library

Install RewardGuard via pip and import it directly into your Python training script. Works with any RL framework.

2

Analyze Dynamics

Deep analysis of reward signals, distributions, and temporal patterns.

3

Detect Issues

Instantly flags reward hacking, misalignment, and stagnation patterns.

4

Fix & Align

Get recommendations or let Premium auto-adjust your reward parameters.
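To build intuition for the analyze-and-detect steps above, here is a minimal, self-contained sketch of the underlying idea: track each reward component's share of total reward and flag drift beyond a tolerance. The class name, method names, and thresholds are illustrative placeholders, not RewardGuard's actual API.

```python
# Illustrative sketch only -- NOT the RewardGuard API.
# Flags a reward component whose observed share of total reward
# drifts more than `tolerance` percentage points from expectation.

class ShareMonitor:
    def __init__(self, expected, tolerance=5.0):
        self.expected = expected            # e.g. {"task": 0.7, "safety": 0.3}
        self.tolerance = tolerance          # allowed drift, in percentage points
        self.totals = {name: 0.0 for name in expected}

    def step(self, rewards):
        # Accumulate per-component reward each training step.
        for name, value in rewards.items():
            self.totals[name] += value

    def check(self):
        grand_total = sum(self.totals.values())
        if grand_total == 0:
            return "ok", {}
        # Drift of each observed share from its expected share, in pp.
        drift = {
            name: abs(self.totals[name] / grand_total - share) * 100
            for name, share in self.expected.items()
        }
        severity = "warning" if any(d > self.tolerance for d in drift.values()) else "ok"
        return severity, drift
```

A run whose "task" component starts dominating the total reward beyond the tolerance would trip the check, which is the same signal a reward-hacking agent tends to produce.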

Everything you need to trust your AI

Real-Time Reward Analysis

Monitor reward distributions live as your model trains. Catch anomalies the instant they appear — not after 100,000 wasted compute steps.

# Live output stream
Step 48200: ✓ Reward distribution nominal
Step 48400: ⚠ Hacking pattern detected (94% confidence)
Step 48400: Fix: clip_reward_range(-10, +10)

AI Safety Guardrails

Active protection against reward hacking during production training runs.

99.7%

Detection accuracy across 2M+ analyzed episodes

Trend Detection

Visualize reward trends and spot training stagnation before it costs you compute budget.
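For intuition only (this is not RewardGuard's internal detector): one simple way to spot stagnation is to compare the mean episode reward of the most recent window against the window before it, and flag a plateau when the improvement falls below a threshold. The function name and defaults are illustrative assumptions.

```python
# Illustrative stagnation check -- NOT the RewardGuard implementation.
# Compares the mean reward of the latest window to the window before it.

def is_stagnating(rewards, window=50, min_improvement=0.01):
    """Return True when the latest window shows no meaningful improvement."""
    if len(rewards) < 2 * window:
        return False  # not enough history to judge
    recent = sum(rewards[-window:]) / window
    previous = sum(rewards[-2 * window:-window]) / window
    return (recent - previous) < min_improvement
```

Catching a plateau like this early is what saves compute budget: the flag fires within one window of the curve flattening rather than after the full run.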

Alignment Reports

Export detailed PDF reports for documentation and auditing purposes.

Auto-Adjust Premium

Automatically tunes reward weights mid-run. Zero manual intervention required.
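To give a sense of what rebalancing reward weights mid-run can mean, here is a simplified sketch (an assumption for illustration, not Premium's actual algorithm): nudge each weight by the ratio of its target share to its observed share, then renormalize so the weights still sum to one.

```python
# Simplified rebalancing sketch -- NOT the Premium auto-adjust algorithm.
# Nudges each weight toward restoring its target share of total reward.

def rebalance_weights(weights, observed_shares, target_shares, lr=0.5):
    new_weights = {}
    for name, w in weights.items():
        obs = max(observed_shares[name], 1e-9)   # avoid division by zero
        ratio = target_shares[name] / obs        # >1 means undershooting target
        # Blend between keeping the old weight (lr=0) and full correction (lr=1).
        new_weights[name] = w * (1 - lr + lr * ratio)
    total = sum(new_weights.values())
    return {name: w / total for name, w in new_weights.items()}
```

An overshooting component gets scaled down and an undershooting one scaled up, with `lr` damping the correction so a single noisy window cannot whipsaw the weights.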

Simple, transparent pricing

Start free. Detect and prevent reward hacking in your RL training.

Free

For research, experimentation, and side projects.

$0 forever
  • Basic reward signal analysis
  • Trend detection & visualization
  • Reward hacking alerts
  • Actionable recommendations
  • Up to 100k training steps
Get Started Free
Pay as you go

Premium

Buy analysis credits once — no subscription, no recurring charge.

$19.90 starter pack · 105,000 credits · never expire
  • Everything in Free, plus:
  • Automatic parameter adjustment
  • Dynamic reward rebalancing
  • Continuous training monitoring
  • Credit bundles: $19.90 · $49.90 · $99.90 · $299.90
  • Priority email & chat support
  • Advanced anomaly detection
Get Premium Credits

Up and running in minutes

Everything you need to integrate RewardGuard into your existing RL training workflow.

1

Install the package

Add the free or premium package to your environment.

2

Import & Configure

Initialize with your environment settings.

3

Analyze Training Logs

Run analysis on your existing training data.

4

Auto-Fix (Premium)

Enable automatic reward parameter adjustment.

pip install rewardguard
pip install rewardguard-premium
import rewardguard as rg

# Define expected reward component distribution
monitor = rg.Monitor(
    expected={"task": 0.7, "safety": 0.3},
    tolerance=5.0,
    window=200,
)
# Record rewards every step
for episode in range(num_episodes):
    for step in range(max_steps):
        r_task, r_safety = env.step(action)
        monitor.step({"task": r_task, "safety": r_safety})

# Inspect results
result = monitor.check()
print(result.severity)                  # "ok" / "warning" / "critical"
print(result.suggested_reward_weights)
monitor.print_report()
# Premium: automatic reward correction
from rewardguard_premium import AutoMonitor

monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    baseline_steps=500,
    auto_correct=True,
)

for step_idx in range(total_steps):
    rewards = env.step(action)
    snapshot = monitor.step(rewards)
    if snapshot and snapshot.corrections_applied:
        env.set_reward_weights(monitor.weights)

monitor.to_csv("audit.csv")

AI should do what you actually want it to do

Misaligned AI is not a future risk; it is happening right now in RL training runs. RewardGuard is our answer.

Safety First

Every feature we build starts with one question: does this make AI systems safer and more predictable?

Reward Alignment

The reward function is the soul of your AI. We ensure it incentivizes the right behaviors, not just high scores through shortcuts.

Full Transparency

Understand exactly what your AI is learning at every step. Clear insights, no black boxes.

Get in Touch

Questions? We'd love to hear from you.

Fill in the form and we'll get back to you as soon as possible.