RewardGuard catches misaligned incentives, training stagnation, and reward hacking in your RL training runs, before they derail your model.
Integrate RewardGuard into your existing workflow in minutes — no infrastructure changes needed.
Install RewardGuard via pip and import it directly into your Python training script. Works with any RL framework.
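For instance, wiring it into a training loop might look like the sketch below. The `rewardguard` package name, the `RewardGuard` class, and the `record`/`report` calls are assumptions for illustration, not the documented API.

```python
# Minimal sketch -- package name and every call below are assumptions
# for illustration, not the actual RewardGuard API.
# pip install rewardguard

import random

from rewardguard import RewardGuard  # hypothetical import


def train_step() -> float:
    """Stand-in for your existing RL update; returns one step's reward."""
    return random.gauss(1.0, 0.3)


guard = RewardGuard(env_name="CartPole-v1")  # hypothetical constructor

for step in range(10_000):
    reward = train_step()
    guard.record(step=step, reward=reward)  # hypothetical: stream rewards

print(guard.report())  # hypothetical: summary of flagged anomalies
```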
Deep analysis of reward signals, distributions, and temporal patterns.
Instantly flags reward hacking, misalignment, and stagnation patterns.
Get recommendations or let Premium auto-adjust your reward parameters.
Monitor reward distributions live as your model trains. Catch anomalies the instant they appear — not after 100,000 wasted compute steps.
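As a sketch of what live alerting could look like in a training loop, assuming a hypothetical `on_anomaly` callback and `AnomalyEvent` type (neither is confirmed by the docs):

```python
# Hypothetical sketch of live alerting; `on_anomaly` and `AnomalyEvent`
# are illustrative names, not the documented interface.
from rewardguard import AnomalyEvent, RewardGuard  # hypothetical imports


def alert(event: AnomalyEvent) -> None:
    # Called the moment an anomaly is detected, mid-run.
    print(f"step {event.step}: {event.kind} (severity {event.severity:.2f})")


guard = RewardGuard(env_name="CartPole-v1", on_anomaly=alert)
# Calling guard.record(...) inside the training loop would then trigger
# `alert` as soon as the live reward distribution drifts.
```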
Active protection against reward hacking during production training runs.
Detection accuracy across 2M+ analyzed episodes
Visualize reward trends and spot training stagnation before it costs you compute budget.
Export detailed PDF reports for documentation and auditing.
Automatically tunes reward weights mid-run. Zero manual intervention required.
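One way this could plug into a run, assuming hypothetical `auto_adjust` and `adjust_interval` parameters and a `current_weights` accessor:

```python
# Hypothetical sketch of Premium auto-adjustment; the flags and accessor
# below are assumptions for illustration.
from rewardguard import RewardGuard  # hypothetical import

guard = RewardGuard(
    env_name="CartPole-v1",
    auto_adjust=True,       # hypothetical Premium flag
    adjust_interval=1_000,  # hypothetical: re-tune every 1,000 steps
)

# After a detected anomaly, the guard would rebalance reward weights,
# e.g. down-weighting a term the policy is exploiting.
print(guard.current_weights())  # hypothetical accessor
```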
Start free. Detect and prevent reward hacking in your RL training.
For research, experimentation, and side projects.
Buy analysis credits once — no subscription, no recurring charge.
Everything you need to integrate RewardGuard into your existing RL training workflow. A code sketch covering all four steps follows the list.
Add the free or premium package to your environment.
Initialize with your environment settings.
Run analysis on your existing training data.
Enable automatic reward parameter adjustment.
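Put together, the four steps might look like this. The package name and every call shown (`RewardGuard`, `analyze`, `enable_auto_adjust`) are illustrative assumptions, as is the example data path:

```python
# Hypothetical end-to-end quick start; every name below is an assumption.
# Step 1: add the free or premium package to your environment.
# pip install rewardguard

from rewardguard import RewardGuard  # hypothetical import

# Step 2: initialize with your environment settings.
guard = RewardGuard(env_name="CartPole-v1", reward_range=(0.0, 1.0))

# Step 3: run analysis on your existing training data.
report = guard.analyze("runs/ppo_baseline/rewards.csv")  # hypothetical call
print(report.summary())

# Step 4 (Premium): enable automatic reward parameter adjustment.
guard.enable_auto_adjust()  # hypothetical toggle
```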
Misaligned AI is not a future risk; it can emerge in any RL training run today. RewardGuard is our answer.
Every feature we build starts with one question: does this make AI systems safer and more predictable?
The reward function is the soul of your AI. We ensure it incentivizes the right behaviors, not just high scores won through shortcuts.
Understand exactly what your AI is learning at every step. Clear insights, no black boxes.
Fill in the form and we'll get back to you as soon as possible.