RewardGuard is designed to drop into existing training workflows with minimal friction. You don't need to change your model architecture, your optimizer, or your reward function. You just need to tell RewardGuard what your reward components are and log them at each step — it handles the rest.
This tutorial uses a simple PyTorch RL loop, but the same pattern works with JAX, Stable Baselines 3, and any Gym-compatible environment.
Install the Package
The free package is open-source and available on PyPI:
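Assuming the package is published under the name shown below (check the project's PyPI page for the exact name):

```shell
# Install the free, open-source package from PyPI
pip install rewardguard
```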
For the premium package (auto-adjustment), you'll need a license key from your dashboard:
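A hypothetical install flow; the premium package name and environment variable below are illustrative, so consult your dashboard for the actual instructions:

```shell
# Install the premium package and point it at your license key
pip install rewardguard-premium
export REWARDGUARD_LICENSE_KEY="<key-from-your-dashboard>"
```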
Identify Your Reward Components
Before adding monitoring, identify the distinct components that make up your reward signal. If your reward function looks like this:
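A reward function of the shape described might look like the following sketch. The environment signals and weights are illustrative, not prescriptive; returning the components alongside the total makes them easy to log separately later:

```python
def compute_reward(alive, dist_to_goal, prev_dist_to_goal, ate_food, died):
    # Each named entry below is one reward component.
    components = {
        "survival":   0.01 if alive else 0.0,            # small per-step bonus
        "goal_dist":  prev_dist_to_goal - dist_to_goal,  # progress toward goal
        "food_bonus": 1.0 if ate_food else 0.0,          # pickup bonus
        "death_pen":  -10.0 if died else 0.0,            # terminal penalty
    }
    return sum(components.values()), components
```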
The components are survival, goal_dist, food_bonus, and death_pen. RewardGuard needs each of these separately — not just the total.
Initialize the Monitor
The primary parameter tells RewardGuard which component represents genuine task progress. Any component whose accumulated reward exceeds the primary component's by more than the threshold factor will trigger an alert.
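Here is a sketch of initialization. Since this tutorial can't reproduce the library's source, the snippet includes a minimal stand-in Monitor that implements the alert rule just described, so it runs on its own; the real rewardguard Monitor is assumed to take similar arguments:

```python
class Monitor:
    """Minimal stand-in for rewardguard's Monitor (illustrative only)."""

    def __init__(self, components, primary, threshold=3.0):
        self.primary = primary
        self.threshold = threshold
        self.totals = {name: 0.0 for name in components}

    def log(self, **rewards):
        # Accumulate each component's reward over the run.
        for name, value in rewards.items():
            self.totals[name] += value

    def analyze(self):
        # Alert on any component whose accumulated reward exceeds the
        # primary component's by more than the threshold factor.
        base = self.totals[self.primary]
        return [name for name, total in self.totals.items()
                if name != self.primary and total > self.threshold * base]

monitor = Monitor(
    components=["survival", "goal_dist", "food_bonus", "death_pen"],
    primary="goal_dist",   # the component that tracks genuine task progress
    threshold=3.0,         # alert factor (an assumed default)
)
```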
Log Components in Your Training Loop
Add a single logging call inside your step loop:
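The per-step call might look like this. The loop body and component values are placeholders for your own environment step and reward computation, and a tiny accumulating stand-in is included so the sketch runs as written:

```python
class Monitor:   # tiny stand-in with a rewardguard-style log() method
    def __init__(self):
        self.totals = {}

    def log(self, **rewards):
        for name, value in rewards.items():
            self.totals[name] = self.totals.get(name, 0.0) + value

monitor = Monitor()
for step in range(1000):
    # ... env.step(action), compute reward components here ...
    survival, goal_dist, food_bonus, death_pen = 0.01, 0.05, 0.0, 0.0

    # The one extra line: log each component separately, not just the total.
    monitor.log(survival=survival, goal_dist=goal_dist,
                food_bonus=food_bonus, death_pen=death_pen)
```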
Reading the Report
When RewardGuard detects a problem, the summary looks like this:
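The exact layout will depend on your version of the tool; the following is an illustrative mock-up, not verbatim output:

```
RewardGuard summary (illustrative)
  primary : goal_dist    total =  12.4
  ALERT   : food_bonus   total =  61.7   (5.0x primary, threshold 3.0x)
  ok      : survival     total =   8.1
  ok      : death_pen    total =  -4.0
```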
Enable Auto-Adjustment (Premium)
With a premium license, replace rg.Monitor with rg.PremiumMonitor and add auto_adjust=True. When hacking is detected, the monitor will automatically rebalance your reward weights without stopping training. The adjustment is logged to the report.
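The behavior described above can be sketched as follows. The rebalancing rule here (halving the offending component's weight) is purely illustrative and stands in for the premium package's actual algorithm:

```python
class PremiumMonitor:
    """Stand-in sketch of premium auto-adjustment (not the real algorithm)."""

    def __init__(self, components, primary, threshold=3.0, auto_adjust=False):
        self.primary = primary
        self.threshold = threshold
        self.auto_adjust = auto_adjust
        self.totals = {c: 0.0 for c in components}
        self.weights = {c: 1.0 for c in components}
        self.adjustments = []   # adjustments get logged to the report

    def log(self, **rewards):
        # Components are scaled by their current weight before accumulating.
        for name, value in rewards.items():
            self.totals[name] += self.weights[name] * value

    def analyze(self):
        base = self.totals[self.primary]
        flagged = [c for c, t in self.totals.items()
                   if c != self.primary and t > self.threshold * base]
        if self.auto_adjust:
            for c in flagged:
                # Illustrative rebalance: halve the offending weight.
                self.weights[c] *= 0.5
                self.adjustments.append((c, self.weights[c]))
        return flagged

monitor = PremiumMonitor(
    components=["survival", "goal_dist", "food_bonus", "death_pen"],
    primary="goal_dist",
    auto_adjust=True,
)
```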
Integrating with CI/CD
For production workflows, you want monitoring to fail the run automatically if reward hacking is detected above a severity threshold. The report object exposes a severity score from 0 to 1:
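A check of that kind might look like the sketch below. The attribute name `severity`, the threshold value, and the mock report object are assumptions for illustration; exiting nonzero is what makes CI mark the run as failed:

```python
import sys
from types import SimpleNamespace

SEVERITY_THRESHOLD = 0.8   # tune to your tolerance; 0.8 is illustrative

def check_report(report, threshold=SEVERITY_THRESHOLD):
    # Fail the run if the severity score crosses the threshold.
    if report.severity >= threshold:
        print(f"Reward hacking detected (severity {report.severity:.2f})",
              file=sys.stderr)
        sys.exit(1)

# A mock report object standing in for the analyzer's output:
check_report(SimpleNamespace(severity=0.2))   # below threshold: run continues
```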
Add this check after each evaluation step in your training loop, and your CI system will catch reward hacking before the run completes — saving compute and giving you a clear signal about what went wrong.
That's it. One class and two methods (Monitor, log(), analyze()), one extra line per training step, and you have continuous reward-balance monitoring integrated into your existing loop. The free package gives you detection and diagnosis; the premium package closes the loop with automatic correction.