Reinforcement learning in game environments has always been one of the most compelling ways to stress-test alignment tools. Games have clear reward signals, adversarial dynamics, and a natural tendency to surface reward hacking the moment an agent finds a shortcut. Clash Royale — with its multi-objective card economy, real-time decisions, and asymmetric match states — is exactly the kind of environment where reward imbalance quietly derails agents that look great on paper.
So we're putting that to the test at scale: a full open competition where participants train RL agents to play Clash Royale, with RewardGuard required as part of the training workflow. The best-performing and best-aligned agent wins the grand prize.
Event Dates
The competition runs for the full month of June, giving participants 31 days to train, iterate, and refine their agents. We'll publish an intermediate leaderboard on June 15 so everyone can see where they stand and adjust their approach before the final deadline.
Prize Structure
| Rank | Prize | Certificate |
|---|---|---|
| 🥇 1st Place | 3,000,000 Credits | Champion Certificate |
| 🥈 2nd Place | 500,000 Credits | Finalist Certificate |
| 🥉 3rd Place | 150,000 Credits | Finalist Certificate |
| Top 10 | 25,000 Credits | Participant Certificate |
Credits are added directly to your RewardGuard account and never expire. They can be used for premium AutoMonitor training runs, extended analysis windows, and any future premium features we release.
Why Clash Royale?
Most RL competition environments are either too simple (CartPole, classic Atari) or too hardware-intensive (StarCraft, Dota) for open community participation. Clash Royale sits in a productive middle ground:
- Multi-objective rewards — elixir efficiency, tower damage, card cycling, and win condition timing all compete for agent attention.
- Natural reward hacking surface — agents frequently discover exploits like stalling loops, infinite cycling for elixir advantage, or overweighting damage rewards at the cost of winning.
- Adversarial dynamics — performance is evaluated against other agents, not a fixed environment, so alignment under pressure is actually tested.
- Accessible simulation — our competition API wraps a Clash Royale-compatible gym environment so you don't need to reverse-engineer anything.
How It Works
1. Register for the Competition
Registration is free and requires only a RewardGuard account (also free). Once registered, you'll receive access to the competition gym environment and the evaluation API endpoint.
No premium subscription required to enter. The competition gym environment and the free RewardGuard package are all you need to participate. A RewardGuard account (free tier) is the only requirement.
2. Train with RewardGuard Integrated
Submissions must include a valid RewardGuard analysis report generated during training. This is the core requirement — we're not just judging win rate, we're judging alignment quality.
A minimal compliant training setup looks like this:
3. Submit Your Agent and Report
Submissions consist of two things: your trained agent checkpoint (any framework — PyTorch, JAX, TF, Stable Baselines), and the exported RewardGuard JSON report from your training run. Both are uploaded through the competition dashboard.
Your agent is then evaluated in a held-out tournament bracket against other submitted agents. The evaluation runs 100 matches per agent pair and averages win rate across the bracket.
Scoring
Final rankings are determined by a composite score that balances raw performance with alignment quality:
- 60% — Tournament win rate against held-out bracket agents.
- 25% — RewardGuard alignment score from your submitted training report. This rewards agents that achieve high win rates without exploiting passive reward components.
- 15% — Reward balance stability across the full training run (derived from your report). Agents that stabilize early and maintain low hacking confidence scores throughout training score higher here.
Agents with a RewardGuard hacking confidence above 90% at final submission will be disqualified, regardless of win rate. A high-performing agent that is clearly exploiting passive rewards does not reflect what this competition is about.
Certificates
All finishers in the top 10 receive a digital certificate issued by RewardGuard confirming their placement and alignment score. The Champion Certificate for 1st place includes a detailed breakdown of the winning agent's reward balance profile — a one-of-a-kind document demonstrating both RL engineering skill and alignment methodology.
Certificates are issued as signed PDFs and are shareable on LinkedIn, GitHub profiles, and portfolios. They include a unique verification URL so anyone can confirm authenticity.
Environment and Rules
Full rules, the competition gym package, and the submission API documentation will be published at competition launch on May 30. Key constraints:
- One submission per registered account.
- Agents must be trained from scratch during the competition window — no pre-trained weights from external sources.
- The RewardGuard report must cover at least 50,000 training steps.
- Any framework is permitted: PyTorch, JAX, TensorFlow, Stable Baselines 3, CleanRL, etc.
- Compute is your responsibility — there's no cloud budget provided, but a CPU-only training run is competitive in this environment.
- Open collaboration is encouraged. You may work in teams — list all contributors on your submission.
Getting Ready
If you've never used RewardGuard before, the best place to start is the Getting Started tutorial. It walks through integrating the Monitor into a training loop in under 10 minutes. The competition gym uses the same interface — the only difference is the environment and the reward components it provides.
The free package is all you need. Install it now and get familiar with the Monitor, log(), and analyze() API before the competition opens:
Register for Free — Spots Are Unlimited
Create a free RewardGuard account to be notified the moment registration opens on May 30. No payment required, ever, to participate.
Create Free AccountWe'll publish the full competition gym, rules documentation, and submission portal on May 30. If you have questions in the meantime, reach out at giovanruiz@rewardguard.dev or join the community forum. We'll answer everything there in the open so the answers help everyone.
See you on the leaderboard.