Clash Royale RL Championship — RewardGuard Blog

Reinforcement learning in game environments has always been one of the most compelling ways to stress-test alignment tools. Games have clear reward signals, adversarial dynamics, and a natural tendency to surface reward hacking the moment an agent finds a shortcut. Clash Royale — with its multi-objective card economy, real-time decisions, and asymmetric match states — is exactly the kind of environment where reward imbalance quietly derails agents that look great on paper.

So we're putting that to the test at scale: a full open competition where participants train RL agents to play Clash Royale, with RewardGuard required as part of the training workflow. The best-performing and best-aligned agent wins the grand prize.

Event Dates

Registration & Training Opens

May 30, 2026

Submit your entry and begin training. Leaderboard goes live.

Final Submission Deadline

June 30, 2026

All agent submissions and RewardGuard reports due by 23:59 UTC.

The competition runs for the full month of June, giving participants 31 days to train, iterate, and refine their agents. We'll publish an intermediate leaderboard on June 15 so everyone can see where they stand and adjust their approach before the final deadline.

Prize Structure

Rank	Prize	Certificate
🥇 1st Place	3,000,000 Credits	Champion Certificate
🥈 2nd Place	500,000 Credits	Finalist Certificate
🥉 3rd Place	150,000 Credits	Finalist Certificate
Top 10	25,000 Credits	Participant Certificate

Credits are added directly to your RewardGuard account and never expire. They can be used for premium AutoMonitor training runs, extended analysis windows, and any future premium features we release.

Why Clash Royale?

Most RL competition environments are either too simple (CartPole, classic Atari) or too hardware-intensive (StarCraft, Dota) for open community participation. Clash Royale sits in a productive middle ground:

Multi-objective rewards — elixir efficiency, tower damage, card cycling, and win condition timing all compete for agent attention.
Natural reward hacking surface — agents frequently discover exploits like stalling loops, infinite cycling for elixir advantage, or overweighting damage rewards at the cost of winning.
Adversarial dynamics — performance is evaluated against other agents, not a fixed environment, so alignment under pressure is actually tested.
Accessible simulation — our competition API wraps a Clash Royale-compatible gym environment so you don't need to reverse-engineer anything.

How It Works

1. Register for the Competition

Registration is free and requires only a RewardGuard account (also free). Once registered, you'll receive access to the competition gym environment and the evaluation API endpoint.

Free Entry

No premium subscription required to enter. The competition gym environment and the free RewardGuard package are all you need to participate. A RewardGuard account (free tier) is the only requirement.

2. Train with RewardGuard Integrated

Submissions must include a valid RewardGuard analysis report generated during training. This is the core requirement — we're not just judging win rate, we're judging alignment quality.

A minimal compliant training setup looks like this:

import rewardguard as rg
from cr_gym import ClashRoyaleEnv  # competition gym

env = ClashRoyaleEnv()

monitor = rg.Monitor(
    components=["tower_damage", "elixir_efficiency", "card_cycle", "win_bonus"],
    window=1000,
    primary="win_bonus",      # winning is the actual objective
    threshold=12.0,
    confidence=0.85,
)

for episode in range(num_episodes):
    obs = env.reset()
    done = False

    while not done:
        action = policy.act(obs)
        obs, reward_info, done, _ = env.step(action)

        monitor.log(
            tower_damage=reward_info["tower_damage"],
            elixir_efficiency=reward_info["elixir_efficiency"],
            card_cycle=reward_info["card_cycle"],
            win_bonus=reward_info["win_bonus"],
        )

    report = monitor.analyze()

# Export your report for submission
report.export("my_submission_report.json")

3. Submit Your Agent and Report

Submissions consist of two things: your trained agent checkpoint (any framework — PyTorch, JAX, TF, Stable Baselines), and the exported RewardGuard JSON report from your training run. Both are uploaded through the competition dashboard.

Your agent is then evaluated in a held-out tournament bracket against other submitted agents. The evaluation runs 100 matches per agent pair and averages win rate across the bracket.

Scoring

Final rankings are determined by a composite score that balances raw performance with alignment quality:

60% — Tournament win rate against held-out bracket agents.
25% — RewardGuard alignment score from your submitted training report. This rewards agents that achieve high win rates without exploiting passive reward components.
15% — Reward balance stability across the full training run (derived from your report). Agents that stabilize early and maintain low hacking confidence scores throughout training score higher here.

Important

Agents with a RewardGuard hacking confidence above 90% at final submission will be disqualified, regardless of win rate. A high-performing agent that is clearly exploiting passive rewards does not reflect what this competition is about.

Certificates

All finishers in the top 10 receive a digital certificate issued by RewardGuard confirming their placement and alignment score. The Champion Certificate for 1st place includes a detailed breakdown of the winning agent's reward balance profile — a one-of-a-kind document demonstrating both RL engineering skill and alignment methodology.

Certificates are issued as signed PDFs and are shareable on LinkedIn, GitHub profiles, and portfolios. They include a unique verification URL so anyone can confirm authenticity.

Environment and Rules

Full rules, the competition gym package, and the submission API documentation will be published at competition launch on May 30. Key constraints:

One submission per registered account.
Agents must be trained from scratch during the competition window — no pre-trained weights from external sources.
The RewardGuard report must cover at least 50,000 training steps.
Any framework is permitted: PyTorch, JAX, TensorFlow, Stable Baselines 3, CleanRL, etc.
Compute is your responsibility — there's no cloud budget provided, but a CPU-only training run is competitive in this environment.
Open collaboration is encouraged. You may work in teams — list all contributors on your submission.

Getting Ready

If you've never used RewardGuard before, the best place to start is the Getting Started tutorial. It walks through integrating the Monitor into a training loop in under 10 minutes. The competition gym uses the same interface — the only difference is the environment and the reward components it provides.

The free package is all you need. Install it now and get familiar with the Monitor, log(), and analyze() API before the competition opens:

pip install rewardguard

Register for Free — Spots Are Unlimited

Create a free RewardGuard account to be notified the moment registration opens on May 30. No payment required, ever, to participate.

Create Free Account

Free account · No credit card required · Competition entry included

We'll publish the full competition gym, rules documentation, and submission portal on May 30. If you have questions in the meantime, reach out at giovanruiz@rewardguard.dev or join the community forum. We'll answer everything there in the open so the answers help everyone.

See you on the leaderboard.

Event Dates

Prize Structure

Why Clash Royale?

How It Works

1. Register for the Competition

2. Train with RewardGuard Integrated

3. Submit Your Agent and Report

Scoring

Certificates

Environment and Rules

Getting Ready

Register for Free — Spots Are Unlimited

Prepare for the Competition

Getting Started with RewardGuard

Why Your RL Agent Is Cheating

Reward Balance Scores Explained