While supervised fine-tuning gets you far, agentic reinforcement learning with verifiable rewards (RLVR) is how you push models to actually solve problems. We've curated Gauntlet into a focused 4,000-example RLVR dataset, delivered in Harbor format for seamless integration with agentic RL training pipelines.
Using the Harbor framework, with rollouts pushed to SkyRL, we trained EssentialAI's rnj-1-instruct 8B, an agentic coding model that punches well above its weight class. The result: a 3x improvement on Terminal Bench 2.0, an out-of-distribution benchmark the model never saw during training.
What's in Gauntlet 4K RLVR
Gauntlet 4K RLVR is a carefully curated subset of the original Gauntlet dataset, optimized for reinforcement learning workflows. Each example provides a verifiable reward signal through pytest tests.
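A reward of this shape can be sketched as a thin wrapper around pytest's exit code. This is a minimal illustration: the function names and the binary pass/fail reward scheme are our assumptions here, not Gauntlet's actual verifier interface.

```python
import subprocess

# Illustrative sketch of a pytest-backed verifiable reward.
# Function names and the binary reward scheme are assumptions,
# not Gauntlet's actual interface.

def exit_code_to_reward(returncode: int) -> float:
    """pytest exits 0 only when every collected test passed."""
    return 1.0 if returncode == 0 else 0.0

def pytest_reward(task_dir: str, timeout: int = 120) -> float:
    """Run a task's test suite and return a verifiable binary reward."""
    try:
        result = subprocess.run(
            ["pytest", "-q", task_dir],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0  # a hung solution earns no reward
    return exit_code_to_reward(result.returncode)
```

Because the reward is derived from test execution rather than a learned judge, it cannot drift or be gamed by stylistic tricks, which is what makes the signal trustworthy at RL scale.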
Training Setup
We trained rnj-1-instruct 8B using the terminus-2 harness with our Gauntlet 4K RLVR dataset. After GRPO showed instability during training, we switched to the DAPO algorithm, which provided consistent convergence.
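The DAPO ingredients most relevant to that stability can be sketched as follows. This is a hedged illustration: the asymmetric "clip-higher" bounds and the dropping of zero-variance rollout groups follow the published DAPO recipe, and the constants are illustrative, not necessarily our exact training configuration.

```python
import numpy as np

# Hedged sketch of DAPO-style pieces; constants are illustrative,
# not our exact training hyperparameters.
EPS_LOW, EPS_HIGH = 0.2, 0.28  # "clip-higher": wider upper bound than symmetric PPO/GRPO clipping

def group_advantages(rewards):
    """Group-normalized advantages for one prompt's G rollouts."""
    r = np.asarray(rewards, dtype=np.float64)
    std = r.std()
    if std == 0.0:
        # All rollouts got the same reward -> zero learning signal.
        # DAPO's dynamic sampling drops such groups and resamples.
        return None
    return (r - r.mean()) / std

def clipped_objective(ratio, adv):
    """Clipped surrogate with decoupled (asymmetric) clip bounds."""
    clipped = np.clip(ratio, 1.0 - EPS_LOW, 1.0 + EPS_HIGH)
    return np.minimum(ratio * adv, clipped * adv)
```

With binary pytest rewards, groups where every rollout passes (or every rollout fails) carry no gradient under group normalization; filtering them keeps each batch full of informative examples, which is one plausible reason DAPO converged where GRPO did not.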
Terminal Bench 2.0 Results
We evaluated on Terminal Bench 2.0. Because the model never saw this benchmark during training, it tests whether the model learned generalizable coding skills rather than pattern matching on the training data.
The trained 8B model delivers outsized performance, achieving results comparable to 20B+ parameter models. Below we compare against other models evaluated on the same terminus-2 harness.
*Results pulled from the Terminal Bench 2.0 leaderboard.
Note: Terminal Bench 2.0 comprises 89 terminal-based coding tasks, none of which appeared during training, demonstrating genuine skill transfer from the Gauntlet RLVR training.
Summary
RLVR works. With just 4,000 carefully curated examples and verifiable rewards, we achieved a 3x improvement on an out-of-distribution benchmark. The key is quality over quantity, and rewards you can trust.
Interested in training on Gauntlet 4K RLVR? Book a call.

