Frontier models don't just need harder questions.
They need better signals.
Our Advanced Reasoning Rubrics dataset captures how experts evaluate reasoning, not just outcomes, making failures visible and improvement measurable.
Download the dataset on @huggingface below: