Tianjian Li

Principia

March 20, 2026

Our new work, "Reasoning over mathematical objects: on-policy reward modeling and test time aggregation", is out! In this work we 1) build and release training data for deriving mathematical objects; 2) show that on-policy RL with a strong verifier boosts performance; and 3) show that on-policy training on parallel generation + verification boosts performance further.

© Copyright 2026 Tianjian Li.