Are we AI Math yet?
Is there a computer system that, given a theorem statement, can prove it more reliably than 100 top mathematicians given 100 years?
Not yet.
Frontier Math Benchmark
FrontierMath is a benchmark by Epoch AI, developed in collaboration with OpenAI, intended to test the limits of AI mathematics. It was announced in November 2024.
It consists of 290 questions whose answers are concrete mathematical objects, such as integers and matrices, so that grading can be automated.
- Epoch AI FrontierMath information page
- Can AI do maths yet? Thoughts from a mathematician – Kevin Buzzard, Dec 2024.
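Because each answer is a closed-form object rather than a free-form proof, grading can be fully automated by exact comparison. A minimal sketch of such a checker, with a hypothetical problem and reference answer (illustrative only, not the benchmark's actual grading code):

```python
from fractions import Fraction

# Hypothetical reference answer for one problem: a single exact rational.
REFERENCE_ANSWER = Fraction(22, 7)

def grade(submission: str) -> bool:
    """Parse a submitted answer string and compare it exactly to the
    reference answer. Exact (non-floating-point) comparison is what
    makes this style of benchmark auto-gradable."""
    try:
        value = Fraction(submission)
    except (ValueError, ZeroDivisionError):
        return False
    return value == REFERENCE_ANSWER

print(grade("22/7"))   # exact match
print(grade("3.14"))   # numerically close, but not an exact match
```

The design choice to require exact, machine-checkable answers is what lets a benchmark like this scale without human graders, at the cost of excluding proof-based questions.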
Math Olympiads
Math olympiads are competitions where contestants solve mathematical problems. The most famous one is the International Mathematical Olympiad, a competition for high-school students from across the globe.
There is much interest in getting AI to perform well in these competitions. In November 2023, XTX Markets launched the AIMO Prize, with a $5M grand prize for the first publicly shared AI model capable of winning a gold medal in the IMO, and an additional $5M for partial-progress prizes. The AIMO Prize requires the same input/output as human contestants (natural language, not computer-formalized).
AlphaProof and AlphaGeometry 2
In July 2024, Google DeepMind announced "AI achieves silver-medal standard solving International Mathematical Olympiad problems". Working from hand-formalized problem statements, its systems AlphaProof and AlphaGeometry 2 together scored at the silver-medal level on the IMO.
The types of problems on the IMO seem to present different levels of challenge for AI solutions:
AlphaProof solved two algebra problems and one number theory problem by determining the answer and proving it was correct. This included the hardest problem in the competition, solved by only five contestants at the 2024 IMO. AlphaGeometry 2 proved the geometry problem, while the two combinatorics problems remained unsolved.
MiniF2F
MiniF2F is a benchmark of AI formal mathematics. It consists of 244 "exercise statements from olympiads (AMC, AIME, IMO) as well as high-school and undergraduate maths classes".
The current state of the art seems to be ProofAug, solving 66% of the problems.
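For a sense of what "formal" means here, a miniF2F-style statement looks like the following toy example in Lean 4 / mathlib syntax (illustrative, not an actual benchmark entry):

```lean
-- An AMC-style exercise stated formally: if 3x + 7 = 22 then x = 5.
-- A benchmark entry supplies only the statement; the model must produce
-- a proof (here a tactic script) that the Lean kernel accepts.
theorem toy_amc (x : ℚ) (h : 3 * x + 7 = 22) : x = 5 := by
  linarith
```

Success is binary per problem: the proof either type-checks or it doesn't, so scores like the 66% above require no human grading.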
Millennium Prize
The Millennium Prize Problems are seven important problems in mathematics for which the Clay Mathematics Institute has offered a prize of $1 million per problem. Currently, 1 problem has been solved by a human, 0 have been solved by AI, and 6 remain unsolved.
Problem | Human or AI? | Solved By / Date
---|---|---
Birch and Swinnerton‑Dyer Conjecture | ❓ | —
Hodge Conjecture | ❓ | —
Navier‑Stokes Equation | ❓ | —
P vs NP | ❓ | —
Poincaré Conjecture | 🧑 | Grigori Perelman, 2003
Riemann Hypothesis | ❓ | —
Yang–Mills Existence and Mass Gap | ❓ | —
Prediction Markets
- Polymarket: real-money prediction market site with curated questions.
- Manifold: play-money prediction market site with user-submitted questions.
- Metaculus: market-adjacent forecasting site.
IMO in 2025
FrontierMath in 2025
Long term
Other links
- Papers With Code – Automated theorem proving
- Papers With Code – Mathematical reasoning
- Epoch AI Benchmarking Hub
- Mastodon posts about AI – Terence Tao, 2022–2025
- Think of a number – Kevin Buzzard, Jan 2025.
- Think of a number: an update – Kevin Buzzard, March 2025.