Accuracy scores (%) of models on our MV-MATH benchmark across mathematical subjects
Model | Overall | AG | Algebra | MG | Combinatorics | TG | Logic | SG | Arithmetic | CG | DG | Statistics |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Claude-3.5🥇 | 33.9 | 32.7 | 38.1 | 34.3 | 46.7 | 33.3 | 29.8 | 36.3 | 54.2 | 27.0 | 38.2 | 41.1 |
GPT-4o🥈 | 32.1 | 28.7 | 36.7 | 34.4 | 39.4 | 30.6 | 29.8 | 38.2 | 41.7 | 20.8 | 44.3 | 47.0 |
Gemini-1.5-Pro🥉 | 29.1 | 29.9 | 32.9 | 28.3 | 28.0 | 30.5 | 40.5 | 33.9 | 42.7 | 21.7 | 30.6 | 35.2 |
Qwen-vl-max | 26.9 | 27.6 | 32.1 | 24.7 | 36.5 | 29.6 | 31.8 | 30.9 | 37.5 | 23.7 | 32.3 | 23.5 |
GPT-4V | 24.5 | 18.7 | 31.6 | 32.4 | 25.6 | 26.3 | 36.3 | 26.8 | 43.7 | 19.3 | 33.8 | 35.2 |
Qwen-vl-plus | 19.7 | 17.9 | 24.1 | 22.0 | 16.0 | 19.9 | 24.8 | 15.9 | 15.2 | 18.7 | 31.4 | 29.4 |
QVQ-72B-Preview | 29.3 | - | - | - | - | - | - | - | - | - | - | - |
LLaVA-OneVision-Chat-72B | 26.2 | 25.1 | 32.4 | 23.9 | 35.3 | 28.1 | 27.2 | 31.6 | 31.2 | 22.6 | 35.9 | 35.2 |
LLaVA-OneVision-SFT-72B | 25.9 | 24.2 | 31.3 | 21.1 | 23.1 | 28.9 | 31.8 | 32.8 | 18.7 | 21.5 | 39.5 | 29.4 |
LLaVA-OneVision-SI-72B | 25.0 | 24.7 | 24.3 | 27.6 | 27.0 | 25.3 | 37.9 | 24.4 | 37.1 | 20.4 | 31.2 | 23.5 |
LLaVA-OneVision-Chat-7B | 19.1 | 19.6 | 20.4 | 21.4 | 14.6 | 18.8 | 4.5 | 20.4 | 43.7 | 16.7 | 28.9 | 29.4 |
LLaVA-OneVision-SFT-7B | 18.8 | 18.2 | 20.3 | 22.3 | 17.3 | 20.1 | 9.0 | 15.8 | 43.1 | 15.8 | 27.3 | 23.5 |
LLaVA-OneVision-SI-7B | 17.2 | 16.1 | 19.5 | 13.2 | 16.0 | 19.5 | 12.6 | 15.0 | 36.5 | 13.2 | 31.3 | 13.6 |
Qwen2VL-Instruct-7B | 16.5 | 14.2 | 18.6 | 14.8 | 17.0 | 21.9 | 22.7 | 17.2 | 31.2 | 16.1 | 25.1 | 23.5 |
Mantis-siglip-8B | 15.8 | 17.9 | 17.7 | 17.9 | 14.6 | 20.4 | 22.7 | 12.1 | 18.7 | 10.8 | 32.3 | 17.6 |
LLaVA-NeXT-Interleave-7B | 14.7 | 14.0 | 15.5 | 15.2 | 17.0 | 18.2 | 18.1 | 16.3 | 6.2 | 14.1 | 24.4 | 23.5 |
Deepseek-VL-7B | 14.5 | 14.8 | 20.2 | 10.8 | 17.0 | 19.8 | 9.0 | 15.1 | 18.7 | 10.9 | 26.6 | 29.4 |
Llama-3.2-Vision-Instruct-11B | 14.4 | 15.0 | 15.4 | 16.2 | 23.1 | 15.6 | 18.1 | 11.9 | 31.2 | 14.3 | 25.1 | 17.6 |
InternVL-Chat-8B | 14.4 | 14.1 | 20.4 | 17.5 | 19.5 | 19.6 | 27.2 | 13.0 | 31.2 | 9.9 | 20.1 | 23.5 |
InternLM-XComposer2.5-VL-7B | 13.1 | 12.2 | 12.6 | 13.2 | 24.3 | 20.6 | 36.3 | 9.4 | 18.7 | 11.1 | 23.7 | 17.6 |
VILA-13B | 12.0 | 11.5 | 11.0 | 11.0 | 12.1 | 14.4 | 18.1 | 13.2 | 37.5 | 10.6 | 20.8 | 5.8 |
LLaVA-v1.5-7B | 10.3 | 9.3 | 11.7 | 11.2 | 9.7 | 12.8 | 13.6 | 10.2 | 0.0 | 7.7 | 23.7 | 11.7 |
LLaVA-v1.5-13B | 5.0 | 4.8 | 6.8 | 4.1 | 4.8 | 8.7 | 9.0 | 3.5 | 12.5 | 5.1 | 5.0 | 11.7 |
Math-LLaVA-13B | 3.0 | 1.6 | 6.9 | 4.7 | 4.8 | 2.9 | 0.0 | 3.2 | 18.7 | 6.6 | 2.1 | 5.8 |
Mathematical Subjects: AG: Analytic Geometry, MG: Metric Geometry, TG: Transformation Geometry, SG: Solid Geometry, CG: Combinatorial Geometry, DG: Descriptive Geometry. A dash (-) indicates that a per-subject score is not reported.
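
Each cell in the table is the percentage of questions a model answered correctly within that subject, and the Overall column aggregates all subjects. As a rough illustration of how such figures can be tallied, the sketch below computes overall and per-subject accuracy from a file of graded per-question results. The JSONL layout and the field names (`subject`, `correct`) are assumptions for illustration only, not the benchmark's actual evaluation code.

```python
# Minimal sketch (NOT the official MV-MATH evaluation script).
# Assumes one graded question per JSONL line with hypothetical fields:
#   {"subject": "Analytic Geometry", "correct": true}
import json
from collections import defaultdict

def accuracy_by_subject(results_path: str) -> dict:
    """Return {"Overall": acc, subject: acc, ...} as percentages rounded to 0.1."""
    correct = defaultdict(int)
    total = defaultdict(int)
    with open(results_path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            subj = rec["subject"]          # subject label of this question
            total[subj] += 1
            total["Overall"] += 1
            if rec["correct"]:             # True if the model's answer was judged correct
                correct[subj] += 1
                correct["Overall"] += 1
    return {k: round(100 * correct[k] / total[k], 1) for k in total}

# Example (hypothetical file name):
# accuracy_by_subject("claude-3.5_results.jsonl")
# -> {"Overall": 33.9, "Analytic Geometry": 32.7, ...}
```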
🚨 For more details, please refer to this link