Geometry problem solving (GPS) requires high-level mathematical reasoning, including the ability to fuse multi-modal information and apply geometric knowledge. Recent neural solvers have shown great potential in this area but struggle with diagram representation and modal fusion. To address these challenges, we introduce PGPSNet, a neural solver that converts diagrams into basic textual clauses describing their features. PGPSNet integrates multi-modal information through structural and semantic pre-training, data augmentation, and self-limited decoding. Together, these endow it with rich knowledge of geometric theorems and representations, enhancing its geometric understanding and reasoning.
Overview of the PGPSNet solver. PGPSNet is a multi-modal learning framework whose inputs include not only the diagram and the textual problem but also the textual clauses parsed from the diagram. It generates a theorem-based, interpretable solution program to solve the geometry problem.
Pipeline of structural and semantic pre-training. [M] denotes the mask token. Class tags [G], [N], [ARG], [P], and [ANGID] represent general, variable, argument, point, and angle-ID tokens, respectively. Section tags [S], [C], and [T] refer to structure, condition, and target tokens, respectively.
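The caption names the token tags but not the masking policy. As a minimal sketch, assuming a BERT-style scheme in which every token of the textual clauses carries a class tag and a section tag, and a random subset of tokens has its text replaced by [M] while keeping its tags, the pre-training input could be built as below. The `TaggedToken` type, the `mask_tokens` helper, the 15% default rate, and the example tag assignments are all hypothetical, not PGPSNet's published settings.

```python
import random
from dataclasses import dataclass

MASK = "[M]"  # mask token named in the caption

@dataclass(frozen=True)
class TaggedToken:
    text: str     # surface token
    cls_tag: str  # class tag: [G], [N], [ARG], [P] or [ANGID]
    sec_tag: str  # section tag: [S], [C] or [T]

def mask_tokens(tokens, rate=0.15, seed=0):
    """Hypothetical BERT-style masking: replace each token's text with [M]
    with probability `rate`, keep its tags, and record what was hidden."""
    rng = random.Random(seed)
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked.append(TaggedToken(MASK, tok.cls_tag, tok.sec_tag))
            targets.append((i, tok.text))  # label the model must recover
        else:
            masked.append(tok)
    return masked, targets

# Illustrative sequence: a structural clause, a condition and a target.
sequence = [
    TaggedToken("line", "[G]", "[S]"), TaggedToken("A", "[P]", "[S]"),
    TaggedToken("B", "[P]", "[S]"), TaggedToken("C", "[P]", "[S]"),
    TaggedToken("A", "[P]", "[C]"), TaggedToken("B", "[P]", "[C]"),
    TaggedToken("=", "[G]", "[C]"), TaggedToken("5", "[N]", "[C]"),
    TaggedToken("find", "[G]", "[T]"), TaggedToken("x", "[ARG]", "[T]"),
]
masked, targets = mask_tokens(sequence, rate=0.3)
print([t.text for t in masked], targets)
```

Keeping the tags on masked positions lets the model know what kind of token it must recover; whether PGPSNet does exactly this is an assumption of the sketch.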
To support GPS research, we have constructed a large-scale, finely annotated dataset named PGPS9K, which includes detailed diagram annotations and interpretable solution programs. The PGPS9K dataset has five properties:
The Plane Geometry Problem Solving Dataset (PGPS9K) was constructed by the State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA). Each sample in PGPS9K is labeled with both a fine-grained diagram annotation and an interpretable solution program; the diagram annotation is converted into structural clauses and semantic clauses that describe the multi-level information in the geometry diagram.
PGPS9K is composed of 9,022 geometry problems paired with 4,000 non-duplicate geometry diagrams. Of these, 2,891 problems paired with 1,738 diagrams are selected from the Geometry3K dataset, and the remaining problems are collected from five popular textbooks across grades 6-12 on mathematics curriculum websites. PGPS9K is divided into 30 problem types, as shown in the following figure, covering almost all types of plane geometry problems in the corresponding grades.
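The textbook-sourced share of the dataset is not stated directly but follows from the counts above by subtraction. The diagram figure assumes that Geometry3K diagrams and textbook diagrams do not overlap, which the "non-duplicate" wording suggests but does not spell out.

```python
# Counts stated in the paragraph above.
total_problems, total_diagrams = 9_022, 4_000
geo3k_problems, geo3k_diagrams = 2_891, 1_738

# Derived by subtraction (assumes no diagram is shared between sources).
textbook_problems = total_problems - geo3k_problems
textbook_diagrams = total_diagrams - geo3k_diagrams

print(textbook_problems, textbook_diagrams)  # → 6131 2262
print(f"{geo3k_problems / total_problems:.1%} of problems come from Geometry3K")
# → 32.0% of problems come from Geometry3K
```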
Distribution of problem types in the PGPS9K dataset.
Comparison with existing GPS datasets. Type, OP, and PL represent problem type, number of operators, and program length, respectively.
The annotations of PGPS9K consist of a diagram annotation and a solution program for each problem: the diagram annotation extracts the structural and semantic information in the diagram, and the solution program defines the steps for solving the problem. The diagram annotation, textual clauses, and solution program of an example are shown below.
Example from the PGPS9K dataset.
Templates of textual clauses. The symbols &, *, $, and % denote point, line, variable, and angle ID, respectively.
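A textual clause is obtained by filling a template's placeholder symbols with concrete diagram elements. The sketch below, assuming simple left-to-right substitution with a type check per symbol, shows the idea; the `fill_template` helper and both example templates ("line & & &", "angle % = $") are hypothetical and are not the templates actually defined for PGPS9K.

```python
# Placeholder symbols as defined in the caption above.
PLACEHOLDER_TYPES = {"&": "point", "*": "line", "$": "variable", "%": "angle_id"}

def fill_template(template, values):
    """Substitute placeholders left to right with (kind, text) pairs,
    rejecting a value whose kind does not match its placeholder symbol."""
    n_slots = sum(ch in PLACEHOLDER_TYPES for ch in template)
    if n_slots != len(values):
        raise ValueError(f"template has {n_slots} slots, got {len(values)} values")
    remaining = iter(values)
    out = []
    for ch in template:
        if ch not in PLACEHOLDER_TYPES:
            out.append(ch)  # literal template text is copied unchanged
            continue
        kind, text = next(remaining)
        if kind != PLACEHOLDER_TYPES[ch]:
            raise TypeError(f"{ch!r} expects a {PLACEHOLDER_TYPES[ch]}, got a {kind}")
        out.append(text)
    return "".join(out)

# Hypothetical templates, not the ones defined for PGPS9K.
print(fill_template("line & & &", [("point", "A"), ("point", "B"), ("point", "C")]))
# → line A B C
print(fill_template("angle % = $", [("angle_id", "1"), ("variable", "x")]))
# → angle 1 = x
```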
Annotation of the solution program and its interpretability.
Program set defined for our solution programs, consisting of 34 operators and 55 operands, where the operands comprise 11 problem variables, 7 process variables, 26 arguments, and 11 constants.
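The caption gives the size of the program set but not its contents. As a hedged illustration of why such programs are interpretable, the sketch below executes a program step by step and stores each step's result as a process variable that later steps can reference. The operand prefixes (N for problem variables, V for process variables, C for constants) and the three operators are hypothetical stand-ins rather than PGPS9K's actual 34 operators, and argument operands are omitted.

```python
import math

# Operand budget stated in the caption: 11 + 7 + 26 + 11 = 55.
OPERAND_COUNTS = {"problem_variables": 11, "process_variables": 7,
                  "arguments": 26, "constants": 11}
assert sum(OPERAND_COUNTS.values()) == 55

# Hypothetical operators (NOT the operators defined by PGPS9K).
OPERATORS = {
    "Add": lambda a, b: a + b,
    "Minus": lambda a, b: a - b,
    "Hypotenuse": lambda a, b: math.hypot(a, b),  # Pythagorean theorem
}

def run_program(steps, problem_vars, constants):
    """Execute (operator, operands) steps in order. Hypothetical operand
    names: N<i> reads a problem variable, V<i> reads the result of an earlier
    step, C<name> reads a constant. Each result becomes the next V<i>."""
    process_vars = []
    for op, args in steps:
        values = []
        for name in args:
            if name.startswith("N"):
                values.append(problem_vars[int(name[1:])])
            elif name.startswith("V"):
                values.append(process_vars[int(name[1:])])
            elif name.startswith("C"):
                values.append(constants[name])
            else:
                raise ValueError(f"unknown operand {name!r}")
        process_vars.append(OPERATORS[op](*values))
    return process_vars[-1]

# Right triangle with legs 3 and 4: one theorem application.
print(run_program([("Hypotenuse", ["N0", "N1"])], [3.0, 4.0], {}))  # → 5.0
# Third angle of a triangle: V0 = 50 + 60, then 180 - V0.
print(run_program([("Add", ["N0", "N1"]), ("Minus", ["C180", "V0"])],
                  [50, 60], {"C180": 180}))  # → 70
```

Because every step names its operator and inputs, each intermediate value can be traced back to a theorem application, which is the sense of interpretability the caption refers to.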
| Method | Geometry3K Completion | Geometry3K Choice | Geometry3K Top-3 | PGPS9K Completion | PGPS9K Choice | PGPS9K Top-3 |
|---|---|---|---|---|---|---|
| Human Expert [Lu et al., 2021] | - | 90.9 | - | - | - | - |
| Baseline (Neural Solver) [Lu et al., 2021] | - | 35.9 | - | - | - | - |
| InterGPS (Predict)* [Lu et al., 2021] | 44.6 | 56.9 | - | - | - | - |
| InterGPS (Diagram GT)* [Lu et al., 2021] | 64.2 | 71.7 | - | - | - | - |
| InterGPS (All GT)* [Lu et al., 2021] | 69.0 | 75.9 | - | - | - | - |
| NGS# [Chen et al., 2021] | 35.3 | 58.8 | 62.0 | 34.1 | 46.1 | 60.9 |
| Geoformer# [Chen et al., 2022] | 36.8 | 59.3 | 62.5 | 35.6 | 47.3 | 62.3 |
| PGPSNet | 65.0 | 77.9 | 80.7 | 62.7 | 70.4 | 79.5 |
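The margin between PGPSNet and the stronger of the two baselines marked # (Geoformer, which scores higher than NGS in every column) can be read off the table directly. This is plain arithmetic over numbers copied from the table; `RESULTS` and `margins` are illustrative names.

```python
# Scores copied from the table above; order: Completion, Choice, Top-3.
RESULTS = {
    "Geoformer": {"Geometry3K": (36.8, 59.3, 62.5), "PGPS9K": (35.6, 47.3, 62.3)},
    "PGPSNet":   {"Geometry3K": (65.0, 77.9, 80.7), "PGPS9K": (62.7, 70.4, 79.5)},
}
METRICS = ("Completion", "Choice", "Top-3")

def margins(better, baseline):
    """Absolute gain of `better` over `baseline`, in points, per dataset and metric."""
    out = {}
    for dataset, scores in RESULTS[better].items():
        base = RESULTS[baseline][dataset]
        # round() removes floating-point noise such as 28.200000000000003
        out[dataset] = {m: round(s - b, 1) for m, s, b in zip(METRICS, scores, base)}
    return out

print(margins("PGPSNet", "Geoformer"))
```

On both datasets, PGPSNet leads Geoformer by more than 17 points in every column, with the largest gaps in the Completion setting.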
| Self-limited Decoder | Data Aug | Structural Clauses | Pre-trained LM | Ans acc (Completion) | Ans acc (Choice) | Ans acc (Top-3) | Prog acc (Completion) | Prog acc (Choice) | Prog acc (Top-3) |
|---|---|---|---|---|---|---|---|---|---|
| ✔ | ✘ | ✔ | ✘ | 32.5 | 52.2 | 57.6 | 27.2 | 47.3 | 53.1 |
| ✘ | ✔ | ✔ | ✘ | 28.2 | 48.3 | 50.7 | 25.4 | 42.7 | 45.6 |
| ✔ | ✔ | ✔ | ✘ | 36.6 | 59.5 | 62.4 | 33.9 | 52.8 | 58.6 |
| ✔ | ✔ | ✘ | ✘ | 38.4 | 61.7 | 64.8 | 34.8 | 54.2 | 59.2 |
| ✔ | ✔ | ✔ | ✔ | 48.1 | 67.5 | 71.4 | 45.4 | 62.0 | 68.1 |
| ✔ | ✔ | ✔ | ✔ | 65.0 | 77.9 | 80.7 | 62.8 | 72.4 | 78.2 |
@inproceedings{Zhang2023PGPS,
title = {A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram},
author = {Zhang, Ming-Liang and Yin, Fei and Liu, Cheng-Lin},
booktitle = {IJCAI},
year = {2023},
}