PGPSNet: A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram

Geometry problem solving (GPS) is a high-level mathematical reasoning task that requires fusing multi-modal information and applying geometric knowledge. Recent neural solvers have shown great potential in this area but still struggle with diagram representation and modal fusion. To address these challenges, we introduce PGPSNet, a neural solver that converts diagrams into basic textual clauses describing their features. PGPSNet integrates multi-modal information through structural and semantic pre-training, data augmentation, and self-limited decoding. Together these give it rich knowledge of geometric theorems and representations, improving its geometric understanding and reasoning.

Overview of the PGPSNet solver. PGPSNet is a multi-modal learning framework whose inputs include not only the diagram and the textual problem but also the textual clauses parsed from the diagram. It generates a theorem-based, interpretable solution program to solve the geometry problem.

Pipeline of structural and semantic pre-training. [M] denotes the mask token.
Class tags [G], [N], [ARG], [P] and [ANGID] denote general, variable, argument, point and angle-ID tokens, respectively.
Section tags [S], [C] and [T] denote structure, condition and target tokens, respectively.
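The masking step of this pre-training pipeline can be illustrated with a minimal Python sketch. It assumes a BERT-style scheme in which a fixed fraction of tokens is replaced by [M] and the model must recover the originals; the masking ratio, the helper name `mask_tokens`, and the example tokens and tag assignments are all hypothetical, and PGPSNet's actual masking policy may differ.

```python
import random
from typing import List, Tuple

MASK = "[M]"

def mask_tokens(tokens: List[str], ratio: float = 0.15, seed: int = 0) -> Tuple[List[str], List[int]]:
    """Replace about `ratio` of the tokens with [M] (hypothetical helper).

    Returns the masked sequence and the masked positions; the original
    tokens at those positions are the prediction targets.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked, positions

# Hypothetical token sequence with parallel class tags and section tags.
tokens    = ["line", "A", "B", "C", "B", "C", "=", "x", "find", "x"]
class_tag = ["[G]", "[P]", "[P]", "[P]", "[P]", "[P]", "[G]", "[N]", "[G]", "[N]"]
section   = ["[S]", "[S]", "[S]", "[S]", "[C]", "[C]", "[C]", "[C]", "[T]", "[T]"]

masked, positions = mask_tokens(tokens, ratio=0.3)
targets = [tokens[i] for i in positions]

# In this sketch the tag sequences stay unmasked (an assumption), so the
# model still knows the class and section of every hidden token.
for tok, cls, sec in zip(masked, class_tag, section):
    print(f"{sec} {cls} {tok}")
print("targets:", targets)
```

Keeping the tags visible while masking the token is one plausible design: the model is then asked to predict, for example, which point or variable fills a slot whose category it already knows.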

PGPS9K Dataset

To support GPS research, we constructed PGPS9K, a large-scale, finely annotated dataset that includes detailed diagram annotations and interpretable solution programs. PGPS9K has five properties:

Theorem-based: Solving problems in PGPS9K requires applying geometric theorems to carry out algebraic calculations and finally obtain numerical results.
Diagram-dependent: Over 90% of problems must be solved using the diagrams, because necessary conditions such as variable values and geometric structure are presented visually rather than in text.
Abstract: Diagrams are composed of basic geometric primitives (point, line, circle) and non-geometric primitives (text, symbol). The textual problems contain only abstract geometric conditions and involve no complex semantic scenarios.
Fine-grained: Problems sharing the same diagram vary in their conditions or targets, and slight differences in the textual problems usually lead to completely different solutions.
Condition-redundancy: Many conditions in the semantic clauses or textual problem are not needed to solve the problem at hand.

Overview

The Plane Geometry Problem Solving Dataset (PGPS9K) was constructed by the State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA). Each sample in PGPS9K is labeled with both a fine-grained diagram annotation and an interpretable solution program; the diagram annotation is converted into structural clauses and semantic clauses that describe multi-level information in the geometry diagram.

PGPS9K consists of 9,022 geometry problems paired with 4,000 non-duplicate geometry diagrams. Of these, 2,891 problems paired with 1,738 diagrams are selected from the Geometry3K dataset, and the remaining problems are collected from five popular textbooks for grades 6-12 on mathematics curriculum websites. PGPS9K is divided into 30 problem types, as shown in the following figure, covering almost all plane geometry problem types in the corresponding grades.
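The composition figures above imply the size of the textbook-sourced part and show that diagrams are shared across problems, which is what makes the "fine-grained" property possible. A quick check in Python, using only the numbers stated above:

```python
# Figures stated in the dataset description.
total_problems, total_diagrams = 9022, 4000
g3k_problems, g3k_diagrams = 2891, 1738  # subset selected from Geometry3K

# Problems and diagrams not taken from Geometry3K (the textbook-sourced part).
textbook_problems = total_problems - g3k_problems
textbook_diagrams = total_diagrams - g3k_diagrams

print(textbook_problems, textbook_diagrams)        # 6131 2262
print(round(total_problems / total_diagrams, 1))   # 2.3 problems per diagram on average
```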

Distribution of problem types of PGPS9K dataset.

Comparison with existing GPS datasets.
Type, OP and PL denote the problem type, number of operators and program length, respectively.

Examples

The annotations of PGPS9K include diagram annotations and solution programs: the diagram annotation extracts the structural and semantic information in the diagram, and the solution program defines the steps for solving the problem. The diagram annotation, textual clauses and solution program are shown as follows.

Example presentation of PGPS9K dataset.

Templates of textual clauses. The symbols &, *, $ and % denote point, line, variable and angle ID, respectively.
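The placeholder convention can be sketched as a small template filler: each placeholder symbol is replaced, left to right, by the next value of its type. This is a minimal illustration only; the function name `fill_template` and the example templates are hypothetical and are not the actual PGPS9K templates.

```python
from typing import Dict, List

# Placeholder symbols as defined in the template caption.
PLACEHOLDERS = {"&": "point", "*": "line", "$": "variable", "%": "angle_id"}

def fill_template(template: str, values: Dict[str, List[str]]) -> str:
    """Replace each placeholder, left to right, with the next value of its type (hypothetical helper)."""
    counters = {kind: 0 for kind in PLACEHOLDERS.values()}
    out = []
    for ch in template:
        kind = PLACEHOLDERS.get(ch)
        if kind is None:
            out.append(ch)  # ordinary template text is copied unchanged
        else:
            out.append(values[kind][counters[kind]])
            counters[kind] += 1
    return "".join(out)

# Hypothetical templates, for illustration only.
print(fill_template("line & & &", {"point": ["A", "B", "C"]}))                 # line A B C
print(fill_template("m \\angle % = $", {"angle_id": ["1"], "variable": ["x"]}))  # m \angle 1 = x
print(fill_template("* \\parallel *", {"line": ["AB", "CD"]}))                 # AB \parallel CD
```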

Annotation of solution program and its interpretability

Program set defined for our solution programs, consisting of 34 operators and 55 operands, where the operands comprise 11 problem variables, 7 process variables, 26 arguments and 11 constants.
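To make the operator/operand structure concrete, here is a minimal interpreter sketch for a program made of operator steps. Everything specific in it is hypothetical: the operator names (`Add`, `Multiple`, `Hypotenuse`, ...), the operand prefixes (`N` for problem variables, `V` for process variables, `C_` for constants) and the step format are stand-ins, not the actual PGPS9K program vocabulary.

```python
import math
from typing import List, Tuple

# Hypothetical operators; each maps numeric operands to one numeric result.
OPERATORS = {
    "Add": lambda a, b: a + b,
    "Minus": lambda a, b: a - b,
    "Multiple": lambda a, b: a * b,
    "Divide": lambda a, b: a / b,
    "Hypotenuse": lambda a, b: math.sqrt(a * a + b * b),  # Pythagorean theorem
}

# Hypothetical constants.
CONSTANTS = {"C_0.5": 0.5, "C_2": 2.0, "C_pi": math.pi}

def execute(program: List[Tuple[str, List[str]]], problem_vars: List[float]) -> float:
    """Run each step, storing its result as the next process variable V0, V1, ...

    Returns the result of the last step as the numerical answer.
    """
    process_vars: List[float] = []

    def resolve(token: str) -> float:
        if token.startswith("N"):   # problem variable taken from the problem/diagram
            return problem_vars[int(token[1:])]
        if token.startswith("V"):   # result of an earlier step
            return process_vars[int(token[1:])]
        return CONSTANTS[token]

    for op, args in program:
        process_vars.append(OPERATORS[op](*(resolve(t) for t in args)))
    return process_vars[-1]

# Right triangle with legs N0 = 3 and N1 = 4: compute the hypotenuse, then half of it.
program = [("Hypotenuse", ["N0", "N1"]), ("Multiple", ["V0", "C_0.5"])]
print(execute(program, [3.0, 4.0]))  # 2.5
```

Storing every intermediate result as a numbered process variable is what keeps such a program interpretable: each step names the theorem-like operator applied and exactly which earlier quantities it uses.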

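Experiment Results

Numerical answer accuracies (%) of state-of-the-art GPS solvers.

Method	Geometry3K (Completion)	Geometry3K (Choice)	Geometry3K (Top-3)	PGPS9K (Completion)	PGPS9K (Choice)	PGPS9K (Top-3)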
Human Expert [Lu et al., 2021]	-	90.9	-	-	-	-
Baseline (Neural Solver) [Lu et al., 2021]	-	35.9	-	-	-	-
InterGPS (Predict)* [Lu et al., 2021]	44.6	56.9	-	-	-	-
InterGPS (Diagram GT)* [Lu et al., 2021]	64.2	71.7	-	-	-	-
InterGPS (All GT)* [Lu et al., 2021]	69.0	75.9	-	-	-	-
NGS# [Chen et al., 2021]	35.3	58.8	62.0	34.1	46.1	60.9
Geoformer# [Chen et al., 2022]	36.8	59.3	62.5	35.6	47.3	62.3
PGPSNet	65.0	77.9	80.7	62.7	70.4	79.5
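The table reports accuracy under three modes: Completion, Choice and Top-3. They are not defined on this page, so the sketch below follows one plausible reading, stated as an assumption: Completion scores only the top-ranked beam hypothesis; Choice takes the first hypothesis whose result matches a candidate option (falling back to the first option if none matches); Top-3 counts a problem as solved if any of the top three hypotheses yields the answer. Each hypothesis is represented by its executed numeric result, or `None` if the program failed to execute; the tolerance is also an assumption.

```python
import math
from typing import List, Optional

def close(a: Optional[float], b: float, tol: float = 1e-3) -> bool:
    """Numeric match within a tolerance (the tolerance value is an assumption)."""
    return a is not None and math.isclose(a, b, rel_tol=tol, abs_tol=tol)

def completion_correct(beam: List[Optional[float]], answer: float) -> bool:
    # Only the highest-ranked hypothesis counts.
    return bool(beam) and close(beam[0], answer)

def choice_correct(beam: List[Optional[float]], choices: List[float], answer: float) -> bool:
    # First hypothesis that matches any option; otherwise fall back to choices[0] (assumption).
    picked = next((c for r in beam for c in choices if close(r, c)), choices[0])
    return close(picked, answer)

def top3_correct(beam: List[Optional[float]], answer: float) -> bool:
    return any(close(r, answer) for r in beam[:3])

beam = [None, 12.0, 6.0]          # the top hypothesis failed to execute
choices = [3.0, 6.0, 12.0, 24.0]
answer = 6.0
print(completion_correct(beam, answer), choice_correct(beam, choices, answer), top3_correct(beam, answer))
# False False True
```

Under this reading the three modes are increasingly lenient, which matches the ordering of the scores in each row of the table.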


Ablation studies on Geometry3K.

Self-limited Decoder	Data Aug	Structural Clauses	Pre-trained LM	Ans acc (Completion)	Ans acc (Choice)	Ans acc (Top-3)	Prog acc (Completion)	Prog acc (Choice)	Prog acc (Top-3)
✔	✘	✔	✘	32.5	52.2	57.6	27.2	47.3	53.1
✘	✔	✔	✘	28.2	48.3	50.7	25.4	42.7	45.6
✔	✔	✔	✘	36.6	59.5	62.4	33.9	52.8	58.6
✔	✔	✘	✘	38.4	61.7	64.8	34.8	54.2	59.2
✔	✔	✔	✔	48.1	67.5	71.4	45.4	62.0	68.1
✔	✔	✔	✔	65.0	77.9	80.7	62.8	72.4	78.2
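Reading the ablation as differences from row 3 (self-limited decoder, data augmentation and structural clauses, without a pre-trained LM) isolates the effect of each change. The short Python sketch below only subtracts the Completion answer accuracies copied from the table; the row labels are descriptive names chosen here.

```python
# Completion answer accuracy (%) per ablation row, copied from the table.
reference = 36.6  # row 3: Self-limited Decoder + Data Aug + Structural Clauses, no Pre-trained LM
rows = {
    "row 1 (without Data Aug)": 32.5,
    "row 2 (without Self-limited Decoder)": 28.2,
    "row 4 (without Structural Clauses)": 38.4,
    "row 5 (with Pre-trained LM)": 48.1,
    "row 6 (with Pre-trained LM)": 65.0,
}

# Rounded to one decimal to match the table's precision.
deltas = {name: round(acc - reference, 1) for name, acc in rows.items()}
for name, d in deltas.items():
    print(f"{name}: {d:+.1f}")
```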


🚨 For more details, please refer to this link

BibTeX

@inproceedings{Zhang2023PGPS,
  title = {A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram},
  author = {Zhang, Ming-Liang and Yin, Fei and Liu, Cheng-Lin},
  booktitle = {IJCAI},
  year = {2023},
}
