A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram

¹MAIS, Institute of Automation of Chinese Academy of Sciences, ²School of Artificial Intelligence, University of Chinese Academy of Sciences
IJCAI 2023

Introduction

Geometry problem solving (GPS) involves high-level mathematical reasoning, necessitating the ability to fuse multi-modal information and apply geometric knowledge. Recent neural solvers have shown great potential in this area but struggle with diagram presentation and modal fusion. To address these challenges, we introduce PGPSNet, a neural solver that effectively converts diagrams into basic textual clauses to describe their features. PGPSNet integrates multi-modal information through structural and semantic pre-training, data augmentation, and self-limited decoding. This endows it with rich knowledge of geometric theorems and representations, enhancing geometric understanding and reasoning.
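This page does not spell out how self-limited decoding works. One plausible reading is that, at each step, the decoder may only emit tokens that are valid for the problem at hand (for example, operators plus the numbers and variables that actually occur in the input). The sketch below illustrates that idea with plain Python; the token names, the `allowed` set, and both function names are hypothetical and are not taken from PGPSNet.

```python
import math
from typing import Dict, Set


def restrict_scores(scores: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Set the score of every token outside `allowed` to -inf (assumed masking scheme)."""
    return {tok: (s if tok in allowed else -math.inf) for tok, s in scores.items()}


def pick_token(scores: Dict[str, float], allowed: Set[str]) -> str:
    """Greedy choice of the best-scoring token among the allowed ones."""
    masked = restrict_scores(scores, allowed)
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise ValueError("no allowed token has a finite score")
    return best


# Hypothetical decoder scores: "N5" scores highest, but the problem only
# contains one number ("N0"), so "N5" is excluded before the choice is made.
step_scores = {"N0": 1.2, "N5": 3.0, "Get": 0.4}
allowed_for_problem = {"N0", "Get"}
chosen = pick_token(step_scores, allowed_for_problem)
```

Masking before the argmax, rather than filtering afterwards, keeps the same code path usable for beam search, where several candidates are kept per step.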



To support GPS research, we have constructed a large-scale, finely annotated dataset named PGPS9K, which includes detailed diagram annotations and interpretable solution programs. The PGPS9K dataset has five properties:

  • Theorem-based: Solving a problem in PGPS9K requires applying geometric theorems to carry out algebraic calculations and obtain a numerical result.
  • Diagram-dependent: More than 90% of the problems can only be solved with the diagram, because necessary conditions such as variable content and geometric structure are given visually rather than in text.
  • Abstract: Each diagram combines basic geometric primitives (point, line, circle) with non-geometric primitives (text, symbol). The textual problems involve no complex semantic scenarios, only abstract geometric conditions.
  • Fine-grained: Problems that share a diagram differ in their conditions or targets. Slight differences in the textual problem usually lead to completely different solutions.
  • Condition-redundancy: Many conditions in the semantic clauses or the textual problem are not needed to solve the problem at hand.

PGPS9K Dataset

Overview

The Plane Geometry Problem Solving Dataset (PGPS9K) was constructed by the State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation of Chinese Academy of Sciences (CASIA). Each sample in PGPS9K is labeled with a fine-grained diagram annotation and an interpretable solution program. The diagram annotation is converted into structural clauses and semantic clauses that describe the multi-level information in the geometry diagram.

PGPS9K comprises 9,022 geometry problems paired with 4,000 non-duplicate geometry diagrams. Of these, 2,891 problems paired with 1,738 diagrams are selected from the Geometry3K dataset; the rest are collected from five popular textbooks across grades 6-12 on mathematics curriculum websites. PGPS9K is divided into 30 problem types, as shown in the following figure, covering almost all plane geometry problem types in the corresponding grades.

Examples

The annotations of PGPS9K consist of a diagram annotation and a solution program. The diagram annotation extracts the structural and semantic information in the diagram, and the solution program defines the steps for solving the problem. The diagram annotation, textual clauses, and solution program are shown as follows.
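To make the relationship between these annotation parts concrete, here is a minimal sketch of how one sample might be held in memory and flattened into a single text sequence. The class name, field names, separator, clause wording, and program tokens are all hypothetical illustrations, not the dataset's actual file format or operator set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeometrySample:
    """Hypothetical container for one annotated problem (names are illustrative)."""
    structural_clauses: List[str]  # connectivity among points, lines, circles
    semantic_clauses: List[str]    # relations tied to text and symbols in the diagram
    textual_problem: str           # the written question
    solution_program: List[str] = field(default_factory=list)  # operator/operand tokens

    def to_model_input(self, sep: str = " | ") -> str:
        """Join all clauses and the textual problem into one text sequence."""
        parts = self.structural_clauses + self.semantic_clauses + [self.textual_problem]
        return sep.join(parts)


# Made-up example: B lies on line AC, BD is perpendicular to AC.
sample = GeometrySample(
    structural_clauses=["line A B C", "line B D"],
    semantic_clauses=["AB = 5", "BD perpendicular to AC"],
    textual_problem="Find AD given BD = 12.",
    solution_program=["Pythagorean", "N0", "N1", "V0", "Get", "V0"],
)
text = sample.to_model_input()
```

Keeping the structural and semantic clauses as separate lists makes it easy to drop or shuffle one group, which is the kind of variation an ablation or augmentation experiment would need.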



Experiment Results

Numerical answer accuracies of state-of-the-art GPS solvers.

| Method | Geometry3K Completion | Geometry3K Choice | Geometry3K Top-3 | PGPS9K Completion | PGPS9K Choice | PGPS9K Top-3 |
|---|---|---|---|---|---|---|
| Human Expert [Lu et al., 2021] | - | 90.9 | - | - | - | - |
| Baseline (Neural Solver) [Lu et al., 2021] | - | 35.9 | - | - | - | - |
| InterGPS (Predict)* [Lu et al., 2021] | 44.6 | 56.9 | - | - | - | - |
| InterGPS (Diagram GT)* [Lu et al., 2021] | 64.2 | 71.7 | - | - | - | - |
| InterGPS (All GT)* [Lu et al., 2021] | 69.0 | 75.9 | - | - | - | - |
| NGS# [Chen et al., 2021] | 35.3 | 58.8 | 62.0 | 34.1 | 46.1 | 60.9 |
| Geoformer# [Chen et al., 2022] | 36.8 | 59.3 | 62.5 | 35.6 | 47.3 | 62.3 |
| PGPSNet | 65.0 | 77.9 | 80.7 | 62.7 | 70.4 | 79.5 |
* denotes results reproduced with the authors' code.
# denotes methods re-implemented by us.
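The three column names are not defined on this page. A common reading, assumed here, is: Completion scores the first candidate program that executes, Choice maps that result onto a multiple-choice option and guesses uniformly when nothing matches, and Top-3 counts a hit if any of the three highest-ranked candidates is correct. The sketch below implements that assumed reading; the function names, the tolerance, and the use of an expected score for the random guess are illustrative choices.

```python
import math
from typing import List, Optional, Sequence

Candidates = Sequence[Optional[float]]  # ranked results; None = program failed to execute


def is_correct(pred: Optional[float], truth: float, tol: float = 1e-2) -> bool:
    """Numeric match within a tolerance (the tolerance value is an assumption)."""
    return pred is not None and math.isclose(pred, truth, rel_tol=tol, abs_tol=tol)


def first_executable(candidates: Candidates) -> Optional[float]:
    """Return the highest-ranked candidate that produced a value, if any."""
    return next((c for c in candidates if c is not None), None)


def completion_hit(candidates: Candidates, truth: float) -> bool:
    return is_correct(first_executable(candidates), truth)


def top_k_hit(candidates: Candidates, truth: float, k: int = 3) -> bool:
    return any(is_correct(c, truth) for c in candidates[:k])


def choice_score(candidates: Candidates, truth: float, options: List[float]) -> float:
    """Expected score: 1 or 0 if the result matches an option, else a uniform guess."""
    first = first_executable(candidates)
    for opt in options:
        if is_correct(first, opt):
            return 1.0 if is_correct(opt, truth) else 0.0
    return 1.0 / len(options)


ranked = [None, 12.0, 13.0]          # hypothetical solver output for one problem
options = [5.0, 12.0, 13.0, 17.0]    # hypothetical answer choices
results = (
    completion_hit(ranked, 13.0),
    choice_score(ranked, 13.0, options),
    top_k_hit(ranked, 13.0),
)
```

Under this reading, Choice can exceed Completion because a failed or off-list prediction still earns the chance-level score, which is consistent with the Choice columns being higher than the Completion columns in the table.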

Ablation studies on Geometry3K.

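Components varied: Self-limited Decoder, Data Aug, Structural Clauses, Pre-trained LM

| Ans acc Completion | Ans acc Choice | Ans acc Top-3 | Prog acc Completion | Prog acc Choice | Prog acc Top-3 |
|---|---|---|---|---|---|
| 32.5 | 52.2 | 57.6 | 27.2 | 47.3 | 53.1 |
| 28.2 | 48.3 | 50.7 | 25.4 | 42.7 | 45.6 |
| 36.6 | 59.5 | 62.4 | 33.9 | 52.8 | 58.6 |
| 38.4 | 61.7 | 64.8 | 34.8 | 54.2 | 59.2 |
| 48.1 | 67.5 | 71.4 | 45.4 | 62.0 | 68.1 |
| 65.0 | 77.9 | 80.7 | 62.8 | 72.4 | 78.2 |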


BibTeX

@inproceedings{Zhang2023PGPS,
  title     = {A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram},
  author    = {Zhang, Ming-Liang and Yin, Fei and Liu, Cheng-Lin},
  booktitle = {IJCAI},
  year      = {2023},
}