LIGHT: Multi-modal Text Linking on Historical Maps

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Text on historical maps provides valuable information for studies in history, economics, geography, and other related fields. Unlike structured or semi-structured documents, text on maps varies significantly in orientation, reading order, shape, and placement. Many modern methods can detect and transcribe text regions, but they struggle to effectively “link” the recognized text fragments, e.g., determining a multi-word place name. Existing layout analysis methods model word relationships to improve text understanding in structured documents, but they primarily rely on linguistic features and neglect geometric information, which is essential for handling map text. To address these challenges, we propose LIGHT, a novel multi-modal approach that integrates linguistic, image, and geometric features for linking text on historical maps. In particular, LIGHT includes a geometry-aware embedding module that encodes the polygonal coordinates of text regions to capture polygon shapes and their relative spatial positions on an image. LIGHT unifies this geometric information with the visual and linguistic token embeddings from LayoutLMv3, a pretrained layout analysis model. LIGHT uses the cross-modal information to predict the reading-order successor of each text instance directly with a bi-directional learning strategy that enhances sequence robustness. Experimental results show that LIGHT outperforms existing methods on the ICDAR 2024/2025 MapText Competition data, demonstrating the effectiveness of multi-modal learning for historical map text linking.
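To make the successor-prediction idea concrete, the following minimal sketch shows how per-word successor predictions could be chained into multi-word phrases. It is not the authors' implementation: the function name `link_words`, the `successor` list format (one index per word, or `None` when a word ends a phrase), and the sample words are all assumptions made for illustration. The model that would produce the predictions is out of scope here.

```python
from typing import List, Optional


def link_words(words: List[str], successor: List[Optional[int]]) -> List[List[str]]:
    """Group words into phrases by following hypothetical successor pointers.

    successor[i] is the index of the word that follows words[i] in reading
    order, or None if words[i] is the last word of its phrase.
    """
    # A word that is some other word's successor cannot start a phrase.
    has_predecessor = {s for s in successor if s is not None}
    visited = set()
    phrases: List[List[str]] = []

    def walk(start: int) -> List[str]:
        chain = []
        i: Optional[int] = start
        # Stop at the end of the chain or when a word was already consumed,
        # which guards against cycles in noisy predictions.
        while i is not None and i not in visited:
            visited.add(i)
            chain.append(words[i])
            i = successor[i]
        return chain

    # First pass: start from words that have no predecessor.
    for start in range(len(words)):
        if start not in has_predecessor:
            phrases.append(walk(start))

    # Second pass: any word left over belongs to a cycle; break it arbitrarily.
    for start in range(len(words)):
        if start not in visited:
            phrases.append(walk(start))

    return phrases


# Assumed toy input: four detected words and their predicted successors.
words = ["Lake", "St.", "Superior", "Paul"]
successor = [2, 3, None, None]
print(link_words(words, successor))
```

In this toy input, "Lake" points to "Superior" and "St." points to "Paul", so the words are grouped into two place names even though they are interleaved in detection order.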

Original language: English (US)
Title of host publication: Document Analysis and Recognition – ICDAR 2025 - 19th International Conference, Proceedings
Editors: Xu-Cheng Yin, Dimosthenis Karatzas, Daniel Lopresti
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 60-77
Number of pages: 18
ISBN (Print): 9783032046161
State: Published - 2026
Event: 19th International Conference on Document Analysis and Recognition, ICDAR 2025 - Wuhan, China
Duration: Sep 16 2025 → Sep 21 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16024 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Document Analysis and Recognition, ICDAR 2025
Country/Territory: China
City: Wuhan
Period: 9/16/25 → 9/21/25

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.

Keywords

  • Historical Maps
  • Layout Analysis
  • Text Linking
