Historical maps contain detailed geographic information that is difficult to find elsewhere and that covers long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches to making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could reduce the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at https://github.com/kartta-labs/Project.
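To make the word-versus-phrase problem concrete, the following is a minimal sketch of how separately recognized OCR words might be grouped into location phrases by spatial proximity. It is not the mapKurator method: the `Word` class, the `group_into_phrases` function, and the two thresholds are hypothetical names chosen for illustration, and the sketch assumes horizontal text, whereas labels on real historical maps are often curved or rotated.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR result: recognized text plus an axis-aligned bounding box."""
    text: str
    x: float  # left edge of the box
    y: float  # vertical center of the box
    w: float  # box width
    h: float  # box height


def group_into_phrases(words, max_gap=1.0, max_offset=0.5):
    """Greedily attach each word to a phrase that ends just to its left.

    max_gap and max_offset are illustrative thresholds, expressed as
    multiples of the previous word's height.
    """
    phrases = []  # each phrase is a list of Word, ordered left to right
    for word in sorted(words, key=lambda item: item.x):
        for phrase in phrases:
            last = phrase[-1]
            gap = word.x - (last.x + last.w)  # horizontal space between boxes
            same_line = abs(word.y - last.y) <= max_offset * last.h
            if same_line and gap <= max_gap * last.h:
                phrase.append(word)
                break
        else:
            # No existing phrase is close enough, so start a new one.
            phrases.append([word])
    return [" ".join(item.text for item in phrase) for phrase in phrases]


words = [
    Word("Mountain", x=55, y=51, w=70, h=12),
    Word("Black", x=10, y=50, w=40, h=12),
    Word("Creek", x=300, y=200, w=45, h=12),
]
phrases = group_into_phrases(words)
print(phrases)  # ['Black Mountain', 'Creek']
```

A phrase such as "Black Mountain" can then be looked up in a geospatial knowledge base as a single place name, which the individual words "Black" and "Mountain" would not support.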
|Original language||English (US)|
|Title of host publication||KDD 2020 - Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|
|Publisher||Association for Computing Machinery|
|Number of pages||9|
|State||Published - Aug 23 2020|
|Event||26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020 - Virtual, Online, United States (Duration: Aug 23 2020 → Aug 27 2020)|
|Name||Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|
|Conference||26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020|
|Period||8/23/20 → 8/27/20|
Bibliographical note
Funding Information:
This material is based upon work supported in part by the National Science Foundation under Grant Nos. IIS 1564164 (to the University of Southern California) and IIS 1563933 (to the University of Colorado at Boulder), NVIDIA Corporation, and the USC Undergraduate Research Associates Program.
© 2020 Owner/Author.
Keywords
- entity matching
- historical map processing
- information extraction
- neural networks
- text linking