A Bayesian network approach to semantic labelling of text formatting in XML corpora of documents

Florendia Fourli-Kartsouni, Kostas Slavakis, Georgios Kouroupetroglou, Sergios Theodoridis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Scopus citations

Abstract

The widespread application of document digitization has led to the use of structured digital representations such as XML. Methods for extracting formatting metadata can be applied to such structured documents to enhance their accessibility, for example through augmented audio representation of documents. To the best of our knowledge, no effort has yet been made to build a system that automatically extracts the semantic information carried by document formatting solely from document layout, without the use of natural language processing. In this study, a corpus of XML representations of several issues of a Greek newspaper is used to create and evaluate a semantic classifier of text formatting based on Bayesian networks.

Original language: English (US)
Title of host publication: Universal Access in Human-Computer Interaction
Subtitle of host publication: Applications and Services - 4th Int. Conference on Universal Access in Human-Computer Interaction, UAHCI 2007. Held as Part of HCI Int. 2007 Proc.
Pages: 299-308
Number of pages: 10
Edition: PART 3
Publication status: Published - Dec 1 2007
Event: 4th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2007 - Beijing, China
Duration: Jul 22 2007 to Jul 27 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 3
Volume: 4556 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 4th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2007
Country: China
City: Beijing
Period: 7/22/07 to 7/27/07

Keywords

  • Document accessibility
  • Document analysis
  • Semantic labeling

Cite this

Fourli-Kartsouni, F., Slavakis, K., Kouroupetroglou, G., & Theodoridis, S. (2007). A Bayesian network approach to semantic labelling of text formatting in XML corpora of documents. In Universal Access in Human-Computer Interaction: Applications and Services - 4th Int. Conference on Universal Access in Human-Computer Interaction, UAHCI 2007. Held as Part of HCI Int. 2007 Proc. (PART 3 ed., pp. 299-308). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4556 LNCS, No. PART 3).