TY - GEN
T1 - A Bayesian network approach to semantic labelling of text formatting in XML corpora of documents
AU - Fourli-Kartsouni, Florendia
AU - Slavakis, Kostas
AU - Kouroupetroglou, Georgios
AU - Theodoridis, Sergios
PY - 2007
Y1 - 2007
N2 - The widespread applications of document digitization have led to the use of structured digital representation methods such as the XML language. Methodologies for extracting formatting metadata can be applied to such structured documents to enhance their accessibility, including augmented audio representation of documents. To the best of our knowledge, no effort has yet been made to produce a system that automatically extracts the semantic information of document formatting solely from document layout, without the use of natural language processing. In this study, a corpus of XML representations of several issues of a Greek newspaper is used to create and evaluate a semantic classifier of text formatting based on Bayesian networks.
AB - The widespread applications of document digitization have led to the use of structured digital representation methods such as the XML language. Methodologies for extracting formatting metadata can be applied to such structured documents to enhance their accessibility, including augmented audio representation of documents. To the best of our knowledge, no effort has yet been made to produce a system that automatically extracts the semantic information of document formatting solely from document layout, without the use of natural language processing. In this study, a corpus of XML representations of several issues of a Greek newspaper is used to create and evaluate a semantic classifier of text formatting based on Bayesian networks.
KW - Document accessibility
KW - Document analysis
KW - Semantic labeling
UR - http://www.scopus.com/inward/record.url?scp=38149029061&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38149029061&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-73283-9_34
DO - 10.1007/978-3-540-73283-9_34
M3 - Conference contribution
AN - SCOPUS:38149029061
SN - 9783540732822
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 299
EP - 308
BT - Universal Access in Human-Computer Interaction
PB - Springer Verlag
T2 - 4th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2007
Y2 - 22 July 2007 through 27 July 2007
ER -