TY - GEN
T1 - Automating document annotation using open source knowledge
AU - Singhal, Ayush
AU - Kasturi, Ravindra
AU - Srivastava, Jaideep
PY - 2013
Y1 - 2013
N2 - Annotating documents with relevant and comprehensive keywords offers invaluable assistance to readers who want a quick overview of a document. The problem of document annotation is addressed in the literature under two broad classes of techniques, namely key phrase extraction and key phrase abstraction. In this paper, we propose a novel approach to generate summary phrases for research documents. Given the dynamic nature of scientific research, it has become important to incorporate new and popular scientific terminology in document annotations. For this purpose, we have used crowd-sourced knowledge bases like Wikipedia and WikiCFP (an open-source information source for calls for papers) to automate key phrase generation. We have also taken into account the lack of availability of the document's content (due to protective policies) and developed a global-context-based key phrase identification approach. We show that, given only the title of a document, the proposed approach generates its global context information using academic search engines like Google Scholar. We evaluated the performance of the proposed approach on a real-world dataset obtained from a computer science research document corpus. We quantitatively evaluated the performance of the proposed approach and compared it with two baseline approaches.
AB - Annotating documents with relevant and comprehensive keywords offers invaluable assistance to readers who want a quick overview of a document. The problem of document annotation is addressed in the literature under two broad classes of techniques, namely key phrase extraction and key phrase abstraction. In this paper, we propose a novel approach to generate summary phrases for research documents. Given the dynamic nature of scientific research, it has become important to incorporate new and popular scientific terminology in document annotations. For this purpose, we have used crowd-sourced knowledge bases like Wikipedia and WikiCFP (an open-source information source for calls for papers) to automate key phrase generation. We have also taken into account the lack of availability of the document's content (due to protective policies) and developed a global-context-based key phrase identification approach. We show that, given only the title of a document, the proposed approach generates its global context information using academic search engines like Google Scholar. We evaluated the performance of the proposed approach on a real-world dataset obtained from a computer science research document corpus. We quantitatively evaluated the performance of the proposed approach and compared it with two baseline approaches.
KW - Document summarization
KW - Global context
KW - Google Scholar
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=84893336774&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893336774&partnerID=8YFLogxK
U2 - 10.1109/WI-IAT.2013.30
DO - 10.1109/WI-IAT.2013.30
M3 - Conference contribution
AN - SCOPUS:84893336774
SN - 9781479929023
T3 - Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013
SP - 199
EP - 204
BT - Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013
T2 - 2013 12th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013
Y2 - 17 November 2013 through 20 November 2013
ER -