TY - GEN
T1 - Generating semantic annotations for research datasets
AU - Singhal, Ayush
AU - Srivastava, Jaideep
N1 - Copyright: Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2014
Y1 - 2014
AB - Annotations are important for describing any object: they convey an understanding of the object in summary form. Unlike tags, annotations are a structured form of metadata. The best structured information is prepared by humans. However, given the large volume and variety of objects such as images, videos and documents, it is practically impossible to annotate all of them by hand. Automated approaches for assigning semantically correct, structured annotations are therefore extremely important. In this paper we propose the novel problem of semantic annotation of research datasets. The explosion in the use of social media and various electronic devices has led to the collection of huge volumes of datasets for scientific research. Although most of these datasets are available online, the lack of semantic annotations/metadata and of a unified public repository makes it difficult for researchers to browse them, even with popular search engines. We propose an algorithmic approach that automates the annotation of datasets in a structured and semantic manner. We use knowledge from the World Wide Web and organized knowledge bases such as DBpedia, YAGO, Freebase and WordNet to derive context and annotations for the research datasets. The proposed approach is evaluated on two real-world dataset collections, namely the UCI dataset repository and the SNAP dataset collection. Using various experimental setups, we show that the proposed approach outperforms the baseline approaches. We also perform a case study comparing our results with the Google search engine and find that using the semantic annotations increases search accuracy by 18% over normal search for datasets.
KW - search engines
KW - semantic annotation
KW - summarization of Web data
KW - web mining
UR - http://www.scopus.com/inward/record.url?scp=84903649753&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84903649753&partnerID=8YFLogxK
U2 - 10.1145/2611040.2611056
DO - 10.1145/2611040.2611056
M3 - Conference contribution
AN - SCOPUS:84903649753
SN - 9781450325387
T3 - ACM International Conference Proceeding Series
BT - 4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014
PB - Association for Computing Machinery
T2 - 4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014
Y2 - 2 June 2014 through 4 June 2014
ER -