Research dataset discovery from research publications using web context

Ayush Singhal, Jaideep Srivastava

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


Scientific datasets play a crucial role in data-driven research. While several repositories curate public datasets, many more datasets and their usage are hidden in research publications. Hence, discovering a relevant dataset for a research topic requires in-depth investigation of several publications, tracking of dataset usage, and an exhaustive literature search. To this end, a search engine that directly addresses the research dataset discovery problem would be extremely useful for the scientific community. In this work, we define an important paradigm of dataset search known as dataset discovery in application context. Unlike look-up search, where the user looks up a dataset by name in a repository, application-context-based search is performed without knowing the name of the dataset. Such searches arise when the user is looking for the best-fit dataset for their research problem. We show that in this search paradigm, conventional methods that index the short description text of a dataset do not work, because that description text lacks application-related content. To alleviate this problem, we propose two models of search, namely (1) a user-profile-based search and (2) a keyword-based search. We show that in both models, dataset discovery is done in the application context by leveraging information from open source web resources such as scholarly article repositories and academic search engines. The performance of the proposed models was tested with simulated test queries (user profiles) as well as with real-world user studies.
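The gap between description text and application context can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain TF-IDF cosine ranking, and the dataset names (`dataset_a`, `dataset_b`), their descriptions, and the "context" strings (standing in for text gathered from publications that use each dataset) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs):
    """Return per-document term counts and a smoothed IDF table."""
    tfs = {name: Counter(tokenize(text)) for name, text in docs.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())
    n = len(docs)
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    return tfs, idf


def score(query, tf, idf):
    """Cosine similarity between the TF-IDF vectors of query and document."""
    q = Counter(tokenize(query))
    qv = {t: c * idf.get(t, 0.0) for t, c in q.items()}  # unseen terms weigh 0
    dv = {t: c * idf[t] for t, c in tf.items()}
    dot = sum(w * dv.get(t, 0.0) for t, w in qv.items())
    qn = math.sqrt(sum(w * w for w in qv.values()))
    dn = math.sqrt(sum(w * w for w in dv.values()))
    return dot / (qn * dn) if qn and dn else 0.0


def rank(query, docs):
    """Rank dataset names by similarity to the query, best first."""
    tfs, idf = build_index(docs)
    return sorted(((score(query, tfs[name], idf), name) for name in docs),
                  reverse=True)


# Hypothetical repository-style descriptions: no application vocabulary.
descriptions = {
    "dataset_a": "Collection of messages posted to newsgroups",
    "dataset_b": "Ratings given by users to movies",
}

# Hypothetical application context, e.g. text drawn from papers using each dataset.
contexts = {
    "dataset_a": "text classification topic modeling document clustering",
    "dataset_b": "collaborative filtering recommender systems matrix factorization",
}

query = "recommender systems"
by_description = rank(query, descriptions)
by_context = rank(query, contexts)
```

In this toy setting the query shares no terms with either description, so description-only indexing cannot separate the two datasets, while the context index ranks `dataset_b` first. A user-profile-based variant could, under the same assumptions, build the query string from the text of a user's own publications instead of typed keywords.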

Original language: English (US)
Pages (from-to): 81-99
Number of pages: 19
Journal: Web Intelligence
Issue number: 2
State: Published - 2017

Bibliographical note

Publisher Copyright:
© 2017 - IOS Press and the authors. All rights reserved.


  • Search engine
  • context generation
  • dataset search
  • text mining

