Utility-University Collaboration Publication Data

  • Andrew Butts (Creator)
  • Stephen Rose (Contributor)
  • Julia Wilber (Contributor)



This dataset is a collection of metadata describing the authors, their organizational affiliations, and locations associated with academic publications that result from collaborations between academic researchers and electric utilities. The records were retrieved from the Scopus database by searching for publications in which at least one author is affiliated with one of the 20 largest U.S. electric utilities.

We used this dataset to better understand the nature of utility-university collaborations and the factors in their formation. In addition to understanding the role geography/proximity plays, we also conducted limited network analysis to identify high-frequency collaborators at both the author and organizational scale. We identified some time-series trends, such as increasing numbers of publications and increasing distances between collaborators over time, but we did not determine their significance by controlling for external factors like funding, regulation, and technological changes. Future work could use the included classifications for each publication to understand the changing mix of research topics over time.
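For readers who want to revisit the proximity question, the sketch below shows one common way to measure the distance between two collaborator locations: the haversine great-circle formula. This is an illustrative assumption, not a description of how distances were computed for this dataset. The function name, the spherical-Earth radius, and the use of latitude/longitude pairs in decimal degrees are all choices made for the example.

```python
import math

# Mean Earth radius in kilometres (spherical approximation; an assumption
# of this sketch, not a value taken from the dataset).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Applied to each pair of author locations on a publication, a function like this would give per-publication distances that could then be averaged by year to examine the trend described above.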

The interviews we conducted for the accompanying research suggest that several types of collaborations are not represented in the publication dataset, including unsuccessful collaborations, many types of student-driven practicum-style work, and for-hire work that may assist in regulatory filings, internal documents, or other non-academic publications.

We include four separate versions of this dataset at different stages of its refinement to better enable any reproductions, expansions or refinements of the dataset.

The first file (Initial Publication Queries By Utility.zip) is our raw output from the Scopus queries.

The second file (Author-Parsed Publication Queries By Utilities.zip) is the parsed output of the queries, where each author and affiliation are separated.

The third file (Publication Dataset with Duplicates and Erroneous Entries.csv) combines all utilities into a single file and includes many manual corrections to parsed or missing information, as well as some additional fields to classify data and identify duplicates and records erroneously included.

The fourth file (Final Utility-University Publication Dataset.csv) then removes some of those additional fields as well as all duplicates and erroneously included records. This was the file we used for our final analyses.
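The step from the third file to the fourth can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure we actually ran: the column names `is_duplicate` and `is_erroneous` are hypothetical stand-ins for whatever flag fields the third file contains, and the sample rows are made up for the example.

```python
import csv
import io

# Hypothetical column names: the real flag fields in the third file may differ.
DUPLICATE_FLAG = "is_duplicate"
ERRONEOUS_FLAG = "is_erroneous"


def is_flagged(value):
    """Treat common truthy spellings as a set flag."""
    return str(value).strip().lower() in {"1", "true", "yes", "y"}


def keep_valid_records(rows, flag_fields=(DUPLICATE_FLAG, ERRONEOUS_FLAG)):
    """Drop flagged rows and remove the flag columns from the rows that remain."""
    kept = []
    for row in rows:
        if any(is_flagged(row.get(field, "")) for field in flag_fields):
            continue
        kept.append({k: v for k, v in row.items() if k not in flag_fields})
    return kept


# Tiny made-up sample standing in for the third file.
SAMPLE_CSV = """title,utility,is_duplicate,is_erroneous
Paper A,Utility X,no,no
Paper A,Utility X,yes,no
Paper B,Utility Y,no,yes
Paper C,Utility Z,no,no
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
final_rows = keep_valid_records(sample_rows)
```

To try this on the real data, replace the sample string with an open file handle for the third file and set the two flag names to the matching column headers.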
Date made available: 2019
Publisher: Mendeley Data
