psHarmonize: Facilitating reproducible large-scale pre-statistical data harmonization and documentation in R

John J. Stephen, Padraig Carolan, Amy E. Krefman, Sanaz Sedaghat, Maxwell Mansolf, Norrina B. Allen, Denise M. Scholtens

Research output: Contribution to journal › Article › peer-review

Abstract

Combining pertinent data from multiple studies can increase the robustness of epidemiological investigations. Effective “pre-statistical” data harmonization is essential for conducting collective, multi-study analyses efficiently. Harmonizing data, and documenting decisions about transforming variables to a common set of categorical values and measurement scales, is time-consuming and can be error-prone, particularly when many studies each contribute large numbers of variables. The psHarmonize R package facilitates harmonization by combining multiple datasets, applying data transformation functions, and creating long and wide harmonized datasets. The user provides transformation instructions in a “harmonization sheet” that lists dataset names, variable names, and coding instructions, and that serves as a central record of all harmonization decisions. The package performs the harmonization, generates error logs as necessary, and creates summary reports of the harmonized data. psHarmonize is poised to serve as a central feature of data preparation for the joint analysis of multiple studies.
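The workflow described above (a sheet of per-study coding instructions driving the transformations, producing long and wide outputs plus an error log and a summary) can be illustrated with a small sketch. It is written in Python purely for illustration and is not the psHarmonize API, which is an R package. The sheet columns (`study`, `source_item`, `item`, `code_type`, `code`), the functions `harmonize`, `to_wide`, and `summarize`, and the example studies and values are all hypothetical.

```python
"""Hypothetical sketch of sheet-driven harmonization (NOT the psHarmonize API)."""
from collections import defaultdict

# Hypothetical source datasets: study name -> list of participant rows.
DATASETS = {
    "study_a": [
        {"id": 1, "sex_raw": "M", "sbp_mmhg": 120},
        {"id": 2, "sex_raw": "F", "sbp_mmhg": 135},
        {"id": 3, "sex_raw": "X", "sbp_mmhg": 128},  # "X" has no coding rule
    ],
    "study_b": [
        {"id": 1, "gender": 1, "sbp_kpa": 16.0},
        {"id": 2, "gender": 2, "sbp_kpa": 18.0},
    ],
}

# Hypothetical harmonization sheet: one row per (study, harmonized item),
# naming the source variable and how to code it. Every decision lives here.
SHEET = [
    {"study": "study_a", "source_item": "sex_raw", "item": "sex",
     "code_type": "recode", "code": {"M": "male", "F": "female"}},
    {"study": "study_b", "source_item": "gender", "item": "sex",
     "code_type": "recode", "code": {1: "male", 2: "female"}},
    {"study": "study_a", "source_item": "sbp_mmhg", "item": "sbp",
     "code_type": "identity", "code": None},
    {"study": "study_b", "source_item": "sbp_kpa", "item": "sbp",
     "code_type": "scale", "code": 7.50062},  # kPa -> mmHg
]


def apply_rule(value, rule):
    """Transform one source value according to one sheet row."""
    if rule["code_type"] == "identity":
        return value
    if rule["code_type"] == "recode":
        if value not in rule["code"]:
            raise ValueError(f"unmapped value {value!r}")
        return rule["code"][value]
    if rule["code_type"] == "scale":
        return round(value * rule["code"], 1)
    raise ValueError(f"unknown code_type {rule['code_type']!r}")


def harmonize(datasets, sheet):
    """Apply every sheet row; return (long_rows, error_log)."""
    long_rows, error_log = [], []
    for rule in sheet:
        for row in datasets[rule["study"]]:
            try:
                if rule["source_item"] not in row:
                    raise ValueError(f"missing source variable {rule['source_item']!r}")
                value = apply_rule(row[rule["source_item"]], rule)
            except ValueError as exc:
                # Log the problem and keep going instead of aborting the run.
                error_log.append({"study": rule["study"], "id": row["id"],
                                  "item": rule["item"], "error": str(exc)})
                value = None
            long_rows.append({"study": rule["study"], "id": row["id"],
                              "item": rule["item"], "value": value})
    return long_rows, error_log


def to_wide(long_rows):
    """Pivot long rows to one row per (study, id), one column per item."""
    wide = defaultdict(dict)
    for r in long_rows:
        wide[(r["study"], r["id"])][r["item"]] = r["value"]
    return [{"study": s, "id": i, **items} for (s, i), items in wide.items()]


def summarize(long_rows):
    """Count non-missing harmonized values per (study, item)."""
    counts = defaultdict(int)
    for r in long_rows:
        if r["value"] is not None:
            counts[(r["study"], r["item"])] += 1
    return dict(counts)


if __name__ == "__main__":
    long_rows, errors = harmonize(DATASETS, SHEET)
    for row in to_wide(long_rows):
        print(row)
    print(errors)
    print(summarize(long_rows))
```

In this sketch the unmapped code `"X"` becomes a missing value and an error-log entry rather than stopping the run, and the kPa readings from `study_b` are rescaled so both studies report systolic blood pressure in mmHg.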

Original language: English (US)
Article number: 101003
Journal: Patterns
State: Accepted/In press, 2024

Bibliographical note

Publisher Copyright:
© 2024 The Authors

Keywords

  • data harmonization
  • data integration
  • data management
  • data pooling
  • R package
