Accounting for data heterogeneity in integrative analysis and prediction methods: An application to Chronic Obstructive Pulmonary Disease

J. Butts, C. Wendt, R. Bowler, C. P. Hersh, Q. Long, L. Eberly, S. E. Safo

Research output: Working paper, Preprint


Epidemiologic and genetic studies of chronic obstructive pulmonary disease (COPD) and many other complex diseases suggest subgroup disparities (e.g., by sex). We consider this problem from the standpoint of integrative analysis, where we combine information from different views (e.g., genomics, proteomics, clinical data). Existing integrative analysis methods ignore subgroup heterogeneity, while stacking the views and accounting for subgroup heterogeneity does not model the association among the views. To address these analytical challenges, we propose HIP (Heterogeneity in Integration and Prediction), a statistical approach for joint association and prediction that leverages the strengths of each view to identify molecular signatures that are shared by, or specific to, males and females and that contribute to the variation in COPD, measured by airway wall thickness. HIP accounts for subgroup heterogeneity, allows for sparsity in variable selection, is applicable to multi-class outcomes and to univariate or multivariate continuous outcomes, and incorporates covariate adjustment. We develop efficient algorithms in PyTorch. Our COPD analysis identified several proteins, genes, and pathways that are common to, or specific to, males and females; some have been implicated in COPD, while others could lead to new insights into sex differences in COPD mechanisms.
Original language: Undefined/Unknown
State: Published - Nov 12 2021


  • stat.ME
