Prediction models for network-linked data

Tianxi Li, Elizaveta Levina, Ji Zhu

Research output: Contribution to journal, Article, peer-review

21 Scopus citations

Abstract

Prediction algorithms typically assume the training data are independent samples, but in many modern applications samples come from individuals connected by a network. For example, in adolescent health studies of risk-taking behaviors, information on the subjects’ social network is often available and plays an important role through network cohesion, the empirically observed phenomenon of friends behaving similarly. Taking cohesion into account in prediction models should allow us to improve their performance. Here we propose a network-based penalty on individual node effects to encourage similarity between predictions for linked nodes, and show that incorporating it into prediction leads to improvement over traditional models both theoretically and empirically when network cohesion is present. The penalty can be used with many loss-based prediction methods, such as regression, generalized linear models, and Cox’s proportional hazards model. Applications to predicting levels of recreational activity and marijuana usage among teenagers from the AddHealth study based on both demographic covariates and friendship networks are discussed in detail and show that our approach to taking friendships into account can significantly improve predictions of behavior while providing interpretable estimates of covariate effects.
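
As a rough illustration of the idea of penalizing individual node effects, the sketch below fits a least-squares model with one effect per node and a quadratic penalty that grows when linked nodes have different effects. The abstract does not state the form of the penalty, so the graph-Laplacian quadratic form used here is an assumption, as are the function names, the small ridge term `gamma`, and the toy data. It is a minimal sketch under those assumptions, not the authors' implementation.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Return the unnormalized Laplacian L = D - A of an undirected graph."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def fit_cohesion_regression(X, y, adjacency, lam=1.0, gamma=1e-8):
    """Least squares with per-node effects and a network penalty (assumed form).

    Minimizes ||y - alpha - X beta||^2 + lam * alpha' (L + gamma I) alpha,
    where alpha' L alpha is the sum over edges of (alpha_i - alpha_j)^2.
    gamma is a small ridge term added here only for numerical stability.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    L = graph_laplacian(adjacency)
    # Stack node effects and covariate effects into one linear system.
    Z = np.hstack([np.eye(n), X])
    penalty = np.zeros((n + p, n + p))
    penalty[:n, :n] = lam * (L + gamma * np.eye(n))
    theta = np.linalg.solve(Z.T @ Z + penalty, Z.T @ y)
    return theta[:n], theta[n:]  # (alpha, beta)


# Toy example: 6 nodes on a path graph, one covariate (all values made up).
rng = np.random.default_rng(0)
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1.0
X = rng.normal(size=(6, 1))
y = rng.normal(size=6)
alpha, beta = fit_cohesion_regression(X, y, A, lam=1e6)
```

With a very large `lam` on a connected graph, the node effects are pushed toward a common value, so the fit approaches ordinary regression with a shared intercept. Smaller values let linked nodes differ more.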

Original language: English (US)
Pages (from-to): 132-164
Number of pages: 33
Journal: Annals of Applied Statistics
Volume: 13
Issue number: 1
State: Published - 2019
Externally published: Yes

Bibliographical note

Funding Information:
Acknowledgments. We thank the Associate Editor and two referees for many helpful suggestions that greatly improved the paper. This research uses data from Add Health, a program project designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris, and funded by a grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 17 other agencies. Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in the original design. Persons interested in obtaining Data Files from Add Health should contact Add Health, University of North Carolina at Chapel Hill, Carolina Population Center, 206 W. Franklin St., Chapel Hill, NC 27516-2524 (addhealth_contracts@unc.edu). No direct support was received from grant P01-HD31921 for this analysis.

Funding Information:
Received May 2017; revised June 2018. [1] Supported in part by ONR Grant N000141612910, NSF Grants DMS-1159005, DMS-1407698, and NIH Grant R01GM096194. [2] Supported in part by Rackham International Student Fellowship. The work was done when the author was at the University of Michigan. [3] Supported by NSF Grant DMS-1521551 and ONR Grant N000141612910. [4] Supported by NSF Grant DMS-1407698 and NIH Grant R01GM096194. Key words and phrases: Network cohesion, prediction, regression.

Publisher Copyright:
© Institute of Mathematical Statistics, 2019.

Keywords

  • Network cohesion
  • Prediction
  • Regression
