A graph convolutional neural network for gene expression data analysis with multiple gene networks

Hu Yang, Zhong Zhuang, Wei Pan

Research output: Contribution to journal › Article › peer-review

Abstract

Spectral graph convolutional neural networks (GCN) are proposed to incorporate important information contained in graphs such as gene networks. In a standard spectral GCN, there is only one gene network to describe the relationships among genes. However, for genomic applications, due to condition- or tissue-specific gene function and regulation, multiple gene networks may be available; it is unclear how to apply GCNs to disease classification with multiple networks. Besides, which gene networks may provide more effective prior information for a given learning task is unknown a priori and is not straightforward to discover in many cases. A deep multiple graph convolutional neural network is therefore developed here to meet the challenge. The new approach not only computes a feature of a gene as the weighted average of those of itself and its neighbors through spectral GCNs, but also extracts features from gene-specific expression (or other feature) profiles via a feed-forward neural network (FNN). We also provide two measures, the importance of a given gene and the relative importance score of each gene network, for the genes' and gene networks' contributions, respectively, to the learning task. To evaluate the new method, we conduct real data analyses using several breast cancer and diffuse large B-cell lymphoma datasets and incorporating multiple gene networks obtained from “GIANT 2.0”. Compared with the standard FNN, GCN, and random forest, the new method not only yields high classification accuracy but also prioritizes the most important genes confirmed to be highly associated with cancer, strongly suggesting the usefulness of the new method in incorporating multiple gene networks.
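
The "weighted average of itself and its neighbors" idea in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' model: it assumes a simple row-normalized adjacency with a unit self-loop in place of a learned spectral filter, and the function names, toy networks, and fixed `network_weights` are hypothetical stand-ins (the paper's relative importance score for networks is not reproduced here).

```python
def neighborhood_average(adj, x):
    """Average each gene's value with its neighbors' values, weighted by edge weight.

    adj: symmetric list-of-lists of non-negative edge weights (no self-loops).
    x:   one expression value per gene.
    Assumption: row normalization with a unit self-loop, not a learned spectral filter.
    """
    n = len(x)
    out = []
    for i in range(n):
        # Add a self-loop of weight 1 so the gene's own value is included.
        weights = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(weights)
        out.append(sum(w * x[j] for j, w in enumerate(weights)) / total)
    return out


def multi_network_features(adjs, x, network_weights):
    """Combine per-network smoothed features using fixed, hypothetical network weights."""
    smoothed = [neighborhood_average(adj, x) for adj in adjs]
    return [
        sum(w * s[i] for w, s in zip(network_weights, smoothed))
        for i in range(len(x))
    ]


# Toy example: three genes, two networks defined over the same genes.
expr = [1.0, 2.0, 4.0]
net_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # genes 0 and 1 are linked
net_b = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]  # genes 1 and 2 are linked
features = multi_network_features([net_a, net_b], expr, [0.5, 0.5])
print(features)
```

In a trained model, the filter parameters and the contribution of each network would be learned from data rather than fixed as they are in this sketch.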

Original language: English (US)
Pages (from-to): 5547-5564
Number of pages: 18
Journal: Statistics in Medicine
Volume: 40
Issue number: 25
State: Published - Jul 14 2021

Bibliographical note

Funding Information:
We are grateful to the reviewers for many constructive and insightful comments. We thank Haoran Xue in the School of Statistics at the University of Minnesota for helpful discussions. H.Y. was supported by grants from the National Natural Science Foundation for Distinguished Young Scholars of China (project number 71701223), the National Statistical Science Foundation of China (project number 2018LZ08), and the Central University of Finance and Economics Young Talents Training Support Project (QYP2014).

Publisher Copyright:
© 2021 John Wiley & Sons Ltd.

Keywords

  • Laplacian
  • deep learning
  • feed-forward neural network
  • gene expression data
  • spectral graph theory
