A simple convolutional neural network for prediction of enhancer-promoter interactions with DNA sequence data

Zhong Zhuang, Xiaotong Shen, Wei Pan

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Motivation: Enhancer-promoter interactions (EPIs) in the genome play an important role in transcriptional regulation. EPIs can be useful in boosting statistical power and enhancing mechanistic interpretation for disease- or trait-associated genetic variants in genome-wide association studies. Instead of expensive and time-consuming biological experiments, computational prediction of EPIs with DNA sequence and other genomic data is a fast and viable alternative. In particular, deep learning and other machine learning methods have been demonstrated with promising performance. Results: First, using a published human cell line dataset, we demonstrate that a simple convolutional neural network (CNN) performs as well as, if not better than, a more complicated and state-of-the-art architecture, a hybrid of a CNN and a recurrent neural network. More importantly, in spite of the well-known cell line-specific EPIs (and corresponding gene expression), in contrast to the standard practice of training and predicting for each cell line separately, we propose two transfer learning approaches to training a model using all cell lines to various extents, leading to substantially improved predictive performance.

Original language: English (US)
Pages (from-to): 2899-2906
Number of pages: 8
Journal: Bioinformatics
Volume: 35
Issue number: 17
DOI: 10.1093/bioinformatics/bty1050
State: Published - Sep 1 2019

Fingerprint

DNA sequences
Promoter
DNA Sequence
Cells
Neural Networks
Neural networks
Cell Line
Line
Prediction
Cell
Interaction
Genome
Genes
Transfer Learning
Transcriptional Regulation
Statistical Power
Recurrent neural networks
Genome-Wide Association Study
Recurrent Neural Networks
Boosting

PubMed: MeSH publication types

  • Journal Article

Cite this

A simple convolutional neural network for prediction of enhancer-promoter interactions with DNA sequence data. / Zhuang, Zhong; Shen, Xiaotong; Pan, Wei.

In: Bioinformatics, Vol. 35, No. 17, 01.09.2019, p. 2899-2906.

@article{833f2dfea8114896b7432cde1275dacf,
title = "A simple convolutional neural network for prediction of enhancer-promoter interactions with DNA sequence data",
abstract = "Motivation: Enhancer-promoter interactions (EPIs) in the genome play an important role in transcriptional regulation. EPIs can be useful in boosting statistical power and enhancing mechanistic interpretation for disease- or trait-associated genetic variants in genome-wide association studies. Instead of expensive and time-consuming biological experiments, computational prediction of EPIs with DNA sequence and other genomic data is a fast and viable alternative. In particular, deep learning and other machine learning methods have been demonstrated with promising performance. Results: First, using a published human cell line dataset, we demonstrate that a simple convolutional neural network (CNN) performs as well as, if not better than, a more complicated and state-of-the-art architecture, a hybrid of a CNN and a recurrent neural network. More importantly, in spite of the well-known cell line-specific EPIs (and corresponding gene expression), in contrast to the standard practice of training and predicting for each cell line separately, we propose two transfer learning approaches to training a model using all cell lines to various extents, leading to substantially improved predictive performance.",
author = "Zhong Zhuang and Xiaotong Shen and Wei Pan",
year = "2019",
month = "9",
day = "1",
doi = "10.1093/bioinformatics/bty1050",
language = "English (US)",
volume = "35",
pages = "2899--2906",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "17",
}

TY - JOUR

T1 - A simple convolutional neural network for prediction of enhancer-promoter interactions with DNA sequence data

AU - Zhuang, Zhong

AU - Shen, Xiaotong

AU - Pan, Wei

PY - 2019/9/1

Y1 - 2019/9/1

AB - Motivation: Enhancer-promoter interactions (EPIs) in the genome play an important role in transcriptional regulation. EPIs can be useful in boosting statistical power and enhancing mechanistic interpretation for disease- or trait-associated genetic variants in genome-wide association studies. Instead of expensive and time-consuming biological experiments, computational prediction of EPIs with DNA sequence and other genomic data is a fast and viable alternative. In particular, deep learning and other machine learning methods have been demonstrated with promising performance. Results: First, using a published human cell line dataset, we demonstrate that a simple convolutional neural network (CNN) performs as well as, if not better than, a more complicated and state-of-the-art architecture, a hybrid of a CNN and a recurrent neural network. More importantly, in spite of the well-known cell line-specific EPIs (and corresponding gene expression), in contrast to the standard practice of training and predicting for each cell line separately, we propose two transfer learning approaches to training a model using all cell lines to various extents, leading to substantially improved predictive performance.

UR - http://www.scopus.com/inward/record.url?scp=85071517568&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071517568&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty1050

DO - 10.1093/bioinformatics/bty1050

M3 - Article

C2 - 30649185

AN - SCOPUS:85071517568

VL - 35

SP - 2899

EP - 2906

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 17

ER -