Application of deep convolutional neural networks in classification of protein subcellular localization with microscopy images

Mengli Xiao, Xiaotong T Shen, Wei Pan

Research output: Contribution to journal › Article

Abstract

Single-cell microscopy image analysis has proved invaluable in protein subcellular localization for inferring gene/protein function. Fluorescent-tagged proteins across cellular compartments are tracked and imaged in response to genetic or environmental perturbations. With a large number of images generated by high-content microscopy while manual labeling is both labor-intensive and error-prone, machine learning offers a viable alternative for automatic labeling of subcellular localizations. Meanwhile, in recent years applications of deep learning methods to large datasets in natural images and other domains have become quite successful. An appeal of deep learning methods is that they can learn salient features from complicated data with little data preprocessing. For such purposes, we applied several representative types of deep convolutional neural networks (CNNs) and two popular ensemble methods, random forests and gradient boosting, to predict protein subcellular localization with a moderately large cell image data set. We show a consistently better predictive performance of CNNs over the two ensemble methods. We also demonstrate the use of CNNs for feature extraction. In the end, we share our computer code and pretrained models to facilitate CNN's applications in genetics and computational biology.

Original language: English (US)
Pages (from-to): 330-341
Number of pages: 12
Journal: Genetic Epidemiology
Volume: 43
Issue number: 3
DOIs: 10.1002/gepi.22182
State: Published - Apr 1 2019

Fingerprint

Microscopy
Proteins
Learning
Computational Biology
Datasets

Keywords

  • CNNs
  • deep learning
  • feature extraction
  • gradient boosting
  • random forests

PubMed: MeSH publication types

  • Journal Article
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

Cite this

@article{c49ad58814b04ac996e73e991e97112e,
title = "Application of deep convolutional neural networks in classification of protein subcellular localization with microscopy images",
abstract = "Single-cell microscopy image analysis has proved invaluable in protein subcellular localization for inferring gene/protein function. Fluorescent-tagged proteins across cellular compartments are tracked and imaged in response to genetic or environmental perturbations. With a large number of images generated by high-content microscopy while manual labeling is both labor-intensive and error-prone, machine learning offers a viable alternative for automatic labeling of subcellular localizations. Meanwhile, in recent years applications of deep learning methods to large datasets in natural images and other domains have become quite successful. An appeal of deep learning methods is that they can learn salient features from complicated data with little data preprocessing. For such purposes, we applied several representative types of deep convolutional neural networks (CNNs) and two popular ensemble methods, random forests and gradient boosting, to predict protein subcellular localization with a moderately large cell image data set. We show a consistently better predictive performance of CNNs over the two ensemble methods. We also demonstrate the use of CNNs for feature extraction. In the end, we share our computer code and pretrained models to facilitate CNN's applications in genetics and computational biology.",
keywords = "CNNs, deep learning, feature extraction, gradient boosting, random forests",
author = "Mengli Xiao and Shen, {Xiaotong T} and Wei Pan",
year = "2019",
month = "4",
day = "1",
doi = "10.1002/gepi.22182",
language = "English (US)",
volume = "43",
pages = "330--341",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "3",

}

TY - JOUR

T1 - Application of deep convolutional neural networks in classification of protein subcellular localization with microscopy images

AU - Xiao, Mengli

AU - Shen, Xiaotong T

AU - Pan, Wei

PY - 2019/4/1

Y1 - 2019/4/1

N2 - Single-cell microscopy image analysis has proved invaluable in protein subcellular localization for inferring gene/protein function. Fluorescent-tagged proteins across cellular compartments are tracked and imaged in response to genetic or environmental perturbations. With a large number of images generated by high-content microscopy while manual labeling is both labor-intensive and error-prone, machine learning offers a viable alternative for automatic labeling of subcellular localizations. Meanwhile, in recent years applications of deep learning methods to large datasets in natural images and other domains have become quite successful. An appeal of deep learning methods is that they can learn salient features from complicated data with little data preprocessing. For such purposes, we applied several representative types of deep convolutional neural networks (CNNs) and two popular ensemble methods, random forests and gradient boosting, to predict protein subcellular localization with a moderately large cell image data set. We show a consistently better predictive performance of CNNs over the two ensemble methods. We also demonstrate the use of CNNs for feature extraction. In the end, we share our computer code and pretrained models to facilitate CNN's applications in genetics and computational biology.

AB - Single-cell microscopy image analysis has proved invaluable in protein subcellular localization for inferring gene/protein function. Fluorescent-tagged proteins across cellular compartments are tracked and imaged in response to genetic or environmental perturbations. With a large number of images generated by high-content microscopy while manual labeling is both labor-intensive and error-prone, machine learning offers a viable alternative for automatic labeling of subcellular localizations. Meanwhile, in recent years applications of deep learning methods to large datasets in natural images and other domains have become quite successful. An appeal of deep learning methods is that they can learn salient features from complicated data with little data preprocessing. For such purposes, we applied several representative types of deep convolutional neural networks (CNNs) and two popular ensemble methods, random forests and gradient boosting, to predict protein subcellular localization with a moderately large cell image data set. We show a consistently better predictive performance of CNNs over the two ensemble methods. We also demonstrate the use of CNNs for feature extraction. In the end, we share our computer code and pretrained models to facilitate CNN's applications in genetics and computational biology.

KW - CNNs

KW - deep learning

KW - feature extraction

KW - gradient boosting

KW - random forests

UR - http://www.scopus.com/inward/record.url?scp=85059594835&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85059594835&partnerID=8YFLogxK

U2 - 10.1002/gepi.22182

DO - 10.1002/gepi.22182

M3 - Article

C2 - 30614068

AN - SCOPUS:85059594835

VL - 43

SP - 330

EP - 341

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 3

ER -