Identifying unproven cancer treatments on the health Web: Addressing accuracy, generalizability and scalability

Yin Aphinyanaphongs, Lawrence D. Fu, Constantin F. Aliferis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

Building machine learning models that identify unproven cancer treatments on the Health Web is a promising approach for dealing with the dissemination of false and dangerous information to vulnerable health consumers. Aside from the obvious requirement of accuracy, two issues are of practical importance in deploying these models in real-world applications. (a) Generalizability: The models must generalize to all treatments (not just the ones used in the training of the models). (b) Scalability: The models must be efficiently applicable to billions of documents on the Health Web. First, we provide methods and related empirical data demonstrating strong accuracy and generalizability. Second, by combining the MapReduce distributed architecture and high dimensionality compression via Markov Boundary feature selection, we show how to scale the application of the models to WWW-scale corpora. The present work provides evidence that (a) a very small subset of unproven cancer treatments is sufficient to build a model to identify unproven treatments on the Web; (b) unproven treatments use distinct language to market their claims, and this language is learnable; (c) through distributed parallelization and state-of-the-art feature selection, it is possible to prepare the corpora and to build and apply models at large scale.
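
The scalability idea in the abstract can be pictured with a small sketch. This is not the authors' implementation: it only assumes that feature selection has reduced the model to a handful of weighted terms, so each document can be scored independently in a map step and the results combined in a reduce step. The term weights, bias, threshold, site names, and function names below are all hypothetical, chosen purely for illustration.

```python
import re
from collections import defaultdict
from functools import reduce

# Hypothetical weights for a few selected terms. In a real system these
# would come from a trained linear model after feature selection.
SELECTED_WEIGHTS = {
    "miracle": 1.5,
    "cure": 1.0,
    "toxins": 0.8,
    "randomized": -1.2,
    "trial": -0.9,
}
BIAS = -1.0        # hypothetical intercept
THRESHOLD = 0.0    # scores above this are flagged as "unproven"


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def map_document(record):
    """Map step: score one (site, text) record using only the selected terms."""
    site, text = record
    score = BIAS + sum(SELECTED_WEIGHTS.get(tok, 0.0) for tok in tokenize(text))
    flagged = 1 if score > THRESHOLD else 0
    return site, (flagged, 1)


def reduce_counts(acc, pair):
    """Reduce step: accumulate (flagged, total) document counts per site."""
    site, (flagged, total) = pair
    seen_flagged, seen_total = acc[site]
    acc[site] = (seen_flagged + flagged, seen_total + total)
    return acc


def run(records):
    """Run the map and reduce steps in-process over an iterable of records."""
    mapped = map(map_document, records)
    return dict(reduce(reduce_counts, mapped, defaultdict(lambda: (0, 0))))
```

Because each document is scored without reference to any other, the map step could in principle be distributed across many workers; keeping only a few selected terms keeps the per-document cost and the model size small.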

Original language: English (US)
Title of host publication: MEDINFO 2013 - Proceedings of the 14th World Congress on Medical and Health Informatics
Publisher: IOS Press
Pages: 667-671
Number of pages: 5
Edition: 1-2
ISBN (Print): 9781614992882
DOIs: https://doi.org/10.3233/978-1-61499-289-9-667
State: Published - Jan 1 2013
Event: 14th World Congress on Medical and Health Informatics, MEDINFO 2013 - Copenhagen, Denmark
Duration: Aug 20 2013 to Aug 23 2013

Publication series

Name: Studies in Health Technology and Informatics
Number: 1-2
Volume: 192
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Other

Other: 14th World Congress on Medical and Health Informatics, MEDINFO 2013
Country: Denmark
City: Copenhagen
Period: 8/20/13 to 8/23/13

Fingerprint

Oncology
Scalability
Health
Language
Neoplasms
Feature extraction
World Wide Web
Learning systems

Keywords

  • Artificial Intelligence
  • Consumer Product Safety
  • Information Storage and Retrieval
  • Internet
  • Neoplasms

Cite this

Aphinyanaphongs, Y., Fu, L. D., & Aliferis, C. F. (2013). Identifying unproven cancer treatments on the health Web: Addressing accuracy, generalizability and scalability. In MEDINFO 2013 - Proceedings of the 14th World Congress on Medical and Health Informatics (1-2 ed., pp. 667-671). (Studies in Health Technology and Informatics; Vol. 192, No. 1-2). IOS Press. https://doi.org/10.3233/978-1-61499-289-9-667

@inproceedings{700dc143eb0643fea608d5210fd51db7,
title = "Identifying unproven cancer treatments on the health Web: Addressing accuracy, generalizability and scalability",
keywords = "Artificial Intelligence, Consumer Product Safety, Information Storage and Retrieval, Internet, Neoplasms",
author = "Yin Aphinyanaphongs and Fu, {Lawrence D.} and Aliferis, {Constantin F.}",
year = "2013",
month = "1",
day = "1",
doi = "10.3233/978-1-61499-289-9-667",
language = "English (US)",
isbn = "9781614992882",
series = "Studies in Health Technology and Informatics",
publisher = "IOS Press",
number = "1-2",
pages = "667--671",
booktitle = "MEDINFO 2013 - Proceedings of the 14th World Congress on Medical and Health Informatics",
address = "United States",
edition = "1-2",

}

TY - GEN

T1 - Identifying unproven cancer treatments on the health Web

T2 - Addressing accuracy, generalizability and scalability

AU - Aphinyanaphongs, Yin

AU - Fu, Lawrence D.

AU - Aliferis, Constantin F.

PY - 2013/1/1

Y1 - 2013/1/1

KW - Artificial Intelligence

KW - Consumer Product Safety

KW - Information Storage and Retrieval

KW - Internet

KW - Neoplasms

UR - http://www.scopus.com/inward/record.url?scp=84894327491&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84894327491&partnerID=8YFLogxK

U2 - 10.3233/978-1-61499-289-9-667

DO - 10.3233/978-1-61499-289-9-667

M3 - Conference contribution

C2 - 23920640

AN - SCOPUS:84894327491

SN - 9781614992882

T3 - Studies in Health Technology and Informatics

SP - 667

EP - 671

BT - MEDINFO 2013 - Proceedings of the 14th World Congress on Medical and Health Informatics

PB - IOS Press

ER -