The choice of variable-selection method for identifying important variables in binary classification modeling is critical for producing stable statistical models that are interpretable, generate accurate predictions, and have minimal bias. This work is motivated by the availability of data on clinical and laboratory features of dengue fever infections obtained from 51 individuals enrolled in a prospective observational study of acute human dengue infections. Our paper uses an objective Bayesian method to identify important variables for dengue hemorrhagic fever (DHF) in the dengue data set. Using the variables selected by the objective Bayesian method, we employ a Gaussian copula marginal regression model that accounts for a correlated error structure, together with a general method of semi-parametric Bayesian inference for the Gaussian copula model, to estimate the marginal distributions and the dependence structure separately. We also carry out a receiver operating characteristic (ROC) analysis of the predictive model for DHF and use it to compare our proposed model with the models of Ju and Brasier (Variable selection methods for developing a biomarker panel for prediction of dengue hemorrhagic fever. BMC Res Notes 6:365, 2013). Our results extend the previous models of DHF by suggesting that IL-10, Days Fever, Sex and Lymphocytes are the major features for predicting DHF on the basis of blood chemistries and cytokine measurements. In addition, the dependence structure among these four features (Days Fever, Lymphocytes, IL-10 and Sex) in relation to disease outcomes was identified using the semi-parametric Bayesian Gaussian copula model and the Gaussian partial correlation method.
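As a rough illustration of the Gaussian partial correlation idea mentioned in the abstract, the following minimal sketch computes pairwise partial correlations by inverting a sample covariance matrix. It assumes NumPy and purely synthetic data; the function name `partial_correlations` and the simulated variables `x`, `y`, `z` are hypothetical and not taken from the paper, whose estimates come from a semi-parametric Bayesian Gaussian copula model rather than this plain plug-in approach.

```python
import numpy as np


def partial_correlations(data):
    """Pairwise partial correlations of the columns of `data`.

    Each entry (i, j) is the correlation between variables i and j after
    conditioning on all remaining variables, obtained from the inverse
    of the sample covariance matrix (the precision matrix).
    """
    cov = np.cov(data, rowvar=False)       # columns are variables
    precision = np.linalg.inv(cov)         # assumes cov is non-singular
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Synthetic example (an assumption for illustration only):
# x and y are both driven by z, so they are marginally correlated
# but close to uncorrelated once z is conditioned on.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)
data = np.column_stack([x, y, z])

marginal = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)
print("marginal corr(x, y):", round(marginal[0, 1], 3))
print("partial corr(x, y | z):", round(pcor[0, 1], 3))
```

In this toy setting the marginal correlation between `x` and `y` is large while their partial correlation given `z` is near zero, which is the kind of distinction a dependence-structure analysis among selected features is meant to expose.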
|Original language|English (US)|
|Number of pages|16|
|Journal|Annals of Data Science|
|State|Published - Dec 1 2020|
Bibliographical note
Publisher Copyright:
© 2020, Springer-Verlag GmbH Germany, part of Springer Nature.
- Variable selection