TY - JOUR

T1 - A Simulation Study to Investigate the Behavior of the Log-Density Ratio Under Normality

AU - Scrucca, Luca

AU - Weisberg, Sanford

PY - 2004/2

Y1 - 2004/2

AB - For a logistic regression model, the log-odds depend on the log of the ratio of the conditional densities of the predictors given the response variable. This suggests that relevant statistical information could be extracted by investigating the inverse problem of the predictors given the response. For binary responses, assuming certain parametric distributions, it is possible to determine which terms are needed and how they should be included in a logistic regression model. In the one-predictor case, under the normality assumption, a known result shows that a linear and a quadratic term are needed in a logistic regression model, with the quadratic term not required if the two conditional distributions have the same variance. However, the quadratic component may not be needed if the linear term is sufficient to discriminate between the two groups, that is, if the two conditional distributions are far enough apart. A simulation study is presented which shows that if the ratio of variances is between 2/3 and 1.5, the quadratic term is less likely to be useful; this also happens when the mean difference scaled by the variance ratio tends to be large. Graphically, if the conditional distributions of x|y for the two groups are well separated, a linear term should contain all the relevant statistical information available in the data. Conversely, if they overlap considerably and the variances are clearly unequal, the quadratic term is likely to be needed. Minor deviations from normality should not be a concern, particularly outside the range in which the empirical distributions overlap.

KW - Binary response

KW - Log-density ratio

KW - Logistic regression

KW - Monte Carlo simulation

KW - Regression graphics

UR - http://www.scopus.com/inward/record.url?scp=1642380961&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1642380961&partnerID=8YFLogxK

U2 - 10.1081/SAC-120028439

DO - 10.1081/SAC-120028439

M3 - Article

AN - SCOPUS:1642380961

VL - 33

SP - 159

EP - 178

JO - Communications in Statistics Part B: Simulation and Computation

JF - Communications in Statistics Part B: Simulation and Computation

SN - 0361-0918

IS - 1

ER -