Models of species ecological niches and geographic distributions now represent a widely used tool in ecology, evolution, and biogeography. However, the very common situation of species with few available occurrence localities presents major challenges for such modeling techniques, in particular regarding model complexity and evaluation. Here, we summarize the state of the field regarding these issues and provide a worked example using the technique Maxent for a small mammal endemic to Madagascar (the nesomyine rodent Eliurus majori). Two relevant model-selection approaches exist in the literature (information criteria, specifically AICc; and performance predicting withheld data, via a jackknife), but AICc is not strictly applicable to machine-learning algorithms like Maxent. We compare models chosen under each selection approach with those corresponding to Maxent default settings, both with and without spatial filtering of occurrence records to reduce the effects of sampling bias. Both selection approaches chose simpler models than those made using default settings. Furthermore, the approaches converged on a similar answer when sampling bias was taken into account, but differed markedly with the unfiltered occurrence data. Specifically, for that dataset, the models selected by AICc had substantially fewer parameters than those identified by performance on withheld data. Based on our knowledge of the study species, models chosen under both AICc and withheld-data-selection showed higher ecological plausibility when combined with spatial filtering. The results for this species intimate that AICc may consistently select models with fewer parameters and be more robust to sampling bias. To test these hypotheses and reach general conclusions, comprehensive research should be undertaken with a wide variety of real and simulated species. 
Meanwhile, we recommend that researchers assess the critical yet underappreciated issue of model complexity both via information criteria and performance on withheld data, comparing the results between the two approaches and taking into account ecological plausibility.
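As a rough illustration of the information-criterion approach mentioned above, the following minimal sketch computes the standard small-sample corrected AIC, AICc = 2k - 2 ln(L) + 2k(k + 1)/(n - k - 1), and picks the candidate with the lowest score. The candidate names, log-likelihoods, parameter counts, and sample size are hypothetical values chosen for illustration, not results from this study, and the sketch does not reproduce how Maxent likelihoods or parameter counts were obtained by the authors.

```python
import math


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for a model with k parameters and n records."""
    if n - k - 1 <= 0:
        # The correction term is undefined when k >= n - 1; treat as unusable.
        return math.inf
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidate models (illustrative numbers only).
candidates = {
    "simple": {"log_likelihood": -120.0, "k": 4},
    "default": {"log_likelihood": -112.0, "k": 15},
}
n_localities = 20  # hypothetical number of occurrence localities

scores = {
    name: aicc(m["log_likelihood"], m["k"], n_localities)
    for name, m in candidates.items()
}
best = min(scores, key=scores.get)  # lowest AICc is preferred
print(best, scores)
```

With few localities, the correction term grows quickly as k approaches n, which is one reason AICc tends to favor models with fewer parameters in small-sample settings.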
Bibliographical note
Funding Information:
Acknowledgements – Robert A. Boria, Ana C. Carnaval, Maria Gavrutenko, Beth Gerstner, Michael J. Hickerson, Jamie M. Kass, and Mariano Soley-Guardia provided feedback and insight during the course of this project. Funding – This research was made possible by funding from the National Science Foundation grants DEB-1119918 to SAJ and DEB-1119915 to RPA, including a Research Experiences for Undergraduates supplement to support BA. BA obtained additional support from the City College Academy for Professional Preparation. RM was supported by NSF DBI-1401312. Conflicts of interest – The authors declare no conflicts of interest in this submission.