Abstract
Directional statistics deals with data that can be naturally expressed in the form of vector directions. The von Mises-Fisher distribution is one of the most fundamental parametric models for describing directional data. Mixtures of von Mises-Fisher distributions are a popular approach to handling heterogeneous populations. However, components of such models can be affected by the presence of mild outliers or by cluster tails heavier than a von Mises-Fisher distribution can accommodate. To relax these model limitations, a mixture of contaminated von Mises-Fisher distributions is proposed. The proposed methodology is tested on synthetic data and applied to text and genetics data. The obtained results demonstrate the importance of the proposed procedure and its superiority over the traditional mixture of von Mises-Fisher distributions in the presence of heavy tails.
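The abstract does not give the functional form of the contaminated component, so the following is only a minimal sketch under stated assumptions. It takes the usual contaminated-distribution construction: a "typical" von Mises-Fisher density and a flatter one that shares the same mean direction but has a reduced concentration. The names `alpha` (proportion of typical points) and `eta` (concentration deflation factor in (0, 1)) are hypothetical and may not match the paper's parameterization. To stay dependency-free, the sketch is restricted to the unit sphere in three dimensions, where the von Mises-Fisher normalizing constant has the closed form kappa / (4 * pi * sinh(kappa)).

```python
import math


def vmf_logpdf_s2(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere in R^3.

    x and mu are unit vectors of length 3; kappa > 0 is the concentration.
    """
    dot = sum(a * b for a, b in zip(x, mu))
    # log(sinh(k)) = k + log(1 - exp(-2k)) - log(2), which stays stable for large k
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    return math.log(kappa) - math.log(4.0 * math.pi) - log_sinh + kappa * dot


def contaminated_vmf_pdf_s2(x, mu, kappa, alpha, eta):
    """Assumed contaminated density: alpha * vMF(mu, kappa) + (1 - alpha) * vMF(mu, eta * kappa).

    alpha and eta are hypothetical names; 0 < eta < 1 flattens the second
    component so that it can absorb mild outliers and heavier tails.
    """
    typical = math.exp(vmf_logpdf_s2(x, mu, kappa))
    atypical = math.exp(vmf_logpdf_s2(x, mu, eta * kappa))
    return alpha * typical + (1.0 - alpha) * atypical


def prob_typical(x, mu, kappa, alpha, eta):
    """Posterior probability that x was generated by the typical (non-outlying) part."""
    typical = alpha * math.exp(vmf_logpdf_s2(x, mu, kappa))
    return typical / contaminated_vmf_pdf_s2(x, mu, kappa, alpha, eta)


if __name__ == "__main__":
    mu = (0.0, 0.0, 1.0)
    for point in [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]:
        print(point, prob_typical(point, mu, kappa=10.0, alpha=0.9, eta=0.1))
```

Under these assumptions, a point at the mean direction is assigned almost entirely to the typical part, while a point orthogonal to it is flagged as a likely outlier. In a full mixture, one such contaminated density would stand in for each cluster, and the parameters would be estimated with an EM-type algorithm, as the keywords suggest.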
Original language | English (US) |
---|---|
Pages (from-to) | 527-551 |
Number of pages | 25 |
Journal | Journal of Classification |
Volume | 40 |
Issue number | 3 |
State | Published - Nov 2023 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2023, The Author(s) under exclusive licence to The Classification Society.
Keywords
- Directional data
- EM algorithm
- Mixture model
- Von Mises-Fisher distribution