Abstract
In many applications, multilabel classification involves time-series predictors, as in multilabel video classification. How to account for temporal dependencies among the input variables remains an open issue, especially in action learning from videos. Motivated by the problem of video categorization and captioning, we propose a nonlinear multilabel classifier based on a hidden Markov model and a weighted loss that separates false positive and false negative classification errors. This allows us to account for both label dependence and temporal dependencies of the input variables in classification. Computationally, we derive a decomposable algorithm based on block-wise coordinate descent for nonconvex minimization, which permits not only block-wise updates but also label-wise updates, leading to scalable computation. Theoretically, we derive the Bayes rule and prove that the proposed method consistently recovers the optimal performance of the Bayes rule. In simulations, the proposed method compares favorably with competitors that ignore either label dependence or temporal dependence. Finally, the utility of the proposed method is demonstrated by an application to the ActivityNet Captions dataset for understanding a video's contents.
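The idea of a weighted loss that separates false positive and false negative errors can be illustrated with a minimal sketch. This is not the loss defined in the paper, whose exact form is not given in the abstract; it is a plain weighted 0-1 loss over a label matrix, and the function name `weighted_multilabel_loss` and the weights `w_fp` and `w_fn` are hypothetical names chosen for illustration.

```python
import numpy as np


def weighted_multilabel_loss(y_true, y_pred, w_fp=1.0, w_fn=1.0):
    """Weighted 0-1 loss that counts false positives and false negatives separately.

    Illustrative assumption, not the paper's loss.
    y_true, y_pred: array-likes of shape (n_samples, n_labels) with entries in {0, 1}.
    w_fp, w_fn: hypothetical costs for a false positive and a false negative.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    # A false positive predicts a label that is absent in the truth.
    false_pos = np.logical_and(y_pred, ~y_true).sum()
    # A false negative misses a label that is present in the truth.
    false_neg = np.logical_and(~y_pred, y_true).sum()

    # Average the weighted error count over all (sample, label) pairs.
    return (w_fp * false_pos + w_fn * false_neg) / y_true.size


if __name__ == "__main__":
    truth = [[1, 0, 1], [0, 1, 0]]
    pred = [[1, 1, 0], [0, 0, 0]]
    # Penalize a missed label twice as heavily as a spurious one.
    print(weighted_multilabel_loss(truth, pred, w_fp=1.0, w_fn=2.0))
```

With equal weights this reduces to the Hamming loss; unequal weights let a missed label cost more (or less) than a spurious one, which is the kind of asymmetry the abstract describes.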
Original language | English (US) |
---|---|
Pages (from-to) | 5696-5705 |
Number of pages | 10 |
Journal | IEEE Transactions on Signal Processing |
Volume | 68 |
State | Published - 2020 |
Bibliographical note
Publisher Copyright: © 2020 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
Keywords
- Hidden Markov models
- Label dependence
- Nonconvex minimization
- Scalability
- Video sequence