Abstract
This paper uses a person's hum, rather than normal speech, to design a voice biometric system for person recognition. A recently proposed static feature set, Variable length Teager energy based Mel Frequency Cepstral Coefficients (VTMFCC), is found to capture source-like information from a hum signal. VTMFCC is shown to be more effective than the linear prediction (LP) residual at capturing information complementary to MFCC in a hum signal. Person recognition performance improves over MFCC alone when score-level fusion combines evidence from the static and dynamic features of MFCC (system) and VTMFCC (source-like). The results are validated with two types of dynamic features, delta cepstrum and shifted delta cepstrum. With score-level fusion of static and dynamic features, the identification rate and the Equal Error Rate (EER) improve by 7.9 % and 0.27 %, respectively, over MFCC alone. Furthermore, the person recognition system performs better with a larger frame duration of 69.6 ms than with the traditional 10-30 ms frame duration.
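As a rough illustration of the dynamic features and the score-level fusion mentioned in the abstract, the sketch below computes a regression-based delta cepstrum, a shifted delta cepstrum, and a weighted sum of two classifier score lists. The abstract does not give the paper's exact formulas or settings, so the function names, the edge handling, the default parameters, and the weighted-sum fusion rule are all assumptions made for illustration.

```python
from typing import List

Frame = List[float]  # one cepstral vector per analysis frame


def delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-based delta cepstrum; edge frames are replicated (assumption)."""
    last = len(frames) - 1
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(len(frames)):
        row = []
        for j in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, last)][j]
                prv = frames[max(t - n, 0)][j]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def shifted_delta(frames: List[Frame], d: int = 1, shift: int = 3,
                  blocks: int = 7) -> List[Frame]:
    """Shifted delta cepstrum: stack `blocks` simple deltas taken `shift` frames apart."""
    last = len(frames) - 1

    def at(i: int) -> Frame:
        # Clamp the index so frames near the edges reuse the boundary frame.
        return frames[min(max(i, 0), last)]

    out = []
    for t in range(len(frames)):
        row: Frame = []
        for i in range(blocks):
            centre = t + i * shift
            row.extend(a - b for a, b in zip(at(centre + d), at(centre - d)))
        out.append(row)
    return out


def fuse_scores(scores_a: List[float], scores_b: List[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted-sum score-level fusion of two systems (hypothetical fusion rule)."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(scores_a, scores_b)]
```

In such a setup, `scores_a` and `scores_b` would hold per-speaker scores from, say, an MFCC-based and a VTMFCC-based classifier, and the fused list would be used for the final decision; the weight `alpha` would normally be tuned on held-out data.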
Original language | English (US) |
---|---|
Pages (from-to) | 393-406 |
Number of pages | 14 |
Journal | International Journal of Speech Technology |
Volume | 15 |
Issue number | 3 |
DOIs | |
State | Published - Sep 2012 |
Bibliographical note
Copyright 2012 Elsevier B.V., All rights reserved.
Keywords
- Delta and shifted delta features
- Humming
- Polynomial classifier
- Score-level fusion
- VTMFCC