TY - JOUR
T1 - Static and dynamic information derived from source and system features for person recognition from humming
AU - Patil, Hemant A.
AU - Madhavi, Maulik C.
AU - Parhi, Keshab K.
PY - 2012/9
Y1 - 2012/9
N2 - In this paper, the hum of a person (instead of normal speech) is used to design a voice biometric system for person recognition. In addition, a recently proposed static feature set, viz., Variable length Teager energy based Mel Frequency Cepstral Coefficients (VTMFCC), is found to capture source-like information of a hum signal. VTMFCC is shown to be more effective than the linear prediction (LP) residual at capturing information complementary to MFCC in a hum signal. Person recognition performance is found to be better with score-level fusion, which combines evidence from static and dynamic features for MFCC (system) and VTMFCC (source-like) features, than with MFCC alone. Experiments are validated on two types of dynamic features, viz., delta cepstrum and shifted delta cepstrum. With score-level fusion of static and dynamic features, the % identification rate and % Equal Error Rate improve by 7.9 % and 0.27 %, respectively, over MFCC alone. Furthermore, we have observed that the person recognition system performs better with a larger frame duration of 69.6 ms than with the traditional 10-30 ms frame duration.
AB - In this paper, the hum of a person (instead of normal speech) is used to design a voice biometric system for person recognition. In addition, a recently proposed static feature set, viz., Variable length Teager energy based Mel Frequency Cepstral Coefficients (VTMFCC), is found to capture source-like information of a hum signal. VTMFCC is shown to be more effective than the linear prediction (LP) residual at capturing information complementary to MFCC in a hum signal. Person recognition performance is found to be better with score-level fusion, which combines evidence from static and dynamic features for MFCC (system) and VTMFCC (source-like) features, than with MFCC alone. Experiments are validated on two types of dynamic features, viz., delta cepstrum and shifted delta cepstrum. With score-level fusion of static and dynamic features, the % identification rate and % Equal Error Rate improve by 7.9 % and 0.27 %, respectively, over MFCC alone. Furthermore, we have observed that the person recognition system performs better with a larger frame duration of 69.6 ms than with the traditional 10-30 ms frame duration.
KW - Delta and shifted delta features
KW - Humming
KW - Polynomial classifier
KW - Score-level fusion
KW - VTMFCC
UR - http://www.scopus.com/inward/record.url?scp=84864592537&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84864592537&partnerID=8YFLogxK
U2 - 10.1007/s10772-012-9161-5
DO - 10.1007/s10772-012-9161-5
M3 - Article
AN - SCOPUS:84864592537
SN - 1381-2416
VL - 15
SP - 393
EP - 406
JO - International Journal of Speech Technology
JF - International Journal of Speech Technology
IS - 3
ER -