Static and dynamic information derived from source and system features for person recognition from humming

Hemant A. Patil, Maulik C. Madhavi, Keshab K. Parhi

Research output: Contribution to journal › Article

2 Scopus citations

Abstract

In this paper, a person's hum (instead of normal speech) is used to design a voice biometric system for person recognition. A recently proposed static feature set, viz., Variable length Teager energy based Mel Frequency Cepstral Coefficients (VTMFCC), is found to capture source-like information from a hum signal. VTMFCC is shown to be more effective than the linear prediction (LP) residual at capturing information complementary to MFCC in a hum signal. Person recognition performance is better when score-level fusion combines evidence from static and dynamic features for MFCC (system) and VTMFCC (source-like) features than when MFCC is used alone. The experiments are validated on two types of dynamic features, viz., delta cepstrum and shifted delta cepstrum. With score-level fusion of static and dynamic features, the % identification rate and the % Equal Error Rate improve by 7.9 % and 0.27 %, respectively, over MFCC alone. Furthermore, the person recognition system performs better with a larger frame duration of 69.6 ms than with the traditional 10-30 ms frame duration.
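The two generic building blocks named in the abstract, delta (dynamic) features and score-level fusion, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regression-style delta formula, the `half_window` value, the weighted-sum fusion rule, and the function names `delta` and `fuse_scores` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def delta(cepstra, half_window=2):
    """Regression-style delta features for a (frames, coeffs) array.

    Assumed formula: d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2),
    with the first and last frames repeated to handle the edges.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    n_frames = cepstra.shape[0]
    padded = np.pad(cepstra, ((half_window, half_window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, half_window + 1))
    out = np.zeros_like(cepstra)
    for n in range(1, half_window + 1):
        ahead = padded[half_window + n : half_window + n + n_frames]
        behind = padded[half_window - n : half_window - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted-sum score-level fusion (an assumed rule, not the paper's).

    Both score vectors hold one score per enrolled person and are expected
    to be on a comparable scale already.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b
```

In such a setup, the identified person would be the index of the largest fused score, for example `int(np.argmax(fuse_scores(mfcc_scores, vtmfcc_scores)))`, where `mfcc_scores` and `vtmfcc_scores` are hypothetical per-person classifier outputs.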

Original language: English (US)
Pages (from-to): 393-406
Number of pages: 14
Journal: International Journal of Speech Technology
Volume: 15
Issue number: 3
State: Published - Sep 1 2012

Keywords

  • Delta and shifted delta features
  • Humming
  • Polynomial classifier
  • Score-level fusion
  • VTMFCC
