Abstract
|
In this paper, we analyze the use of frame-level sparse representations of the speech signal for speaker verification. Sparse Representation Classification (SRC) has recently been shown to be promising for speaker recognition. We present evidence that frame-level sparse representation classification resembles the process of speech recognition in the human sensory system. Because recognizing different voices (noises) helps individuals immediately distinguish noise from the original speech signal, we design a noise-aware system. A key principle in sparse representation is the mutual coherence of the dictionary columns, called dictionary atoms, which has not been adequately considered in previously published SRC-based speaker verification research. To suppress mutual coherence, we use a dictionary learning method to construct a dictionary with effective atoms. Our proposed Frame Level Sparse Representation Classification (FSRC) provides new insights into SRC-based speaker verification. We demonstrate that, in SRC-based speaker verification, a dictionary whose atoms are orthogonal can be more extensible than one whose atoms are highly correlated, and that suppressing mutual coherence is even more effective than imposing strong orthogonality on the dictionary atoms. We evaluate state-of-the-art speaker recognition systems and the proposed method on NIST SRE 2004 data. Experimental results show that, compared with the baseline methods, the proposed method improves speaker verification performance in noisy conditions when sufficient information is available at target registration.
|