NMF-weighted SRP for multi-speaker direction of arrival estimation: robustness to spatial aliasing while exploiting sparsity in the atom-time domain
Journal Contribution - Journal Article
Localization of multiple speakers using microphone arrays remains a challenging problem, especially in the presence of noise and reverberation. State-of-the-art localization algorithms generally exploit the sparsity of speech in some representation for this purpose. Whereas broadband approaches exploit time-domain sparsity for multi-speaker localization, narrowband approaches can additionally exploit sparsity and disjointness in the time-frequency representation. Broadband approaches are robust to spatial aliasing but do not optimally exploit frequency-domain sparsity, leading to poor localization performance for arrays with short inter-microphone distances. Narrowband approaches, on the other hand, are vulnerable to spatial aliasing, making them unsuitable for arrays with large inter-microphone spacing. Proposed here is an approach that decomposes a signal spectrum into a weighted sum of broadband spectral components (atoms) and then exploits signal sparsity in the time-atom representation for simultaneous localization of multiple sources. The decomposition into atoms is performed in situ using non-negative matrix factorization (NMF) of the short-term amplitude spectra, and the localization estimate is obtained via a broadband steered-response power (SRP) approach for each active atom of a time frame. This SRP-NMF approach thereby combines the advantages of the narrowband and broadband approaches and performs well on the multi-speaker localization task for a broad range of inter-microphone spacings. In tests conducted on real-world data from public challenges such as SiSEC and LOCATA, and on data generated from recorded room impulse responses, the SRP-NMF approach outperforms the commonly used variants of narrowband and broadband localization approaches in terms of source detection capability and localization accuracy.
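The two stages described in the abstract (NMF of the amplitude spectra, then a broadband SRP per active atom) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a single far-field microphone pair, Lee-Seung multiplicative updates with a Euclidean cost for the NMF, PHAT normalization of the cross-spectrum, and a simple scheme in which each atom's spectrum weights the frequency bins of the SRP sum. The function names `nmf` and `atom_weighted_srp` and all parameter values are hypothetical.

```python
import numpy as np

def nmf(V, n_atoms, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (freq x frames) as W @ H using
    multiplicative updates for the Euclidean cost (an assumed choice)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_atoms)) + eps    # spectral atoms
    H = rng.random((n_atoms, n_frames)) + eps  # atom activations per frame
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def atom_weighted_srp(X1, X2, W, freqs, mic_dist, angles_deg, c=343.0):
    """Return srp[k, t, a]: the steered response of atom k in frame t at
    candidate angle a, for one microphone pair (far-field assumption).
    X1, X2 are STFTs (freq x frames); microphone 2 is assumed to lag
    microphone 1 by mic_dist * cos(angle) / c seconds."""
    cross = X1 * np.conj(X2)
    cross = cross / (np.abs(cross) + 1e-12)               # PHAT normalization
    tau = mic_dist * np.cos(np.deg2rad(angles_deg)) / c   # candidate delays
    steer = np.exp(-2j * np.pi * freqs[:, None] * tau[None, :])
    # Sum over frequency, weighting each bin by the atom's spectrum.
    return np.einsum('fk,ft,fa->kta', W, cross, steer).real

# Toy check on synthetic data: one source at 60 degrees, 10 cm spacing.
rng = np.random.default_rng(1)
n_freq, n_frames = 256, 20
freqs = np.linspace(100.0, 8000.0, n_freq)
angles = np.arange(0, 181, 5)
mic_dist = 0.1
tau_true = mic_dist * np.cos(np.deg2rad(60.0)) / 343.0
X1 = rng.standard_normal((n_freq, n_frames)) \
    + 1j * rng.standard_normal((n_freq, n_frames))
X2 = X1 * np.exp(-2j * np.pi * freqs[:, None] * tau_true)  # delayed copy

W, H = nmf(np.abs(X1), n_atoms=3)
srp = atom_weighted_srp(X1, X2, W, freqs, mic_dist, angles)
dominant = H.argmax(axis=0)                # most active atom in each frame
est = angles[srp[dominant, np.arange(n_frames)].argmax(axis=1)]
```

In this sketch, `est` holds one direction estimate per frame, taken from the most active atom only. With several simultaneous speakers, one would instead keep an estimate for every atom whose activation in `H` exceeds some threshold, which is where sparsity in the time-atom representation comes in. Because each atom spans many frequency bins, the SRP sum stays broadband and is less exposed to spatial aliasing than a per-bin estimate.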
Journal: EURASIP Journal on Audio, Speech, and Music Processing
Number of pages: 1
Keywords: Electrical & electronic engineering, Classical physics