
Publication

Selection and enhancement of Gabor filters for automatic speech recognition

Journal Contribution - Journal Article

Motivated by neurophysiological studies, the use of Gabor filters as acoustic feature extractors for speech recognition has received increasing attention in the new millennium. As the optimal parametrization of these filters is not obvious, many researchers employ various feature selection methods to find the best filter set. In this study, however, we argue that these kinds of feature selection methods cannot fulfill this task, and we support this claim with experimental results. We show that one can easily construct a better filter set manually, using simple heuristic rules. Then, as an alternative to the usual filter selection methods, we propose a training method that jointly optimizes the spectro-temporal filters and the neural net acoustic model built on them. In this special neural network architecture, the filters are incorporated into the network as its lowest layer. This allows us to tune the filters using backpropagation, and to manipulate them directly rather than through their parameters. This method also has the advantage of reducing the task of filter set enhancement to that of simple neural net training. Next, we show that we can enhance our manually selected filter set with this novel neural net architecture by using the filter coefficients as initial values for the backpropagation training. The resulting filter sets were evaluated on the phone recognition task of the TIMIT corpus, using both clean and noise-contaminated data, while cross-database phone recognition performance was evaluated on the “Szeged” Hungarian broadcast news database. The results demonstrate that the proposed filter optimization algorithm can outperform the usual feature selection-based methods, and that the filter set obtained by fine-tuning the manual filters with the neural net algorithm performs even better, outperforming all the other methods.
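The idea of using filter coefficients as the initial weights of the lowest network layer can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The patch size, the Gaussian envelope, the modulation frequencies, the tanh nonlinearity, the squared-error loss, and the names `gabor_filter` and `train_step` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def gabor_filter(n_freq, n_time, omega_f, omega_t):
    """Real 2D Gabor-like filter: Gaussian envelope times a cosine carrier.

    The envelope width (a quarter of the patch size) is an assumed value.
    """
    f = np.arange(n_freq) - (n_freq - 1) / 2.0
    t = np.arange(n_time) - (n_time - 1) / 2.0
    F, T = np.meshgrid(f, t, indexing="ij")
    envelope = np.exp(-0.5 * ((F / (n_freq / 4.0)) ** 2 + (T / (n_time / 4.0)) ** 2))
    carrier = np.cos(omega_f * F + omega_t * T)
    g = envelope * carrier
    return g / np.linalg.norm(g)  # unit norm keeps tanh inputs in a moderate range

# Hypothetical patch size and modulation frequencies (not taken from the paper).
N_FREQ, N_TIME = 9, 9
params = [(0.0, 0.5), (0.5, 0.0), (0.5, 0.5), (0.5, -0.5)]

# Each flattened filter becomes one row of the lowest layer's weight matrix.
W0 = np.stack([gabor_filter(N_FREQ, N_TIME, wf, wt).ravel() for wf, wt in params])

def train_step(W, V, x, y, lr=0.01):
    """One gradient step on squared error for x -> W -> tanh -> V -> output.

    The gradient reaches W directly, so the filter coefficients themselves
    are updated, not the Gabor parameters that generated them.
    """
    h = np.tanh(W @ x)
    err = V @ h - y
    grad_V = np.outer(err, h)
    grad_W = np.outer((V.T @ err) * (1.0 - h ** 2), x)
    return W - lr * grad_W, V - lr * grad_V, 0.5 * float(err @ err)

rng = np.random.default_rng(0)
x = rng.standard_normal(N_FREQ * N_TIME)   # stand-in for one spectro-temporal patch
y = np.array([1.0, 0.0, 0.0])              # stand-in for a one-hot target
V = 0.1 * rng.standard_normal((3, len(params)))

W = W0.copy()
losses = []
for _ in range(50):
    W, V, loss = train_step(W, V, x, y)
    losses.append(loss)
```

After training, `W` no longer matches `W0` exactly: the rows started as Gabor filters but were adjusted coefficient by coefficient, which is the sense in which the filters are manipulated directly and not through their parameters.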
Journal: International Journal of Speech Technology
ISSN: 1381-2416
Issue: 1
Volume: 18
Pages: 1 - 16
Publication year: 2015
Accessibility: Open