Introduction

Developments in Internet and broadcast technology enable users to enjoy large amounts of multimedia content. With this rapidly increasing amount of data, users require automatic methods to filter, process and store incoming data. Some of these functions will be aided by attached metadata, which provide information about the content. However, because metadata are not always provided, and because local processing power has increased tremendously, interest in local automatic multimedia analysis has grown. A major challenge in this field is the automatic classification of audio and music.

Most audio classification systems combine two processing stages: feature extraction followed by classification. A variety of signal features have been used for this purpose, including low-level parameters such as the zero-crossing rate, signal bandwidth, spectral centroid, and signal energy. Another set of features, inherited from automatic speech recognizers, is the set of mel-frequency cepstral coefficients (MFCCs). Several different classification strategies have been employed in the past, including multivariate Gaussian models, Gaussian mixture models, self-organizing maps, neural networks, k-nearest neighbor schemes and hidden Markov models. In some cases, the classification scheme does not influence the classification accuracy, suggesting that the topology of the feature space is relatively simple. An important implication of these findings is that further advances could perhaps be made by developing more powerful features, or at least by better understanding the feature space, rather than by building new classification schemes.

More information

Proc. ISMIR (Baltimore, 2003)
Algorithms in ambient intelligence, Kluwer, 2004

(c) 2007 www.jeroenbreebaart.com
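The two-stage structure described in the introduction (feature extraction followed by classification) can be illustrated with a small sketch. It is not the system from the cited publications. It computes three of the low-level features named above (zero-crossing rate, signal energy, spectral centroid) and feeds them to a k-nearest neighbor classifier, one of the listed strategies. All function names, the sample rate, the frame length and the synthetic sine-tone "classes" are assumptions made for illustration only. MFCCs and feature normalization are left out to keep the example short.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; assumed value for this illustration


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def signal_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_centroid(frame, sample_rate=SAMPLE_RATE):
    """Magnitude-weighted mean frequency in Hz (naive O(n^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(abs(s))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def extract_features(frame, sample_rate=SAMPLE_RATE):
    """Stage 1: map a frame of samples to a small feature vector."""
    return [
        zero_crossing_rate(frame),
        signal_energy(frame),
        spectral_centroid(frame, sample_rate),
    ]


def knn_classify(query, training, k=3):
    """Stage 2: majority vote among the k nearest training vectors.

    `training` is a list of (feature_vector, label) pairs. Features are
    not normalized here, so the centroid (in Hz) dominates the distance.
    """
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def tone(freq, n=64, sample_rate=SAMPLE_RATE):
    """Synthetic sine tone standing in for a real audio frame."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical two-class training set built from synthetic tones.
    training = [(extract_features(tone(f)), "low") for f in (200, 250, 300)]
    training += [(extract_features(tone(f)), "high") for f in (1800, 2000, 2200)]
    print(knn_classify(extract_features(tone(275)), training))
```

Because the classifier only sees the feature vectors, swapping in a different scheme leaves the extraction stage untouched, which is the separation the introduction relies on when it compares classification strategies.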