Efficient implementation techniques of an SVM-based speech/music classifier in SMV

Author / Contributors:
[Chungsoo Lim, Joon-Hyuk Chang]
Place, publisher, year:
2015
Contained in:
Multimedia Tools and Applications, 74/15(2015-08-01), 5375-5400
Format:
Article (online)
ID: 60544790X
LEADER caa a22 4500
001 60544790X
003 CHVBK
005 20210128100134.0
007 cr unu---uuuuu
008 210128e20150801xx s 000 0 eng
024 7 0 |a 10.1007/s11042-014-1859-8  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s11042-014-1859-8 
245 0 0 |a Efficient implementation techniques of an SVM-based speech/music classifier in SMV  |h [Electronic data]  |c [Chungsoo Lim, Joon-Hyuk Chang] 
520 3 |a For real-time speech and audio encoders used in various multimedia applications, low-complexity encoding algorithms are required. Indeed, accurate classification of input signals is the key prerequisite for variable bit rate encoding, which has been introduced in order to effectively utilize limited communication bandwidth. This paper investigates implementation issues with a support vector machine (SVM)-based speech/music classifier in the selectable mode vocoder (SMV) framework, which is a standard codec adopted by the Third-Generation Partnership Project 2 (3GPP2). While a support vector machine is well known for its superior classification capability, it is accompanied by a high computational cost. In order to achieve a more realizable system, we propose two techniques for the SVM-based speech/music classifier, aimed at reducing the number of classification requests to the classifier. The first technique introduces a simpler classifier that processes some of the input frames instead of the SVM-based classifier, and the second technique skips a portion of input frames based on strong inter-frame correlation in speech and music frames. Our experimental results show that the proposed techniques can reduce the computational cost of the SVM-based classifier by 95.4 % with negligible performance degradation, making it plausible for integration into the SMV codec. 
540 |a Springer Science+Business Media New York, 2014 
690 7 |a Speech/music classification  |2 nationallicence 
690 7 |a Support vector machine  |2 nationallicence 
690 7 |a Selectable mode vocoder  |2 nationallicence 
690 7 |a Embedded system  |2 nationallicence 
700 1 |a Lim  |D Chungsoo  |u Korea National University of Transportation, 50 Daehak-ro, Chungbuk, Choungju-si, Republic of Korea  |4 aut 
700 1 |a Chang  |D Joon-Hyuk  |u Hanyang University, 222 Wangsimni-ro, Seoul, Seongdong, Republic of Korea  |4 aut 
773 0 |t Multimedia Tools and Applications  |d Springer US; http://www.springer-ny.com  |g 74/15(2015-08-01), 5375-5400  |x 1380-7501  |q 74:15<5375  |1 2015  |2 74  |o 11042 
856 4 0 |u https://doi.org/10.1007/s11042-014-1859-8  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s11042-014-1859-8  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Lim  |D Chungsoo  |u Korea National University of Transportation, 50 Daehak-ro, Chungbuk, Choungju-si, Republic of Korea  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Chang  |D Joon-Hyuk  |u Hanyang University, 222 Wangsimni-ro, Seoul, Seongdong, Republic of Korea  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Multimedia Tools and Applications  |d Springer US; http://www.springer-ny.com  |g 74/15(2015-08-01), 5375-5400  |x 1380-7501  |q 74:15<5375  |1 2015  |2 74  |o 11042
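
Illustrative note on the abstract (field 520): the record describes two ways of reducing the number of requests sent to the SVM-based classifier, namely a simpler classifier that handles some frames itself, and skipping frames on the basis of strong inter-frame correlation. The sketch below shows one possible way such a cascade could be arranged. It is not the authors' implementation. The names `classify_stream`, `cheap_rule`, `costly_rule`, the single-value frame feature, the thresholds, and the "skip after two matching decisions" rule are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

SPEECH, MUSIC = 0, 1
Frame = Sequence[float]


def classify_stream(
    frames: Sequence[Frame],
    cheap: Callable[[Frame], Optional[int]],
    costly: Callable[[Frame], int],
    skip: int = 2,
) -> Tuple[List[int], int]:
    """Label every frame while calling `costly` as rarely as possible.

    cheap(frame)  -> SPEECH, MUSIC, or None when it cannot decide.
    costly(frame) -> SPEECH or MUSIC (stands in for the SVM).
    skip          -> frames that reuse the last label once two
                     consecutive decisions agree (assumed rule).
    Returns (labels, number_of_costly_calls).
    """
    labels: List[int] = []
    costly_calls = 0
    last: Optional[int] = None
    hold = 0  # remaining frames that simply reuse `last`

    for frame in frames:
        if hold > 0 and last is not None:
            # Technique 2 (assumed form): skip classification entirely.
            labels.append(last)
            hold -= 1
            continue

        # Technique 1 (assumed form): try the simple classifier first.
        label = cheap(frame)
        if label is None:
            label = costly(frame)
            costly_calls += 1

        # Two matching decisions in a row are treated as a stable signal.
        if label == last:
            hold = skip
        last = label
        labels.append(label)

    return labels, costly_calls


def cheap_rule(frame: Frame) -> Optional[int]:
    """Hypothetical simple classifier: decides only on clear-cut frames."""
    if frame[0] < 0.2:
        return SPEECH
    if frame[0] > 0.8:
        return MUSIC
    return None


def costly_rule(frame: Frame) -> int:
    """Hypothetical stand-in for the SVM decision function."""
    return MUSIC if frame[0] >= 0.5 else SPEECH


if __name__ == "__main__":
    demo = [[0.1]] * 4 + [[0.6]] * 4 + [[0.9]]
    print(classify_stream(demo, cheap_rule, costly_rule, skip=2))
```

In this toy setup the costly classifier is consulted only for frames the simple rule cannot decide and that are not covered by a skip window. The 95.4 % reduction reported in the abstract comes from the paper's own design and data, not from anything resembling these toy thresholds.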