A comparative study of evolving fuzzy grammar and machine learning techniques for text categorization

Author / Contributors:
[Nurfadhlina Sharef, Trevor Martin, Khairul Kasmiran, Aida Mustapha, Md. Sulaiman, Masrah Azmi-Murad]
Place, publisher, year:
2015
Contained in:
Soft Computing, 19/6(2015-06-01), 1701-1714
Format:
Article (online)
ID: 605468605
LEADER caa a22 4500
001 605468605
003 CHVBK
005 20210128100317.0
007 cr unu---uuuuu
008 210128e20150601xx s 000 0 eng
024 7 0 |a 10.1007/s00500-014-1358-x  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s00500-014-1358-x 
245 0 2 |a A comparative study of evolving fuzzy grammar and machine learning techniques for text categorization  |h [Electronic data]  |c [Nurfadhlina Sharef, Trevor Martin, Khairul Kasmiran, Aida Mustapha, Md. Sulaiman, Masrah Azmi-Murad] 
520 3 |a Several methods have been studied in text categorization, and most are inspired by the statistical distribution of features in the texts, as in Machine Learning (ML) methods. However, no work is available that compares the performance of ML-based methods against a text expression-based method, especially for incident and medical case categorization. Meanwhile, these two domains are becoming ever more popular, due to a growing interest in automation in security intelligence and health services. This paper presents a text expression-based method called Evolving Fuzzy Grammar (EFG) and evaluates its performance against the conventional ML methods of Naïve Bayes, support vector machine, $k$-nearest neighbor, adaptive boosting, and decision tree. The incident dataset used is a real dataset taken from the World Incidents Tracking System, while ImageCLEF 2009 was used as the source for radiology case reports. The results, evaluated with standard measures (recall, precision, and $F$-measure), showed that each method had different strengths and weaknesses in the two categorization tasks. In both domains, the SMO and IBk methods were the best, while AdaBoost was the worst. It was also observed that the medical dataset was easier to categorize than the incident dataset. Although EFG was ranked second lowest, it obtained the highest precision score in the bombing categorization and the highest recall score in the armed attack categorization, and on average ranked in the top three for the medical case categorization. It was also noted that the text expression-based method used in EFG was the most verbose and expressive when compared to the ML methods. This indicates that EFG is a viable method for text categorization and may serve as an alternative approach to such a task. 
540 |a Springer-Verlag Berlin Heidelberg, 2014 
690 7 |a Text categorization  |2 nationallicence 
690 7 |a Text expression  |2 nationallicence 
690 7 |a Evolving fuzzy grammar  |2 nationallicence 
690 7 |a Machine learning  |2 nationallicence 
690 7 |a Incidents  |2 nationallicence 
690 7 |a Medical  |2 nationallicence 
700 1 |a Sharef  |D Nurfadhlina  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
700 1 |a Martin  |D Trevor  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
700 1 |a Kasmiran  |D Khairul  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
700 1 |a Mustapha  |D Aida  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
700 1 |a Sulaiman  |D Md  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
700 1 |a Azmi-Murad  |D Masrah  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
773 0 |t Soft Computing  |d Springer Berlin Heidelberg  |g 19/6(2015-06-01), 1701-1714  |x 1432-7643  |q 19:6<1701  |1 2015  |2 19  |o 500 
856 4 0 |u https://doi.org/10.1007/s00500-014-1358-x  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s00500-014-1358-x  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Sharef  |D Nurfadhlina  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Martin  |D Trevor  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Kasmiran  |D Khairul  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Mustapha  |D Aida  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Sulaiman  |D Md  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Azmi-Murad  |D Masrah  |u Faculty of Computer Science and Information Technology, University of Putra Malaysia, 43400, Serdang, Selangor, Malaysia  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Soft Computing  |d Springer Berlin Heidelberg  |g 19/6(2015-06-01), 1701-1714  |x 1432-7643  |q 19:6<1701  |1 2015  |2 19  |o 500