Probabilistic combination of classification rules and its application to medical diagnosis

Authors / Contributors:
[Jakub Tomczak, Maciej Zięba]
Place, Publisher, Year:
2015
Contained in:
Machine Learning, 101/1-3(2015-10-01), 105-135
Format:
Article (online)
ID: 605477957
LEADER caa a22 4500
001 605477957
003 CHVBK
005 20210128100403.0
007 cr unu---uuuuu
008 210128e20151001xx s 000 0 eng
024 7 0 |a 10.1007/s10994-015-5508-x  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s10994-015-5508-x 
245 0 0 |a Probabilistic combination of classification rules and its application to medical diagnosis  |h [Electronic data]  |c [Jakub Tomczak, Maciej Zięba] 
520 3 |a Applying machine learning to medical diagnosis entails two major issues: the necessity of learning comprehensible models and the need to cope with imbalanced data. The first corresponds to the problem of implementing interpretable models, e.g., classification rules or decision trees. The second represents a situation in which the number of examples from one class (e.g., healthy patients) is significantly higher than the number of examples from the other class (e.g., ill patients). Learning algorithms that are sensitive to data imbalance return models biased towards the majority class. In this paper, we propose a probabilistic combination of soft rules, which can be seen as a probabilistic version of classification rules, by introducing a new latent random variable called a conjunctive feature. Conjunctive features represent conjunctions of values of attribute variables (features), and we assume that, given a conjunctive feature, the object and its label (class) become independent random variables. To deal with the between-class imbalance problem, we present a new estimator that incorporates knowledge about the data imbalance into the hyperparameters of the initial probability of objects with fixed class labels. Additionally, we propose a method for aggregating the sufficient statistics needed to estimate probabilities in a graph-based structure to speed up computations. Finally, we carry out two experiments: (1) using benchmark datasets and (2) using medical datasets. The results are discussed and conclusions are drawn. 
540 |a The Author(s), 2015 
690 7 |a Probabilistic combination  |2 nationallicence 
690 7 |a Classification rules  |2 nationallicence 
690 7 |a Imbalanced data  |2 nationallicence 
690 7 |a Medical diagnosis  |2 nationallicence 
700 1 |a Tomczak  |D Jakub  |u Department of Computer Science, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland  |4 aut 
700 1 |a Zięba  |D Maciej  |u Department of Computer Science, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland  |4 aut 
773 0 |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 101/1-3(2015-10-01), 105-135  |x 0885-6125  |q 101:1-3<105  |1 2015  |2 101  |o 10994 
856 4 0 |u https://doi.org/10.1007/s10994-015-5508-x  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s10994-015-5508-x  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Tomczak  |D Jakub  |u Department of Computer Science, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Zięba  |D Maciej  |u Department of Computer Science, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 101/1-3(2015-10-01), 105-135  |x 0885-6125  |q 101:1-3<105  |1 2015  |2 101  |o 10994