A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing

Authors / Contributors:
[Karthik Devarajan, Guoli Wang, Nader Ebrahimi]
Place, Publisher, Year:
2015
Contained in:
Machine Learning, 99/1(2015-04-01), 137-163
Format:
Article (online)
ID: 605478430
LEADER caa a22 4500
001 605478430
003 CHVBK
005 20210128100405.0
007 cr unu---uuuuu
008 210128e20150401xx s 000 0 eng
024 7 0 |a 10.1007/s10994-014-5470-z  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s10994-014-5470-z 
245 0 2 |a A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing  |h [Electronic data]  |c [Karthik Devarajan, Guoli Wang, Nader Ebrahimi] 
520 3 |a Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix $V$ into the product of two nonnegative matrices, $W$ and $H$, such that $V \sim WH$. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for $W$ and $H$. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data. 
540 |a The Author(s), 2014 
690 7 |a Nonnegative matrix factorization  |2 nationallicence 
690 7 |a Probabilistic latent semantic indexing  |2 nationallicence 
690 7 |a Renyi's divergence  |2 nationallicence 
690 7 |a $\lambda$-log-likelihood  |2 nationallicence 
690 7 |a EM algorithm  |2 nationallicence 
690 7 |a Biomedical informatics  |2 nationallicence 
700 1 |a Devarajan  |D Karthik  |u Department of Biostatistics and Bioinformatics, Fox Chase Cancer Center, Temple University Health System, 19111, Philadelphia, PA, USA  |4 aut 
700 1 |a Wang  |D Guoli  |u 3M Health Information Systems, 20814, Bethesda, MD, USA  |4 aut 
700 1 |a Ebrahimi  |D Nader  |u Division of Statistics, Northern Illinois University, 60115, DeKalb, IL, USA  |4 aut 
773 0 |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 99/1(2015-04-01), 137-163  |x 0885-6125  |q 99:1<137  |1 2015  |2 99  |o 10994 
856 4 0 |u https://doi.org/10.1007/s10994-014-5470-z  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s10994-014-5470-z  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Devarajan  |D Karthik  |u Department of Biostatistics and Bioinformatics, Fox Chase Cancer Center, Temple University Health System, 19111, Philadelphia, PA, USA  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Wang  |D Guoli  |u 3M Health Information Systems, 20814, Bethesda, MD, USA  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Ebrahimi  |D Nader  |u Division of Statistics, Northern Illinois University, 60115, DeKalb, IL, USA  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 99/1(2015-04-01), 137-163  |x 0885-6125  |q 99:1<137  |1 2015  |2 99  |o 10994