Statistical word sense aware topic models

Authors / Contributors:
[Guoyu Tang, Yunqing Xia, Jun Sun, Min Zhang, Thomas Zheng]
Place, Publisher, Year:
2015
Contained in:
Soft Computing, 19/1(2015-01-01), 13-27
Format:
Article (online)
ID: 605468265
LEADER caa a22 4500
001 605468265
003 CHVBK
005 20210128100315.0
007 cr unu---uuuuu
008 210128e20150101xx s 000 0 eng
024 7 0 |a 10.1007/s00500-014-1372-z  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s00500-014-1372-z 
245 0 0 |a Statistical word sense aware topic models  |h [Electronic data]  |c [Guoyu Tang, Yunqing Xia, Jun Sun, Min Zhang, Thomas Zheng] 
520 3 |a LDA has been proved effective in modeling the semantic relation between surface words. This semantic information in the document collection is useful to measure the topic distribution for a document. In general, a surface word may significantly contribute to several topics in a document collection. LDA measures the contribution of a surface word to each topic and considers a surface word to be identical across all documents. However, a surface word may present different signatures in different contexts, i.e., polysemous words can be used with different senses in different contexts. Intuitively, disambiguating word senses for topic models can enhance their discriminative capabilities. In this work, we propose a joint model to automatically induce document topics and word senses simultaneously. Instead of using some pre-defined word sense resources, we capture the word sense information via a latent variable and directly induce them in a fully unsupervised manner from the corpora. Experimental results show that the proposed joint model outperforms the baselines significantly in document clustering and improves the word sense induction as well against a standalone non-parametric model. 
540 |a Springer-Verlag Berlin Heidelberg, 2014 
690 7 |a Topic modeling  |2 nationallicence 
690 7 |a Word sense induction  |2 nationallicence 
690 7 |a Document representation  |2 nationallicence 
690 7 |a Document clustering  |2 nationallicence 
700 1 |a Tang  |D Guoyu  |u Department of Computer Science and Technology, TNList, Tsinghua University, Beijing, China  |4 aut 
700 1 |a Xia  |D Yunqing  |u Department of Computer Science and Technology, TNList, Tsinghua University, Beijing, China  |4 aut 
700 1 |a Sun  |D Jun  |u Institute for Infocomm Research, A-STAR, Singapore, Singapore  |4 aut 
700 1 |a Zhang  |D Min  |u Soochow University, Suzhou, China  |4 aut 
700 1 |a Zheng  |D Thomas  |u Department of Computer Science and Technology, TNList, Tsinghua University, Beijing, China  |4 aut 
773 0 |t Soft Computing  |d Springer Berlin Heidelberg  |g 19/1(2015-01-01), 13-27  |x 1432-7643  |q 19:1<13  |1 2015  |2 19  |o 500 
856 4 0 |u https://doi.org/10.1007/s00500-014-1372-z  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s00500-014-1372-z  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Tang  |D Guoyu  |u Department of Computer Science and Technology, TNList, Tsinghua University, Beijing, China  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Xia  |D Yunqing  |u Department of Computer Science and Technology, TNList, Tsinghua University, Beijing, China  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Sun  |D Jun  |u Institute for Infocomm Research, A-STAR, Singapore, Singapore  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Zhang  |D Min  |u Soochow University, Suzhou, China  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Zheng  |D Thomas  |u Department of Computer Science and Technology, TNList, Tsinghua University, Beijing, China  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Soft Computing  |d Springer Berlin Heidelberg  |g 19/1(2015-01-01), 13-27  |x 1432-7643  |q 19:1<13  |1 2015  |2 19  |o 500