I found an interesting paper today on Probabilistic Latent Semantic Indexing.
"We have presented a novel method for automated indexing based on a statistical latent class model. This approach has important theoretical advantages over standard LSI, since it is based on the likelihood principle, defines a generative data model, and directly minimizes word perplexity. It can also take advantage of statistical standard methods for model fitting, overfitting control, and model combination. The empirical evaluation has clearly confirmed the benefits of Probabilistic Latent Semantic Indexing which achieves significant gains in precision over both, standard term matching and LSI."
The paper even provides a convenient algorithm to use. Once I build it into the parser I am working on, it should produce some interesting results.
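To make the idea concrete, here is a minimal sketch of the latent class model fitted with plain EM. It is not the paper's exact procedure: I am assuming the symmetric form P(d, w) = sum over z of P(z) P(d|z) P(w|z), and I leave out the tempering the paper uses for overfitting control. The names `plsa`, `log_likelihood`, `counts`, and `n_topics` are my own, chosen for illustration.

```python
import math
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to a document-term count matrix with plain EM.

    counts[d][w] is how often word w occurs in document d.
    Assumption: untempered EM with random initialisation.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(n):
        # Random strictly positive distribution that sums to 1.
        vals = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(vals)
        return [v / total for v in vals]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z = [0.0] * n_topics
        new_d = [[0.0] * n_docs for _ in range(n_topics)]
        new_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w), proportional to P(z) P(d|z) P(w|z).
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                         for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    r = n * joint[z] / total  # expected count assigned to class z
                    new_z[z] += r
                    new_d[z][d] += r
                    new_w[z][w] += r
        # M-step: renormalise the expected counts into probabilities.
        grand = sum(new_z)
        for z in range(n_topics):
            if new_z[z] > 0.0:
                p_d_z[z] = [v / new_z[z] for v in new_d[z]]
                p_w_z[z] = [v / new_z[z] for v in new_w[z]]
            p_z[z] = new_z[z] / grand
    return p_z, p_d_z, p_w_z


def log_likelihood(counts, p_z, p_d_z, p_w_z):
    """Sum of n(d, w) * log P(d, w) over all observed pairs."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                        for z in range(len(p_z)))
                ll += n * math.log(p)
    return ll
```

Calling `plsa(counts, 2)` on a small count matrix returns the three probability tables, and `log_likelihood` gives a quick check that more EM iterations do not make the fit worse. For retrieval, the per-class word distributions would stand in for the LSI factors, but how that plugs into a parser depends on the parser.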