Learning the Importance of Latent Topics to Discover Highly Influential News Items
Online news is a major source of information for many people. The overwhelming
number of new articles published every day makes it necessary to filter out
unimportant ones and detect groundbreaking ones.
In this paper, we propose the use of Latent Dirichlet Allocation (LDA) to
find the hidden factors of important news stories. These factors are then
used to train a Support Vector Machine (SVM) to classify new news items as
they appear. We compare our results with SVMs based on a bag-of-words
approach and other language features. The advantage of LDA preprocessing is
not only higher accuracy in predicting important news, but also better
interpretability of the results: the latent topics directly show the
important factors of a news story.