The availability of large volumes of granted patent documents and patent applications, all publicly available on the Web, enables the use of sophisticated text mining and information retrieval methods to facilitate access to and analysis of patents. A key task when dealing with patents is to find related or similar patents. This usually requires domain experts who are also familiar with existing patents. In this paper, we investigate techniques to automatically assess the similarity of patents, which is critical for a variety of patent-related tasks. We propose the use of latent Dirichlet allocation (LDA) and Dirichlet multinomial regression (DMR) to represent documents and to compute similarity scores. We show how these scores can be used to provide assistance in typical patent mining scenarios such as prior art recommendation and citation prediction. We compare our methods with state-of-the-art document representations and retrieval techniques and demonstrate the effectiveness of our approach on a collection of US patent publications.
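The abstract does not say which similarity measure is applied to the topic representations. A minimal sketch follows. It assumes each patent is already represented by a topic-proportion vector over the same K topics, for example one inferred by a fitted LDA model, and it ranks candidates by one minus the Hellinger distance. Fitting LDA or DMR is not shown. The function names, patent IDs, and topic vectors are hypothetical and for illustration only.

```python
import math
from typing import Dict, List, Tuple


def hellinger_similarity(p: List[float], q: List[float]) -> float:
    """Return 1 - Hellinger distance between two discrete distributions.

    Assumes p and q are topic proportions (non-negative, summing to 1)
    over the same K topics, e.g. per-document vectors from an LDA model.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    # Bhattacharyya coefficient: sum over topics of sqrt(p_k * q_k).
    bc = sum(math.sqrt(pk * qk) for pk, qk in zip(p, q))
    # Hellinger distance H = sqrt(1 - BC); clamp to avoid tiny negative
    # values caused by floating-point rounding.
    distance = math.sqrt(max(0.0, 1.0 - bc))
    return 1.0 - distance


def rank_candidates(
    query: List[float], candidates: Dict[str, List[float]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank candidate documents (hypothetical IDs) by similarity to the query."""
    scored = [
        (doc_id, hellinger_similarity(query, theta))
        for doc_id, theta in candidates.items()
    ]
    # Highest similarity first, as in a prior-art recommendation list.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical 3-topic proportions for a query patent and three candidates.
query_theta = [0.7, 0.2, 0.1]
candidate_thetas = {
    "US-A": [0.6, 0.3, 0.1],
    "US-B": [0.1, 0.1, 0.8],
    "US-C": [0.7, 0.2, 0.1],
}
ranking = rank_candidates(query_theta, candidate_thetas)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

The Hellinger distance is used here only because it is a common, bounded choice for comparing probability distributions. Cosine similarity or Jensen-Shannon divergence would slot into `rank_candidates` the same way.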
Watch our new German-language MOOC about hate and fake news on the Internet, "Trolle, Hass und Fake-News: Wie können wir das Internet retten?" ("Trolls, Hate and Fake News: How Can We Save the Internet?"), on openHPI.
Our work on Measuring and Comparing Dimensionality Reduction Algorithms for Robust Visualisation of Dynamic Text Collections will be presented at CHIIR 2021.
I added some photos from my trip to Hildesheim.