Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Description

This corpus covers the documents used to train and test our company-focused named entity recognition system. It contains records for 1,000 documents in JSON format, with the following fields for each article:

  • annotations - the companies we annotated within the article
  • url - the URL where the article can be found
  • title - the title of the article

For legal reasons, we cannot provide the text of the articles.
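
As a rough illustration of how the records described above might be read, here is a minimal Python sketch. It rests on assumptions the description does not confirm: that the top level of the file is a JSON array of article objects, and that annotations is a list of company name strings. The sample data, the company names, and the helper names load_records and company_counts are made up for this example.

```python
import json
from collections import Counter


def load_records(json_text):
    """Parse the corpus text, assuming the top level is a JSON array of article objects."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of article records")
    return records


def company_counts(records):
    """Count how often each company is annotated, assuming annotations is a list of strings."""
    counts = Counter()
    for record in records:
        counts.update(record.get("annotations", []))
    return counts


# Made-up sample in the assumed layout; not taken from the real corpus.
sample = """[
  {"annotations": ["Example AG"],
   "url": "https://example.com/a",
   "title": "First article"},
  {"annotations": ["Example AG", "Sample GmbH"],
   "url": "https://example.com/b",
   "title": "Second article"}
]"""

records = load_records(sample)
counts = company_counts(records)

for record in records:
    print(record["title"], record["url"], record["annotations"])
```

To try this on the downloaded file, the sample string could be replaced with the file's contents, read with open(path, encoding="utf-8").read(), where path is wherever the download was saved. If the actual layout differs from the assumptions above (for example, annotations stored as objects rather than strings), the helpers would need adjusting.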

Download