Prof. Dr. Felix Naumann


This corpus contains the documents used for training and testing our company-focused named entity recognition system. It comprises records for 1,000 documents in JSON format. The record for each article is structured as follows:

  • annotations - the companies we annotated within the article
  • url - the URL where the article can be found
  • title - the title of the article

For legal reasons, we cannot provide the text of the articles.
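
As a rough illustration of working with the record structure described above, the following Python sketch parses the JSON and checks that each record carries the three documented fields. It rests on several assumptions that this description does not confirm: that the top level of the file is an array of record objects, and that annotations is a list of company names. The function name load_records and the sample record are invented for illustration and are not taken from the corpus.

```python
import json

# The three fields documented for each article record.
EXPECTED_KEYS = {"annotations", "url", "title"}


def load_records(json_text):
    """Parse corpus JSON and verify each record has the documented fields.

    Assumption: the top level is a JSON array of record objects.
    """
    records = json.loads(json_text)
    for i, record in enumerate(records):
        missing = EXPECTED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
    return records


# Invented sample record; the real annotation format may differ.
sample = """
[
  {
    "annotations": ["Example AG"],
    "url": "https://example.com/article-1",
    "title": "Example article"
  }
]
"""

records = load_records(sample)
for record in records:
    print(record["title"], record["url"], record["annotations"])
```

Because the article text is not part of the corpus, retrieving it would be a separate step that fetches each url; that step is left out of this sketch.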