Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Zhe Zuo

Hasso-Plattner-Institut
für Softwaresystemtechnik
Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam

Phone: +49 331 5509 177
Fax: +49 331 5509 287
Room: G-3.2.09
Email:  Zhe Zuo

 


Research Interests

  • Information Extraction
  • Text Mining
  • Data Mining

Publications

Improving Company Recognition from Unstructured Text by using Dictionaries

Loster, Michael; Zuo, Zhe; Naumann, Felix; Maspfuhl, Oliver; Thomas, Dirk (2017).

While named entity recognition is a much-addressed research topic, recognizing companies in text is particularly difficult. Company names are extremely heterogeneous in structure: a given company can be referenced in many different ways, and names can include person names, locations, acronyms, numbers, and other unusual tokens. Further, instead of the official company name, the general public frequently uses quite different colloquial names. We present a machine learning system based on conditional random fields (CRF) that reliably recognizes organizations in German texts. In particular, we construct and employ various dictionaries, regular expressions, text context, and other techniques to improve the results. In our experiments, we achieved a precision of 91.11% and a recall of 78.82%, showing significant improvement over related work. Using our system, we were able to extract 263,846 company mentions from a corpus of 141,970 newspaper articles.
Tags: CRF, NER, companies, company_names, conditional_random_fields, isg, named_entity_recognition, recognition
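
The abstract above mentions dictionaries, regular expressions, and text context as inputs to the CRF. The following is a minimal sketch of how such per-token features might be built. It is not the authors' implementation: the dictionary entries, the legal-form pattern, the feature names, and the functions `dictionary_flags` and `token_features` are all assumptions made for illustration.

```python
import re

# Hypothetical company dictionary: each entry is a tuple of lowercased tokens.
COMPANY_DICT = {("volkswagen", "ag"), ("deutsche", "bank"), ("siemens",)}

# Assumed pattern for a few common German legal-form suffixes.
LEGAL_FORM = re.compile(r"AG|GmbH|KG|SE|OHG")


def dictionary_flags(tokens, dictionary, max_len=4):
    """Flag tokens covered by a dictionary entry, preferring the longest match."""
    lowered = [t.lower() for t in tokens]
    flags = [False] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                flags[j] = True
            i += matched
        else:
            i += 1
    return flags


def token_features(tokens):
    """Build one feature dictionary per token for a sequence labeler."""
    in_dict = dictionary_flags(tokens, COMPANY_DICT)
    features = []
    for i, tok in enumerate(tokens):
        features.append({
            "lower": tok.lower(),
            "is_title": tok.istitle(),
            "is_upper": tok.isupper(),
            "has_digit": any(c.isdigit() for c in tok),
            # Regular-expression feature: token looks like a legal form.
            "legal_form": LEGAL_FORM.fullmatch(tok) is not None,
            # Dictionary feature: token is part of a known company name.
            "in_company_dict": in_dict[i],
            # Context features: neighboring tokens, with boundary markers.
            "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
            "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        })
    return features


if __name__ == "__main__":
    for tok, feats in zip(["Die", "Volkswagen", "AG", "wächst"],
                          token_features(["Die", "Volkswagen", "AG", "wächst"])):
        print(tok, feats["in_company_dict"], feats["legal_form"])
```

Feature dictionaries of this shape could then be passed, one list per sentence, to a CRF toolkit that accepts per-token feature mappings. The longest-match lookup is one simple design choice; the paper may handle dictionary matching differently.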