Introduction
People have long been interested in celebrities. News reports, magazines and books cover the lives and achievements of people who are well known in domains such as sports, science, music, film and television.
Celebrities are ideal candidates for advertising. Companies hire celebrities for marketing and reputation reasons, linking a well-known person to the products they want to sell. TV ads in which a testimonial recommends a certain product are a familiar example. Linking a person to a product works only when the connection seems plausible. In addition, no company wants its product associated with a well-known person who has a poor reputation.
Connections between celebrities, and between celebrities and companies, influence the image of everyone involved. Information about these connections is readily available on the web in the form of news and gossip. We plan to build a system that automatically harvests this information from the web and presents it in an intuitive way.
Description
The goal of the project is to harvest web data in order to add connections between people and/or companies to an existing database of German celebrities. The project partner provides a database with basic information about many well-known people in Germany. The current version of this database is populated manually and contains only factual information about celebrities. During the project, the database will be extended automatically to represent the connections between celebrities. The key tasks of the project are:
- Data collection: News pages offer a great deal of information about celebrities. Crawling them is the basis for retrieving and extracting this information.
- Data cleansing: News pages contain a lot of boilerplate around the main content. It is important to separate the actual content from advertisements and linked material.
- Entity linking: Texts are scanned for mentions of celebrities and companies, and these mentions are linked to the corresponding entities in the database. Personal pronouns are also resolved to the entities they refer to.
- Relationship extraction: Relationships between celebrities, and between celebrities and companies, are identified by analyzing the word clouds around entity mentions, and then added to the database.
- Relationship visualization: The extracted relationships are displayed graphically to show the connections between entities in the database.
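The entity linking and relationship extraction tasks above can be illustrated with a small sketch. This is not the project's actual method. It is a minimal illustration that rests on several assumptions. The alias table, the entity IDs (P1, P2, C1) and the stopword list are hypothetical. Mentions are found by exact dictionary lookup. Pronoun resolution is left out. The "word cloud" is approximated as a bag of the other words in each sentence where two entities co-occur.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical alias table mapping surface forms to database entity IDs.
# In practice these would come from the partner's celebrity database.
ALIASES = {
    "Angela Merkel": "P1",
    "Merkel": "P1",
    "Thomas Müller": "P2",
    "Müller": "P2",
    "Adidas": "C1",
}

# Hypothetical minimal stopword list; a real system would use a full list.
STOPWORDS = {"a", "an", "the", "with", "after", "and", "of", "to"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def link_entities(text, aliases):
    """Return (entity_id, start, end) for each alias match, preferring longer aliases."""
    # Longest alias first, so "Thomas Müller" wins over "Müller" at the same position.
    alternatives = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    return [(aliases[m.group(0)], m.start(), m.end()) for m in pattern.finditer(text)]


def extract_relations(text, aliases):
    """Map each co-occurring entity pair to a bag of context words (a crude word cloud)."""
    relations = defaultdict(Counter)
    for sentence in split_sentences(text):
        spans = link_entities(sentence, aliases)
        ids = sorted({eid for eid, _, _ in spans})
        if len(ids) < 2:
            continue  # at least two distinct entities are needed for a relation
        # Blank out entity mentions right to left, so earlier offsets stay valid.
        context = sentence
        for _, start, end in reversed(spans):
            context = context[:start] + " " + context[end:]
        words = [w for w in re.findall(r"\w+", context.lower()) if w not in STOPWORDS]
        for pair in combinations(ids, 2):
            relations[pair].update(words)
    return relations


if __name__ == "__main__":
    text = ("Thomas Müller signed a new contract with Adidas. "
            "Merkel congratulated Müller after the match. The weather was fine.")
    for pair, cloud in extract_relations(text, ALIASES).items():
        print(pair, cloud.most_common(3))
```

The sketch only collects context words for each entity pair. Deciding what kind of relationship a pair has (for example sponsorship versus friendship) would likely need an additional classification step built on these word bags.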
Supervision
The project runs from 01 October 2012 to 05 July 2013 and is advised by Prof. Dr. Felix Naumann and Dr. Gjergji Kasneci.
For further information about the project, please contact: gjergji.kasneci@hpi.uni-potsdam.de
Cooperation
VIP 2.0 Celebrity Exploration is a joint project between the Hasso Plattner Institute and cpi Celebrity Performance GmbH. CPI has developed a unique index to evaluate celebrities based on their media impact potential.
On the one hand, CPI supports marketing decision-makers in finding the right testimonial for new products. On the other hand, it makes all relevant information on public perception and advertising potential available to celebrities and their management, helping them improve their self-marketing strategies. To achieve the best results, CPI wants to develop a new research method that combines information available on the web with accurate web-based analysis of how celebrities are perceived.