Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

This page presents educational case studies in practical Twitter mining. Each study combines millions of tweets with other data sources to extract new insights and information, including political opinions.

Case 1 - Tracking Locations of Candidates during the U.S. Election Campaigns

Four trails extracted for February 29th, 2016
Google Maps overview of the four trails of Donald J. Trump (red), Hillary Rodham Clinton (blue), Bernie Sanders (green), and Ted Cruz (yellow) one day before Super Tuesday 2016.

The goal of this case study is to track the physical locations of U.S. presidential candidates during the U.S. election campaigns based on Twitter data. To do so, we analyze the tweet content (i.e., not the geolocation metadata) for mentions of the politicians' locations; in the future, the approach may be extended to other high-profile users. The following results are based on the 'U.S. Candidates on Twitter' dataset.
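As a rough illustration of reading a location out of tweet text rather than out of geotags, the following Python sketch matches tweets against a small hand-made list of place names and counts the matches. The gazetteer, the sample tweets, and the function name are hypothetical; the actual method is the one described in the publication and is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer; a real system would need a far larger place list.
GAZETTEER = ["Katy, Texas", "Houston", "Boston", "Nashville"]

# One case-insensitive pattern per place, anchored on word boundaries.
PLACE_PATTERNS = {
    place: re.compile(r"\b" + re.escape(place) + r"\b", re.IGNORECASE)
    for place in GAZETTEER
}

def location_votes(tweets):
    """Count how many tweets mention each known place name."""
    votes = Counter()
    for text in tweets:
        for place, pattern in PLACE_PATTERNS.items():
            if pattern.search(text):
                votes[place] += 1
    return votes

# Invented sample tweets, for illustration only.
sample_tweets = [
    "Waiting in line to see Hillary Clinton in Katy, Texas!",
    "Clinton rally in katy, texas starts at noon",
    "Flying out of Boston tonight",
]
votes = location_votes(sample_tweets)
print(votes.most_common(1))
```

Plain name matching like this cannot tell a statement about where a politician is from any other mention of a place, which is one reason a dedicated method is needed.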

The applied method is described in more detail in the publication "What was Hillary Clinton doing in Katy, Texas?". The results contain the daily politician-location pairs and the IDs of all tweets found to support each location fact. On the right side, you can see an excerpt of the results for four politicians from February 29th, 2016. The estimated times are given in EST. The complete dataset can be found here.
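The result format described above (one politician-location pair per day, plus the IDs of the supporting tweets) could be represented as follows. This is a minimal sketch: the tuple layout, the sample rows, and the fixed UTC-5 offset used for EST are assumptions for illustration, not the layout of the published dataset.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: EST as a fixed UTC-5 offset, ignoring daylight saving time.
EST = timezone(timedelta(hours=-5), "EST")

def daily_pairs(mentions):
    """Group (politician, location, tweet_id, utc_time) tuples by EST day.

    Returns a dict mapping (day, politician, location) to the list of
    tweet IDs that support that location fact.
    """
    facts = defaultdict(list)
    for politician, location, tweet_id, utc_time in mentions:
        day = utc_time.astimezone(EST).date()
        facts[(day, politician, location)].append(tweet_id)
    return dict(facts)

# Invented example rows; the tweet IDs are placeholders.
mentions = [
    ("Ted Cruz", "Houston", 101, datetime(2016, 3, 1, 2, 30, tzinfo=timezone.utc)),
    ("Ted Cruz", "Houston", 102, datetime(2016, 2, 29, 23, 0, tzinfo=timezone.utc)),
]
facts = daily_pairs(mentions)
for key, tweet_ids in facts.items():
    print(key, tweet_ids)
```

Note that the first row is dated March 1st in UTC but falls on February 29th in EST, so both rows end up under the same day.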

Datasets

To retrieve relevant tweets for our datasets, we continuously harvested Twitter using the Public API. Because the API is limited to a maximum of 1% of the overall traffic on Twitter, we had to retrieve as many relevant tweets as possible without exceeding the rate limits.

U.S. Candidates on Twitter

Due to the 1% traffic limit of the Public Twitter API, we had to retrieve as many relevant tweets about the candidates as possible without exceeding the limit. Hence, a query term like Ben for the candidate Ben Carson is not appropriate, because it yields too many false-positive tweets. To stay within the rate limits, we manually selected a set of 241 queries. Based on these queries, we collected over 975M tweets by over 31M users mentioning candidates and other persons relevant to the U.S. presidential election during the 15-month period starting in November 2015 and ending in February 2017, after the inauguration of Donald J. Trump as the 45th President of the United States.
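Why a short term such as Ben is a poor query can be made concrete by estimating, on a hand-labeled sample, what share of the tweets matched by a term is actually about the candidate. The sketch below is illustrative only: the sample tweets and labels are invented, and simple case-insensitive substring matching is an assumption, not a description of how the Twitter API matches query terms.

```python
def query_precision(query, labeled_tweets):
    """Fraction of tweets matched by `query` that are labeled relevant.

    `labeled_tweets` is a list of (text, is_relevant) pairs.
    Returns None if the query matches no tweet at all.
    """
    q = query.lower()
    matched = [relevant for text, relevant in labeled_tweets if q in text.lower()]
    if not matched:
        return None
    return sum(matched) / len(matched)

# Invented, hand-labeled sample: True means the tweet is about the candidate.
labeled = [
    ("Ben Carson speaks in Iowa tonight", True),
    ("Happy birthday Ben!", False),
    ("Big Ben at sunset", False),
    ("Is Ben Carson dropping out?", True),
]
print(query_precision("Ben", labeled))         # 0.5
print(query_precision("Ben Carson", labeled))  # 1.0
```

A broad term wastes part of the limited traffic budget on irrelevant tweets, which is the motivation for a manually curated query set.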

An overview of the number of daily extracted tweets is depicted on the right side. The number of collected tweets grows continuously until election day, with peaks around Super Tuesday 2016, the Republican and Democratic National Conventions, and the presidential debates. It then declines before reaching another local maximum around the inauguration day of Donald J. Trump.
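A daily-count overview of this kind can be derived from tweet timestamps with a few lines of Python. The following is a sketch under stated assumptions: the input dates are invented, and a "peak" is simply defined here as a day with more tweets than both neighboring days in the data, which is a simplification chosen for illustration.

```python
from collections import Counter
from datetime import date

def daily_counts(tweet_dates):
    """Count tweets per calendar day."""
    return Counter(tweet_dates)

def local_maxima(counts):
    """Return days whose count exceeds both neighboring days in the data.

    Days without any tweets are absent from `counts` and are therefore
    skipped rather than treated as zero.
    """
    days = sorted(counts)
    peaks = []
    for prev, cur, nxt in zip(days, days[1:], days[2:]):
        if counts[cur] > counts[prev] and counts[cur] > counts[nxt]:
            peaks.append(cur)
    return peaks

# Invented creation dates of collected tweets.
tweet_dates = (
    [date(2016, 2, 29)]
    + [date(2016, 3, 1)] * 3
    + [date(2016, 3, 2)] * 2
)
counts = daily_counts(tweet_dates)
peaks = local_maxima(counts)
print(peaks)
```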

Publications

What was Hillary Clinton doing in Katy, Texas?

Gruetze, Toni and Krestel, Ralf and Lazaridou, Konstantina and Naumann, Felix
In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, 3-7 April 2017. ACM, 2017.
hpi.de/fileadmin/user_upload/fachgebiete/naumann/publications/2017/wwhcdikt_www.pdf
accepted

Abstract:

During the last presidential election in the United States of America, Twitter drew a lot of attention. This is because many leading persons and organizations, such as U.S. president Donald J. Trump, showed a strong affection to this medium. In this work we neglect the political contents and opinions shared on Twitter and focus on the question: Can we determine and track the physical location of the presidential candidates based on posts in the Twittersphere?

BibTeX file

@inproceedings{GruetzeClintonTexas2017,
author = { Gruetze, Toni and Krestel, Ralf and Lazaridou, Konstantina and Naumann, Felix },
title = { What was Hillary Clinton doing in Katy, Texas? },
year = { 2017 },
month = { 4 },
abstract = { During the last presidential election in the United States of America, Twitter drew a lot of attention. This is because many leading persons and organizations, such as U.S. president Donald J. Trump, showed a strong affection to this medium. In this work we neglect the political contents and opinions shared on Twitter and focus on the question: Can we determine and track the physical location of the presidential candidates based on posts in the Twittersphere? },
url = { hpi.de/fileadmin/user_upload/fachgebiete/naumann/publications/2017/wwhcdikt_www.pdf },
publisher = { {ACM} },
booktitle = { Proceedings of the 26th International Conference on World Wide Web, {WWW} 2017, Perth, Australia, 3-7 April, 2017 },
priority = { 0 }
}

Copyright Notice

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

last change: Thu, 02 Mar 2017 12:01:25 +0100