Prof. Dr. h.c. mult. Hasso Plattner

Real-time Analysis of Public Transport Social Media Data

Despite careful planning, disruptions to local public transport services are unavoidable. Unexpected incidents, such as medical emergencies or police interventions, can cause blockages or long delays. For a number of years, the S-Bahn Berlin has been using the short message service Twitter to inform its passengers about travel events. Our S-Bahn Analyzer app extracts relevant events in the S-Bahn Berlin rail network from these Twitter messages. Relevant event details include affected lines, reasons, and involved stations. On this basis, the S-Bahn Analyzer provides powerful exploration methods for specific user groups.

The Hasso Plattner Institute (HPI) is working together with the S-Bahn Berlin to make precise information about the current traffic situation available to passengers before and during their journey. Analyzing the Twitter messages reveals which events occurred, how often, when, and where. Besides supporting better planning of public transport services, this also enables concrete infrastructure interventions.


Since mid-2013, more than 25,000 tweets have been published. These human-readable short messages are transformed into machine-readable data using the latest text-mining algorithms. Messages that relate to the same event are grouped together. The resulting events are the basis for more detailed analysis.
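The grouping of messages into events could look roughly like the following sketch. It is not the project's actual algorithm: the rule used here (messages share at least one line and arrive within a fixed time window) and the names `Message`, `Event`, and `group_into_events` are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Message:
    timestamp: datetime
    lines: frozenset  # line identifiers already extracted from the text
    text: str


@dataclass
class Event:
    lines: set
    messages: list = field(default_factory=list)

    @property
    def last_seen(self):
        # Timestamp of the most recent message assigned to this event.
        return self.messages[-1].timestamp


def group_into_events(messages, window=timedelta(hours=2)):
    """Assign each message to the first event that shares a line with it
    and whose latest message is at most `window` old (assumed rule)."""
    events = []
    for msg in sorted(messages, key=lambda m: m.timestamp):
        for event in events:
            shares_line = bool(event.lines & msg.lines)
            recent = msg.timestamp - event.last_seen <= window
            if shares_line and recent:
                event.messages.append(msg)
                event.lines |= msg.lines
                break
        else:
            # No matching event found: the message starts a new one.
            events.append(Event(lines=set(msg.lines), messages=[msg]))
    return events
```

A real system would likely also compare stations and reasons; the time window of two hours is an arbitrary placeholder.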




Aggregated statistics about reasons, involved train lines, and involved stations provide a general understanding of the stability of the rail network. For example, the identification of most affected stations can help to detect hot spots in the rail network.
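As a minimal sketch of such an aggregation, the following counts how often each station appears across events. The event records and the function name `most_affected_stations` are hypothetical; in the described system this kind of query would presumably run inside the database rather than in application code.

```python
from collections import Counter


def most_affected_stations(events, top_n=3):
    """Count how many events involve each station and return the top entries.

    `events` is assumed to be a list of dicts with a "stations" list,
    where each station is listed at most once per event.
    """
    counts = Counter(
        station for event in events for station in event["stations"]
    )
    return counts.most_common(top_n)
```

Called on the extracted events, the first entry of the result names the station that is the strongest hot-spot candidate.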



Interactive Visualization Tool

Our S-Bahn Analyzer provides an interactive data exploration tool. It enables more detailed analyses, e.g. to test specific hypotheses of service planners, maintainers, or service operators. The analysis covers the complete history of events, and the user can filter the data using individual criteria. Because no dedicated database expertise is required, even inexperienced users can gain new insights easily.

Through the real-time analysis of historic event data, the duration and impact of a current event can be predicted. For example, the following diagram helps to predict the duration of a current event at Station “Schöneweide” based on details about historic events. This forms the foundation for timely and more accurate communication of the expected impact to passengers, which is crucial for identifying appropriate alternatives at an early stage.
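One simple way to derive such a prediction from historic events is to take the median duration of comparable past events. The sketch below assumes this approach; the source does not state which prediction method the S-Bahn Analyzer uses, and the record fields (`station`, `reason`, `duration_min`) are hypothetical.

```python
from statistics import median


def predict_duration_minutes(history, station, reason=None):
    """Estimate the duration of a current event as the median duration of
    historic events at the same station, optionally with the same reason.

    Returns None when no comparable historic event exists.
    """
    durations = [
        event["duration_min"]
        for event in history
        if event["station"] == station
        and (reason is None or event["reason"] == reason)
    ]
    return median(durations) if durations else None
```

The median is used here because a few very long disruptions would otherwise dominate an average.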



Real-time Event Map

Passengers get a real-time overview of the complete rail network when accessing our S-Bahn Analyzer prior to traveling. It shows the latest events and therefore helps them select an appropriate route based on the current network state. The map can also be used to explore historic events in a time lapse, resulting in an animation of all events.



In-Memory Database (IMDB) technology forms the technological foundation of the S-Bahn Analyzer. It has proved to be particularly valuable in the analysis of large data volumes. We use it as a data integration platform for the extraction of relevant information from unstructured Twitter messages. It also forms the basis for real-time analyses of unstructured text documents, their combination with historical data, and their interactive exploration. Relevant text entities are extracted from the tweets using regular expressions and predefined dictionaries. Afterwards, the extracted entities are normalized to eliminate possible spelling mistakes. In addition, we perform a fuzzy search to include relevant search results even if they do not match the search term completely.
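The extraction steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the in-database implementation: the dictionaries `STATIONS` and `REASONS`, the line pattern, the similarity cutoff, and the sample message are all made up, and the standard library's `difflib` stands in for whatever fuzzy matching the database provides.

```python
import re
from difflib import get_close_matches

# Hypothetical dictionary of canonical station names.
STATIONS = ["Schöneweide", "Ostkreuz", "Friedrichstraße", "Alexanderplatz"]

# Hypothetical dictionary mapping trigger phrases to normalized reasons.
REASONS = {
    "polizeieinsatz": "police intervention",
    "notarzteinsatz": "medical emergency",
    "signalstörung": "signal failure",
}

# Assumed notation for lines: "S" followed by one or two digits, e.g. S9, S45.
LINE_PATTERN = re.compile(r"\bS\s?(\d{1,2})\b")


def extract_lines(text):
    """Return the distinct line identifiers in the text, sorted by number."""
    found = {f"S{num}" for num in LINE_PATTERN.findall(text)}
    return sorted(found, key=lambda line: int(line[1:]))


def extract_reason(text):
    """Return the first dictionary reason whose trigger phrase occurs."""
    lowered = text.lower()
    for phrase, reason in REASONS.items():
        if phrase in lowered:
            return reason
    return None


def normalize_station(token, cutoff=0.8):
    """Map a possibly misspelled token to a canonical station name, or None."""
    matches = get_close_matches(token, STATIONS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def extract_stations(text):
    """Return canonical station names for all tokens that match fuzzily."""
    stations = []
    for token in re.findall(r"[A-Za-zÄÖÜäöüß]+", text):
        station = normalize_station(token)
        if station and station not in stations:
            stations.append(station)
    return stations
```

The token-by-token matching only handles single-word station names; multi-word names would need a different tokenization.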