BlackSwan (*)
Automated annotation of global statistics
Contact
Johannes Lorey
Organizational Info
- website: http://www.blackswanevents.org
- Monday, 15:15 - 16:45 (exceptions will be announced)
- room A-2.2
- Wiki (internal)
- Dates and Material
Statistical Data
There are numerous sources of statistical data sets: companies, government agencies, international organizations, etc. Statistical data usually
- is of numerical type,
- is collected at fixed intervals over a certain time period, and
- reveals certain short-term and long-term trends.
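To make these properties concrete, here is a minimal sketch of how such a series could be represented and how a long-term and a short-term trend might be estimated. The names `Observation`, `slope`, and `trends`, the yearly interval, the window size, and the sample values are all assumptions made for illustration; a least-squares slope is only one of many possible trend measures.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Observation:
    """One numerical measurement taken at a fixed interval (assumed: yearly)."""
    period: date
    value: float


def slope(values: list[float]) -> float:
    """Least-squares slope of the values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def trends(series: list[Observation], window: int = 3) -> tuple[float, float]:
    """Return (slope over the whole series, slope over the last `window` points)."""
    values = [obs.value for obs in sorted(series, key=lambda o: o.period)]
    return slope(values), slope(values[-window:])


# Made-up yearly values, for illustration only.
series = [
    Observation(date(2000 + i, 1, 1), v)
    for i, v in enumerate([10.0, 11.0, 12.0, 13.0, 12.0, 10.0])
]
long_term, short_term = trends(series)
```

In this made-up series the long-term slope is slightly positive while the slope over the last three observations is negative, which is the kind of divergence a trend detector would flag.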
Event Data
Event data can be gathered from various (mostly unstructured) sources, such as Wikipedia, Freebase, news archives, and so on. In the context of this seminar, an event is described by its
- type,
- location, and
- (starting) point in time.
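The three attributes above map naturally onto a small record type. The following sketch assumes a hypothetical `Event` class and a hypothetical helper `events_near` that selects events at a given location that started shortly before a given date; the sample events and place names are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """An event as described in this seminar: type, location, starting date."""
    event_type: str
    location: str
    start: date


def events_near(events, location, around, max_days=365):
    """Events at `location` that started at most `max_days` before `around`."""
    return [
        e for e in events
        if e.location == location
        and 0 <= (around - e.start).days <= max_days
    ]


# Invented sample events, for illustration only.
events = [
    Event("earthquake", "Examplia", date(2004, 3, 1)),
    Event("election", "Examplia", date(2001, 5, 1)),
    Event("earthquake", "Otherland", date(2004, 6, 1)),
]
candidates = events_near(events, "Examplia", date(2004, 12, 31))
```

Here only the first event qualifies: the second is too old and the third happened elsewhere.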
Augmenting Statistical Data
Our goal is to detect certain trends in statistical data and automatically relate them to the historical events that triggered them, based on specific rules previously learned by our system.
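One way such rules could be learned is in the spirit of association rule mining: count how often an event type is followed by a given kind of trend and keep the rules whose confidence exceeds a threshold. The sketch below is an assumption about how this might look, not the seminar's prescribed method; `learn_rules`, `explain`, the threshold, and the training pairs are all hypothetical.

```python
from collections import Counter

# Hypothetical training data: (event type, trend observed afterwards).
observations = [
    ("earthquake", "drop"),
    ("earthquake", "drop"),
    ("earthquake", "rise"),
    ("election", "rise"),
    ("election", "drop"),
]


def learn_rules(pairs, min_confidence=0.6):
    """Return {(event_type, trend): confidence} for rules above the threshold.

    Confidence of "event_type -> trend" is the share of occurrences of
    event_type that were followed by that trend.
    """
    pair_counts = Counter(pairs)
    event_counts = Counter(event_type for event_type, _ in pairs)
    rules = {}
    for (event_type, trend), count in pair_counts.items():
        confidence = count / event_counts[event_type]
        if confidence >= min_confidence:
            rules[(event_type, trend)] = confidence
    return rules


def explain(trend, candidate_event_types, rules):
    """Rank candidate event types by the confidence of "event -> trend"."""
    scored = [
        (rules[(e, trend)], e)
        for e in candidate_event_types
        if (e, trend) in rules
    ]
    return [e for _, e in sorted(scored, reverse=True)]


rules = learn_rules(observations)
explanation = explain("drop", ["election", "earthquake"], rules)
```

With this toy data only the rule "earthquake -> drop" (confidence 2/3) survives the threshold, so an observed drop would be related to the earthquake rather than the election.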
Proposed Architecture
The following diagram depicts a possible architecture for the system. Each green box represents a specific component that a team of students will be working on.
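As a rough illustration of how the four components (Event Extractor, Time Series Analyzer, Rule System, Visualizer) might fit together, the sketch below wires them up as plain functions. The data flow, all function names, the `type;location;year` input format, and the sample data are assumptions; every component body is a trivial placeholder for what a team would actually build.

```python
def extract(text):
    """Event Extractor placeholder: parse lines of the form type;location;year."""
    events = []
    for line in text.splitlines():
        if line.strip():
            event_type, location, year = line.split(";")
            events.append(
                {"type": event_type, "location": location, "year": int(year)}
            )
    return events


def analyze(series):
    """Time Series Analyzer placeholder: report years whose value fell."""
    years = sorted(series)
    return [
        {"year": cur, "kind": "drop"}
        for prev, cur in zip(years, years[1:])
        if series[cur] < series[prev]
    ]


def relate(trends, events):
    """Rule System placeholder: pair each trend with events of the same year."""
    return [(t, e) for t in trends for e in events if e["year"] == t["year"]]


def render(annotations):
    """Visualizer placeholder: one text line per annotated trend."""
    return "\n".join(
        f"{t['year']}: {t['kind']} <- {e['type']} ({e['location']})"
        for t, e in annotations
    )


def run_pipeline(text, series):
    """Chain the four components in the assumed order."""
    return render(relate(analyze(series), extract(text)))


# Invented input, for illustration only.
text = "earthquake;Examplia;2004\nelection;Examplia;2002"
series = {2002: 5.0, 2003: 5.5, 2004: 4.0}
output = run_pipeline(text, series)
```

Keeping each component behind a single function makes it easy for teams to work independently and swap in their real implementation later.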
Team structure and tasks
| | Event Extractor | Time Series Analyzer | Rule System | Visualizer |
|---|---|---|---|---|
| Tasks | | | | |
Important Dates and Material
| Date | Event | Slides |
|---|---|---|
| 23:59 CEST | | |
| 20:00 CEST | | |
| 17:00 | | |
| 15.04.11, 15:00 | | |
Literature
Data Mining / Association Rules
- Agrawal et al.: Mining association rules between sets of items in large databases, SIGMOD 1993
- Agrawal et al.: Fast Algorithms for Mining Association Rules, VLDB 1994
- Tan et al.: Selecting the right objective measure for association analysis, Information Systems 2004
- Knorr et al.: Algorithms for Mining Distance-Based Outliers in Large Datasets, VLDB 1998
- Han et al.: Data Mining: Concepts and Techniques, Morgan Kaufmann 2005 (can be found in our group library)
- Segaran: Kollektive Intelligenz: analysieren, programmieren und nutzen, O'Reilly 2008 (can be found in our group library)
- Witten, Frank: Data Mining. Practical Machine Learning Tools and Techniques, Morgan Kaufmann 2005 (can be found in our group library)
Information extraction
- Sarawagi: Information Extraction, Now Publishers Inc. 2008
Statistical analysis
- Adler: R in a nutshell, O'Reilly 2010 (can be found in our group library)
Links
(*) http://en.wikipedia.org/wiki/Black_swan_theory