Prof. Dr. Emmanuel Müller


Smart representations (such as embeddings, graphical models, and discretizations) are useful models that allow data to be abstracted within a well-defined mathematical formalism. The representations we aim for are conceptual abstractions of real-world phenomena (such as sensor readings, causal dependencies, and social interactions) into the world of statistics and discrete mathematics, in such a way that the powerful tools developed in those areas become available for complex analyses in a simple and elegant manner.

Usually, data is transformed explicitly or implicitly from a raw data representation (as it was measured or collected) into a smart data representation (more useful for data analysis). One goal of such smart representations, e.g., those with a higher level of abstraction, is to enable the application of data mining techniques and theory developed in other areas. In many cases, smart data representations also reduce the original data mining problem to a more tractable or more compact problem formulation that can be solved by an algorithm with better properties (e.g., lower worst-case complexity, scalability to larger data sizes, or greater robustness to data artifacts).

In this seminar we will focus on four smart data representations with the aim of understanding the analytical properties of different data mining tasks:

  • representations of graphs and similarities between graphs for classification [1][2][3]

  • representations of natural time series baselines for outlier interpretation [4][5]

  • representing dependencies in event series and time series [6][7]

  • representations of similarities in the case of missing values [8][9]

The main focus in each of these four areas will be the understanding and comparison of smart representations and their explicit/implicit data transformation methods. By transforming the data, we will study the limitations and advantages of each technique and how the data representation changes the problem setup, reduces complexity, introduces robustness, or provides other valuable properties for big data analytics.


  1. Introduction to the concepts of smart data representations

  2. Understanding limitations and advantages of state-of-the-art techniques

  3. Implementation of techniques in research prototypes

  4. Design of experiments to demonstrate the effectiveness of each technique on a set of traditional tasks in which the representation is used

  5. Running the experiments on real and synthetic datasets

  6. Writing and submitting a scientific publication (more information below)

  7. Presentation of scientific results during the seminar and potentially at international conferences


Kickoff-Meeting: Monday, April 9th, 2018 (09:15 - 10:45) in room D-E.9/10


[1] S. Verma and Z.-L. Zhang: "Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs." NIPS, 2017.

[2] A. Tsitsulin, D. Mottin, P. Karras, and E. Müller: "VERSE: Versatile Graph Embeddings from Similarity Measures." WWW, 2018.

[3] P. Yanardag and S. V. N. Vishwanathan: "Deep Graph Kernels." KDD, 2015.

[4] M. Sundararajan et al.: "Axiomatic Attribution for Deep Networks." ICML, 2017.

[5] M. T. Ribeiro et al.: "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." KDD, 2016.

[6] N. Du, H. Dai et al.: "Recurrent Marked Temporal Point Processes: Embedding Event History to Vector." KDD, 2016.

[7] S. Agrawal, G. Atluri, et al.: "Tripoles: A New Class of Relationships in Time Series Data." KDD, 2017.

[8] S. Günnemann, E. Müller, S. Raubach, and T. Seidl: "Flexible Fault Tolerant Subspace Clustering for Data with Missing Values." IEEE International Conference on Data Mining (ICDM), 2011.

[9] R. Leibrandt and S. Günnemann: "Making Kernel Density Estimation Robust towards Missing Values in Highly Incomplete Multivariate Data without Imputation." SIAM International Conference on Data Mining (SDM), 2018.