Hasso-Plattner-Institut25 Jahre HPI

Smart Representations for Big Data Analytics (Summer Semester 2018)

Lecturer: , Thomas Goerttler (Knowledge Discovery and Data Mining)

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment deadline: 20.04.2018
  • Course format: Seminar
  • Enrollment type: Compulsory elective module
  • Maximum number of participants: 9

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
  • BPET: Business Process & Enterprise Technologies
    • HPI-BPET-K Concepts and Methods
  • BPET: Business Process & Enterprise Technologies
    • HPI-BPET-S Specialization
  • BPET: Business Process & Enterprise Technologies
    • HPI-BPET-T Techniques and Tools
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-S Specialization
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-T Techniques and Tools

Description

Smart representations (such as embeddings, graphical models, and discretizations) are models that abstract data within a well-defined mathematical formalism. The representations we aim at are conceptual abstractions of real-world phenomena (such as sensor readings, causal dependencies, and social interactions) into the world of statistics and discrete mathematics, so that the powerful tools developed in those areas become available for complex analyses in a simple and elegant manner.

Usually, data is transformed explicitly or implicitly from a raw data representation (as it was measured or collected) into a smart data representation (more useful for data analysis). One goal of such smart representations, e.g. those with a higher level of abstraction, is to enable the application of data mining techniques and theory developed in different areas. In many cases, smart data representations also reduce the original data mining problem to a more tractable or more compact problem formulation that can be solved by an algorithm (e.g. one with lower worst-case complexity, better scalability to larger data sizes, or more robustness to data artifacts).
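As a small illustration of such an explicit transformation, the following sketch discretizes a raw numeric sensor series into a short string of symbols. It is only an assumed toy example and not a method taken from the seminar literature: the function name `discretize`, its parameters `segment_len` and `n_symbols`, and the sample readings are all hypothetical.

```python
from statistics import mean


def discretize(readings, segment_len, n_symbols):
    """Map a raw numeric series to a short string of symbols.

    Step 1: average each consecutive block of `segment_len` readings,
            which shortens the series and smooths out noise.
    Step 2: assign each block mean to one of `n_symbols` equal-width
            bins spanning the observed range of the block means.
    """
    if segment_len <= 0 or not 0 < n_symbols <= 26:
        raise ValueError("segment_len must be positive, n_symbols in 1..26")
    if not readings:
        return ""
    blocks = [readings[i:i + segment_len]
              for i in range(0, len(readings), segment_len)]
    means = [mean(block) for block in blocks]
    lo, hi = min(means), max(means)
    # Fall back to a width of 1.0 for a constant series to avoid dividing by zero.
    width = (hi - lo) / n_symbols or 1.0
    symbols = []
    for m in means:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((m - lo) / width), n_symbols - 1)
        symbols.append(chr(ord("a") + idx))
    return "".join(symbols)


# Hypothetical temperature readings with three plateaus.
raw = [20.1, 20.3, 19.9, 20.0, 25.2, 25.0, 24.8, 25.1, 30.2, 29.9, 30.1, 30.0]
print(discretize(raw, segment_len=4, n_symbols=3))  # prints "abc"
```

The twelve raw values shrink to three symbols, so string algorithms (pattern matching, edit distance) could be applied to the series, at the cost of losing the detail inside each block.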

In this seminar we will focus on four smart data representations with the aim of understanding the analytical properties of different data mining tasks:

  • representations and similarities of graphs for classification [1][2][3]

  • representations of natural time series baselines for outlier interpretation [4][5]

  • representing dependencies in event series and time series [6][7]

  • representations of incomplete data via imputation [8][9]

The main focus in each of these four areas will be understanding and comparing smart representations and their explicit or implicit data transformation methods. By transforming the data, we will study the limitations and advantages of each technique and how the data representation changes the problem setup, reduces complexity, introduces robustness, or adds other valuable properties for big data analytics.
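To make the idea of studying a transformation's limitations concrete, here is a minimal sketch using the simplest imputation baseline, mean imputation. It is an assumed toy example, not the approach of [8] or [9]: the helper `impute_mean` and the sample column are hypothetical. The completed column can be fed to any algorithm that requires complete data, but its variance is smaller than that of the observed values.

```python
from statistics import mean, pvariance


def impute_mean(column):
    """Replace None entries with the mean of the observed entries."""
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed)
    return [fill if x is None else x for x in column]


# Hypothetical column with two missing entries.
raw = [2.0, None, 4.0, None, 6.0]
completed = impute_mean(raw)

# Compare the spread before and after imputation.
observed_var = pvariance([x for x in raw if x is not None])
completed_var = pvariance(completed)

print(completed)
print(observed_var, completed_var)
```

The transformation changes the problem setup (no missing values remain) but also distorts the data distribution, which is the kind of trade-off a comparison of representations has to weigh.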

Literature

[1] Verma, Saurabh, and Zhi-Li Zhang. "Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs." NIPS, 2017.

[2] Tsitsulin, Anton, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. "VERSE: Versatile Graph Embeddings from Similarity Measures." WWW, 2018.

[3] Yanardag, Pinar, and S. V. N. Vishwanathan. "Deep graph kernels." KDD, 2015.

[4] Sundararajan et al. "Axiomatic Attribution for Deep Networks." ICML, 2017.

[5] Ribeiro et al. "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." KDD, 2016.

[6] N. Du, H. Dai et al.: "Recurrent Marked Temporal Point Processes: Embedding Event History to Vector." KDD, 2016.

[7] S. Agrawal, G. Atluri, et al.: "Tripoles: A New Class of Relationships in Time Series Data." KDD, 2017.

[8] Lovedeep Gondara, Ke Wang: "MIDA: Multiple Imputation using Denoising Autoencoders." Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2018.

[9] Stef van Buuren: "Flexible Imputation of Missing Data." Chapman and Hall/CRC, March 29, 2012.

Learning and Teaching Methods

Further information will be found here.

Assessment

Presentation and paper

Dates

Kickoff meeting: Monday, 23 April, 9:15 in room D-E.9/10
