Summer Semester 2017

03.05.2017 - Mirela Alistar

Hands-on digital microfluidics

Biochips are arrays of electrodes on a printed circuit board that can transport and process droplets, for example to perform an in-vitro diagnosis on a droplet of human blood. During this session, I will give a live demonstration of the principles behind digital microfluidics, and together we will design an application to be executed on biochips.

10.05.2017 - Johannes Wolf

Mobile Mapping Point Cloud Analysis

3D point clouds have evolved as an effective digital representation of geospatial data. Technical advances in capturing technologies have made scans of complete cities and countries affordable and make it possible to create large digital archives of sites or environments. Traditional airborne data is increasingly complemented with mobile mapping scans, adding information from a ground perspective. The resulting 3D point clouds are used for documentation, urban planning, and inspection.

This talk gives an overview of research topics in the field of mobile mapping point cloud analysis and the challenges arising from the highly detailed, but large and unordered data sets.

17.05.2017 - Christiane Hagedorn

Exploring the Potential of Game-Based Learning in Large Scale e-Learning Environments

A wide range of Massive Open Online Courses (MOOCs) on various topics has been offered since 2008. In recent years, gamification was implemented on many platforms to better engage participants in MOOCs. To foster the participants' understanding of the learning materials, the use of Digital Game-Based Learning (DGBL) should be considered as well. Although educational games are currently being used more often in classrooms, they have rarely been implemented in large scale e-learning environments until now.

In this talk, previous and current implementations of gamification, DGBL, and their role within MOOCs will be evaluated. The talk also explains why and how playing can be beneficial for learning and how gamification and game-based learning are connected. Possible approaches to integrating DGBL into the openHPI MOOC platform, as well as advantages, disadvantages, and different needs, will be discussed.

24.05.2017 - Robert Kovács

TrussFab: Fabricating Sturdy Large-Scale Structures on Desktop 3D Printers

TrussFab is an integrated end-to-end system that allows users to fabricate large-scale structures that are sturdy enough to carry human weight. TrussFab achieves the large scale by complementing 3D printing with plastic bottles. It does not use these bottles as "bricks", though, but as beams that form structurally sound node-link structures, also known as trusses, allowing it to handle the forces resulting from scale and load. TrussFab embodies the required engineering knowledge, allowing non-engineers to design such structures and to validate their design using integrated structural analysis. We have used TrussFab to design and fabricate tables and chairs, a 2.5 m long bridge strong enough to carry a human, a functional boat that seats two, and a 5 m diameter dome.

31.05.2017 - Christian Adriano

Human in the loop mechanism for self-adaptive exploratory services

Much of the current software development work involves exploring large data sets, for instance, finding a software fault in millions of lines of code. Software engineers usually face three decision problems: where to start exploring, when to move on to the next data set, and when to finally stop the exploration. My approach is to make greedy decisions that look only at the next exploration step. For that, I adopted the concept of expected utility, which is the product of the probability of the data and the respective utility value.

I am evaluating my approach with an existing data set from two large crowdsourcing studies. These studies involved hundreds of workers helping to locate faults in popular open source software. Workers answered questions about the source code of programs that were failing their unit tests. The goal was to locate the corresponding faults within the fewest lines of code while asking the fewest questions.
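
The greedy choice described above can be sketched as follows. This is only an illustration, not the speaker's implementation: the candidate names, probabilities, and utility values are hypothetical, and expected utility is simply taken as probability times utility, as stated in the abstract.

```python
from typing import Dict, Tuple


def expected_utility(probability: float, utility: float) -> float:
    """Expected utility as the product of a probability and a utility value."""
    return probability * utility


def pick_next_step(candidates: Dict[str, Tuple[float, float]]) -> str:
    """Greedy choice: return the candidate with the highest expected utility.

    `candidates` maps a (hypothetical) exploration step to a pair
    (probability, utility). Only the next step is considered.
    """
    return max(candidates, key=lambda name: expected_utility(*candidates[name]))


# Hypothetical data: probability that a file contains the fault,
# and the utility of inspecting it.
candidates = {
    "parser.py": (0.2, 10.0),
    "lexer.py": (0.5, 3.0),
    "cache.py": (0.1, 5.0),
}

next_step = pick_next_step(candidates)
print(next_step)
```

A stopping rule (the third decision problem) is not covered by this sketch; it would need an additional threshold or cost model.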

07.06.2017 - Tobias Bleifuß

Enabling Change Exploration

Data and metadata suffer many different kinds of change: values are inserted, deleted or updated; entities appear and disappear; properties are added or re-purposed, etc. Explicitly recognizing, exploring, and evaluating such change can alert to changes in data ingestion procedures, can help assess data quality, and can improve the general understanding of the dataset and its behavior over time.

In this talk, I will introduce change exploration: for a given dynamic dataset, we want to efficiently capture and summarize changes at the instance and schema level, and enable users to effectively explore this change in an interactive and graphical fashion. We propose a data-model-independent framework, the change-cube, to formalize such change. Our change-cube enables exploration and discovery of changes to reveal dataset behavior over time. In combination with a set of query primitives, the change-cube has already been realized in a web-based tool called DbChEx that allows first exploration steps. The talk will cover both the change-cube and a short demo of DbChEx.

14.06.2017 - Lan Jiang

Discovering Primary Keys and Foreign Keys From Metadata

Primary keys (PKs) and foreign keys (FKs) are important key constraints in relational databases for maintaining data consistency and optimizing queries. In addition, these key constraints help users to understand the structure of schemata. However, in many real-world datasets, PKs/FKs are not defined for efficiency reasons or are simply lost during distribution. For large-scale datasets, it is almost impossible to label the key constraints manually. This leads to the interesting problem of discovering them automatically.

In this talk, I will introduce a few useful features that help distinguish PKs/FKs from non-key unique column combinations and inclusion dependencies, respectively, as well as a holistic approach that uses these features to detect them. Results on datasets from different domains show the precision and recall of our approach.
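
To make the two candidate notions concrete, the following sketch checks whether a column combination is unique (a PK candidate) and whether one column's values are included in another's (an FK candidate). It does not show the features or the holistic approach from the talk; the tables and column names are made up for illustration.

```python
from typing import Dict, List, Sequence

Table = List[Dict[str, object]]


def is_unique(table: Table, columns: Sequence[str]) -> bool:
    """True if no two rows share the same values in `columns` (a UCC)."""
    seen = set()
    for row in table:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_included(child: Table, child_col: str, parent: Table, parent_col: str) -> bool:
    """True if every value of child_col also occurs in parent_col (an IND)."""
    parent_values = {row[parent_col] for row in parent}
    return all(row[child_col] in parent_values for row in child)


# Hypothetical example tables.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"order_id": 10, "customer_id": 1}, {"order_id": 11, "customer_id": 1}]

print(is_unique(customers, ["id"]))                           # PK candidate
print(is_included(orders, "customer_id", customers, "id"))    # FK candidate
```

Passing these checks is necessary but not sufficient: many unique column combinations and inclusion dependencies are accidental, which is why additional features are needed to pick out the true keys.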

21.06.2017 - Thorsten Papenbrock (external)

Data Profiling – Efficient Discovery of Dependencies

Data profiling is the computer science discipline of analyzing a given dataset for its metadata. The most important types of metadata are arguably inclusion dependencies (INDs), unique column combinations (UCCs), and functional dependencies (FDs). If present, these dependencies serve to efficiently store, query, change, and understand the data. Most datasets, however, do not provide their metadata explicitly so that data scientists need to profile them.

In this talk, we discuss a novel, hybrid profiling algorithm for the automatic discovery of functional dependencies in relational instances. FDs are structural metadata that can be used for schema normalization, data integration, data cleansing, and many other data management tasks. Due to the importance of FDs, database research has proposed various algorithms for their discovery. None of these algorithms is, however, able to process datasets of typical real-world size, e.g., datasets with more than 50 attributes and a million records.

Our algorithm HyFD combines fast approximation and sophisticated validation techniques to efficiently discover all minimal FDs in relational datasets. The hybrid approach not only outperforms all existing discovery algorithms, it also scales to much larger datasets. HyFD and further metadata discovery algorithms have been implemented for the Metanome data profiling platform, which is the overall contribution of my PhD thesis.
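
For readers unfamiliar with functional dependencies, the sketch below validates a single candidate FD against a small relation. It is a naive check written for illustration and is not HyFD; the function name and the sample rows are assumptions.

```python
from typing import Dict, List, Sequence


def fd_holds(rows: List[Dict[str, object]], lhs: Sequence[str], rhs: str) -> bool:
    """Check whether the functional dependency lhs -> rhs holds in `rows`.

    The FD holds if any two rows that agree on all `lhs` columns
    also agree on the `rhs` column.
    """
    seen: Dict[tuple, object] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False  # same left-hand side, different right-hand side
        seen.setdefault(key, row[rhs])
    return True


# Hypothetical relation.
rows = [
    {"zip": "14482", "city": "Potsdam", "street": "A"},
    {"zip": "14482", "city": "Potsdam", "street": "B"},
    {"zip": "10115", "city": "Berlin", "street": "A"},
]

print(fd_holds(rows, ["zip"], "city"))
print(fd_holds(rows, ["street"], "city"))
```

Validating one candidate takes a single pass over the rows; the hard part of discovery is that the number of candidate left-hand sides grows exponentially with the number of attributes.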

28.06.2017 - Lukas Pirl

Pulling plugs: dependability stress testing of complex distributed systems

In the context of complex, fast-evolving, distributed systems, the approach of software fault injection for experimental dependability assessments does not yet seem to have reached its full potential.

We propose a structured method to derive software fault injection campaigns from user-provided dependability models. Such a campaign tests all combinations of as many concurrently tolerable faults as possible (i.e., "dependability stress") and thus tests for synergistic effects. Additionally, we present a flexible framework to automate the aforementioned derivation and to coordinate the execution of the campaigns.

In a case study, we use this method to assess the dependability of OpenStack, a framework to build IaaS platforms. Finding an adequate granularity for the dependability model, setting up a virtualized instance of OpenStack, and automating its restoration from snapshots are especially challenging aspects. Results from exercising the campaign show that the chosen use case takes on average 1.8 times longer in the presence of injected faults.
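
The idea of testing "as many concurrently tolerable faults as possible" can be illustrated with a brute-force enumeration. This is a sketch under simplifying assumptions, not the framework from the talk: the dependability model is reduced to a hypothetical predicate `tolerable`, and the node names are invented.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List


def maximal_tolerable_sets(
    faults: Iterable[str],
    tolerable: Callable[[FrozenSet[str]], bool],
) -> List[FrozenSet[str]]:
    """Enumerate the maximal fault combinations the model claims to tolerate.

    Candidates are visited from largest to smallest, so a tolerable set is
    kept only if it is not a proper subset of an already kept set.
    """
    ordered = sorted(faults)
    result: List[FrozenSet[str]] = []
    for size in range(len(ordered), 0, -1):
        for combo in combinations(ordered, size):
            candidate = frozenset(combo)
            if tolerable(candidate) and not any(candidate < kept for kept in result):
                result.append(candidate)
    return result


# Hypothetical model: two redundant pairs; each pair tolerates losing one node.
def tolerable(failed: FrozenSet[str]) -> bool:
    return not {"db1", "db2"} <= failed and not {"api1", "api2"} <= failed


campaign = maximal_tolerable_sets(["db1", "db2", "api1", "api2"], tolerable)
for fault_set in campaign:
    print(sorted(fault_set))
```

The enumeration is exponential in the number of faults, so it is only practical for small models.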

05.07.2017 - Anton Tsitsulin

Towards principled graph representations through similarity distributions

Learning a latent low-dimensional vector representation of a large graph (i.e., a graph embedding) has recently attracted vigorous research attention. Such embeddings allow us to apply data mining methods that depend on vector spaces to graph analytics. Yet past works have adapted word embedding methods to graphs without due attention to the different nature of graph data; crucially, the straightforward adaptation of word frequency counts to graphs via random walks leads to an undue bias towards nodes of high degree, which, as we show, compromises embedding quality.

In this talk, we will discuss VERtex Similarity Embeddings (VERSE), a simple and elegant method that computes graph embeddings based on graph-native principles. VERSE receives distributions of any vertex similarity measure per vertex as input, and trains a single-layer neural network to learn an embedding that preserves those distributions equitably for all vertex representations.

12.07.2017 - Ahmed Shams

MOOCs in Poor Quality Networks

Massive Open Online Courses (MOOCs) are online courses designed for a large number of participants that can be accessed by anyone, anywhere, as long as they have an Internet connection. Today, MOOCs are loaded with large educational materials such as HD videos, which require a good Internet connection to access the content effectively. However, in many regions, the connection is not always as consistent and reliable as expected.

This talk gives an overview of various approaches aiming to improve MOOCs in regions with "poor" or "low" quality network bandwidth, particularly with reference to Africa.

19.07.2017 - Sven Köhler

Indirect Live Analysis of Timing Effects on Memory Hierarchies: Tools for Adaptive Work-package Choice and Cross-LPAR Covert Channels

Hardware performance counters are a popular means to evaluate software behavior with regard to, for instance, consumed time, issued instructions, and failed branch predictions.

We investigate how measurement jitter, usually avoided in experiments, can provide detailed information on the overall system load and properties of currently running processes on the same hardware as our probe program.
Using the example of memory access times on an IBM POWER8 processor, we present a set of probing tools. These tools not only enable a better understanding of how black-box processes' resource usage changes over time, but also allow the construction of middleware that can dynamically start or defer annotated work packages based on the current cache utilization.

As a third application, we show how cache timing delays can be employed for the construction of cross-process covert channels. They allow for breaking the isolation of kernel process groups and potentially logical partitions (LPARs) at a current signal rate of 300 baud.

26.07.2017 - Matthias Wenzel (external)

A Web Browser-Based Real-Time Remote 3D Prototyping Space

Prototypes help people to externalize their ideas and are a basic element for gathering feedback on an early product design. Prototyping is often a team-based method traditionally involving physical and analog tools. At the same time, collaboration among geographically dispersed team members is increasingly becoming standard practice for companies and research teams.

I present a standards-compliant, web browser-based real-time remote 3D modeling system. It is a cross-platform application with a focus on building volumetric early-stage prototypes over distances in real time. Furthermore, with appropriate Virtual Reality hardware in place, people can interact with three-dimensional artifacts in a convenient way. For increased awareness, i.e., to be able to gain some level of shared knowledge about each other's activities, the system positions users' avatars and audio in the virtual room.

Hence, the presented application provides a new way of cooperative work in a collaborative virtual environment.