The Hasso Plattner Institute offers a practically-oriented computer science study program at an internationally recognized institute. This study includes the Germany-wide unique IT-Systems Engineering program and the five master programs Cybersecurity, Data Engineering, Digital Health, IT-Systems Engineering and Software Systems Engineering.

Our researchers at HPI benefit from an inspiring scientific environment as well as a collaborative and inclusive atmosphere. In this environment, they obtain insights and findings that achieve societal impact. Our scientific work is structured within research clusters. In addition, we work together with scientific institutions, companies, and public institutions in numerous research programs worldwide.

The Hasso Plattner Institute in Potsdam is unique on the German academic landscape. The institute's program continues to grow with the support of its founder Hasso Plattner and through international cooperation. Find out more about the founder, events and studies at HPI.

The Hasso Plattner Institute has educational programs for both high school students and working professionals. It operates its own IT learning platform - openHPI - which provides free online courses. The Youth Academy organizes computer science camps and events for high school students. Professionals can take advantage of educational opportunities in the field of Design Thinking at the HPI Academy.

The press area of the Hasso Plattner Institute provides you with the latest press material, news, information on our social media channels and contact details.

Nuhad Shaabani

On Discovering and Incrementally Updating Inclusion Dependencies

In today's world, many applications produce large amounts of data at an enormous rate. Analyzing such datasets for metadata is indispensable for effectively understanding, storing, querying, manipulating, and mining them. Metadata summarizes technical properties of a dataset which rang from basic statistics to complex structures describing data dependencies. One type of dependencies is inclusion dependency (IND), which expresses subsetrelationships between attributes of datasets. Therefore, inclusion dependencies are important for many data management applications in terms of data integration, query optimization, schema redesign, or integrity checking. So, the discovery of inclusion dependencies in unknown or legacy datasets is at the core of any data profiling effort.

For exhaustively detecting all INDs in large datasets, we developed SINDD++, a new algorithm that eliminates the shortcomings of existing IND-detection algorithms and significantly outperforms them. S-INDD++ is based on a novel concept for the attribute clustering for efficiently deriving INDs. Inferring INDs from our attribute clustering eliminates all redundant operations caused by other algorithms. S-INDD++ is also based on a novel partitioning strategy that enables discording a large number of candidates in early phases of the discovering process. Moreover, S-INDD++ does not require to fit a partition into the main memory-this is a highly appreciable property in the face of ever-growing datasets. S-INDD++ reduces up to 50 % of the runtime of the state-of-the-art approach.

None of the approach for discovering INDs is appropriate for application on dynamic datasets; they can not update the INDs after an update of the dataset without reprocessing it entirely. I developed the first approach for incrementally updating INDs in frequently changing datasets. I achieved that by reducing the problem of incrementally updating INDs to the incrementally updating the attribute clustering from which all INDs are efficiently derivable. I realized the update of the clusters by designing new operations to be applied to the clusters after every data update. The incremental update of INDs reduces the time of the complete rediscovery by up to 99.999 %.

All existing algorithms for discovering n-ary INDs are based on the principle of candidate generation-they generate candidates and test their validity in the given data instance. The major disadvantage of this technique is the exponentially growing number of database accesses in terms of SQL queries required for validation. I proposed MIND2, the first approach for discovering n-ary INDs without candidate generation. MIND2 is based on a new mathematical framework developed in this thesis for computing the maximum INDs from which all other n-ary INDs are derivable. M1ND2 is significantly more scalable and effective than hypergraph-based algorithms.

Ombudsperson

Ombudspersons serve as neutral and qualified advisors in questions of good scientific practice and in suspected cases of scientific misconduct.

As far as possible, they contribute to solution-oriented conflict mediation.

If you have any questions, please contact:

Prof. Dr. Tilmann Rabl

Tel.: +49 (0)331 5509-280
E-Mail: tilmann.rabl(at)hpi.de

Future SOC Lab

The “HPI Future SOC Lab” is a cooperation of the Hasso-Plattner-Institut (HPI) and industrial partners. Its mission is to enable and promote exchange and interaction between the research community and the industrial partners.

Further Information

Research Schools

The HPI Research Schools for "Service-Oriented Systems Engineering" and "Data Science and Engineering" have branches in Cape Town, Haifa, Irvine and Nanjing.

Further Information

Digital Health Cluster

The Digital Health Cluster of the Hasso Plattner Institut (HPI) brings together individuals from health sciences, human sciences, data sciences, digital engineering and society with a shared goal to improve health and wellbeing.

Further Information