Hasso-Plattner-Institut
Prof. Dr. h.c. mult. Hasso Plattner

Thesis Martin Lorenz

Leveraging In-Memory Technology for High-Performance Data Mining in Massive Moving Object Databases

Supply Chain Management (SCM) deals with the coordination of product, financial, and information flows between the participants of a supply chain network. With the increasing maturity of Auto-ID technologies such as RFID and 2D barcodes, companies are able to dramatically increase the quantity and quality of information captured from supply chain processes. Objects moving through transportation networks are equipped with RFID tags, which are scanned by RFID readers positioned at strategic points in the supply chain. This makes it possible to provide a real-time view of supply chain processes. The automated capture of an object's movement data constitutes the transition from real-world events into the digital world of enterprise information systems. These digitized events are stored in large database systems, also known as read event repositories, which can easily reach several terabytes of pure data.

In my dissertation, I focus on data mining in the context of moving object databases. The aforementioned read event repositories hold huge potential with respect to the information that can be derived by analyzing their data. Traditional data mining approaches for massive data sets comprise the following four steps:

  • Selection / Extraction
  • Preparation
  • Processing (e.g., Pattern Matching, Clustering, Classification)
  • Communication
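The four steps above can be sketched as a small pipeline over RFID read events. This is a minimal illustration under stated assumptions, not the method developed in the dissertation: the ReadEvent record (tag identifier, reader location, timestamp), the time-window selection, the hypothetical tag identifiers, and the choice of location-transition counting as the processing step are all illustrative.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadEvent:
    epc: str        # hypothetical tag identifier
    location: str   # reader location in the supply chain
    timestamp: int  # seconds since an arbitrary epoch

def select(events, start, end):
    """Selection / Extraction: keep only events inside a time window."""
    return [e for e in events if start <= e.timestamp < end]

def prepare(events):
    """Preparation: build one time-ordered location trace per object."""
    traces = defaultdict(list)
    for e in sorted(events, key=lambda e: (e.epc, e.timestamp)):
        traces[e.epc].append(e.location)
    return dict(traces)

def process(traces):
    """Processing: count location-to-location transitions (a simple pattern)."""
    transitions = Counter()
    for path in traces.values():
        transitions.update(zip(path, path[1:]))
    return transitions

def communicate(transitions, top=3):
    """Communication: render the most frequent transitions as text lines."""
    return [f"{a} -> {b}: {n}" for (a, b), n in transitions.most_common(top)]

# Hypothetical toy data; urn:epc:3 falls outside the selected window.
events = [
    ReadEvent("urn:epc:1", "factory", 0),
    ReadEvent("urn:epc:1", "warehouse", 10),
    ReadEvent("urn:epc:1", "store", 20),
    ReadEvent("urn:epc:2", "factory", 5),
    ReadEvent("urn:epc:2", "warehouse", 15),
    ReadEvent("urn:epc:3", "factory", 100),
]
report = communicate(process(prepare(select(events, 0, 50))))
print(report)  # ['factory -> warehouse: 2', 'warehouse -> store: 1']
```

In this traditional arrangement each step consumes the complete output of the previous one, which is exactly the batch-style flow that the following paragraph argues should be rethought.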

With the arrival of high-performance analytical databases such as SAP's NewDB, it becomes necessary to rethink this process. In particular, the way business intelligence (BI) professionals interact with data mining applications can be improved. Expert observations have shown that data mining should be a highly interactive process, because most users of data mining applications cannot determine exactly what they are looking for; in most cases, however, they know much better what they are not looking for. Given the massively increased speed of analytical processing, the overall data mining process needs to become interactive and dynamic. As part of my research, I am analyzing different business processes in SCM with regard to their potential for improvement through data mining. Having identified the relevant processes, I will investigate data mining algorithms that can contribute to improving them. Finally, I aim to decompose these algorithms so that as much processing as possible is pushed down into the database. The goal is data mining performance fast enough for a dynamic interaction scenario in which BI experts can "play" with the data and explore it until they have found a solution to their problem.
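The idea of pushing processing down to the database can be illustrated with a simple aggregate. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; it is not NewDB, and the read_events table, its columns, and the toy rows are hypothetical. Both functions count the distinct objects seen at each location within a time window: the first pulls every raw read event into the application and aggregates there, while the second expresses filtering and aggregation in SQL so that only one result row per location leaves the database.

```python
import sqlite3
from collections import defaultdict

def load_read_events(conn, rows):
    """Create a hypothetical read event table and fill it with (epc, location, ts) rows."""
    conn.execute("CREATE TABLE read_events (epc TEXT, location TEXT, ts INTEGER)")
    conn.executemany("INSERT INTO read_events VALUES (?, ?, ?)", rows)

def distinct_objects_in_app(conn, start, end):
    """Application-side: fetch every raw row, then filter and aggregate in Python."""
    seen = defaultdict(set)
    for epc, location, ts in conn.execute("SELECT epc, location, ts FROM read_events"):
        if start <= ts < end:
            seen[location].add(epc)
    return {location: len(epcs) for location, epcs in seen.items()}

def distinct_objects_pushed_down(conn, start, end):
    """Push-down: the database filters and aggregates; one row per location is returned."""
    cur = conn.execute(
        "SELECT location, COUNT(DISTINCT epc) FROM read_events "
        "WHERE ts >= ? AND ts < ? GROUP BY location",
        (start, end),
    )
    return dict(cur.fetchall())

conn = sqlite3.connect(":memory:")
load_read_events(conn, [
    ("urn:epc:1", "factory", 0),
    ("urn:epc:1", "warehouse", 10),
    ("urn:epc:2", "factory", 5),
    ("urn:epc:2", "factory", 7),
    ("urn:epc:3", "store", 30),
])
in_app = distinct_objects_in_app(conn, 0, 20)
pushed_down = distinct_objects_pushed_down(conn, 0, 20)
print(in_app, pushed_down)
```

Both calls yield the same counts (two objects at the factory, one at the warehouse), but only the push-down variant avoids transferring the raw events. On a repository of several terabytes, that difference in data movement is what would make the interactive exploration described above plausible.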