Hasso-Plattner-Institut

Jürgen Müller

A Real-Time In-Memory Discovery Service

The dissertation “A Real-Time In-Memory Discovery Service” discusses how to efficiently retrieve track-and-trace information for an item of interest that took a certain path through a complex network of manufacturers, wholesalers, retailers, and consumers.

An example used throughout this dissertation is the European pharmaceutical supply chain, which faces the challenge that more and more counterfeit medicinal products are being introduced. Between October and December 2008, more than 34 million fake drug pills were detected by customs controls at the borders of the European Union. These fake drugs can put lives in danger, as they were supposed to fight cancer or act as painkillers or antibiotics, among other uses. The European Commission is aware of this problem and wants to leverage supply chain validity in order to determine whether a certain package of pharmaceuticals is genuine or not. Although this example is used throughout, the work is applicable to many other scenarios as well.

This will be possible in the future because all products will be equipped with a unique identifier. Readers are installed at strategic points in the supply chain. If a uniquely identified item is within range of a reader, a read event is created. Read events are stored in read event repositories, which are operated by the company where the read event took place. As items can be packed into uniquely identified boxes and containers, hierarchical packaging relationships are created and have to be considered. The supply chain for an item of interest comprises all companies the item passed through in its lifecycle.
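The read events and packaging relationships described above can be pictured with a small data model. The following is a minimal sketch, not the dissertation's schema: the `ReadEvent` fields, the `OBSERVE`/`PACK`/`UNPACK` vocabulary, and the `containment_intervals` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadEvent:
    """One read event, stored in the repository of the company that owns the reader."""
    item_id: str                     # unique identifier seen by the reader
    company: str                     # company operating the reader (assumed field)
    timestamp: int                   # time of the read, e.g. Unix seconds
    action: str = "OBSERVE"          # "OBSERVE", "PACK" or "UNPACK" (assumed vocabulary)
    parent_id: Optional[str] = None  # container involved in a PACK/UNPACK event


def containment_intervals(events, item_id):
    """Return (parent_id, packed_at, unpacked_at) tuples for one item.

    unpacked_at is None while the item is still inside the container.
    """
    intervals = []
    open_since = {}  # parent_id -> timestamp of the matching PACK event
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.item_id != item_id or ev.parent_id is None:
            continue
        if ev.action == "PACK":
            open_since[ev.parent_id] = ev.timestamp
        elif ev.action == "UNPACK" and ev.parent_id in open_since:
            intervals.append((ev.parent_id, open_since.pop(ev.parent_id), ev.timestamp))
    # Containers that were never unpacked stay open-ended.
    intervals.extend((parent, start, None) for parent, start in open_since.items())
    return intervals


events = [
    ReadEvent("pill-1", "Manufacturer", 10),
    ReadEvent("pill-1", "Manufacturer", 20, "PACK", "box-7"),
    ReadEvent("box-7", "Wholesaler", 30),
    ReadEvent("pill-1", "Wholesaler", 40, "UNPACK", "box-7"),
    ReadEvent("pill-1", "Pharmacy", 50),
]
intervals = containment_intervals(events, "pill-1")
```

In this toy data the wholesaler only ever reads `box-7`, so the item's path through the wholesaler is visible only if the packaging interval is taken into account.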

As read events are company-sensitive data, they are not shared with the public. Thus, a special information system is necessary to identify all relevant read events for a certain item of interest: this is the discovery service. In relation to read event repositories, the discovery service is a superordinate entity that supports and coordinates inter-organizational collaboration and information retrieval in a so-called unique identifier network.

In the course of this dissertation, a discovery service is designed that explicitly includes hierarchical packaging relationships. In this respect, it differs from all existing discovery service approaches. This innovation allows for a new communication concept between requestor, discovery service, and relevant read event repositories, such that only a minimal number of messages has to be exchanged. Furthermore, requestors get the response in real time and the discovery service is simpler to use. Thus, thin devices such as point-of-sale terminals and mobile devices can also easily submit queries to the discovery service. Only required data is transferred from read event repositories to the discovery service. The companies that own the read event repositories retain full ownership of their data. The resulting complexity at the discovery service is dealt with by two algorithms. The first is an efficient, heuristic search algorithm. The second is called the filter algorithm, as it processes all read events returned by the search algorithm and evaluates whether they are valid for a given item of interest or not.
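To make the idea of a filter step concrete, the sketch below shows one possible reading of "valid for a given item of interest": a read event of a container counts only if it falls inside the time window during which the item, directly or through nested packaging, was inside that container. This is an illustrative assumption, not the filter algorithm developed in the dissertation; the function name `relevant_events` and the tuple layouts are hypothetical.

```python
INF = float("inf")


def relevant_events(item_id, events, containment):
    """Keep only candidate read events that are valid for item_id.

    events:      iterable of (read_id, company, timestamp) candidates
    containment: dict child_id -> list of (parent_id, packed_at, unpacked_at),
                 where unpacked_at may be None for "still packed"
    Assumes the packaging hierarchy is acyclic.
    """
    # Time windows during which each identifier carried the item.
    windows = {item_id: [(-INF, INF)]}
    stack = [(item_id, -INF, INF)]
    while stack:
        child, lo, hi = stack.pop()
        for parent, start, end in containment.get(child, []):
            end = INF if end is None else end
            # Intersect the parent's interval with the window inherited from the child.
            w_lo, w_hi = max(lo, start), min(hi, end)
            if w_lo <= w_hi:
                windows.setdefault(parent, []).append((w_lo, w_hi))
                stack.append((parent, w_lo, w_hi))
    return [
        ev for ev in events
        if any(lo <= ev[2] <= hi for lo, hi in windows.get(ev[0], []))
    ]


containment = {
    "pill-1": [("box-7", 20, 40)],
    "box-7": [("pallet-3", 25, 35)],
}
candidates = [
    ("pill-1", "Manufacturer", 10),
    ("box-7", "Manufacturer", 15),   # box read before the pill was packed
    ("box-7", "Wholesaler", 30),
    ("pallet-3", "Carrier", 32),
    ("pallet-3", "Carrier", 37),     # pallet read after the box was removed
    ("box-7", "Wholesaler", 45),     # box read after the pill was unpacked
    ("pill-1", "Pharmacy", 50),
]
valid = relevant_events("pill-1", candidates, containment)
```

Here three of the seven candidates are discarded because the container was read at a time when it did not hold the item.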

In addition to the communication protocol, the search algorithm, and the filter algorithm, the data management developed in the dissertation is optimized for column-oriented in-memory databases with dictionary encoding. This makes it possible to handle the data volume that occurs in so-called “Unique Identifier Networks”. In the example of the European pharmaceutical supply chain, approximately 15 billion packages of prescription-only pharmaceuticals are sold per year. These packages are subject to unique identification, which results in about 35 billion read events that have to be processed by the discovery service.
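Dictionary encoding, as a general column-store technique, replaces each distinct value of a column with a small integer code and stores the distinct values once. The sketch below illustrates the principle on a toy column of company names; it makes no claim about how SAP HANA or the dissertation's prototype implements it, and all function names are hypothetical.

```python
from bisect import bisect_left


def dictionary_encode(column):
    """Split a column into a sorted dictionary and a list of integer codes."""
    dictionary = sorted(set(column))
    position = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [position[value] for value in column]


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from dictionary and codes."""
    return [dictionary[code] for code in codes]


def select_equal(dictionary, codes, value):
    """Return row positions where the column equals value.

    The string comparison happens once, via binary search in the sorted
    dictionary; the scan itself only compares integers.
    """
    code = bisect_left(dictionary, value)
    if code == len(dictionary) or dictionary[code] != value:
        return []
    return [row for row, c in enumerate(codes) if c == code]


company_column = ["Wholesaler", "Manufacturer", "Wholesaler", "Pharmacy", "Wholesaler"]
dictionary, codes = dictionary_encode(company_column)
rows = select_equal(dictionary, codes, "Wholesaler")
```

Columns with few distinct values, such as the company that recorded a read event, compress well this way because the code column needs only a few bits per row.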

In the present dissertation, the discovery service was prototypically implemented using Java as the programming language and SAP HANA as the database. The evaluation shows that the data volume of a large unique identifier network can be reduced to a manageable size. For the complete European pharmaceutical supply chain, only 600 GB of main memory is necessary, a size that is commercially available for enterprise servers. The compression factor is approximately 17. Furthermore, it is shown that

  • the data can be loaded into the discovery service fast enough, i.e. the required 8,000 read events per second can be processed,
  • the throughput of the discovery service is high enough, i.e. about 20 servers are sufficient to cope with 2,000 discovery service queries in the European pharmaceutical supply chain,
  • the developed discovery service is scalable.
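The figures quoted above can be related to each other with some back-of-the-envelope arithmetic. The sketch below uses only numbers stated in the text; the derived values rest on assumptions that the text does not confirm, namely that the 35 billion read events are a yearly figure, that the 600 GB hold exactly that volume, and that 1 GB means 10^9 bytes.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Figures taken from the text.
read_events_per_year = 35e9
required_load_rate = 8_000    # read events per second
main_memory_gb = 600
compression_factor = 17
queries = 2_000
servers = 20

# Derived values (assumptions: yearly volume, decimal gigabytes).
average_rate = read_events_per_year / SECONDS_PER_YEAR  # events per second, averaged over a year
headroom = required_load_rate / average_rate            # required rate relative to that average
uncompressed_gb = main_memory_gb * compression_factor   # size before compression
bytes_per_event = main_memory_gb * 1e9 / read_events_per_year
queries_per_server = queries / servers

print(f"average load rate: {average_rate:,.0f} events/s")
print(f"required rate is {headroom:.1f}x the yearly average")
print(f"uncompressed size: {uncompressed_gb:,} GB")
print(f"compressed footprint: {bytes_per_event:.1f} bytes per read event")
print(f"queries per server: {queries_per_server:.0f}")
```

Under these assumptions the yearly average is roughly 1,110 read events per second, so the required 8,000 events per second leaves room for load that is far from evenly distributed over the year.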

Selected further areas of application are the effective support of recalls, company-spanning supply chain optimization, and pattern recognition in supply chains. As the presented discovery service approach explicitly integrates changes in packaging hierarchies, it can easily be mapped to bill-of-materials problems, e.g. to identify all parts of an airplane and their history at an arbitrary point in time.