Hasso-Plattner-Institut
 
Prof. Dr. Holger Karl
  
 

Open Bachelor and Master Theses Topics

Topic areas: Orchestration and Management, Profiling, Machine Learning.

BA/MA: The topic is suitable primarily for a Bachelor's or a Master's thesis, respectively; using it for the other degree is usually possible as well but needs to be discussed.


 

Orchestration and Management

Orchestration and management refers to handling pieces of software, individually or combined, when they are deployed into a distributed system. A typical example is handling microservices.

MA: Terraform goes IoT

Cloud computing happens in a competitive environment: Multiple vendors offer cloud resources under incompatible APIs. On the other hand, spreading an application (e.g., a microservice-based chain of components) over multiple clouds from different vendors can have commercial and technical advantages. Terraform [1] is a popular tool to bridge API gaps between vendors and hide them under a uniform interface.

The goal of this thesis is to extend this idea to IoT devices and very lightweight "far cloud" scenarios: extend Terraform in such a fashion that components can be deployed in such contexts as well. This entails obtaining an understanding both of the options to run software on such devices and of Terraform itself. A subgoal is to design a proper extension; the proof of concept lies in demonstrating the ability to run a cloud application via Terraform on either a conventional or an IoT/far cloud.


Prerequisites: 

  • Good knowledge of the Linux OS, shell scripting, and OS APIs.
  • Experience with cloud computing and typical toolchains is clearly a plus.
     

MA: Orchestrate microservice chains with WebAssemblies

Deploying microservices in a complex environment comprising core, edge, and far clouds requires so-called orchestration functions: deciding how many instances of a component are needed to deal with the load, where each component runs, which instance deals with which traffic flows, etc. This entails lifecycle management of these components: starting, stopping, migrating, state transfer, etc. Typically, components are realized as virtual machine images or containers, which are relatively easy to manage but heavyweight.

An alternative idea is to use WebAssemblies [1]. As they come from a browser context, it is not clear whether they are suitable to act as components in such chains. The goal of this thesis is to develop a concept for integrating WebAssemblies into such chains, covering which lifecycle management approaches are suitable and how they can be orchestrated. As a proof of concept, this orchestration functionality should be integrated into a common open-source orchestrator, e.g., Open-Source MANO [2].

Prerequisites: 

  • Familiar with microservices, virtual function chains, or similar concepts
  • Good software engineering skills   
  • Good knowledge of the Linux OS, shell scripting, and OS APIs.
  • Familiarity with cloud computing and typical toolchains is clearly a plus

  

MA: Build chain for multi-version executables

Services are being deployed into conventional clouds, but increasingly also into new systems like edge clouds, "far" clouds, or so-called fog computing setups. These systems feature highly heterogeneous devices with very different capabilities and very different connectivity. Dealing with data flows is a problem in such contexts, but so is dealing with software distribution and deployment. One idea is to flexibly distribute different versions of software artefacts, ranging from full-fledged virtual machine images down to mere source code. When such a generalized form of a component is deployed, it needs to be built on an edge device: possibly compiling from source code, possibly just downloading Docker layers, etc.

To address this idea, this thesis has two goals. First, design and prototype a build toolchain that is capable of building artefacts based on generalized descriptions of software; this toolchain should leverage and encompass existing CI/CD concepts as far as useful. Second, understand the behavior and performance characteristics of this toolchain on different types of devices, using representative examples of typical microservice software.

Prerequisites:

  • Familiar with build toolchains (Make, Maven, etc.) and microservice software engineering concepts
  • Good knowledge of the Linux OS, shell scripting, and OS APIs.
  • Familiarity with cloud computing and typical toolchains is clearly a plus
     

MA: Placement / Scaling with moveable infrastructure

When deploying and running microservices (or closely related network function chains) to and in edge or core clouds, typical assumptions about these kinds of infrastructure prevail: it is dependable, does not fail, does not move. On that basis, many so-called orchestration algorithms have been designed; these algorithms decide, e.g., how many instances of a service to run, where each instance runs, and which instance serves which data flow.

This mindset, however, changes with new types of infrastructure: vehicles can be seen as a moving cloud, but only the vehicles in the vicinity of a particular intersection may be of interest. Fleets of drones can similarly act as (very simple, very specialized) service providers, but a drone has to hand over service execution once it runs out of battery power and needs to be replaced by another drone for a few minutes. For such volatile, evolving infrastructures, the literature has very little to say about suitable orchestration concepts.

The goal of this thesis is hence to identify a suitable model for volatile infrastructure, to cast some typical orchestration problems into that model, and to design solutions for them and evaluate their performance. As this is a fairly open area, the topic is also fairly open, and evolving the concept is clearly part of the thesis assignment.

Prerequisites: 

  • Familiar with cloud computing concepts and microservices / network function virtualization 
  • Ideally also familiar with vehicle-to-anything (V2X) communication
  • Good modeling skills
  • Some experience in optimization problems, heuristic design, or machine learning is useful
     

 

Profiling

In the context here, profiling is the process of obtaining quantitative data about a piece of software. For example: how many resources are necessary to handle a given load? Profiling appears in various contexts, and profiling data is usually a stepping stone for management and orchestration systems.
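As a minimal illustration of how profiling data can feed orchestration decisions, the Python sketch below assumes a hypothetical profile that maps a CPU allocation to the highest request rate a service sustained under load. The numbers and function names are invented for illustration; real profiles typically cover more resource dimensions (memory, network) than just cores.

```python
import math

# Hypothetical performance profile of one microservice: for each CPU
# allocation (cores), the highest request rate (requests/s) it sustained
# during a load test. Illustrative numbers only, not measured data.
PROFILE = {0.5: 120.0, 1.0: 230.0, 2.0: 410.0, 4.0: 650.0}


def min_cores_for_load(profile, target_rps):
    """Smallest profiled CPU allocation whose capacity covers target_rps,
    or None if no single profiled instance can handle that load."""
    for cores in sorted(profile):
        if profile[cores] >= target_rps:
            return cores
    return None


def instances_needed(profile, cores_per_instance, target_rps):
    """Number of identical instances (each with cores_per_instance cores)
    needed to serve target_rps, assuming load splits evenly across them
    and each instance reaches its profiled capacity."""
    capacity = profile[cores_per_instance]
    return math.ceil(target_rps / capacity)
```

With these illustrative numbers, `min_cores_for_load(PROFILE, 300.0)` returns `2.0`, and `instances_needed(PROFILE, 1.0, 1000.0)` returns `5`. The even-split assumption is a simplification; deciding which instance serves which flow is itself an orchestration problem.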

 

BA: Profile containers on IoT devices

When managing microservices (or related concepts like network function chains), it is useful to know how many requests per time unit a microservice can handle, given a certain amount of resources. This is typically of interest in a cloud context, where microservices usually run in containers or virtual machines. For those environments, we have developed, in prior work, a system that automates the profiling of such services: it measures their performance under varying load levels.

With the growing relevance of microservices for Internet-of-Things scenarios and edge or far clouds, profiling on embedded devices becomes interesting. The goal of this Bachelor thesis is to extend our existing profiling platform to incorporate containers running on devices like Raspberry Pis. This entails extending the profiling platform to interface with Raspberry Pis, finding a representative collection of containers, and proving the concept by automatically profiling those containers. A comparison with functionally equivalent containers can be useful as well. Resource-efficient profiling (how to get the best results out of a limited number of Raspberry Pis) is another interesting extension.

Prerequisites:

  • Good knowledge of the Linux OS, shell scripting, and OS APIs.
  • Some experience with Docker and Raspberry Pis is a plus. 
     

BA: Profiling WebAssemblies

Microservices (or related concepts like network function chains) are a popular way to structure complex applications into simpler components. Typically, each component runs separately, e.g., in a Docker container or in a virtual machine. While these techniques have advantages, they are also resource-hungry. An alternative idea is to use WebAssembly-based components in microservices, transferring WebAssembly from a browser context to a server context.

The goal of this thesis is to understand the performance characteristics of WebAssembly-based microservices. Such characteristics can be expressed in performance profiles [1], which can be created automatically using proper toolchains [2]. The specific task is to extend our existing toolchain to handle WebAssemblies and to prove that concept by choosing suitable examples and generating such performance profiles. Ideally, this should take place in different contexts (in VMs, possibly on "bare metal", ...).

Prerequisites: 

  • Good knowledge of the Linux OS, shell scripting, and OS APIs.
  • Experience with WebAssembly is clearly a plus

 

Machine Learning


Topics here pertain both to applying machine learning to operate a network or a distributed system and to running machine-learning applications inside a distributed system.

BA/MA: Treating machine learning workflows as microservices

Machine-learning workflows often comprise individual components, acting separately on their input data. In that sense, they are similar to conventional microservices (e.g., three-tier web applications). In other respects, such as their data flows and computational patterns, they are quite different.

It is the goal of this thesis to investigate whether and to what degree ML workflows can be managed (at runtime) similarly to microservice chains. An ideal outcome would be to identify relevant examples, cast them as microservice chains, and show that they can be orchestrated by an existing orchestrator like Open-Source MANO (OSM). A possible approach could be to automatically generate the description files used by such an orchestrator.

Extension to Master thesis: Extending orchestration logic of such an orchestrator to better deal with ML workflows and showing performance benefits.

Prerequisites:

  • Familiar with software-engineering concepts like microservices
  • Familiar with machine-learning concepts
  • Good practical development skills 

  

MA: Workload characterization of ML workloads

When trying to run machine-learning workloads in resource-limited environments, an understanding of their performance characteristics is useful: how much does an ML workload profit from additional cores or additional memory? What does that mean, specifically, for inference or training? If, e.g., training can be split over multiple machines, what data flows ensue, and what are the performance impacts? Overall, how malleable are these workloads?

The goal of this thesis is to extend the notion of a performance profile (so far used mostly for conventional applications) to ML workloads. Then, an existing profiling environment should be extended to handle such workloads, and the concept should be proven by characterizing representative examples.

Prerequisites:

  • Very good understanding of machine-learning techniques and practical implementations
  • Good understanding of concepts like microservices
  • Good implementation and system skills (e.g., scripting) are a clear plus
     

MA: Manage competing ML workflows


Suppose there are limited resources available, for example, in an edge cloud environment. Suppose further that these resources should be shared among conventional applications (e.g., web services), machine-learning inference applications, and machine-learning training applications. In a limited environment, tradeoffs between these applications will be necessary. 

The goal of this thesis is to devise a resource management approach that assigns resources to these competing applications and takes their varying requirements into account. E.g., an ML training application might well be postponed somewhat, but at some point, model accuracy will deteriorate rapidly. Hence, a new concept of fair resource allocation is necessary; that concept needs to be developed and realized by the resource management approach. A proof-of-concept realization should demonstrate that the desired goals are indeed achieved, using representative examples of both conventional and ML applications.

Prerequisites:

  • Very good understanding of machine-learning techniques and practical implementations
  • Very good modeling skills  
  • Good understanding of resource management concepts, e.g., various fairness concepts like max-min fairness
  • Good implementation and practical system skills
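For a single divisible resource, max-min fairness can be computed by progressive filling ("water-filling"): split the remaining capacity equally among all unsatisfied claimants, fully satisfy those whose demand fits within that share, and repeat with the rest. The Python sketch below illustrates this under that single-resource assumption; the application names and demand figures are hypothetical. It treats all claimants alike and ignores time, which is exactly the kind of limitation (e.g., training that can be postponed only up to a point) that a new fairness concept would have to address.

```python
def max_min_fair(capacity, demands):
    """Max-min fair split of one divisible resource via progressive filling.

    capacity: total amount of the resource (e.g., CPU cores).
    demands:  dict mapping claimant name -> requested amount.
    Returns a dict mapping claimant name -> allocated amount.
    """
    alloc = {name: 0.0 for name in demands}
    remaining = float(capacity)
    # Claimants with a positive demand that is not yet fully met.
    unsatisfied = {n: float(d) for n, d in demands.items() if d > 0}

    while unsatisfied and remaining > 1e-12:
        share = remaining / len(unsatisfied)
        # Claimants whose whole demand fits into the current equal share.
        fits = {n: d for n, d in unsatisfied.items() if d <= share}
        if not fits:
            # Nobody can be fully satisfied: everyone left gets the equal share.
            for n in unsatisfied:
                alloc[n] += share
            remaining = 0.0
            break
        for n, d in fits.items():
            alloc[n] += d
            remaining -= d
            del unsatisfied[n]
    return alloc


# Hypothetical edge cloud with 10 cores and three competing applications.
print(max_min_fair(10, {"web": 2, "inference": 4, "training": 8}))
```

Here the web service and the inference application receive their full demands (2 and 4 cores), and training receives the remaining 4 cores.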
     

MA: Line-rate ML

Machine-learning applications can be used in situations where very fast decisions are necessary, e.g., when operating on individual packets in a router or switch. A conventional approach (receive the packet, copy it into user space, let an ML inference application work on it, and inject it back into the network stack) works fine if there is ample time. But when packets need to be processed at the speed at which they arrive, without causing delay (so-called "line-rate processing"), such simplistic approaches do not suffice.
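To give a sense of the time budget involved, the following back-of-the-envelope calculation assumes, purely for illustration, a 10 Gbit/s Ethernet link carrying minimum-size 64-byte frames, each accompanied on the wire by an 8-byte preamble and a 12-byte inter-frame gap:

```latex
r_{\mathrm{pkt}} = \frac{10 \times 10^{9}\,\mathrm{bit/s}}{(64 + 8 + 12)\,\mathrm{B} \times 8\,\mathrm{bit/B}}
  = \frac{10^{10}}{672}\,\mathrm{s^{-1}} \approx 14.88 \times 10^{6}\ \text{packets/s},
\qquad
t_{\mathrm{pkt}} = \frac{1}{r_{\mathrm{pkt}}} \approx 67.2\,\mathrm{ns}
```

At a 3 GHz clock, this corresponds to roughly 200 CPU cycles per packet for all processing, including any inference.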

The goal of this thesis is therefore to investigate techniques by which ML-based inference (possibly also providing input for learning) can be achieved at line rate. The thesis entails concept development, prototypical implementation, example selection, and demonstration of a proof of concept.

      
Prerequisites:

  • Very good understanding of machine-learning techniques and practical implementations
  • Very good operating-system-level implementation skills (e.g., device drivers) are a real plus!
  • Experience with low-level hardware (e.g., network drivers, P4) is a real plus!
     

BA: Quality of Learning

For conventional applications (e.g., three-tier Web servers, video streaming, gaming), the notions of Quality of Service and Quality of Experience are well understood: they describe quantitative, measurable low-level metrics (like data rate) or user-perceived metrics (like the mean opinion score of video quality).

For machine-learning applications, specifically when training happens in a distributed fashion, such metrics are not well developed and not tied in with these conventional metrics. It is the goal of this thesis to develop concepts for suitable machine-learning metrics (e.g., Quality of Learning, Model accuracy, ...) and to characterize example applications using these metrics.

Prerequisites:

  • Very good understanding of machine-learning techniques and practical implementations

BA: Distributed ML over Bundle

Distributed machine learning has received considerable attention, e.g., in the form of Federated Learning (Google). Most schemes assume constant network connectivity to exchange data or model updates. But what happens if distributed learning takes place in an environment where devices are only intermittently connected?

For such environments, protocols to exchange data do exist, for example, the Bundle protocol from the Delay-Tolerant Networking community. The goal of this thesis is to take scenarios for distributed ML and examine what happens if the underlying network is intermittently connected and a protocol like Bundle is used. How does this affect learning progress? Can data forwarding be prioritized meaningfully by exploiting the knowledge that the traffic is machine-learning related?

Prerequisites:

  • Good understanding of machine-learning techniques and practical implementations
  • Good understanding of networking basics
  • Good practical implementation skills