Prof. Dr. Holger Karl

Open Bachelor and Master Theses Topics

Topic areas: 

BA/MA: Topics suited primarily for a Bachelor or Master thesis; using them otherwise is usually possible as well but needs to be discussed.

Orchestration and Management

Orchestration and management refers to handling pieces of software, individually or combined, when they are deployed into a distributed system. A typical example is handling microservices.

MA: Orchestrate microservice chains with WebAssemblies (RESERVED)

Deploying microservices in a complex environment comprising core, edge, and far clouds requires so-called orchestration functions: deciding how many instances of a component are needed to deal with load, where which component runs, which instance deals with which traffic flows, etc. This entails lifecycle management of these components: starting, stopping, migrating, state transfer, etc. Typically, components are realized as virtual machine images or containers, which are relatively easy to manage but heavy-weight.

An alternative idea is to use WebAssemblies [1]. As they come from a browser context, it is not clear whether they are suitable to act as components in such chains. The goal of this thesis is to develop a concept for how to integrate WebAssemblies into such chains, which lifecycle-management approaches are suitable, and how they can be orchestrated. As a proof of concept, this orchestration functionality should be integrated into a common open-source orchestrator, e.g., Open-Source MANO [2].
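A minimal sketch of what lifecycle management for such components might look like, assuming a simple hypothetical state model (states, actions, and the component name are illustrative, not taken from any existing orchestrator):

```python
from enum import Enum, auto

class State(Enum):
    STOPPED = auto()
    RUNNING = auto()
    MIGRATING = auto()

# Allowed lifecycle transitions for a managed component (hypothetical model)
TRANSITIONS = {
    (State.STOPPED, "start"): State.RUNNING,
    (State.RUNNING, "stop"): State.STOPPED,
    (State.RUNNING, "migrate"): State.MIGRATING,
    (State.MIGRATING, "resume"): State.RUNNING,
}

class Component:
    def __init__(self, name):
        self.name = name
        self.state = State.STOPPED

    def apply(self, action):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal action {action!r} in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state

c = Component("wasm-filter")
c.apply("start")    # STOPPED -> RUNNING
c.apply("migrate")  # RUNNING -> MIGRATING (state transfer would happen here)
c.apply("resume")   # MIGRATING -> RUNNING
```

An orchestrator would drive such transitions for many components at once; part of the thesis is deciding which states and transitions actually make sense for WebAssembly components.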


  • Familiar with microservices, virtual function chains, or similar concepts
  • Good software engineering skills   
  • Good knowledge of Linux OS, shell scripts, OS API.
  • Familiarity with cloud computing and typical toolchains clearly a plus 


MA: Build chain for multi-version executables

Services are being deployed into conventional clouds, but more and more also into new systems like edge clouds, "far" clouds, or so-called fog computing setups. These systems feature highly heterogeneous devices of very different capabilities, with very different connectivity. Dealing with data flow is a problem in such contexts, but so is dealing with software distribution and deployment. One idea is to flexibly distribute different versions of software artefacts, ranging from full-fledged virtual machine images down to mere source code. When deploying such a generalized form of a component, it needs to be built on an edge device: possibly compiling from source code, possibly just downloading Docker layers, etc.

To address this idea, this thesis has two goals. First, design and prototype a build toolchain that is capable of building artefacts based on generalized descriptions of software; this toolchain should leverage and encompass existing CI/CD concepts as much as useful. Second, obtain an understanding of the performance characteristics of using this toolchain on different types of devices, for representative examples of typical microservice software.
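As an illustration of the "generalized description" idea, a minimal Python sketch that dispatches the build step on the artefact kind; the kinds, field names, and commands below are illustrative assumptions, not an actual toolchain design:

```python
# Hypothetical generalized artefact descriptions; the required build effort
# depends on the form in which the artefact is distributed.
def build(artefact):
    kind = artefact["kind"]
    if kind == "vm-image":
        return f"deploy {artefact['image']} as-is"      # nothing to build
    if kind == "container":
        return f"docker pull {artefact['image']}"       # fetch layers only
    if kind == "source":
        return f"git clone {artefact['repo']} && make"  # full build on the device
    raise ValueError(f"unknown artefact kind: {kind}")

steps = [build(a) for a in [
    {"kind": "vm-image", "image": "svc.qcow2"},
    {"kind": "container", "image": "svc:latest"},
    {"kind": "source", "repo": "https://example.org/svc.git"},
]]
```

A real toolchain would, of course, run these steps rather than return them as strings, and would have to weigh build time on the device against transfer volume.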


  • Familiar with build toolchains (Make, Maven, etc.) and microservice software engineering concepts
  • Good knowledge of Linux OS, shell scripts, OS API.
  • Familiarity with cloud computing and typical toolchains clearly a plus 

MA: Placement / Scaling with moveable infrastructure

When deploying and running microservices (or closely related network function chains) to and in edge or core clouds, typical assumptions about these kinds of infrastructure prevail: it is dependable, does not fail, does not move. On that basis, many so-called orchestration algorithms have been designed; these algorithms decide, e.g., how many instances of a service to run, where each instance runs, and which instance serves which data flow.

This mindset, however, changes with new types of infrastructure: vehicles can be seen as a moving cloud, but only vehicles in the vicinity of a particular intersection may be of interest. Fleets of drones can similarly act as (very simple, very specialized) service providers; but they need to hand over service execution once they run out of battery power and have to be replaced by another drone for a few minutes. For such volatile, evolving infrastructures, there is very little in the literature about suitable orchestration concepts.

The goal of this thesis is hence to identify a suitable model for volatile infrastructure, to cast some typical orchestration problems into that model, and to design algorithms for them and evaluate their performance. As this is a fairly open area, the topic is also fairly open, and evolving the concept is clearly part of the thesis assignment.


  • Familiar with cloud computing concepts and microservices / network function virtualization 
  • Ideally also familiar with vehicle-to-anything   
  • Good modeling skills
  • Some experience in one of: optimization problems, heuristic design, machine learning is useful 

MA: A Simian Army meets Machine Learning - Introducing errors into learning and inference

In conventional distributed software systems, the deliberate introduction of faults has proven to be a powerful tool to ensure that programmers prepare for actual malfunctions of such systems. A popular example of this approach is the so-called "Simian Army" concept developed by Netflix: so-called "monkeys" are little programs that inject misbehavior, from simply killing components of a microservice to disconnecting an entire data center from the network. This Simian Army is (according to Netflix's claims) part of their operational system and has substantially improved the resilience and dependability of their systems by ensuring that programmers actually prepare for the worst, since they experience it every day.

The idea for this thesis is to evaluate whether this idea can be translated to the case where the components are not developed by programmers but realized as machine-learning agents. While introducing random variation in learning input is a standard technique, we want to check here whether more substantial perturbations, akin to this Simian Army, make sense in ML-controlled or ML-realized environments as well. This can pertain to service components that are part of a user-facing application; it can also pertain to control and management software of a platform itself. These options should be explored in the thesis. Fault injection could happen during training, during inference, or during continuous training, at different system levels.

The ideal outcome of the thesis is (1) a characterization of which types of services can profit from what type of fault injection and (2) a prototypical implementation of a subset of such services plus fault injection, with a demonstration of improved dependability.
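One very simple form of fault injection during training can be sketched as label flipping; the function, rates, and data below are illustrative assumptions, not a prescribed design:

```python
import random

def inject_label_faults(samples, flip_rate, labels, seed=0):
    """Chaos-style perturbation: randomly flip a fraction of training labels."""
    rng = random.Random(seed)
    out = []
    for x, y in samples:
        if rng.random() < flip_rate:
            y = rng.choice([l for l in labels if l != y])  # pick a wrong label
        out.append((x, y))
    return out

clean = [([0.1], 0), ([0.9], 1)] * 50
noisy = inject_label_faults(clean, flip_rate=0.1, labels=[0, 1])
flipped = sum(1 for a, b in zip(clean, noisy) if a[1] != b[1])
```

More Simian-Army-like perturbations would go beyond the data: killing a worker mid-training, corrupting a model checkpoint, or feeding stale features at inference time.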


  • Familiar with cloud computing concepts and microservices / network function virtualization 
  • Some software engineering experience, in the sense of DevOps and continuous integration/delivery
  • Some machine-learning background 
  • Good programming skills; distributed platforms a plus  

MA: Robustness for ML components - Approximate consensus

A typical way to make a system robust is to execute key components redundantly, for example, three times, and then take a simple majority vote on the result - so-called triple modular redundancy. This approach protects well against simple misbehavior, assuming the additional voting instance is much more robust than the actual components. But it cannot immediately protect, for example, against systematic errors included in the program of the original components. For very high dependability requirements (like in aircraft), three different versions of a component might be developed, which then run in parallel and feed their output into a voting component. The voter is still simple if the components' output is discrete and there is a clear differentiation between correct and incorrect results as well as clear majorities.

It gets more complicated, however, when the output is continuous - then, more complex approaches like approximate consensus are needed. Additional complexity ensues when the components are not generated by a conventional, well-understood algorithm, but by a component based on machine-learning results. This thesis shall investigate such scenarios: components are trained on different input (to induce diversity and robustness) and are then combined to deliver a single result - e.g., by triple modular redundancy or by approximate consensus.
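The two voting schemes can be sketched as follows; the median-with-tolerance check is a deliberately simplified stand-in for real approximate-consensus protocols:

```python
from collections import Counter
from statistics import median

def majority_vote(outputs):
    """Triple modular redundancy for discrete outputs: accept a strict majority."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise ValueError("no majority - voter cannot decide")
    return value

def approximate_consensus(outputs, tolerance):
    """For continuous outputs: take the median, and check that all replicas
    agree within a tolerance (a toy stand-in for approximate consensus)."""
    m = median(outputs)
    if max(abs(o - m) for o in outputs) > tolerance:
        raise ValueError("replica outputs diverge beyond tolerance")
    return m

majority_vote([1, 1, 0])                                   # -> 1
approximate_consensus([0.98, 1.02, 1.00], tolerance=0.05)  # -> 1.00
```

With ML-based replicas trained on different data, choosing the tolerance (and what to do when it is exceeded) is exactly the kind of question the thesis should address.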

The ideal result is a classification of component behavior and voting schemes, as well as their possible combinations, and a characterization of how they increase dependability.


  • Some background in theoretical computer science, distributed algorithms, and in particular consensus 
  • Some background in distributed systems 
  • Knowledge in machine learning 
  • Good programming skills; distributed platforms a plus  



Profiling

In the context here, profiling is the process of obtaining quantitative data about a piece of software: for example, how many resources are necessary to handle a given load? Profiling appears in various contexts, and profiling data is usually a stepping stone for management and orchestration systems.


BA: Profile containers on IoT devices

When managing microservices (or related concepts like network function chains), it is useful to know how many requests per time unit a microservice can handle, given a certain amount of resources. This is typically of interest in a cloud context, where microservices usually run in containers or virtual machines. For those environments, we have developed, in prior work, a system that automates the profiling of such services: it measures their performance when exposed to varying load levels.

With the growing relevance of microservices for Internet-of-Things scenarios and edge or far clouds, profiling on embedded devices becomes interesting. The goal of this Bachelor thesis is to extend our existing profiling platform to incorporate containers running on devices like Raspberry Pis. This entails extending the profiling platform to interface with Raspberry Pis, finding a representative collection of containers, and proving the concept by automatically profiling those containers. A comparison with functionally equivalent containers can be useful as well. Also, resource-efficient profiling (how to get the best results out of a limited number of Raspberry Pis) is an interesting extension.
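As a rough illustration of what such a profiler measures, the sketch below offers varying load to a stand-in service callable and records handled requests per second; on the actual platform, the callable would be a request into a container running on a Raspberry Pi:

```python
import time

def profile(service, load_levels, duration=0.2):
    """Measure requests handled per second at each offered load level.
    'service' is any callable; here a local function stands in for an
    HTTP request into a profiled container."""
    results = {}
    for load in load_levels:
        handled, deadline = 0, time.perf_counter() + duration
        while time.perf_counter() < deadline:
            for _ in range(load):  # offer 'load' requests per iteration
                service()
                handled += 1
        results[load] = handled / duration
    return results

# stand-in workload; replace with a request to the profiled container
profile_data = profile(lambda: sum(range(100)), load_levels=[1, 4, 16])
```

The resulting load-vs-throughput table is the raw material of a performance profile; the Raspberry Pi extension is about collecting it on real embedded hardware.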


  • Good knowledge of Linux OS, shell scripts, OS API. 
  • Some experience with Docker and Raspberry Pis is a plus. 

BA: Profiling WebAssemblies

Microservices (or related concepts like network function chains) are a popular way to structure complex applications into simpler components. Typically, each component runs separately, e.g., in a Docker container or in a virtual machine. While these techniques have advantages, they are also resource-hungry. An alternative idea is to use WebAssembly-based components in microservices, transferring WebAssembly from a browser to a server context.

The goal of this thesis is to understand the performance characteristics of WebAssembly-based microservices. Such characteristics can be expressed in performance profiles [1], which can be created automatically using proper toolchains [2]. The goal here is to extend our existing toolchain to handle WebAssemblies and to prove that concept by choosing suitable examples and generating such performance profiles. Ideally, this should take place in different contexts (in VMs, possibly on "bare metal", ...).


  • Good knowledge of Linux OS, shell scripts, OS API. 
  • Experience with WebAssembly clearly a plus 


Machine Learning

Topics here pertain both to the application of machine learning to operate a network or a distributed system and to running machine-learning applications inside a distributed system.

BA/MA: Treating machine learning workflows as microservices

Machine-learning workflows often comprise individual components, acting separately on their input data. In that sense, they are similar to conventional microservices (e.g., three-tier web applications). In other senses, they are quite different in their data flows, computational patterns, etc.

It is the goal of this thesis to investigate if and to what degree ML workflows can be managed (at runtime) similarly to microservice chains. An ideal outcome would be to identify relevant examples, cast them as microservice chains, and show that they can be orchestrated by an existing orchestrator like Open-Source MANO (OSM). A possible approach could be to automatically generate the description files used by such an orchestrator.
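The descriptor-generation idea could be sketched as below; the field names are purely hypothetical and would have to be replaced by the actual descriptor format of the target orchestrator (e.g., OSM's descriptor files):

```python
import json

# Hypothetical mapping from an ML workflow step to a microservice-style
# descriptor; a real version would emit the target orchestrator's format.
def to_descriptor(step):
    return {
        "name": step["name"],
        "image": step["image"],
        "connection_points": [{"id": f"{step['name']}-in"},
                              {"id": f"{step['name']}-out"}],
        "resources": {"cpu": step.get("cpu", 1),
                      "memory_mb": step.get("memory_mb", 512)},
    }

workflow = [
    {"name": "preprocess", "image": "wf/preprocess:1.0"},
    {"name": "train", "image": "wf/train:1.0", "cpu": 4, "memory_mb": 4096},
]
descriptors = [to_descriptor(s) for s in workflow]
descriptor_file = json.dumps(descriptors, indent=2)  # stand-in for a descriptor file
```

The interesting part is where ML workflows do not fit this mold, e.g., all-to-all data exchange during distributed training rather than a linear chain of connection points.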

Extension to a Master thesis: extending the orchestration logic of such an orchestrator to better deal with ML workflows and showing performance benefits.


  • Familiar with software-engineering concepts like microservices
  • Familiar with machine-learning concepts
  • Good practical development skills 


MA: Workload characterization of ML workloads

When trying to run machine-learning workloads in resource-limited environments, an understanding of their performance characteristics is useful: how much does an ML workload profit from additional cores or additional memory? What does that mean, specifically, for inference or training? If, e.g., training can be split over multiple machines, what data flows ensue and what are the performance impacts? Overall, how malleable are these workloads?

The goal of this thesis is to extend the notion of a performance profile (so far used mostly for conventional applications) to ML workloads. Then, an existing profiling environment should be extended to deal with such workloads, and the concept should be proven by example characterizations of representative workloads.


  • Very good understanding of machine-learning techniques and practical implementations
  • Good understanding of concepts like microservices
  • Good implementation and system skills (e.g., scripting) are a clear plus

MA: Manage competing ML workflows (RESERVED)

Suppose there are limited resources available, for example, in an edge cloud environment. Suppose further that these resources should be shared among conventional applications (e.g., web services), machine-learning inference applications, and machine-learning training applications. In a limited environment, tradeoffs between these applications will be necessary. 

The goal of this thesis is to devise a resource management approach that assigns resources to these competing applications and takes their varying requirements into account. E.g., an ML training application might well be postponed somewhat, but at some point, model accuracy will deteriorate rapidly. Hence, a new concept of fair resource allocation is necessary; that concept needs to be developed and realized by the resource management approach. A proof-of-concept realization should demonstrate that the desired goals are indeed achieved, using representative examples for both conventional and ML applications.
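A classical starting point for such allocations is max-min fairness, computed by progressive filling; the sketch below uses made-up demands and ignores the ML-specific deferability that the thesis would add:

```python
def max_min_fair(capacity, demands):
    """Progressive-filling max-min fair allocation: in each round, every
    unsatisfied application gets an equal share; applications whose demand
    is met release the remainder for the next round."""
    alloc = {app: 0.0 for app in demands}
    remaining = capacity
    active = [a for a in demands if demands[a] > 0]
    while active and remaining > 1e-9:
        share = remaining / len(active)
        for app in active:
            give = min(share, demands[app] - alloc[app])
            alloc[app] += give
            remaining -= give
        active = [a for a in active if demands[a] - alloc[a] > 1e-9]
    return alloc

# e.g. a web service, an inference job, and a (deferrable) training job
alloc = max_min_fair(10, {"web": 2, "inference": 4, "training": 8})
```

Here the web service's small demand is fully met, and the leftover capacity is split evenly; a fairness notion for ML would additionally weigh how urgently training needs its share.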


  • Very good understanding of machine-learning techniques and practical implementations
  • Very good modeling skills  
  • Good understanding of resource management concepts, e.g., various fairness concepts like max-min fairness
  • Good implementation and practical system skills

MA: Line-rate ML

Machine-learning applications can be used in situations where very fast decisions are necessary, e.g., when operating on individual packets in a router or switch. A conventional approach - receive the packet, copy it into user space, let an ML inference application work on it, and inject the packet back into the network stack - works fine if there is ample time. But when packets need to be processed at the speed at which they arrive, without causing delay - so-called processing "at line rate" - then such simplistic approaches do not suffice.

The goal of this thesis is therefore to investigate techniques by which ML-based inference (possibly also input into learning) can be achieved at line rate. The thesis entails concept development, prototypical implementation, example selection, and demonstration of a proof of concept.


  • Very good understanding of machine-learning techniques and practical implementations
  • Very good operating-system-level implementation skills (e.g., device drivers) are a real plus! 
  • Experience with low-level hardware (e.g., network drivers, P4) is a real plus! 

BA: Quality of Learning

For conventional applications (e.g., three-tier Web servers, video streaming, gaming), the notions of Quality of Service and Quality of Experience are well understood: they describe quantitative, low-level measurable metrics (like data rate) or user-perceived metrics (like a mean opinion score for video quality).

For machine-learning applications, specifically when training happens in a distributed fashion, such metrics are not well developed and not tied in with these conventional metrics. It is the goal of this thesis to develop concepts for suitable machine-learning metrics (e.g., Quality of Learning, model accuracy, ...) and to characterize example applications using these metrics.
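A few candidate "Quality of Learning" metrics can be sketched over a recorded accuracy curve; both the metrics and the example curve below are illustrative assumptions, not an established definition:

```python
def quality_of_learning(accuracy_per_round, target=0.9):
    """Toy metrics over a training run: final accuracy, rounds needed to
    reach a target accuracy (None if never reached), and the largest drop
    below the best accuracy seen."""
    reached = next((r for r, a in enumerate(accuracy_per_round) if a >= target), None)
    best = max(accuracy_per_round)
    worst_dip = max(best - a for a in accuracy_per_round)
    return {"final": accuracy_per_round[-1],
            "rounds_to_target": reached,
            "worst_dip": worst_dip}

qol = quality_of_learning([0.5, 0.7, 0.85, 0.92, 0.91])
```

Tying such metrics to network-level QoS (e.g., how rounds-to-target grows as data rate shrinks) is the actual research question.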


  • Very good understanding of machine-learning techniques and practical implementations

BA: Distributed ML over Bundle

Distributed machine learning has received considerable attention, e.g., in the form of Federated Learning (Google). Most schemes assume constant network connectivity to exchange data or model updates. But what happens if distributed learning  takes place in an environment where devices are only intermittently connected?

For such environments, protocols to exchange data do exist, for example, the Bundle protocol from the Delay-Tolerant Networking community. The goal of this thesis is to take scenarios for distributed ML and check what happens if the underlying network is intermittently connected and a protocol like Bundle is used. How does this affect learning progress? Can data forwarding be prioritized meaningfully, knowing that it is machine-learning related?
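The effect of intermittent connectivity on a FedAvg-style scheme can be sketched minimally: only updates from currently connected devices enter a round's average (device names and parameter values are illustrative):

```python
def federated_round(global_model, local_updates, connected):
    """One FedAvg-style round in which only 'connected' devices deliver
    their update; bundles from disconnected devices would arrive in a
    later round (not modeled here)."""
    delivered = [u for d, u in local_updates.items() if d in connected]
    if not delivered:
        return global_model  # no progress this round
    # average the delivered model parameters (plain lists stand in for tensors)
    return [sum(vals) / len(delivered) for vals in zip(*delivered)]

updates = {"dev-a": [1.0, 2.0], "dev-b": [3.0, 4.0], "dev-c": [5.0, 6.0]}
model = federated_round([0.0, 0.0], updates, connected={"dev-a", "dev-c"})
```

With the Bundle protocol, dev-b's update is not lost but delayed; whether stale updates should still be averaged in, and with what priority they should be forwarded, is what the thesis investigates.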


  • Good understanding of machine-learning techniques and practical implementations
  • Good understanding of networking basics
  • Good practical implementation skills 


Wireless networks

Wireless networks have characteristic challenges not present in wired networks: Users move around, wireless channel quality can change rapidly, the transmission techniques are considerably more complex and have substantially more control options. In particular, the resource management problems become harder but must be solved faster. 

Machine learning for Resource Management in CoMP networks

Coordinated multi-point (CoMP) is a cellular transmission technique where a mobile user is supported by multiple base stations simultaneously, for example, to stabilize throughput for users at the edge of wireless cells. This entails complex resource management and scheduling problems (which PRBs of which cell to use; how to schedule users across multiple cells, depending on the specific CoMP technique; ...). In prior work, we have tackled the downlink CoMP problem by using Reinforcement Learning in a simplified model. The goal of this thesis would be to make the wireless model more precise (possibly restructuring the learning problem in multiple ways) or to look at the uplink case. To undertake this thesis, you should have at least some prior exposure to machine learning and networking (wireless networking strongly recommended).

Assigned Topics and ongoing theses

MA: Terraform goes IoT (ASSIGNED)

Cloud computing happens in a competitive environment: Multiple vendors offer cloud resources under incompatible APIs. On the other hand, spreading an application (e.g., a microservice-based chain of components) over multiple clouds from different vendors can have commercial and technical advantages. Terraform [1] is a popular tool to bridge API gaps between vendors and hide them under a uniform interface.

The goal of this thesis is to extend this idea to also incorporate IoT devices and very slim-lined "far cloud" scenarios into Terraform: extend Terraform in such a fashion that components can be deployed in such contexts as well. This entails obtaining an understanding both of the options to run software on such devices and of Terraform itself. A subgoal is to design a proper extension; the proof of concept lies in demonstrating the ability to run a cloud application via Terraform on either a conventional cloud or an IoT/far cloud.


  • Good knowledge of Linux OS, shell scripts, OS API. 
  • Experience with cloud computing and typical toolchains clearly a plus.