Software-Defined Data Protection: Low Overhead Policy Compliance at the Storage Layer is Within Reach!

Zsolt István, TU Darmstadt

Abstract

This talk by Prof. Dr. Zsolt István is about the problems and challenges of making systems GDPR-compliant. It also discusses a possible solution to some of those challenges: achieving low-overhead policy compliance at the storage layer using specialized hardware. To achieve this goal, the processing inside the storage layer is decoupled into enforcement actions and decision-making actions. That way, the benefits of the specialized hardware can be used without suffering from the drawbacks of handling complex and nested problems in hardware.

Biography

Since October 2021, Zsolt István has been Professor of Distributed Systems and Networking at the Technical University of Darmstadt, Germany. His academic career began with an engineering degree in computer science at the University of Cluj-Napoca (2011). He then received his master's degree in Distributed Systems from ETH Zurich (2013). He went deeper into research with his PhD at ETH Zurich, supervised by Prof. Dr. Gustavo Alonso. Before his current position, he was an Associate Professor at the IT University of Copenhagen, Denmark, and an Assistant Research Professor at the IMDEA Software Institute in Madrid, Spain.

Summary

written by Florian Krummrey, Lasse Jahn

The talk first introduces some technical terms and concepts. István then describes the assumptions made in his research, derives the resulting challenges and requirements, and explains how these challenges can be addressed. Next, he presents the full idea and the results of his research. Finally, he summarizes the problems his idea solves and the challenges that still remain.

Background

FPGA

An FPGA (Field-Programmable Gate Array) is a type of specialized hardware that allows logic to be implemented directly in hardware. This is accomplished by mapping the design onto configurable logic blocks and structuring these into modules, which are then wired together to form the corresponding circuits. A hardware description language (HDL) or something similar is used to describe and communicate the design of the FPGA. Of course, the ability to implement code in hardware comes at the cost of that code occupying chip space. This is especially prominent for conditional statements, because in hardware both paths are built. On the other hand, because both paths exist, they can be evaluated in parallel, which gives at least a performance benefit for conditional statements. In general, small, specific tasks can be accelerated with FPGAs. However, mapping large, complex problems to hardware often runs into space limitations.
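The point about conditional statements can be illustrated with a small software model. The following Python sketch is only an analogy, not FPGA code. It assumes a hypothetical circuit whose two branch functions (`then_path` and `else_path`, names chosen here for illustration) are both evaluated for every input. The condition merely drives a 2:1 multiplexer that selects one of the two results.

```python
def then_path(x: int) -> int:
    """Hypothetical 'then' branch: keep only the low byte."""
    return x & 0xFF


def else_path(x: int) -> int:
    """Hypothetical 'else' branch: pass the value through unchanged."""
    return x


def mux(select: bool, if_true: int, if_false: int) -> int:
    """2:1 multiplexer: forwards one of two already-computed inputs."""
    return if_true if select else if_false


def hardware_if(cond: bool, x: int) -> int:
    # Both branches exist as separate logic (each occupies chip area)
    # and both are evaluated for every input; in a real circuit this
    # happens in parallel.
    a = then_path(x)
    b = else_path(x)
    # The condition does not cause a jump; it only drives the select line.
    return mux(cond, a, b)


print(hardware_if(True, 0x1234))   # 52 (0x34)
print(hardware_if(False, 0x1234))  # 4660 (0x1234)
```

In a real design, every extra branch therefore costs chip area. This is why large, branch-heavy logic is hard to fit on an FPGA.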

GDPR

The GDPR (General Data Protection Regulation) contains, as the name says, articles regulating data protection. In simplified form, these are:

Right to be forgotten (being able to completely delete data)
Detect potential data breaches
All data is associated with a purpose
Users can object selectively to the use of their data
Protection against accidental loss or damage
Adhere to standards regardless of physical data location
Resist and detect malicious activities that compromise data
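Some of these requirements can be made more concrete in code: purpose binding, selective objection and the right to be forgotten. The following Python sketch attaches purpose metadata to every stored record. The record layout, the `Store` class and its method names are hypothetical choices for illustration. The talk does not prescribe a data format.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    owner: str                  # data subject the record belongs to
    value: bytes
    purposes: set[str]          # purposes the data was collected for
    objections: set[str] = field(default_factory=set)  # purposes the owner objected to


class Store:
    """Hypothetical key-value store that checks purposes on every read."""

    def __init__(self) -> None:
        self.records: dict[str, Record] = {}

    def put(self, key: str, record: Record) -> None:
        self.records[key] = record

    def read(self, key: str, purpose: str) -> bytes:
        rec = self.records[key]
        # Every access must name a purpose the data is associated with,
        # and the owner must not have objected to that purpose.
        if purpose not in rec.purposes or purpose in rec.objections:
            raise PermissionError(f"{purpose!r} not allowed for {key!r}")
        return rec.value

    def add_objection(self, owner: str, purpose: str) -> None:
        # Selective objection: one purpose is revoked, others stay valid.
        for rec in self.records.values():
            if rec.owner == owner:
                rec.objections.add(purpose)

    def forget(self, owner: str) -> int:
        # Right to be forgotten: drop every record of this owner.
        keys = [k for k, r in self.records.items() if r.owner == owner]
        for k in keys:
            del self.records[k]
        return len(keys)


store = Store()
store.put("k1", Record("alice", b"invoice", {"billing", "analytics"}))
store.add_objection("alice", "analytics")
print(store.read("k1", "billing"))  # b'invoice'
print(store.forget("alice"))        # 1
```

A real system would also have to erase copies on the underlying storage media. The sketch only removes the dictionary entries.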

Building systems that comply with these regulations is no trivial task, and violations can be very expensive. Therefore, many companies want systems in place that fulfill the regulations automatically. However, current solutions come at the cost of a significant performance overhead.

Rethink Processing in Storage Layer

As explained in the introduction, the general goal of the research was to make policy compliance available with little or no overhead. István therefore encourages rethinking the processing inside the storage layer. Traditionally, the overhead has been tackled with more specialized and more expensive hardware. This section presents the software-based approach proposed by István. Because the research was done in the field of policy compliance and the GDPR, the approach is called Software-Defined Data Protection (SDP). Before going into the details of István's solution, the first subsections give an overview of the context and assumptions and of heterogeneous hardware. These lead to the implications for the storage layer and the decoupling inside it. Finally, the SDP pipeline and the challenges of SDP are considered.

Context and Assumptions

To make the approach easier to follow, this subsection briefly describes the setting and its general assumptions. The setting is a company that stores its data in a distributed manner. As shown in Figure 1, the company tries to generate insights from this data using analytics and other applications. The general goal is to support this processing in a GDPR-compliant and more secure manner. István's solution, discussed in detail in the following subsections, is based on the following assumptions:

The company itself is trusted and tries to act correctly.
The storage nodes themselves are not trusted; they might be physically tampered with.
Analytics and applications can be malicious.
A trusted remote "controller" exists.

Heterogeneous Hardware

According to István, the processing overhead can generally be avoided by using specialized or heterogeneous hardware. This is possible mainly because processing can be pipelined, for example on programmable logic such as FPGAs. On an FPGA, additional functionality costs extra space rather than extra time. That is why simple but computationally intensive operations may benefit much more from such hardware than complex logical decision problems do. These facts lead to a design that resolves the tension between adding policy compliance and avoiding performance overhead. The general design is described in the following subsection.
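The claim that additional functionality costs space rather than time can be made concrete with a simple cycle count. The following Python sketch uses an idealized model that does not come from the talk:

- Every check takes exactly one clock cycle.
- A software implementation runs all checks one after another for each item.
- The hardware pipeline has one stage per check and accepts one new item per cycle.

```python
def sequential_cycles(n_items: int, n_checks: int) -> int:
    """Software model: each item runs through every check in turn."""
    return n_items * n_checks


def pipelined_cycles(n_items: int, n_stages: int) -> int:
    """Pipeline model: the first item needs n_stages cycles,
    after which one finished item leaves per cycle."""
    if n_items == 0:
        return 0
    return n_stages + n_items - 1


n = 1_000_000
for checks in (4, 5):
    print(checks, sequential_cycles(n, checks), pipelined_cycles(n, checks))
# 4 4000000 1000003
# 5 5000000 1000004
```

Under this model, a fifth check adds 1,000,000 cycles in software but only one cycle of latency to the pipeline. The price is the chip area of the extra stage.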

Decouple Decision from Enforcement

Because GDPR compliance can become complex very quickly and might overload the FPGAs, a "divide and conquer" approach is used to split the complex problem into two smaller subproblems: enforcement is decoupled from decision making. That way, "simple" tasks can be done in specialized hardware and keep the performance benefit. The more complex decision problems are often highly individual and need flexible solutions. In hardware, their complexity translates less into processing time than into space, because of all the states and edge cases that have to be thought through. To illustrate the terms enforcement and decision, Figure 2 from the presentation, with different examples in the context of the GDPR, is included here.
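The split can be sketched in a few lines of Python, using a hypothetical policy based on consent sets and a retention date. `decide` stands for the flexible, possibly complex decision logic that would run in software. `enforce` is reduced to a default-deny table lookup, the kind of simple operation that maps well to specialized hardware. `compile_rules` is an assumed helper that runs the decisions ahead of time and produces the table. None of these names come from the talk.

```python
from datetime import date


def decide(consents: dict[str, set[str]], retention_until: date,
           today: date, user: str, purpose: str) -> bool:
    """Decision side: arbitrary logic with many edge cases (here only two)."""
    if today > retention_until:
        return False  # data past its retention period must not be used
    return purpose in consents.get(user, set())


def compile_rules(consents: dict[str, set[str]], retention_until: date,
                  today: date, purposes: list[str]) -> dict[tuple[str, str], bool]:
    """Run all decisions up front and flatten them into a lookup table."""
    return {(user, purpose): decide(consents, retention_until, today, user, purpose)
            for user in consents for purpose in purposes}


def enforce(rules: dict[tuple[str, str], bool], user: str, purpose: str) -> bool:
    """Enforcement side: a single lookup; deny if no rule exists."""
    return rules.get((user, purpose), False)


consents = {"alice": {"billing"}, "bob": {"billing", "analytics"}}
rules = compile_rules(consents, date(2030, 1, 1), date(2024, 1, 1),
                      ["billing", "analytics"])
print(enforce(rules, "alice", "analytics"))  # False
print(enforce(rules, "bob", "analytics"))    # True
```

However complex `decide` becomes, `enforce` stays the same size. That is the property the decoupling is meant to preserve.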

SDP Pipeline

Decoupling enables a simpler interface without adding performance overhead. This idea can be generalized into the Software-Defined Data Protection (SDP) pipeline (see Figure 3). The different GDPR requirements can be mapped to hardware implementations. These access certain areas of memory that are also used by the controller, and this shared memory establishes the interface between enforcement and decision making.

Because of this separation, the hardware implementations become largely independent of the controller and only need to implement the common interface. The same applies to the controller. The decoupling therefore brings three main benefits:

Even complex problems such as GDPR compliance can be supported by specialized hardware without blowing up costs and with little or no performance overhead.
The interface simplifies the specialized hardware logic and its implementation. The enforcement layer can always call back to the controller if the next action is unclear.
The interface could be developed further so that the functionality of distributed smart storage can be standardized and possibly reused or partially exchanged.
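The interface and the callback described in the list above can be sketched in Python. The memory area shared by controller and enforcement is modeled as a dictionary. `Controller`, `EnforcementEngine` and the consent-based policy are hypothetical. On a miss, the enforcement side calls back the controller once and caches the answer. The controller can also change a decision at any time by writing to the shared table directly.

```python
from collections.abc import Callable

Key = tuple[str, str]  # (user, purpose)


class Controller:
    """Trusted, remote decision maker (hypothetical consent-based policy)."""

    def __init__(self, consents: dict[str, set[str]], table: dict[Key, bool]) -> None:
        self.consents = consents
        self.table = table      # memory area shared with enforcement
        self.callbacks = 0      # how often enforcement had to ask

    def resolve(self, user: str, purpose: str) -> bool:
        self.callbacks += 1
        return purpose in self.consents.get(user, set())

    def revoke(self, user: str, purpose: str) -> None:
        # Decisions are updated by writing to the shared table.
        self.consents.get(user, set()).discard(purpose)
        self.table[(user, purpose)] = False


class EnforcementEngine:
    """Storage-side check: only reads the shared table or calls back."""

    def __init__(self, table: dict[Key, bool],
                 callback: Callable[[str, str], bool]) -> None:
        self.table = table
        self.callback = callback

    def check(self, user: str, purpose: str) -> bool:
        key = (user, purpose)
        if key not in self.table:
            # Next action unclear: ask the controller, then cache the answer.
            self.table[key] = self.callback(user, purpose)
        return self.table[key]


shared: dict[Key, bool] = {}
controller = Controller({"bob": {"billing", "analytics"}}, shared)
engine = EnforcementEngine(shared, controller.resolve)
print(engine.check("bob", "analytics"), controller.callbacks)  # True 1
print(engine.check("bob", "analytics"), controller.callbacks)  # True 1
controller.revoke("bob", "analytics")
print(engine.check("bob", "analytics"), controller.callbacks)  # False 1
```

The enforcement side never evaluates the policy itself, which keeps it small. The controller remains free to change how decisions are made.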

Challenges

Even though SDP provides a well-designed interface, some challenges remain. Most of them concern the controller rather than the hardware implementation. Some might also be beyond the scope of what the SDP approach tries to solve. István explained the following challenges in more detail.

Translation from law into bits: In general, laws are not designed to be expressed as application logic. That is why translating them into hardware rules and similar artifacts is a complex task. Decoupling decision from enforcement might help reduce redundant work, but it does not solve the problem itself.

Trust Firmware: The assumptions only considered tampering with offline storage. If the storage is online, the leakage of keys or other sensitive data remains a challenge. For distributed storage in the cloud in particular, a Trusted Execution Environment (TEE) might be necessary.

Tracking data beyond storage: Tracking data once it leaves the storage layer is more of a long-term question. More concretely, specialized hardware might be helpful in this scenario, but this was not explained in more detail.

Conclusion

Ensuring privacy and policy compliance is a time-consuming process. With SDP, there is now a way to offer policy compliance with little or no performance overhead. Decisions are split from enforcement, and the enforcement can then be offloaded to state-of-the-art specialized hardware. However, challenges remain. István closed with an outlook and a wish: in the future, a unified interface could have a huge positive impact. SDP might be one possible candidate, but István explicitly did not claim it to be the only solution. Such a unified interface might be an opportunity to support the work around policy compliance. It would be conceivable to have a framework and avoid always starting from scratch.
