Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

Ethical Data Engineering

Birgit Beck, TU Berlin

Abstract

Is there actually anything like “ethical data engineering”? At first sight, ethics and data engineering do not appear to have much in common, both with regard to content and from a methodological point of view. Ethics as an academic philosophical discipline applies theoretical, conceptual and analytic methods of enquiry and is not, in the first place, concerned with the empirical task of collecting, structuring and interpreting (big) data. However, data engineering obviously raises ethically relevant concerns. Moreover, since ethics as a philosophical discipline is based on conceptual analysis, it appears reasonable to start with scrutinising how the notion of “ethical data engineering” can be properly understood. In this regard, I will propose three different possible meanings of the notion “ethical data engineering”:


(1) Data engineering with the aim of supporting or facilitating ethical reasoning and moral decision making, respectively.

(2) Data engineering with the aim of pursuing ethically warranted and morally laudable or even mandatory objectives.

(3) Data engineering which takes into account ethically warranted moral principles from the outset.

In my talk, I will explain the three proposed readings of the notion “ethical data engineering”, and provide examples for each of them in turn in order to stimulate discussion.

Biography

Prof. Dr. Birgit Beck (born on 27.02.1979 in Passau) studied philosophy and history (Ancient History/Medieval and Modern History) at the Universities of Regensburg and Passau. In 2008 she received her Magistra Artium at the University of Passau, where she also obtained her doctorate in philosophy in 2012.

Beck worked as a research assistant at the Research Centre Jülich, Institute for Ethics in the Neurosciences. Since September 2017, Beck has been an Assistant Professor (Juniorprofessorin) and Head of the Department for “Ethics and Philosophy of Technology” at TU Berlin’s Institute of History and Philosophy of Science, Technology, and Literature.[1]

Among other topics, she is currently researching the ethical and societal consequences of technological innovations. She recently published an edited volume on bioethics[2], the volume “Technology, Anthropology, and Dimensions of Responsibility”[3], and the articles “The ART of Authenticity”[4] and “Infantilisation Through Technology”[5].

Summary

At first glance, ethics and Data Engineering do not seem to have much in common. While ethics is theoretical and conceptual, engineering is practical and empirical. How does that fit together?

In this lecture, Prof. Dr. Birgit Beck aims to determine whether there is such a thing as Ethical Data Engineering and, if so, to define it.

In the following, we summarize the lecture, outline her main ideas, and include some thoughts brought up in the discussion. We first give an overview of the terminology used and of frequently occurring confusions. A section on the three approaches Beck uses to describe Ethical Data Engineering follows. We close with some thoughts on challenges for Ethical Data Engineering.

Terminology

To be able to define the term Ethical Data Engineering, we first need to be clear about the meaning of its constituent terms.

Data Engineering describes the process of working with Big Data and developing (self-learning) algorithms for particular applications. It consists of collecting, structuring and validating data and thus is an empirical task.

When speaking of ethics, it is necessary to differentiate between ethics and morality. Morality encompasses the actual moral rules or practices in a given society, as opposed to rules of courtesy and politeness, which rest on mere societal convention, or legal rules defined by law. Ethics, on the other hand, is concerned with the theory of morality and splits into a descriptive and a normative branch. Describing moral rules is actually an empirical task: descriptive ethics investigates people’s beliefs about what is right and what is wrong, and sociologists and psychologists, for example, work in this field. Philosophers, in contrast, work in the field of normative ethics, which is about the assessment and justification of moral rules. They try to answer the question of how people should act and what is morally good. This is a theoretical, conceptual and reflective task.

So “ethical” does not mean “morally good”; it only states that something refers to ethical theory. Data Engineering itself is not an ethical task because it has nothing to do with normative ethics. But we can assess whether Data Engineering is moral, for example by reviewing its actions and their consequences.

Strictly speaking, then, Ethical Data Engineering should actually be called morally good Data Engineering.

Data engineering with the aim of supporting or facilitating ethical reasoning and (individual) moral decision making

Even though ethics is concerned with the theory of morality and not with the empirical task of collecting, structuring or validating data, these Data Engineering methods can be used to digitise ethical theorising. Ethical Data Engineering could thus mean using Data Engineering to support or facilitate ethical reasoning and moral decision making. An example is the idea of an “Artificial Moral Advisor”[6], or “AMA” for short. The AMA could be programmed with a person’s morals so that it could reflect these morals when deciding how the person should behave in certain day-to-day situations. Generally in life, we want to comply with our own moral standards, yet that is not possible in every situation we encounter daily. Take, for example, a person who wants to help protect the environment by separating waste correctly. He or she may want to bin an empty cup, but does not know which materials the cup is made of, which recycling process it needs to go through, or which company will recycle the cup in the best way. The person would need to research all these questions in order to make a fact-based decision on where to throw the empty cup. In everyday life, however, most people do not have the time to do all this research and to weigh all their options to reach a decision that reflects their own morals. That is where the AMA comes in. With its computing resources, it could investigate all important and realistic options and give a recommendation on what to do based on its training data. It could thus function as a moral advisor or even persuader, along the lines of “If these are your morals, then you should do this and that”, and help people behave better.

But how do we make sure the advisor has good morals? We cannot rely solely on the morals of individual people, because they have faults and will not behave morally in every decision. Imagine, for example, an immoral person who wants to cause chaos and suffering in the world. Could the AMA advise this person to kill themselves? What do we do if the AMA produces advice that strongly conflicts with our intuition or our need for happiness?
The question of good morals also depends on the underlying school of ethics. If the AMA comes from a background of negative utilitarianism, it might advise humanity to become extinct, firstly to end the suffering of humans, because living means suffering, and secondly to end the suffering of animals, plants, and the planet itself. But if the AMA were programmed to be hedonistic, it could propose exploiting the planet even more for momentary pleasure and indulgence, regardless of the long-term consequences, because people deserve to be happy.

To overcome these difficulties, experts in ethics need to set constraints as basic filters for the AMA to rule out certain options.
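
To make the division of labour in this proposal more concrete, the following is a minimal sketch in Python of how a constraint-filtered advisor could be structured: expert-defined hard constraints rule options out first, and only then are the user's own value weights applied. All option names, value labels, and weights are invented for illustration; this is not the actual AMA design of Giubilini and Savulescu.[6]

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Option:
    """A possible action, scored along hypothetical value dimensions (0 to 1)."""
    name: str
    scores: Dict[str, float]

# Hard constraints set by ethics experts: options carrying these labels are ruled out.
FORBIDDEN_LABELS = {"harms_others", "deceives_user"}  # illustrative labels only

def violates_constraints(option: Option) -> bool:
    return any(option.scores.get(label, 0.0) > 0.0 for label in FORBIDDEN_LABELS)

def advise(options: List[Option], user_values: Dict[str, float]) -> Optional[Option]:
    """Recommend the admissible option that best matches the user's declared values."""
    admissible = [o for o in options if not violates_constraints(o)]
    if not admissible:
        return None  # nothing passes the expert-defined ethical filter
    return max(
        admissible,
        key=lambda o: sum(weight * o.scores.get(value, 0.0)
                          for value, weight in user_values.items()),
    )

# The recycling scenario from the lecture, with made-up numbers.
options = [
    Option("paper bin", {"environment": 0.9, "convenience": 0.6}),
    Option("general waste", {"environment": 0.1, "convenience": 0.9}),
]
best = advise(options, user_values={"environment": 0.8, "convenience": 0.2})
print(best.name)  # -> "paper bin" for this value profile
```

Even this toy version shows where the hard questions sit: who defines the forbidden labels, and who decides how a user's values are translated into weights.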

Another question that emerges is: “What are the consequences of having such an artificial advisor?” If people started relying on the advisor’s suggestions, the field of ethics would shrink, because it would kill the discussion about ethics. Also, would we allow the advisor to advance on its own and let it learn from the consequences of its own actions? And if we did that, could humanity still progress and develop further? People would be tempted to give up responsibility for their own actions and lose autonomy.

If we as IT systems engineers could use this advisor, it would be nice if it were able to tell us whether we should even compile certain source code, and whether using a certain program or providing it to others actually complies with our own moral standards. But this raises the question of whether there can even be an artificial intelligence that knows whether code is good or bad: we cannot really assess what value the code will provide or what results it will bring, because that depends on the data that is put into the program and the purpose it is used for.

Given all these questions, and probably many more, there needs to be a thoughtful and thorough discussion on the ethical foundation of such an artificial advisor if it is ever going to be used in real life.

Data Engineering with the aim of pursuing morally good goals

The second approach to defining Ethical Data Engineering is to consider its aim to be the realization of ethically justified and morally good goals. Data Engineering might offer more and/or different ways of achieving such goals. Beck discusses four diverse objectives that are widely regarded as ethically justified and morally good: justice, health, safety, and sustainability.

Regarding justice, an often discussed topic is automated decision making. To maximize the fairness of the process and its outcome, such applications could be implemented with the help of Data Engineering; use cases like the selection of job applicants could thereby be improved.
Automating human decision-making with algorithms is often claimed to remove bias. A lot of literature argues against this claim and shows the opposite: algorithms inevitably make biased decisions, because they reflect, if only to a small degree, their designers’ values and intended uses.[7]
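
That such bias claims can at least be examined empirically is easy to illustrate. The sketch below, with entirely invented data, computes the selection rate per group for a hypothetical automated applicant-screening step; a large gap between groups is a reason to investigate the model and its training data, not yet proof of unfair treatment.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def selection_rates(decisions: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of applicants selected per group, for auditing purposes."""
    selected = defaultdict(int)
    total = defaultdict(int)
    for group, was_selected in decisions:
        total[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / total[group] for group in total}

# Invented outcomes of an automated screening step: (group label, selected?).
decisions = [
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False),
]
print(selection_rates(decisions))  # group_a ≈ 0.67, group_b ≈ 0.33 in this toy data
```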

The medical sector is expected to be improved through the use of Big Data. Prof. Beck mentions personalised health care, preventive measures, and tailored diagnostics and therapy as examples. Data Engineering could also advance biomedical research and support clinical studies.

As of 2016, most accidents in Germany happen at home.[8] To prevent these domestic accidents, instances of ambient intelligence, e.g. smart homes, are considered a possible (small) solution. They would also give elderly people the opportunity to stay in their homes instead of moving to residential care homes.
The second most common category of accidents is traffic accidents,[9] which might also be reduced using Data Engineering. Traffic safety is expected to increase with the introduction of autonomous transportation and the like.

Sustainability is closely tied to efforts regarding climate protection. There are attempts to improve climate engineering technologies like Carbon Dioxide Removal[10] and Radiation Management[11] through the use of Big Data applications. Data Engineering is most visible in the field of energy, where the “[...] exploitation of renewable and distributed energy generation is in fact a critical element in reducing CO2 emissions.”[12]

Data Engineering which integrates ethical deliberation in the process

A third proposition for the meaning of Ethical Data Engineering is to pursue a kind of Data Engineering that integrates ethical deliberation into the process.

Technology is never neutral: there is always a human in the loop, so values and norms are implicitly implemented. This may happen through wrong assumptions when building algorithms, non-representative data selections, or discarding unexpected results as mistakes.

At the same time, the engineering process often focuses mostly on efficiency, output or monetization. If ethical deliberation becomes part of the process alongside these concerns, you get a kind of Ethical Data Engineering that includes actively thinking about which values and norms are inscribed into the product and what further ethical deliberation is possible during data collection, structuring or validation.

This includes questioning who wants to use which data for what purpose, as well as asking who takes responsibility for ethical or legal consequences. A good start might be to ask whether justified ethical values like privacy or autonomy are guaranteed by design, and which ethical theory the developers might be following.
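
As a very small, hypothetical illustration of what “by design” can mean in practice, a pipeline could refuse to ingest any columns that were not explicitly declared as necessary for the stated purpose (data minimisation as one facet of privacy by design). The purpose registry and field names below are invented for the example.

```python
from typing import Set

# Hypothetical registry: which columns each declared purpose is allowed to use.
ALLOWED_COLUMNS = {
    "recommendation_service": {"user_id", "purchase_history"},
    "clinical_study": {"patient_id", "lab_values", "diagnosis"},
}

def unjustified_columns(purpose: str, columns: Set[str]) -> Set[str]:
    """Return the columns that the declared purpose does not justify."""
    return columns - ALLOWED_COLUMNS.get(purpose, set())

# Example: a data set carrying more personal data than the purpose needs.
extra = unjustified_columns(
    "recommendation_service",
    {"user_id", "purchase_history", "home_address", "religion"},
)
if extra:
    print(f"Refusing ingestion, columns not justified by purpose: {sorted(extra)}")
```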

Beyond that, ethics cannot decide for us what is right or wrong, but it can spark reflection on which morals to follow. That is why teaching data engineers about ethics might be important.

Epistemic and Normative Challenges

To become better at determining the potential and actual impact of algorithms, the authors of “The ethics of algorithms”[13] identify several challenges for Ethical Data Engineering, such as:

  • Inconclusive evidence, meaning that results may come with such large uncertainty that they do not offer any actionable insight.
  • Inscrutable evidence, meaning that the way the results actually came about might not be accessible or comprehensible.
  • Misguided evidence, where the input data is flawed in a way that makes the result bad (“garbage in, garbage out”).
  • Unfair outcomes, meaning the already mentioned bias against groups or individuals that is (unintentionally) inscribed into the system by its designers or users.
  • Transformative effects, where the algorithm’s output influences how we see the world and therefore perhaps also what we expect from the algorithm itself. In the case of the Artificial Moral Advisor, in an extreme scenario, this effect might stop the development of human morality altogether.
  • Traceability, meaning that tracing a problem back to its cause and determining who is responsible might be impossible.

These are all challenges that must be kept in mind both as a developer and as a user of decision-making systems.
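
The traceability point in particular has a simple engineering counterpart: recording enough context with every automated decision that it can later be traced back to a model version and its inputs. The wrapper below is a generic, hypothetical sketch, not a tool mentioned in the lecture.

```python
import json
import time
from functools import wraps

def audited(model_version: str, log_path: str = "decision_log.jsonl"):
    """Wrap a decision function so that every call is appended to an audit log."""
    def decorator(decide):
        @wraps(decide)
        def wrapper(features: dict):
            decision = decide(features)
            record = {
                "timestamp": time.time(),
                "model_version": model_version,
                "features": features,
                "decision": decision,
            }
            with open(log_path, "a") as log:
                log.write(json.dumps(record) + "\n")
            return decision
        return wrapper
    return decorator

@audited(model_version="screening-v0.1")  # hypothetical model identifier
def screen_applicant(features: dict) -> str:
    # Placeholder rule standing in for a real model.
    return "invite" if features.get("years_experience", 0) >= 3 else "reject"

print(screen_applicant({"years_experience": 5}))  # the decision is also logged
```

Such a log does not settle who is responsible, but without something like it the question cannot even be asked.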

Conclusion

In this lecture, Prof. Beck presented three approaches to defining Ethical Data Engineering. The first was to use Data Engineering to support moral decision making in the form of an Artificial Moral Advisor that reflects a person’s morals. The advisor could research realistic options in situations where people do not know which option complies best with their own moral rules, in order to help them decide how to behave correctly. Second, one can also practice Data Engineering with the aim of pursuing something morally good; in this reading, the term ethical refers to the anticipated consequences. Finally, deliberately considering ethical consequences in the engineering process and assessing the product’s morality is a third form of Ethical Data Engineering.

We can conclude that there is such a thing as Ethical Data Engineering. It is not only necessary to know about the challenges of Ethical Data Engineering, but also to teach them. Prof. Beck points out that we have to change the way people study Data Engineering: we need people to reflect on these questions and to start doing so earlier. With regard to education, Prof. Beck also brought up the aspect of interdisciplinary teams: putting lawyers, ethicists, and engineers together from the beginning provides more perspectives.
We should also discuss these topics publicly. It seems that most people are not aware of all the facts, so more debate might help. But in the end we need to find out what is most valuable to us and decide for ourselves.

References:

[1] “Prof. Dr. Birgit Beck.” Last accessed on 28.01.2020. www.philosophie.tu-berlin.de/menue/fachgebiete/ethik_und_technikphilosophie/prof_dr_birgit_beck/

[2] Kurreck, Jens; Beck, Birgit (Eds.). Kursbuch Bioethik. Berlin: TU Universitätsverlag 2019.

[3] Beck, Birgit; Kühler, Michael (Eds.). “Technology, Anthropology, and Dimensions of Responsibility.” Techno:Phil – Aktuelle Herausforderungen der Technikphilosophie, Vol. 1. Stuttgart: J.B. Metzler 2020.

[4] Beck, Birgit. “The ART of Authenticity”. In: Kühler, Michael; Mitrović, Veselin (Eds.). Theories of the Self and Autonomy in Medical Ethics. Springer 2020.

[5] Beck, Birgit. “Infantilisation Through Technology”. In: Beck, Birgit; Kühler, Michael (Eds.). Technology, Anthropology, and Dimensions of Responsibility. Stuttgart: J.B. Metzler 2020.

[6] Giubilini, A., Savulescu, J. The Artificial Moral Advisor. The “Ideal Observer” Meets Artificial Intelligence. Philos. Technol. 31, 169–188 (2018)

[7] Mittelstadt, Brent Daniel; Allo, Patrick; Taddeo, Mariarosaria; Wachter, Sandra; Floridi, Luciano. “The ethics of algorithms: Mapping the debate.” Big Data & Society, July–December 2016, p. 7.

[8] Radtke, Rainer. “Anzahl der Todesfälle in Deutschland aufgrund von Unfällen nach Unfallkategorie in den Jahren 2014 bis 2016”. Published on 01.04.2019, last accessed on 29.01.2020. de.statista.com/statistik/daten/studie/182904/umfrage/todesfaelle-in-deutschland-aufgrund-von-unfaellen/.

[9] ibid.

[10] Carbon Dioxide Removal describes a bundle of technologies with the objective to remove carbon dioxide from the atmosphere in a large scale.

[11] (Solar) Radiation Management is a type of climate engineering seeking to reflect sunlight and thus reduce global warming.

[12] Giest, Sarah. “Big data analytics for mitigating carbon emissions in smart cities: opportunities and challenges.” European Planning Studies, Volume 25, No. 6, 2017, p. 942.

[13] Mittelstadt et al. 2016, p. 4.