I am researching the physiological responses and cognitive load that software developers exhibit when comprehending source code. For this purpose, I use cost-effective body sensors that can be easily deployed in software development settings and for empirical software engineering research.
If you have worked on a software development project before, you have probably experienced that some source code is straightforward to understand, while other, semantically similar code is confusing and tedious to comprehend. Code comprehension is one of the major tasks software developers spend their time on [6, 10]. How understandable the source code of a software system is influences, among other things, its quality and the work performance of the developers working on it.
However, our knowledge of the cognitive processes and demands during code comprehension is limited. In the past, methods like interviews, think-aloud protocols, or behavioral measures such as the time taken to solve a task have been used to gain such insights. Those methods have downsides: interviews can introduce bias, additional activities such as speaking influence the developer's cognitive load during code comprehension, and behavioral measures yield only a very limited number of data points.
Falling costs of body sensors and improvements in their usability in diverse environments have made them increasingly attractive for use in a variety of fields. Software engineering, and code comprehension research in particular, is no exception. Evaluating the comprehensibility of code using physiological measurements has the advantage that the reactions of developers can be recorded continuously, passively, and objectively.
This research could inform decisions on how to write code, which programming languages to use, or how to teach programming in the most effective way. It could lead to a better understanding of why and where coding errors happen and thus increase the quality of software, saving trillions of dollars  (as cited in ).
In the following, the mentioned concepts of code comprehension and cognitive load will be explained in more detail.
If you want to change any source code, you first have to understand it. Thus, code comprehension is one of the activities software developers spend the most time on [6, 10].
There are different theories of how programs are comprehended. According to Détienne [2], the one that explains the cognitive processes most completely states that developers create mental models of the program (elementary operations and control flow) and of the domain (entities and their relationships) during the process of understanding. According to this theory, constructing those models and switching between them is crucial to understanding and working with source code. These processes are constrained by the limited capacity of working memory. Thus, measuring cognitive load (see below) provides us with relevant information about the comprehensibility of source code.
Assessing and quantifying the understandability of source code can also be approached by using software metrics.
A variety of metrics exist to evaluate software along different dimensions. In the context of this research topic, metrics focusing on comprehensibility and cognitive complexity are relevant. However, the validity of such metrics is often questioned.
In [2], Détienne gives an example by considering the Halstead complexity measures. The original idea of building a metric on insights from psychology about the number of items one can usually hold in working memory sounds intriguing. However, Détienne notes that the metric fails to take other relevant insights into account, such as chunking. Chunking reduces the number of items one needs to keep in working memory by grouping items into larger units.
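To make the criticism concrete, here is a minimal sketch of the standard Halstead measures, computed from operator and operand counts. The counts for the example snippet are assigned by hand and are an assumption; what counts as an operator or operand depends on the counting convention used. Note that the formulas only see token counts, so they cannot reflect whether a reader groups several tokens into one chunk.

```python
import math


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    """Compute basic Halstead measures.

    n1, n2: number of distinct operators / distinct operands
    N1, N2: total number of operators / total number of operands
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
    }


# Hypothetical hand count for the statement `z = x + y`:
# operators "=" and "+" (2 distinct, 2 total), operands z, x, y (3 distinct, 3 total).
measures = halstead(n1=2, n2=3, N1=2, N2=3)
print(measures)
```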
Apart from such theoretical considerations, there are several ways of validating software metrics. In this context, evaluating physiological measurements adds another validation approach. This addition could lead to a more comprehensive validation of metrics that assess the understandability and cognitive complexity of source code, as well as to the creation of novel metrics.
In general, cognitive load can be described as the amount of working memory resources used. It can be divided into intrinsic, extraneous, and germane load. How much intrinsic load is evoked by a task depends on the inherent complexity of the information to be processed and the processor's level of expertise. Extraneous load, on the other hand, is caused by the form in which the task is presented. Germane load is evoked by processes of creating permanent knowledge. While intrinsic load and germane load are unavoidable or desirable, reducing extraneous load frees up working memory capacity that can be distributed to the other two kinds of load, increasing task performance.  (as cited in )
In [4], Müller mentions that various studies have shown that high cognitive load is associated with a decrease in performance and an increased error rate. As described above, it can also indicate impeded comprehension of source code.
Relationships between cognitive load and physiological measurements have been researched in several studies. Some examples are given below.
Electroencephalography (EEG) measures the electrical activity of populations of neurons in the brain. For this purpose, electrodes are placed on the scalp and voltage fluctuations are measured between pairs of electrodes. The corresponding measurements can be split into different frequency bands.
Prior studies found relationships between the power of certain frequency bands and cognitive load. In particular, an increase in theta band power and a decrease in alpha band power have been discussed in this regard.
Recently, low-cost, wearable EEG devices have entered the consumer market.
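As an illustration of the band-power idea, the following sketch estimates theta and alpha power of a single channel with a naive discrete Fourier transform and forms their ratio. It is a simplified sketch, not the analysis pipeline of any particular study: the band limits (4 to 8 Hz for theta, 8 to 13 Hz for alpha) follow a common convention that varies between publications, the signal is synthetic, and real EEG data would additionally need filtering and artifact removal.

```python
import cmath
import math


def band_power(signal, fs, low_hz, high_hz):
    """Sum of spectral power of `signal` in [low_hz, high_hz), via a naive DFT.

    signal: list of samples from one channel
    fs:     sampling rate in Hz
    """
    n = len(signal)
    total = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n  # frequency represented by DFT bin k
        if low_hz <= freq < high_hz:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            total += abs(coeff) ** 2 / n
    return total


FS = 128  # assumed sampling rate in Hz
N = 256   # two seconds of data

# Synthetic signal: a 6 Hz (theta) component with amplitude 2
# plus a 10 Hz (alpha) component with amplitude 1.
signal = [
    2.0 * math.sin(2 * math.pi * 6 * t / FS)
    + 1.0 * math.sin(2 * math.pi * 10 * t / FS)
    for t in range(N)
]

theta = band_power(signal, FS, 4.0, 8.0)
alpha = band_power(signal, FS, 8.0, 13.0)
ratio = theta / alpha  # doubled amplitude means roughly fourfold power
print(f"theta={theta:.2f} alpha={alpha:.2f} theta/alpha={ratio:.2f}")
```

A rising theta/alpha ratio over the course of a task would be in line with the relationship described above, although a single ratio is only one of several indices used in the literature.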
Eye Tracking and Pupillometry
Eye tracking provides information such as where somebody is looking (gaze position), for how long (fixation length), how fast the eyes move between two fixation points (saccade velocity), or how frequently blinks occur (blink rate). While information is mostly taken in during fixations, saccades are rapid eye movements that happen in between fixations. Pupillometry deals with the size of the pupil and its reactions to changing demands.
In [14], Zagermann et al. provide an overview of the relationships between such measurements and cognitive load. They state that the higher the cognitive load, the:
- longer the fixations,
- lower the fixation rate,
- longer the saccades,
- higher the saccade velocity,
- larger the pupils,
- lower the blink rate,
- higher the blink latency.
Today, monitor-mounted devices and wearable glasses are available for eye tracking and pupillometry measurements.
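Some of the indicators listed above can be derived from event data with very little code. The sketch below assumes that fixations and blinks have already been detected (for example by the eye tracker's own software) and only summarizes them; the `Fixation` type, the `summarize` function, and all numbers are hypothetical and serve purely as illustration.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_s: float  # fixation onset in seconds
    end_s: float    # fixation offset in seconds


def summarize(fixations, blink_times_s, recording_s):
    """Summarize fixation and blink events of one recording."""
    durations = [f.end_s - f.start_s for f in fixations]
    return {
        "mean_fixation_s": sum(durations) / len(durations),
        "fixation_rate_hz": len(fixations) / recording_s,
        "blink_rate_per_min": len(blink_times_s) / recording_s * 60,
    }


# Hypothetical 10-second recordings: a baseline and a comprehension task.
baseline = summarize(
    [Fixation(0.0, 0.2), Fixation(0.3, 0.5), Fixation(0.6, 0.8), Fixation(0.9, 1.1)],
    blink_times_s=[2.0, 7.0],
    recording_s=10.0,
)
task = summarize(
    [Fixation(0.0, 0.5), Fixation(1.0, 1.5)],
    blink_times_s=[5.0],
    recording_s=10.0,
)

# Longer fixations, a lower fixation rate, and a lower blink rate in the task
# recording would point in the direction of higher cognitive load.
print(baseline)
print(task)
```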
Several other physiological measurements have been shown to be linked to cognitive load. Examples are blood flow in the brain (measurable using, for example, fMRI or fNIRS devices), electrodermal activity, skin temperature, heart rate, heart rate variability, the heart's electrical activity, and respiratory rate [4, 13].
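For heart rate and heart rate variability, a minimal sketch looks as follows. It computes the mean heart rate and RMSSD (root mean square of successive differences), a widely used time-domain variability measure, from a list of intervals between heartbeats. The interval values are made up for illustration, and the choice of RMSSD is an assumption; other variability measures exist.

```python
import math


def mean_heart_rate_bpm(rr_ms):
    """Mean heart rate in beats per minute from RR intervals in milliseconds."""
    return 60000 / (sum(rr_ms) / len(rr_ms))


def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds.
rr_intervals = [800, 810, 790, 805, 795]

heart_rate = mean_heart_rate_bpm(rr_intervals)
variability = rmssd(rr_intervals)
print(f"heart rate: {heart_rate:.1f} bpm, RMSSD: {variability:.2f} ms")
```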
Currently, I am working on the ethics-approved CogniPro study. It builds on the considerations made above. The goals of the study are to measure the cognitive load induced by reading and understanding source code using a setup of several cost-effective physiological sensors and to analyze the corresponding measured values (e.g., regarding their meaningfulness). Furthermore, the relationship between physiologically measured cognitive load and code complexity metrics will be investigated. Additionally, this study will also consider the use and utility of low-cost physiological sensors in empirical software engineering research. Last but not least, the collected data will be made available to the research community to increase the amount of data available to researchers in the field.
Code comprehension as well as control tasks will be presented on a monitor to the participants during the study. The comprehension tasks will ask the participants to provide the output of a code snippet given a certain input. During these tasks and during baseline recordings, the physiological measurements will be taken.
Participants of different levels of expertise are asked to take part in the study. Their expertise will be assessed using a validated questionnaire.
A multi-modal setup consisting of cost-effective EEG, eye tracking, pupillometry, and heart rate sensors will be used to collect the physiological data. The measures and the derived cognitive load will be analyzed and compared to software metric scores, behavioral data (e.g., how long a participant took to comprehend a code snippet), and subjective ratings of complexity by the participants.
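One simple way to compare a derived load value with metric scores is a rank correlation across code snippets. The sketch below uses Spearman's rank correlation as an example; this choice, the function names, and the data are assumptions for illustration and do not describe the study's actual analysis plan.

```python
import math


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return numerator / denominator


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-snippet values: a complexity metric score and a
# physiologically derived load index for five code snippets.
metric_scores = [3, 7, 12, 5, 9]
load_index = [0.2, 0.5, 0.9, 0.4, 0.6]

rho = spearman(metric_scores, load_index)
print(f"Spearman rho: {rho:.2f}")
```

A rank correlation only assumes a monotonic relationship, which seems a cautious starting point when the scale of the load index is not well understood.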
- Visiting the HPI Research School at the University of California, Irvine (UCI).
- Contributing to the design and development of SensorHub.
[1] Antonenko, P., Paas, F., Grabner, R., & Van Gog, T. (2010). Using electroencephalography to measure cognitive load. Educational Psychology Review, 22(4), 425-438.
[2] Détienne, F. (2001). Software design–cognitive aspects. Springer Science & Business Media.
[3] Meneely, A., Smith, B., & Williams, L. (2013). Validating software metrics: A spectrum of philosophies. ACM Transactions on Software Engineering and Methodology (TOSEM), 21(4), 1-28.
[4] Müller, S. (2016). Using Biometric Sensors to Increase Developers' Productivity (Doctoral dissertation, University of Zurich).
[5] Peitek, N. (2022). A Neuro-Cognitive Perspective of Program Comprehension (Doctoral dissertation, Chemnitz University of Technology).
[6] Schröter, I., Krüger, J., Siegmund, J., & Leich, T. (2017, May). Comprehending studies on program comprehension. In 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC) (pp. 308-311). IEEE.
[7] Schankin, A., Berger, A., Holt, D. V., Hofmeister, J. C., Riedel, T., & Beigl, M. (2018, May). Descriptive compound identifier names improve source code comprehension. In 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC) (pp. 31-3109). IEEE.
[8] Siegmund, J., Brechmann, A., Apel, S., Kästner, C., Liebig, J., Leich, T., & Saake, G. (2012, November). Toward measuring program comprehension with functional magnetic resonance imaging. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (pp. 1-4).
[9] Siegmund, J., Kästner, C., Apel, S., Brechmann, A., & Saake, G. (2013). Experience from measuring program comprehension-toward a general framework. Software Engineering 2013.
[10] Siegmund, J., & Schumann, J. (2015). Confounding parameters on program comprehension: a literature survey. Empirical Software Engineering, 20(4), 1159-1192.
[11] Sweller, J. (2010). Element interactivity and intrinsic, extraneous, and germane cognitive load. Educational Psychology Review, 22(2), 123-138.
[12] Tricentis (2017). Software Fail Watch: 5th Edition.
[13] Weber, B., Fischer, T., & Riedl, R. (2021). Brain and autonomic nervous system activity measurement in software engineering: A systematic literature review. Journal of Systems and Software, 178, 110946.
[14] Zagermann, J., Pfeil, U., & Reiterer, H. (2016, October). Measuring cognitive load using eye tracking technology in visual computing. In Proceedings of the sixth workshop on beyond time and errors on novel evaluation methods for visualization (pp. 78-85).