Analysis of Software Developers' Cognitive Load with Machine Learning

Annemarie Uhlig, Supervisors: Charlotte Brandebusemeyer (HPI), Dr.-Ing. Frédéric Li (DFKI), Falco Lentzsch (DFKI)

Master's Thesis

Software developers engage in a wide range of cognitively demanding tasks throughout
the workday, each requiring varying levels of mental effort. In addition to task
complexity, frequent changes in requirements and tight deadlines further elevate cognitive
load, making effective management of mental resources essential for maintaining
productivity and ensuring successful task execution. Therefore, this thesis examines
the cognitive load of software developers during their regular workday by utilizing
psycho-physiological sensor data collected by the Empatica E4 wristband. Following
extensive data preprocessing and handcrafted feature extraction from the sensor modalities
electrodermal activity (EDA), blood volume pulse (BVP), acceleration, and skin
temperature, this thesis explores the prediction of cognitive load. Cognitive load
prediction is framed as a regression task and addressed with traditional machine learning, a
deep learning model, and various hybrid approaches. The traditional machine learning
models eXtreme Gradient Boosting (XGBoost) and Support Vector Machine (SVM)
utilized handcrafted features and were evaluated using k-Fold cross-validation (k-Fold)
(k = 5, with no overlap of participant data across folds) and leave-one-subject-out
cross-validation (LOSO) as a generalized approach. The deep learning model Long
Short-Term Memory (LSTM) and hybrid models, consisting of either a Temporal
Convolutional Network (TCN) or LSTM autoencoder, combined with either SVM or
XGBoost, were evaluated using k-Fold as another generalized approach. In addition,
XGBoost and SVM served as personalized models, evaluated with k-Fold (k = 5). The
personalized and generalized models were compared to identify which approach best
predicts cognitive load in a naturalistic work environment. Furthermore, the thesis
examines how contextual information (e.g., task categories) enhances cognitive load
prediction for XGBoost and SVM models. Beyond cognitive load prediction, traditional
machine learning models classify the software developer's task. For both task
categorization and cognitive load prediction, Recursive Feature Elimination (RFE)
is used to analyze the importance of the handcrafted features for the generalized traditional
machine learning models. Finally, the effect of the sampling size on the performance of the
traditional cognitive load prediction model is analyzed. Personalized models, particularly
XGBoost, achieved the best performance across both regression and classification
tasks. On average, the personalized XGBoost models achieved a Mean Absolute Error
(MAE) of 0.8987 for the regression and an F1 score of 0.8940 for classification.
Overall, the results indicate that cognitive load is highly subjective and varies between
individuals. The feature importance analysis revealed that EDA is a key predictor of
cognitive load. These findings underscore the importance of personalized modeling and
its potential to inform cognitive load and task predictions in the real world based on
physiological data.
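
The two generalized evaluation schemes named above, k-Fold with no overlap of participant data across folds and LOSO, can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the greedy fold balancing, and the toy participant labels are assumptions made for illustration, and only the Python standard library is used.

```python
from collections import defaultdict


def leave_one_subject_out(groups):
    """Yield (train_idx, test_idx) with exactly one participant held out per split."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield train, test


def group_k_fold(groups, k=5):
    """Yield (train_idx, test_idx) for k folds without participant overlap.

    Every participant's samples land in exactly one test fold, so no
    participant appears in both the training and the test set of a split.
    """
    by_subject = defaultdict(list)
    for i, g in enumerate(groups):
        by_subject[g].append(i)
    if len(by_subject) < k:
        raise ValueError("need at least k distinct participants")

    # Assumed balancing heuristic: place the participants with the most
    # samples first, always into the currently smallest fold.
    folds = [[] for _ in range(k)]
    for subject in sorted(by_subject, key=lambda s: (-len(by_subject[s]), s)):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_subject[subject])

    for f in range(k):
        test = sorted(folds[f])
        held = set(test)
        train = [i for i in range(len(groups)) if i not in held]
        yield train, test


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical participant label for each sample (one label per feature window).
groups = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "f"]
kfold_splits = list(group_k_fold(groups, k=5))
loso_splits = list(leave_one_subject_out(groups))
```

A personalized model, by contrast, would split only the samples of a single participant, so the grouping constraint does not apply there.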