Code comprehension is an important part of the software development process. Because most software development takes place in collaborative environments, programmers must read and understand code written by others. Given the prominence of code comprehension in software developers' daily work, research into the cognitive processes that take place during code comprehension, and into ways of measuring those processes, has grown steadily. In fact, an entire field of study, NeuroSE, has emerged around the collection and analysis of neurological data in the software development context.
One popular way to measure cognitive activity is electroencephalography (EEG), a non-invasive, temporally dense method of recording the electrical activity of the brain. Traditional machine learning methods such as support vector machines have been used to classify EEG data, but they require a lengthy preprocessing step to be effective. Deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), do not require extensive preprocessing and can work on raw, temporally dense data. Despite this, deep learning approaches see almost no use in the NeuroSE field.
Therefore, this Master's thesis aims to compare CNN and RNN models with traditional machine learning approaches on two EEG datasets, both with and without preprocessing. The results will allow other practitioners to see whether deep learning models offer any benefit and whether preprocessing aids the classification tasks. Additionally, this work will conduct an interpretability study to better understand how the models behave on the datasets and which features of the EEG data they consider important when classifying code comprehension tasks.