My PhD thesis
Understanding Computer Programs: Computational and Cognitive Perspectives
Armando Solar-Lezama, Ev Fedorenko, Sijia Liu
9 May 2023
In this thesis, I study the understanding of computer programs (code) from two perspectives: computational and cognitive. I ask what the human bases of understanding code are, and attempt to determine whether computational models trained on code corpora (also known as code models) share similar bases.
From the computational perspective, I start by proposing a framework to test the robustness of the information learned by code models (chapter 2). This establishes a baseline measure for how well models comprehend code. I then describe techniques for improving the robustness of these models while retaining their accuracy (chapter 3). To advance the field of code model understanding, I propose a way for code models to learn and reason about concurrent programs from their execution traces (chapter 4). In doing so, I also demonstrate the limitations of heuristics developed over the past four decades for detecting data races in concurrent programs, highlighting the need to evaluate these heuristics further.
From the cognitive perspective, I use fMRI to study how programmers’ brains comprehend code (chapter 5). I show that our brains encode information about comprehended code similarly to how code models encode that information (chapter 6). I show how the framework I develop in chapter 2 can be used to automatically generate stimuli for experiments in psycholinguistics and cognitive neuroscience (chapter 7), which can improve our understanding of how our minds and brains comprehend programs. Finally, I propose a probabilistic framework that models the mechanism of finding important parts of a program when comprehending it (chapter 8).
PDF (13 MB) - officially hosted on DSpace
PDF (1 MB) - Acknowledgements and Introduction sections of the thesis - These set the context and provide an overview of the questions I address in my work.