Semantic Code Models for Concept-aware Programming Environments
To design and implement a program, programmers choose analogies and metaphors to explain and understand programmatic concepts. In source code, these manifest themselves as a particular choice of names. The apparent imprecision that comes with names drawn from natural language is greatly outweighed by their role in program comprehension: reading such names is a starting point for understanding the role of each software module in the domain and for building testable hypotheses about what the code at hand is doing.
On the one hand, understanding a program by looking for names that suggest a particular analogy can be a time-consuming process. On the other hand, a lack of awareness of which concepts are present can lead to modularity issues, such as redundancy and architectural drift, if concepts are misaligned with the current module decomposition. If programming environments were aware of the meaning of names, tools could not only help programmers understand a code base and its evolution in terms of high-level concepts, but also offer a wide range of concept-aware assistance while editing and debugging.
The challenge we address is to automatically detect high-level semantic concepts or metaphors in code and to relate them to one another. Recent approaches have used topic models from the field of text mining. However, topic models are designed to mine concepts from collections of natural-language documents, not from programs. In this talk, we discuss graph-based concept models designed to explain static, dynamic, and evolutionary aspects of programs and to address common information needs in programming environments.