Prof. Dr. Jürgen Döllner

# Software Analysis and Visualization

Automated software analysis and visualization is concerned with giving insight into complex, large-scale, existing software systems ("legacy systems") and their static structure, dynamic behavior, and evolution. We address this need in industry by developing methods and techniques that support and automate the analysis and visualization of software systems.

Various technologies have been developed by the computer graphics systems research group of the Hasso-Plattner-Institute. The technologies tackle core challenges in the development and maintenance of complex software systems and aim at creating transparency during development, reducing costs, and decreasing quality-related risks.

## Software Mining Technology

Software Mining aims at extracting, analyzing, and interpreting information about complex IT systems and their implementations in order to provide explorative and analytical insights into the structure, evolution, and dynamics of these systems.

Software Mining facilitates processes in software engineering and helps to cope with the significant risks of developing new and maintaining existing IT systems. Research in this field develops essential tools, techniques, and methods to effectively manage the complexity of future IT-based systems in our society.

## Software Quality Assessment and Monitoring Technology

Software Quality Assessment and Monitoring are specialized Software Mining technologies that are applied to quality-related information about complex software systems. Such information includes quality-related code metrics, metrics about code modifications (i.e., about the system’s evolution), and the code coverage of test suites executed at runtime. Quality Assessment is applied to obtain an initial view of the software system’s quality. Quality Monitoring serves as a “quality cockpit” that permits project managers to identify problematic development situations at an early stage.

Software Quality Assessment and Monitoring are essential technologies that help project managers to speed up the development and maintenance of complex software systems in the long run. They make it possible to effectively analyze which parts of the source code unnecessarily produce costs and increase risk due to low quality. Hence, the technologies permit project managers to focus resources and tackle the real quality problems, that is, those parts of the code that are error-prone and slow down development.
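As a rough illustration of how the three kinds of quality-related information could be combined, the following sketch ranks files by a simple risk score. The `FileMetrics` record, the sample values, and the scoring formula (complexity times churn, weighted by the untested fraction) are assumptions made for this example and are not taken from the technologies described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetrics:
    """Hypothetical per-file record; the field names are illustrative only."""
    path: str
    complexity: int   # a code metric, e.g. a cyclomatic-complexity total
    churn: int        # an evolution metric: number of recorded modifications
    coverage: float   # fraction of lines executed by the test suite, 0.0..1.0


def risk_score(m: FileMetrics) -> float:
    """Assumed heuristic: complex, frequently changed, poorly tested code ranks highest."""
    return m.complexity * m.churn * (1.0 - m.coverage)


def rank_hotspots(metrics: list[FileMetrics], top: int = 3) -> list[str]:
    """Return the paths of the `top` files with the highest assumed risk score."""
    ranked = sorted(metrics, key=risk_score, reverse=True)
    return [m.path for m in ranked[:top]]


# Made-up sample data for four files of an imaginary system.
SAMPLE = [
    FileMetrics("core/scheduler.cpp", complexity=120, churn=40, coverage=0.35),
    FileMetrics("ui/dialog.cpp", complexity=30, churn=5, coverage=0.80),
    FileMetrics("net/protocol.cpp", complexity=90, churn=25, coverage=0.10),
    FileMetrics("util/strings.cpp", complexity=15, churn=60, coverage=0.95),
]

print(rank_hotspots(SAMPLE))
```

A multiplicative score is only one possible choice; it has the property that a file scores zero if it is never changed or fully covered by tests, which may or may not be desirable for a given project.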

## Software Architecture Recovery Technology

Software Architecture Recovery aims at revealing the architecture of long-living (legacy) software systems as it is implemented in the code. Hence, Software Architecture Recovery is a specialized variant of Software Mining Technology, where architecture-related information is extracted from the source code, from runtime events, or from the repository that keeps track of code changes. This implicit architectural information is visualized in a way such that both modular structures and dependency relations become visible.

The architecture as it is implemented in the code typically differs significantly from the models contained in design-time documents. However, developers need to understand the implemented architecture before they modify code. Otherwise, they introduce design anomalies and couplings that make the system difficult to maintain. Software Architecture Recovery shows the architecture as it is and thereby speeds up development and reduces the risk of introducing bugs due to misunderstood architectural guidelines.
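The idea of extracting dependency relations from source code and comparing them with the documented design can be sketched in a few lines. The example below is a minimal illustration that assumes Python sources and uses the standard `ast` module; the module names in `SOURCES` and the `DOCUMENTED` design are hypothetical, and the systems addressed above (C/C++, Java) would require language-specific extractors.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect the top-level module names that a piece of Python source imports."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def dependency_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the other modules of the same system it depends on."""
    modules = set(sources)
    return {
        name: (imported_modules(code) & modules) - {name}
        for name, code in sources.items()
    }


def undocumented_dependencies(graph, documented):
    """Dependencies found in the code that the design documents do not mention."""
    return {
        (src, dst)
        for src, deps in graph.items()
        for dst in deps
        if dst not in documented.get(src, set())
    }


# Hypothetical three-module system and its documented (intended) layering.
SOURCES = {
    "ui": "import logic\nfrom storage import load\n",
    "logic": "import storage\nimport json\n",
    "storage": "import json\n",
}
DOCUMENTED = {"ui": {"logic"}, "logic": {"storage"}, "storage": set()}

graph = dependency_graph(SOURCES)
print(undocumented_dependencies(graph, DOCUMENTED))
```

In this made-up system the `ui` module reaches directly into `storage`, a coupling that the documented layering does not contain and that only becomes visible once the implemented dependencies are extracted.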

## Software Tracing Technology

Software Tracing is a technology to collect information about the behavior of a software system at execution time. Such information includes runtime data such as control flow information, e.g., function entry and exit events, or data on the state of the system, e.g., values being written into or being transferred between variables. Trace data is obtained by instrumenting the binary code of the software system. That is, the binary code is augmented with code that generates runtime events and serializes the data for further processing.
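The principle of augmenting code so that it generates and serializes runtime events can be illustrated at source level. The sketch below is an analogy only: it wraps Python functions with a decorator instead of instrumenting binary code, and the names `traced`, `EVENTS`, and `serialize` as well as the JSON Lines output format are assumptions made for this example.

```python
import functools
import json
import time

EVENTS: list[dict] = []  # in-memory event buffer (assumption; a real tracer would stream events)


def traced(func):
    """Wrap `func` so that every call emits a function entry and a function exit event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EVENTS.append({"event": "entry", "function": func.__name__,
                       "t_ns": time.perf_counter_ns()})
        try:
            return func(*args, **kwargs)
        finally:
            # The exit event is emitted even if the function raises.
            EVENTS.append({"event": "exit", "function": func.__name__,
                           "t_ns": time.perf_counter_ns()})
    return wrapper


def serialize(events) -> str:
    """Serialize events as JSON Lines, one event per line, for further processing."""
    return "\n".join(json.dumps(e) for e in events)


@traced
def parse(text):
    return [normalize(token) for token in text.split()]


@traced
def normalize(token):
    return token.lower()


result = parse("Hello World")
print(serialize(EVENTS))
```

The resulting event sequence nests the two `normalize` calls inside the `parse` call, which is the control flow information that later analysis and visualization steps would build on.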

The tracing techniques we have developed use different instrumentation mechanisms, depending on the programming language and the runtime environment of the analyzed software system, e.g., native C/C++ on Windows or Java-based components executed within an application server.

Software tracing helps developers and maintainers to understand the runtime behavior of a software system, which would otherwise be difficult to inspect. Because understanding the runtime behavior is a prerequisite for a variety of software engineering tasks, and software tracing speeds up this understanding, the technology is helpful for tasks such as debugging, adapting existing features, or performance tuning. Software tracing is particularly helpful for long-living systems, which typically lack up-to-date documentation that developers could consult to speed up comprehension.

## C/C++ Software Tracing Technology

C/C++ software tracing enables developers and also end-users to create traces, i.e., time-ordered collections of runtime events, from a C/C++ software system running natively on the operating system (e.g., Windows). The technology combines compile-time and execution-time instrumentation to enable users to switch the runtime event collection mechanism on and off at any point in time during execution. During the switching step, which takes no more than a few seconds, the binary code of the running software system is augmented with additional code that creates and collects runtime events. While the mechanism is switched off, no performance overhead is incurred.
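Switching event collection on and off while a program keeps running can be imitated with Python's `sys.setprofile` hook. This is only an analogy under stated assumptions: the technology above patches the binary code of a native process, whereas the sketch installs and removes an interpreter-level hook for the current thread. The function names in `TARGETS` and the helper names `tracing_on` and `tracing_off` are hypothetical.

```python
import sys

TRACE: list[tuple[str, str]] = []           # collected (event, function name) pairs
TARGETS = {"handle_request", "load_user"}   # hypothetical functions of interest


def _hook(frame, event, arg):
    """Profile hook: record call and return events of the targeted Python functions."""
    if event in ("call", "return") and frame.f_code.co_name in TARGETS:
        TRACE.append((event, frame.f_code.co_name))


def tracing_on():
    """Install the hook for the current thread; later calls are recorded."""
    sys.setprofile(_hook)


def tracing_off():
    """Remove the hook; the interpreter no longer invokes any tracing code."""
    sys.setprofile(None)


def load_user(uid):
    return {"id": uid}


def handle_request(uid):
    return load_user(uid)["id"]


handle_request(1)   # collection is switched off: nothing is recorded
tracing_on()
handle_request(2)   # collection is switched on: entry and exit events are recorded
tracing_off()
handle_request(3)   # switched off again: nothing is recorded

print(TRACE)
```

Only the second request appears in `TRACE`; while the hook is removed, the program runs without any tracing code being invoked.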

This technology is useful in two kinds of situations: when the software system is executed in a test environment by developers and when it is executed at the customer’s site in the production environment.

During development, the technology enables developers to observe what is happening within the “black box” while the system executes. As the technology collects information about the complete execution history, it is the basis for powerful tools that complement the usual “point-in-time” debuggers.

In the production environment, the technology can help to reduce the typically cost-intensive procedure of finding out why a system fails at the customer’s site even though it behaves correctly in the test environment at the developers’ site. With the technology, customers of the software create a trace when the system does not behave as expected. This trace is sent to the developers, who can then quickly find the origin of the misbehavior without extensive “problem communication” between customer and development team, and without having to travel to the customer’s site to debug the problem there.

## Java Software Tracing Technology

Similar to C/C++ software tracing, Java software tracing enables developers and also end-users to capture the runtime behavior of a software system at code-level granularity. Aspect-oriented technology is used to inject event-generating code into the Java system: when classes are loaded into the virtual machine, their bytecode is augmented with event-generating code. The Java software tracing technology can be used for analyzing desktop applications and applications running within a web/application server.

Web/application servers provide a runtime environment for Java components. These systems are often required to stay operational without downtime. Hence, when an incident occurs, for example when the performance of the system degrades or a specific function does not work as expected, it is not possible to restart the system in “debug mode” and perform extensive runtime analysis in the production environment. Instead, the developers need to reproduce the failure in the test environment, typically without sufficient information on the internal processes of the system in the production environment, which makes such incidents very costly.

With the Java software tracing technology, however, a trace is created that describes the internal system behavior at such a fine granularity that developers can efficiently identify the reason for the unexpected behavior.

## Multithreaded System Analysis Technology

This technology collects information about the runtime behavior of multithreaded software systems and visualizes that data to support developers in solving multithreading-related software engineering problems, e.g., related to debugging, performance analysis, or system understanding in general. For each thread, a trace is created that permits developers to precisely understand (a) what each thread executes and (b) how the threads interact.

Multithreaded software systems pose difficult problems to developers that do not exist in single-threaded systems. Examples of these problems include race conditions and deadlocks. A key to solving multithreading-related programming problems is knowledge about the exact timing behavior of threads. State-of-the-art tools, such as debuggers, provide little support in understanding the timing behavior because their heavy-weight approach typically affects the timing too much. A common way for developers to gain insight into system behavior is therefore to manually insert console output code that reports the threads’ execution states. With this approach, however, developers obtain only isolated snapshots of system behavior.

The multithreaded system analysis technology integrates runtime data collection in a light-weight way without manual intervention and thereby provides automatically created views that show in high detail how threads execute and interact.

## Distributed Systems Mining Technology

This technology enables the analysis and visualization of distributed software systems. It traces the internal behavior of the system's components and the messages exchanged among them. Automated trace analysis improves heuristic layout generation and provides additional information for the visualization.

Developers generally cannot cope with the inherent complexity of large distributed software systems. This technology aims to improve comprehension of the structure and behavior of these systems.

## Object-Oriented Traces Mining Technology

OO Trace Mining is a technology that analyzes the behavior of object-oriented systems that arises from communication between objects. OO Trace Mining collects data about object creation and destruction and monitors inter- and intra-object communication. This object-related data is then analyzed, mined, and visualized to support developers in understanding the runtime behavior of the OO system.

Before a developer of an object-oriented software system can change any line of code, e.g., when fixing a bug or implementing a system feature, the developer needs a deep understanding of the objects’ lifetimes and of the communication patterns occurring at runtime. Furthermore, the developer needs to understand the state transitions of an object during its lifetime. OO Trace Mining helps developers to gain an understanding of the objects’ runtime characteristics more quickly. It reveals outlier behavior of particular objects and automatically creates models describing the objects’ state changes.
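How a model of object state changes might be derived from a trace can be sketched with a small example. The trace format (pairs of object identifier and event), the event names, and the connection objects below are hypothetical and chosen only for illustration; they do not reflect the data model of the technology described above.

```python
from collections import Counter, defaultdict

# Hypothetical trace in time order: (object id, event). "create" and "destroy"
# delimit an object's lifetime; any other event names the state the object entered.
TRACE = [
    ("conn#1", "create"), ("conn#1", "open"), ("conn#2", "create"),
    ("conn#1", "busy"), ("conn#2", "open"), ("conn#1", "open"),
    ("conn#2", "closed"), ("conn#1", "closed"), ("conn#1", "destroy"),
    ("conn#2", "destroy"), ("conn#3", "create"), ("conn#3", "busy"),
]


def per_object_sequences(trace):
    """Split the interleaved trace into one event sequence per object."""
    sequences = defaultdict(list)
    for obj, event in trace:
        sequences[obj].append(event)
    return dict(sequences)


def transition_model(trace):
    """Count how often each (from, to) transition occurs across all objects."""
    model = Counter()
    for events in per_object_sequences(trace).values():
        model.update(zip(events, events[1:]))
    return model


def unreleased_objects(trace):
    """Objects that were created but never destroyed within the trace."""
    created = {obj for obj, event in trace if event == "create"}
    destroyed = {obj for obj, event in trace if event == "destroy"}
    return created - destroyed


model = transition_model(TRACE)
for (src, dst), count in sorted(model.items()):
    print(f"{src} -> {dst}: {count}")
print(unreleased_objects(TRACE))
```

Transitions that occur rarely in such a model, for example an object that becomes `busy` directly after creation, are natural candidates for closer inspection as possible outlier behavior.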

# Publications

## ViewFusion: Correlating Structure and Activity Views for Execution Traces

Trümper, Jonas; Telea, Alexandru; Döllner, Jürgen. In Proceedings of the 10th Theory and Practice of Computer Graphics Conference, pages 45-52. **Best Application-Paper**. European Association for Computer Graphics, September 2012.

Visualization of data on structure and related temporal activity supports the analysis of correlations between the two types of data. This is typically done by linked views. This has shortcomings with respect to efficient space usage and makes mapping the effect of user input into one view into the other view difficult. We propose here a novel, space-efficient technique that “fuses” the two information spaces -- structure and activity -- in one view. We base our technique on the idea that user interaction should be simple, yet easy to understand and follow. We apply our technique, implemented in a prototype tool, for the understanding of software engineering datasets, namely static structure and execution traces of the Chromium web browser.