SOAP 2021: Proceedings of the 10th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis

Full Citation in the ACM Digital Library

SESSION: Session 1

Static analysis for dummies: experiencing LiSA

Semantics-based static analysis requires a significant theoretical background before being able to design and implement a new analysis. Unfortunately, the development of even a toy static analyzer from scratch requires to implement an infrastructure (parser, control flow graphs representation, fixpoint algorithms, etc.) that is too demanding for bachelor and master students in computer science. This approach difficulty can condition the acquisition of skills on software verification which are of major importance for the design of secure systems. In this paper, we show how LiSA (Library for Static Analysis) can play a role in that respect. LiSA implements the basic infrastructure that allows a non-expert user to develop even simple analyses (e.g., dataflow and numerical non-relational domains) focusing only on the design of the appropriate representation of the property of interest and of the sound approximation of the program statements.

Security and quality: two sides of the same coin?

Poor software quality may hinder future extensions to software code. In contrast to functional bugs, such hidden issues are not immediately visible to developers and users, and the software may still be fully usable. Consequently, developers are not forced to fix these issues, not even to investigate them. Security vulnerabilities are hidden isssues as well. However, they can put systems and users’ data at risk and lead to financial losses as well as liability and fines under data protection acts. Therefore, from a risk minimization perspective, avoiding security issues may seem more critical than avoiding quality issues when dealing with limited development resources.

In this paper, we show that both types of hidden issues are correlated. Our study of more than 400 real-world Android apps shows that apps with a high number of quality issues are likely to also have a higher number of security vulnerabilities. We argue that security and quality issues should be seen as two sides of the same coin. We investigate which types of quality problems correlate with which types of security issues and give insights into potential causes.

SESSION: Session 2

Program analysis for reversible languages

Reversible computing is a technique to “let computations run forwards and backwards” and thus extends the traditional model of computation. As an example, consider a function to compute the factorial of a given number, say 5, yielding 120. Running this program backwards inverses the function by taking 120 as an input and giving 5 as result. Reversible programming languages allow the creation of programs that can be executed backwards as well as forwards and have been a focus of research over the last decade mostly due to the work of Glück, Yokoyama, Mogensen, and many others. In this paper, we report our recent activities to perform program analysis for reversible static-single-assignment form (RSSA) and using them to perform local and global optimizations. This work is based on our compiler translating from the reversible language Janus to RSSA. As far as we know, this is the first compiler from Janus to RSSA, and no results on optimization of reversible intermediate code or programming languages are known to us either. Optimization techniques in “traditional” compilers are always based on the understanding that programs are executed forwards - in reversible languages that assumption is no longer true and program analysis becomes much more difficult. Our first results on applying our analysis methods for common-subexpression-elimination and constant propagation are nevertheless promising and have been implemented successfully.

PerfLens: a data-driven performance bug detection and fix platform

The wealth of open-source software development artifacts available online creates a great opportunity to learn the patterns of performance improvements from data. In this paper, we present a data-driven approach to software performance improvement in C#. We first compile a large dataset of hundreds of performance improvements made in open source projects. We then leverage this data to build a tool called PerfLens for performance improvement recommendations via code search. PerfLens indexes the performance improvements, takes a codebase as an input and searches a pool of performance improvements for similar code. We show that when our system is further augmented with profiler data information our recommendations are more accurate. Our experiments show that PerfLens can suggest performance improvements with 90% accuracy when profiler data is available and 55% accuracy when it analyzes source code only.

Weldr: fusing binaries for simplified analysis

Modern binary analysis tools are geared primarily towards analyzing the behavior of a single binary in isolation. Frequently, they make overly conservative assumptions when faced with dynamic libraries and inter-process communication. Failure in both stem from the same source: the code involved is extremely complicated and tends to rely on kernel functionality invisible from userspace, which is exceptionally complicated to model.

In this work, we introduce Weldr, a tool that uses static linking to bring an entire distributed system within the scope of a single-binary analyzer to mitigate these shortcomings. Instead of forcing a single-binary analyzer to ingest multiple binaries, Weldr rebuilds (”welds”) all programs in the system into a single, monolithic binary. This provides a high-fidelity, analyzer-friendly model with critical library functionality replaced. By doing this, tracing data flows across an IPC-linked system is reduced to tracing data flows between memory buffers in a single address space, which is a problem to which existing tools can be applied. We tested both a comprehensive networking-based example and a set of real-world IoT middleware protocols and demonstrated in each case that the welded version of the system can be tested using the popular AFL fuzzer and other state-of-the-art analysis tools, whereas the original versions cannot. Fuzz testing the welded binary resulted in up to 12.2% more code coverage than running unit tests on the original multi-binary system. Through a comparison with gcc and clang, we show that Weldr incurs a minimal compilation overhead, making Weldr practical to use.

SESSION: Session 3

Multi-language static code analysis on the LARA framework

We propose a mechanism to raise the abstraction level of source-code analysis and robustly support multiple languages. Built on top of the LARA framework, it allows sharing language specifications between LARA source-to-source compilers, and enables the mapping of a virtual AST over the nodes of ASTs provided by different, unrelated parsers.

We use this approach to create a language specification for Object-Oriented (OO) languages and add support for three different LARA compilers. We evaluate it by implementing a library of 18 software metrics using this language specification and apply the metrics to source code in four programming languages (C, C++, Java, and JavaScript). We compare the results with other tools to evaluate the approach.

Serialization-aware call graph construction

Although call graphs are crucial for inter-procedural analyses, it is challenging to statically compute them for programs with dynamic features. Prior work focused on supporting certain kinds of dynamic features, but serialization-related features are still not very well supported. Therefore, we introduce Salsa, an approach to complement existing points-to analysis with respect to serialization-related features to enhance the call graph’s soundness while not greatly affecting its precision. We evaluate Salsa’s soundness, precision, and performance using 9 programs from the Java Call graph Assessment & Test Suite (CATS) and 4 programs from the XCorpus dataset. We compared Salsa against off-the-shelf call graph construction algorithms available on Soot, Doop, WALA, and OPAL. Our experiments showed that Salsa improved call graphs’ soundness while not greatly affecting their precision. We also observed that Salsa did not incur an extra overhead on the underlying pointer analysis method.

Scalable string analysis: an experience report

In this paper we present OLSA - a tool for scalable static string analysis of Java programs. OLSA is based on intra-procedural string value flow graphs connected via call-graph edges. Formally, this uses a context-sensitive grammar to generate the set of possible strings. The analysis is focused on scalability and is thus not sound. This trade off is acceptable in the context of bug-finding in large web applications.

We evaluate our approach by using OLSA to detect SQL injections and unsafe use of reflection in DaCapo benchmarks and a large internal Java codebase and compare the performance of OLSA with JSA, one of state-of-the-art string analysers. The results indicate that OLSA can analyse industrial-scale codebases in a matter of hours, whereas JSA does not scale to many DaCapo programs. The set of potential strings generated by our string analysis can be used for checking the validity of the reported potential vulnerabilities.