SPLASH Companion 2022: Companion Proceedings of the 2022 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity

Full Citation in the ACM Digital Library

SESSION: Keynotes

(I Can’t Get No) Verification (Keynote)

We have seen much progress in (automated) program verification techniques and a few successful verification tools in the past few decades. Still, program verification is not as widely adopted as I hoped.

In this talk, I will discuss my experience in developing theories and tools for type-based program verification, perspectives on what hampers the broad adoption of formal verification, and a few half-baked ideas of how we might overcome the problem so that everyone can get (the benefits of) verification.

The State of Debugging in 2022 (Keynote)

The gap between the power of interactive debuggers and what most people actually use is wider than it has ever been. I will survey the capabilities of state-of-the-art debuggers (such as record-and-replay, reverse execution, and omniscience), say a bit about how they work, and explain how shifting hardware and software trends are working both for and against progress in these areas. I will discuss some of the observed reasons why developers don’t use interactive debuggers more, and what might be done about that, both inside and outside the research community.

Myths and Mythconceptions: What Does It Mean to Be a Programming Language, Anyhow? (Keynote)

Modern software is embedded in sociotechnical and physical systems. It relies on computational support from interdependent subsystems as well as non-code resources such as data, communications, sensors, and interactions with humans. General-purpose programming languages and mainstream programming language research both focus on symbolic notations with well-defined semantics that are used by professionals to create correct solutions to precisely-specified problems. However, these address only a modest portion of this modern software.

Persistent myths reinforce this focus. These myths provide a lens for examining modern software: Highly-trained professionals are outnumbered by vernacular developers; writing new code is dominated by composition of ill-specified software and non-software components; general-purpose languages and functional correctness are often less appropriate than domain-specific languages and fitness for task; and reasoning about software is challenged by uncertainty and non-determinism in the execution environment, especially with the advent of systems that rely on machine learning. The lens of our persistent myths illuminates emerging opportunities and challenges for programming language research.

An essay elaborating these ideas appears in the proceedings of the Fourth ACM SIGPLAN History of Programming Languages Conference (open access, https://doi.org/10.1145/3480947).

Improving the Quality of Creative Practices with Pattern Languages (Keynote)

In a changing society, individuals, teams, and society as a whole must keep learning and growing. The question, however, is how to achieve that. In this keynote, I introduce a method based on the use of pattern languages. A pattern language is a described body of knowledge about creative practices, created by studying good practices.

You may know the “design patterns” of software architecture and programming; they are a kind of pattern language. The idea of pattern languages was originally invented by Christopher Alexander for collaborative design in architecture; Kent Beck and Ward Cunningham then proposed using the idea in software development. For the past 15 years, my colleague and I have created many pattern languages for human actions, including education, collaboration, business, everyday life, care, and DX. More than 100,000 people in Japan already use our pattern languages.

We have also developed a methodology for creating pattern languages and published papers on it. Moreover, we have held workshops in which participants could experience this methodology, and several people already use it to create their own pattern languages. For example, the O’Reilly book by Pini Reznik, Jamie Dobson, and Michelle Gienow (2019) mentions that its pattern language was created on the basis of our method.

In this keynote, I will give an overview of what a pattern language is and show examples from various fields. I will then introduce the process of creating pattern languages and show how they can be used by anyone.

SESSION: Posters

Provably Correct Smart Contracts: An Approach using DeepSEA

It is possible to download a piece of software over the internet and then verify its correctness locally using an appropriate trusted proof system. However, on a blockchain like Ethereum, smart contracts cannot be altered once deployed. This guarantee of immutability makes it possible for end users to interact collectively with a 'networked' piece of software, with the same opportunity to verify its correctness. Formal verification of smart contracts on a blockchain therefore offers an unprecedented opportunity for end users to collectively interact with a deployed instance of software that they can verify while not relying on a central authority. All that is required to be trusted beyond the blockchain itself is an appropriate proof system, a component which always needs to be in the trusted computing base, and whose rules and definitions can be public knowledge. DeepSEA (Deep Simulation of Executable Abstractions) could serve as such a proof system.

Multiverse Notebook: A Notebook Environment for Safe and Efficient Exploration

Notebook environments are widely used in data analysis and machine learning because their interactive user interfaces fit well with exploratory tasks in these areas. However, the execution model of existing notebook environments is unsuitable for safe and efficient exploration: every code cell of a notebook runs in a single execution environment, even for independent exploratory tasks, so unintended interference can arise among them. To resolve this problem, we developed Multiverse Notebook, a notebook environment that runs each cell in a separate execution environment and saves its execution state. In this poster, we present Multiverse Notebook and our techniques for reducing its time and space overhead.
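A minimal Python sketch (not from the paper; names invented) of the interference problem and the per-cell isolation idea, modeling each cell's execution environment as a dictionary passed to `exec`:

```python
# Hypothetical sketch: why a single shared execution environment causes
# unintended interference between independent exploratory "cells".

# Shared-environment model (how conventional notebooks behave):
shared_env = {}
exec("data = [1, 2, 3]", shared_env)              # cell 1: load data
exec("data = [x * 2 for x in data]", shared_env)  # cell 2: experiment A mutates `data`
exec("total = sum(data)", shared_env)             # cell 3: experiment B silently sees A's mutation
assert shared_env["total"] == 12                  # would be 6 if cell 3 were independent

# Isolated model (one environment per cell, as Multiverse Notebook does):
base = {}
exec("data = [1, 2, 3]", base)
branch_a = dict(base)                             # snapshot of the state after cell 1
branch_b = dict(base)
exec("data = [x * 2 for x in data]", branch_a)    # experiment A in its own universe
exec("total = sum(data)", branch_b)               # experiment B is unaffected by A
assert branch_b["total"] == 6
```

The real system must of course snapshot execution state far more efficiently than copying dictionaries, which is presumably where the time- and space-reduction techniques come in.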

Explicit Code Reuse Recommendation

Code reuse is a common and recommended practice. However, to avoid code duplication, reuse may require substantial refactoring effort, especially in legacy code with which the developer is not deeply familiar. For this reason, developers often prefer to reimplement simple code fragments rather than properly reuse them. In this work, we describe the use of annotations to recommend a useful piece of code for subsequent reuse. These annotations then guard the code against changes that would complicate its extraction, allowing subsequent developers to easily locate and reuse it.
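As a hedged illustration of annotation-guided reuse (the paper targets annotations in general; the decorator, registry, and function names below are invented), one could imagine:

```python
# Hypothetical sketch: a decorator records reusable functions in a registry
# so a tool can later recommend them, and its visible presence in the source
# warns maintainers not to entangle the code with its surroundings.

REUSABLE = {}

def reusable(description):
    """Mark a function as an explicitly recommended reuse candidate."""
    def register(fn):
        REUSABLE[fn.__name__] = description
        return fn
    return register

@reusable("Parse 'key=value' pairs separated by semicolons")
def parse_pairs(text):
    return dict(item.split("=", 1) for item in text.split(";") if item)

# A recommendation tool could now surface registered fragments by keyword:
def recommend(keyword):
    return [name for name, desc in REUSABLE.items() if keyword in desc.lower()]

assert parse_pairs("a=1;b=2") == {"a": "1", "b": "2"}
assert recommend("parse") == ["parse_pairs"]
```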

Composing Linear Types and Separation Logic Proofs of Memory Safety

Separation logic and linear types are two formal techniques for reasoning about the memory safety of programs with manual memory management. To obtain the advantages of both methods when proving that a program is memory-safe, we investigate how to unify the notion of the frame from linear types and the notion of the frame from a memory safety characterisation inspired by separation logic. That allows us to investigate the necessary conditions under which linear types and separation logic proofs of memory safety may be successfully composed. The benefit of that composition would be to design a framework that supports verifying that a program is memory-safe by proving the memory safety of parts of it using the more appropriate technique and then composing the proofs to certify that the entire program is safe.

SESSION: Doctoral Symposium

Program Synthesis for Artifacts beyond Programs

Program synthesis is a technique for automatically finding a program, drawn from a given space of programs (i.e., a grammar), that satisfies user-provided specifications. Significant research over the years has led program synthesis to evolve from synthesizing string-manipulation programs in Microsoft Excel to the logic locking of digital circuits. Though program synthesis was designed to synthesize programs automatically, recent work in this area shows that it can also be used in other domains.

In this manuscript, we address three problems using program synthesis techniques. In the first, we synthesize fences and atomic blocks in concurrent programs under relaxed memory models. The second work focuses on synthesizing semantic actions of attribute grammars. The last work synthesizes abstract transformers for an abstract domain to automate abstract interpretation.
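None of the three works above are reproduced here, but the core program-synthesis loop they build on can be sketched as a toy enumerative search over a small grammar, guided by input/output examples:

```python
# Toy illustration (not the author's systems): enumerative program synthesis
# over a small expression grammar, searching for a term consistent with
# user-provided input/output examples.
from itertools import product

# Grammar: E ::= x | 1 | 2 | (E + E) | (E * E)
LEAVES = ["x", "1", "2"]
OPS = ["+", "*"]

def enumerate_terms(depth):
    if depth == 0:
        yield from LEAVES
        return
    yield from enumerate_terms(depth - 1)
    for op, (l, r) in product(OPS, product(enumerate_terms(depth - 1), repeat=2)):
        yield f"({l} {op} {r})"

def synthesize(examples, max_depth=2):
    """Return the first enumerated term agreeing with all (input, output) pairs."""
    for depth in range(max_depth + 1):
        for term in enumerate_terms(depth):
            if all(eval(term, {"x": x}) == y for x, y in examples):
                return term
    return None

# Find a term matching f(x) = 2*x + 1 on the examples:
prog = synthesize([(0, 1), (1, 3), (2, 5)])
assert prog is not None
assert all(eval(prog, {"x": x}) == y for x, y in [(0, 1), (1, 3), (2, 5)])
```

Real synthesizers replace this brute-force loop with pruning, sketches, or constraint solving, but the specification-driven search is the common skeleton.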

Formally Verified Resource Bounds through Implicit Computational Complexity

Automatic complexity analysis has not reached mainstream adoption due to outstanding challenges, such as scalability and usability, and no formally verified analyzer exists. However, the need to evaluate resource usage is crucial: even a guaranteed correct program, whose memory usage exceeds available resources, is unreliable. The field of Implicit Computational Complexity (ICC) offers a potential avenue to resolving some of these outstanding challenges by introducing unique, machine-independent, and flexible approaches to program analysis. But since ICC techniques are mostly theoretical, it is unclear how strongly these assumptions hold in practice. This project defines a 3-directional plan---focused on practical analysis, compiler-integration, and formal verification---to assess the suitability of ICC to address outstanding challenges in automatic complexity analysis.

Programming Support for Local-First Software: Enabling the Design of Privacy-Preserving Distributed Software without Relying on the Cloud

Local-first software manages and processes private data locally while still enabling collaboration between multiple parties connected via partially unreliable networks. Such software typically involves interactions with users and the execution environment (the outside world). The decentralized nature of local-first software paired with the unpredictability of interactions driven from the outside world impede reasoning about their correctness. Yet, existing solutions to develop local-first software do not provide safety guarantees and instead expect developers to reason about concurrent interactions in an environment with unreliable network conditions. This is too much to ask of application developers, who are usually not experts in designing distributed systems.

This work seeks to develop a programming model which facilitates the construction of local-first software and eradicates certain classes of safety and security problems by design. We do so by providing a dedicated local-first programming language and an accompanying automated verification procedure that can be integrated as part of the compilation process.

Proving Obliviousness of Probabilistic Algorithms with Formal Verification

Obliviousness is a security property that protects sensitive information from an algorithm’s observable behaviour. For better run-time performance, many recently published oblivious algorithms are probabilistic rather than deterministic, but this also makes them harder to reason about. In this paper, we introduce some challenges, related work, and further ideas, including a concrete contribution: extending PSL to verify one of these randomized oblivious algorithms. I also present my further PhD research plan on this topic.

Towards Automated Updates of Software Dependencies

Modern software applications use existing software components to speed up the development process, resulting in software dependencies. These components evolve for various reasons and may be updated to address critical issues, such as vulnerabilities, that could impact the software depending on them. Therefore, an application’s dependencies should be updated to their latest versions to avoid these issues. However, changes in a software component can cause critical failures in dependent software if the new version is incompatible with its previous version. Such incompatibilities between two library versions are known as breaking changes. Each software system that depends on the component needs to update its code to accommodate the breaking changes, which makes it difficult to automate the update process. This research analyzes a set of dependency updates and the code changes made in response to breaking changes in Java projects. The breaking changes will be used to create a comprehensive taxonomy of the types of breaking changes that occur in Java projects, and the data gathered on the dependency updates will be analyzed to identify patterns that can be applied across projects to automate the dependency update process.
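A small, invented Python example of a breaking change and the mechanical client-side edit it forces, the kind of pattern this research aims to identify and automate (the research itself targets Java projects):

```python
# Hypothetical illustration of a breaking change (all names invented):
# between two versions of a library, a function is renamed and a parameter
# becomes keyword-only, so dependent code must be updated mechanically.

# --- library v1 ---
def fetch(url, timeout):
    return f"GET {url} (timeout={timeout})"

# --- library v2: renamed, and timeout is now keyword-only with a default ---
def fetch_resource(url, *, timeout=30):
    return f"GET {url} (timeout={timeout})"

# Client code written against v1:
#   response = fetch("https://example.org", 10)        # breaks under v2

# Mechanically updated client code for v2 -- the kind of repetitive edit a
# taxonomy of breaking changes could help automate:
response = fetch_resource("https://example.org", timeout=10)
assert response == "GET https://example.org (timeout=10)"
```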

Verification of Hardware and Software with Fuzzing and Proofs

Automating formal verification of safety and security properties in both hardware and software systems is challenging due to a number of issues. In this paper, we try to address two important challenges in this regard. The first challenge is the scalable co-verification of hardware and firmware in a modern system-on-chip (SoC) platform to guarantee end-to-end security of the system. We discuss two specific problems we have tried to solve and our approach to addressing this challenge: (i) we designed a verification methodology for end-to-end security property verification for an authenticated firmware loader protocol running on an SoC platform, and (ii) we proposed a unified framework called HyperFuzzing for specifying hardware security properties and automatically testing for violations of these properties using fuzzing. The second challenge we discuss in this paper is verifying hardware or software systems that use closed-box functions or operations. To address this challenge, we have introduced a new theory for SMT solvers, called closed-box function theory, and have implemented it in our prototype solver Sādhak. Sādhak can handle SMT constraints with closed-box functions, which can be used by verification and testing tools for solving closed-box constraints.

Grammar Inference for Ad Hoc Parsers

Any time we use common string functions like split, trim, or slice, we effectively perform parsing. Yet no one ever bothers to write down grammars for such ad hoc parsers. We propose a grammar inference system that allows programmers to get input grammars from unannotated source code “for free,” enabling a range of new possibilities, from interactive documentation to grammar-aware semantic change tracking. To this end, we introduce Panini, an intermediate representation with a novel refinement type system that incorporates domain knowledge of ad hoc parsing.
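A minimal sketch of the premise (invented example, not from the paper): an ad hoc parser built from ordinary string functions, alongside the grammar it implicitly commits to, which a system like the one proposed would infer automatically:

```python
# Sketch: an "ad hoc parser" built from split/strip, plus the grammar it
# implicitly assumes (written out by hand here; the point of grammar
# inference is that no one normally writes this down).

def parse_config_line(line):
    """Ad hoc parser: 'KEY = value  # comment' -> (key, value)."""
    line = line.split("#", 1)[0]      # strip trailing comment
    line = line.strip()               # strip surrounding whitespace
    key, value = line.split("=", 1)   # split on the first '='
    return key.strip(), value.strip()

# The grammar this code implicitly commits to, in EBNF-like notation:
#   line  ::= ws key ws "=" ws value ws [ "#" comment ]
#   key   ::= { char - "=" - "#" }
#   value ::= { char - "#" }
assert parse_config_line("  retries = 5   # try harder") == ("retries", "5")
```

Note the latent assumption the code never states: a line without "=" raises an exception. Surfacing exactly such assumptions is one payoff of inferring the grammar.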

Verification of Programs with Concealed Components

Real-world programs contain a multitude of concealed components whose formal semantics are not available to verification engines: third-party API calls, inline assembly and SIMD instructions, system calls, sampling assignments from probability distributions, and library calls. Despite the success of program verification, proving the correctness of such "open" programs has remained a challenge.

Currently, this problem is handled by manually "closing" the program---by providing hand-written mocks that attempt to capture the behavior of the concealed components. Writing such mock code (stubs) is not only arduous but also error-prone, raising serious questions about the whole endeavor. In light of this challenge, we propose a novel technique, almost verification, as an attempt to prove correctness for such "open" programs.
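A toy sketch of the manual-closing workflow being criticized (all names invented): verification proceeds against a hand-written mock whose assumptions are never checked against the real component:

```python
# Toy illustration of manually "closing" an open program: the real component
# is an opaque network call, so reasoning proceeds against a hand-written
# mock that may over- or under-approximate its behaviour -- and may be wrong.

def get_exchange_rate_real(currency):
    raise NotImplementedError("network call; opaque to the verifier")

def get_exchange_rate_mock(currency):
    # Hand-written stub: the mock author *assumes* rates are positive,
    # an assumption never validated against the real service.
    rates = {"EUR": 1.1, "JPY": 0.0071}
    return rates.get(currency, 1.0)

def convert(amount, currency, rate_fn=get_exchange_rate_mock):
    rate = rate_fn(currency)
    assert rate > 0, "division-safety relies entirely on the mock's assumption"
    return amount / rate

assert round(convert(11.0, "EUR"), 2) == 10.0
```

Any correctness argument for `convert` is only as good as the stub, which is exactly the gap that motivates an automatic treatment of open programs.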

Towards a Verified Cost Model for Call-by-Push-Value

The λ-calculus is a fundamental model of computation. It provides a foundation for functional programming. Therefore, developing an effective cost model for the λ-calculus is essential for analysing the complexity of functional code. The call-by-push-value λ-calculus allows specifying various evaluation strategies within the same calculus. This enables programmers to specify how they want their program to evaluate within the calculus. There has been ongoing research in developing cost models for the λ-calculus with various evaluation strategies. Our project aims to extend this line of work by providing an effective verified cost model for the call-by-push-value λ-calculus in the HOL4 theorem prover. As a result, it will allow formal verification of the complexity of functional programs.
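The following Python sketch (an analogy, not call-by-push-value itself) shows why a cost model must fix an evaluation strategy: the same program performs different amounts of work under call-by-value and call-by-name, here simulated with thunks:

```python
# Illustration: the cost of a program depends on the evaluation strategy,
# which is why a calculus like call-by-push-value -- where the strategy is
# explicit in the term -- is a convenient setting for verified cost analysis.

steps = 0

def expensive():
    global steps
    steps += 1            # count each evaluation of the argument
    return 21

def use_twice_cbv(x):     # call-by-value: argument evaluated once, up front
    return x + x

def use_twice_cbn(thunk):  # call-by-name: argument re-evaluated at each use
    return thunk() + thunk()

steps = 0
assert use_twice_cbv(expensive()) == 42 and steps == 1
steps = 0
assert use_twice_cbn(expensive) == 42 and steps == 2
```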

Modelling the Quantification of Technical Debt

Technical Debt (TD) captures the consequences of sub-optimal decisions made during the development of a software product. If not managed properly, TD can be detrimental to the product in the long term. To manage TD, it needs to be quantified (measured). Many approaches to quantifying TD have been proposed, but evaluating them is difficult. We propose TDQM---the Technical Debt Quantification Model---which captures the important concepts in TD quantification and their relationships. TDQM has been used to compare and evaluate proposed quantification approaches, and it will also be used to advance our knowledge about TD quantification.

SESSION: Graduate Papers

Simple Extensible Programming through Precisely-Typed Open Recursion

In this abstract, we show that a small extension to the MLscript programming language gives a simple solution to the Expression Problem through precisely typed open recursion.

LoRe: Local-First Reactive Programming with Verified Safety Guarantees

Nowadays, the dominant design approach in distributed software is cloud-centric. This comes at the cost of several issues including loss of control over data ownership and privacy, lack of offline availability, poor latency, inefficient use of communication infrastructure, and waste of (powerful) computing resources on the edge. Local-first software presents an alternative approach where data is stored and managed locally and devices primarily communicate in a peer-to-peer manner. However, the decentralized nature of local-first software paired with the unpredictability of interactions driven from the outside world impede reasoning about correctness of such applications. Yet, existing solutions to develop local-first software do not provide safety guarantees and instead expect developers to reason about concurrent interactions in an environment with unreliable network conditions.

In this work, we propose LoRe, a programming language and compiler that automatically verifies developer-supplied safety properties for local-first applications.

Automated Verification for Real-Time Systems using Implicit Clocks and an Extended Antimirov Algorithm

The correctness of real-time systems depends on both correct functionality and real-time constraints. To go beyond existing Timed Automata based techniques, we propose a novel solution that integrates a modular Hoare-style forward verifier with a new term rewriting system (TRS) on Timed Effects (TimEffs). The main purposes are to increase expressiveness, dynamically create clocks, and efficiently solve constraints on the clocks. We formally define a core language, Ct, which generalizes real-time systems, modeled using mutable variables and timed behavioral patterns such as delay, deadline, and interrupt. Secondly, to capture real-time specifications, we introduce TimEffs, a new effects logic that extends Regular Expressions with dependent values and arithmetic constraints. Thirdly, the forward verifier infers the temporal behaviors of given Ct programs, expressed in TimEffs. Lastly, we present a purely algebraic TRS, i.e., an extended Antimirov algorithm, to efficiently prove language inclusions between TimEffs. To demonstrate the proposal’s feasibility, we prototype the verification system, prove its soundness, and report on experimental results.

CodeSpider: Automatic Code Querying with Multi-modal Conjunctive Query Synthesis

Querying code conveniently is an appealing goal for the software engineering community. This work advances that goal with a multi-modal query synthesis technique. Given a natural language description and code examples, we synthesize a conjunctive query that extracts the positive examples and ignores the negative ones; the query is then used to find the desired constructs in a program. To prune the huge search space, we generate well-typed query sketches for refinement by analyzing code examples and API signatures. We also introduce two quantitative metrics to measure the quality of candidate queries and select the best one. We have implemented our approach in a tool named CodeSpider and evaluated it on sixteen code querying tasks. Our experimental results demonstrate its effectiveness and efficiency.

ARENA: Enhancing Abstract Refinement for Neural Network Verification

Formal verification of neural networks is needed to guarantee their robustness. In this paper, we propose an abstract refinement process that leverages the double description method and a new activation relaxation technique to improve verification precision and efficiency. We implement our approach in a verification framework named ARENA. Experimental results show that ARENA yields significantly better verification precision than the existing abstract-refinement-based tool DeepSRGR, identifies adversarial examples with reasonable execution efficiency, and verifies more images than the state-of-the-art verifier PRIMA.
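A toy sketch of the abstraction side of neural-network verification, using the simple interval domain rather than ARENA's more precise relaxations (network weights invented): an input box is propagated through an affine layer and a ReLU to bound the outputs.

```python
# Toy interval-domain bound propagation through one affine layer + ReLU.
# Tools like ARENA use far tighter relaxations; the principle -- certify a
# property whenever the computed bounds stay on one side -- is the same.

def affine_bounds(lower, upper, weights, bias):
    """Interval bounds of W.x + b for x ranging over the box [lower, upper]."""
    out_l, out_u = [], []
    for w_row, b in zip(weights, bias):
        lo = b + sum(w * (l if w >= 0 else u) for w, l, u in zip(w_row, lower, upper))
        hi = b + sum(w * (u if w >= 0 else l) for w, l, u in zip(w_row, lower, upper))
        out_l.append(lo); out_u.append(hi)
    return out_l, out_u

def relu_bounds(lower, upper):
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

# One hidden layer, 2 inputs perturbed by +/-0.25 around the point (1, 0):
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -1.0]
l, u = affine_bounds([0.75, -0.25], [1.25, 0.25], W, b)
l, u = relu_bounds(l, u)
# Neuron 0: pre-activation in [0.5, 1.5] -> provably stays active.
assert (l[0], u[0]) == (0.5, 1.5)
# Neuron 1: pre-activation in [-0.75, -0.25] -> provably inactive (output 0).
assert (l[1], u[1]) == (0.0, 0.0)
```

When intervals are too coarse to decide a property, refinement (as in the paper) tightens the abstraction rather than giving up.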

Foundationally Sound Annotation Verifier via Control Flow Splitting

We propose VST-A, a foundationally sound program verifier for assertion-annotated C programs. Our approach combines the benefits of interactive provers with the readability of annotated programs. VST-A analyzes control flow graphs and reduces the program verification problem to a set of straight-line Hoare triples, which correspond to the control flow paths between assertions. Because of the rich assertion language, not all reduced proof goals can be checked automatically, but the system allows users to prove residual proof goals using the full power of the Coq proof assistant.

A Study of the Impact of Callbacks in Staged Static+Dynamic Partial Analysis

Partial analysis is a program analysis technique used in compilation systems when the whole program is not available. Many recent promising approaches perform partial analysis statically by identifying the interprocedural dependencies across program elements; the generated dependencies are then evaluated at run time to produce the final analysis result. However, as application and library methods are analyzed independently during static analysis, these approaches do not account for the effect of dynamic features such as callbacks. Consequently, in such scenarios, the runtime (say, the Java Virtual Machine) needs to discard the static-analysis results and fall back on its existing, imprecise built-in analyses. The primary goal of this work is to find the percentage of objects and methods that may be affected by callbacks, and to propose techniques to enhance the generation of dependencies in their presence.
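A minimal invented example of the problem (in Python rather than Java): a library method whose own code never touches application state, so an independent static summary would say so, yet a callback supplied at run time changes application state through it:

```python
# Why callbacks undermine staged static+dynamic partial analysis:
# a library method analyzed in isolation appears side-effect-free with
# respect to application objects, but a run-time callback is not.

class AppState:
    def __init__(self):
        self.log = []

# Library method, analyzed independently of any application. Its own code
# justifies the summary "does not modify application objects"...
def library_map(items, transform):
    return [transform(x) for x in items]

# ...but the application's callback silently invalidates that summary:
state = AppState()

def audited_double(x):
    state.log.append(x)       # side effect hidden behind the callback
    return 2 * x

result = library_map([1, 2, 3], audited_double)
assert result == [2, 4, 6]
assert state.log == [1, 2, 3]  # application state changed via the callback
```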

SESSION: Undergraduate Papers

Termination of Recursive Functions by Lexicographic Orders of Linear Combinations

This paper presents an improvement to Isabelle/HOL’s lexicographic termination algorithm. It also shows how to encode positive vector-component maximisation as a linear program.
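The lexicographic measures in question can be illustrated with the classic Ackermann function, whose termination argument is exactly a lexicographic order on the argument pair; the sketch below (plain Python, not Isabelle/HOL) checks that the measure strictly decreases at every recursive call:

```python
# Termination by a lexicographic order: Ackermann terminates because the
# pair (m, n) strictly decreases lexicographically at every recursive call,
# even though n can grow arbitrarily when m decreases.

def ackermann(m, n):
    def less_lex(a, b):                 # a < b in the lexicographic order
        return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])

    def go(m, n, caller):
        if caller is not None:
            assert less_lex((m, n), caller), "measure must strictly decrease"
        if m == 0:
            return n + 1
        if n == 0:
            return go(m - 1, 1, (m, n))          # (m-1, 1)   < (m, n)
        return go(m - 1, go(m, n - 1, (m, n)),   # (m, n-1)   < (m, n)
                  (m, n))                        # (m-1, ...) < (m, n)
    return go(m, n, None)

assert ackermann(2, 3) == 9
```

No single component decreases at every call, which is precisely why a simple (non-lexicographic) measure fails here and why termination checkers search for lexicographic combinations.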

Qiwi: A Beginner Friendly Quantum Language

Acknowledging the interdisciplinary nature of quantum computing, there is a need to make quantum programming more accessible to its fast-growing community. With the advent of quantum computers with hundreds of qubits, designing circuits manually is infeasible. Most current languages are low-level, involve a steep learning curve, or lack control. Thus we propose a quantum programming language (QPL) that allows for high-level abstractions with intuitive syntax while still allowing users to directly construct circuits.

High-level languages decrease user control and caution, leading to sub-optimal circuits. To prevent this, Qiwi allows for dead-qubit elimination and has dynamically sized data types. We also introduce a mechanism that lets the compiler choose the optimal definition among several overloaded function definitions, based on the statements following the call. Moreover, this work presents circuits for if outside the control family of gates and a control for loop on quantum data.

Using Mutations to Analyze Formal Specifications

The result of a formal verification is only as good as the specifications. A key challenge in automated verification is therefore coming up with a strong specification. This is somewhat analogous to having a good test suite in software testing. A well-studied technique for improving and assessing a test suite is mutation testing. It is a method for systematically modifying the source code to determine whether the resulting “mutants” can be “caught” by the tests. This paper explores mutation verification, i.e., the use of mutations to strengthen and assess the quality of formal specifications used in a verification tool. We built a prototype for mutation verification—dubbed Gambit—and used it for an industrial tool that verifies smart contracts. Early results indicate that Gambit generates useful mutants that can provide valuable insights about the coverage of the specifications.