GPCE 2021: Proceedings of the 20th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences

SESSION: Keynote

Live programming and programming by example: better together (keynote)

Live programming is a paradigm in which values from program execution are shown to the programmer through continual feedback. Programming by example is a paradigm in which code is synthesized from example values showing a desired behavior. This talk presents some of our recent research that combines these two paradigms in beneficial ways. I will walk through our ideas, explain our contributions, discuss what we learned and finally provide thoughts for the future.

SESSION: Papers

The life cycle of features in highly-configurable software systems evolving in space and time

Feature annotation based on preprocessor directives is the most common mechanism in Highly-Configurable Software Systems (HCSSs) to manage variability. However, it is challenging to understand, maintain, and evolve feature fragments guarded by #ifdef directives. Yet, despite HCSSs being maintained in version control systems, support for evolving features in both space and time is still limited. To extend the knowledge on this topic, we analyze the feature life cycle in space and time. Specifically, we introduce an automated mining approach and apply it to four HCSSs, analyzing commits of their entire development life cycle (13 to 20 years and 37,500 commits). This goes beyond existing studies, which investigated only differences between specific releases or entire systems. Our results show that features undergo frequent changes, often with substantial modifications of their code. The findings of our empirical analyses stress the need for better support of system evolution in space and time at the level of features. In addition to these analyses, we contribute an automated mining approach for the analysis of system evolution at the level of features. Furthermore, we make our dataset available to foster new studies on feature evolution in HCSSs.

Towards supporting SPL engineering in low-code platforms using a DSL approach

Low-code application platforms enable citizen developers to autonomously build complete applications, such as web applications or mobile applications. Some of these platforms also offer support for reuse to facilitate the development of similar applications. The offered mechanisms are usually elementary: they allow reusing modules or building a new application from a template. However, they are insufficient to achieve the industrial-level reuse necessary for software product lines (SPLs). In fact, these platforms were conceived to help build standalone applications, not software families, let alone software product lines. In this paper, we argue that the major limitation is that these platforms seldom provide access to their metamodel and offer only limited access to the applications’ models and code, which makes it harder to analyze commonality and variability and to construct models based on them. An approach is proposed to overcome these limitations: first, a metamodel of the applications built with the platform is obtained; then, based on the metamodel, a domain-specific language (DSL) that can express the models of the applications, including variability, is constructed. With this DSL, users can combine and reuse models from different applications to explore and build similar applications. The solution is illustrated with an industrial case study. A discussion of the results is presented, as well as of the approach’s limitations and related work. The authors hope that this work provides inspiration and ideas that the community can explore to facilitate the adoption and implementation of SPLs in the context of, and supported by, low-code platforms.

A variational database management system

Many problems require working with data that varies in its structure and content. Current approaches, such as schema evolution or data integration tools, are highly tailored to specific kinds of variation in databases. While these approaches work well in their roles, they do not address all kinds of variation, nor the interaction of different kinds of variation in databases. In this paper, we define a framework for capturing variation as a generic and orthogonal concern in relational databases. We define variational schemas, variational databases, and variational queries for capturing variation in the structure, content, and information needs of relational databases, respectively. We define a type system that ensures variational queries are consistent with respect to a variational schema. Finally, we design and implement a variational database management system as an abstraction layer over a traditional relational database management system. Using previously developed use cases, we show the feasibility of our framework and demonstrate the performance of the different approaches used in our system.
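
To make the idea of variation as an orthogonal concern concrete, the following Haskell sketch shows a minimal choice-calculus-style encoding of variational values and a toy variational schema. It is an illustration only, not the system described in the paper; the types (V, Chc, VSchema) and the example schema are invented for this sketch.

    -- Illustrative sketch only (not the paper's VDBMS implementation).
    module Main where

    import Data.Maybe (mapMaybe)

    type Feature = String
    type Config  = Feature -> Bool   -- selects one alternative per feature

    -- A value that may differ across variants.
    data V a
      = Plain a                      -- the same in every variant
      | Chc Feature (V a) (V a)      -- left if the feature is enabled, right otherwise
      deriving Show

    -- Resolve a variational value under a concrete configuration.
    configure :: Config -> V a -> a
    configure _ (Plain x)   = x
    configure c (Chc f l r) = configure c (if c f then l else r)

    -- A toy variational schema: each attribute may be present only in some variants.
    type Attribute = String
    type VSchema   = [(Attribute, V Bool)]

    employees :: VSchema
    employees =
      [ ("id",     Plain True)
      , ("name",   Plain True)
      , ("salary", Chc "v2" (Plain True) (Plain False)) -- only in the "v2" variant
      ]

    -- Attributes visible under one configuration.
    visible :: Config -> VSchema -> [Attribute]
    visible c = mapMaybe (\(a, p) -> if configure c p then Just a else Nothing)

    main :: IO ()
    main = do
      print (visible (== "v2") employees)      -- ["id","name","salary"]
      print (visible (const False) employees)  -- ["id","name"]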

Metaprogramming with combinators

There is a wide array of methods for writing code generators. We advocate for a point in the design space, which we call metaprogramming with combinators, where programmers use (and write) combinator libraries that directly manipulate object language terms. The key language feature that makes this style of programming palatable is quasiquotation. Our approach leverages quasiquotation and other host language features to provide what is essentially a rich, well-typed macro language. Unlike other approaches, metaprogramming with combinators allows full control over generated code, thereby also providing full control over performance and resource usage. This control does not require sacrificing the ability to write high-level abstractions. We demonstrate metaprogramming with combinators through several code generators written in Haskell that produce VHDL targeted to Xilinx FPGAs.
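
As a rough illustration of the combinator style, though without the quasiquotation machinery and VHDL backend the paper relies on, the following Haskell sketch defines a tiny object language and a combinator that generates fully unrolled object-language code; all names here are invented for the example.

    -- Illustrative sketch only: host-language combinators that build object-language terms.
    module Main where

    -- Object-language terms.
    data Exp
      = Var String
      | Lit Int
      | Add Exp Exp
      | Mul Exp Exp
      deriving Show

    -- A combinator: a host-language function that generates object-language code.
    -- Here, a dot product of two statically known vectors of variables, fully
    -- unrolled at generation time.
    dotProduct :: [String] -> [String] -> Exp
    dotProduct xs ys =
      foldr1 Add (zipWith (\x y -> Mul (Var x) (Var y)) xs ys)

    -- Pretty-print the generated code (stand-in for a real code emitter).
    emit :: Exp -> String
    emit (Var x)   = x
    emit (Lit n)   = show n
    emit (Add a b) = "(" ++ emit a ++ " + " ++ emit b ++ ")"
    emit (Mul a b) = "(" ++ emit a ++ " * " ++ emit b ++ ")"

    main :: IO ()
    main = putStrLn (emit (dotProduct ["x0","x1","x2"] ["y0","y1","y2"]))
    -- prints ((x0 * y0) + ((x1 * y1) + (x2 * y2)))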

Artifact and reference models for generative machine learning frameworks and build systems

Machine learning is a discipline that has become ubiquitous in the last few years. While research on machine learning algorithms is very active and continues to reveal astonishing possibilities on a regular basis, the wide usage of these algorithms is shifting the research focus to the integration, maintenance, and evolution of AI-driven systems. Although there is a variety of machine learning frameworks on the market, there is little support for process automation and DevOps in machine learning-driven projects. In this paper, we discuss how metamodels can support the development of deep learning frameworks and help deal with the steadily increasing variety of learning algorithms. In particular, we present a deep learning-oriented artifact model which serves as a foundation for build automation and data management in iterative, machine learning-driven development processes. Furthermore, we show how schema and reference models can be used to structure and maintain a versatile deep learning framework. Feasibility is demonstrated on several state-of-the-art examples from the domains of image and natural language processing as well as decision making and autonomous driving.

Delta-based verification of software product families

The quest for feature- and family-oriented deductive verification of software product lines resulted in several proposals. In this paper, we look at delta-oriented modeling of product lines and combine two new ideas: first, we extend Hähnle & Schaefer’s delta-oriented version of Liskov’s substitution principle for behavioral subtyping to work also for overridden behavior in benign cases. For this to succeed, programs need to be in a certain normal form. The required normal form turns out to be achievable in many cases by a set of program transformations, whose correctness is ensured by the recent technique of abstract execution. This is a generalization of symbolic execution that permits reasoning about abstract code elements. It is needed because code deltas contain partially unknown code contexts in terms of “original” calls.

Second, we devise a modular verification procedure for deltas based on abstract execution, representing deltas as abstract programs calling into unknown contexts.

The result is a “delta-based” verification approach, where each modification of a method in a code delta is verified in isolation, but which overcomes the strict limitations of behavioral subtyping and works for many practical programs. The latter claim is substantiated with case studies and benchmarks.

Extracting the power of dependent types

Most existing programming languages provide little support to formally state and prove properties about programs. Adding such capabilities is far from trivial, as it requires significant re-engineering of the existing compilers and tools. This paper proposes a novel technique to write correct-by-construction programs in languages without built-in verification capabilities, while maintaining the ability to use existing tools. This is achieved in three steps. Firstly, we give a shallow embedding of the language (or a subset) into a dependently typed language. Secondly, we write a program in that embedding, and we use dependent types to guarantee correctness properties of interest within the embedding. Thirdly, we extract a program written in the original language, so it can be used with existing compilers and tools.

Our main insight is that it is possible to express all three steps in a single language that supports both dependent types and reflection. Essentially, this allows us to express a program, its formal properties, and a compiler for it hand-in-hand, offering a lot of flexibility to programmers. We demonstrate this three-step approach by embedding a subset of the PostScript language in Agda, and illustrating it with several short examples. Thus we use the power of reflection to bring the benefits of dependent types to languages that had to go without them so far.
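
The paper's development is in Agda and uses reflection for the extraction step; as a loose Haskell analogue of the first and third steps, the sketch below indexes a PostScript-like stack language by stack depth using a GADT, so that depth errors are rejected at compile time, and "extracts" concrete syntax by pretty-printing. The names and types are invented for illustration and are far weaker than what full dependent types provide.

    {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
    -- Illustrative sketch only, not the Agda embedding from the paper.
    module Main where

    import Data.Kind (Type)

    data Nat = Z | S Nat

    -- A stack program transforming a stack of depth i into a stack of depth o.
    data Prog :: Nat -> Nat -> Type where
      Push :: Int -> Prog i ('S i)          -- push a literal
      Pop  :: Prog ('S i) i                 -- drop the top element
      Dup  :: Prog ('S i) ('S ('S i))       -- duplicate the top element
      AddP :: Prog ('S ('S i)) ('S i)       -- add the top two elements
      Seq  :: Prog i j -> Prog j k -> Prog i k

    -- Well-typed by construction: needs one input, leaves one output.
    double :: Prog ('S i) ('S i)
    double = Dup `Seq` AddP

    program :: Prog 'Z ('S 'Z)
    program = Push 21 `Seq` double

    -- "Extraction": emit concrete PostScript-like syntax.
    extract :: Prog i o -> String
    extract (Push n)  = show n
    extract Pop       = "pop"
    extract Dup       = "dup"
    extract AddP      = "add"
    extract (Seq p q) = extract p ++ " " ++ extract q

    main :: IO ()
    main = putStrLn (extract program)  -- prints 21 dup add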

Lifted termination analysis by abstract interpretation and its applications

This paper focuses on proving termination for program families with numerical features by using abstract interpretation. Furthermore, we present an interesting application of this lifted termination analysis to resolving “sketches”, i.e., partial programs with missing numerical parameters (holes), such that the resulting complete programs always terminate. To address these problems, we employ an abstract interpretation-based framework for inferring sufficient preconditions for the termination of single programs, which synthesizes piecewise-defined ranking functions.

We introduce a novel lifted decision tree domain for termination, in which decision nodes contain linear constraints defined over numerical features and leaf nodes contain piecewise ranking functions defined over program variables. Moreover, we encode a program sketch as a program family, thereby allowing the use of the lifted termination analysis as a program sketcher. In particular, we aim to find the variants (family members) that terminate under all possible inputs, which represent the correct sketch realizations.

We have implemented an experimental lifted termination analyzer, called SPLFuncTion, for proving termination of #if-based C program families and for termination-directed resolving of C program sketches. We have evaluated our approach on a set of loop benchmarks from SV-COMP, and the experimental results confirm the effectiveness of our approach.
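
As a data-structure illustration of the lifted decision tree domain described above (and not the analyzer itself), the following Haskell sketch pairs linear constraints over numerical features at decision nodes with placeholder ranking functions at the leaves; all types and the example are invented for this sketch.

    -- Illustrative sketch only of the shape of a lifted decision tree.
    module Main where

    type Var = String

    -- c1*x1 + ... + cn*xn <= b, over feature variables.
    data LinConstraint = Leq [(Int, Var)] Int
      deriving Show

    -- Stand-in for a piecewise-defined ranking function over program variables.
    data Ranking = Ranking String | NoProof
      deriving Show

    data LiftedTree
      = Leaf Ranking
      | Node LinConstraint LiftedTree LiftedTree   -- constraint, then-branch, else-branch
      deriving Show

    -- Example: the termination proof depends on the numerical feature SIZE.
    example :: LiftedTree
    example =
      Node (Leq [(1, "SIZE")] 10)        -- SIZE <= 10
           (Leaf (Ranking "f(x) = x"))   -- variants with small SIZE terminate
           (Leaf NoProof)                -- no proof for the remaining variants

    main :: IO ()
    main = print example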

Multi-stage programming with generative and analytical macros

In metaprogramming, code generation and code analysis are complementary. Traditionally, principled metaprogramming extensions for programming languages, like MetaML and BER MetaOCaml, offer strong foundations for code generation but lack equivalent support for code analysis. Similarly, existing macro systems are biased towards the code generation aspect.

In this work, we present a calculus for macros featuring both code generation and code analysis. The calculus directly models separate compilation of macros, internalizing a commonly neglected aspect of macros. The system ensures that the generated code is well-typed and hygienic.

We implement our system in Scala 3, provide a formalization, and prove its soundness.
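
The system itself is implemented in Scala 3; as a rough Haskell stand-in for the two complementary activities, the sketch below uses Template Haskell to generate a piece of code (a quotation) and to analyse it (pattern matching on the quoted syntax tree). It only hints at the idea and does not model the separate compilation or hygiene guarantees discussed in the abstract.

    {-# LANGUAGE TemplateHaskell #-}
    -- Illustrative sketch only: generation and analysis of quoted code.
    module Main where

    import Language.Haskell.TH

    -- Generation: build the expression 1 + 2 as a quoted syntax tree.
    gen :: Q Exp
    gen = [| 1 + 2 |]

    -- Analysis: inspect a quoted expression and report its shape.
    describe :: Exp -> String
    describe (InfixE _ (VarE op) _) = "an infix application of " ++ nameBase op
    describe (LitE _)               = "a literal"
    describe _                      = "something else"

    main :: IO ()
    main = do
      e <- runQ gen            -- run the quotation to obtain the syntax tree
      putStrLn (describe e)    -- analysis of the generated code
      putStrLn (pprint e)      -- pretty-printed form of the generated code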

Compiling pattern matching to in-place modifications

Algebraic data types and pattern matching are popular tools for building programs that manipulate complex data structures in a safe yet efficient manner. On top of these safety advantages, compilation techniques can turn pattern matching into highly efficient deconstruction code for immutable use cases. Conversely, high-performance data structures and languages prefer to leverage (controlled) mutation to maximize time and memory efficiency. Algebraic data types provide a natural framework to efficiently describe in-place transformations as rewrite rules. Such a representation could take advantage of parallelism opportunities that appear in tree-like structures.

We present early steps towards a new technique to compile pattern matching as parallel in-place modifications of the underlying memory representation. Towards this goal, we combine the language-based approach common in pattern-matching compilation with tools from the polyhedral model, which is widely used in high-performance code generation, to output efficient C code.

We present our formalism, along with a prototype implementation.
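
The paper targets C code generated via the polyhedral model; the toy Haskell sketch below only illustrates the underlying idea of applying a rewrite rule on an algebraic structure as an in-place update of the memory representation, here a mutable cons list with IORef tails. All names are invented for the example.

    -- Illustrative sketch only: a rewrite rule applied as an in-place update.
    module Main where

    import Data.IORef

    -- A mutable list whose tail pointers can be updated in place.
    data MList = Nil | Cons Int (IORef MList)

    fromList :: [Int] -> IO MList
    fromList = foldr (\x mTail -> Cons x <$> (mTail >>= newIORef)) (pure Nil)

    toList :: MList -> IO [Int]
    toList Nil          = pure []
    toList (Cons x ref) = (x :) <$> (readIORef ref >>= toList)

    -- Rewrite rule "Cons x (Cons y rest) -> Cons x rest" (drop the second element),
    -- applied by mutating the first cell's tail pointer instead of rebuilding the list.
    dropSecondInPlace :: MList -> IO ()
    dropSecondInPlace (Cons _ ref) = do
      tl <- readIORef ref
      case tl of
        Cons _ ref' -> readIORef ref' >>= writeIORef ref
        Nil         -> pure ()
    dropSecondInPlace Nil = pure ()

    main :: IO ()
    main = do
      xs <- fromList [1, 2, 3, 4]
      dropSecondInPlace xs
      print =<< toList xs   -- [1,3,4]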

HACCLE: metaprogramming for secure multi-party computation

Cryptographic techniques have the potential to enable distrusting parties to collaborate in fundamentally new ways, but their practical implementation poses numerous challenges. An important class of such cryptographic techniques is known as Secure Multi-Party Computation (MPC). Developing Secure MPC applications in realistic scenarios requires extensive knowledge spanning multiple areas of cryptography and systems. And while the steps to arrive at a solution for a particular application are often straightforward, it remains difficult to make the implementation efficient, and tedious to apply those same steps to a slightly different application from scratch. Hence, it is an important problem to design platforms for implementing Secure MPC applications with minimum effort and using techniques accessible to non-experts in cryptography.

In this paper, we present the HACCLE (High Assurance Compositional Cryptography: Languages and Environments) toolchain, specifically targeted at MPC applications. HACCLE contains an embedded domain-specific language, Harpoon, in which software developers without cryptographic expertise can write MPC-based programs, and uses Lightweight Modular Staging (LMS) for code generation.

Harpoon programs are compiled into acyclic circuits represented in HACCLE’s Intermediate Representation (HIR) that serves as an abstraction over different cryptographic protocols such as secret sharing, homomorphic encryption, or garbled circuits. Implementations of different cryptographic protocols serve as different backends of our toolchain. The extensible design of HIR allows cryptographic experts to plug in new primitives and protocols to realize computation. And the use of standard metaprogramming techniques lowers the development effort significantly.

We have implemented Harpoon and HACCLE, and used them to program interesting applications (e.g., secure auction) and key computation components of Secure MPC applications (e.g., matrix-vector multiplication and merge sort). We show that the performance is improved by using our optimization strategies and heuristics.
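
As a very rough, non-cryptographic illustration of an intermediate representation with swappable backends (not HACCLE's actual HIR), the Haskell sketch below defines a tiny acyclic boolean circuit type and two "backends": a plaintext evaluator and a gate counter. Real backends would instead realize the circuit with secret sharing, homomorphic encryption, or garbled circuits.

    -- Illustrative sketch only: a toy circuit IR with interchangeable backends.
    module Main where

    data Circuit
      = InputWire String
      | ConstBit Bool
      | And Circuit Circuit
      | Xor Circuit Circuit
      deriving Show

    -- Backend 1: plaintext evaluation (stand-in for a cryptographic backend).
    evalPlain :: [(String, Bool)] -> Circuit -> Bool
    evalPlain env (InputWire x) = maybe (error ("unbound " ++ x)) id (lookup x env)
    evalPlain _   (ConstBit b)  = b
    evalPlain env (And a b)     = evalPlain env a && evalPlain env b
    evalPlain env (Xor a b)     = evalPlain env a /= evalPlain env b

    -- Backend 2: cost estimation by counting gates.
    gateCount :: Circuit -> Int
    gateCount (And a b) = 1 + gateCount a + gateCount b
    gateCount (Xor a b) = 1 + gateCount a + gateCount b
    gateCount _         = 0

    -- A 1-bit comparison a > b, i.e. a AND (NOT b), with NOT written as XOR with True.
    gt :: Circuit
    gt = And (InputWire "a") (Xor (InputWire "b") (ConstBit True))

    main :: IO ()
    main = do
      print (evalPlain [("a", True), ("b", False)] gt)  -- True
      print (gateCount gt)                              -- 2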

MADMAX: a DSL for explanatory decision making

MADMAX is a Haskell-embedded DSL for multi-attribute, multi-layered decision making. An important feature of this DSL is the ability to generate explanations of why a computed optimal solution is better than its alternatives.

The functional approach and Haskell's type system support a high-level formulation of decision-making problems, which facilitates a number of innovations, including the gradual evolution and adaptation of problem representations, a more user-friendly form of sensitivity analysis based on problem domain data, and fine-grained control over explanations.
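
The following Haskell sketch is not MADMAX's API; it only illustrates the general shape of multi-attribute decision making with explanations: alternatives are scored over weighted attributes, and an explanation lists each attribute's contribution to the winner's advantage over a rival. All names and the example data are invented.

    -- Illustrative sketch only of explainable multi-attribute decision making.
    module Main where

    import Data.List (maximumBy, sortBy)
    import Data.Ord  (Down (..), comparing)

    type Attribute = String
    type Weight    = Double
    type Score     = Double

    data Alternative = Alternative
      { altName   :: String
      , altScores :: [(Attribute, Score)]
      } deriving Show

    -- Weighted total score of an alternative.
    total :: [(Attribute, Weight)] -> Alternative -> Score
    total ws a = sum [ w * s | (attr, w) <- ws, (attr', s) <- altScores a, attr == attr' ]

    best :: [(Attribute, Weight)] -> [Alternative] -> Alternative
    best ws = maximumBy (comparing (total ws))

    -- Explanation: per-attribute contribution to the winner's advantage, largest first.
    explain :: [(Attribute, Weight)] -> Alternative -> Alternative -> [(Attribute, Score)]
    explain ws winner other =
      sortBy (comparing (Down . snd))
        [ (attr, w * (s1 - s2))
        | (attr, w) <- ws
        , (a1, s1)  <- altScores winner, a1 == attr
        , (a2, s2)  <- altScores other,  a2 == attr
        ]

    main :: IO ()
    main = do
      let weights = [("price", 0.6), ("comfort", 0.4)]
          car     = Alternative "car"  [("price", 3), ("comfort", 9)]
          bike    = Alternative "bike" [("price", 9), ("comfort", 4)]
          winner  = best weights [car, bike]
      putStrLn ("best: " ++ altName winner)
      print (explain weights winner (if altName winner == "car" then bike else car))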

On-stack replacement for program generators and source-to-source compilers

On-stack replacement (OSR) describes the ability to replace currently executing code with a different version, either a more optimized one (tiered execution) or a more general one (deoptimization to undo speculative optimization). While OSR is a key component in all modern VMs for languages like Java or JavaScript, OSR has only recently been studied as a more abstract program transformation, independent of language VMs. Still, previous work has only considered OSR in the context of low-level execution models based on stack frames, labels, and jumps.

With the goal of making OSR more broadly applicable, this paper presents a surprisingly simple pattern for implementing OSR in source-to-source compilers or explicit program generators that target languages with structured control flow (loops and conditionals). We evaluate our approach through experiments demonstrating both tiered execution and speculative optimization, based on representative code patterns in the context of a state-of-the-art in-memory database system that compiles SQL queries to C at runtime. We further show that casting OSR as a high-level transformation enables new speculative optimization patterns beyond what is commonly implemented in language VMs.
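
The paper's pattern targets generated code with structured control flow (for example, C emitted by a query compiler at runtime); the toy Haskell sketch below only conveys the core idea of on-stack replacement: a loop in one version hands its live state (index and accumulator) to another version of the same loop mid-execution. The threshold and all names are invented for the illustration.

    -- Illustrative sketch only: switching loop versions mid-execution while
    -- carrying over the live state.
    module Main where

    -- Live loop state: current index and accumulator.
    data LoopState = LoopState { idx :: Int, acc :: Int } deriving Show

    limit :: Int
    limit = 100000

    -- "Slow" version, e.g. with extra profiling checks; after a threshold it
    -- transfers control (and its live state) to the fast version.
    slowLoop :: LoopState -> Int
    slowLoop s@(LoopState i a)
      | i >= limit = a
      | i == 1000  = fastLoop s          -- OSR point: continue in the other version
      | otherwise  = slowLoop (LoopState (i + 1) (a + i))

    -- "Fast" version of the same loop body, without the checks.
    fastLoop :: LoopState -> Int
    fastLoop (LoopState i a)
      | i >= limit = a
      | otherwise  = fastLoop (LoopState (i + 1) (a + i))

    main :: IO ()
    main = print (slowLoop (LoopState 0 0))  -- same result as running either version alone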

Leveraging relational concept analysis for automated feature location in software product lines

Formal Concept Analysis (FCA) was introduced almost a decade ago as a suitable method for Feature Location (FL) on a collection of product variants. Even though FCA-based FL techniques allow locating the core of a feature implementation, they do not propose a solution for tracing feature interactions to their implementation. Thus, the extracted traceability links (traces) are too inaccurate and, in the context of SPL extraction, cannot be used to generate complete products.

In this paper, we propose to complement FCA-based FL techniques by leveraging the power of Relational Concept Analysis, an extension of FCA to multi-relational data. From two given formal contexts, one for the products’ artefacts and one for their features, our technique computes the traces that link the features and the feature interactions to their corresponding artefacts. Additionally, we introduce a stage that removes unnecessary features from the extracted traces, to make them easier for an expert to understand. Our FL technique can be applied at any artefact granularity (from files to statements) and independently of the software language.

The results show that our technique produces valid traces, from which we were able to completely rebuild the set of artefacts for each initial product. Moreover, they show that our trace reduction removes, on average, between 31% and 85% of unnecessary features from the traces.
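
As a minimal reminder of the intuition behind FCA-style feature location (and not the paper's RCA-based technique, which also handles feature interactions), the Haskell sketch below marks an artefact as a candidate trace for a feature when it occurs in every product variant providing the feature and in no variant lacking it. The products, features, and artefacts are invented example data.

    -- Illustrative sketch only of the basic feature-location intuition.
    module Main where

    import Data.List ((\\), intersect, nub)

    type Feature  = String
    type Artefact = String

    data Product = Product
      { features  :: [Feature]
      , artefacts :: [Artefact]
      }

    -- An artefact is a candidate trace for a feature if it appears in every
    -- variant that provides the feature and in no variant that lacks it.
    locate :: Feature -> [Product] -> [Artefact]
    locate f ps = nub common \\ nub others
      where
        common = foldr1 intersect [artefacts p | p <- ps, f `elem` features p]
        others = concat [artefacts p | p <- ps, f `notElem` features p]

    main :: IO ()
    main = do
      let p1 = Product ["base", "wifi"]           ["core.c", "net.c"]
          p2 = Product ["base"]                   ["core.c"]
          p3 = Product ["base", "wifi", "camera"] ["core.c", "net.c", "cam.c"]
      print (locate "wifi" [p1, p2, p3])   -- ["net.c"]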

Type-safe generation of modules in applicative and generative styles

The MetaML approach to multi-stage programming provides the static guarantee of type safety and scope safety for generated code, regardless of the values of static parameters. Modules are indispensable for building large-scale programs in ML-like languages; however, they have a performance problem. To solve this problem, several recently proposed languages allow one to generate ML-style modules. Unfortunately, those languages suffered from limited expressiveness, incomplete proofs, and code explosion.

This paper proposes two-stage programming languages for module generation, which solve all of the above issues. Our languages accommodate two styles: first-class modules with generative functors and second-class modules with applicative functors. Module generation in each style is shown, through concrete examples, to have its own merits. We present type systems and type-preserving translations from our languages to plain MetaOCaml. We also show the results of performance measurements, which confirm the effectiveness of our languages.

Understanding and improving model-driven IoT systems through accompanying digital twins

Developers questioning why their system behaves differently than expected often have to rely on time-consuming and error-prone manual analysis of log files. Understanding the behavior of Internet of Things (IoT) applications is a challenging task because they are not only inherently hard-to-trace distributed systems, but their integration with the environment via sensors adds another layer of complexity. Related work proposes to record data during the execution of the system, which can later be replayed to analyze the system. We apply the model-driven development approach to this idea and leverage digital twins to collect the required data. We enable developers to replay and analyze the system’s executions by applying model-to-model transformations. These transformations instrument component and connector (C&C) architecture models with components that reproduce the system’s environment based on the data recorded by the system’s digital twin. We validate and evaluate the feasibility of our approach using a heating, ventilation, and air conditioning (HVAC) case study. By facilitating the reproduction of the system’s behavior, our method lowers the barrier to understanding the behavior of model-driven IoT systems.