VMIL '25: Proceedings of the 17th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages

Full Citation in the ACM Digital Library

SESSION: Papers

MaTSa: Race Detection in Java

Parallel programs are prone to data races, which are concurrency bugs that are difficult to track and reproduce. Various attempts have been made to create or incorporate tools that dynamically detect data races in Java, but most rely on external race detectors that: a) miss some of the nuances in the Java Memory Model (JMM), b) are too slow and complicated to be used in complex real-world applications, or c) produce many false-positive reports. In this paper, we present MaTSa, a tool built within OpenJDK that aims to dynamically detect data races and offer informative pointers to the origin of the race. We evaluate MaTSa and detect several races in the Renaissance benchmark suite and the Quarkus framework, many of which have been reported and resulted in upstream fixes. We compare MaTSa to Java TSan, the only current state-of-the-art dynamic race detector that works on recent OpenJDK versions. We analyze issues with false positives and false negatives for both tools and explain the design decisions causing them. We found MaTSa to be 15x faster on average, while scaling to large programs not supported by other tools.
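The kind of bug such detectors target can be reproduced in a few lines of plain Java. The class below is an illustrative unsynchronized read-modify-write, not a test case from the paper:

```java
// Two threads increment a shared counter with no synchronization.
// Under the JMM this is a data race: increments can be lost, so the
// final value is often below 200000. A dynamic detector such as
// MaTSa would flag the conflicting accesses to `counter`.
public class RacyCounter {
    static int counter = 0; // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // racy read-modify-write
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("counter = " + counter + " (expected 200000)");
    }
}
```

Guarding the increment with a lock or using `AtomicInteger` removes the race and the detector report alike.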

Copy-and-Patch Just-in-Time Compiler for R

Copy-and-patch is a technique for building baseline just-in-time compilers from existing interpreters. It has been successfully applied to languages such as Lua and Python. This paper reports on our experience using this technique to implement a compiler for the R programming language. We describe how this new compiler integrates with the GNU R virtual machine, present the key optimizations we implemented, and evaluate the feasibility of this approach for R. Copy-and-patch also allows extensions such as integration of the feedback recording required by multi-tier compilation. Our evaluation on 57 programs demonstrates very fast compilation times (980 bytecode instructions per millisecond), reasonable performance gains (1.15x–1.91x speedup over GNU R), and manageable implementation complexity.
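The copy-and-patch idea of assembling pre-built per-instruction templates can be illustrated at a high level. In this sketch, Java lambdas stand in for machine-code stencils, and the opcode names and operand encoding are invented; the real technique copies and patches native code, not closures:

```java
import java.util.function.IntUnaryOperator;

// "Compilation" copies one pre-built stencil per bytecode into a
// sequence and patches in its immediate operand, instead of
// dispatching on opcodes at run time as an interpreter would.
public class StencilDemo {
    static final int PUSH_ADD = 0, MUL = 1; // invented opcodes

    static IntUnaryOperator compile(int[] code, int[] operands) {
        IntUnaryOperator compiled = x -> x; // empty program: identity
        for (int pc = 0; pc < code.length; pc++) {
            final int imm = operands[pc]; // the "patched" immediate
            IntUnaryOperator stencil = switch (code[pc]) {
                case PUSH_ADD -> x -> x + imm;
                case MUL      -> x -> x * imm;
                default -> throw new IllegalArgumentException();
            };
            compiled = compiled.andThen(stencil); // "copy" into sequence
        }
        return compiled;
    }

    public static void main(String[] args) {
        // Program computing (x + 2) * 3
        IntUnaryOperator f = compile(new int[]{PUSH_ADD, MUL}, new int[]{2, 3});
        System.out.println(f.applyAsInt(4)); // prints 18
    }
}
```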

ASTro: An AST-Based Reusable Optimization Framework

Partial evaluation of abstract syntax tree (AST) traversal interpreters removes interpretation overhead while maximizing developer productivity; a language author specifies only the behavior of each AST node, and the framework specializes whole programs automatically. Existing solutions, however, come with heavyweight toolchains and tightly coupled, platform-specific back-ends, making portability and deployment difficult.

We present ASTro, a lightweight framework that keeps the node-centric workflow but eliminates heavy dependencies. ASTro translates the partially evaluated interpreter into well-structured C source code that encourages aggressive inlining by commodity compilers, yielding competitive native code. Because the output is plain C, it can be rebuilt with any mainstream toolchain, reducing deployment effort. To support just-in-time use, every AST sub-tree receives a Merkle-tree hash; identical fragments share their compiled artifacts at astro scale (across processes, machines, and deployments), so each piece is compiled once and reused many times.

This paper introduces ASTro, a framework for building interpreters and partial evaluators, along with its generator tool, ASTroGen. It shows that language authors can implement interpreters by specifying only the behavior of AST nodes. We present empirical measurements on microbenchmarks that quantify ASTro’s runtime performance.
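The sub-tree sharing described above can be sketched as a bottom-up Merkle hash over AST nodes, keyed into a compiled-artifact cache. The node shape and the toy "artifact" strings are assumptions of this sketch, not ASTro's actual data structures:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Each node's hash is derived from its label and its children's
// hashes, so structurally identical sub-trees collide on purpose
// and share one compiled artifact in the cache.
public class AstHashDemo {
    record Node(String label, Node... children) {}

    static final Map<Integer, String> artifactCache = new HashMap<>();

    static int merkleHash(Node n) {
        int h = n.label().hashCode();
        for (Node c : n.children()) {
            h = Objects.hash(h, merkleHash(c)); // fold in child hashes
        }
        return h;
    }

    // "Compile" once per distinct hash; later lookups reuse the artifact.
    static String compile(Node n) {
        return artifactCache.computeIfAbsent(merkleHash(n),
                key -> "code@" + Integer.toHexString(key));
    }

    public static void main(String[] args) {
        Node a = new Node("add", new Node("x"), new Node("y"));
        Node b = new Node("add", new Node("x"), new Node("y")); // same shape
        System.out.println(compile(a).equals(compile(b))); // shared artifact
    }
}
```

Because the key depends only on structure, the same cache could in principle be shared across processes or machines, which is the reuse the abstract describes.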

Memory Tiering in Python Virtual Machine

Modern Python applications consume massive amounts of memory in data centers. Memory technologies such as CXL have emerged as a pivotal interconnect for memory expansion. Prior efforts in memory tiering that relied on OS page-table or hardware-counter information incurred notable overhead and lacked awareness of fine-grained object access patterns. Moreover, these tiering configurations cannot be tailored to individual Python applications, limiting their applicability in QoS-sensitive environments. In this paper, we introduce Memory Tiering in Python VM (MTP), an extension module built atop the popular CPython interpreter to support memory tiering in Python applications. MTP leverages reference count changes from garbage collection to infer object temperatures and reduces unnecessary migration overhead through a software-defined page temperature table. To the best of our knowledge, MTP is the first framework to offer portability, easy deployment, and per-application tiering customization for Python workloads.
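The general shape of a software-defined page temperature table can be sketched as follows. This is a schematic illustration in Java (MTP itself extends CPython), and the page size, thresholds, and decay policy are made up for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Reference-count events bump a per-page temperature, temperatures
// decay periodically, and only pages that stay hot are candidates
// for migration to the fast memory tier.
public class TemperatureTable {
    static final long PAGE_SIZE = 4096;  // illustrative
    static final int HOT_THRESHOLD = 3;  // illustrative

    final Map<Long, Integer> temperature = new HashMap<>();

    // Called when GC observes a refcount change on an object address.
    void onRefcountEvent(long objectAddress) {
        long page = objectAddress / PAGE_SIZE;
        temperature.merge(page, 1, Integer::sum);
    }

    // Periodic decay: pages that stop being touched cool down.
    void decay() {
        temperature.replaceAll((page, t) -> t / 2);
    }

    boolean shouldMigrateToFastTier(long objectAddress) {
        return temperature.getOrDefault(objectAddress / PAGE_SIZE, 0)
                >= HOT_THRESHOLD;
    }
}
```

Keeping the table at page granularity is what lets object-level signals (refcount changes) drive page-level migration decisions without migrating on every access.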

Heterogeneous Translation of Scala-Like Function Types in Java-TX

Java-TX (i.e. Type eXtended) is a language based on Java. The two predominant new features are global type inference and real function types for lambda expressions. The latter enables concrete typing of lambda expressions (not only target typing as in Java), subtyping of function types, and direct evaluation of lambda expressions (β-reduction) without the need for additional type casts. In this paper, we extend this work by introducing a heterogeneous translation of those function types, allowing generic type information to be retained at runtime. Furthermore, we provide a concrete implementation for the interoperability between our function types and the target types in Java. Finally, we present how the combination of global type inference, real function types, and heterogeneous translation allows a novel, flexible handling of higher-order functions in Java-like languages.
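The limitation that real function types lift can be seen in plain Java, where a lambda has no type of its own and direct evaluation requires a cast to supply the target type. The contrast below shows standard Java only; Java-TX syntax itself is not reproduced here:

```java
import java.util.function.Function;

// In Java, `((x) -> x + 1).apply(3)` alone does not compile: the
// lambda is only target-typed, so a cast to a functional interface
// is needed before it can be applied directly. Java-TX's real
// function types make such a beta-reduction possible without casts.
public class TargetTypingDemo {
    public static void main(String[] args) {
        int r = ((Function<Integer, Integer>) (x -> x + 1)).apply(3);
        System.out.println(r); // prints 4
    }
}
```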

Evaluating Candidate Instructions for Reliable Program Slowdown at the Compiler Level: Towards Supporting Fine-Grained Slowdown for Advanced Developer Tooling

Slowing down programs has surprisingly many use cases: it helps find race conditions, enables speedup estimation, and allows us to assess a profiler’s accuracy. Yet, slowing down a program is complicated because today’s CPUs and runtime systems can optimize execution on the fly, making it challenging to preserve a program’s performance behavior without introducing bias. We evaluate six x86 instruction candidates for controlled and fine-grained slowdown, including NOP, MOV, and PAUSE. We tested each candidate’s ability to achieve an overhead of 100%, to maintain the profiler-observable performance behavior, and whether slowdown placement within basic blocks influences results. On an Intel Core i5-10600, our experiments suggest that only NOP and MOV instructions are suitable. We believe these experiments can guide future research on advanced developer tooling that utilizes fine-grained slowdown at the machine-code level.

RuntimeSave: A Graph Database of Runtime Values

To persist variable values from running programs for development purposes, we currently recognize two strategies. Techniques based on examples are only useful to store small sample objects, while record-and-replay techniques are efficient but use opaque storage formats. We lack a middle ground offering acceptable scalability and easy queryability with standard tools. In this work-in-progress paper, we present RuntimeSave – a versatile approach to saving runtime values from the Java Virtual Machine (JVM) into a persistent Neo4j graph database. Its core idea is a two-layer graph model consisting of hashed and metadata nodes, inspired by Git internals. To reduce the written data volume, it packs certain object graph shapes into simpler ones and hashes them to provide partial deduplication. We also report a preliminary evaluation, applications, and future work ideas.
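The Git-inspired two-layer split can be sketched with plain maps standing in for the Neo4j graph: an immutable, content-addressed layer keyed by hash (so identical object states deduplicate) and a metadata layer pointing into it. The field names, the toy hash, and the `capture` helper are assumptions of this sketch, not RuntimeSave's schema:

```java
import java.util.HashMap;
import java.util.Map;

// Payloads live once in the hashed layer; lightweight metadata
// entries (variable name, capture time) reference them by hash,
// much like Git commits referencing content-addressed blobs.
public class TwoLayerStore {
    final Map<String, String> hashedLayer = new HashMap<>(); // hash -> payload

    record Meta(String variableName, long timestamp, String valueHash) {}

    String put(String payload) {
        String hash = Integer.toHexString(payload.hashCode()); // toy hash
        hashedLayer.putIfAbsent(hash, payload);                // deduplicate
        return hash;
    }

    Meta capture(String variableName, String payload) {
        return new Meta(variableName, System.currentTimeMillis(), put(payload));
    }
}
```

Two captures of the same value under different variable names produce two metadata entries but only one stored payload, which is the partial deduplication the abstract describes.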