A brief discussion at a conference more than 15 years ago led to a collaborative research agenda that has been incredibly fun, intellectually challenging, and impactful. Internet protocols like BGP and DNS underlie everything that we do online, but they routinely break network security and reliability due to their complexity and fragility. While there are many moving parts and underlying causes, time and again the programming languages perspective has proven to be a powerful way to make progress. In short, everything is a program (even if it’s not), and once something is a program we can give it a precise semantics and bring to bear a host of techniques to reason about its behavior. I’ll illustrate this approach through examples from my work, but the ideas are applicable beyond computer networks. I’ll also distill some general lessons that I’ve learned about doing collaborative research and making an impact.
In his keynote address on AI Winter, Richard P. Gabriel delves into the recurring cycles of high expectations and subsequent disappointments in the field of artificial intelligence. He posits that AI, unlike other computer science problems, has goals that are easily understood yet inherently vague, akin to the subjective nature of success in artistic endeavors such as painting, writing, and music. Gabriel critiques the limitations of large language models (LLMs), noting that they lack the human ability to handle novel situations and discover new information, being constrained to knowledge acquired during training. He suggests that the comprehensive training of LLMs across all perspectives prevents them from developing a unique point of view, which could be detrimental to their creative capabilities. Gabriel provocatively speculates that a more limited, less knowledgeable LLM might better emulate human writers, learning to forget and thereby fostering a more genuine form of creativity.
To control access to their data and resources, AWS users write policies that express fine-grained permissions. An authorization engine evaluates these policies trillions of times a day to determine whether access is allowed. The authorization engine is a critical part of the security, availability, and correctness of AWS. To raise the bar on the security and correctness of this engine, we replaced it with a formally verified one. It was absolutely critical that users not be impacted by the change. Over a period of months and more than a quadrillion tests, we gathered data on the impact and deployed the new verified engine. And then what happened? Come to the talk to find out.
The theories required for program analysis have evolved past the foundations developed decades ago. Programs exhibit computational effects such as nondeterminism, randomization, exceptions, and mutable state. Orthogonally, bug finding is often a more tangible goal than establishing correctness. Specialized techniques have been introduced to handle effects and incorrectness reasoning. While successful, the disjointed nature of these techniques means that new analysis systems must be built to analyze each kind of program and property. In my research, I am creating a logical foundation that consolidates these theories, with which I plan to develop a new generation of static analysis tools that are simpler and more efficient than those available today.
Power modeling is an essential building block for computer systems in support of energy optimization, energy profiling, and energy-aware application development. In 2024, I introduced VESTA, a novel approach to modeling the power consumption of applications with one key insight: language runtime events are often correlated with a sustained level of power consumption. When compared with the established approach of power modeling based on hardware performance counters (HPCs), VESTA has the benefit of solely requiring application-scoped information and enabling a higher level of explainability, while achieving comparable or even higher precision. I hope to continue on this track of discovering the deep connections between VM-events and power consumption while also providing practical solutions for mapping one to the other through the construction of power models.
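The modeling idea can be sketched as a least-squares fit from runtime-event counts to measured power. The sketch below is illustrative only — it uses a hypothetical two-feature linear model solved via normal equations, not VESTA's actual model or API:

```python
def fit_power_model(samples):
    """Least-squares fit of power ~= w1 * event1 + w2 * event2,
    solved via the 2x2 normal equations (pure Python, two features
    for brevity). Illustrates event-based power modeling only."""
    # samples: list of ((event1_count, event2_count), measured_power)
    a11 = sum(e1 * e1 for (e1, e2), _ in samples)
    a12 = sum(e1 * e2 for (e1, e2), _ in samples)
    a22 = sum(e2 * e2 for (e1, e2), _ in samples)
    b1 = sum(e1 * p for (e1, e2), p in samples)
    b2 = sum(e2 * p for (e1, e2), p in samples)
    det = a11 * a22 - a12 * a12
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2
```

On noiseless synthetic data generated with weights (2.0, 0.5), the fit recovers those weights exactly; a per-event weight of this kind is what makes such models explainable.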
In this new era of AI with diverse hardware accelerators such as GPUs and quantum circuits, achieving system-wide robustness requires tackling issues throughout all system layers, spanning from software applications to hardware components. My research aims to enhance the robustness of heterogeneity-enabled AI systems by reinventing software testing and analysis techniques, leveraging full-stack insights and advanced AI capabilities. I have completed one research project and have collaborated on a couple of others at the application and language levels. As the next steps, I will explore (1) holistic regression testing to prioritize test inputs associated with system-wide changes and (2) full-stack analysis to optimize computing resource allocation and reduce hardware reliance by analyzing application characteristics and using alternative resources in tandem.
Record-Replay (RR) is a useful technique for investigating concurrency-related bugs whose appearance is subject to particular thread schedulings. RR logs/records the nondeterministic inputs that influence a program, such as thread scheduling, shared memory operations, and external file/socket data. The recording can be replayed, providing the same inputs to replicate the bug. Unfortunately, RR is not good at patch validation. Mutable replay is when a recording generated by the buggy version of a program is replayed on the patched version. Patches typically introduce divergent behavior, which often prevents replay entirely. In addition, current RR systems are slow for languages running on a virtual machine like Java: they limit parallelism while forcing the garbage collector and just-in-time compiler to behave exactly the same between the recording and the replay. RR systems designed for Java replay multithreaded code incorrectly or require the use of a customized JVM. To alleviate this, we developed a prototype RR system for Java called JMVX. It operates within Java bytecode, allowing it to tolerate benign divergences --- differences in a program's execution caused by functionally equivalent operations introduced by the garbage collector, just-in-time compiler, and class loading. JMVX supports multithreaded replay by providing a total ordering of monitor/lock operations via a vector clock. JMVX will serve as a foundation to experiment with partial-order implementations of the vector clock and specialized support for thread pools. In addition, we plan to support mutable replay by allowing the replay to interact "live" with a mock version of a resource's snapshot.
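The vector-clock idea behind ordered lock replay can be sketched in a few lines. This is a minimal illustration in Python, not JMVX's bytecode-level implementation: each acquiring thread ticks its own clock entry, merges with the lock's clock, and the resulting log orders acquisitions so a replayer could enforce the same schedule:

```python
import threading

class VectorClock:
    """Minimal vector clock keyed by thread name (illustrative only)."""
    def __init__(self):
        self.clock = {}

    def tick(self, tid):
        self.clock[tid] = self.clock.get(tid, 0) + 1

    def merge(self, other):
        for tid, t in other.clock.items():
            self.clock[tid] = max(self.clock.get(tid, 0), t)

class RecordedLock:
    """On each acquire, the thread ticks its clock entry and merges with
    the lock's clock; the recorded log totally orders acquisitions of
    this lock, which a replayer can use to force the same schedule."""
    def __init__(self):
        self._lock = threading.Lock()
        self._clock = VectorClock()
        self.log = []

    def acquire(self, tid, thread_clock):
        self._lock.acquire()
        thread_clock.tick(tid)
        thread_clock.merge(self._clock)   # inherit order seen by the lock
        self._clock.merge(thread_clock)   # publish this acquisition
        self.log.append((tid, dict(thread_clock.clock)))

    def release(self):
        self._lock.release()
```

In the log, a later acquisition's clock dominates an earlier one's, which is exactly the happens-before information replay needs.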
Information Flow Control (IFC) provides security by preventing high-secrecy data from flowing to low-secrecy outputs, and low-integrity data from flowing to high-integrity data. However, prior approaches to IFC either suffer from imprecision and falsely identify security violations, or require modification of the programming language or compiler, reducing their practical utility for programmers. Recent work, Cocoon, demonstrates an approach to IFC that allows for precision without modifying the language or compiler, but it is severely limited in practical use because it only supports IFC for static secrecy. I have created an IFC approach called Carapace that adds support for both integrity and dynamic labels.
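The dynamic-label discipline can be illustrated in a few lines. This is a toy sketch of run-time secrecy tracking — all names here are hypothetical and it is not Carapace's (Rust-based) API: labels join on combination, and a check at the sink blocks high-secrecy data from flowing to a low-secrecy output:

```python
from enum import Enum

class Secrecy(Enum):
    LOW = 0
    HIGH = 1

class Labeled:
    """Toy dynamically labeled value: carries a secrecy label at run time."""
    def __init__(self, value, label):
        self.value = value
        self.label = label

    def combine(self, other, op):
        # The result's label is the join (max) of the operands' labels.
        label = Secrecy(max(self.label.value, other.label.value))
        return Labeled(op(self.value, other.value), label)

def release_to_low_sink(x):
    """A low-secrecy sink: only LOW-labeled data may flow here."""
    if x.label is Secrecy.HIGH:
        raise PermissionError("high-secrecy data may not flow to a low-secrecy sink")
    return x.value
```

Any value derived from a HIGH input stays HIGH, so the sink check catches indirect flows through computation as well.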
Data-centric systems play a crucial role in computer systems, and their correctness is of paramount importance. Logic bugs are one of the main types of bugs in data-centric systems, leading to incorrect results. Existing methods for testing data-centric systems face challenges such as ineffective test oracles, insufficient coverage of target functionalities, and low efficiency in test case generation. To address these issues, we propose a general, novel black-box methodology for testing various kinds of important data-centric systems. Our key insight is that we can generate a test case incrementally and use the intermediate results to construct a reference test case. The result of the original test case should be identical to that of the reference test case. We have applied this methodology to test both Datalog engines and DBMSs, successfully uncovering 75 unique bugs. We believe that this idea may be applicable to many other systems as well, and we will next extend it to deep learning libraries.
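The oracle idea can be made concrete with a toy stand-in for a query engine. The sketch below is illustrative, not the paper's actual test generator: a multi-predicate query evaluated in one shot must agree with a reference built incrementally by materializing each intermediate result:

```python
def run_query(rows, predicates):
    """Stand-in for the system under test: apply all predicates at once."""
    return [r for r in rows if all(p(r) for p in predicates)]

def reference_query(rows, predicates):
    """Reference test case: apply one predicate at a time,
    materializing each intermediate result."""
    out = rows
    for p in predicates:
        out = [r for r in out if p(r)]  # intermediate result
    return out

def check(rows, predicates):
    # Oracle: original and incrementally built reference must agree;
    # a mismatch signals a logic bug in the engine under test.
    assert run_query(rows, predicates) == reference_query(rows, predicates)
```

For a real Datalog engine or DBMS, `run_query` would submit the full query while `reference_query` submits the incrementally generated one; any disagreement is a candidate logic bug.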
Zero-knowledge virtual machines (zkVMs) enable verifiable computation via succinct zero-knowledge proofs (ZKPs). However, current zkVMs, still in development, exhibit many bugs. This paper introduces a parameterized framework for the formal verification of zkVMs in Coq. We prove the soundness and completeness of the constraint generation algorithm from machine instructions to semantics-level constraints. Existing works target specific zkVMs and require repeated proof work in this phase, whereas our proofs are parameterized and can be reused during development and by all zkVMs. We also demonstrate the generality of our framework by instantiating it on two examples: the Cairo VM and a simplified zkEVM.
In the development of type systems there are multiple paths to achieving more expressiveness. On the one hand, algebraic effect handlers allow us to reason about a program's side effects. On the other hand, dependent types make it possible to reason more precisely about program states and data and to prove general mathematical statements. We propose a new type system based on bidirectional type checking for algebraic effect handlers and dependent types.
Just-in-Time (JIT) compilers can gain information at run time that is not available to Ahead-of-Time (AOT) compilers. As such, abstract-interpretation-based baseline JIT compilers are common in many dynamic language implementations. Yet the reference implementation of Python --- CPython --- has largely avoided implementing a baseline JIT compiler, likely due to the prohibitive maintenance costs associated with one. This paper implements an abstract-interpretation-based optimizer for CPython bytecode that is easy to maintain and less error-prone because it is automatically generated from a pre-existing Domain-Specific Language (DSL) --- the same DSL used to specify the interpreter. The key insight presented in this paper is that the very same DSL used to generate a concrete interpreter can also generate an abstract interpreter, yielding benefits such as reduced error-proneness and greater extensibility. The proposed abstract interpreter has been accepted into CPython 3.13 and forms part of its experimental JIT compiler.
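The key insight can be illustrated with a toy single-source specification — this is not CPython's actual cases DSL, just a sketch of the idea that one per-opcode spec can drive both a concrete evaluator and an abstract one over a tiny type lattice:

```python
# Toy single-source spec: each opcode carries a concrete function and an
# abstract transfer function over the lattice {INT, UNKNOWN}.
INT, UNKNOWN = "INT", "UNKNOWN"

SPEC = {
    "ADD": (lambda a, b: a + b,
            lambda a, b: INT if a == INT and b == INT else UNKNOWN),
}

def run(code, concrete=True):
    """One interpreter loop, generated behavior selected by `concrete`:
    the same SPEC yields either a concrete or an abstract interpreter."""
    stack = []
    for op, *args in code:
        if op == "PUSH":
            val = args[0]
            stack.append(val if concrete
                         else (INT if isinstance(val, int) else UNKNOWN))
        else:
            conc, abst = SPEC[op]
            b, a = stack.pop(), stack.pop()
            stack.append((conc if concrete else abst)(a, b))
    return stack[-1]
```

Because both interpreters are derived from one spec, adding or changing an opcode cannot leave the abstract semantics out of sync with the concrete one — the maintainability benefit the paper targets.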
Humans use mental models when interacting with systems. Misalignment between a mental model and the system design, known as mode confusion, can lead to automation surprises. To better handle the vagueness of mental models through formalization, Fuzzy Mental Model Finite State Machines (FMMs), which incorporate fuzzy logic, have been introduced. This work explores the potential of FMMs in formal analyses of human-machine interactions, proposing a set of formal behavior patterns of mode confusion and a tool for identifying mode confusions and unsafe interactions.
Inversion is a fundamental operation that arises frequently in probabilistic inference and computer graphics. For example, inversion is used to decrease variance and to enable differentiation in variational inference (e.g., reparameterization trick) and in differentiable rendering (e.g., to integrate over object boundaries). Existing approaches to inversion limit the class of functions inverted, for example, to affine functions, or require a user-specified inverse. We study when a local inverse—an inverse that is valid in a neighborhood of a point—exists. We provide an algorithm to approximate the local inverse and give the convergence rate of the solver. We present LIN, a system that automatically computes the local inverse of a function using a fixed-point solver. We implement LIN in Python and use it to automatically compute the local inverse of affine, polar, and hyperbolic changes of variables arising in image stylization.
Since the discovery of Spectre attacks, various detection methods for speculative vulnerabilities have been developed. Sound static analyses based on symbolic execution give precise results but lack scalability, while pattern-based analyses can accommodate large code bases but may be unsound and require manually crafted patterns for each microarchitecture. We introduce Kawa, an abstract language designed to model control and data flows, allowing efficient analysis of Spectre vulnerabilities. Kawa's abstract nature also enables interpretation as schemata to capture entire classes of concrete programs with speculative leaks.
Static verification assures program correctness but requires heavy user annotation. Gradual verification alleviates this burden by allowing users to write partial, imprecise specifications that are checked statically where possible and dynamically when necessary. The first gradual verifier, Gradual C0, reasons about shared heap memory through a permission logic. This paper describes an extension to Gradual C0 which supports fractional permissions in its permission logic.
In Formal Languages and Automata Theory courses, students are exposed to grammars. They are expected to learn how to design them and how to use them to derive words in a grammar's language. Some students, however, find word derivation difficult to understand due to nondeterminism and struggle to determine whether the grammars they develop are correct. This research project aims to develop novel visualization tools to help students understand grammar derivation and establish the correctness of their designs.
Blockchain technologies are applied in diverse domains such as financial systems, supply chains, and identity management, leading to the emergence of various smart contract language designs. These contracts often involve time-dependent transactions recorded immutably on the blockchain, making their correctness crucial. This paper addresses the formal verification of temporal behaviors in smart contracts without human interaction. We study 9 recent smart contract languages used in 7 leading blockchains and model 27 common temporal patterns from 3148 benchmarks across 9 domain-specific application categories. We introduce VESC, a temporal specification language that allows developers to specify temporal properties in structured natural language, which VESC compiles into formal linear temporal logic. Our experiments demonstrate that VESC effectively specifies common temporal behaviors, paving the way for automated temporal verification of smart contracts.
Visualizations play a significant role in writing, debugging, profiling, and generally understanding programs. However, little work has been done to understand the structure of program visualizations at a fundamental level—namely, why and how they scale. Previous work on visualization comprehension focuses narrowly on a few domains or on surface-level details. While this work makes it possible to point out how different features of two vastly different visualizations may be independently effective, it does not provide an underlying theory that explains both. We develop a theory of the interpretability of program visualizations focused on their abstraction and composition properties. We summarize the preliminary results from our grounded theory study of open-source software visualizations, comprising more than 150 examples. Finally, we hint at work-in-progress to experimentally confirm our findings.
We introduce Meerkat, a novel, type-safe reactive programming language supporting concurrent live code updates in distributed scenarios. Meerkat seamlessly integrates the development of both server-side and client-side components within a unified framework. The language ensures causal consistency during data updates through the Historiographer algorithm. The soundness of concurrent code updates is maintained by a type-checking process protected by our locking protocol.
Scannerless Generalised LR can parse context-free grammars extended with conjunction-with-negation, represented by reject rules. To handle reject rules correctly, Visser defines a specific ordering between states that can be reduced during parsing: first reduce non-rejectable states, then rejectable states. However, he remarks that in specific situations an ordering between rejectable states is required for correct parsing results, for example when encoding intersections with reject rules. The question of how to order rejectable states was left open. In this work, we present a solution that orders stacks topped by rejectable states.
We present the new project called AUTOINC, which aims to automatically incrementalize general-purpose computations.