Cloud platforms allow applications to meet fluctuating levels of demand through automatic horizontal scaling. These deployment models are characterized by short-lived applications running in resource-constrained environments. This amplifies the overhead of dynamic languages with just-in-time (JIT) compilation. Dynamic-language runtimes suffer from a warmup phase and resource-usage peaks caused by JIT compilation. Offloading compilation jobs to a dedicated server is a possible mitigation for these problems. We propose leveraging remote JIT compilation as a means to enable coordination between the independent instances. By sharing compilation results, aggregating profiles, and adapting the compiler and compilation policy, we strive to improve the peak performance and further reduce the warmup time of these applications. Additionally, an implementation on top of the Truffle framework enables us to bring these benefits to many popular languages.
The automatic synthesis of algorithms can effectively reduce the difficulty of algorithm design. However, multiple challenges exist for synthesizing algorithms. Among them, scalability of the synthesizer is the most prominent one because of the significant complexity of efficient algorithms. To address this scalability challenge, we propose several approaches from two aspects, improving the efficiency of existing program synthesizers and reducing the difficulty of algorithm synthesis by properly using algorithmic knowledge, respectively.
This paper introduces two methods for automated program repair (APR) utilizing pre-trained language models. The first method demonstrates program repair as a code completion task and is validated on a dataset of Java programs. The second method, Mentat, leverages OCaml’s parser and type system as fault localization techniques to generate prompts for GPT-3, producing candidate patches. Evaluation results show promising repair rates, with 27% and 39.2% effectiveness, respectively. For OCaml, a comparative study employing an automated validation strategy is presented in which the technique outperforms other tools. Language models are effective at APR, enhancing bug fixing and freeing developers to focus on other critical aspects of software engineering.
We propose a novel code assistant and generation paradigm aimed at closing the gap between visual sketching and code creation for Machine Learning (ML) development. This approach empowers developers and ML practitioners to translate hand-drawn sketches into functional code with enhanced accuracy and usability. Developers are recruited to assess the tool's performance. This research contributes to the future of low-code approaches, facilitating ML application development, and promoting an intuitive and accessible programming environment.
We propose a language-based approach to software versioning. Unlike the traditional approach of mainstream version control systems, where each evolution step is represented by a textual diff, we treat versions as programming elements. Each evolution step, merge operation, and version relationship, is represented explicitly in the program. This provides compile time guarantees for safety code reuse from previous versions, as well as forward and backwards compatibility between versions, allowing clients to use newly introduced code without needing to refactor their program. By lifting the versioning to the language level, we pave the way for tools that interact with software repositories to have more insight regarding the evolution of the software semantics.
State-of-the-art software is crafted in multiple programming languages. Such multi-language software challenges static analyses: Since many static analyses are focused on analyzing single-language programs, they are inherently unsound or imprecise in analyzing cross-language interaction. Existing approaches that perform multi-language analysis are not analysis- or language independent and thus lack in extensibility for new analyses and languages. We will develop an extensible, language-, framework-, and analysis-independent architecture to reuse existing static analyses for analyzing multi-language software. Our hypotheses are that, our architecture allows reusing existing single-language analyses and improves precision and soundness compared to the state of the art. We will evaluate our architecture with a points-to and data flow analysis for Java, JavaScript, and C/C++ code and compare it against the state of the art.
Programming languages share a social and formal heritage. These families were historically divided, but share deep roots, and we argue their destined matrimony heralds important consequences for language design and generative language modeling. In our work, we develop a sociotechnical framework for understanding the dynamics of programming and argue it captures many of the social and formal properties of language acquisition and evolution.
Gradual typing supports imprecise types in the type system, allowing incremental migration from untyped code to typed in the same language. Through the gradual typing approach, our ongoing work proposes a new theory based on the Martin-Löf type theory called Partial Gradual Dependent Type Theory. PGTT supports a gradual step from non-dependently typed code to dependently typed, enhancing the functionality of code reasoning while preserving the usability of widely used non-dependent type systems. PGTT restricts entirely unknown types and only permits dynamic terms on the type indices, making it naturally maintain better properties than existing gradual dependent type theories. It allows us to simplify runtime type checks into type parameter checks and elaborate the surface language into a static, dependently typed language, thereby reducing the performance overhead associated with gradual typing.
Despite great progress in algorithms for synthesizing recursive programs, state of the art approaches continue to have several limitations. Principal among these is their inability to “invent” auxiliary functions. This makes them sensitive to the available set of primitive components. In this paper, we present an alternative approach to recover recursive programs. We develop a system of constraints that characterizes patterns of data flow in the unrollings of a recursive program. Combined with a generator of seed nonrecursive circuits and a constraint solver, these constraints naturally form the basis of a general algorithm to synthesize recursive circuits.
Teaching novices to program is an unsolved problem. One part of the problem lies in the fact that industrial languages are used for teaching novices, while they were not made for this purpose. I am designing a Programming Education Runtime System to easily create modular languages for education. This system utilizes object algebras, trampolining and algebraic effects and handlers. It has been used to implement an interpreter for the Hedy programming language. This implementation has several important advantages over the existing one, such as better debugging support and better integration with the Hedy platform.
We propose a novel distributed reactive propagation semantics that provides strong consistency guarantees with minimal locking. This is achieved by decoupling reactive propagation from transaction execution, utilizing reactive histories to avoid propagating in-progress or inconsistent results. We formally define soundness properties in terms of histories, and sketch how our semantics upholds them. We implement Historiographer, a runtime incorporating our methods, and conduct a preliminary evaluation demonstrating performance improvements of up to 38% on select benchmarks.
Introducing new maintainers to established projects is critical to the long-term sustainability of open-source projects. Yet, we have little understanding of what motivates developers to join and maintain already established projects. Previous research on volunteering motivations emphasizes that individuals are motivated by a unique set of factors to volunteer in a specific area, suggesting that the motivations behind open-source contributions also depend on the nature of the work. We aim to determine correlations between types of open-source contributions and their specific motivators through surveys of open-source contributors.
Information Flow Control (IFC) in dynamic contexts is challenging due to different interpretations of security that arise. This paper introduces a modular framework to address this challenge. We present a dynamic floating-label enforcement mechanism that can be instantiated based on the intended security. Our approach formalizes a simply typed λ-calculus, extended with IFC operations, and adopts an epistemic perspective on security definition.
We present progress towards the formal verification of Wigderson’s graph coloring algorithm in Coq. We have created a library of formalized graph theory that aims to bridge the literature gap between introductory material on Coq and large-scale formal developments, while providing a motivating case study. Our library contains over 180 proven theorems. The development is available at https://github.com/siraben/coq-wigderson.
The Abstract Syntax Tree (AST) serves as a pivotal representation of program codes, offering a structured and hierarchical view of the program’s syntax. When developers modify code, the underlying AST also evolves to reflect these changes. Tree-diff algorithms, such as truediff and Gumtreediff, are developed to compare different versions of the AST and identify the modifications made between them. However, these heuristics are based on certain vertex matching methods that do not ensure optimality and preciseness. In this study, I propose a novel tree-diff approach that utilizes a MaxSAT (Maximum satisfiability) solver to address this issue. By encoding potential vertex matches and edges with associated costs as a tree-diff SAT problem, the MaxSAT solver effectively minimizes the edit distance and reveals the optimal vertex matching plan.
Automatic differentiation (AD) has become the backbone for a new wave of optimization-driven domains such as computer graphics and machine learning over the past decade. However, existing AD systems face limitations, either lacking support for in-browser development or failing to harness more recent, compiler-based approaches to achieve both expressiveness and size-preserving differentiation. This work introduces Rose, a portable, extensible AD library that runs on the web. Through Rose, we aim to increase accessibility to AD and empower end-user programming in optimization-driven domains. We plan to evaluate Rose by replacing the AD engines of real-world, client-side optimization systems and assess the improvements on the computation power and expressiveness of such systems.
We present an overview of primary feedback mechanisms used by Ecma International Technical Committee 39 (TC39), the standardizing body of JavaScript programming language.
Software is composed of different parts with different goals, each with different needs. Security-wise, this means not all necessarily need the same permissions: it can be beneficial to isolate some code such that it has limited control over the process, following the principle of least privilege. Without any sort of compartmentalization, a vulnerability found in a sensitive part of an application means the entire process can get corrupted, as its entire memory is now open to be read and modified by the attacker.
One notable example of dangerous code is the use of external libraries: using a library implies that it is trusted, as it is run within the program's context and can therefore read and modify any program state. This means malicious or vulnerable code can be introduced to the system this way, which can compromise sensitive information that is present in the process' memory. This begs the question: if parts of the software cannot necessarily be trusted, what can we do to limit their capabilities over the process?
We propose a new library sandboxing approach, focusing on isolating dynamically loaded libraries. While existing approaches usually leverage static analysis to perform instrumentation at the source level and during build time, we instead strive to work entirely during the program's run time.
This paper presents Sui Move, a new smart contract language for programming blockchains using objects as an abstraction.
Many large-scale Java empirical studies require not only source code but also resulting binaries such as JAR files. Pre-compiled datasets quickly become obsolete, and the creation of a custom corpus for every study is tedious. We present a prototype of JaMaBuild, a tool and a framework for mass building of Java projects from source. Given a list of projects and optional settings, it downloads the projects, filters them by user-definable criteria, builds them using Maven or Gradle, and collects outputs such as JAR files and build logs. Our tool can also be used for local build breakage studies.
Developers automate deployments with Programming Languages Infrastructure as Code (PL-IaC) by implementing IaC programs in popular languages like TypeScript and Python. Yet, systematic testing---well established for high-velocity software development---is rarely applied to IaC programs because IaC testing techniques are either slow or require extensive development effort. To solve this dilemma, we develop ProTI, a novel IaC unit testing approach, and implement it for Pulumi TypeScript. Our preliminary experiments with simple type-based test case generators and oracles show that ProTI can find bugs reliably in a short time, often without writing any additional testing code. ProTI's extensible plugin architecture allows combining, adopting, and experimenting with new approaches, opening the discussion about novel generators and oracles for efficient IaC testing.
This work presents the concept of MorphLang, a functional reactive programming language tailored for shape-changeable computers, which are built using wirelessly interconnected chiplets. MorphLang simplifies the programming process for these systems by concentrating on the basic behaviors of individual nodes and their asynchronous communication. The language allows for compilation into binary or Arduino formats, and programs can be transmitted to each node either wirelessly or through physical connections.
We present a library named ReactCOP that extends React's capabilities with support for Context-Oriented Programming. The library lets developers manage behavioral variations in React applications through layers, and adapt the application's behavior dynamically based on different contexts.
Programming distributed systems requires maintaining consistency among data replicas. In recent years, various frameworks have proposed language-level abstractions for this, falling into two fundamental approaches: data-centric and operation-centric solutions. The former allow developers to explicitly assign consistency levels to data, the latter enable attaching consistency constraints to operations. In practice, developers may benefit from both in the same application: data-centric consistency harmonizes well with object-oriented programming, yet one may need the flexibility to access the same data with a different consistency level depending on the operation. Currently, there is no solution that integrates both: it is a conceptual challenge to unify these two models and design a type system capable of ensuring their correct interaction. We present ConOpY, a programming language that integrates both data-centric and operation-centric consistency into the same design. The ConOpY type system guarantees the proper usage of consistency levels, preventing consistency violations resulting from an improper mix of consistency models. ConOpY is implemented as a Java extension based on annotations.
Graphical user interfaces present data as structures (lists, trees, grids). Convenient features to manipulate these structures are tedious to implement. We are working towards a GUI programming approach, where concise specifications of structures give rise to full-fledged GUIs with a complete set of structure manipulation features.
Large language models (LLMs) can generate programs in general-purpose languages from prose descriptions, but are not trained on many domain-specific languages (DSLs). Diagram authoring with Penrose, a diagramming system using three DSLs, exemplifies the utility of DSL program generation with LLMs, which enables diagram creation from prose. We provide methods to conceptualize and evaluate the structures of one-shot LLM prompts to generate error-free DSL programs and implement Penrose diagram creation from prose using LLMs. We will evaluate our LLM prompt structures by testing prompt variations across different diagramming domains and plan to run a user study to assess the ease of LLM-augmented Penrose diagramming over other tools.