We propose design guidelines for a probabilistic programming facility suitable for deployment as a part of a production software system. As a reference implementation, we introduce Infergo, a probabilistic programming facility for Go, a modern programming language of choice for server-side software development. We argue that a similar probabilistic programming facility can be added to most modern general-purpose programming languages.
Probabilistic programming enables automatic tuning of program parameters and algorithmic decision making through probabilistic inference based on the data. To facilitate the addition of probabilistic programming capabilities to other programming languages, we share the implementation choices and techniques employed in the development of Infergo. We illustrate the applicability of Infergo to various use cases through case studies, and evaluate Infergo's performance on several benchmarks, comparing Infergo to dedicated inference-centric probabilistic programming frameworks.
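As a rough illustration of tuning a program parameter through inference on data, the following sketch fits the mean of a normal model by gradient ascent on the log-likelihood. It is not Infergo's API: the function names, the fixed standard deviation, the learning rate, and the finite-difference gradient are all assumptions chosen to keep the example self-contained.

```python
def log_likelihood(mean, data, sigma=1.0):
    """Log-density of data under Normal(mean, sigma), up to an additive constant."""
    return -sum((x - mean) ** 2 for x in data) / (2 * sigma ** 2)


def infer_mean(data, steps=200, lr=0.05, eps=1e-6):
    """Maximum-likelihood estimate of the mean by plain gradient ascent.

    The gradient is approximated with a central finite difference
    (an assumption of this sketch, standing in for a real gradient).
    """
    mean = 0.0
    for _ in range(steps):
        grad = (log_likelihood(mean + eps, data)
                - log_likelihood(mean - eps, data)) / (2 * eps)
        mean += lr * grad
    return mean


if __name__ == "__main__":
    observations = [1.8, 2.1, 2.0, 2.3, 1.8]
    print(infer_mean(observations))
```

The model is an ordinary function of its parameters, so the surrounding program can call it, test it, and deploy it like any other code. Only the inference routine treats it specially.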
As reactive systems, such as cyber-physical systems and the Internet of Things, become increasingly important, time-varying values, also known as signals, are playing an important role in software development. Although reactive systems require the change histories of some signals to be stored for various purposes such as post analysis and simulation, current programming languages do not provide a way to declare that signals are persistent. This paper proposes a method that realizes persistent signals in a reactive programming language, where (1) every update to each persistent signal is recorded in a time-series database, which can be seen as a part of the programming language runtime; and (2) persistent signals support a convenient time-oriented query mechanism. In this approach, each signal in the reactive programming language is seamlessly connected with the time-series database. This method is implemented as an extension of SignalJ, a Java-based reactive programming language that supports signals. In the implementation, the persistent signal mechanism is integrated with TimescaleDB, a PostgreSQL-based time-series database. In preliminary performance evaluations, our implementation had good responsiveness on most tests, indicating its feasibility for use in many applications.
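The persistent-signal idea described above can be sketched in a few lines. This is not SignalJ or TimescaleDB code: the class name `PersistentSignal`, its methods, and the in-memory lists standing in for the time-series database are hypothetical, and timestamps are passed explicitly to keep the example deterministic.

```python
import bisect


class PersistentSignal:
    """A time-varying value whose every update is recorded with its timestamp."""

    def __init__(self):
        # Two parallel, time-ordered lists stand in for a time-series table.
        self._times = []
        self._values = []

    def set(self, timestamp, value):
        """Record an update; updates are assumed to arrive in time order."""
        if self._times and timestamp < self._times[-1]:
            raise ValueError("updates must arrive in time order")
        self._times.append(timestamp)
        self._values.append(value)

    @property
    def now(self):
        """Current value of the signal, or None before the first update."""
        return self._values[-1] if self._values else None

    def at(self, timestamp):
        """Value the signal held at the given time, or None if not yet set."""
        i = bisect.bisect_right(self._times, timestamp)
        return self._values[i - 1] if i else None

    def between(self, start, end):
        """All (timestamp, value) updates with start <= timestamp <= end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))
```

For example, after `set(1, 20.0)`, `set(5, 21.5)`, and `set(9, 19.0)`, the query `at(6)` yields the value set at time 5, while `now` yields the latest value.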
Effect systems are used to statically reason about the effects an expression may have when evaluated. In the literature, such effects include behaviours as diverse as memory accesses and exception throwing. Here we present CallƐ, an object-oriented language that takes a flexible approach where effects are just method calls: this works well because ordinary methods often model things like I/O operations, access to global state, or primitive language operations such as thread creation. CallƐ supports both flexible and fine-grained control over such behaviour, in a way designed to minimise the complexity of annotations.
CallƐ’s effect system can be used to prevent OO code from performing privileged operations, such as querying a database, modifying GUI widgets, exiting the program, or performing network communication. It can also be used to ensure determinism, by preventing methods from (indirectly) calling non-deterministic primitives like random number generation or file reading.
Relational model finding is a successful technique that has been applied to a wide range of problems during the last decade. This success is partly due to the fact that many problems contain relational structures which can be explored using relational model finders. Although these model finders allow for the exploration of such structures, they often struggle to incorporate the non-relational elements.
In this paper we introduce AlleAlle, a method and language that integrates reasoning on both relational structure and non-relational elements (the data) of a problem. By combining first-order logic with Codd's relational algebra, transitive closure, and optimization criteria, we obtain a rich input language for expressing constraints on both relational and scalar values.
We present the semantics of AlleAlle and the translation of AlleAlle specifications to SMT constraints, and use the off-the-shelf SMT solver Z3 to find solutions. We evaluate AlleAlle by comparing its performance with Kodkod, a state-of-the-art relational model finder, and by encoding a solution to the optimal package resolution problem. Initial benchmarking shows that although the translation times of AlleAlle can be improved, the resulting SMT constraints can be solved efficiently by the underlying solver.
Software applications have grown increasingly complex to deliver the features desired by users. Software modularity has been used as a way to mitigate the costs of developing such complex software. Active learning-based program inference provides an elegant framework that exploits this modularity to tackle development correctness, performance and cost in large applications. Inferred programs can be used for many purposes, including generation of secure code, code re-use through automatic encapsulation, adaptation to new platforms or languages, and optimization. We show through detailed examples how our approach can infer three modules in a representative application. Finally, we outline the broader paradigm and open research questions.
A new approach to web application development is presented, in which an application is constructed by configuring and composing concepts drawn from a catalog developed by experts.
A concept is a self-contained, reusable increment of functionality. Each concept includes both front-end and back-end functionality, and exports a collection of components—full-stack GUI elements, backed by application logic and database storage.
To build an app, the developer imports concepts from the catalog, tunes them to fit the application’s particular needs via configuration variables, and links concept components together to create pages. Components of different concepts may be executed independently, or bound together declaratively with dataflows and synchronization. The instantiation, configuration, linking and binding of components are all expressed in a simple template language that extends HTML.
The approach has been implemented in a platform called Déjà Vu, which we outline and compare to conventional web application architectures. We describe a case study in which a collection of applications previously built as team projects for a web programming course were replicated in Déjà Vu. Preliminary results validate our hypothesis, suggesting that a variety of non-trivial applications can be built from a repository of generic concepts.
Debugging distributed systems is hard. Most of the techniques that have been developed for debugging such systems use either extensive model checking, or postmortem analysis of logs and traces. Interactive debugging is typically effective only in single-threaded, single-process applications, and is rarely applied to distributed systems. While the live observation of state changes using interactive debuggers is effective, it comes with a host of problems in distributed scenarios. In this paper, we discuss the requirements an interactive debugger for distributed systems should meet, the role the underlying distributed model plays in facilitating the debugger, and the implementation of our interactive debugger: GoTcha.
GoTcha is a browser-based interactive debugger for distributed systems built on the Global Object Tracker (GoT) programming model. We show how the GoT model facilitates the debugger, and the features that the debugger can offer. We also demonstrate a typical debugging workflow.
The ability to compose software from high-level components is as sought after as it is elusive. The REST architectural style used in the World Wide Web enables such plug-compatible components in distributed settings.
We propose storage combinators, a type of plug-compatible component that can be used as a generic intermediary in a non-distributed setting.
Storage combinators combine several stores – components that support REST-style verbs – into a single component that also provides a store interface.
This mechanism allows a few basic components to be combined in many different ways to achieve different effects with or without adaptation. It correlates with reported increases in productivity while performing well in commercial applications with millions of users.
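To make the composition idea concrete, here is a minimal sketch of a store interface and one combinator. The names `DictStore` and `CachingStore` and the `get`/`put`/`delete` verbs are illustrative assumptions, not the authors' implementation. The point is only that the combinator exposes the same interface as the stores it wraps, so combinators can be nested freely.

```python
class DictStore:
    """A basic store: maps references to values with REST-style verbs."""

    def __init__(self):
        self._data = {}

    def get(self, ref):
        return self._data.get(ref)

    def put(self, ref, value):
        self._data[ref] = value

    def delete(self, ref):
        self._data.pop(ref, None)


class CachingStore:
    """A combinator: wraps a cache store and a source store, and is itself a store."""

    def __init__(self, cache, source):
        self.cache = cache
        self.source = source

    def get(self, ref):
        # Serve from the cache when possible; otherwise read through and fill it.
        value = self.cache.get(ref)
        if value is None:
            value = self.source.get(ref)
            if value is not None:
                self.cache.put(ref, value)
        return value

    def put(self, ref, value):
        # Write through to the source and keep the cache consistent.
        self.source.put(ref, value)
        self.cache.put(ref, value)

    def delete(self, ref):
        self.source.delete(ref)
        self.cache.delete(ref)
```

Because `CachingStore` only relies on the three verbs, either argument could be another combinator, for instance `CachingStore(DictStore(), CachingStore(DictStore(), source))`.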
Anglo-American law enables property owners to split up rights among multiple entities by breaking their ownership apart into future interests that may evolve over time. The conveyances that owners use to transfer and subdivide property rights follow rigid syntactic conventions and are governed by an intricate body of interlocking doctrines that determine their legal effect. These doctrines have been codified, but only in informal and potentially ambiguous ways.
This paper presents preliminary work in developing a formal model for expressing and analyzing property conveyances. We develop a domain-specific language capable of expressing a wide range of conveyances in a syntax approximating natural language. This language desugars into a core calculus for which we develop operational and denotational semantics capturing a variety of important properties of property law in practice. We evaluate an initial implementation of our languages and semantics on examples from a popular property law textbook.
The field of big code relies on mining large corpora of code to perform some learning task towards creating better tools for software engineers. A significant threat to this approach was recently identified by Lopes et al. (2017), who found a large amount of near-duplicate code on GitHub. However, the impact of code duplication has not been noticed by researchers devising machine learning models for source code. In this work, we explore the effects of code duplication on machine learning models, showing that reported performance metrics are sometimes inflated by up to 100% when testing on duplicated code corpora, compared to the performance on de-duplicated corpora, which more accurately represent how machine learning models of code are used by software engineers. We present a duplication index for widely used datasets, and list best practices for collecting code corpora and evaluating machine learning models on them. Finally, we release tools to help the community avoid this problem in future research.
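One way to picture the de-duplication step is the sketch below, which drops near-duplicate files before any train/test split. It is not the released tooling: the tokenizer, the Jaccard similarity measure, and the 0.8 threshold are assumptions made for illustration, and the pairwise comparison is quadratic, so it only suits small corpora.

```python
import re


def tokens(code):
    """Set of identifiers, numbers, and punctuation characters in a file."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(files, threshold=0.8):
    """Keep each file only if it is not a near-duplicate of one already kept."""
    kept = []
    kept_tokens = []
    for code in files:
        t = tokens(code)
        if all(jaccard(t, k) < threshold for k in kept_tokens):
            kept.append(code)
            kept_tokens.append(t)
    return kept
```

Splitting into training and test sets only after such a filter keeps near-identical files from landing on both sides of the split, which is the source of the inflated metrics discussed above.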
Cloud apps like Google Docs and Trello are popular because they enable real-time collaboration with colleagues, and they make it easy for us to access our work from all of our devices. However, by centralizing data storage on servers, cloud apps also take away ownership and agency from users. If a service shuts down, the software stops functioning, and data created with that software is lost.
In this article we propose local-first software, a set of principles for software that enables both collaboration and ownership for users. Local-first ideals include the ability to work offline and collaborate across multiple devices, while also improving the security, privacy, long-term preservation, and user control of data.
We survey existing approaches to data storage and sharing, ranging from email attachments to web apps to Firebase-backed mobile apps, and we examine the trade-offs of each. We look at Conflict-free Replicated Data Types (CRDTs): data structures that are multi-user from the ground up while also being fundamentally local and private. CRDTs have the potential to be a foundational technology for realizing local-first software.
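For readers unfamiliar with CRDTs, a grow-only counter is one of the simplest examples. The sketch below is a generic textbook construction, not code from the prototypes discussed in this article, and the replica names are hypothetical. Each replica counts its own increments locally, and merging takes the per-replica maximum, so replicas converge regardless of the order in which they exchange state.

```python
class GCounter:
    """Grow-only counter CRDT: one non-decreasing count per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        """Apply a local update; works offline, no coordination needed."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        """Total across all replicas seen so far."""
        return sum(self.counts.values())

    def merge(self, other):
        """Fold in another replica's state; commutative, associative, idempotent."""
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)
```

Two devices can each increment while disconnected and later call `merge` in either direction. Repeating a merge changes nothing, which is what makes unreliable or duplicated synchronization safe.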
We share some of our findings from developing local-first software prototypes at the Ink & Switch research lab over the course of several years. These experiments test the viability of CRDTs in practice, and explore the user interface challenges for this new data model. Lastly, we suggest some next steps for moving towards local-first software: for researchers, for app developers, and a startup opportunity for entrepreneurs.
In his essay, Designed as Designer, Richard Gabriel suggests that artifacts are agents of their own design. Building on Gabriel’s position, this essay makes three observations: (1) Code “speaks” to the programmer through code smells, and it talks about the shape it wants to take by signalling design principle violations. By “listening” to code, even a novice programmer can let the code itself signal its own emergent natural structure. (2) Seasoned programmers listen for code smells, but they hear in the language of design principles. (3) Design patterns are emergent structures that naturally arise from designers listening to what the code is signaling and then responding to these signals through refactoring transformations. Rather than seeing design patterns as an educational destination, we see them as a vehicle for teaching the skill of listening. By showing novices the stories of listening to code and unfolding design patterns (starting from code smells, through refactorings, to arrive at principled structure), we can open up the possibility of listening for emergent design.
The dream of programming language design is to bring about orders-of-magnitude productivity improvements in software development tasks. Designers can endlessly debate on how this dream can be realized and on how close we are to its realization. Instead, I would like to focus on a question with an answer that can be, surprisingly, clearer: what will be the common principles behind next-paradigm, high-productivity programming languages, and how will they change everyday program development? Based on my decade-plus experience of heavy-duty development in declarative languages, I speculate that certain tenets of high-productivity languages are inevitable. These include, for instance, enormous variations in performance (including automatic transformations that change the asymptotic complexity of algorithms); a radical change in a programmer's workflow, elevating testing from a near-menial task to an act of deep understanding; and a change in the need for formal proofs.