Software correctness and security have been central issues in the field for decades. Researchers have developed a wide range of approaches, but none has fully solved these problems to date.
In this talk I consider two very different approaches to solving correctness and security problems: failure-oblivious computing and domain-specific languages. I will discuss how these approaches (as well as others) interact with the cognitive limitations and available technical skills of the software developers who must, at present, be part of any successful solution. I'll conclude by outlining a new approach that, by deploying automated programming language technology in an appropriately targeted way, may interact more productively with the characteristics of the developer population as a whole.
In layout-sensitive languages, the indentation of an expression or statement can influence how a program is parsed. While some of these languages (e.g., Haskell and Python) have been widely adopted, there is little support for software language engineers in building tools for layout-sensitive languages. As a result, parsers, pretty-printers, program analyses, and refactoring tools often need to be handwritten, which decreases the maintainability and extensibility of these tools. Even state-of-the-art language workbenches have little support for layout-sensitive languages, restricting the development and prototyping of such languages.
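To make the notion concrete, here is a minimal illustration of our own (not taken from the paper) in Python, one of the layout-sensitive languages mentioned above: moving a single statement one indentation level to the left changes which block it belongs to, and hence how the program is parsed.

```python
# Indentation alone determines which block a statement belongs to.
def classify(x):
    if x > 0:
        print("positive")
        print("checked")    # indented: part of the if-body, runs only when x > 0
    return x

def classify_variant(x):
    if x > 0:
        print("positive")
    print("checked")        # dedented: outside the if-body, runs unconditionally
    return x
```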
In this paper, we introduce a novel approach to declarative specification of layout-sensitive languages using layout declarations. Layout declarations are high-level specifications of indentation rules that abstract from low-level technicalities. We show how to derive an efficient layout-sensitive generalized parser and a corresponding pretty-printer automatically from a language specification with layout declarations. We validate our approach in a case study using a syntax definition for the Haskell programming language, investigating the performance of the generated parser and the correctness of the generated pretty-printer against 22191 Haskell files.
At SLE in 2014, Ridge presented the P3 combinator library with which parsers can be developed for left-recursive, non-deterministic and ambiguous grammars. A combinator expression in P3 yields a binarised grammar reflecting the expression's structure. The grammar is given to an underlying, generalised parsing procedure computing all derivations.
In this paper we present a combinator library with a similar architecture to P3, adjusting it to avoid grammar binarisation. Avoiding binarisation has a significant positive effect on the running times of the underlying parsing procedure, which we demonstrate using real-world grammars. Binarisation is avoided by restricting the applicability of combinators, resulting in combinator expressions closely resembling BNF fragments. Usability is recovered by defining coercions that automatically convert expressions where necessary. As the underlying parsing procedure, we use a purely functional variant of generalised top-down (GLL) parsing.
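As a rough illustration of the shape difference at stake (a hypothetical sketch in Python, not the actual combinator library), compare a rule built by chaining a binary sequencing combinator with the same rule expressed by an n-ary, BNF-like combinator: the former introduces nested, anonymous intermediate structure, while the latter preserves the flat shape of the BNF fragment.

```python
# Hypothetical sketch (not the paper's library): the rule S ::= A B C D
# built with a binary sequencing combinator versus an n-ary, BNF-like one.

class Seq2:
    """Binary sequencing: chaining it binarises the grammar."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def __repr__(self):
        return f"({self.left!r} . {self.right!r})"

class Rule:
    """N-ary rule: the body keeps the flat shape of the BNF fragment."""
    def __init__(self, name, *symbols):
        self.name, self.symbols = name, symbols
    def __repr__(self):
        return f"{self.name} ::= {' '.join(map(repr, self.symbols))}"

# Binarised: S is represented as (A . (B . (C . D))), introducing
# anonymous intermediate nonterminals in the underlying grammar.
binarised = Seq2("A", Seq2("B", Seq2("C", "D")))

# Flat: the combinator expression mirrors the BNF fragment directly,
# which is what lets the underlying generalised parser avoid the
# overhead that binarisation incurs.
flat = Rule("S", "A", "B", "C", "D")

print(binarised)  # ('A' . ('B' . ('C' . 'D')))
print(flat)       # S ::= 'A' 'B' 'C' 'D'
```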
The POSIX shell language defies the conventional wisdom of compiler construction on several levels: the shell language was not designed for static parsing, but with an intertwining of syntactic analysis and execution by expansion in mind. Token recognition cannot be specified by regular expressions, lexical analysis depends on the parsing context and the evaluation context, and the shell grammar given in the specification is ambiguous. Moreover, the unorthodox design choices of the shell language fit poorly with the specification languages commonly used to describe other programming languages. This makes the standard use of LEX and YACC as a pipeline inadequate for implementing a parser for POSIX shell.
The existing implementations of shell parsers are complex and use low-level, character-level parsing code that is difficult to relate to the POSIX specification. We find it hard to trust such parsers, especially when using them for writing automatic verification tools for shell scripts.
This paper offers an overview of the technical difficulties related to the syntactic analysis of the POSIX shell language. It also describes how we have resolved these difficulties using advanced parsing techniques (namely speculative parsing, parser state introspection, context-dependent lexical analysis and longest-prefix parsing) while keeping the implementation at a sufficiently high level of abstraction so that experts can check that the POSIX standard is respected.
The resulting tool, called MORBIG, is an open-source static parser for a well-defined and realistic subset of the POSIX shell language. Its implementation crucially relies on the purity and incrementality of LR(1) parsers generated by MENHIR, a parser generator for OCaml.
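To make the context-dependence of lexical analysis concrete, the following deliberately simplified sketch (in Python; MORBIG itself is an OCaml program and relies on parser-state introspection rather than this toy flag) shows a tokenizer whose classification of a word depends on where the parser currently is: `for` is a reserved word at command position, but an ordinary word when it appears as an argument, as in `echo for`.

```python
# Illustrative sketch, not MORBIG's code: shell lexing must consult the
# parsing context, so it cannot be a context-free, regular tokenizer.

RESERVED = {"if", "then", "fi", "for", "do", "done", "while"}

def tokenize(words):
    """Classify words; the classification depends on the parsing context."""
    at_command_position = True          # crude stand-in for real parser state
    for w in words:
        if at_command_position and w in RESERVED:
            yield ("KEYWORD", w)        # reserved word recognised here...
        else:
            yield ("WORD", w)           # ...but only an ordinary word there
            at_command_position = False
    # (a real shell lexer also resets the context on ';', newlines, etc.)

print(list(tokenize(["for", "i"])))     # [('KEYWORD', 'for'), ('WORD', 'i')]
print(list(tokenize(["echo", "for"])))  # [('WORD', 'echo'), ('WORD', 'for')]
```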
Regular expressions are extended by splitting the terminals into left brackets, right brackets, and neutral terminals. These extended regular expressions define a superset of regular languages. Their languages are parsed in linear time in the size of the input. The addition of annotations to these regular expressions results in more detailed parse trees.
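A minimal sketch of the bracket idea (our illustration, not the paper's algorithm, and omitting the regular-expression matching itself): once every terminal is classified as a left bracket, a right bracket, or neutral, a single stack-based pass over the input suffices to recover nested tree structure in linear time.

```python
# Minimal sketch: bracket classification of terminals yields a nested
# parse tree in one linear, stack-based pass over the input.

LEFT, RIGHT = {"begin", "("}, {"end", ")"}

def parse(tokens):
    root = []                 # children of the implicit top-level node
    stack = [root]
    for tok in tokens:
        if tok in LEFT:
            node = [tok]      # open a new subtree
            stack[-1].append(node)
            stack.append(node)
        elif tok in RIGHT:
            stack.pop().append(tok)   # close the current subtree
            if not stack:
                raise ValueError("unbalanced brackets")
        else:                 # neutral terminal
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced brackets")
    return root

print(parse(["(", "a", "begin", "b", "end", ")", "c"]))
# [['(', 'a', ['begin', 'b', 'end'], ')'], 'c']
```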
The goal of modular language development is to enable the definition of new languages as assemblies of pre-existing ones. Recent approaches in this area are plentiful but usually suffer from two main problems: either they do not support modular language composition at both the specification and implementation levels, or they require advanced knowledge of specific paradigms, which hampers wide adoption in industry. In this paper, we introduce a non-intrusive approach to modular development of language concerns with well-defined interfaces that can be composed modularly at the specification and implementation levels. We present an implementation of our approach atop the Eclipse Modeling Framework, namely Alex, an object-oriented meta-language for semantics definition and language composition. We evaluate Alex in the development of a new DSL for IoT systems modeling resulting from the composition of three independently defined languages (UML activity diagrams, Lua, and the OMG Interface Description Language). We evaluate the effort required to implement and compose these languages using Alex in comparison with similar approaches from the literature.
The ability to extend programming languages with domain-specific concepts is becoming an essential technology for developing complex software. However, many domain-specific languages are implemented in a way that interacts poorly with the host language. There are a number of tools that aim to improve the situation by simplifying the creation of domain-specific languages and allowing easier interaction between the host language and the domain-specific language. However, many of these tools are limited to a single host language, and rarely allow extending the language used for language creation. To improve the situation, we created the language platform Storm, which aims to make the creation and usage of multiple extensible languages easy and seamless. This is accomplished by means of a shared, standardized namespace and in-process code generation, which gives Storm a high degree of extensibility, making it possible to extend or replace the built-in languages at will.
In this paper, we introduce languages as first-class citizens as a sub-paradigm of language-oriented programming. In this approach, language definitions appear within a general-purpose programming language with the same status as any other expression. In particular, language definitions are elevated to run-time values that can be assigned to variables, passed to and returned from functions, and inserted into lists, among other possibilities. This approach offers flexible features for the run-time creation and modification of languages, and may promote new idioms in language-oriented programming. As a proof of concept, we have designed and implemented lang-n-play, a functional language with languages as first-class citizens. We present the features of lang-n-play with an example, and show that they naturally enable dynamic programming scenarios.
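The following hypothetical Python analogue (not lang-n-play syntax) conveys the flavour of languages as run-time values: a language definition is an ordinary value bundling a parser and an evaluator, so it can be assigned, passed to functions, modified at run time, and inserted into lists.

```python
# Hypothetical analogue of languages as first-class citizens.

import math
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Language:
    name: str
    parse: Callable[[str], Any]
    evaluate: Callable[[Any], Any]

def run(lang: Language, program: str) -> Any:
    """A function that takes a language definition as an argument."""
    return lang.evaluate(lang.parse(program))

# A tiny additive language defined as a value at run time...
arith = Language("sum", lambda src: [int(t) for t in src.split("+")], sum)

# ...and a run-time modification: same concrete syntax, different semantics.
product = Language("product", arith.parse, math.prod)

languages = [arith, product]                      # languages inserted into a list
print([run(lang, "2+3+4") for lang in languages]) # [9, 24]
```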
Just like current software systems, models are characterised by increasing complexity and rate of change. Yet, these models only become useful if they can be continuously evaluated and validated. To achieve sufficiently low response times for large models, incremental analysis is required. Reference Attribute Grammars (RAGs) offer mechanisms to perform an incremental analysis efficiently using dynamic dependency tracking. However, not all features used in conceptual modelling are directly available in RAGs. In particular, support for non-containment model relations is only available through manual implementation. We present an approach to directly model uni- and bidirectional non-containment relations in RAGs and provide efficient means for navigating and editing them. This approach is evaluated using a scalable benchmark for incremental model editing and the JastAdd RAG system. Our work demonstrates the suitability of RAGs for validating complex and continuously changing models of current software systems.
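As a rough illustration of what direct support for such relations buys (a hand-written Python sketch, not JastAdd/RAG code), consider a bidirectional non-containment relation in which editing the forward direction automatically keeps the opposite direction consistent for navigation.

```python
# Sketch: a bidirectional non-containment relation kept consistent on edits.

class Transition:
    def __init__(self, name):
        self.name = name
        self._target = None                   # forward reference (non-containment)

    @property
    def target(self):
        return self._target

    @target.setter
    def target(self, state):
        if self._target is not None:          # unlink the old opposite end
            self._target.incoming.remove(self)
        self._target = state
        if state is not None:                 # link the new opposite end
            state.incoming.append(self)

class State:
    def __init__(self, name):
        self.name = name
        self.incoming = []                    # derived opposite direction

a, b = State("A"), State("B")
t = Transition("t1")
t.target = a
t.target = b                                  # edit: the relation is re-targeted
print(t.target.name, [x.name for x in a.incoming], [x.name for x in b.incoming])
# B [] ['t1']
```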
To provide empirical evidence of the extent to which migrating business logic to an incremental computing language (ICL) is useful, we report on a case study of a learning management system. Our contribution is an analysis of a real-life project, showing how migrating business logic to an ICL affects information system validatability, performance, and development effort.
We find that the migrated code has better validatability; it is straightforward to establish that a program ‘does the right thing’. Moreover, the performance is better than the previous hand-written incremental computing solution. The effort spent on modeling business logic is reduced, but integrating that logic in the application and tuning performance takes considerable effort. Thus, the ICL separates the concerns of business logic and performance, but does not reduce effort.
Compiler construction is one of the oldest areas of software engineering, yet despite its maturity it has underdeveloped sides such as compiler testing. There exist many disparate methods for testing parsers, optimisers, and other components, but no unified methodology that practitioners can take from a book and apply directly to fulfil their needs.
Instead of striving to cover all theoretical aspects of compiler testing in one paper, we present a case study of an ongoing project of a relatively large size for our company (2 years, 3--6 devs, 500kLOC): a clean-room compiler development effort replicating a 4GL. We built a testing framework and a model-based test data generator, which consumes manually written specifications and generates all the necessary test code in the 4GL, in the host language, and in auxiliary DSLs (batch files, XML project descriptions), to both the developers' and the customer's satisfaction. At the time of publication there are 927 specifications, from which 6268 test cases are generated. All these tests have been run prior to shipping for the last 49 releases of the compiler, both to guard against regressions and to report on overall project progress. The generated tests fall into 11 categories, which the paper details in the hope that the classification will aid in seeking related work and in pushing this line of research forward.
This tool paper presents the design and tool support of Messir, an approach centered on textual domain-specific languages supported by our open-source UML requirements engineering tool, named Excalibur. The novelty of our approach is the integration, in a single workbench (Excalibur), of textual DSLs richly covering the requirements and analysis phases (improved use cases and environment, conceptual, and operations models); read-only visualisation of the requirements through UML-compliant views; generation of scientific requirements analysis documents in LaTeX; and formal simulation of test case requirements.
We designed our Messir language with a grammar-based approach generating a textual editor, using the Xtext framework as an Eclipse plugin. Messir DSL's static semantics is defined as a set of validation rules guiding end-users through the requirements analysis phase. Messir DSL's semantics is given as a semi-automatic translation to Prolog code. We also generate, from the requirements model elements, read-only graphical views (using the Sirius Eclipse plugin) as well as a complete requirements analysis document in LaTeX.
This approach and tool have been used for requirements engineering education over several bachelor and master semesters.
Live modeling enables modelers to incrementally update models as they are running and to get immediate feedback about the impact of their changes. Changes introduced in a model may trigger inconsistencies between the model and its run-time state (e.g., deleting the current state in a state machine), effectively requiring the run-time state to be migrated so that it complies with the updated model. In this paper, we introduce an approach that automatically migrates such run-time state based on declarative constraints defined by the language designer. We illustrate the approach using Nextep, a meta-modeling language for defining invariants and migration constraints on run-time state models. When a model changes, Nextep employs model finding techniques, backed by a solver, to automatically infer a new run-time model that satisfies the declared constraints. We apply Nextep to define migration strategies for two DSLs, and report on its expressiveness and performance.
In scientific applications, physical quantities and units of measurement are used regularly. If the inherent incompatibility between these units is not handled properly, it can lead to major, sometimes catastrophic, problems. Although the risk of a miscalculation is high and the cost equally so, almost none of the major programming languages has support for physical quantities. Instead, scientific code developers often make their own tools or rely on external libraries to help them spot or prevent these mistakes.
We employed a systematic approach to examine and analyse all available physical quantity open-source libraries. Approximately 3700 search results across seven repository hosting sites were condensed into a list of 82 of the most comprehensive and well-developed libraries currently available. In this group, 30 different programming languages are represented. Out of these 82 libraries, 38 have been updated within the last two years. These 38 are summarised in this paper as they are deemed the most relevant.
The conclusion we draw from these results is that there is clearly too much diversity, too much duplicated effort, and a lack of code sharing and harmonisation, which discourages use and adoption.
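To illustrate the kind of support these libraries provide (a minimal sketch of our own, not one of the surveyed libraries): quantities carry their dimensions at run time, so combining incompatible units is rejected instead of silently producing a wrong number.

```python
class Quantity:
    """Value with dimension exponents, e.g. {"m": 1, "s": -2} for acceleration."""
    def __init__(self, value, dims):
        self.value, self.dims = value, dims

    def __add__(self, other):
        if self.dims != other.dims:
            raise TypeError(f"incompatible units: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __truediv__(self, other):
        dims = dict(self.dims)
        for unit, power in other.dims.items():
            dims[unit] = dims.get(unit, 0) - power
        return Quantity(self.value / other.value,
                        {u: p for u, p in dims.items() if p != 0})

distance = Quantity(100.0, {"m": 1})
duration = Quantity(9.58, {"s": 1})
speed = distance / duration                # dimensions combine to {"m": 1, "s": -1}
print(round(speed.value, 2), speed.dims)   # 10.44 {'m': 1, 's': -1}
# distance + duration                      # raises TypeError: the class of bug caught
```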
Aliasing is a vital concept of programming, but it comes with a plethora of challenging issues, such as the problems related to race safety. This has motivated years of research, and promising solutions such as ownership or linear types have found their way into modern programming languages. Unfortunately, most current approaches are restrictive. In particular, they often enforce a single-writer constraint, which prohibits the creation of mutable self-referential structures. While this constraint is often indispensable in the context of preemptive multithreading, it can be worked around in the case of single-threaded programs. With the recent resurgence of cooperative multitasking, where processes voluntarily share control over a single execution thread, this appears to be an interesting trade-off. In this paper, we propose a type system that relaxes the usual single-writer constraint for single-threaded programs, without sacrificing race safety properties. We present it in the form of a simple reference-based language, for which we provide a formal semantics, as well as an interpreter.
Model-driven engineering (MDE) promotes models as the principal assets in software projects. Models are built using a modelling language whose syntax is defined by a metamodel. Hence, objects in models are typed by a metamodel class, and this typing relation is static as it is established at creation time and cannot be changed later. This way, objects in MDE are closed and fixed with respect to the type they conform to, the slots/properties they have, and the constraints they should obey. This hampers the reuse of model-related artefacts like model transformations, as well as the opportunistic or dynamic combination of metamodels.
To alleviate this rigidity, we propose making model objects open so that they can acquire or drop so-called facets, each one contributing a type, slots and constraints to the object. Facets are defined by regular metamodels, hence being a lightweight extension of standard metamodelling. Facet metamodels may declare usage interfaces, and it is possible to specify laws that govern how facets are to be assigned to the instances of a metamodel. In this paper, we describe our proposal, report on an implementation, and illustrate scenarios where facets have advantages over other techniques.
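The following Python sketch (our illustration, not the authors' metamodel-based implementation) conveys the core idea: an object keeps its static type but can acquire and drop facets at run time, each contributing extra slots and constraints.

```python
# Sketch of open objects with facets contributing slots and constraints.

class Facet:
    def __init__(self, name, slots, constraint=None):
        self.name = name
        self.slots = slots                    # default slot values it contributes
        self.constraint = constraint          # predicate over the host object

class OpenObject:
    def __init__(self, static_type, **slots):
        self.static_type = static_type        # fixed, metamodel-given type
        self.slots = dict(slots)
        self.facets = {}

    def acquire(self, facet):
        self.facets[facet.name] = facet
        for slot, default in facet.slots.items():
            self.slots.setdefault(slot, default)

    def drop(self, facet_name):
        facet = self.facets.pop(facet_name)
        for slot in facet.slots:
            self.slots.pop(slot, None)

    def check(self):
        return all(f.constraint(self) for f in self.facets.values()
                   if f.constraint is not None)

versioned = Facet("Versioned", {"version": 0},
                  constraint=lambda o: o.slots["version"] >= 0)

task = OpenObject("Task", name="deploy")      # typed by its metamodel class
task.acquire(versioned)                       # dynamically gains a facet
print(task.slots, task.check())               # {'name': 'deploy', 'version': 0} True
task.drop("Versioned")                        # and can lose it again
```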
Model-driven engineering advocates the use of models to describe and automate many software development tasks. The syntax of modelling languages is defined by meta-models, making them essential artefacts. A combination of product line engineering methods and meta-models has been proposed to enable specification of modelling language variants, e.g., to describe a range of systems. However, there is a lack of techniques for ensuring syntactic correctness of all meta-models within a family (including their OCL constraints), and semantic correctness related to properties of individual instances of the different variants. The absence of verification methods at the product-line level can cause synthesis of ill-formed meta-models and problematic feature combinations whose effect at the instance level may go unnoticed.
To attack this problem, we propose an approach that lifts both meta-model syntax checking and the satisfiability checking of properties of individual meta-model instances to the product-line level. We validate the approach via a prototype tool called Merlin, and report on several experiments that show the advantages of our method w.r.t. an enumerative analysis approach.
There is a software language engineering gap between metamodel-based languages and grammar-based languages. Grammars can support an integrated definition of concrete and abstract syntax, which facilitates processing models, but usually prevents reusing the variety of language tools operating on Ecore metamodels (such as editors, interpreters, debuggers, etc.). Existing work on translating grammars to Ecore metamodels features only very cursory translations, which forces the intricacies that grammars capture naturally to be re-engineered for the metamodels. We conceived a translation from an EBNF-like syntax to Ecore metamodels that takes these intricacies into account. This translation is realized as a fully automated toolchain from grammars into Ecore & OCL using the language workbench MontiCore. Using this translation enables grammar-based languages to leverage the benefits of Ecore-compatible language tools while supporting a natural definition of concrete and abstract syntax.
A prime decision in engineering domain-specific languages (DSLs) is whether to implement them as external or internal DSLs. Agile language engineering benefits from being able to switch easily between both shapes to provide rapidly developed prototypes before settling on a specific syntax. This switching, however, is rarely feasible due to the effort of re-implementing language tooling for both shapes. Current research in software language engineering focuses either on internal DSLs or on external DSLs. We conceived a concept to automatically derive customizable internal DSLs from grammars that operate on the same abstract syntax as the external DSL. This supports reusing tooling (such as model checkers or code generators) between both shapes. We realized our concept with the MontiCore language workbench and Groovy as the host language for internal DSLs. The concept is applicable to many grammar-based language definitions.
Advanced and mature language workbenches have been proposed in the past decades to develop Domain-Specific Languages (DSL) and rich associated environments. They all come in various flavors, mostly depending on the underlying technological space (e.g., grammarware or modelware).
However, starting a new DSL project usually entails committing to a single technological space, which later constrains the features that can be offered.
In this tool paper, we introduce NabLab, a full-fledged industrial environment for scientific computing and High Performance Computing (HPC), involving several metamodels and grammars.
Beyond describing an industrial experience of developing and using tool-supported DSLs, we report in this paper our lessons learned and demonstrate the benefits of combining metamodels and grammars in an integrated environment.
We present a tool architecture that supports migrating custom domain-specific language (DSL) implementations to a language workbench. We demonstrate an implementation of this architecture for models in the domains of component interface definition (IDL) and system behavior modeling (OIL), which are developed and used at a digital printer manufacturing company. Increasing complexity and the lack of proper DSL syntax and IDE support in the existing Python implementations based on XML syntax hindered their evolution and adoption. A reimplementation in Spoofax using modular language definition enables composition between IDL and OIL and introduces more concise DSL syntax and IDE support. The presented tool supports migrating to the new implementations while remaining backward compatible with existing syntax and related tooling.
Interactive notebooks allow people to communicate and collaborate through a single rich document that might include live code, multimedia, computed results, and documentation, which is persisted as a whole for reproducibility. Notebooks are currently used extensively in domains such as data science, data journalism, and machine learning. However, constructing a notebook interface for a new language requires a lot of effort. In this tool paper, we present Bacatá, a language-parametric notebook generator for domain-specific languages (DSLs) based on the Jupyter framework. Bacatá is designed so that language engineers may reuse existing language components (such as parsers, code generators, interpreters, etc.) as much as possible. Moreover, we explain the design of Bacatá and how DSL notebooks can be generated with minimal effort in the context of the Rascal meta-programming system and language workbench.
Domain-Specific Languages (DSLs) manifest themselves in remarkably diverse shapes, ranging from internal DSLs embedded as a mere fluent API within a programming language to external DSLs with dedicated syntax and tool support. Although different shapes have different pros and cons, combining them for a single language is problematic: language designers usually commit to a particular shape early in the design process, and it is hard to reconsider this choice later. In this new ideas paper, we envision a language engineering approach enabling (i) language users to manipulate language constructs in the most appropriate shape according to the task at hand, and (ii) language designers to combine the strengths of different technologies for a single DSL. We report on early experiments and lessons learned building our prototype approach to this problem. We illustrate its applicability in the engineering of a simple shape-diverse DSL implemented conjointly in Rascal, EMF, and Java. We hope that our initial contribution will raise the awareness of the community and encourage future research.