Static type systems help prevent errors, improve abstractions, and enable optimisations. There is a whole spectrum of type systems for general-purpose languages, covering a wide range of safety guarantees and expressivity. Despite this, type systems for array programming languages are usually at one of two extremes. In the majority of cases they are nearly untyped, only distinguishing array types by element type or number of dimensions. Otherwise, they tend to reach for powerful dependent types. However, it is difficult to extend existing solutions with a dependent type system, and some problems become undecidable when we do so. Practical array programming – in data science, machine learning and the like – sticks to the bliss of dynamic typing.
We propose a novel calculus for array programming: Star. Array indices and shapes in Star make use of structural record and variant types equipped with subtyping. We prevent indexing errors not by resolving arithmetic problems, but by enabling richer types for arrays, allowing programmers to capture their structure explicitly. While we present Star with only subtype polymorphism, we sketch how algebraic subtyping promises efficient ML-style polymorphic type inference.
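Star's concrete syntax is not shown here, but the flavour of record-typed indices can be approximated in ordinary Python. The sketch below is purely illustrative (the names `Pixel` and `get` are invented for this example, and this is not Star's type system): a structural record replaces a bare integer tuple as the index, making each axis's role explicit at the call site.

```python
from typing import NamedTuple
import numpy as np

# Hypothetical sketch only: a record-typed index, approximated with a
# Python NamedTuple. Each axis is a named field, so axes cannot be
# silently transposed the way bare integer tuples can.
class Pixel(NamedTuple):
    row: int
    col: int

def get(image: np.ndarray, ix: Pixel) -> float:
    return image[ix.row, ix.col]

img = np.zeros((480, 640))
get(img, Pixel(row=10, col=20))   # the intent of each index component is explicit
```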
Deep Neural Networks are increasingly being used in mission-critical systems. However, their use poses significant safety challenges and requires rigorous verification to ensure their correctness in all scenarios. Formal verification methods such as abstract interpretation are commonly used, but their scalability becomes problematic for DNNs with many parameters. Relational abstract domains, such as affine forms, improve verification accuracy by capturing linear relationships between variables, but they also make the verification process computationally intensive. To address this challenge, this paper proposes a novel approach called batch representation, which processes abstract values in batches rather than individually. By exploiting the linear independence of noise symbols and using deep learning frameworks such as PyTorch, the proposed method significantly improves computational efficiency and scalability. Experimental results show that the use of GPUs and CPU clusters enables a reduction in computational time of about 80%, demonstrating the effectiveness of the approach on large-scale verification tasks.
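The core of the batch representation can be sketched independently of the paper's implementation. Assuming a batch of affine forms is stored as a tensor of centers plus a tensor of generators (one per noise symbol), an affine layer acts on the whole batch with two matrix multiplications; the function names below are invented for this sketch.

```python
import torch

# A batch of affine forms ("zonotopes"): each abstract value is
#   center + sum_i eps_i * generator_i,  with eps_i in [-1, 1].
# centers: (batch, n); generators: (batch, k, n) for k noise symbols.
def affine_layer(centers, generators, W, b):
    # An affine map x -> x @ W.T + b acts linearly on the center and on
    # every generator, so the whole batch reduces to two matmuls.
    return centers @ W.T + b, generators @ W.T

def concretize(centers, generators):
    # Interval hull: center +/- the summed absolute generator contributions.
    radius = generators.abs().sum(dim=1)
    return centers - radius, centers + radius

# Example: 8 abstract inputs of dimension 4, each with 3 noise symbols.
c = torch.zeros(8, 4)
g = 0.1 * torch.randn(8, 3, 4)
W, b = torch.randn(5, 4), torch.zeros(5)
lo, hi = concretize(*affine_layer(c, g, W, b))
```

Because both matmuls vectorise over the batch dimension, the same code runs unchanged on a GPU, which is where the reported speed-ups come from.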
Most efficient state-vector quantum simulation frameworks are imperative. They work by having circuit gates operate on the global state vector in sequence, with each gate operation accessing and updating, in parallel, all (or large subsets of) the elements of the state vector. The precise access and update patterns used by a particular gate operation depend on which qubits the gate operates on.
Imperative implementations of state-vector simulators, however, often lack a more declarative specification, which may hinder reasoning and optimisations. For instance, correctness arguments often rely on reasoning about bit-operations on state-vector indexes, which makes it difficult for compilers to perform high-level index-optimisations.
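For concreteness, the following NumPy sketch (illustrative only, not taken from any existing simulator) shows the imperative, bit-twiddling style described above for applying a single-qubit gate, under the convention that qubit q corresponds to bit q of the state-vector index.

```python
import numpy as np

def apply_gate_imperative(state, gate, q):
    # Apply a 2x2 gate to qubit q of an n-qubit state vector. Indices i
    # (bit q clear) and j = i | (1 << q) (bit q set) form independent
    # pairs; a real simulator updates all pairs in parallel.
    step = 1 << q
    for i in range(state.size):
        if i & step == 0:
            j = i | step
            a, b = state[i], state[j]
            state[i] = gate[0, 0] * a + gate[0, 1] * b
            state[j] = gate[1, 0] * a + gate[1, 1] * b
    return state
```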
In this work, we demonstrate how gate operations can be understood as maps over index-transformed state-vectors. We demonstrate the correctness of the approach and implement a library of gate operations in the data-parallel programming language Futhark. We further demonstrate that Futhark's fusion engine is sufficiently powerful to ensure that consecutive gate operations on identical qubits are combined using map-map fusion. Moreover, we demonstrate that, using Futhark's uniqueness type system, state vectors may be updated in place. We evaluate the approach by comparing it with the state-of-the-art state-vector simulators qsim and QuEST.
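The paper's library is written in Futhark; the following NumPy sketch (again illustrative, with invented names) conveys the same idea: an index transformation exposes qubit q as its own axis, after which the gate application is a uniform map over that axis.

```python
import numpy as np

def apply_gate_declarative(state, gate, q, n_qubits):
    # View the state vector as (outer, 2, inner), where the middle axis
    # is qubit q; applying the gate is then a uniform map over it.
    inner = 1 << q
    outer = 1 << (n_qubits - q - 1)
    v = state.reshape(outer, 2, inner)
    # result[o, a, i] = sum_b gate[a, b] * v[o, b, i]
    return np.einsum('ab,obi->oai', gate, v).reshape(-1)
```

One can check that this agrees elementwise with the imperative version above; expressing the operation as a map is what makes map-map fusion across consecutive gates possible.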
Today, the lion's share of machine learning and high-performance computing workloads is executed on GPUs, including high-stakes applications such as self-driving cars and fusion reactor simulations. Unfortunately, GPU computations are carried out on largely undocumented hardware units that cannot trap or report floating-point exceptions. Worsening the situation is an ongoing and accelerating shift toward lower-precision arithmetic, driven by performance demands, which increases both the frequency and the severity of floating-point exceptions. Increasingly, matrix multiplications are offloaded to specialized hardware such as Tensor Cores. However, because these units do not adhere to a unified arithmetic standard, their computed results can deviate from one another to unacceptable levels.
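The following small NumPy example (illustrative; the constants are arbitrary) shows how quickly exceptional values arise at low precision: a modest float16 product overflows to infinity, and a subsequent subtraction turns it into a NaN. NumPy happens to warn here on the CPU; a GPU kernel typically cannot.

```python
import numpy as np

x = np.float16(300.0)
y = x * x      # 90000 exceeds float16's max of 65504 -> inf (overflow)
z = y - y      # inf - inf -> nan (invalid operation)
print(y, z)    # NumPy warns on the CPU; GPU hardware gives no such report
```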
This experience report aims to consolidate our previously published work and relate it to array programming in two key ways: (1) by providing tools to diagnose bugs that may arise during array computations, and (2) by addressing broader correctness challenges inherent to array-based programming. This report highlights GPU-FPX, a debugging tool extended to analyze computations involving Tensor Cores. It addresses key correctness challenges, such as the potential for different Tensor Core implementations to produce inconsistent results for the same input. These discrepancies can be systematically uncovered using a targeted testing approach known as FTTN. We conclude with a discussion on how formal methods, particularly those based on SMT solvers, can play a critical role in identifying and bridging gaps in manufacturer-provided hardware specifications and, in the long term, in proving desired correctness properties.
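FTTN itself targets real Tensor Core hardware; as a rough CPU-side analogue (illustrative only, with invented function names), the sketch below feeds identical inputs through two accumulation schemes a matrix unit might legally use and compares the results.

```python
import numpy as np

def dot_fp16_acc(a, b):
    # Accumulate entirely in float16, as a minimal matrix unit might.
    acc = np.float16(0.0)
    for x, y in zip(a, b):
        acc = np.float16(acc + x * y)
    return acc

def dot_fp32_acc(a, b):
    # Same products, but accumulated in float32.
    acc = np.float32(0.0)
    for x, y in zip(a, b):
        acc += np.float32(x) * np.float32(y)
    return acc

rng = np.random.default_rng(0)
a = rng.normal(size=64).astype(np.float16)
b = rng.normal(size=64).astype(np.float16)
# The two results typically differ in the low bits; crafted inputs can
# widen the gap far enough to expose a unit's accumulation behaviour.
print(dot_fp16_acc(a, b), dot_fp32_acc(a, b))
```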