Machine learning (ML) software is increasingly deployed in high-stakes and sensitive applications, raising important challenges related to safety, privacy, and fairness. In response, ML verification has quickly gained traction within the formal methods community, particularly through techniques like abstract interpretation. However, much of this research has progressed with little dialogue or collaboration with the ML community, where it often goes underappreciated. In this talk, we advocate for closing this gap by surveying possible ways to make formal methods more appealing to the ML community. We will present our recent and ongoing work on the design and development of abstract domains for machine learning verification, and discuss research questions and avenues for future work in this context.
Pointer analysis is foundational for statically analyzing real-world programs. We present C-2PO — a weakly relational domain for C programs which tracks must-equalities and must-disequalities between pointer expressions. This domain captures address arithmetic and its confinement to single memory objects, both core concepts in C. We implement the domain in Goblint and provide a preliminary evaluation. For 95% of SV-COMP tasks, the slowdown incurred by adding C-2PO is below a factor of 3. To measure precision, we instrumented coreutils programs with assertions computed by C-2PO; 80% of these assertions are out of reach for an existing non-relational pointer analysis.
State-of-the-art abstract domains for static analysis typically focus on over-approximating either the values a variable can hold at a specific program point or the relational dependencies among variables. In this paper, we instead aim to capture the trend of numerical values during program executions (e.g., increasing, decreasing, or stable) relative to preceding states. By integrating the resulting Stability domain with numerical domains, we can verify co-/contra-variance relationships among potentially unrelated variables. This approach has promising applications for anomaly detection in complex software systems and for the verification of functional requirements.
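The core idea can be sketched concretely. The following is a minimal, hypothetical illustration (not the paper's actual domain operations): each variable is abstracted to its trend over consecutive states of an execution trace, with a small lattice of trend values joined pairwise.

```python
# Illustrative sketch of a "stability" abstraction: classify each variable's
# trend (inc / dec / stable / top) across consecutive states of a trace.
# Lattice, names, and join rules are assumptions for this example.

def join(a, b):
    """Join two pairwise trends in a tiny stability lattice."""
    if a is None or a == b:         # None acts as bottom
        return b
    if "stable" in (a, b):          # here stable is below both directions
        return a if b == "stable" else b
    return "top"                    # both directions observed: no information

def trend(trace, var):
    """Abstract the trend of `var` over consecutive states of a trace."""
    t = None
    for prev, cur in zip(trace, trace[1:]):
        step = ("inc" if cur[var] > prev[var]
                else "dec" if cur[var] < prev[var]
                else "stable")
        t = join(t, step)
    return t

# x increases while y decreases: a contra-variance relationship between
# two variables that no value- or equality-based domain would relate.
trace = [{"x": 0, "y": 9}, {"x": 1, "y": 7}, {"x": 2, "y": 4}]
print(trend(trace, "x"), trend(trace, "y"))  # inc dec
```

A static analysis would of course compute such trends symbolically over all executions rather than over one concrete trace; the sketch only shows the abstraction being tracked.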
Due to its interdisciplinary nature, the development of data science code is subject to a wide range of potential mistakes that can easily compromise the final results. Several tools have been proposed that can help data scientists identify the most common, low-level programming issues. We discuss the steps needed to implement a tool that instead focuses on higher-level errors specific to the data science pipeline. To this end, we propose a static analysis assigning ad hoc abstract datatypes to the program variables, which are then checked for consistency when calling functions defined in data science libraries. By adopting a descriptive (rather than prescriptive) abstract type system, we obtain a linter tool reporting data-science-related code smells. While still a work in progress, the current prototype is able to identify and report the code smells contained in several examples of questionable data science code.
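To make the descriptive-abstract-type idea concrete, here is a minimal, hypothetical sketch (the type names, rules, and helper functions are illustrative, not the tool's): variables accumulate ad hoc abstract datatypes as the analysis walks the pipeline, and arguments at library-call sites are checked against the type the callee descriptively expects, yielding smells rather than hard errors.

```python
# Illustrative sketch of checking ad hoc abstract datatypes at the call
# sites of data science library functions. All names are assumptions.

SMELLS = []
abstract_type = {}                       # variable -> abstract datatype

def record(var, ty):
    """The analysis learned that `var` now has abstract datatype `ty`."""
    abstract_type[var] = ty

def check_arg(func, var, expected):
    """Report a smell if `var` does not match what `func` expects."""
    actual = abstract_type.get(var, "unknown")
    if actual != expected:
        SMELLS.append(f"{func}({var}): expected {expected}, got {actual}")

# Abstract run of a questionable pipeline: the data is standardized twice.
record("train", "raw_dataframe")
check_arg("StandardScaler.fit_transform", "train", "raw_dataframe")  # ok
record("train", "standardized")          # effect of the first call
check_arg("StandardScaler.fit_transform", "train", "raw_dataframe")  # smell
print(SMELLS)
```

The point of the descriptive stance is visible here: the second call is not a type error in Python's sense, only a suspicious use that the linter surfaces for review.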
The pitfalls of numerical computations using floating-point numbers are well known. Existing static techniques for floating-point error analysis provide a single upper bound across all potential values for a floating-point variable. We present a new abstract domain for floating-point error analysis which describes error as a function of each variable’s value. This domain accurately models the nature of floating-point error as dependent on the magnitude of its operands. We use this domain to effectively handle exceptional values (e.g., NaN), branch instability, and binade boundaries. The granular analysis provides users with a detailed understanding of forward error. We implement the abstract domain in a tool that supports analyzing a subset of C including conditionals, arrays, and arithmetic operators. We compare our implementation with Fluctuat and show how our analysis can improve the error bounds for subranges of possible outputs.
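Two of the phenomena the abstract mentions are easy to observe directly (shown here in Python, whose floats are IEEE 754 doubles; this only illustrates the motivation, not the paper's domain): the unit in the last place, and hence the worst-case rounding error, doubles at every binade boundary, and rounding can flip the outcome of a branch guard.

```python
import math

# The absolute rounding error of a double depends on its binade: the unit
# in the last place (ulp) doubles at every power of two. A single global
# error bound must assume the largest binade a variable can reach, while
# a value-dependent bound can be tighter on each subrange.
for x in (1.0, 1024.0, 2.0 ** 40):
    print(x, math.ulp(x))               # ulp grows with the magnitude of x

# Branch instability: over the reals, 0.1 + 0.2 == 0.3, but the rounded
# sum lands on the other side of the guard.
a = 0.1 + 0.2
print(a == 0.3)                         # False: the branch diverges from the reals
```

This is why a single error bound across all values is coarse: an analysis that tracks error as a function of the value can report half-ulp-scale bounds on small subranges instead of the bound dictated by the largest reachable binade.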
In quantum computation, entanglement is a fundamental phenomenon that significantly affects the behaviour and correctness of quantum programs. One of the most striking effects of this phenomenon arises from its interplay with quantum measurement: a non-classical correlation between entangled qubits, related to a non-local action, makes the measurement of one qubit of an entangled pair instantaneously affect the other as well. This leads to potential problems when program variables become entangled during a computation, since unintended measurements may cause erroneous results. A static analysis detecting such critical situations is, therefore, necessary to guarantee the correct execution of a quantum program. To pursue this objective, we introduce a novel abstract domain specifically designed to analyse and manage entanglement in quantum programs.
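The measurement effect in question can be demonstrated with a tiny state-vector simulation (a pure-Python sketch for illustration only, unrelated to the proposed abstract domain; real amplitudes are assumed, so probabilities are plain squares): for a Bell pair, measuring one qubit collapses the whole state, so the outcome of the other qubit is already fixed.

```python
import math
import random

def bell():
    """Amplitudes of the Bell state (|00> + |11>)/sqrt(2), keyed by bitstring."""
    s = 1 / math.sqrt(2)
    return {"00": s, "01": 0.0, "10": 0.0, "11": s}

def measure(state, qubit):
    """Measure one qubit: sample an outcome, then collapse the whole state."""
    # Probability of reading 1 (amplitudes are real in this sketch).
    p1 = sum(a * a for b, a in state.items() if b[qubit] == "1")
    outcome = "1" if random.random() < p1 else "0"
    # Collapse: drop inconsistent amplitudes and renormalize the rest.
    kept = {b: a for b, a in state.items() if b[qubit] == outcome}
    norm = math.sqrt(sum(a * a for a in kept.values()))
    return outcome, {b: a / norm for b, a in kept.items()}

state = bell()
o0, state = measure(state, 0)       # measuring qubit 0 ...
o1, state = measure(state, 1)       # ... already determines qubit 1
print(o0 == o1)                     # True on every run: outcomes are correlated
```

This is exactly the hazard for program variables that become entangled mid-computation: an unintended measurement of one variable silently changes the state of another, which a static entanglement analysis aims to detect.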