Climate and Earth system models often comprise submodels composed via a 'coupler', a software component that mediates interaction between them. The continuous exchange of data through couplers creates the risk of subtle errors propagating across components, potentially distorting scientific conclusions. In this paper, we argue for lightweight formal verification techniques applied at the coupler interface to improve both coupler and model correctness. By enforcing formal contracts on data exchanges, the coupler can act as a checkpoint that detects and prevents certain classes of component-level errors before they propagate between models. We abstract general design principles for couplers and propose verifiable subsystems. Using a real-world bug as an example, we illustrate a hybrid verification strategy that integrates module-level contracts, verified through both static and runtime techniques. We aim to offer a practical pathway for both existing and future couplers, ultimately enabling auditable and formally verifiable couplers.
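As a minimal sketch of the runtime half of such a contract, consider a boundary check that rejects non-physical values before a field crosses between components. The helper name, field, and valid range below are illustrative assumptions, not the contracts from the paper:

```python
import numpy as np

def check_exchange(field, name, units, valid_range):
    """Runtime contract for a coupler exchange: reject NaNs and
    out-of-range values before they propagate to the next component."""
    lo, hi = valid_range
    if np.isnan(field).any():
        raise ValueError(f"{name} [{units}]: NaN detected at coupler boundary")
    if field.min() < lo or field.max() > hi:
        raise ValueError(
            f"{name} [{units}]: values outside contract range [{lo}, {hi}]"
        )
    return field

# Example: sea surface temperature passed from ocean to atmosphere,
# contracted to physically plausible Kelvin values (illustrative bounds).
sst = np.full((180, 360), 288.0)
check_exchange(sst, "sea_surface_temperature", "K", (250.0, 320.0))
```

The static half of the strategy would discharge some of these checks at compile or link time, leaving only the residual conditions to be verified at runtime as above.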
Scientists increasingly write software as part of large-scale collaborative workflows, but current tools make it difficult to follow FAIR principles (findability, accessibility, interoperability, reusability) and ensure reproducibility by default.
This paper proposes Fairground, a computational commons designed as a collaborative notebook system in which thousands of scientific artifacts are authored, collected, and maintained together in executable form, FAIR, reproducible, and live by default. Unlike existing platforms, Fairground notebooks can reference each other as libraries, forming a single planetary-scale live program executed by a distributed scheduler.
We describe the design of Fair Python, a purely functional subset of Python, and a foreign function interface for interoperating with existing code. Through three interleaved research tracks focusing on language design, interoperability, and distributed execution, we aim to create a next-generation collaborative scientific workflow system that makes best practices the path of least resistance.
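Fair Python's concrete syntax is not shown here, but one runnable way to picture the discipline it implies, using plain Python and names entirely of our own invention, is a pair of pure, memoisable functions standing in for notebook cells:

```python
# A sketch only: Fair Python's actual syntax and semantics may differ.
from functools import lru_cache

@lru_cache(maxsize=None)  # purity makes cell results safely cacheable
def drop_outliers(records: tuple[float, ...], limit: float) -> tuple[float, ...]:
    # No mutation and no I/O: output depends only on the arguments,
    # the property that makes results reproducible and cheap to keep "live".
    return tuple(r for r in records if abs(r) <= limit)

@lru_cache(maxsize=None)
def mean_anomaly(records: tuple[float, ...], baseline: float) -> float:
    # A downstream "cell" reusing another notebook's function as a library;
    # a distributed scheduler could memoise and re-run it on demand.
    cleaned = drop_outliers(records, limit=50.0)
    return sum(cleaned) / len(cleaned) - baseline

print(mean_anomaly((14.1, 15.3, 99.9, 13.8), baseline=14.0))  # ~0.4
```

Because such functions are referentially transparent, a scheduler can cache, relocate, or re-execute them freely, which is what makes a "single planetary-scale live program" plausible.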
Geospatial datasets have complex lineages that are crucial for reproducibility and understanding data provenance, yet current metadata standards like STAC (SpatioTemporal Asset Catalog) provide limited support for capturing complete processing workflows. We propose STACD (STAC extension with DAGs), an extension to the STAC specification that incorporates Directed Acyclic Graph (DAG) representations of processing workflows, along with the algorithms that define each step and the version changes those workflows undergo. We also provide a reference implementation on Apache Airflow to demonstrate STACD capabilities such as selective recomputation when datasets or algorithms in a DAG are updated, complete lineage construction for a dataset, and the opportunities for improved collaboration and distributed processing that arise from this standard.
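As a hedged illustration of the shape such an extension might take, following STAC's usual convention of prefixed extension properties; the `stacd:` field names and schema URL below are invented, not the actual specification:

```python
# Illustrative only: field names and the extension schema URL are
# assumptions about what a STACD item could look like.
stacd_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "ndvi-composite-2024-06",
    "stac_extensions": [
        "https://example.com/stacd/v1.0.0/schema.json"  # hypothetical
    ],
    "properties": {
        "datetime": "2024-06-30T00:00:00Z",
        # DAG edges: which catalogued datasets this one was computed from.
        "stacd:derived_from": ["sentinel2-l2a-2024-06", "cloud-mask-2024-06"],
        # The node's algorithm, pinned to a version so that updating it
        # can trigger selective recomputation of downstream items.
        "stacd:algorithm": {"name": "ndvi_composite", "version": "2.1.0"},
    },
    "assets": {},
    "links": [],
}
```

Walking `stacd:derived_from` edges upward reconstructs a dataset's complete lineage, while comparing recorded algorithm versions against current ones identifies exactly which downstream items need recomputation.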
Climate change research relies on complex computational tools to model environmental processes, analyse large datasets, and inform policy. Current scientific computing practices pose major barriers to entry, particularly for interdisciplinary researchers and those in low- and middle-income countries (LMICs). Challenges include steep learning curves, limited access to expert support, and difficulties with legacy or under-documented software. Drawing on real-world experiences, we identify recurring obstacles in the usability, accessibility, and sustainability of scientific software. Our analysis highlights that current approaches to scientific software disadvantage interdisciplinary and LMIC researchers. We propose specific mechanisms to address these inequities: improved documentation, domain-aware training, automation for diverse hardware environments, domain-specific languages, and hybrid support communities. These measures should be integrated into grant funding requirements to ensure sustainability beyond initial project periods, transforming scientific software from short-lived outputs into accessible research infrastructure. By reimagining scientific programming as a shared public good, we can lower barriers to entry and foster a more inclusive, equitable climate research ecosystem.
Critical hydrology-related algorithms that trace the path of surface water flows, including flow accumulation, stream order, watershed delineation, and runoff simulation, can be difficult to execute over large areal extents at fine spatial and temporal resolutions. Libraries like GDAL that use multi-threaded CPU-based implementations running on a single host may be slow, and distributed infrastructures like Google Earth Engine may not support the computational primitives these algorithms require. We have developed a GPU-accelerated framework that re-engineers these four algorithms and can process areas as large as river basins of 250,000 km² on commodity GPU workstations. We express the algorithms in terms of easily parallelizable primitives of pixel-independent (PI) and short-pixel (SP) operations, and iterative primitives of long-pixel (LP) operations. Each algorithm uses a different mix of the primitives, which helps ensure that the implementation is generic. We show that our implementation of these algorithms produces accurate outputs and achieves significant performance benefits over alternative methods. Being able to execute the algorithms on a commodity GPU workstation paves the way for using on-premises infrastructure to produce national-scale outputs, and for pooling multiple national-scale outputs together for global-scale analysis.
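A toy NumPy sketch, not the paper's GPU implementation, can illustrate the contrast between these primitive classes; the eastward drainage pattern below is artificial:

```python
import numpy as np

# Pixel-independent (PI) primitive: each output pixel depends only on its
# own input pixel, so it parallelises trivially (slope thresholding here).
def pi_threshold(slope, limit):
    return slope > limit

# Long-pixel (LP) style primitive: information flows along drainage paths,
# so the computation iterates until the accumulation grid reaches a fixed
# point. This toy version drains every pixel one cell to the east.
def lp_flow_accumulation_east(rows, cols, max_iters=10_000):
    acc = np.ones((rows, cols))          # each pixel contributes itself
    for _ in range(max_iters):
        nxt = np.ones((rows, cols))
        nxt[:, 1:] += acc[:, :-1]        # receive upstream (western) flow
        if np.array_equal(nxt, acc):     # converged: no cell changed
            return nxt
        acc = nxt
    return acc

print(lp_flow_accumulation_east(2, 5))
# [[1. 2. 3. 4. 5.]
#  [1. 2. 3. 4. 5.]]
```

PI and SP operations map directly onto massively parallel GPU kernels, while LP operations like the one above need repeated sweeps; the per-algorithm mix of primitives determines how much iteration each workload requires.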
We present Yirgacheffe, a declarative geospatial library that allows spatial algorithms to be implemented concisely, supports parallel execution, and avoids common errors by automatically managing data (large geospatial rasters) and resources (cores, memory, GPUs). Our primary user domain comprises ecologists, for whom a typical problem involves cleaning messy occurrence data, overlaying it on tiled rasters, combining layers, and deriving actionable insights from the results. We describe the successes of this approach in driving key pipelines related to global biodiversity, and the capability gaps that remain, hoping to motivate further research into geospatial domain-specific languages.
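The following sketch conveys this declarative style; the import path and method names follow our reading of Yirgacheffe's public examples and should be treated as assumptions rather than a verified API reference:

```python
# Sketch only: names below are assumptions about Yirgacheffe's API.
from yirgacheffe.layers import RasterLayer  # assumed import path

habitat = RasterLayer.layer_from_file("habitat.tif")     # assumed API
area = RasterLayer.layer_from_file("pixel_area.tif")     # assumed API

# Layers combine lazily with ordinary operators; the library handles
# windowing over the layers' common extent, so the user writes no
# manual tiling loops or chunked I/O.
habitat_area = habitat * area

# Materialise the expression into a new raster shaped like the inputs.
result = RasterLayer.empty_raster_layer_like(habitat, filename="habitat_area.tif")
habitat_area.save(result)
```

The point of the design is that extent alignment, chunking, and memory budgeting live inside the library rather than in each ecologist's script, which is where the "avoids common errors" claim comes from.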