Atkinson, Malcolm
10 results
Now showing 1 - 10 of 10
- Publication, Open Access: S-ProvFlow: Storing and Exploring Lineage Data as a Service (2022)
We present s-ProvFlow, a set of configurable Web services and interactive tools for managing and exploiting records that track data lineage during workflow runs. It facilitates detailed analysis of single executions and helps users manage complex tasks by exposing the relationships between the data, people, equipment and workflow runs that are intended to combine productively. Its logical model extends the PROV standard to precisely record parallel data-streaming applications. Its metadata handling encourages users to capture the application context by specifying how application attributes, often drawn from standard vocabularies, should be added. These metadata records immediately improve productivity, as the interactive tools support their use in selection and bulk operations, and users rapidly appreciate the power of the encoded semantics as they reap the benefits. This improves the quality of provenance for users and management, which in turn facilitates analysis of collections of runs, enabling users to manage results and validate procedures. It fosters reuse of data and methods and facilitates diagnostic investigations and optimisations. We present s-ProvFlow's use by scientists, research engineers and managers as part of the DARE hyper-platform as they create, validate and use their data-driven scientific workflows.
- Publication, Open Access: Dr.Aid: Supporting Data-governance Rule Compliance for Decentralized Collaboration in an Automated Way (2021-10)
Collaboration across institutional boundaries is widespread and increasing today. It depends on federations sharing data that often have governance rules or external regulations restricting their use. However, the handling of data-governance rules (aka data-use policies) remains manual, time-consuming and error-prone, limiting the rate at which collaborations can form and respond to challenges and opportunities, inhibiting citizen science and reducing data providers' trust in compliance. Using an automated system to facilitate compliance handling substantially reduces the time needed for such non-mission work, thereby accelerating collaboration and improving productivity. We present a framework, Dr.Aid, that helps individuals, organisations and federations comply with data rules, using automation to track which rules are applicable as data is passed between processes and as derived data is generated. It encodes data-governance rules in a formal language and performs reasoning over multi-input, multi-output data-flow graphs in decentralised contexts. We test its power and utility by working with users performing cyclone tracking and earthquake modelling to support mitigation and emergency response. We query standard provenance traces to detach Dr.Aid from the details of the tools and systems being used, as these inevitably vary across members of a federation and through time. We evaluate the model in three aspects by encoding real-life data-use policies from diverse fields, showing its capability for real-world usage and its advantages over traditional frameworks. We argue that this approach will lead to more agile, more productive and more trustworthy collaborations, and show that it can be adopted incrementally. This, in turn, will allow more appropriate data policies to emerge, opening up new forms of collaboration.
- Publication, Open Access: DARE Architecture and Technology internal report (2020)
- Publication, Open Access: DARE to Perform Seismological Workflows (2019-12-09)
The DARE e-science platform (http://project-dare.eu) offers innovative tools that ease scientific workflow development and execution by exploiting efficient Cloud resources. It aims to enable on-demand numerical computations and analyses, fast handling of large datasets, flexible and customisable workflow pipelines and complete provenance tracking. It also integrates available e-infrastructure services (e.g. EUDAT, EIDA) and can be linked to user-developed interfaces. DARE is validated via two domain-specific pilots, one from the climate-modelling community and one from seismological research. Focusing on the latter, the EPOS Use Case is driven by urgent issues and the general user needs of the solid-Earth science community, following developments and application standards in computational seismology. This Use Case also benefits from the pioneering experience of previous European projects (e.g. VERCE, EPOS-IP) in this field. We present here the development of a scientific workflow to perform a quick calculation of seismic source parameters after an earthquake. The workflow requirements include HPC calculations (on local institutional or Cloud resources), fast data-intensive processing, provenance exploitation and seismic-source inverse-modelling tools. The DARE platform automatically conducts the required actions, optimally mapped to computational resources, linking them together by managing intermediate data. It automatically deploys the necessary environment to perform on-demand transparent computations, executing a dockerised version of the numerical simulation code on a Kubernetes cluster via a web API. Other API calls allow for remote, distributed execution of dispel4py workflows, used to describe the steps for data analysis and the download of recorded seismic data via EIDA Research Infrastructure services.
Well-established scientific Python codes, such as those for waveform-misfit calculation and source inversion, are thus easily implemented in this flexible and modular structure and executed at scale. Moreover, the pilot requirement of searching for and reusing multiple simulations of the same earthquake benefits strongly from customisable management of metadata and lineage through the DARE platform, exploiting the integration of s-ProvFlow with dispel4py.
- Publication, Open Access: DARE: A Reflective Platform Designed to Enable Agile Data-Driven Research on the Cloud (2019-09)
The DARE platform has been designed to help research developers deliver user-facing applications and solutions over diverse underlying e-infrastructures, data and computational contexts. The platform is Cloud-ready and relies on the exposure of APIs, which are suitable for raising the abstraction level and hiding complexity. At its core, the platform implements the cataloguing and execution of fine-grained, Python-based dispel4py workflows as services. Reflection is achieved via a logical knowledge base comprising multiple internal catalogues, registries and semantics, while the platform supports persistent and pervasive data provenance. This paper presents design and implementation aspects of the DARE platform and provides directions for future development.
- Publication, Open Access: Comprehensible Control for Researchers and Developers Facing Data Challenges
The DARE platform enables researchers and their developers to exploit more capabilities to handle complexity and scale in data, computation and collaboration. Today's challenges pose increasing and urgent demands for this combination of capabilities. To meet technical, economic and governance constraints, application communities must use shared digital infrastructure, principally via virtualisation and mapping. This requires precise abstractions that retain their meaning while their implementations and infrastructures change. Giving specialists direct control over these capabilities, with detail relevant to each discipline, is necessary for adoption. Research agility, improved power and a retained return on intellectual investment incentivise that adoption. We report on an architecture for establishing and sustaining the necessary optimised mappings, and on early evaluations of its feasibility with two application communities.
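The DARE abstracts above describe cataloguing fine-grained Python workflows and executing them as services behind an API. As a minimal sketch of that idea only — the class and method names (`WorkflowRegistry`, `register`, `run`) are invented for illustration and are not the DARE API:

```python
# Illustrative sketch (not the DARE API): a toy catalogue that registers
# named workflows and executes them on request, mimicking the
# "workflows as services" pattern the platform abstracts describe.
from typing import Callable, Dict, List

class WorkflowRegistry:
    """Maps workflow names to Python callables (a stand-in for a catalogue)."""
    def __init__(self) -> None:
        self._catalogue: Dict[str, Callable[[List[float]], List[float]]] = {}

    def register(self, name: str, workflow: Callable[[List[float]], List[float]]) -> None:
        self._catalogue[name] = workflow

    def run(self, name: str, data: List[float]) -> List[float]:
        # In DARE this dispatch would go to remote execution via a web API;
        # here we simply invoke the registered callable locally.
        return self._catalogue[name](data)

def detrend(samples: List[float]) -> List[float]:
    """Remove the mean from a signal (a stand-in for a real processing step)."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

registry = WorkflowRegistry()
registry.register("detrend", detrend)
print(registry.run("detrend", [1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

The point of the indirection is that callers name a catalogued workflow rather than importing its code, so the execution backend can change without touching clients.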
- Publication, Open Access: Active provenance for Data-Intensive workflows: engaging users and developers
We present a practical approach to provenance capture in Data-Intensive workflow systems. It provides contextualisation by recording injected domain metadata with the provenance stream, and it offers control over lineage precision, combining automation with specified adaptations. We address provenance tasks such as the extraction of domain metadata, the injection of custom annotations, accuracy, and the integration of records from multiple independent workflows running in distributed contexts. To allow such flexibility, we introduce the concepts of programmable Provenance Types and Provenance Configuration. Provenance Types handle domain contextualisation and allow developers to model lineage patterns by re-defining API methods, composing easy-to-use extensions. Provenance Configuration, instead, enables users of a Data-Intensive workflow execution to prepare it for provenance capture by configuring the attribution of Provenance Types to components and by specifying grouping into semantic clusters. This enables better searches over the lineage records. Provenance Types and Provenance Configuration are demonstrated in a system being used by computational seismologists. It is based on an extended provenance model, S-PROV.
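The abstract above says developers model lineage patterns by re-defining API methods of a Provenance Type. A hypothetical sketch of that pattern — the class and method names (`ProvenanceType`, `extract_metadata`, `capture`) are invented for illustration and are not the S-PROV or dispel4py API:

```python
# Hypothetical sketch of the "Provenance Type" idea: a base class captures
# a minimal lineage record, and a domain-specific subclass re-defines one
# method to inject domain metadata into that record.
from typing import Any, Dict

class ProvenanceType:
    """Base type: builds a minimal lineage record for one data granule."""
    def extract_metadata(self, data: Any) -> Dict[str, Any]:
        # Default behaviour: no domain metadata is extracted.
        return {}

    def capture(self, component: str, data: Any) -> Dict[str, Any]:
        record: Dict[str, Any] = {"component": component, "derived_from": []}
        record.update(self.extract_metadata(data))  # domain contextualisation
        return record

class SeismicTraceType(ProvenanceType):
    """Re-defines extract_metadata to add seismology-specific attributes."""
    def extract_metadata(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {"station": data.get("station"),
                "n_samples": len(data.get("samples", []))}

rec = SeismicTraceType().capture("waveform_reader",
                                 {"station": "IV.ROM", "samples": [0.1, 0.2]})
print(rec["station"], rec["n_samples"])  # IV.ROM 2
```

Attaching `SeismicTraceType` to one component and a different subclass to another is, roughly, what the abstract's Provenance Configuration step does: it decides which type contextualises which component's outputs.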
- Publication, Restricted: Establishing Core Concepts for Information-Powered Collaborations
Science benefits tremendously from mutual exchanges of information and the pooling of effort and resources. The combination of different skills and diverse knowledge is a powerful capacity and a source of new intuitions and creative insights, so multidisciplinary approaches can be a great opportunity to explore novel scientific horizons. Collaboration is not only an opportunity; it is essential when tackling today's global challenges by exploiting our fast-growing wealth of data. In this paper we introduce the concept of Information-Powered Collaborations (IPC), an abstraction that captures those requirements and opportunities. We propose a conceptual framework that partitions the inherent complexity of such dynamic environments and offers concrete tools and methods to thrive in the data-revolution era. Such a framework promotes and enables information sharing from multiple heterogeneous sources that are independently managed. We present the results of assessing our approach as an IPC for the solid-Earth sciences: the European Plate Observing System (EPOS).
- Publication, Open Access: VERCE delivers a productive e-Science environment for seismology research (2015-10-07)
The VERCE project has pioneered an e-Infrastructure to support researchers using established simulation codes on high-performance computers in conjunction with multiple sources of observational data. This is accessed and organised via the VERCE science gateway, which makes it convenient for seismologists to use these resources from any location via the Internet. Their data handling is made flexible and scalable by two Python libraries, ObsPy and dispel4py, and by data services delivered by ORFEUS and EUDAT. Provenance-driven tools enable rapid exploration of results and of the relationships between data, which accelerates understanding and method improvement. These powerful facilities are integrated and draw on many other e-Infrastructures. This paper presents the motivation for building such systems, reviews how solid-Earth scientists can make significant research progress using them and explains the architecture and mechanisms that make their construction and operation achievable. We conclude with a summary of the achievements to date and identify the crucial steps needed to extend the capabilities for seismologists, for solid-Earth scientists and for similar disciplines.
- Publication, Restricted: Towards Addressing CPU-Intensive Seismological Applications in Europe (Springer Berlin Heidelberg, 2013)
Carpené, M. (CINECA, Bologna, Italy); Klampanos, I. A. (University of Edinburgh, School of Informatics, UK); Leong, S. H. (Leibniz Supercomputing Centre (LRZ), Garching, Germany); Casarotti, E. (Istituto Nazionale di Geofisica e Vulcanologia, Sezione Roma1, Roma, Italia); Danecek, P. (Istituto Nazionale di Geofisica e Vulcanologia, Sezione CNT, Roma, Italia); Ferini, G. (CINECA, Bologna, Italy); Gemünd, A. (Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Germany); Krause, A. (University of Edinburgh, Edinburgh Parallel Computing Centre (EPCC), UK); Krischer, L. (Ludwig-Maximilians-University, Department of Earth and Environmental Sciences, Germany); Magnoni, F. (Istituto Nazionale di Geofisica e Vulcanologia, Sezione CNT, Roma, Italia); Simon, M. (Ludwig-Maximilians-University, Department of Earth and Environmental Sciences, Germany); Spinuso, A. (The Royal Netherlands Meteorological Institute (KNMI), Netherlands); Trani, L. (The Royal Netherlands Meteorological Institute (KNMI), Netherlands); Atkinson, M. (University of Edinburgh, School of Informatics, UK); Erbacci, G. (CINECA, Bologna, Italy); Frank, A. (Leibniz Supercomputing Centre (LRZ), Garching, Germany); Igel, H. (Ludwig-Maximilians-University, Department of Earth and Environmental Sciences, Germany); Rietbrock, H. (University of Liverpool, Department of Earth, Ocean and Ecological Sciences, UK); Schwichtenberg, H. (Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Germany); Vilotte, J. (Institut de Physique du Globe de Paris (IPGP), France)
Advanced application environments for seismic analysis help geoscientists to execute complex simulations to predict the behaviour of a geophysical system and potential surface observations. At the same time, data collected from seismic stations must be processed by comparing recorded signals with predictions.
The EU-funded project VERCE (http://verce.eu/) aims to enable specific seismological use-cases and, on the basis of requirements elicited from the seismology community, to provide a service-oriented infrastructure to deal with such challenges. In this paper we present VERCE's architecture, in particular relating to forward and inverse modelling of Earth models, and how the largely file-based HPC model can be combined with data-streaming operations to enhance the scalability of experiments. We posit that the integration of services and HPC resources in an open, collaborative environment is an essential medium for the advancement of sciences of critical importance, such as seismology.
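The VERCE abstracts describe combining file-based HPC stages with data-streaming operations. A minimal illustration of that composition style, using plain Python generators rather than the dispel4py API (the function names `read_samples` and `scale` are invented for this sketch):

```python
# Illustrative only (not the dispel4py API): stages that pass data
# item-by-item, so a stage that streams samples out of HPC output files
# can feed a streaming transformation without materialising everything.
from typing import Iterable, Iterator

def read_samples() -> Iterator[float]:
    """Stand-in for a stage that streams samples out of simulation output files."""
    yield from [3.0, 1.0, 2.0]

def scale(stream: Iterable[float], factor: float) -> Iterator[float]:
    """A streaming processing element: transforms each item as it arrives."""
    for sample in stream:
        yield sample * factor

pipeline = scale(read_samples(), 10.0)
print(list(pipeline))  # [30.0, 10.0, 20.0]
```

Because each stage consumes and produces a stream, downstream processing starts before upstream stages finish, which is the scalability property the abstract attributes to mixing streaming with file-based HPC workloads.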