Please use this identifier to cite or link to this item: http://hdl.handle.net/2122/13262
Authors: Spinuso, Alessandro* 
Atkinson, Malcolm* 
Magnoni, Federica* 
Title: Active provenance for Data-Intensive workflows: engaging users and developers
Issue Date: Sep-2019
DOI: 10.1109/eScience.2019.00077
Keywords: Reproducibility
Workflow management software
Metadata
Collaborative work
Data flow computing
Abstract: We present a practical approach for provenance capturing in Data-Intensive workflow systems. It provides contextualisation by recording injected domain metadata with the provenance stream. It offers control over lineage precision, combining automation with specified adaptations. We address provenance tasks such as extraction of domain metadata, injection of custom annotations, accuracy and integration of records from multiple independent workflows running in distributed contexts. To allow such flexibility, we introduce the concepts of programmable Provenance Types and Provenance Configuration. Provenance Types handle domain contextualisation and allow developers to model lineage patterns by re-defining API methods, composing easy-to-use extensions. Provenance Configuration, instead, enables users of a Data-Intensive workflow execution to prepare it for provenance capture, by configuring the attribution of Provenance Types to components and by specifying grouping into semantic clusters. This enables better searches over the lineage records. Provenance Types and Provenance Configuration are demonstrated in a system being used by computational seismologists. It is based on an extended provenance model, S-PROV.
Appears in Collections: Conference materials

Files in This Item:
eScienceProv.pdf (1.75 MB, Adobe PDF)