Indiana University Bloomington

Luddy School of Informatics, Computing, and Engineering

Technical Report TR701:
Temporal Data Mining of Scientific Data Provenance

Peng Chen, Beth Plale, Mehmet Aktas
(Jul 2012), 24 pages
Abstract:
Provenance of digital scientific data is an important piece of the metadata of a data object. It can, however, grow voluminous quickly because the granularity level of capture can be high. It can also be quite feature-rich. We propose a representation of the provenance data based on logical time that reduces the feature space. We create time- and frequency-domain representations of the provenance and apply clustering, classification, and association rule mining to these abstract representations to determine the usefulness of the temporal representation. We evaluate the temporal representation using an existing 10 GB database of provenance captured from a range of scientific workflows.

Available as: