Scape Component Profiles

In digital preservation, a variety of tools and services are used for preservation actions, characterisation and quality assurance. These tools require various inputs and parameters and produce a variety of output formats. Preservation Components provide interoperability between tools and services and enable automation of preservation processes. Components use Taverna workflows as a common language. A component either wraps a command line tool or service with any necessary post-processing, is implemented directly as a workflow, or is composed of other components.

Based on the component's task, the component profile defines a unified interface as well as other metadata required to find and execute components. The interface is described as required and optional inputs and outputs. Annotations help identify the ports and provide further metadata used for execution. Processors depending on external tools can be enriched with licensing information and information about the installation of the tools for different environments. Depending on the component profile, supported filetypes and other metadata can be added to the component itself.
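
The idea of a profile as a set of required and optional ports can be illustrated with a minimal Python sketch. The class and function names (PortSpec, Profile, check_interface) are hypothetical and not part of Taverna or the Scape tooling. The port annotations are the ones used for the Migration Action profile on this page, and the optional working directory is left out because its annotation is not given here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PortSpec:
    """One port a profile asks for (hypothetical structure)."""
    annotation: str        # e.g. "#SourceObject"
    required: bool = True


@dataclass
class Profile:
    name: str
    inputs: list[PortSpec] = field(default_factory=list)
    outputs: list[PortSpec] = field(default_factory=list)


def check_interface(profile, accepts, provides):
    """Return a list of problems; an empty list means the interface fits."""
    problems = []
    for kind, specs, present in (("input", profile.inputs, accepts),
                                 ("output", profile.outputs, provides)):
        allowed = {spec.annotation for spec in specs}
        for spec in specs:
            if spec.required and spec.annotation not in present:
                problems.append(f"missing required {kind} {spec.annotation}")
        for annotation in sorted(set(present) - allowed):
            problems.append(f"unexpected {kind} {annotation}")
    return problems


# Assumed reading of the Migration Action interface described on this page.
MIGRATION = Profile(
    name="MigrationAction",
    inputs=[PortSpec("#SourceObject"), PortSpec("#Parameter", required=False)],
    outputs=[PortSpec("#TargetObject"), PortSpec("#Status", required=False)],
)

print(check_interface(MIGRATION, {"#SourceObject"}, {"#TargetObject"}))
```

A component that annotates only a source object and a target object passes this check, while one without a #SourceObject port is reported as incomplete.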

Profiles

Component profiles for Taverna are defined as XML documents validated against the Component Profile Schema.

Migration Action

A migration action migrates an object to a different format. The component needs the source object as input. Additionally, it can accept parameters and a working directory. The output is the migrated target object. Optionally, a status message is provided. To identify supported formats, the component must be annotated with migration paths, that is, pairs of source and target mimetypes. External tools can be annotated with information about the installation of dependencies.

See the Migration Action Component Profile for further details.

Profile
Imagemagick - image/* ↝ image/tiff
Source object
<> #accepts #SourceObject.
Parameter [optional]
<> #accepts #Parameter.

<> #accepts
  [ a #PredefinedParameterValue ;
    #parameterDescription "Description" ;
    #parameterValue "value"
  ] .
Status
<> #provides #Status.
Target object
<> #provides #TargetObject.
External tool [optional]
<> #requiresInstallation
  [ a #Installation ;
    #hasEnvironment #Environment-URI ;
    #hasSourceConfiguration
      [ a #SourceConfig-URI ;
        #hasConfiguration "package name" ;
        #requiresSource "source configuration"
      ] ;
    #dependsOn
      [ a #Dependency ;
        skos#prefLabel "dependency name" ;
        purl#dependencyVersion "dependency version" ;
        foaf#page <dependency page> ;
        cc#license <license URI>
    ]
  ] .
Workflow
<> #fits #MigrationAction .

<> #migrates
  [ a #MigrationPath ;
    #sourceMimetype "source mimetype" ;
    #targetMimetype "target mimetype" ;
  ] .
Source object
<> #accepts #SourceObject.
Parameter
<> #accepts #Parameter.

<> #accepts
  [ a #PredefinedParameterValue ;
    #parameterDescription "no compression" ;
    #parameterValue "none"
  ] .

<> #accepts
  [ a #PredefinedParameterValue ;
    #parameterDescription "CCITT Group 4" ;
    #parameterValue "Group4"
  ] .

<> #accepts
  [ a #PredefinedParameterValue ;
    #parameterDescription "run length encoding" ;
    #parameterValue "RLE"
  ] .
Status
<> #provides #Status.
Target object
<> #provides #TargetObject.
Imagemagick
<> #requiresInstallation
  [ a #Installation ;
    #hasEnvironment #Debian ;
    #hasSourceConfiguration
      [ a #DpkgConfiguration ;
        #installsDpkgs "imagemagick" ;
        #requiresAptSource "deb http://scape.keep.pt/apt stable main"
      ] ;
    #dependsOn
      [ a #Dependency ;
        skos#prefLabel "imagemagick" ;
        purl#dependencyVersion "5" ;
        foaf#page <http://www.imagemagick.org> ;
        cc#license <http://opensource.org/licenses/Apache-2.0>
    ]
  ] .
Workflow
<> #fits #MigrationAction .

<> #migrates
  [ a #MigrationPath ;
    #sourceMimetype "image/*" ;
    #targetMimetype "image/tiff" ;
  ] .

For more examples check the myExperiment component families for images, audio or video, documents, scientific data and webpages.
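
As a rough illustration of how migration paths could be used to find a suitable component, the following Python sketch matches a requested migration against annotated paths. It assumes that a wildcard such as "image/*" is meant as a glob pattern, which this page does not state explicitly. The names MigrationPath and supports are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class MigrationPath(NamedTuple):
    source_mimetype: str   # may contain a wildcard such as "image/*"
    target_mimetype: str


def supports(paths, source, target):
    """True if any annotated path covers migrating `source` to `target`."""
    return any(
        fnmatchcase(source, path.source_mimetype)
        and fnmatchcase(target, path.target_mimetype)
        for path in paths
    )


# The migration path from the Imagemagick example above.
imagemagick_paths = [MigrationPath("image/*", "image/tiff")]

print(supports(imagemagick_paths, "image/png", "image/tiff"))
print(supports(imagemagick_paths, "application/pdf", "image/tiff"))
```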

Characterisation

A characterisation component provides measures about properties of single objects. The source object must be specified as input. Additionally, it can accept parameters and a working directory. As output, the component must provide one or more measures. Supported formats must be added to the component. External tools can be annotated with information about the installation of dependencies.

See the Characterisation Component Profile for further details.

Profile
Imagemagick - image/* - size
Source object
<> #accepts #SourceObject.
Image resolution
<> #provides <measure-URI> .
Image width
<> #provides <measure-URI>.
Image height
<> #provides <measure-URI>.
External tool
<> #requiresInstallation
  [ a #Installation ;
    #hasEnvironment #Environment-URI ;
    #hasSourceConfiguration
      [ a #SourceConfig-URI ;
        #hasConfiguration "package name" ;
        #requiresSource "source configuration"
      ] ;
    #dependsOn
      [ a #Dependency ;
        skos#prefLabel "dependency name" ;
        purl#dependencyVersion "dependency version" ;
        foaf#page <dependency page> ;
        cc#license <license URI>
    ]
  ] .
Workflow
<> #fits #Characterisation .
<> #handlesMimetype "mimetype" .
Source object
<> #accepts #SourceObject.
Image resolution
<> #provides <http://purl.org/DP/quality/measures#54>.
Image width
<> #provides <http://purl.org/DP/quality/measures#50>.
Image height
<> #provides <http://purl.org/DP/quality/measures#52>.
Imagemagick
<> #requiresInstallation
  [ a #Installation ;
    #hasEnvironment #Debian ;
    #hasSourceConfiguration
      [ a #DpkgConfiguration ;
        #installsDpkgs "imagemagick" ;
        #requiresAptSource "deb http://scape.keep.pt/apt stable main"
      ] ;
    #dependsOn
      [ a #Dependency ;
        skos#prefLabel "imagemagick" ;
        purl#dependencyVersion "5" ;
        foaf#page <http://www.imagemagick.org> ;
        cc#license <http://opensource.org/licenses/Apache-2.0>
    ]
  ] .
Workflow
<> #fits #Characterisation ;
  #handlesMimetype "image/*" .

For more examples check the myExperiment component families for images, audio or video, documents, scientific data and webpages.
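
Because each output port is annotated with the measure it provides, a caller can re-key raw port values by measure URI. The Python sketch below shows one way this might look. The measure URIs are taken from the Imagemagick example above; the function name to_measures and the sample values are made up for illustration.

```python
MEASURES = "http://purl.org/DP/quality/measures#"

# Port name -> measure URI, as annotated in the Imagemagick example.
PROVIDES = {
    "Image resolution": MEASURES + "54",
    "Image width": MEASURES + "50",
    "Image height": MEASURES + "52",
}


def to_measures(port_values, provides=PROVIDES):
    """Re-key raw port outputs by the measure URI each port provides."""
    missing = sorted(set(provides) - set(port_values))
    if missing:
        raise ValueError(f"ports without a value: {missing}")
    return {
        provides[port]: value
        for port, value in port_values.items()
        if port in provides
    }


# Sample values are invented; a real component would produce them.
result = to_measures(
    {"Image resolution": 300, "Image width": 640, "Image height": 480}
)
print(result[MEASURES + "50"])
```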

Object Quality Assurance

To compare two objects, e.g. after a migration action, a quality assurance component can be used. It requires a left and a right object as input. Additionally, it can accept parameters and a working directory. The output is one or more measures. Supported formats are specified as mimetype pairs relating to the left and right objects. External tools can be annotated with information about the installation of dependencies.

See the Quality Assurance Object Component Profile for further details.

Profile
Imagemagick - image/tiff - MSE
Left object
<> #accepts #LeftObject.
Right object
<> #accepts #RightObject.
Measure port
<> #provides <measure-URI>.
External tool [optional]
<> #requiresInstallation
  [ a #Installation ;
    #hasEnvironment #Environment-URI ;
    #hasSourceConfiguration
      [ a #SourceConfig-URI ;
        #hasConfiguration "package name" ;
        #requiresSource "source configuration"
      ] ;
    #dependsOn
      [ a #Dependency ;
        skos#prefLabel "dependency name" ;
        purl#dependencyVersion "dependency version" ;
        foaf#page <dependency page> ;
        cc#license <license URI>
    ]
  ] .
Workflow
<> #fits #QAObjectComparison .

<> #handlesMimetype "mimetype" .

<> #handlesMimetypes
  [ a #AcceptedMimetypes ;
      #handlesLeftMimetype "mimetype" ;
      #handlesRightMimetype "mimetype"
  ] .
Left object
<> #accepts #LeftObject.
Right object
<> #accepts #RightObject.
MSE
<> #provides <http://purl.org/DP/quality/measures#6>.
Imagemagick
<> #requiresInstallation
  [ a #Installation ;
    #hasEnvironment #Debian ;
    #hasSourceConfiguration
      [ a #DpkgConfiguration ;
        #installsDpkgs "imagemagick" ;
        #requiresAptSource "deb http://scape.keep.pt/apt stable main"
      ] ;
    #dependsOn
      [ a #Dependency ;
        skos#prefLabel "imagemagick" ;
        purl#dependencyVersion "5" ;
        foaf#page <http://www.imagemagick.org> ;
        cc#license <http://opensource.org/licenses/Apache-2.0>
    ]
  ] .
Workflow
<> #fits #QAObjectComparison ;
  #handlesMimetype "image/*" .
              
<> #handlesMimetypes
  [ a #AcceptedMimetypes ;
      #handlesLeftMimetype "image/tiff" ;
      #handlesRightMimetype "image/tiff"
  ] .

For more examples check the myExperiment component families for images, audio or video, documents, scientific data and webpages.
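
The following Python sketch illustrates the two ideas in this section: checking whether a left/right mimetype pair is accepted, and what a comparison measure such as MSE (mean squared error) looks like as a value. It is not how the Imagemagick component works internally. The function names are hypothetical, wildcard handling as glob patterns is an assumption, and the pixel lists are toy data.

```python
from fnmatch import fnmatchcase

# (left mimetype, right mimetype) pairs, as in the #handlesMimetypes annotation.
ACCEPTED_PAIRS = [("image/tiff", "image/tiff")]


def can_compare(left, right, accepted=ACCEPTED_PAIRS):
    """True if some accepted pair covers both the left and the right mimetype."""
    return any(
        fnmatchcase(left, left_pattern) and fnmatchcase(right, right_pattern)
        for left_pattern, right_pattern in accepted
    )


def mean_squared_error(left_pixels, right_pixels):
    """Toy MSE over two equally long sequences of pixel values."""
    if len(left_pixels) != len(right_pixels) or not left_pixels:
        raise ValueError("pixel sequences must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(left_pixels, right_pixels)]
    return sum(squared) / len(squared)


print(can_compare("image/tiff", "image/tiff"))
print(mean_squared_error([0, 10], [0, 14]))
```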

Executable Plan

As the result of a preservation planning process, an executable plan combines a preservation action with the necessary characterisation and quality assurance components. The input of the component must consist of the source object and, optionally, parameters and a working directory. The output is the migrated target object, a status message and the relevant measures of the characterisation and quality assurance components. Supported formats are added as migration paths, that is, pairs of source and target mimetypes. External tools can be annotated with information about the installation of dependencies.

See the Executable Plan Component Profile for further details.
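
Conceptually, an executable plan chains the three component kinds described above. The Python sketch below shows that composition with plain functions standing in for components. The function run_plan, the stub components and their return values are all hypothetical; real executable plans are Taverna workflows, not Python code.

```python
def run_plan(source, migrate, characterise, compare):
    """Run a migration, then collect measures about and against its result."""
    target, status = migrate(source)
    measures = {}
    measures.update(characterise(target))     # properties of the target object
    measures.update(compare(source, target))  # source vs. target comparison
    return {"target": target, "status": status, "measures": measures}


# Stub components with invented results, for illustration only.
def fake_migrate(source):
    return source + ".tiff", "ok"


def fake_characterise(obj):
    return {"width": 640}


def fake_compare(left, right):
    return {"mse": 0.0}


result = run_plan("scan.png", fake_migrate, fake_characterise, fake_compare)
print(result["status"], sorted(result["measures"]))
```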

Imagemagick - image/* ↝ image/tiff

Component Creation

There are two recommended ways to create and publish components. The Scape Toolwrapper generates packages and components automatically from a tool specification and a component specification, and publishes them to myExperiment. Alternatively, the Taverna Workbench can be used to create and publish components manually.

Toolwrapper

The Scape Toolwrapper is a tool to package command line tools and specific tool operations. Additionally it can create artefacts for tool execution, one of which is a Scape Preservation Component.

Tool operations are specified in a toolspec and consist of the operation's command and its inputs and outputs. The toolspec also contains installation information and dependencies. Additional metadata required to elevate a wrapped tool to a Preservation Component is provided in a componentspec.

Publication of a Preservation Component is a three-step process supported by the Toolwrapper:

  1. Create a wrapper script and component
  2. Package the wrapper script and component
  3. Publish the component on myExperiment.org

For an example toolwrapper configuration check out Photohawk's command line version toolspec.

Taverna

The Taverna Workbench supports creating workflows and adding semantic annotations to turn them into components. This works out of the box since Taverna 2.5, or through the Taverna Components Plugin for Taverna 2.4. By default, Taverna uses myExperiment as its registry, but it also allows creating a local registry in the Taverna data directory on your computer.

Starting from a regular workflow, a component can be created by adding it to a component family. The component family links its components to the profile. After that, Taverna uses the information from the profile to allow adding semantic annotations to the inputs, outputs, processors and the workflow itself. Simple validation against the profile is performed.

Once the component is ready for publication, it can be added to myExperiment or the local registry.