General Structure

Each UNode requires a resource and a wrapper file to be declared properly. Additionally, the Unode parameters have to be declared in the ursgal/uparams.py file, which holds the grouped and universal ursgal parameters including specific translations to the different engine parameters, their description, default values and value types (see schematic overview A below).

Schematic Overview

_images/figAB.png

Resources

The resources/ directory contains the main code for each UNode, e.g.:

  1. executables (i.e. .exe or .jar)
  2. standalone Python scripts
  3. any additional files that are required by the engine

Compared to the original standalone applications, the folder structure is unchanged. Integration of standalone applications into Ursgal is achieved by Python wrappers around the executables (“wrappers”, see below) and entries in the general ursgal/uparams.py file.

The resources directory path depends on the platform dependencies of the UNode:

  1. <installation path of ursgal>/resources/<platform>/<architecture> Whereas platform is darwin (OS X), linux or win32 (Windows (and yes even if you have windows 64 bit ...))
  2. Architecture independent engines, like Python scripts or Java packages can be placed in <installation path of ursgal>/resources/platform_independent/arc_independent/
  3. Each UNode has to have its own folder following Python class name conventions, but all lowercase. For more details in the naming convention see PEP 3131.

Wrapper Python class

The wrapper inherits from ursgal.UNode. During the instantiation, the default parameters are injected into the class. The default parameters are collected using the umapmaster class, which parses the grouped parameters listed in ursgal.uparams. Therefore, it is imperative that all parameters are listed in the uparams.py file (see below).

The default structure of a wrapper is:

#!/usr/bin/env python3.4
import ursgal

class omssa_2_1_9( ursgal.UNode ):
    """
    omssa_2_1_9 UNode

    Parameter options at http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/asn_spec/omssa.asn.html

    2.1.9 parameters at http://proteomicsresource.washington.edu/protocols06/omssa.php

    Reference:
    Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, Yang X, Shi W, Bryant SH (2004) Open Mass Spectrometry Search Algorithm.

    """
    META_INFO = { ... }

    def __init__(self, *args, **kwargs):
        super(omssa_2_1_9, self).__init__(*args, **kwargs)

    def preflight(self):
        # code that should be run before the UNode is executed
        # e.g. writing a config file
        # Note: not mandatory
        return

    def postflight(self):
        # code that should be run after the UNode is executed
        # e.g. formatting the output file
        # Note: not mandatory
        return

It is important that the super class is called with the wrapper’s name. Default parameters are collected from uparams.py using this name (see below). The special methods preflight() and postflight() are automatically called by Ursgal’s UController when a UNode is launched.

The META INFO

The META_INFO class attributed is most important for proper function. The META_INFO entries are described below; for more examples, please refer to the wrapper folder.

Engine_type

Engine Type will define were the engine is grouped into. The groups are shown after ucontroller instantiation. Additionally, the wrapper registers the engines to certain controller functionality, e.g. engine_type[‘search_engine’] : True will allow ucontroller.search(engine=’omssa_2_1_9’) to be executed.

META_INFO = {
    'engine_type'            : {
      'controller'        : False,
      'converter'         : False,
      'validation_engine' : False,
      'search_engine'     : True,
      'meta_engine'       : False
    },
    ...
}

Citation

Please enter the proper citation for each engine you are wrapping so users can be reminded to cite the proper work. In an academic world, this is the only credit that one can hope for ;) For example.

META_INFO = {
    ...
    'citation' : 'Geer LY, Markey SP, Kowalak JA, '\
        'Wagner L, Xu M, Maynard DM, Yang X, Shi W, Bryant SH (2004) '\
        'Open Mass Spectrometry Search Algorithm.',
    ...
}

Input types

Input types are currently not used but the next iteration will include this. For example.

META_INFO = {
    ...
    'input_types'           : ['.mgf'],
    ...
}

Output Extension

The output extension is required to auto-generate the output file name. For example:

META_INFO = {
    ...
    'output_extension'      : '.csv',
    ...
}

Create own folder

This option allows all files and results for this engine to be placed in its own folder. The engine will define the folder name, here omssa_2_1_9. The master switch for all unodes to create their folder (if it is specified in the META_INFO) is the ucontroler param engines_create_folders)

META_INFO = {
    ...
    'create_own_folder'     : True,
    ...
}

In Development

In development flag will hide the wrapper from the controller overview, however the node will be instantiated during start and is therefore nevertheless available.

META_INFO = {
    ...
    'in_development'        : False,
    ...
}

Include in GIT

The standalone executable can be distributed via the ursgal git.

Note

Big executables are distributed via the ./install_resources.py script, thus refrain overloading ursgal.git too much :)

META_INFO = {
    ...
    'include_in_git'        : False,
    ...
}

UTranslation Style

Since ursgal translates the general ursgal parameters to engine specific parameters and multiple versions of one engines can be available in ursgal (see e.g. 4+ X! Tandem versions), we define translation styles. Therefore all X! Tandem versions share (up to now) all parameter translation rules, defined as xtandem_style_1. Which translation style is used for which wrapper is defined by this entry in the META info.

META_INFO = {
    ...
    'utranslation_style'    : 'omssa_style_1',
    ...
}

Download information

The download information is required for the install_resources.py script to function.

META_INFO = {
    ...
    ### Below are the download information ###
    'engine': {
        'darwin' : {
            '64bit' : {
                'exe'            : 'omssacl',
                'url'            : 'ftp://ftp.ncbi.nih.gov/pub/lewisg/omssa/2.1.9/omssa-2.1.9.macos.tar.gz',
                'zip_md5'        : '9cb92a98c4d96c34cc925b9336cbaec7',
                'additional_exe' : ['makeblastdb'],
            },
        },
        'linux' : {
            '64bit' : {
                'exe'            : 'omssacl',
                'url'            : 'ftp://ftp.ncbi.nih.gov/pub/lewisg/omssa/2.1.9/omssa-2.1.9.linux.tar.gz',
                'zip_md5'        : '921e01df9cd2a99d21e9a336b5b862c1',
                'additional_exe' : ['makeblastdb'],
            },
        },
        'win32' : {
            '64bit' : {
                'exe'            : 'omssacl.exe',
                'url'            : 'ftp://ftp.ncbi.nih.gov/pub/lewisg/omssa/2.1.9/omssa-2.1.9.win32.exe',
                'zip_md5'        : 'b9d9a8aec3cfe77c48ce0f5752aba8f9',
                'additional_exe' : ['makeblastdb'],
            },
            '32bit' : {
                'exe'            : 'omssacl.exe',
                'url'            : 'ftp://ftp.ncbi.nih.gov/pub/lewisg/omssa/2.1.9/omssa-2.1.9.win32.exe',
                'zip_md5'        : 'a05a5cdd45fd8abcfc75b1236f8a2390',
                'additional_exe' : ['makeblastdb'],
            },
        },
    },
    ...
}

Grouped parameters - uparams.py

The ursgal/uparams.py file holds all parameter information available in ursgal. All default parameters for all nodes are stored there, can be accessed and modified. This file contains one Python dictionary with keys representing the ursgal parameter.

Entries:

  • ‘available_in_unode’
    Defines which nodes use this parameter. Complete engine names are given.
  • ‘default_value’
    Defines the default value for this parameter. Please note that these can be adjusted via parameters or profiles.
  • ‘description’
    Provides a short explanatory text for the parameter.
  • ‘trigger_rerun’
    Defines if a change in this parameter will cause the unode to be executed, independently if there are already result files present. Since not all parameter changes require re-execution, this ensures minimal total runtime for pipelines.
  • ‘ukey_translation’
    Defines how the ursgal parameter name is translated into the name in the corresponding engine. The unified parameter name in ursgal helps the user to group the parameter names from different engines and simplifies the parameter handling for the user.
  • ‘utag’
    Helps to sort and group parameters.
  • ‘uvalue_translation’
    Defines how the ursgal parameter value is translated into the value of the corresponding engines. Please note that the value type can change when its translated, in order to be functional for the engine.
  • ‘uvalue_type’
    Defines the uvalue type of this parameter.
  • ‘uvalue_option’
    Provides information for possible parameter value ranges and step sizes.

The following example shows the parameter dict for the ‘frag_mass_tolerance’ parameter.

ursgal_params = {
    'frag_mass_tolerance' : {
            'available_in_unode' : [
                'msamanda_1_0_0_5242',
                'msamanda_1_0_0_5243',
                'myrimatch_2_1_138',
                'myrimatch_2_2_140',
                'novor_1_1beta',
                'omssa_2_1_9',
                'pepnovo_3_1',
                'xtandem_cyclone_2010',
                'xtandem_jackhammer',
                'xtandem_piledriver',
                'xtandem_sledgehammer',
                'xtandem_vengeance',
            ],
            'default_value' : 20,
            'description' :  ''' Mass tolerance of measured and calculated fragment ions ''',
            'triggers_rerun' : True,
            'ukey_translation' : {
                'msamanda_style_1' : 'ms2_tol',
                'myrimatch_style_1' : 'FragmentMzTolerance',
                'novor_style_1' : 'fragmentIonErrorTol',
                'omssa_style_1' : '-to',
                'pepnovo_style_1' : '-fragment_tolerance',
                'xtandem_style_1' : 'spectrum, fragment monoisotopic mass error',
            },
            'utag' : [
                'fragment',
            ],
            'uvalue_translation' : {
            },
            'uvalue_type' : "int",
            'uvalue_option' : {
                'min': 0,        # default = |default_value * 100| * -1
                'max': 100000,   # default = |default_value * 100|
                'updownval': 1,  # default = 1
        },
    },
    ...
}