General Structure

Each UNode requires a resource and a wrapper file to be declared properly. Additionally, the Unode parameters have to be declared in the ursgal/uparams.py file, which holds the grouped and universal ursgal parameters including specific translations to the different engine parameters, their description, default values and value types (see schematic overview A below).

Schematic Overview

_images/figAB.png

Resources

The resources/ directory contains the main code for each UNode, e.g.:

  1. executables (i.e. .exe or .jar)
  2. standalone Python scripts
  3. any additional files that are required by the engine

Compared to the original standalone applications, the folder structure is unchanged. Integration of standalone applications into Ursgal is achieved by Python wrappers around the executables (“wrappers”, see below) and entries in the general ursgal/uparams.py file.

The resources directory path depends on the platform dependencies of the UNode:

  1. <installation path of ursgal>/resources/<platform>/<architecture> Whereas platform is darwin (OS X), linux or win32 (Windows (and yes even if you have windows 64 bit …))
  2. Architecture independent engines, like Python scripts or Java packages can be placed in <installation path of ursgal>/resources/platform_independent/arc_independent/
  3. Each UNode has to have its own folder following Python class name conventions, but all lowercase. For more details in the naming convention see PEP 3131.

Wrapper Python class

The wrapper inherits from ursgal.UNode. During the instantiation, the default parameters are injected into the class. The default parameters are collected using the umapmaster class, which parses the grouped parameters listed in ursgal.uparams. Therefore, it is imperative that all parameters are listed in the uparams.py file (see below).

The default structure of a wrapper is:

#!/usr/bin/env python3.4
import ursgal

class xtandem_alanine( ursgal.UNode ):
    """
    X!Tandem UNode
    Parameter options at http://www.thegpm.org/TANDEM/api/

    Reference:
    Craig R, Beavis RC. (2004) TANDEM: matching proteins with tandem mass spectra.
    """
    META_INFO = { ... }

    def __init__(self, *args, **kwargs):
        super(xtandem_alanine, self).__init__(*args, **kwargs)

    def preflight(self):
        # code that should be run before the UNode is executed
        # e.g. writing a config file
        # Note: not mandatory
        return

    def postflight(self):
        # code that should be run after the UNode is executed
        # e.g. formatting the output file
        # Note: not mandatory
        return

It is important that the super class is called with the wrapper’s name. Default parameters are collected from uparams.py using this name (see below). The special methods preflight() and postflight() are automatically called by Ursgal’s UController when a UNode is launched.

The META INFO

The META_INFO class attributed is most important for proper function. The META_INFO entries are described below; for more examples, please refer to the wrapper folder.

Edit Version

This number is used to determine the most recent version of the wrapper on different work stations. Therefore, it should be updated everytime a change in the wrapper is made.

META_INFO = {
    ...
    'edit_version'           : 1.00,
    ...
}

Name, Version and Release Date

The original name of the engine, it’s version number and the release date of this version (if available).

META_INFO = {
    ...
    'name'               : 'X!Tandem',
    'version'            : 'ALANINE',
    'release_date'       : '2017-02-01',
    ...
}

Engine Type

Engine Type will define were the engine is grouped into. The groups are shown after ucontroller instantiation. Additionally, the wrapper registers the engines to certain controller functionality, e.g. engine_type[‘search_engine’] : True will allow ucontroller.search(engine=’omssa_2_1_9’) to be executed. The engine types and corresponding ucontroller functions are also listed in ukb.ENGINE_TYPES

META_INFO = {
    'engine_type' : {
        'controller'                    : False,
        'converter'                     : False,
        'fetcher'                       : False,
        'meta_engine'                   : False,
        'misc_engine'                   : False,
        'cross_link_search_engine'      : False,
        'de_novo_search_engine'         : False,
        'protein_database_search_engine': True,
        'quantification_engine'         : False,
        'validation_engine'             : False,
        'visualizer'                    : False,
    },
    ...
}

Citation

Please enter the proper citation for each engine you are wrapping so users can be reminded to cite the proper work. In an academic world, this is the only credit that one can hope for ;) For example.

META_INFO = {
    ...
    'citation' : \
        'Craig R, Beavis RC. (2004) TANDEM: matching proteins with tandem '\
        'mass spectra.',
    ...
}

Input Extensions

List of file extensions that can be used as input files for the engine. For example.

META_INFO = {
    ...
    'input_extensions' : ['.mgf', '.gaml', '.dta', '.pkl', '.mzData', '.mzXML'],
    ...
}

Output Extensions

List of file extensions generated by the engine. They are required to auto-generate the output file name and to check if an output file was produced. For example:

META_INFO = {
    ...
    'output_extensions'      : ['.csv'],
    ...
}

Create own folder

This option allows all files and results for this engine to be placed in its own folder. The engine will define the folder name, here omssa_2_1_9. The master switch for all unodes to create their folder (if it is specified in the META_INFO) is the ucontroler param engines_create_folders)

META_INFO = {
    ...
    'create_own_folder'     : True,
    ...
}

In Development

In development flag will hide the wrapper from the controller overview, however the node will be instantiated during start and is therefore nevertheless available.

META_INFO = {
    ...
    'in_development'        : False,
    ...
}

Include in GIT

The standalone executable can be distributed via the ursgal git.

Note

Big executables are distributed via the ./install_resources.py script, thus refrain overloading ursgal.git too much :)

META_INFO = {
    ...
    'include_in_git'        : False,
    ...
}

Distributable

This is True if the coresponding standalone executable can be distributed via the ./install_resources.py and False if the standalone executable needs to be downloaded manually.

META_INFO = {
    ...
    'include_in_git'        : False,
    ...
}

UTranslation Style

Since ursgal translates the general ursgal parameters to engine specific parameters and multiple versions of one engines can be available in ursgal (see e.g. 4+ X! Tandem versions), we define translation styles. Therefore all X! Tandem versions share (up to now) all parameter translation rules, defined as xtandem_style_1. Which translation style is used for which wrapper is defined by this entry in the META info.

META_INFO = {
    ...
    'utranslation_style'    : 'omssa_style_1',
    ...
}

Download information

The download information is required for the install_resources.py script to function.

META_INFO = {
    ...
    ### Below are the download information ###
    'engine': {
        'darwin' : {
            '64bit' : {
                'exe'            : 'omssacl',
                'url'            : 'ftp://ftp.ncbi.nih.gov/pub/lewisg/omssa/2.1.9/omssa-2.1.9.macos.tar.gz',
                'zip_md5'        : '9cb92a98c4d96c34cc925b9336cbaec7',
                'additional_exe' : ['makeblastdb'],
            },
        },
        'linux' : {
            '64bit' : {
                'exe'            : 'omssacl',
                'url'            : 'ftp://ftp.ncbi.nih.gov/pub/lewisg/omssa/2.1.9/omssa-2.1.9.linux.tar.gz',
                'zip_md5'        : '921e01df9cd2a99d21e9a336b5b862c1',
                'additional_exe' : ['makeblastdb'],
            },
        },
        'win32' : {
            '64bit' : {
                'exe'            : 'omssacl.exe',
                'url'            : 'ftp://ftp.ncbi.nih.gov/pub/lewisg/omssa/2.1.9/omssa-2.1.9.win32.exe',
                'zip_md5'        : 'b9d9a8aec3cfe77c48ce0f5752aba8f9',
                'additional_exe' : ['makeblastdb'],
            },
            '32bit' : {
                'exe'            : 'omssacl.exe',
                'url'            : 'ftp://ftp.ncbi.nih.gov/pub/lewisg/omssa/2.1.9/omssa-2.1.9.win32.exe',
                'zip_md5'        : 'a05a5cdd45fd8abcfc75b1236f8a2390',
                'additional_exe' : ['makeblastdb'],
            },
        },
    },
    ...
}

Grouped parameters - uparams.py

The ursgal/uparams.py file holds all parameter information available in ursgal. All default parameters for all nodes are stored there, can be accessed and modified. This file contains one Python dictionary with keys representing the ursgal parameter.

Note

The entries ‘uvalue_option’ and ‘edit_version’ will be removed with ursgal version 0.7.0. Let us know if and why you require them.

Entries:

  • ‘available_in_unode’
    Defines which nodes use this parameter. Complete engine names are given.
  • ‘default_value’
    Defines the default value for this parameter. Please note that these can be adjusted via parameters or profiles.
  • ‘description’
    Provides a short explanatory text for the parameter.
  • ‘edit_version’
    This number is used to determine the most recent version of the uparam on different work stations. Therefore, it should be updated everytime a change in the wrapper is made.
  • ‘trigger_rerun’
    Defines if a change in this parameter will cause the unode to be executed, independently if there are already result files present. Since not all parameter changes require re-execution, this ensures minimal total runtime for pipelines.
  • ‘ukey_translation’
    Defines how the ursgal parameter name is translated into the name in the corresponding engine. The unified parameter name in ursgal helps the user to group the parameter names from different engines and simplifies the parameter handling for the user.
  • ‘utag’
    Helps to sort and group parameters.
  • ‘uvalue_translation’
    Defines how the ursgal parameter value is translated into the value of the corresponding engines. Please note that the value type can change when its translated, in order to be functional for the engine.
  • ‘uvalue_type’
    Defines the uvalue type of this parameter.
  • ‘uvalue_option’
    Provides informations for parameter settings in the GU, e.g. possible parameter value ranges and step sizes. The required informations depend on the uvalue type and are listed in the following.

uvalue_option entries for different uvalue_types:

  • ‘bool’:

    no uvalue_options are required

  • ‘float’/’int’:
    ‘none_val’:

    This value will be set to None (since None cannot be entered in the GUI)

    ‘max’:

    maximal value to be entered

    ‘min’:

    minimal value to be entered

    ‘f-point’:

    number ob decimal points (not available/needed for uvalue_type ‘int’)

    ‘updownval’:

    step size of user control arrows/slider

    ‘unit’:

    unit of the uvalue

  • ‘str’:
    ‘none_val’:

    This value will be set to None (since None cannot be entered in the GUI)

    ‘multiple_line’:

    True, if the user should be allowed to enter multiple lines

  • ‘list’:
    ‘none_val’:

    This value will be set to None (since None cannot be entered in the GUI)

    ‘item_title’:

    generic title for the items in the list

    ‘item_type’:

    type of items in the list,

    ‘custom_val_max’:

    maximum number of entries (?)

    Please note: depending on the item_type, additional options are required, e.g. multiple_line if the item_type is ‘str’

  • ‘dict’:
    ‘none_val’:

    This value will be set to None (since None cannot be entered in the GUI)

    ‘multiple_line’:

    True, if the user should be allowed to enter multiple lines (given for each key in the dict)

    ‘item_titles’:

    generic titles for the keys and values in the dict (given in the same structure as the dict)

    ‘value_types’:

    types of values for each key in the dict

    Please note: depending on the value_types, additional options are required, e.g. max/min/f-point/updownval/unit if the item_type is ‘float’

  • ‘select’:
    ‘select_type’:

    type of selection from available_values, e.g. ‘combo_box’ if combinations of multiple values can be selected, ‘radio_button’ if only one value can be selected

    ‘available_values’:

    list of available values to be selected

    ‘custom_val_max’:

    maximum number of entries (?)

The following example shows the parameter dict for the ‘frag_mass_tolerance’ parameter.

ursgal_params = {
    'frag_mass_tolerance' : {
        'edit_version' : 1.00,
        'available_in_unode' : [
            'moda_v1_51',
            'msamanda_1_0_0_5242',
            'msamanda_1_0_0_5243',
            'msamanda_1_0_0_6299',
            'msamanda_1_0_0_6300',
            'msamanda_1_0_0_7503',
            'msamanda_1_0_0_7504',
            'msfragger_20170103',
            'myrimatch_2_1_138',
            'myrimatch_2_2_140',
            'novor_1_1beta',
            'omssa_2_1_9',
            'pepnovo_3_1',
            'pipi_1_3_0'
            'xtandem_cyclone_2010',
            'xtandem_jackhammer',
            'xtandem_piledriver',
            'xtandem_sledgehammer',
            'xtandem_vengeance',
            'xtandem_alanine',
        ],
        'triggers_rerun' : True,
        'ukey_translation' : {
            'moda_style_1'      : 'FragTolerance',
            'msamanda_style_1'  : 'ms2_tol',
            'msfragger_style_1' : 'fragment_mass_tolerance',
            'myrimatch_style_1' : 'FragmentMzTolerance',
            'novor_style_1'     : 'fragmentIonErrorTol',
            'omssa_style_1'     : '-to',
            'pepnovo_style_1'   : '-fragment_tolerance',
            'pipi_style_1'      : 'ms2_tolerance',
            'xtandem_style_1'   : 'spectrum, fragment monoisotopic mass error',
        },
        'utag' : [
            'fragment',
        ],
        'uvalue_translation' : {
        },
        'uvalue_type' : 'int',
        'uvalue_option' : {
            'none_val'  : None,
            'max'       : 100000,
            'min'       : 0,
            'updownval' : 1,
            'unit'      : ''
        },
        'default_value' : 20,
        'description' : \
            'Mass tolerance of measured and calculated fragment ions',
    },
    ...
}