UNode¶

class ursgal.UNode(*args, **kwargs)¶

ursgal class

__init__(*args, **kwargs)¶: Initialize self. See help(type(self)) for accurate signature.

__weakref__¶: list of weak references to the object (if defined)

_execute()¶

The _execute unode function

Executes the unode executable via shell.

Note: Internal function. Unodes that do not require execution via shell redefine the _execute() function in their engine class.

Returns:	None

_group_psms(input_file, validation_score_field=None, bigger_scores_better=None)¶

Reads an input CSV and returns a defaultdict that maps each spectrum title to a sorted list of tuples, each containing a) the score (from validation_score_field) and b) the whole line dict.

Keyword Arguments:

validation_score_field (str) – fieldname of the column that should be used as validation score for sorting of PSMs. If None, get_last_search_engine is used to get the validation_score_field defined for the last used search engine.
bigger_scores_better (bool) – defines whether validation scores increase (True) or decrease (False) with match quality. If None, get_last_search_engine is used to get bigger_scores_better defined for the last used search engine.
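The grouping described above can be sketched in plain Python. This is only an illustration, not the ursgal implementation: the function name group_psms, the column name "Spectrum Title" and the choice to put the best score first are assumptions.

```python
import csv
import io
from collections import defaultdict


def group_psms(csv_handle, score_field, bigger_scores_better,
               title_field="Spectrum Title"):
    """Group CSV rows by spectrum title; each group is a sorted list of
    (score, row_dict) tuples. 'title_field' is a hypothetical column name."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_handle):
        grouped[row[title_field]].append((float(row[score_field]), row))
    for psm_list in grouped.values():
        # Assumption: best PSM first, so sort descending when bigger is better.
        psm_list.sort(key=lambda pair: pair[0], reverse=bigger_scores_better)
    return grouped


example_csv = io.StringIO(
    "Spectrum Title,Sequence,score\n"
    "spec_1,PEPTLDE,0.20\n"
    "spec_1,PEPTIDE,0.01\n"
    "spec_2,ELVISK,0.05\n"
)
groups = group_psms(example_csv, score_field="score", bigger_scores_better=False)
```

With bigger_scores_better=False the lowest score (for example a p-value-like score) ends up at index 0 of each list.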

abs_paths_for_specific_keys(params, param_keys=None)¶

Absolute paths for specific keys from the params dict are determined

Returns:	params with paths in abspath version
Return type:	dict
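As a rough sketch of the idea (not the ursgal code), selected entries of a params dict can be converted with os.path.abspath. The function name abs_paths and the example keys are hypothetical, and how the real method behaves when param_keys is None is not covered here.

```python
import os


def abs_paths(params, param_keys):
    """Return a copy of params in which the listed keys hold absolute paths."""
    updated = dict(params)
    for key in param_keys:
        # Skip keys that are missing or empty instead of failing.
        if updated.get(key):
            updated[key] = os.path.abspath(updated[key])
    return updated


demo_params = {"database": "db/target_decoy.fasta", "threads": 4}
demo_abs = abs_paths(demo_params, param_keys=["database"])
```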

calc_md5(input_file)¶

Calculates the MD5 checksum of input_file

Parameters:	input_file (str) – Path to file
Returns:	MD5 of input file
Return type:	str

Thanks Raymond :) http://stackoverflow.com/questions/7829499/using-hashlib-to-compute-md5-digest-of-a-file-in-python-3
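A common way to compute such a checksum with hashlib is to read the file in chunks, so large files do not have to fit in memory. The sketch below follows that pattern; the function name md5_of_file and the chunk size are assumptions, not taken from ursgal.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops as soon as read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


# Small demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
demo_md5 = md5_of_file(tmp.name)
os.remove(tmp.name)
```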

collect_and_translate_params(params)¶

Translates ursgal parameters into uNode specific syntax.

Each unode.USED_SEARCH_PARAMS contains the params that have to be passed to the uNode. Param values are not translated if they are [] or {}.

Param values are translated using:

uNode.USEARCH_PARAM_VALUE_TRANSLATIONS
> translating only values, regardless of key
uNode.USEARCH_PARAM_KEY_VALUE_TRANSLATOR
> translating only key:value pairs to key:newValue

Those lookups are found in kb/{engine}.py

TAG:

v0.4
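The two kinds of lookup can be illustrated with plain dictionaries. This is a simplified sketch under assumptions: the function name, the example lookups and the precedence (key:value translation before value-only translation) are made up for illustration and are not the ursgal implementation.

```python
def translate_params(params, value_translations, key_value_translations):
    """Translate param values; empty [] and {} pass through untouched."""
    translated = {}
    for key, value in params.items():
        if value == [] or value == {}:
            translated[key] = value
            continue
        try:
            per_key = key_value_translations.get(key, {})
            if value in per_key:
                value = per_key[value]             # key:value -> key:newValue
            elif value in value_translations:
                value = value_translations[value]  # value only, any key
        except TypeError:
            pass  # unhashable values (e.g. non-empty lists) are kept as-is
        translated[key] = value
    return translated


# Hypothetical lookups, standing in for the ones kept in kb/{engine}.py
demo_value_translations = {True: "yes", False: "no"}
demo_key_value_translations = {"enzyme": {"trypsin": "[KR]|{P}"}}

demo_translated = translate_params(
    {"enzyme": "trypsin", "semi_enzyme": False,
     "modifications": [], "score_ions": ["b", "y"]},
    demo_value_translations,
    demo_key_value_translations,
)
```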

compare_json_and_local_ursgal_version(history, json_path)¶: Prints a warning if the history is from a different ursgal version

determine_common_name(input_files, mode=None)¶

The unode function determines for a list of input files a basic common name

Keyword Arguments:
	mode – head or tail for first or last part of the filename, respectively
Parameters:	input_files (list) – list with input file names
Returns:	common file name
Return type:	str
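One possible reading of "head" and "tail" is the shared prefix or shared suffix of the file names. The sketch below implements that reading with os.path.commonprefix; it is an assumption about the behaviour, and the function name common_name is hypothetical.

```python
import os


def common_name(input_files, mode="head"):
    """Return the shared start ('head') or shared end ('tail') of the
    file names, ignoring directories and extensions."""
    names = [os.path.splitext(os.path.basename(f))[0] for f in input_files]
    if mode == "tail":
        # Reverse the names so the common suffix becomes a common prefix.
        reversed_common = os.path.commonprefix([n[::-1] for n in names])
        return reversed_common[::-1]
    return os.path.commonprefix(names)


demo_head = common_name(["sample_A_xtandem.csv", "sample_A_omssa.csv"])
demo_tail = common_name(["a_pmap.csv", "b_pmap.csv"], mode="tail")
```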

determine_common_top_level_folder(input_files=None)¶

The unode function determines for a list of input files a common top level folder they all belong to

Keyword Arguments:
	input_files (list) – list with input files
Returns:	The common top level folder
Return type:	str
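A minimal sketch of the same idea using the standard library: take the folder of every file and ask os.path.commonpath for the deepest folder they share. The function name is hypothetical and the real method may differ in detail.

```python
import os


def common_top_level_folder(input_files):
    """Return the deepest folder that contains all input files."""
    folders = [os.path.dirname(os.path.abspath(f)) for f in input_files]
    return os.path.commonpath(folders)


demo_root = os.path.abspath(os.path.join("data", "run_01"))
demo_folder = common_top_level_folder([
    os.path.join(demo_root, "xtandem", "sample.csv"),
    os.path.join(demo_root, "omssa", "sample.csv"),
])
```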

dump_json_and_calc_md5(stats=None, params=None, calc_md5=True)¶

Dumps a JSON file with params and stats and calculates the MD5 of the output.

Deletes all entries that are defined in params['del_from_params_before_json_dump'] as well as all keys that start with '_'.
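The clean-up rule can be sketched as a dictionary filter applied before serialisation. The function name params_for_json is hypothetical, and whether the 'del_from_params_before_json_dump' entry itself is kept is an assumption of this sketch.

```python
import json


def params_for_json(params):
    """Drop keys listed in 'del_from_params_before_json_dump' and keys
    starting with '_' before the params are written to JSON."""
    to_delete = set(params.get("del_from_params_before_json_dump", []))
    return {
        key: value
        for key, value in params.items()
        if key not in to_delete and not key.startswith("_")
    }


demo_clean = params_for_json({
    "precursor_mass_tolerance": 5,
    "_internal_buffer": "not serialised",
    "debug_blob": [1, 2, 3],
    "del_from_params_before_json_dump": ["debug_blob"],
})
demo_json = json.dumps(demo_clean, sort_keys=True)
```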

fix_md5_and_file_in_json(ifile=None, json_path=None)¶: Fixes supplementary output json

flatten_list(multi_list=[])¶

The unode flatten_list function

Reduces a multidimensional list of lists to a flat list including all elements
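A recursive sketch of this kind of flattening (the function name flatten is illustrative and not the ursgal code):

```python
def flatten(multi_list):
    """Reduce arbitrarily nested lists to one flat list, keeping order."""
    flat = []
    for item in multi_list:
        if isinstance(item, list):
            flat.extend(flatten(item))  # descend into nested lists
        else:
            flat.append(item)
    return flat


demo_flat = flatten([1, [2, [3, 4]], [5]])
```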

get_last_engine(history=None, engine_types=None, multiple_engines=False)¶

The unode get_last_engine function

Note: returns None if the specified engine type was not used yet.

Keyword Arguments:

history (list) – A list of path unodes, timestamps and parameters that were used. This function can be used on the history loaded from a file .json, to find out which search engine was used on that file. If not specified, this information is taken from the unode class itself, and not a specific file.
engine_types (list) – the engine type(s) for which the last used engine should be identified
multiple_engines (bool) – if multiple engines have been used, this can be set to True. A list of used engines is then reported.

Examples

>>> fpaths = self.generate_basic_file_info( "14N_xtandem.csv" )
>>> file_info, __    = self.load_json( fpaths=fpaths, mode='input')
>>> last_engine      = self.get_last_engine(
        history      = file_info["history"],
        engine_types = ["protein_database_search_engine"]
    )
>>> print( last_engine )
"xtandem_sledgehammer"

Returns:	The name of the last engine that was used.
Return type:	str
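The lookup logic can be pictured as a scan over the history for entries of a matching type. The sketch below uses a deliberately simplified, hypothetical history layout (dicts with "engine" and "engine_type" keys); the real ursgal history entries may be structured differently.

```python
def last_engine(history, engine_types, multiple_engines=False):
    """Return the most recent engine of the requested type(s), or None."""
    matches = [
        entry["engine"] for entry in history
        if entry["engine_type"] in engine_types
    ]
    if not matches:
        return None
    if multiple_engines:
        # Unique engine names in the order they were first used.
        return list(dict.fromkeys(matches))
    return matches[-1]


# Hypothetical history, oldest entry first.
demo_history = [
    {"engine": "xtandem_sledgehammer", "engine_type": "protein_database_search_engine"},
    {"engine": "unify_csv_1_0_0", "engine_type": "converter"},
]
demo_last = last_engine(demo_history, ["protein_database_search_engine"])
```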

get_last_engine_type(history=None)¶

The unode get_last_engine_type function

Keyword Arguments:
	history (list) – A list of path unodes, timestamps and parameters that were used. This function can be used on the history loaded from a file .json, to find out which search engine was used on that file. If not specified, this information is taken from the unode class itself, and not a specific file.

Examples

>>> fpaths = self.generate_basic_file_info( "14N_xtandem.csv" )
>>> file_info, __    = self.load_json( fpaths=fpaths, mode='input')
>>> last_engine_type = self.get_last_engine_type(
        history      = file_info["history"],
    )
>>> print( last_engine_type )
"protein_database_search_engine"

Returns:	The type of the last engine that was used. Returns None if the engine_type cannot be specified or if no engine was previously executed on this file.
Return type:	str

get_last_search_engine(history=None, multiple_engines=False)¶

The unode get_last_search_engine function

Note: returns None if no search engine was used yet.

Keyword Arguments:

history (list) – A list of path unodes, timestamps and parameters that were used. This function can be used on the history loaded from a file .json, to find out which search engine was used on that file. If not specified, this information is taken from the unode class itself, and not a specific file.
multiple_engines (bool) – if multiple engines have been used, this can be set to True. A list of used engines is then reported.

Examples

>>> fpaths = self.generate_basic_file_info( "14N_xtandem.csv" )
>>> file_info, __ = self.load_json( fpaths=fpaths, mode='input')
>>> last_engine   = self.get_last_search_engine(
        history   = file_info["history"]
    )
>>> print( last_engine )
"xtandem_sledgehammer"

Returns:	The name of the last search engine that was used. Returns None if no search engine was used yet.
Return type:	str

import_engine_as_python_function(function_name=None)¶

The unode import_engine_as_python_function function

Imports the main function from a unode’s “executable”. Intended for unodes that are written completely in Python and can be executed by importing them instead of using the command line.

Examples

>>> us = ursgal.UController()
>>> cFDR_unode = us.unodes["combine_FDR_0_1"]["class"]
>>> cFDR_main  = cFDR_unode.import_engine_as_python_function()
>>> cFDR_main(
    input_file_list = ["1.csv", "2.csv"],
    directory       = "/tmp/",
)

Returns:	The function called “main” that is specified in the engine’s Python script.
Return type:	function

Note

An assertion exception is raised if the executable is not a Python script or has no main function.

map_mods()¶

Maps modifications defined in params[“modification”] using unimod.

Examples

>>> [
...    "M,opt,any,Oxidation",        # Met oxidation
...    "C,fix,any,Carbamidomethyl",  # Carbamidomethylation
...    "*,opt,Prot-N-term,Acetyl"    # N-Acetylation
... ]
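Each modification string in the example consists of four comma-separated fields. A small sketch of splitting them is shown below; the dictionary keys ("aa", "type", "position", "name") are names chosen for illustration, and the unimod lookup itself is not reproduced.

```python
def parse_modification(mod_string):
    """Split 'aa,type,position,name' into a dict (field names are
    illustrative assumptions, not ursgal's internal keys)."""
    aa, mod_type, position, name = (field.strip() for field in mod_string.split(","))
    return {"aa": aa, "type": mod_type, "position": position, "name": name}


demo_mods = [
    parse_modification(mod)
    for mod in [
        "M,opt,any,Oxidation",
        "C,fix,any,Carbamidomethyl",
        "*,opt,Prot-N-term,Acetyl",
    ]
]
```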

peptide_regex(database, protein_id, peptide)¶

Note

This function is no longer used at the moment.

The unode peptide_regex function

Parameters:

database (str) – Name of the used fasta database
protein_id (str) – protein ID of the processed protein
peptide (str) – peptide which should be mapped on the protein ID’s sequence

This function takes a peptide sequence and maps it to the corresponding protein sequence, returning the start and stop position in the sequence as well as the amino acid before and after the peptide sequence in the full protein sequence. If the peptide sequence contains known amino acid substitutions like U (Selenocysteine) or J (Leucine or Isoleucine), this amino acid is replaced by a regex wildcard ‘.’ in order to be matchable on the fasta database (this is defined in kb.unify_csv_1_0_0.py). This is especially needed if the original sequence contains an ‘X’ and the search engine guesses/determines the amino acid at this position.

If the protein ID is ambiguous, the peptide is matched against all protein candidates, and the positions, the pre and post amino acids in the matching sequence, and the full protein ID as named in the fasta database are returned. This is especially needed for MS Amanda results, where protein IDs are returned truncated and become ambiguous for some databases.

If the peptide occurs several times in the protein, all occurrences are returned.

The function uses a buffer to perform the regex only once for (peptide, protein, database) tuples. All fasta sequences are also buffered in self.lookups[‘fasta_dbs’] with the name of the database as key and then all protein IDs and sequences as key, value pairs.

Pre and post amino acids are required for e.g. percolator input files.

Note

The regex and peptide to protein ID mapping may take a while, if a large file has to be processed.

Returns:	list of tuples [( peptide_start, peptide_stop, aa_before_peptide, aa_after_peptide, protein_id )]
Return type:	list
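The wildcard matching described above can be sketched with the re module. This is a simplified illustration for a single protein sequence, without the database buffering or the protein ID handling. The set of wildcard residues, the 1-based positions and the "-" placeholder at the protein termini are assumptions of this sketch.

```python
import re


def map_peptide(peptide, protein_sequence, wildcard_aas=("U", "J", "X")):
    """Return (start, stop, aa_before, aa_after) for every occurrence of
    the peptide; residues in wildcard_aas match any amino acid."""
    pattern = "".join(
        "." if aa in wildcard_aas else re.escape(aa) for aa in peptide
    )
    hits = []
    # The lookahead makes overlapping occurrences visible as well.
    for match in re.finditer("(?=({0}))".format(pattern), protein_sequence):
        begin, end = match.start(1), match.end(1)
        pre = protein_sequence[begin - 1] if begin > 0 else "-"
        post = protein_sequence[end] if end < len(protein_sequence) else "-"
        hits.append((begin + 1, end, pre, post))  # 1-based start, inclusive stop
    return hits


demo_hits = map_peptide("PEJ", "MKPELTRPEIK")
```

In this example J matches both L and I, so the peptide is found twice in the sequence.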

postflight()¶: This can be/is overwritten by the engine uNode class

preflight()¶: This can be/is overwritten by the engine uNode class

run(json_path=None)¶

The general run function.

Runs engine/uNode child with given params on defined input_file. This function is automatically called by all ucontroller functions that take an input file and produce a single output file (i.e. ucontroller.search() and ucontroller.validate() )

Keyword Arguments:

json_path (str) – path to input file json, dumped by a controller
input_file (#) – path to the input file
fpaths (#) – dictionary containing file path information. If None, this is generated using unode.generate_basic_file_info
force (#) – (re)do the analysis if output files already exist
Returns:	Report of the run.
Return type:	dict

Note

Internal function. This function executes the preflight, postflight and _execute functions, if defined in the engine python script.
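The interplay of these hooks resembles a template method: run() calls each hook if the engine class defines it. The sketch below is a toy stand-in, not the UNode class; the call order preflight, _execute, postflight and the shape of the returned report are assumptions.

```python
class SketchNode:
    """Toy stand-in for a unode; it only records which hooks were called."""

    def __init__(self):
        self.calls = []

    def preflight(self):
        self.calls.append("preflight")    # e.g. write engine input files

    def _execute(self):
        self.calls.append("_execute")     # e.g. call the engine executable

    def postflight(self):
        self.calls.append("postflight")   # e.g. convert the engine output

    def run(self):
        for step_name in ("preflight", "_execute", "postflight"):
            step = getattr(self, step_name, None)
            if callable(step):  # only run hooks that are actually defined
                step()
        return {"steps": list(self.calls)}


demo_report = SketchNode().run()
```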

time_point(tag=None, diff=True, format_time=False, stop=False)¶: Stores time_points in self.stats[‘time_points’] given a tag. Returns the time since the tag was inserted if the tag already exists.
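The basic behaviour (store a time stamp on the first call, report the elapsed time on later calls) can be sketched as follows. The class name Stopwatch is hypothetical, and the diff, format_time and stop options of the real method are not modelled.

```python
import time


class Stopwatch:
    """Minimal tag-based timer, loosely modelled on the description above."""

    def __init__(self):
        self.time_points = {}

    def time_point(self, tag):
        now = time.time()
        if tag in self.time_points:
            # Tag already known: report seconds since it was inserted.
            return now - self.time_points[tag]
        self.time_points[tag] = now
        return None


demo_watch = Stopwatch()
demo_first = demo_watch.time_point("search")
demo_elapsed = demo_watch.time_point("search")
```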

update_output_json()¶

Updates self.io[‘output’][‘params’] with self.io[‘input’][‘params’]

Although re-run might not be triggered, we need to update the output_json.

update_params_with_io_data()¶: Generates a flat structure in params combining io.[‘input’][‘finfo’] & io.[‘output’][‘finfo’]