UNode¶
-
class
ursgal.
UNode
(*args, **kwargs)¶ ursgal class
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
_execute
()¶ The _execute unode function
Executes the unode executable via shell.
- Note: internal function
- Unodes that do not require execution via shell redefine the _execute() function in their engine class.
Returns: None
-
_group_psms
(input_file, validation_score_field=None, bigger_scores_better=None)¶ Reads an input CSV and returns a defaultdict that maps each spectrum title to a sorted list of tuples, each containing a) the score (from validation_score_field) and b) the whole line dict
Keyword Arguments: - validation_score_field (str) – fieldname of the column that should be used as validation score for sorting of PSMs. If None, get_last_search_engine is used to get the validation_score_field defined for the last used search engine.
- bigger_scores_better (bool) – defines whether validation scores increase (True) or decrease (False) with PSM quality. If None, get_last_search_engine is used to get the bigger_scores_better value defined for the last used search engine.
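The grouping described above can be illustrated with a minimal sketch. It is not the ursgal implementation: the function name group_psms, the title_field argument and the column names "Spectrum Title" and "Score" are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict


def group_psms(csv_handle, score_field, bigger_scores_better=True,
               title_field="Spectrum Title"):
    """Group CSV rows by spectrum title, best score first (illustrative only)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_handle):
        # Each entry is a (score, whole line dict) tuple.
        grouped[row[title_field]].append((float(row[score_field]), row))
    for psms in grouped.values():
        # Sort on the score only, so ties never fall back to comparing dicts.
        psms.sort(key=lambda psm: psm[0], reverse=bigger_scores_better)
    return grouped


# Hypothetical input; the column names are assumptions for this sketch.
demo_csv = io.StringIO(
    "Spectrum Title,Sequence,Score\n"
    "spec_1,PEPTIDE,0.5\n"
    "spec_1,PEPTIDER,0.9\n"
    "spec_2,ELVISLIVES,0.1\n"
)
demo_groups = group_psms(demo_csv, score_field="Score", bigger_scores_better=True)
best_score, best_row = demo_groups["spec_1"][0]
```

With bigger_scores_better=False the same call would place the lowest score first, which suits engines that report e-values or similar.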
-
abs_paths_for_specific_keys
(params, param_keys=None)¶ Absolute paths for specific keys from the params dict are determined
Returns: params with paths in abspath version Return type: dict
-
calc_md5
(input_file)¶ Calculates the MD5 for input_file
Parameters: input_file (str) – Path to file Returns: MD5 of input file Return type: str Thanks Raymond :) http://stackoverflow.com/questions/7829499/using-hashlib-to-compute-md5-digest-of-a-file-in-python-3
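A common way to compute a file's MD5 with hashlib is to read the file in chunks, so that large files need not fit into memory. The sketch below shows that pattern; the function name md5_of_file and the chunk size are assumptions, and the actual calc_md5 may differ in detail.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


# Demonstration on a throwaway temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_md5 = md5_of_file(tmp.name)
os.remove(tmp.name)
```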
-
collect_and_translate_params
(params)¶ Translates ursgal parameters into uNode specific syntax.
- Each unode.USED_SEARCH_PARAMS contains the params that have to be passed to the uNode.
params values are not translated if they are [] or {}
params values are translated using:
uNode.USEARCH_PARAM_VALUE_TRANSLATIONS > translating only values, regardless of key
uNode.USEARCH_PARAM_KEY_VALUE_TRANSLATOR > translating only key:value pairs to key:newValue
Those lookups are found in kb/{engine}.py
- TAG:
- v0.4
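The two kinds of lookup described above can be sketched as follows. Everything in this example is hypothetical: the lookup contents, the function name translate_params, and in particular the choice to try the key-specific lookup before the value-only lookup. The real lookups are the ones found in kb/{engine}.py.

```python
# Hypothetical lookups, made up for this sketch.
VALUE_TRANSLATIONS = {"ppm": "PPM"}                          # value only, any key
KEY_VALUE_TRANSLATOR = {"enzyme": {"trypsin": "[KR]|{P}"}}   # key:value pairs


def translate_params(params, used_params):
    """Translate the values of the used params (illustrative only)."""
    translated = {}
    for key in used_params:
        value = params[key]
        if isinstance(value, (list, dict)):
            # Containers such as [] or {} are passed through untranslated.
            translated[key] = value
        elif value in KEY_VALUE_TRANSLATOR.get(key, {}):
            # Assumption: a key-specific translation wins over a value-only one.
            translated[key] = KEY_VALUE_TRANSLATOR[key][value]
        elif value in VALUE_TRANSLATIONS:
            translated[key] = VALUE_TRANSLATIONS[value]
        else:
            translated[key] = value
    return translated


demo_translated = translate_params(
    {"enzyme": "trypsin", "unit": "ppm", "mods": [], "not_used": 1},
    used_params=["enzyme", "unit", "mods"],
)
```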
-
compare_json_and_local_ursgal_version
(history, json_path)¶ Prints a warning if the history is from a different version number
-
determine_common_name
(input_files, mode=None)¶ The unode function determines for a list of input files a basic common name
Keyword Arguments: mode – head or tail for first or last part of the filename, respectively Parameters: input_files (list) – list with input file names Returns: common file name Return type: str
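One simple way to derive a common head or tail from several file names uses os.path.commonprefix. This is a sketch only; the function name common_name is an assumption, and the name that determine_common_name actually returns may be built differently.

```python
import os


def common_name(input_files, mode="head"):
    """Return the shared leading or trailing part of the file names."""
    names = [os.path.basename(path) for path in input_files]
    if mode == "tail":
        # A common suffix is the common prefix of the reversed names.
        reversed_names = [name[::-1] for name in names]
        return os.path.commonprefix(reversed_names)[::-1]
    return os.path.commonprefix(names)


demo_files = ["/data/run_A_xtandem.csv", "/data/run_B_xtandem.csv"]
demo_head = common_name(demo_files, mode="head")   # "run_"
demo_tail = common_name(demo_files, mode="tail")   # "_xtandem.csv"
```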
-
determine_common_top_level_folder
(input_files=None)¶ The unode function determines for a list of input files a common top level folder they all belong to
Keyword Arguments: input_files (list) – list with input files Returns: The common top level folder Return type: str
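The standard library offers os.path.commonpath for this kind of task. The sketch below assumes that approach; the function name common_top_level_folder is hypothetical and the ursgal implementation may work differently.

```python
import os


def common_top_level_folder(input_files):
    """Return the deepest folder that contains all input files."""
    folders = [os.path.dirname(os.path.abspath(path)) for path in input_files]
    # commonpath compares whole path components, not single characters.
    return os.path.commonpath(folders)


demo_folder = common_top_level_folder(
    ["/data/exp/a/1.csv", "/data/exp/b/2.csv"]
)
```

Unlike os.path.commonprefix, os.path.commonpath never returns a partial folder name such as /data/ex.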
-
dump_json_and_calc_md5
(stats=None, params=None, calc_md5=True)¶ Dumps json with params and stats and calcs md5 for output
Deletes all entries that are defined in params[‘del_from_params_before_json_dump’] or keys that start with ‘_’
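The filtering step described above can be sketched as a dict comprehension. The function name strip_params_for_dump and the example params are assumptions; whether the del_from_params_before_json_dump entry itself survives the real dump is not stated here, so this sketch simply keeps it.

```python
import json


def strip_params_for_dump(params):
    """Drop private keys and explicitly listed keys (illustrative only)."""
    to_delete = set(params.get("del_from_params_before_json_dump", []))
    return {
        key: value
        for key, value in params.items()
        if key not in to_delete and not key.startswith("_")
    }


# Hypothetical params dict for demonstration.
demo_params = {
    "database": "target_decoy.fasta",
    "_internal_cache": {"big": "object"},
    "grouped_psms": ["not", "serialised"],
    "del_from_params_before_json_dump": ["grouped_psms"],
}
demo_clean = strip_params_for_dump(demo_params)
demo_json = json.dumps(demo_clean, sort_keys=True)
```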
-
fix_md5_and_file_in_json
(ifile=None, json_path=None)¶ Fixes supplementary output json
-
flatten_list
(multi_list=[])¶ The unode flatten_list function
Reduces a multidimensional list of lists to a flat list including all elements
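Flattening a list of lists is commonly done recursively. The following is a minimal sketch of that idea with a hypothetical function name; it is not necessarily how flatten_list is implemented.

```python
def flatten(multi_list):
    """Return a flat list with all elements of arbitrarily nested lists."""
    flat = []
    for item in multi_list:
        if isinstance(item, list):
            # Recurse into sub-lists and append their elements in order.
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


demo_flat = flatten([1, [2, [3, 4]], [5]])
```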
-
get_last_engine
(history=None, engine_types=None, multiple_engines=False)¶ The unode get_last_engine function
Note: returns None if the specified engine type was not used yet.
Keyword Arguments: - history (list) – A list of path unodes, timestamps and parameters that were used. This function can be used on the history loaded from a file .json, to find out which search engine was used on that file. If not specified, this information is taken from the unode class itself, and not a specific file.
- engine_types (list) – the engine type(s) for which the last used engine should be identified
- multiple_engines (bool) – if multiple engines have been used, this can be set to True. Then reports a list of used engines.
Examples
>>> fpaths = self.generate_basic_file_info( "14N_xtandem.csv" )
>>> file_info, __ = self.load_json( fpaths=fpaths, mode='input')
>>> last_engine = self.get_last_engine(
...     history = file_info["history"],
...     engine_types = ["protein_database_search_engine"]
... )
>>> print( last_engine )
"xtandem_sledgehammer"
Returns: The name of the last engine that was used. Return type: str
-
get_last_engine_type
(history=None)¶ The unode get_last_engine_type function
Keyword Arguments: history (list) – A list of path unodes, timestamps and parameters that were used. This function can be used on the history loaded from a file .json, to find out which search engine was used on that file. If not specified, this information is taken from the unode class itself, and not a specific file.
Examples
>>> fpaths = self.generate_basic_file_info( "14N_xtandem.csv" )
>>> file_info, __ = self.load_json( fpaths=fpaths, mode='input')
>>> last_engine_type = self.get_last_engine_type(
...     history = file_info["history"],
... )
>>> print( last_engine_type )
"protein_database_search_engine"
Returns: The type of the last engine that was used. Returns None if the engine_type cannot be specified or if no engine was previously executed on this file. Return type: str
-
get_last_search_engine
(history=None, multiple_engines=False)¶ The unode get_last_search_engine function
Note: returns None if no search engine was used yet.
Keyword Arguments: - history (list) – A list of path unodes, timestamps and parameters that were used. This function can be used on the history loaded from a file .json, to find out which search engine was used on that file. If not specified, this information is taken from the unode class itself, and not a specific file.
- multiple_engines (bool) – if multiple engines have been used, this can be set to True. Then reports a list of used engines.
Examples
>>> fpaths = self.generate_basic_file_info( "14N_xtandem.csv" )
>>> file_info, __ = self.load_json( fpaths=fpaths, mode='input')
>>> last_engine = self.get_last_search_engine(
...     history = file_info["history"]
... )
>>> print( last_engine )
"xtandem_sledgehammer"
Returns: The name of the last search engine that was used. Returns None if no search engine was used yet. Return type: str
-
import_engine_as_python_function
(function_name=None)¶ The unode import_engine_as_python_function function
Imports the main function from a unodes “executable”. For unodes that are written completely in python and can be executed by importing them instead of using the command line.
Examples
>>> us = ursgal.UController()
>>> cFDR_unode = us.unodes["combine_FDR_0_1"]["class"]
>>> cFDR_main = cFDR_unode.import_engine_as_python_function()
>>> cFDR_main(
...     input_file_list = ["1.csv", "2.csv"],
...     directory = "/tmp/",
... )
Returns: The function called “main” that is specified in the engine's Python script. Return type: function
Note
Raises an assertion exception if the executable is not a Python script or has no main function.
-
map_mods
()¶ Maps modifications defined in params[“modification”] using unimod.
Examples
>>> [
...     "M,opt,any,Oxidation",        # Met oxidation
...     "C,fix,any,Carbamidomethyl",  # Carbamidomethylation
...     "*,opt,Prot-N-term,Acetyl"    # N-Acetylation
... ]
-
peptide_regex
(database, protein_id, peptide)¶ Note
This function is no longer used at the moment.
The unode peptide_regex function
Parameters: - database (str) – Name of the used fasta database
- protein_id (str) – protein ID of the processed protein
- peptide (str) – peptide which should be mapped on the protein ID’s sequence
This function takes a peptide sequence and maps it to the corresponding protein sequence, returning the start and stop position in the sequence as well as the amino acid before and after the peptide sequence in the full protein sequence. If the peptide sequence contains known amino acid substitutions like U (selenocysteine) or J (leucine or isoleucine), this amino acid is replaced by a regex wildcard ‘.’ so that it can be matched against the fasta database (this is defined in kb.unify_csv_1_0_0.py). This is especially needed if the original sequence contains an ‘X’ and the search engine guesses/determines the amino acid at this position.
If the protein ID is ambiguous, the peptide is matched against all protein candidates, and the positions, the pre- and post-amino acids in the matching sequence, as well as the full protein ID as named in the fasta database are returned. This is especially needed for MS Amanda results, where protein IDs are returned truncated and become ambiguous for some databases.
If the peptide occurs several times in the protein, all occurrences are returned.
The function uses a buffer to perform the regex only once for (peptide, protein, database) tuples. All fasta sequences are also buffered in self.lookups[‘fasta_dbs’] with the name of the database as key and then all protein IDs and sequences as key, value pairs.
Pre and post amino acids are required for e.g. percolator input files.
Note
The regex and peptide to protein ID mapping may take a while, if a large file has to be processed.
Returns: list of tuples [( peptide_start, peptide_stop, aa_before_peptide, aa_after_peptide, protein_id )] Return type: list
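The mapping described above (wildcard substitution, buffering, reporting every occurrence with its flanking amino acids) can be sketched as follows. This is an illustration under stated assumptions, not the ursgal code: the function name map_peptide, passing the protein sequence directly, the wildcard set "UJ", 1-based start positions and "-" as the flank at a protein terminus are all choices made for this example.

```python
import re

WILDCARD_AAS = "UJ"   # hypothetical subset of substitutable amino acids
_match_buffer = {}    # (peptide, protein_id, database) -> list of hits


def map_peptide(database, protein_id, protein_seq, peptide):
    """Map a peptide onto a protein sequence (illustrative only)."""
    key = (peptide, protein_id, database)
    if key not in _match_buffer:
        # Replace substitutable amino acids by the regex wildcard '.'.
        pattern = "".join(
            "." if aa in WILDCARD_AAS else re.escape(aa) for aa in peptide
        )
        hits = []
        # The lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer("(?=({0}))".format(pattern), protein_seq):
            start, stop = match.start(1), match.end(1)
            pre = protein_seq[start - 1] if start > 0 else "-"
            post = protein_seq[stop] if stop < len(protein_seq) else "-"
            # Assumption: positions are reported 1-based and inclusive.
            hits.append((start + 1, stop, pre, post, protein_id))
        _match_buffer[key] = hits
    return _match_buffer[key]


demo_hits = map_peptide(
    "demo.fasta", "prot_1", "MKPEPTIDERKPEPTLDEK", "PEPTJDE"
)
# demo_hits == [(3, 9, "K", "R", "prot_1"), (12, 18, "K", "K", "prot_1")]
```

A second call with the same (peptide, protein, database) combination returns the buffered list without running the regex again.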
-
postflight
()¶ This can be/is overwritten by the engine uNode class
-
preflight
()¶ This can be/is overwritten by the engine uNode class
-
run
(json_path=None)¶ The general run function.
Runs engine/uNode child with given params on defined input_file. This function is automatically called by all ucontroller functions that take an input file and produce a single output file (i.e. ucontroller.search() and ucontroller.validate() )
Keyword Arguments: - json_path (str) – path to input file json, dumped by a controller
- input_file – path to the input file
- fpaths – dictionary containing file path information. If None, this is generated using unode.generate_basic_file_info
- force – (re)do the analysis if output files already exist
Returns: Report of the run.
Return type: dict
Note
Internal function. This function executes the preflight, postflight and _execute functions, if defined in the engine python script.
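The idea of calling optional hooks from a general run function can be sketched with a small stand-in class. The class SketchNode, the hook order preflight, _execute, postflight and the shape of the returned report are assumptions made for this example; they are not taken from the ursgal source.

```python
class SketchNode:
    """Hypothetical stand-in for an engine class; not the ursgal UNode."""

    def __init__(self):
        self.calls = []

    def preflight(self):
        self.calls.append("preflight")

    def _execute(self):
        self.calls.append("_execute")

    # No postflight is defined here, to show that missing hooks are skipped.

    def run(self):
        # Assumed order: prepare, execute, then clean up.
        for hook_name in ("preflight", "_execute", "postflight"):
            hook = getattr(self, hook_name, None)
            if callable(hook):
                hook()
        return {"called": list(self.calls)}


demo_report = SketchNode().run()
```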
-
time_point
(tag=None, diff=True, format_time=False, stop=False)¶ Stores time_points in self.stats[‘time_points’] given a tag. Returns the time since the tag was inserted if the tag already exists.
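The tag-based timing behaviour can be sketched in a few lines. This simplified example ignores the diff, format_time and stop arguments; the class name TimePoints, the use of time.monotonic and the None return value on the first call are assumptions of this sketch.

```python
import time


class TimePoints:
    """Minimal tag-based stopwatch (illustrative only)."""

    def __init__(self):
        self.time_points = {}

    def time_point(self, tag):
        now = time.monotonic()
        if tag in self.time_points:
            # Tag already known: report the seconds elapsed since it was set.
            return now - self.time_points[tag]
        # First call for this tag: remember when it was inserted.
        self.time_points[tag] = now
        return None


demo_timer = TimePoints()
demo_first = demo_timer.time_point("search")
demo_elapsed = demo_timer.time_point("search")
```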
-
update_output_json
()¶ Updates self.io[‘output’][‘params’] with self.io[‘input’][‘params’]
Although re-run might not be triggered, we need to update the output_json.
-
update_params_with_io_data
()¶ Generates a flat structure in params combining self.io[‘input’][‘finfo’] and self.io[‘output’][‘finfo’]
-