UController
-
class ursgal.ucontroller.UController(*args, **kwargs)
ursgal main class
Keyword Arguments: - params (dict) – params that are used for all further analyses, overriding default values from ursgal/uparams.py
- profile (str) – Profile key for faster parameter selection. This idea is adapted from MS-GF+ and translated to all search engines. Currently available profiles are:
- ‘QExactive+’
- ‘LTQ XL high res’
- ‘LTQ XL low res’
Example:
>>> us = ursgal.UController(
...     profile = 'LTQ XL low res',
...     params = { 'database': 'BSA.fasta' }
... )
-
combine_search_results
(input_files, engine=None, force=None, output_file_name=None)¶ The ucontroller combine_search_results function combines search result .csv files that were generated by different search engines.
Keyword Arguments: - input_files (list) – A list containing the complete paths to two or more input files. Input files have to be unified result .csv files that were produced by different engines.
- engine (str) – The name of the desired search result combiner. Can also be a shortened version if it is unambiguous.
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Example:
>>> uc = ursgal.UController()
>>> unified_merged_results = [
...     'BSA_xtandem_piledriver_unified_merged.csv',
...     'BSA_msgfplus_unified_merged.csv',
...     'BSA_omssa_unified_merged.csv'
... ]
>>> uc.combine_search_results(
...     input_files = unified_merged_results,
...     engine = 'combine_FDR_0_1'
... )
Note
If you have multiple result files from the same engine, you can merge them with merge_csvs().
Returns: Path of the output file
Return type: str
-
convert
(input_file, engine=None, force=None, output_file_name=None, guess_engine=False)¶ The UController convert function converts the given input_file into another format as defined by the specified engine.
Keyword Arguments: - input_file (str) – The complete path to the input file.
- engine (str) – The name of the desired converter engine. Can also be a shortened version if it is unambiguous.
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
- guess_engine (bool) – The converter engine is guessed based on the input file. This works so far for mzml2mgf conversion and conversion of search_engine result files to csv.
Example:
>>> uc = ursgal.UController()
>>> unified_merged_results = 'BSA_msgfplus_unified_merged.csv'
>>> uc.convert(
...     input_file = unified_merged_results,
...     engine = 'csv2ssl_1_0_0'
... )
Returns: Path of the output file Return type: str
-
convert_results_to_csv
(input_file, force=None, output_file_name=None)¶ The ucontroller convert_results_to_csv function
Note: uses the Java mzidentml library (Reisinger et al., 2012)
Keyword Arguments: - input_file (str) – The complete path to the input file, which currently has to be an identification engine result file
- force (bool) – (Re)do the analysis, even if the output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Example:
>>> us = ursgal.UController( profile='LTQ XL high res' )
>>> us.convert_results_to_csv(
...     input_file = 'my_result.xml',
... )
Returns: Path of the output file
Return type: str
Notes: internal function, use convert() instead
-
convert_to_mgf_and_update_rt_lookup
(input_file, force=None, output_file_name=None)
Converts the mzML to mgf and updates the scan ID to retention time lookup. The lookup is needed for unifying the .csv files.
Parameters: input_file (str) – mzML input file name
Returns: name of the output mgf file
Return type: str
Notes: internal function, use convert() instead
-
determine_availability_of_unodes
()¶ The ucontroller determine_availability_of_unodes function
Note: internal function
Checks for engines in ursgal/resources/<platform>/<architecture> and expects the executable to be in the corresponding folder.
-
distinguish_multi_and_single_input
(in_input)¶ Finds out whether the input is a single file or a list of files and returns a bool indicating so, as well as the input file(s)
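The single-versus-list distinction can be sketched as follows. This is a minimal illustration, not the ursgal implementation; the function name `split_single_or_multi` and the order of the returned tuple are assumptions.

```python
def split_single_or_multi(in_input):
    """Return (is_multi, input_files) for a single path or a list of paths.

    Hypothetical helper: the name and the return order are assumptions
    made for illustration only.
    """
    if isinstance(in_input, (list, tuple)):
        # Several files were given: report multi-input and pass the list on.
        return True, list(in_input)
    # A single path was given: report single input and pass it on unchanged.
    return False, in_input


single = split_single_or_multi("BSA.mzML")
multi = split_single_or_multi(["BSA_1.csv", "BSA_2.csv"])
```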
-
download_resources
(resources=None)
Downloads all executables from the specified HTTP URL.
Keyword Arguments: resources (list) – list of specific resources that should be downloaded. If left to None, all possible resources are downloaded.
-
dump_multi_json
(fpath, fdicts)¶ For UNodes that take multiple input files. Generates a json for the multi-input helper file. This json allows ursgal to check whether input changed or not, to determine if a node has to be re-run or not.
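One way to detect changed inputs with such a JSON file is to store a fingerprint per input file and compare it on the next run. The sketch below assumes MD5 digests of the file contents as the fingerprint; the function names and the JSON layout are hypothetical and may differ from what ursgal actually writes.

```python
import hashlib
import json
import os


def fingerprint(paths):
    """Map each input path to the MD5 hex digest of its content (assumed scheme)."""
    digests = {}
    for path in paths:
        with open(path, "rb") as handle:
            digests[path] = hashlib.md5(handle.read()).hexdigest()
    return digests


def dump_fingerprint(json_path, paths):
    """Write the current fingerprint of all inputs to a JSON file."""
    with open(json_path, "w") as handle:
        json.dump(fingerprint(paths), handle, indent=2, sort_keys=True)


def inputs_changed(json_path, paths):
    """Return True if no fingerprint is stored yet or if it differs from the current one."""
    if not os.path.exists(json_path):
        return True
    with open(json_path) as handle:
        return json.load(handle) != fingerprint(paths)
```

A node would then be re-run only when `inputs_changed()` returns True, and `dump_fingerprint()` would be called after a successful run.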
-
engine_sanity_check
(short_engine)¶ The ucontroller engine_sanity_check function
Takes the input name and tries to guess the full engine name, e.g. including the version number. omssa as input will yield omssa_2_1_9 if there is only one omssa engine installed, i.e. the mapping <stored_full_engine_name>.startswith(<input>) has to be unique and defined.
Additionally, the sanity check validates whether the engine is available on the system.
Note: internal function, since an AssertionError is raised.
Parameters: short_engine (str) – engine short name or tag
Calls self.guess_engine_name().
Returns: Full name of the engine or None.
Return type: str
-
eval_if_run_needs_to_be_executed
(engine=None, force=None)¶ Returns the reason why self.run needs to be executed or None if there is no need
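The "reason or None" return convention can be illustrated with a small stand-alone function. The criteria checked here (force flag, missing output file, changed parameters) and all names are assumptions chosen for illustration; the actual checks in ursgal may differ.

```python
import os


def reason_to_run(output_path, force=False, params_changed=False):
    """Return a string explaining why a run is needed, or None if it is not.

    Hypothetical sketch: the three criteria below are assumptions.
    """
    if force:
        return "force is True"
    if not os.path.exists(output_path):
        return "output file does not exist yet"
    if params_changed:
        return "parameters changed since the last run"
    return None
```

Callers can treat the result as a truth value: any non-empty string triggers execution and can be printed as the explanation.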
-
execute_misc_engine
(input_file, engine=None, force=None, output_file_name=None, merge_duplicates=False)¶ The UController execute_misc_engine function
This function can be used to execute any misc engine by only giving the input_file and engine name.
Keyword Arguments: - input_file (str) – The complete path to the input, a unified (and possibly merged) search result .csv.
- engine (str) – the name of the misc engine which should be run, can also be a short version if this name is unambiguous
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
- merge_duplicates (bool) – If True, the produced output file will be checked for duplicated PSMs, which will be merged into a single line. Caution, the original output file will be overwritten!
Note
Input files to validate() must be in unified csv format (i.e. output files of search() or unify_csv()).
Example:
>>> my_databases = ['homo_sapiensA.fasta', 'homo_sapiensB.fasta']
>>> uc = ursgal.UController()
>>> new_target_decoy_db = uc.execute_misc_engine(
...     input_file = my_databases,
...     engine = 'generate_target_decoy_1_0_0',
...     output_file_name = 'my_homo_sapiens_target_decoy_db.fasta'
... )
Returns: Path of the output file Return type: str
-
execute_unode
(input_file, engine=None, force=False, output_file_name=None, dry_run=False, merge_duplicates=False)¶ The UController execute_unode function. Executes arbitrary UNodes, as specified by their name.
Keyword Arguments: - input_file (str or list of str) – The complete path to the input, or a list of paths to the input files.
- engine (str) – Engine name one wants to execute
- force (bool) – (Re)do the analysis, even if the output file already exists.
- dry_run (bool) – Do not execute; only return the output file name
Note
Can also execute UNodes that are tagged as ‘in development’ in kb (=not shown in UController overview) if their name is specified.
-
fetch_file
(engine=None)¶ The UController fetch_file function
Downloads files (FTP or HTTP).
Keyword Arguments: engine (str) – Available options are ‘get_http_files_1_0_0’ and ‘get_ftp_files_1_0_0’
Example:
>>> params = {
...     'ftp_url' : 'ftp.peptideatlas.org',
...     'ftp_login' : 'PASS00269',
...     'ftp_password' : 'FI4645a',
...     'ftp_include_ext' : [
...         'JB_FASP_pH8_2-3_28122012.mzML',
...     ],
...     'ftp_output_folder' : '/home/Desktop/',
... }
>>> uc = ursgal.UController(
...     params = params
... )
>>> uc.fetch_file(
...     engine = 'get_ftp_files_1_0_0'
... )
Returns: Path of the downloaded file Return type: str
-
filter_csv
(input_file, force=False, output_file_name=None)
[ WARNING ] This function is no longer supported! Please use execute_misc_engine() instead.
The UController filter_csv function
Filters .csv files row-wise according to user-defined rules.
Keyword Arguments: - input_file (str) – The complete path to the input file, which currently has to be a .csv file.
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
The filter rules have to be defined in the params. See the engine documentation for further information (filter_csv_1_0_0._execute()).
Example:
>>> # Only rows with these attributes will be retained:
>>> # a) 'PEP' column value must be lower than or equal to 0.01
>>> # b) 'Is decoy' column value must equal 'false'
>>> uc.params['csv_filter_rules'] = [
...     ['PEP', 'lte', 0.01 ],
...     ['Is decoy', 'equals', 'false']
... ]
>>> uc.filter_csv( 'my_results.csv' )
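To make the rule format concrete, the following stand-alone sketch applies `[column, operator, value]` rules row-wise with the standard `csv` module. It is not the filter_csv_1_0_0 engine: only the two operators from the example are implemented, rules are assumed to be combined with a logical AND, and the helper name `filter_rows` and the sample data are made up.

```python
import csv
import io

# Only the two operators used in the example; the real engine may support more.
RULE_FUNCTIONS = {
    "lte": lambda value, reference: float(value) <= float(reference),
    "equals": lambda value, reference: str(value) == str(reference),
}


def filter_rows(rows, rules):
    """Keep the rows that satisfy every [column, operator, value] rule (assumed AND)."""
    kept = []
    for row in rows:
        if all(RULE_FUNCTIONS[op](row[column], ref) for column, op, ref in rules):
            kept.append(row)
    return kept


# Made-up sample data in place of 'my_results.csv'.
csv_text = (
    "Sequence,PEP,Is decoy\n"
    "PEPTIDEA,0.001,false\n"
    "PEPTIDEB,0.2,false\n"
    "PEPTIDEC,0.005,true\n"
)
rows = list(csv.DictReader(io.StringIO(csv_text)))
rules = [["PEP", "lte", 0.01], ["Is decoy", "equals", "false"]]
kept = filter_rows(rows, rules)
```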
-
generate_multi_file_dicts
(input_files)
Generates a file_dict for each input file, for access in the UNode classes. In the UNode classes, the file_dicts can be found under self.params["input_file_dicts"]. Each file_dict contains the input/output file dicts for that file, as well as some "quick-access" entries (e.g. "last_engine").
-
generate_multi_helper_file
(input_files)
For UNodes that take multiple input files. Generates a temporary single-input helper file, which acts as the input file so that all routines (set_io, write history) work normally with multiple files.
-
generate_target_decoy
(input_files=None, engine=None, force=False, output_file_name=None)
[ WARNING ] This function is no longer supported! Please use execute_misc_engine() instead.
The ucontroller function for target_decoy database generation.
Keyword Arguments: - input_files (list) – List with complete paths to one or more fasta databases.
- engine (str) – name of the database generator which should be run, can also be a short version if this name is unambiguous
- force (bool) – (Re)do the analysis, even if the output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Example:
>>> my_databases = ['homo_sapiensA.fasta', 'homo_sapiensB.fasta']
>>> uc = ursgal.UController()
>>> new_target_decoy_db = uc.generate_target_decoy(
...     input_files = my_databases,
...     engine = 'generate_target_decoy_1_0_0',
...     output_file_name = 'my_homo_sapiens_target_decoy_db.fasta'
... )
The returned database can then be set as the new database for searches.
Example:
>>> uc.params['database'] = new_target_decoy_db
Returns: Name/path of the output file Return type: str
-
get_mzml_that_corresponds_to_mgf
(mgf_path)
Checks the history of an MGF file to determine which mzML it stems from. Returns the path to that mzML.
-
guess_engine_name
(short_engine)
The ucontroller function for guessing the right engine name from a short name. For example, ‘omssa’ is translated into omssa_2_1_9, which is the only available version of omssa in ursgal. If the short name is ambiguous or an engine has multiple versions, the engine has to be named unambiguously, e.g. myrimatch_2_1_138 instead of myrimatch.
Parameters: short_engine (str) – engine short name or tag
Iterates over self.unodes.keys() and checks that:
- the keys start with the short_engine
- the match is unique
Notes: internal function
Returns: Full name of engine, or None if short_engine has multiple hits
Return type: str
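The prefix rule described above can be sketched without ursgal as follows. The function takes the list of engine names explicitly instead of reading self.unodes, the engine list is illustrative, and returning None when nothing matches is an assumption (the text above only specifies None for multiple hits).

```python
def guess_engine_name(short_engine, available_engines):
    """Return the unique engine name starting with short_engine, else None.

    Stand-alone sketch: the real method iterates over self.unodes.keys().
    """
    matches = [
        name for name in available_engines if name.startswith(short_engine)
    ]
    if len(matches) == 1:
        return matches[0]
    # Zero hits (assumed) or multiple hits: the short name cannot be resolved.
    return None


# Illustrative list of installed engines.
engines = [
    "omssa_2_1_9",
    "myrimatch_2_1_138",
    "myrimatch_2_2_140",
    "xtandem_piledriver",
]
unique_hit = guess_engine_name("omssa", engines)
ambiguous_hit = guess_engine_name("myrimatch", engines)
```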
-
input_file_sanity_check
(input_file, engine=None, extensions=None, multi=False, custom_str=None)¶ The ucontroller input_file_sanity_check function
Asserts that input files exist, can be read, have the right file type and file extension etc. Raises an AssertionError if any criterion is violated.
Keyword Arguments: - input_file (str or list) – input file path to be checked, or a list of input file paths in the case of multi-nodes
- engine (str) – the name of the engine, file extension requirements will be looked up in engine/kb (optional)
- extensions (list) – a list of permitted file extensions (optional)
- multi (bool) – whether the UNode accepts multiple input files or not
Note
Internal Function
Returns: None
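A reduced version of these checks, covering existence, readability and file extension, might look like the sketch below. The helper name `check_input_file`, the case-insensitive extension comparison and the messages are assumptions; the lookup of extension requirements in engine/kb is left out.

```python
import os


def check_input_file(input_file, extensions=None):
    """Assert that each input file exists, is readable and has a permitted extension.

    Hypothetical sketch: accepts a single path or a list of paths.
    """
    paths = input_file if isinstance(input_file, list) else [input_file]
    for path in paths:
        assert os.path.isfile(path), "{0} does not exist".format(path)
        assert os.access(path, os.R_OK), "{0} is not readable".format(path)
        if extensions is not None:
            # Compare case-insensitively so that .MGF and .mgf both pass.
            permitted = tuple(ext.lower() for ext in extensions)
            assert path.lower().endswith(permitted), (
                "{0} must end with one of {1}".format(path, extensions)
            )
```

As in the documented function, nothing is returned on success and an AssertionError is raised as soon as one criterion is violated.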
-
map_peptides_to_fasta
(input_file, force=False, output_file_name=None)
[ WARNING ] This function is no longer supported! Please use execute_misc_engine() instead.
The ucontroller function to call the upeptide_mapper node.
Note
Different converter versions can be used (see parameter ‘peptide_mapper_converter_version’) as well as different classes inside the converter node (see parameter ‘peptide_mapper_class_version’ )
- Available converter nodes
- upeptide_mapper_1_0_0
- Available converter classes of upeptide_mapper_1_0_0
- UPeptideMapper_v3 (default)
- UPeptideMapper_v4 (no buffering and enhanced speed compared to v3)
- UPeptideMapper_v2
Keyword Arguments: - input_file (str) – The complete path to the input file, which currently has to be a .csv file.
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Returns: Path of the output file
Return type: str
-
merge_csvs
(input_files, force=None, output_file_name=None, merge_duplicates=False)¶ The ucontroller merge_csvs function
Merges unified .csv files generated by the same search engine into a single .csv file. This is needed if you want to validate search results from the same identification engine on multiple mzML files, for example if multiple fractions of the original sample were measured by LC-MS/MS and together represent one sample/analysis entity.
Keyword Arguments: - input_files (list) – A list containing the complete paths to two or more input files. Input files have to be .csv files.
- force (bool) – (Re)do the analysis, even if the output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Example:
>>> us = ursgal.UController()
>>> xtandem_results = [
...     'BSA_1_xtandem_sledgehammer_unified.csv',
...     'BSA_2_xtandem_sledgehammer_unified.csv',
...     'BSA_3_xtandem_sledgehammer_unified.csv'
... ]
>>> us.merge_csvs( input_files = xtandem_results )
Returns: Path of the output file Return type: str
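Conceptually, merging amounts to concatenating the rows of files that share the same header. The sketch below shows this on in-memory CSV text with the standard `csv` module; it assumes identical headers in all inputs, ignores the merge_duplicates option, and the function name `merge_csv_texts` is hypothetical.

```python
import csv
import io


def merge_csv_texts(csv_texts):
    """Concatenate the rows of several CSV texts that share one header."""
    fieldnames = None
    merged_rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if fieldnames is None:
            fieldnames = reader.fieldnames
        # Assumption: all inputs come from the same engine and share a header.
        assert reader.fieldnames == fieldnames, "CSV headers differ"
        merged_rows.extend(reader)
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged_rows)
    return output.getvalue()


# Made-up fraction results in place of the unified .csv files.
fraction_1 = "Sequence,PEP\nAAA,0.01\n"
fraction_2 = "Sequence,PEP\nBBB,0.02\n"
merged = merge_csv_texts([fraction_1, fraction_2])
```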
-
quantify
(input_file, engine, force=None, output_file_name=None, multi=False)¶ The ucontroller quantify function
Performs a peptide/protein quantification using the specified quantification engine and mzML/ident file. Produces a CSV file with peptide/protein quants in the unified Ursgal CSV format. See: List of available engines
Keyword Arguments: - input_file (str) – The complete path to the mzML file.
- engine (str) – The name of the quantification engine which should be used, can also be a short version if this name is unambiguous.
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Example:
>>> uc = ursgal.UController(
...     profile = 'LTQ XL high res',
...     params = {'evidence': 'BSA_idents.csv'}
... )
>>> uc.quantify(
...     input_file = 'BSA.mzML',
...     engine = 'pyQms_0_0_1'
... )
Returns: Path of the output file (unified CSV format) Return type: str
-
run_unode_if_required
(force, engine_name, answer, merge_duplicates=False, history_addon=None)¶ The ucontroller run_unode_if_required function
Note
internal function
Executes a UNode if required. Otherwise prints why the run was not required. If the UNode is executed, the corresponding json is dumped and the history is updated.
Keyword Arguments: - force (bool) – (Re)do the analysis, even if the output file already exists.
- engine_name (str) – name of the engine to be executed (after verifying with engine_sanity_check )
- answer (str or None) – The answer of prepare_unode_run(). Can be None if no re-run is required, or a string indicating the reason for re-run
-
sanitize_userdefined_output_filename
(user_fname, engine)¶ If the user defined a node output file name, we remove all path info from it (not supported) and throw a warning; possibly add a prefix; possibly add the correct file extension (if user didn’t already include it)
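The three steps (strip path, optional prefix, ensure extension) can be sketched like this. The signature differs from the method above: the sketch takes the extension and prefix directly instead of deriving them from the engine, and the name `sanitize_output_filename` and the underscore separator are assumptions.

```python
import os
import warnings


def sanitize_output_filename(user_fname, extension, prefix=None):
    """Return a bare file name with an optional prefix and a guaranteed extension.

    Hypothetical sketch: extension and prefix are passed in explicitly.
    """
    fname = os.path.basename(user_fname)
    if fname != user_fname:
        # Path information is not supported in output file names.
        warnings.warn("Path information in the output file name is ignored")
    if prefix and not fname.startswith(prefix):
        fname = "{0}_{1}".format(prefix, fname)
    if not fname.endswith(extension):
        fname += extension
    return fname
```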
-
search
(input_file, engine=None, force=None, output_file_name=None, multi=False)¶ The ucontroller search function
Performs a peptide search using the specified search engine and mzML file. Produces a CSV file with peptide spectrum matches in the unified Ursgal CSV format. see: List of available engines
Keyword Arguments: - input_file (str) – The complete path to the mzML file, or an MGF file that was converted from mzML.
- engine (str) – The name of the identification engine which should be used, can also be a short version if this name is unambiguous.
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Example:
>>> uc = ursgal.UController(
...     profile = 'LTQ XL high res',
...     params = {'database': 'BSA.fasta'}
... )
>>> uc.search(
...     input_file = 'BSA.mzML',
...     engine = 'omssa'
... )
Returns: Path of the output file (unified CSV format)
Return type: str
Note
Some search engines require a lot of RAM (up to 14GB, depending on your input files). If you don’t have a lot of RAM, some engines might crash. Consider using X!Tandem or OMSSA in these cases, since they are less demanding.
Note
This function calls five search-related ursgal functions in succession, all of which can also be called individually:
- convert() (mzml to mgf, if required, using the mzml2mgf engine)
- search_mgf()
- convert() (raw search results to csv, if required)
- execute_misc_engine() (peptide_mapper)
- execute_misc_engine() (unify_csv)
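The order of the five calls listed above can be written out as a plain function that accepts any controller-like object. This is a sketch of the call sequence only: the short engine names 'mzml2mgf', 'upeptide_mapper' and 'unify_csv' are assumed to resolve to installed engines, the "if required" checks are omitted, and `RecordingController` is a stand-in used here so that the example runs without ursgal.

```python
def search_step_by_step(uc, mzml_path, engine):
    """Call the five search-related steps in the documented order."""
    mgf_file = uc.convert(input_file=mzml_path, engine="mzml2mgf")
    raw_result = uc.search_mgf(input_file=mgf_file, engine=engine)
    result_csv = uc.convert(input_file=raw_result, guess_engine=True)
    mapped_csv = uc.execute_misc_engine(
        input_file=result_csv, engine="upeptide_mapper"
    )
    unified_csv = uc.execute_misc_engine(input_file=mapped_csv, engine="unify_csv")
    return unified_csv


class RecordingController:
    """Stand-in controller that records calls instead of running engines."""

    def __init__(self):
        self.calls = []

    def _record(self, name, input_file, suffix):
        self.calls.append(name)
        return input_file + suffix

    def convert(self, input_file, engine=None, guess_engine=False):
        return self._record("convert", input_file, ".converted")

    def search_mgf(self, input_file, engine=None):
        return self._record("search_mgf", input_file, ".searched")

    def execute_misc_engine(self, input_file, engine=None):
        return self._record("execute_misc_engine:" + engine, input_file, "." + engine)


recorder = RecordingController()
final_path = search_step_by_step(recorder, "BSA.mzML", "omssa")
```

With a real ursgal.UController in place of the stand-in, calling search() directly is the simpler option; the explicit sequence is mainly useful when one of the intermediate files is needed.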
-
search_mgf
(input_file, engine=None, force=None, output_file_name=None, multi=False)¶ The UController search_mgf function
Does the main peptide identification search with the specified identification engine. This function is called for every mzML file and every search engine that should be used. The function uses UNode.run() to execute a single search engine, for example to execute X!Tandem via the command line.
Keyword Arguments: - input_file (str) – The complete path to the input file, which has to be a .MGF file (.mzML files can be converted to .MGF with Ursgal)
- engine (str) – the name of the identification engine which should be run, can also be a short version if this name is unambiguous.
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Example:
>>> uc = ursgal.UController(
...     profile = 'LTQ XL high res',
...     params = {'database': 'BSA.fasta'}
... )
>>> uc.search_mgf(
...     input_file = 'BSA.mgf',
...     engine = 'xtandem_piledriver'
... )
Returns: Path of the output file Return type: str
-
set_file_info_dict
(in_file)¶ Splits ext and path and so on
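Splitting a path into directory, file name, root and extension can be done with `os.path`, as in the sketch below. The dictionary keys are hypothetical and need not match the keys ursgal stores.

```python
import os


def file_info(in_file):
    """Split a path into directory, file name, root and extension (assumed keys)."""
    directory, file_name = os.path.split(in_file)
    file_root, extension = os.path.splitext(file_name)
    return {
        "dir": directory,
        "file": file_name,
        "file_root": file_root,
        "extension": extension,
    }


info = file_info("/data/run1/BSA.mzML")
```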
-
set_profile
(profile, dev_mode=False)¶ The ucontroller set_profile function
Note
internal function
Parameters: profile (str) – Profile specified to use for all searches. Available profiles:
- ‘QExactive+’
- ‘LTQ XL high res’
- ‘LTQ XL low res’
Sets self.params according to profile name defined in ursgal.kb.profiles
Example:
>>> 'LTQ XL low res' : {
...     # MS 1 orbitrap & MSn iontrap
...     'frag_mass_tolerance' : 0.5,
...     'frag_mass_tolerance_unit' : 'da',
...     'instrument' : 'low_res_LTQ',
...     'frag_method' : 'cid'
... }
Custom profiles can easily be defined in profiles.py in ursgal/kb according to the needed parameters or machine specifications.
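How a profile translates into parameters can be sketched with a plain dictionary lookup. The `PROFILES` dictionary below only repeats the 'LTQ XL low res' entry shown above; the helper name `apply_profile` and the choice that profile values overwrite existing keys are assumptions, not confirmed ursgal behaviour.

```python
# Copy of the profile entry shown above; ursgal keeps its profiles in ursgal/kb.
PROFILES = {
    "LTQ XL low res": {
        "frag_mass_tolerance": 0.5,
        "frag_mass_tolerance_unit": "da",
        "instrument": "low_res_LTQ",
        "frag_method": "cid",
    },
}


def apply_profile(params, profile):
    """Return a copy of params with the values of the named profile applied."""
    assert profile in PROFILES, "Unknown profile: {0}".format(profile)
    merged = dict(params)
    # Assumption: profile values overwrite keys that are already present.
    merged.update(PROFILES[profile])
    return merged


params = apply_profile({"database": "BSA.fasta"}, "LTQ XL low res")
```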
-
show_unode_overview
()¶ The ucontroller show_unode_overview function
Note
internal function
Prints the overview of all available nodes. The overview includes the category, name and availability of each node. Available nodes are highlighted. This also verifies that engine availability and installation are detected correctly.
-
unify_csv
(input_file, force=False, output_file_name=None)
[ WARNING ] This function is no longer supported! Please use execute_misc_engine() instead.
The ucontroller unify_csv function
Unifies the .csv files which were converted by the mzidentml library. The corrections for each engine are listed in the node under ursgal/resources/arc_independent/unify_csv_1_0_0
Keyword Arguments: - input_file (str) – The complete path to the input file, which currently has to be a .csv file.
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Example:
>>> uc = ursgal.UController(
...     profile = 'LTQ XL low res',
...     params = {'database': 'BSA.fasta'}
... )
>>> xtandem_result_xml = uc.search_mgf(
...     input_file = 'BSA.mgf',
...     engine = 'xtandem',
... )
>>> xtandem_result_csv = uc.convert_results_to_csv(
...     input_file = xtandem_result_xml
... )
>>> unified_csv = uc.unify_csv(
...     input_file = xtandem_result_csv
... )
Returns: Path of the output file Return type: str
-
validate
(input_file, engine=None, force=None, output_file_name=None)¶ The UController validate function
Does statistical post-processing of unified search result .csv files with the specified validation engine.
Depending on the validation method a posterior error probability (PEP) and/or a q-value will be available in the final results.
Keyword Arguments: - input_file (str) – The complete path to the input, a unified (and possibly merged) search result .csv.
- engine (str) – the name of the validation engine which should be run, can also be a short version if this name is unambiguous
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Note
Input files to validate() must be in unified csv format (i.e. output files of search() or unify_csv()).
Example:
>>> uc = ursgal.UController(
...     profile = 'LTQ XL low res',
...     params = {'database': 'BSA.fasta'}
... )
>>> xtandem_result_csv = uc.search(
...     input_file = 'BSA.mzML',
...     engine = 'xtandem_piledriver'
... )
>>> validated_csv = uc.validate(
...     input_file = xtandem_result_csv,
...     engine = 'percolator_2_08'
... )
Returns: Path of the output file Return type: str
-
verify_engine_produced_an_output_file
(expected_fpath, engine_name)¶ Since not all engines raise an exception when they fail, we check if the output file was successfully produced or not to throw a proper exception in case the engine crashed.
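The check itself is small: test for the expected file and raise if it is missing. The sketch below uses RuntimeError and its own message text, both of which are assumptions, as is the function name `verify_output_file`.

```python
import os


def verify_output_file(expected_fpath, engine_name):
    """Raise if the engine did not produce the expected output file."""
    if not os.path.exists(expected_fpath):
        # The exception type and message are assumptions for illustration.
        raise RuntimeError(
            "{0} did not produce the expected output file {1}".format(
                engine_name, expected_fpath
            )
        )
    return expected_fpath
```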
-
visualize
(input_files, engine=None, force=None, output_file_name=None, multi=True)¶ The ucontroller function for visualization
Does graphical visualization of result .csv files.
Keyword Arguments: - input_files (list) – list with complete paths of .csv files
- engine (str) – the name of the visualizer which should be run, can also be a short version if this name is unambiguous
- force (bool) – (Re)do the analysis, even if output file already exists.
- output_file_name (str or None) – Desired output file name excluding path (optional). If None, output file name will be auto-generated.
Example:
>>> uc = ursgal.UController( profile='LTQ XL high res' )
>>> xtandem_result_csv = uc.search(
...     input_file = 'BSA.mzML',
...     engine = 'xtandem_piledriver',
... )
>>> omssa_result_csv = uc.search(
...     input_file = 'BSA.mzML',
...     engine = 'omssa',
... )
>>> uc.visualize(
...     input_files = [xtandem_result_csv, omssa_result_csv],
...     engine = 'venndiagram',
... )
Note
For detailed information about the VennDiagram UNode, see venndiagram_1_0_0._execute().
Returns: Path of the output file
Return type: str