Other Engines

Combine FDR 0_1

class ursgal.wrappers.combine_FDR_0_1.combine_FDR_0_1(*args, **kwargs)

combine FDR 0_1 UNode

An implementation of the “combined FDR Score” algorithm, as described in: Jones AR, Siepen JA, Hubbard SJ, Paton NW (2009): “Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines.”

Input should be multiple CSV files from different search engines. Each CSV requires a PEP column, which can be obtained, for instance, by post-processing with Percolator.

Returns a merged CSV file with all PSMs that were found and an added column “Combined FDR Score”.
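
A minimal workflow sketch, assuming the standard Ursgal UController is used and that its combine_search_results method dispatches to this node (the method call and the file names are assumptions made for illustration, not taken from this page):

import ursgal

uc = ursgal.UController()

# Hypothetical input: unified, Percolator-validated CSVs from three engines
validated_csvs = [
    'sample_omssa_percolator_validated.csv',
    'sample_xtandem_percolator_validated.csv',
    'sample_msgfplus_percolator_validated.csv',
]

# Assumed to return the path of the merged CSV containing the
# additional "Combined FDR Score" column.
combined_csv = uc.combine_search_results(
    input_files=validated_csvs,
    engine='combine_FDR_0_1',
)
print(combined_csv)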

_execute()

Executing the combine_FDR_0_1 main function with parameters that were defined in preflight (stored in self.command_dict)

The main function is imported and then executed using the parameters from command_dict.

Returns:None
preflight()

Building the list of parameters that will be passed to the combine_FDR_0_1 main function.

These parameters are stored in self.command_dict

Returns:None

Combine PEP 1_0_0

class ursgal.wrappers.combine_pep_1_0_0.combine_pep_1_0_0(*args, **kwargs)

combine_pep_1_0_0 UNode

Combining Multiengine Search Results with “Combined PEP”

“Combined PEP” is a hybrid approach combining elements of the “combined FDR” approach (Jones et al., 2009), elements of PeptideShaker, and elements of Bayes’ theorem. Similar to “combined FDR”, “combined PEP” groups the PSMs. For each search engine, the reported PSMs are treated as a set and the logical combinations of all sets are treated separately, as in the “combined FDR” approach. For instance, three search engines result in seven PSM groups, which can be visualized as the seven intersections of a three-set Venn diagram. Typically, a PSM group that is shared by multiple engines contains fewer decoy hits, thus represents a higher-quality subset, and its PSMs therefore receive a better score. This approach is based on the assumption that the search engines do not agree on decoys and false positives to the same extent as they agree on targets.

The combined PEP approach uses Bayes’ theorem to calculate a multiengine PEP (MEP) for each PSM based on the PEPs reported by, for example, Percolator for different search engines, that is

MEP = P(decoy | PEP_1, ..., PEP_n)
    = (PEP_1 * PEP_2 * ... * PEP_n) / ((PEP_1 * PEP_2 * ... * PEP_n) + ((1 - PEP_1) * (1 - PEP_2) * ... * (1 - PEP_n)))

This is done for each PSM group separately.
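
A minimal sketch of this Bayes combination in plain Python (independent of the actual engine code; the function name is illustrative):

from functools import reduce
from operator import mul

def bayes_pep(peps):
    """Combine the per-engine PEPs of one PSM into a multiengine PEP (MEP).

    peps: posterior error probabilities reported for the same PSM by the
    different search engines (e.g. taken from the Percolator output).
    """
    p_decoy  = reduce(mul, peps, 1.0)                     # product of PEPs
    p_target = reduce(mul, (1.0 - p for p in peps), 1.0)  # product of (1 - PEP)
    return p_decoy / (p_decoy + p_target)

# Three engines reporting PEPs for the same PSM
print(bayes_pep([0.01, 0.05, 0.02]))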

Then, the combined PEP (the final score) is computed, similarly to PeptideShaker, using a sliding window over all PSMs within each group (sorted by MEP). Each PSM receives a PEP based on the target/decoy ratio of the PSMs surrounding it.


Finally, all groups are merged and the results reported in one output, including all the search result scores from the individual search engines as well as the FDR based on the “combined PEP”.

The sliding window size can be defined by adjusting the Ursgal parameter “window_size” (default is 249).
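
A sketch of this sliding-window step, assuming the window is centered on each PSM after sorting by MEP and that the PEP is estimated from the decoy fraction within the window (the exact estimator used by the engine may differ):

def sliding_window_pep(psms, window_size=249):
    """Assign a window-based PEP to each PSM of one group.

    psms: list of (bayes_pep, is_decoy) tuples; sorted by Bayes PEP below.
    Returns the PEP estimates in the sorted order.
    """
    psms = sorted(psms, key=lambda psm: psm[0])
    half = window_size // 2
    peps = []
    for i in range(len(psms)):
        window = psms[max(0, i - half): i + half + 1]
        n_decoys = sum(1 for _, is_decoy in window if is_decoy)
        peps.append(n_decoys / len(window))  # decoy fraction as PEP estimate (assumption)
    return peps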

Input should be multiple CSV files from different search engines. Each CSV requires a PEP column, which can be obtained, for instance, by post-processing with Percolator.

Returns a merged CSV file with all PSMs that were found and two added columns:

  • column “Bayes PEP”:
    The multi-engine PEP, see explanation above
  • column “combined PEP”:
    The PEP as computed within the respective engine-combination PSM group

For optimal ranking, PSMs should be sorted by combined PEP. Ties can be resolved by sorting them by Bayes PEP.
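
For example, the merged output could be ranked with the standard csv module, using the column names given above (the file name is a placeholder):

import csv

with open('combined_pep_results.csv') as fin:
    rows = list(csv.DictReader(fin))

# Sort by combined PEP first and resolve ties by Bayes PEP
rows.sort(key=lambda row: (float(row['combined PEP']), float(row['Bayes PEP'])))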

_execute()

Executing the combine_pep_1_0_0 main function with parameters that were defined in preflight (stored in self.command_dict)

The main function is imported and then executed using the parameters from command_dict.

Returns:None
preflight()

Building the list of parameters that will be passed to the combine_pep_1_0_0 main function.

These parameters are stored in self.command_dict

Returns:None

Convert CSV to SSL 1_0_0

class ursgal.wrappers.csv2ssl_1_0_0.csv2ssl_1_0_0(*args, **kwargs)

csv2ssl_1_0_0 UNode

_execute()

Result files (.csv) are converted to spectrum sequence list (.ssl) files. These .ssl can be used as input files for BiblioSpec.

Input file has to be a .csv

Creates a .ssl file and returns its path.

ursgal.resources.platform_independent.arc_independent.csv2ssl_1_0_0.csv2ssl_1_0_0.main(input_file=None, output_file=None, score_column_name=None, score_type=None)

Convert csvs to ssl
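
A hedged call of the converter’s main function using the signature shown above (assuming the module path is importable; the score column name and score type values are placeholders):

from ursgal.resources.platform_independent.arc_independent.csv2ssl_1_0_0.csv2ssl_1_0_0 import main

main(
    input_file='sample_unified.csv',   # hypothetical unified result file
    output_file='sample_unified.ssl',
    score_column_name='PEP',           # assumed: column holding the PSM score
    score_type='PERCOLATOR QVALUE',    # assumed: score type label understood by BiblioSpec
)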

Convert MS-GF+ MZID to CSV v2016_09_16

class ursgal.wrappers.msgfplus2csv_v2016_09_16.msgfplus2csv_v2016_09_16(*args, **kwargs)

msgfplus2csv_v2016_09_16 UNode

Parameter options at https://omics.pnl.gov/software/ms-gf

Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.

postflight()

Convert .tsv result file to .csv

preflight()

mzid result files from MS-GF+ are converted to CSV using the MzIDToTsv converter from MS-GF+

Input file has to be a .mzid

Creates a .csv file and returns its path

Convert MZML to MGF 1_0_0

class ursgal.wrappers.mzml2mgf_1_0_0.mzml2mgf_1_0_0(*args, **kwargs)

mzml2mgf_1_0_0 UNode

Converts .mzML files into .mgf files

Convert X!Tandem XML to CSV 1_0_0

class ursgal.wrappers.xtandem2csv_1_0_0.xtandem2csv_1_0_0(*args, **kwargs)

xtandem2csv_1_0_0 UNode

ursgal.resources.platform_independent.arc_independent.xtandem2csv_1_0_0.xtandem2csv_1_0_0.main(input_file=None, decoy_tag=None, output_file=None)

Converts X!Tandem .xml files into .csv. We need to do this on our own, because mzidentml_lib reports wrong positions for modifications (and it is also not able to convert the piledriver.mzid into csv).

It should be noted that:

  • X!Tandem groups are not merged (since they are not the same as protein groups)
  • multiple domains (multiple occurrences of a peptide in the same protein) are not reported

Filter CSV 1_0_0

class ursgal.wrappers.filter_csv_1_0_0.filter_csv_1_0_0(*args, **kwargs)

filter_csv_1_0_0 UNode

_execute()

Result files (.csv) are filtered for defined filter parameters.

Input file has to be a .csv

Creates an _accepted.csv file and returns its path. If defined, rejected entries are also written to a _rejected.csv file.

Note

To write the rejected entries, set ‘write_unfiltered_results’ to True in the parameters.

Available rules:

  • lte
  • gte
  • lt
  • gt
  • contains
  • contains_not
  • equals
  • equals_not
  • regex

Example

>>> params = {
>>>     'csv_filter_rules':[
>>>         ['PEP', 'lte', 0.01],
>>>         ['Is decoy', 'equals', 'false']
>>>     ]
>>> }

The example above would filter for posterior error probabilities lower than or equal to 0.01 and filter out all decoy proteins.

Rules are defined as a list of lists, with the first element being the column name/CSV fieldname, the second element the rule, and the third element the value to compare against. Multiple rules can be applied, see example above. If the same fieldname should be filtered multiple times (e.g. Sequence should contain neither ‘T’ nor ‘Y’), the rules have to be defined separately.

Example

>>> params = {
>>>     'csv_filter_rules':[
>>>         ['Sequence','contains_not','T'],
>>>         ['Sequence','contains_not','Y']
>>>     ]
>>> }

lte:

‘lower than or equal’ (<=). The value has to be comparable, i.e. a float or int. Values are accepted if they are lower than or equal to the defined value. E.g. [‘PEP’,’lte’,0.01]

gte:

‘greater than or equal’ (>=). The value has to be comparable, i.e. a float or int. Values are accepted if they are greater than or equal to the defined value. E.g. [‘Exp m/z’,’gte’,180]

lt:

‘lower than’ (<). The value has to be comparable, i.e. a float or int. Values are accepted if they are lower than the defined value. E.g. [‘PEP’,’lt’,0.01]

gt:

‘greater than’ (>). The value has to be comparable, i.e. a float or int. Values are accepted if they are greater than the defined value. E.g. [‘PEP’,’gt’,0.01]

contains:

The given substring has to be present in the full string for the value to be accepted. E.g. [‘Modifications’,’contains’,’Oxidation’]

contains_not:

Values are accepted if the given substring is not present in the full string. E.g. [‘Sequence’,’contains_not’,’M’]

equals:

String comparison (==). Comparison has to be an exact match to pass. E.g. [‘Is decoy’,’equals’,’false’]. Floats and ints are not compared at the moment!

equals_not:

String comparison (!=). Values matching the defined value are rejected. E.g. [‘Is decoy’,’equals_not’,’true’]. Floats and ints are not compared at the moment!

regex:

Any regular expression matching is possible, e.g. a CT and CD motif search: [‘Sequence’,’regex’,’C[T|D]’]

Note

Some spreadsheet tools interpret False and True and show them as upper case when opening the files, even if they are actually written in lower case. This is especially important for target and decoy filtering, i.e. [‘Is decoy’,’equals’,’false’]. ‘false’ has to be lower case, even if the spreadsheet tool displays it as ‘FALSE’.

ursgal.resources.platform_independent.arc_independent.filter_csv_1_0_0.filter_csv_1_0_0.main(input_file=None, output_file=None, filter_rules=None, output_file_unfiltered=None)

Filters csvs
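
Following the signature above, the filter can also be called directly (assuming the module path is importable); the rules use the same list-of-lists format as ‘csv_filter_rules’:

from ursgal.resources.platform_independent.arc_independent.filter_csv_1_0_0.filter_csv_1_0_0 import main

main(
    input_file='results.csv',
    output_file='results_accepted.csv',
    filter_rules=[
        ['PEP', 'lte', 0.01],
        ['Is decoy', 'equals', 'false'],
    ],
    output_file_unfiltered='results_rejected.csv',  # optional; omit to skip the rejected output
)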

Generate Target Decoy 1_0_0

class ursgal.wrappers.generate_target_decoy_1_0_0.generate_target_decoy_1_0_0(*args, **kwargs)

Generate Target Decoy 1_0_0 UNode

_execute()

Creates a target decoy database based on shuffling of peptides or complete reversing of the protein sequences.

The engine currently available generates a very stringent target decoy database by peptide shuffling, but also offers the possibility to simply reverse the protein sequences. The mode can be defined in the params with ‘decoy_generation_mode’.

The peptide shuffling method is described below. As one of the first steps, redundant sequences are filtered and each affected protein receives a tag highlighting its multiple occurrence in the database. This ensures that no unequal distribution of target and decoy peptides is present. Further, every peptide is shuffled, while the amino acids at which the enzyme cleaves are maintained at their original positions. Every peptide is shuffled only once and the shuffling result is stored, so that a peptide occurring multiple times is always shuffled the same way. It is further ensured that unmutable peptides (e.g. ‘RR’ for trypsin) are not shuffled; they are reported by the engine in a text file, so that they can be excluded from further analysis. This way of generating a target decoy database fulfills the following quality criteria (Proteome Bioinformatics, Eds: S.J. Hubbard, A.R. Jones, Humana Press).

Quality criteria:

  • every target peptide sequence has exactly one decoy peptide sequence
  • equal amino acid distribution
  • equal protein and peptide length
  • equal number of proteins and peptides
  • similar mass distribution
  • no predicted peptides in common

Available modes:

  • shuffle_peptide - stringent target decoy generation by shuffling of
    peptides while maintaining the cleavage site amino acid.
  • reverse_protein - reverses the protein sequence

Available enzymes and their cleavage site can be found in the knowledge base of generate_target_decoy_1_0_0.

ursgal.resources.platform_independent.arc_independent.generate_target_decoy_1_0_0.generate_target_decoy_1_0_0.main(input_files=None, output_file=None, enzyme=None, decoy_tag='decoy_', mode='shuffle_peptide')
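
A call sketch for the main function with the documented defaults (assuming the module path is importable; the enzyme string is a placeholder that has to match an entry in the node’s knowledge base):

from ursgal.resources.platform_independent.arc_independent.generate_target_decoy_1_0_0.generate_target_decoy_1_0_0 import main

main(
    input_files=['human_proteome.fasta'],  # one or more target fasta files
    output_file='human_proteome_target_decoy.fasta',
    enzyme='trypsin',                      # assumed enzyme name, see knowledge base
    decoy_tag='decoy_',
    mode='shuffle_peptide',                # or 'reverse_protein'
)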

Kojak tailored Percolator 2_08

class ursgal.wrappers.kojak_percolator_2_08.kojak_percolator_2_08(*args, **kwargs)

Kojak adjusted Percolator 2_08 UNode

Kojak provides preformatted Percolator input, which is used directly as the input file for Percolator. In contrast to the original Percolator node, the input files are not reformatted or used to write a new input file.

Note

Percolator (2.08) has to be symlinked or copied to engine-folder ‘kojak_percolator_2_08’ in order to make this node work.

Reference: Käll L, Canterbury JD, Weston J, Noble WS, MacCoss MJ. (2007) Semi-supervised learning for peptide identification from shotgun proteomics datasets.

postflight()

Convert the percolator output .tsv into the .csv format with headers as in the unified csv format.

preflight()

Formatting the command line via self.params

Merge CSVS 1_0_0

class ursgal.wrappers.merge_csvs_1_0_0.merge_csvs_1_0_0(*args, **kwargs)

Merge CSVS 1_0_0 UNode

_execute()

Merges .csv files

For files with the same header, new rows are appended; for files with different headers, new columns are appended.

ursgal.resources.platform_independent.arc_independent.merge_csvs_1_0_0.merge_csvs_1_0_0.main(csv_files=None, output=None)

Merges ident csvs
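
Direct usage with the signature shown above (assuming the module path is importable; file names are placeholders):

from ursgal.resources.platform_independent.arc_independent.merge_csvs_1_0_0.merge_csvs_1_0_0 import main

main(
    csv_files=[
        'sample_omssa_results.csv',
        'sample_xtandem_results.csv',
    ],
    output='merged_results.csv',
)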

MzidLib 1_6_10

class ursgal.wrappers.mzidentml_lib_1_6_10.mzidentml_lib_1_6_10(*args, **kwargs)

MzidLib 1_6_10 UNode

‘Reisinger F, Krishna R, Ghali F, Ríos D, Hermjakob H, Vizcaíno JA, Jones AR. (2012) jmzIdentML API: A Java interface to the mzIdentML standard for peptide and protein identification data.’

Java program to convert results to .mzIdentML and .mzIdentML to .csv

preflight()

Convert .mzid result files from different search engines into .csv result files

X!Tandem result files first need to be converted into .mzid with raw2mzid

raw2mzid(search_engine=None, translations=None)

Convert raw result files into .mzid result files

MzidLib 1_6_11

class ursgal.wrappers.mzidentml_lib_1_6_11.mzidentml_lib_1_6_11(*args, **kwargs)

MzidLib 1_6_11 UNode

Import functions from mzidentml_lib_1_6_10


Percolator 2_08

class ursgal.wrappers.percolator_2_08.percolator_2_08(*args, **kwargs)

Percolator 2_08 UNode

q-value and posterior error probability calculation by a semi-supervised learning algorithm that dynamically learns to separate target from decoy peptide-spectrum matches (PSMs)

Reference: Käll L, Canterbury JD, Weston J, Noble WS, MacCoss MJ. (2007) Semi-supervised learning for peptide identification from shotgun proteomics datasets.

postflight()

Read the Percolator output and merge it back into the ident csv

preflight()

Formatting the command line via self.params

Plot pyGCluster heatmap from CSV 1_0_0

class ursgal.wrappers.plot_pygcluster_heatmap_from_csv_1_0_0.plot_pygcluster_heatmap_from_csv_1_0_0(*args, **kwargs)

plot_pygcluster_heatmap_from_csv_1_0_0 UNode

_execute()

qvality 2_02

class ursgal.wrappers.qvality_2_02.qvality_2_02(*args, **kwargs)

qvality_2_02 UNode

q-value and posterior error probability calculation from score distributions

Reference: Käll L, Storey JD, Noble WS (2009) QVALITY: non-parametric estimation of q-values and posterior error probabilities.

postflight()

Parse the qvality output and merge it back into the csv file

preflight()

Formatting the command line via self.params

Sanitize CSV 1_0_0

class ursgal.wrappers.sanitize_csv_1_0_0.sanitize_csv_1_0_0(*args, **kwargs)

sanitize_csv_1_0_0 UNode

_execute()

Result files (.csv) are sanitized according to the defined parameters. That means, for each spectrum the PSMs are compared and only the best-scoring PSM(s) is (are) kept.

Input file has to be a .csv

Creates a _sanitized.csv file and returns its path.

Note

If not specified, the validation_score_field and bigger_scores_better parameters are determined from the last engine. Therefore, if sanitize_csv_1_0_0 is applied to merged or processed result files, both parameters need to be specified.

Available parameters:

  • score_diff_threshold (float): minimum score difference between
    the best PSM and the first rejected PSM of one spectrum
  • threshold_is_log10 (bool): True, if log10 scale has been used for
    score_diff_threshold.
  • accept_conflicting_psms (bool): If True, multiple PSMs for one
    spectrum can be reported if their score difference is below the threshold. If False, all PSMs for one spectrum are removed if the score difference between the best and second-best PSM is not above the threshold, i.e. if there are conflicting PSMs with similar scores.
  • num_compared_psms (int): maximum number of PSMs (sorted by score,
    starting with the best scoring PSM) that are compared
  • remove_redundant_psms (bool): If True, redundant PSMs (e.g.
    the same identification reported by multiple engines) for the same spectrum are removed. An identification is defined by the combination of ‘Sequence’, ‘Modifications’ and ‘Charge’.
ursgal.resources.platform_independent.arc_independent.sanitize_csv_1_0_0.sanitize_csv_1_0_0.main(input_file=None, output_file=None, grouped_psms=None, validation_score_field=None, bigger_scores_better=None, score_diff_threshold=2.0, log10_threshold=True, accept_conflicting_psms=False, num_compared_psms=2, remove_redundant_psms=False)

Spectra with multiple PSMs are sanitized, i.e. only the PSM with the best PEP score is accepted, and only if the best hit has a PEP that is at least two orders of magnitude smaller than those of the other hits.
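
The corresponding Ursgal parameters could be set like this before running the node (parameter names as listed above; the values are examples only):

params = {
    'validation_score_field': 'PEP',   # required for merged/processed input files
    'bigger_scores_better': False,     # PEP: smaller is better
    'score_diff_threshold': 2.0,
    'threshold_is_log10': True,        # 2.0 corresponds to two orders of magnitude
    'accept_conflicting_psms': False,
    'num_compared_psms': 2,
    'remove_redundant_psms': False,
}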

Unify CSV v1_0_0

class ursgal.wrappers.unify_csv_1_0_0.unify_csv_1_0_0(*args, **kwargs)

unify_csv_1_0_0 UNode

_execute()

Result files from different search engines are unified to contain the same information in the same style

Input file has to be a .csv

Creates a _unified.csv file and returns its path

ursgal.resources.platform_independent.arc_independent.unify_csv_1_0_0.unify_csv_1_0_0.main(input_file=None, output_file=None, scan_rt_lookup=None, params=None, search_engine=None, score_colname=None)
Parameters:
  • input_file (str) – input filename of csv which should be unified
  • output_file (str) – output filename of csv after unifying
  • scan_rt_lookup (dict) – dictionary with entries of scanID to retention time under key ‘scan_2_rt’
  • force (bool) – force True or False
  • params (dict) – params as passed by ursgal
  • search_engine (str) – the search engine the csv file stems from
  • score_colname (str) – the column name of the search engine’s score (e.g. ‘OMSSA:pvalue’)
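
A sketch of the scan_rt_lookup structure as described for the parameter above (the exact nesting beyond the ‘scan_2_rt’ key is an assumption; scan IDs and retention times are placeholders):

scan_rt_lookup = {
    'scan_2_rt': {
        # scan ID -> retention time, collected during the mzML -> mgf conversion
        1337: 12.34,
        1338: 12.36,
    }
}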

List of fixes

All engines
  • Retention Time (s) is correctly set using _ursgal_lookup.pkl. During the mzML-to-mgf conversion, the retention time of every spectrum is stored in an internal lookup and used later for setting the RT.
  • All modifications are checked if they were given in params[‘modifications’], converted to the name that was given there and sorted according to their position.
  • Fixed modifications are added in ‘Modifications’, if not reported by the engine.
  • The monoisotopic m/z for each line is calculated (uCalc m/z), since not all engines report the monoisotopic m/z
  • Mass accuracy calculation (in ppm), also taking into account that the monoisotopic peak is not always picked
  • Rows describing the same PSM (i.e. when two proteins share the same peptide) are merged to one row.
X!Tandem
  • ‘RTINSECONDS=’ is stripped from Spectrum Title if present in .mgf or in search result.
Myrimatch
  • Spectrum Title is corrected
  • The 15N label is not formatted correctly; these modifications are removed for further analysis.
  • When using 15N modifications on amino acids together with Carbamidomethyl, MyriMatch sometimes reports Carboxymethylation on cysteine.
MS-GF+
  • The 15N label is not formatted correctly; these modifications are removed for further analysis.
  • ‘Is decoy’ column is properly set to true/false
  • Carbamidomethyl is updated and set if label is 15N
OMSSA
  • Carbamidomethyl is updated and set
MS-Amanda
  • Multiple protein IDs per peptide are split into separate entries (this is done in the MS-Amanda postflight).
MSFragger
  • 15N modifications have to be removed from Modifications, and the merged modifications have to be corrected.

Upeptide mapper v1_0_0

class ursgal.wrappers.upeptide_mapper_1_0_0.upeptide_mapper_1_0_0(*args, **kwargs)

upeptide_mapper_1_0_0 UNode

_execute()

Peptides from search engine csv file are mapped to the given database(s)

ursgal.resources.platform_independent.arc_independent.upeptide_mapper_1_0_0.upeptide_mapper_1_0_0.main(input_file=None, output_file=None, params=None)

Peptide mapping implementation as Unode.

Parameters:
  • input_file (str) – input filename of csv
  • output_file (str) – output filename
  • params (dict) – dictionary containing ursgal params
Results and fixes
  • All peptide sequences are remapped to their corresponding proteins, ensuring correct start, stop, pre and post amino acids.
  • It is determined whether the corresponding proteins are decoy proteins. These peptides are reported after the mapping process.
  • Non-mappable peptides are reported. This can, e.g., be due to ‘X’ in protein sequences of the fasta file or other non-standard amino acids, which are sometimes replaced/interpreted/interpolated by the search engine. A recheck is performed to test whether these peptides can be mapped with an ‘X’ at any position; such peptides are also reported. Peptides that still cannot be mapped after this re-mapping are reported as well.

Mapper class v4 (dev)

class ursgal.resources.platform_independent.arc_independent.upeptide_mapper_1_0_0.upeptide_mapper_1_0_0.UPeptideMapper_v4(fasta_database)

UPeptideMapper V4

Improved version of class version 3 (changes proposed by Christian)

Note

Uses the implementation of Aho-Corasick algorithm pyahocorasick. Please refer to https://pypi.python.org/pypi/pyahocorasick/ for more information.

cache_database(fasta_database)

Function to cache the given fasta database.

Parameters:fasta_database (str) – path to the fasta database

Note

If the same fasta_name is buffered again all info is purged from the class.

map_peptides(peptide_list)

Function to map a given peptide list in one batch.

Parameters:peptide_list (list) – list with peptides to be mapped
Returns:
Dictionary containing
peptides as keys and lists of protein mappings as values of the given fasta_name
Return type:peptide_2_protein_mappings (dict)

Note

Based on the number of peptides the returned mapping dictionary can become very large.

Warning

The peptide to protein mapping is reset if a new list of peptides is mapped to the same database (fasta_name).

Examples:

peptide_2_protein_mappings['PEPTIDE']  = [
    {
        'start' : 1,
        'end'   : 10,
        'pre'   : 'K',
        'post'  : 'D',
        'id'    : 'BSA'
    }
]
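
Usage could look like this (assuming the module path is importable and that the constructor caches the given database; the fasta path and peptides are placeholders):

from ursgal.resources.platform_independent.arc_independent.upeptide_mapper_1_0_0.upeptide_mapper_1_0_0 import UPeptideMapper_v4

mapper = UPeptideMapper_v4('BSA.fasta')
peptide_2_protein_mappings = mapper.map_peptides(['PEPTIDE', 'ELVISLIVES'])

for peptide, hits in peptide_2_protein_mappings.items():
    for hit in hits:
        print(peptide, hit['id'], hit['start'], hit['end'], hit['pre'], hit['post'])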

Mapper class v3 (dev)

class ursgal.resources.platform_independent.arc_independent.upeptide_mapper_1_0_0.upeptide_mapper_1_0_0.UPeptideMapper_v3(fasta_database)

UPeptideMapper V3

New improved version which is faster and consumes less memory than earlier versions. Is the new default version for peptide mapping.

Note

Uses the implementation of Aho-Corasick algorithm pyahocorasick. Please refer to https://pypi.python.org/pypi/pyahocorasick/ for more information.

Warning

The new implementation is still in beta/testing phase. Please use, check and interpret accordingly

cache_database(fasta_database, fasta_name)

Function to cache the given fasta database.

Parameters:
  • fasta_database (str) – path to the fasta database
  • fasta_name (str) – name of the database (e.g. os.path.basename(fasta_database))

Note

If the same fasta_name is buffered again all info is purged from the class.

map_peptides(peptide_list, fasta_name)

Function to map a given peptide list in one batch.

Parameters:
  • peptide_list (list) – list with peptides to be mapped
  • fasta_name (str) – name of the database (e.g. os.path.basename(fasta_database))
Returns:

Dictionary containing

peptides as keys and lists of protein mappings as values of the given fasta_name

Return type:

peptide_2_protein_mappings (dict)

Note

Based on the number of peptides the returned mapping dictionary can become very large.

Warning

The peptide to protein mapping is reset if a new list of peptides is mapped to the same database (fasta_name).

Examples:

peptide_2_protein_mappings['BSA1']['PEPTIDE']  = [
    {
        'start' : 1,
        'end'   : 10,
        'pre'   : 'K',
        'post'  : 'D',
        'id'    : 'BSA'
    }
]
purge_fasta_info(fasta_name)

Purges regular sequence lookup and fcache for a given fasta_name

Mapper class v2 (deprecated)

class ursgal.resources.platform_independent.arc_independent.upeptide_mapper_1_0_0.upeptide_mapper_1_0_0.UPeptideMapper_v2(word_len=None)

The UPeptideMapper class offers ultra-fast peptide-to-sequence mapping using a fast cache, hereafter referred to as fcache.

The fcache is built using the build_lookup_from_file or build_lookup functions and can be queried using the UPeptideMapper.map_peptide() function.

Note

This is the deprecated version of the peptide mapper, which can be used by setting the parameter ‘peptide_mapper_class_version’ to ‘UPeptideMapper_v2’. Otherwise, the newer mapper class version (‘UPeptideMapper_v3’) is used as default.

_create_fcache(id=None, seq=None, fasta_name=None)

Updates the fast cache with a given sequence

_format_hit_dict(seq, start, end, id)

Creates a formatted dictionary from a single mapping hit, at the same time evaluating the pre and post amino acids from the given sequence. The final output looks, for example, like this:

{
    'start' : 12,
    'end'   : 18,
    'id'    : 'Protein Id passed to the function',
    'pre'   : 'A',
    'post'  : 'V',
}

Note

If the pre or post amino acids are N- or C-terminal, respectively, then the reported amino acid will be ‘-‘

build_lookup(fasta_name=None, fasta_stream=None, force=True)

Builds the fast cache and regular sequence dict from a fasta stream

build_lookup_from_file(path_to_fasta_file, force=True)

Builds the fast cache and regular sequence dict from a fasta file.

Returns the internal fasta name, i.e. the path with directories stripped away.

map_peptide(peptide=None, fasta_name=None, force_regex=False)

Maps a peptide to a fasta database.

Returns a list of single hits which look for example like this:

{
    'start' : 12,
    'end'   : 18,
    'id'    : 'Protein Id passed to the function',
    'pre'   : 'A',
    'post'  : 'V',
}
map_peptides(peptide_list, fasta_name=None, force_regex=False)

Wrapper function to map a given peptide list in one batch.

Parameters:
  • peptide_list (list) – list with peptides to be mapped
  • fasta_name (str) – name of the database
purge_fasta_info(fasta_name)

Purges regular sequence lookup and fcache for a given fasta_name

Venn Diagram v1_0_0

class ursgal.wrappers.venndiagram_1_0_0.venndiagram_1_0_0(*args, **kwargs)

Venn Diagram uNode

_execute()

Plots a Venn diagram for a list of .csv result files (2-5)

Arguments are set in uparams.py but passed to the engine by self.params attribute

Returns:results for the different areas e.g. dict[‘C-(A|B|D)’][‘results’]

Output file is written to the common_top_level_dir

Return type:dict
ursgal.resources.platform_independent.arc_independent.venndiagram_1_0_0.venndiagram_1_0_0.main(*args, **kwargs)

Creates a simple SVG Venn diagram; requires 2, 3, 4 or 5 sets as arguments

Keyword Arguments:
 
  • output_file
  • header
  • label_A
  • label_B
  • label_C
  • label_D
  • label_E
  • color_A – e.g. #FF8C00
  • color_B
  • color_C
  • color_D
  • color_E
  • font

The function returns a dict with the following keys, where the results can be accessed by e.g. dict[‘C-(A|B|D)’][‘results’]

‘A&B-(C|D)’ ‘C&D-(A|B)’ ‘B&C-(A|D)’ ‘A&B&C&D’ ‘A&C-(B|D)’ ‘B&D-(A|C)’ ‘A&D-(B|C)’ ‘(A&C&D)-B’ ‘(A&B&D)-C’ ‘(A&B&C)-D’ ‘(B&C&D)-A’ ‘A-(B|C|D)’ ‘D-(A|B|C)’ ‘B-(A|C|D)’ ‘C-(A|B|D)’

or, for 2, 3 or 5 Venn diagrams, the appropriate combinations ...
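
A hedged invocation sketch, assuming the sets are passed positionally and labeled via the keyword arguments listed above (the three-set key format is inferred by analogy to the four-set keys):

from ursgal.resources.platform_independent.arc_independent.venndiagram_1_0_0.venndiagram_1_0_0 import main

set_a = {'PEPTIDEA', 'PEPTIDEB'}
set_b = {'PEPTIDEB', 'PEPTIDEC'}
set_c = {'PEPTIDEC', 'PEPTIDEA'}

results = main(
    set_a, set_b, set_c,
    label_A='OMSSA',
    label_B='X!Tandem',
    label_C='MS-GF+',
    output_file='venn_diagram.svg',
)

# Peptides found only by MS-GF+ (label C), assuming the three-set key format
print(results['C-(A|B)']['results'])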