Converter Engines¶
Convert CSV to SSL 1_0_0¶
-
class
ursgal.wrappers.csv2ssl_1_0_0.
csv2ssl_1_0_0
(*args, **kwargs)¶ csv2ssl_1_0_0 UNode
-
_execute
()¶ Result files (.csv) are converted to spectrum sequence list (.ssl) files. These .ssl can be used as input files for BiblioSpec.
Input file has to be a .csv
Creates a _converted.csv file and returns its path.
-
-
ursgal.resources.platform_independent.arc_independent.csv2ssl_1_0_0.csv2ssl_1_0_0.
main
(input_file=None, output_file=None, score_column_name=None, score_type=None)¶ Convert csvs to ssl
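The conversion can be pictured with a minimal sketch. This is not the ursgal implementation: the helper name rows_to_ssl, the input column names ("Spectrum ID", "Charge", "Sequence", "PEP") and the file name are assumptions for illustration. The tab-separated header follows the BiblioSpec .ssl column layout, and modifications are left out.

```python
import csv
import io

# Column layout of a BiblioSpec .ssl file (tab-separated)
SSL_HEADER = ["file", "scan", "charge", "sequence", "score-type", "score"]


def rows_to_ssl(rows, score_column_name, score_type, spectrum_file):
    """Render result rows (dicts) as tab-separated .ssl text.

    The input column names used here are assumptions, not ursgal's contract.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(SSL_HEADER)
    for row in rows:
        writer.writerow([
            spectrum_file,
            row["Spectrum ID"],
            row["Charge"],
            row["Sequence"],
            score_type,
            row[score_column_name],
        ])
    return out.getvalue()


# Hypothetical single-row input
rows = [{"Spectrum ID": "1337", "Charge": "2", "Sequence": "PEPTIDEK", "PEP": "0.001"}]
ssl_text = rows_to_ssl(rows, "PEP", "UNKNOWN", "sample.mzML")
```

Here score_column_name selects which result column is written to the score column, mirroring the role the keyword of the same name appears to play in main.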
Convert CSV to Counted Results¶
-
class
ursgal.wrappers.csv2counted_results_1_0_0.
csv2counted_results_1_0_0
(*args, **kwargs)¶ csv2counted_results_1_0_0 UNode
-
_execute
()¶ Results (.csv) are summarized as table (.csv) containing all identified proteins, peptides, or other specified identifiers. For each sample, the peptide or spectral count for each identifier is given.
Input file has to be a .csv
Creates a _counted.csv file and returns its path.
Columns containing the elements that should be counted (identifiers) are given as a list of headers using uc.params["identifier_column_names"]. Columns defining a unique countable element (e.g. "Sequence", "Spectrum ID") are given as a list of headers using uc.params["count_column_names"].
This can be used to create a SFINX (http://sfinx.ugent.be/) input file, using:
uc.params["convert_to_sfinx"] = True
uc.params["identifier_column_names"] = ["Protein ID"]
uc.params["count_column_names"] = ["Sequence"]
-
-
ursgal.resources.platform_independent.arc_independent.csv2counted_results_1_0_0.csv2counted_results_1_0_0.
main
(input_file=None, output_file=None, identifier_colum_names=None, count_column_names=None, count_by_file=True, convert2sfinx=False, keep_column_names=None)¶ Results (.csv) are summarized as table (.csv) containing all identified proteins, peptides, or other specified identifiers. For each sample, the peptide or spectral count for each identifier is given.
This can be used to convert .csv files to SFINX input files.
This is a .csv file containing unique peptide counts for all identified proteins. However, this can be modified using the keywords “identifier_colum_names” and “count_column_names”
Keyword Arguments: - input_file (str) – name including path for the input file
- output_file (str) – name including path for the output file
- identifier_colum_names (list) – list of column headers that define the identifier. Multiple column names are joined for combined identifiers.
- count_column_names (list) – list of column headers which are used for counting.
- count_by_file (bool) – the number of unique hits for each identifier is given in separate columns for each raw file (file name as defined in Spectrum Title)
- convert2sfinx (bool) – If True, the header of the identifier column is “rownames”. If False, the joined header name will be used.
- keep_column_names (list) – list of column headers which are not used as identifiers but kept in the output, e.g. when counting [‘Sequence’, ‘Modifications’] the column [‘Protein ID’] could be specified here. Multiple entries for one identifier (e.g. when identifier_column_names = [‘Protein ID’] and keep_column_names = [‘Sequence’]) are separated by ‘<#>’.
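The counting described above can be sketched as follows. This is a simplified illustration under assumptions, not the code in csv2counted_results_1_0_0: the helper name count_unique is hypothetical, identifiers are kept as tuples rather than joined strings, and per-file counting and kept columns are left out.

```python
from collections import defaultdict


def count_unique(rows, identifier_columns, count_columns):
    """Count distinct count_columns values for each identifier.

    rows is an iterable of dicts, e.g. as produced by csv.DictReader.
    """
    seen = defaultdict(set)
    for row in rows:
        identifier = tuple(row[name] for name in identifier_columns)
        countable = tuple(row[name] for name in count_columns)
        seen[identifier].add(countable)  # a set ignores repeated hits
    return {identifier: len(values) for identifier, values in seen.items()}


rows = [
    {"Protein ID": "P1", "Sequence": "PEPTIDEK"},
    {"Protein ID": "P1", "Sequence": "PEPTIDEK"},  # repeated hit, counted once
    {"Protein ID": "P1", "Sequence": "ELVISK"},
    {"Protein ID": "P2", "Sequence": "LIVESK"},
]
counts = count_unique(rows, ["Protein ID"], ["Sequence"])
```

With identifier column "Protein ID" and count column "Sequence", this yields the unique peptide count per protein, which is the default summary described for the SFINX input.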
Convert Mascot DAT to CSV¶
-
class
ursgal.wrappers.mascot_dat2csv_1_0_0.
mascot_dat2csv_1_0_0
(*args, **kwargs)¶ Dummy to merge Mascot data into the ursgal workflow
-
ursgal.resources.platform_independent.arc_independent.mascot_dat2csv_1_0_0.mascot_dat2csv_1_0_0.
main
(input_file, output_file)¶
Convert MS-GF+ MZID to CSV¶
-
class
ursgal.wrappers.msgfplus2csv_py_v1_0_0.
msgfplus2csv_py_v1_0_0
(*args, **kwargs)¶ msgfplus2csv_py v1.0.0 UNode
-
class
ursgal.wrappers.msgfplus2csv_v1_2_1.
msgfplus2csv_v1_2_1
(*args, **kwargs)¶ msgfplus2csv_v1.2.1 UNode. Parameter options at https://omics.pnl.gov/software/ms-gf
Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.
-
postflight
()¶ Converts the .tsv result file to .csv and translates the headers
-
preflight
()¶ mzid result files from MS-GF+ are converted to CSV using the MzIDToTsv converter from MS-GF+
Input file has to be a .mzid or .mzid.gz
Creates a .csv file and returns its path
Mzid to Tsv Converter Usage: MzidToTsvConverter -mzid:"mzid path" [-tsv:"tsv output path"] [-unroll|-u] [-showDecoy|-sd]
- Required parameters:
- '-mzid:path' - path to mzid[.gz] file; if path has spaces, it must be in quotes.
- Optional parameters:
- '-tsv:path' - path to tsv file to be written; if not specified, will be output to same location as mzid
- '-unroll|-u' signifies that results should be unrolled - one line per unique peptide/protein combination in each spectrum identification
- '-showDecoy|-sd' signifies that decoy results should be included in the result tsv
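As a rough sketch of how such a call could be assembled from Python, the helper below builds the argument list from the flags listed in the usage text. The function name build_mzid_to_tsv_command and the example path are assumptions; this is not the code the wrapper's preflight runs.

```python
def build_mzid_to_tsv_command(mzid_path, tsv_path=None, unroll=False,
                              show_decoy=False,
                              executable="MzidToTsvConverter"):
    """Build an argument list using the flags from the usage text."""
    command = [executable, "-mzid:" + mzid_path]
    if tsv_path is not None:
        command.append("-tsv:" + tsv_path)
    if unroll:
        command.append("-unroll")
    if show_decoy:
        command.append("-showDecoy")
    return command


# Hypothetical input path containing a space
command = build_mzid_to_tsv_command("results/sample 1.mzid", unroll=True)
```

Passing the result as a list to subprocess.run avoids shell quoting, so a path with spaces does not need extra quotes in this form.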
-
-
class
ursgal.wrappers.msgfplus2csv_v1_2_0.
msgfplus2csv_v1_2_0
(*args, **kwargs)¶ msgfplus_C_mzid2csv_v1.2.0 UNode. Parameter options at https://omics.pnl.gov/software/ms-gf
Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.
-
class
ursgal.wrappers.msgfplus2csv_v2017_07_04.
msgfplus2csv_v2017_07_04
(*args, **kwargs)¶ msgfplus_C_mzid2csv_v2017_07_04 UNode. Parameter options at https://omics.pnl.gov/software/ms-gf
Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.
-
postflight
()¶ Converts the .tsv result file to .csv and translates the headers
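A minimal sketch of such a .tsv to .csv step with header translation, using only the standard library. The helper name tsv_to_csv and the header names in the example mapping are assumptions for illustration, not the translations ursgal applies.

```python
import csv
import io


def tsv_to_csv(tsv_text, header_translations):
    """Convert tab-separated text to CSV text, renaming known headers."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    # Headers without a translation are passed through unchanged
    writer.writerow([header_translations.get(name, name) for name in header])
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical header names on both sides of the mapping
tsv_text = "ScanNum\tPeptide\n42\tPEPTIDEK\n"
csv_text = tsv_to_csv(tsv_text, {"ScanNum": "Spectrum ID", "Peptide": "Sequence"})
```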
-
preflight
()¶ mzid result files from MS-GF+ are converted to CSV using the MzIDToTsv converter from MS-GF+
Input file has to be a .mzid or .mzid.gz
Creates a .csv file and returns its path
Mzid to Tsv Converter Usage: MzidToTsvConverter -mzid:"mzid path" [-tsv:"tsv output path"] [-unroll|-u] [-showDecoy|-sd]
- Required parameters:
- '-mzid:path' - path to mzid[.gz] file; if path has spaces, it must be in quotes.
- Optional parameters:
- '-tsv:path' - path to tsv file to be written; if not specified, will be output to same location as mzid
- '-unroll|-u' signifies that results should be unrolled - one line per unique peptide/protein combination in each spectrum identification
- '-showDecoy|-sd' signifies that decoy results should be included in the result tsv
-
-
class
ursgal.wrappers.msgfplus2csv_v2017_01_27.
msgfplus2csv_v2017_01_27
(*args, **kwargs)¶ msgfplus2csv_v2017_01_27 UNode. Parameter options at https://omics.pnl.gov/software/ms-gf
Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.
-
class
ursgal.wrappers.msgfplus2csv_v2016_09_16.
msgfplus2csv_v2016_09_16
(*args, **kwargs)¶ msgfplus2csv_v2016_09_16 UNode. Parameter options at https://omics.pnl.gov/software/ms-gf
Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.
-
postflight
()¶ Convert .tsv result file to .csv
-
preflight
()¶ mzid result files from MS-GF+ are converted to CSV using the MzIDToTsv converter from MS-GF+
Input file has to be a .mzid
Creates a .csv file and returns its path
-
Convert MZML to MGF¶
The mzML to mgf converter version 2.0.0 requires pymzML 2.0, while the previous version can be used with older pymzML versions.
-
class
ursgal.wrappers.mzml2mgf_2_0_0.
mzml2mgf_2_0_0
(*args, **kwargs)¶ mzml2mgf_2_0_0 UNode
Version two works only with pymzML version 2.0.0 or higher!
Converts .mzML files into .mgf files
-
ursgal.resources.platform_independent.arc_independent.mzml2mgf_2_0_0.mzml2mgf_2_0_0.
main
(mzml=None, mgf=None, i_decimals=5, mz_decimals=5, machine_offset_in_ppm=None, scan_exclusion_list=None, scan_inclusion_list=None, prefix=None, scan_skip_modulo_step=None, ms_level=2, precursor_min_charge=1, precursor_max_charge=5, ion_mode='+', spec_id_attribute=None, signal_to_noise_threshold=None)¶
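To illustrate what mz_decimals, i_decimals and ion_mode plausibly control, here is a minimal sketch that formats one spectrum as an MGF entry. It is not the converter's code: the helper name format_mgf_entry and the title in the example are assumptions, and only the basic MGF keys are written.

```python
def format_mgf_entry(title, precursor_mz, charge, peaks,
                     mz_decimals=5, i_decimals=5, ion_mode="+"):
    """Format one spectrum as an MGF block with fixed decimal places."""
    lines = [
        "BEGIN IONS",
        "TITLE={0}".format(title),
        "PEPMASS={0:.{1}f}".format(precursor_mz, mz_decimals),
        "CHARGE={0}{1}".format(charge, ion_mode),
    ]
    for mz, intensity in peaks:
        # One peak per line: m/z and intensity separated by a space
        lines.append(
            "{0:.{2}f} {1:.{3}f}".format(mz, intensity, mz_decimals, i_decimals)
        )
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


# Hypothetical spectrum with a single peak
entry = format_mgf_entry(
    "sample.1.1.2", 445.12003, 2, [(100.123456, 2500.5)],
    mz_decimals=3, i_decimals=1,
)
```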
-
class
ursgal.wrappers.mzml2mgf_1_0_0.
mzml2mgf_1_0_0
(*args, **kwargs)¶ mzml2mgf_1_0_0 UNode
Converts .mzML files into .mgf files
Convert X!Tandem XML to CSV 1_0_0¶
-
class
ursgal.wrappers.xtandem2csv_1_0_0.
xtandem2csv_1_0_0
(*args, **kwargs)¶ xtandem2csv_1_0_0 UNode
-
ursgal.resources.platform_independent.arc_independent.xtandem2csv_1_0_0.xtandem2csv_1_0_0.
main
(input_file=None, decoy_tag=None, output_file=None)¶ Converts xTandem.xml files into .csv. We need to do this on our own, because mzidentml_lib reports wrong positions for modifications (and it is also not able to convert the piledriver.mzid into csv).
It should be noted that:
- xtandem groups are not merged (since they are not the same as protein groups)
- multiple domains (multiple occurrences of a peptide in the same protein) are not reported
MzidLib¶
-
class
ursgal.wrappers.mzidentml_lib_1_7.
mzidentml_lib_1_7
(*args, **kwargs)¶ MzidLib 1_7 UNode
Import functions from mzidentml_lib_1_6_10
Note
Please download and install manually from http://www.proteoannotator.org/?q=installation
-
class
ursgal.wrappers.mzidentml_lib_1_6_11.
mzidentml_lib_1_6_11
(*args, **kwargs)¶ MzidLib 1_6_11 UNode
Import functions from mzidentml_lib_1_6_10
-
class
ursgal.wrappers.mzidentml_lib_1_6_10.
mzidentml_lib_1_6_10
(*args, **kwargs)¶ MzidLib 1_6_10 UNode
‘Reisinger F, Krishna R, Ghali F, Ríos D, Hermjakob H, Vizcaíno JA, Jones AR. (2012) jmzIdentML API: A Java interface to the mzIdentML standard for peptide and protein identification data.’
Java program to convert results to .mzIdentML and .mzIdentML to .csv
-
preflight
()¶ Convert .mzid result files from different search engines into .csv result files
X!Tandem result files first need to be converted into .mzid with raw2mzid
-
raw2mzid
(search_engine=None, translations=None)¶ Convert raw result files into .mzid result files
-
pParse 2.0¶
-
class
ursgal.wrappers.pparse_2_0.
pparse_2_0
(*args, **kwargs)¶ Unode for pParse included in pGlyco 2.2.0. For further information visit http://pfind.ict.ac.cn/software/pParse/#Downloads
Note
Please download pParse manually as part of pGlyco 2.2.0 https://github.com/pFindStudio/pGlyco2
Reference: Yuan ZF, Liu C, Wang HP, Sun RX, Fu Y, Zhang JF, Wang LH, Chi H, Li Y, Xiu LY, Wang WP, He SM (2012) pParse: a method for accurate determination of monoisotopic peaks in high-resolution mass spectra. Proteomics 12(2)
-
postflight
()¶ Renames the output file, since naming the output file does not work properly in pParse
-
preflight
()¶ Formatting the command line via self.params
Returns: self.params
Return type: dict
Command line options:
-D datapath default (D:data)
-L logfilepath default (the same with datapath)
-O outputpath default (the same with datapath)
-W isolation_width default (2)
-F input_format default (raw) optional (wiff)
-C co-elute default (1)
-S cut_similiar_mono default (1)
-I ipv_file default (.IPV.txt)
-M mars_model default (4)
-T trainingset default (.TrainingSet.txt)
-m output_mgf default (1)
-p output_pf default (1)
-d delete_msn default (0)
-a check_activationcenter default (1)
-g debug_mode default (0)
-r rewrite_files default (0)
-t mars_threshold default (-0.34)
-u export_unchecked_mono default (0)
-y output_all_mars_y default (0)
-Y output_mars_y default (0)
-s output_trainingdata default (0)
-R recalibrate_window default (7)
-v outputsvmlight default (0)
-z m/z default (5)
-i Intensity default (1)
-
ThermoRawFileParser¶
-
class
ursgal.wrappers.thermo_raw_file_parser_1_1_2.
thermo_raw_file_parser_1_1_2
(*args, **kwargs)¶ Unode for ThermoRawFileParser. For further information visit https://github.com/compomics/ThermoRawFileParser
Note
Please download ThermoRawFileParser manually from https://github.com/compomics/ThermoRawFileParser
Reference: Hulstaert N, Sachsenberg T, Walzer M, Barsnes H, Martens L and Perez-Riverol Y (2019) ThermoRawFileParser: modular, scalable and cross-platform RAW file conversion. bioRxiv https://doi.org/10.1101/622852
-
preflight
()¶ Formatting the command line via self.params
Returns: self.params
Return type: dict
ThermoRawFileParser.exe usage is (use -option=value for the optional arguments):
-h, --help Prints out the options.
-i, --input=VALUE The raw file input.
-o, --output=VALUE The output directory.
-f, --format=VALUE The output format for the spectra (0 for MGF, 1 for mzML, 2 for indexed mzML, 3 for Parquet, 4 for MGF with profile data excluded)
-m, --metadata=VALUE The metadata output format (0 for JSON, 1 for TXT).
-g, --gzip GZip the output file if this flag is specified (without value).
-u, --s3_url[=VALUE] Optional property to write directly the data into S3 Storage.
-k, --s3_accesskeyid[=VALUE] Optional key for the S3 bucket to write the file output.
-t, --s3_secretaccesskey[=VALUE] Optional key for the S3 bucket to write the file output.
-n, --s3_bucketName[=VALUE] S3 bucket name
-v, --verbose Enable verbose logging.
-e, --ignoreInstrumentErrors Ignore missing properties by the instrument.
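The options above could be turned into a command line roughly as sketched here. The helper name build_thermo_command, the keys of OUTPUT_FORMATS and the example file names are assumptions; the flags and format codes are taken from the usage text, and this is not the wrapper's actual preflight code.

```python
# Format codes as listed in the usage text; the dictionary keys are made up
OUTPUT_FORMATS = {
    "mgf": 0,
    "mzml": 1,
    "indexed_mzml": 2,
    "parquet": 3,
    "mgf_no_profile": 4,
}


def build_thermo_command(raw_file, output_dir, output_format="mgf",
                         gzip=False, verbose=False,
                         executable="ThermoRawFileParser.exe"):
    """Build an argument list in the -option=value style of the usage text."""
    command = [
        executable,
        "-i=" + raw_file,
        "-o=" + output_dir,
        "-f={0}".format(OUTPUT_FORMATS[output_format]),
    ]
    if gzip:
        command.append("-g")  # flag without a value
    if verbose:
        command.append("-v")
    return command


# Hypothetical file and directory names
command = build_thermo_command("run01.raw", "converted",
                               output_format="mzml", gzip=True)
```

On platforms other than Windows the executable typically has to be started through a .NET runtime such as mono, which would prepend one more element to the list; that part is left out of the sketch.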
-