Converter Engines

Convert CSV to SSL 1_0_0

class ursgal.wrappers.csv2ssl_1_0_0.csv2ssl_1_0_0(*args, **kwargs)

csv2ssl_1_0_0 UNode

_execute()

Result files (.csv) are converted to spectrum sequence list (.ssl) files. These .ssl files can be used as input for BiblioSpec.

Input file has to be a .csv

Creates a _converted.csv file and returns its path.

ursgal.resources.platform_independent.arc_independent.csv2ssl_1_0_0.csv2ssl_1_0_0.main(input_file=None, output_file=None, score_column_name=None, score_type=None)

Converts .csv files to .ssl files.
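The sketch below illustrates the kind of row mapping such a conversion performs. It is a simplified approximation, not the csv2ssl_1_0_0 implementation. It assumes several things: ursgal-style CSV column names ("Spectrum Title", "Sequence", "Charge"), a spectrum title layout of "<file>.<scan>.<scan>.<charge>", and an SSL header of file, scan, charge, sequence, score-type and score. Modifications are not handled, and the "file" column holds the title's base name, which may need adjusting to the actual spectrum file name.

```python
import csv
import io

# Assumed SSL columns; check the BiblioSpec documentation for the full format.
SSL_HEADER = ["file", "scan", "charge", "sequence", "score-type", "score"]


def csv_to_ssl(csv_text, score_column_name, score_type):
    """Return tab-separated SSL text built from ursgal-style CSV text (sketch)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(SSL_HEADER)
    for row in reader:
        # Assumed title layout "<file>.<scan>.<scan>.<charge>"; rsplit keeps
        # dots inside the file name intact.
        file_name, scan, _, _ = row["Spectrum Title"].rsplit(".", 3)
        writer.writerow([
            file_name,
            scan,
            row["Charge"],
            row["Sequence"],
            score_type,
            row[score_column_name],
        ])
    return out.getvalue()
```

The score_column_name and score_type arguments mirror the keyword names of main() above, but how the real converter uses them is not reproduced here.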

Convert CSV to Counted Results

class ursgal.wrappers.csv2counted_results_1_0_0.csv2counted_results_1_0_0(*args, **kwargs)

csv2counted_results_1_0_0 UNode

_execute()

Results (.csv) are summarized as a table (.csv) containing all identified proteins, peptides, or other specified identifiers. For each sample, the peptide or spectral count for each identifier is given.

Input file has to be a .csv

Creates a _counted.csv file and returns its path.

Columns containing the elements that should be counted (identifiers) are given as a list of headers using uc.params["identifier_column_names"]. Columns defining a unique countable element (e.g. "Sequence", "Spectrum ID") are given as a list of headers using uc.params["count_column_names"].

This can be used to create an SFINX (http://sfinx.ugent.be/) input file, using:

uc.params["convert_to_sfinx"] = True
uc.params["identifier_column_names"] = ["Protein ID"]
uc.params["count_column_names"] = ["Sequence"]

ursgal.resources.platform_independent.arc_independent.csv2counted_results_1_0_0.csv2counted_results_1_0_0.main(input_file=None, output_file=None, identifier_colum_names=None, count_column_names=None, count_by_file=True, convert2sfinx=False, keep_column_names=None)

Results (.csv) are summarized as a table (.csv) containing all identified proteins, peptides, or other specified identifiers. For each sample, the peptide or spectral count for each identifier is given.

This can be used to convert .csv files to SFINX input files.

Such a file is a .csv file containing unique peptide counts for all identified proteins. The counted elements can be changed using the keyword arguments "identifier_colum_names" and "count_column_names".

Keyword Arguments:
 
  • input_file (str) – name including path for the input file
  • output_file (str) – name including path for the output file
  • identifier_colum_names (list) – list of column headers that define the identifier. Multiple column names are joined for combined identifiers.
  • count_column_names (list) – list of column headers which are used for counting.
  • count_by_file (bool) – if True, the number of unique hits for each identifier is given in separate columns for each raw file (file name as defined in the Spectrum Title)
  • convert2sfinx (bool) – If True, the header of the identifier column is "rownames". If False, the joined header name will be used.
  • keep_column_names (list) – list of column headers which are not used as identifiers but are kept in the output, e.g. when counting ['Sequence', 'Modifications'], the column ['Protein ID'] could be specified here. Multiple entries for one identifier (e.g. when identifier_colum_names = ['Protein ID'] and keep_column_names = ['Sequence']) are separated by '<#>'.
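The counting described above can be approximated as follows: join the identifier columns into one key, collect the distinct combinations of the count columns, and optionally split by raw file. This is a hedged sketch, not the csv2counted_results_1_0_0 code. The "<|>" joiner and the "Spectrum Title" layout "<file>.<scan>.<scan>.<charge>" are hypothetical assumptions, and the SFINX and keep_column_names options are left out.

```python
from collections import defaultdict

# Hypothetical joiner for combined identifiers; the real separator may differ.
JOINER = "<|>"


def count_identifiers(rows, identifier_colum_names, count_column_names, count_by_file=True):
    """Map (identifier, sample) to the number of unique countable elements."""
    seen = defaultdict(set)
    for row in rows:
        identifier = JOINER.join(row[c] for c in identifier_colum_names)
        # A unique countable element is the tuple of all count columns.
        countable = tuple(row[c] for c in count_column_names)
        if count_by_file:
            # Assumed title layout: the raw file name precedes the last three fields.
            sample = row["Spectrum Title"].rsplit(".", 3)[0]
        else:
            sample = "all"
        seen[(identifier, sample)].add(countable)
    return {key: len(values) for key, values in seen.items()}
```

Counting with a set per key means that repeated spectra of the same sequence contribute only once, which corresponds to the "unique hits" wording of count_by_file.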

Convert Mascot DAT to CSV

class ursgal.wrappers.mascot_dat2csv_1_0_0.mascot_dat2csv_1_0_0(*args, **kwargs)

Dummy to merge Mascot data into the ursgal workflow

ursgal.resources.platform_independent.arc_independent.mascot_dat2csv_1_0_0.mascot_dat2csv_1_0_0.main(input_file, output_file)

Convert MS-GF+ MZID to CSV

class ursgal.wrappers.msgfplus2csv_py_v1_0_0.msgfplus2csv_py_v1_0_0(*args, **kwargs)

msgfplus2csv_py v1.0.0 UNode

class ursgal.wrappers.msgfplus2csv_v1_2_1.msgfplus2csv_v1_2_1(*args, **kwargs)

msgfplus2csv_v1.2.1 UNode. Parameter options at https://omics.pnl.gov/software/ms-gf

Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.

postflight()

Converts the .tsv result file to .csv and translates the headers.
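A minimal sketch of a .tsv to .csv conversion with header translation is shown below. The HEADER_TRANSLATIONS mapping is purely illustrative and hypothetical; the translations actually applied by the wrapper are not reproduced here. Headers without an entry in the mapping are kept unchanged.

```python
import csv
import io

# Illustrative, hypothetical mapping from MS-GF+ TSV headers to CSV headers.
HEADER_TRANSLATIONS = {
    "Peptide": "Sequence",
    "Protein": "Protein ID",
}


def tsv_to_csv(tsv_text, translations):
    """Return CSV text with the header row renamed via `translations` (sketch)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([translations.get(name, name) for name in header])
    # Data rows are copied as-is; only the header is translated.
    writer.writerows(reader)
    return out.getvalue()
```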

preflight()

mzid result files from MS-GF+ are converted to CSV using the MzIDToTsv converter from MS-GF+

Input file has to be a .mzid or .mzid.gz

Creates a .csv file and returns its path

Mzid to Tsv Converter Usage: MzidToTsvConverter -mzid:"mzid path" [-tsv:"tsv output path"] [-unroll|-u] [-showDecoy|-sd]

Required parameters:
'-mzid:path' - path to mzid[.gz] file; if path has spaces, it must be in quotes.
Optional parameters:
'-tsv:path' - path to tsv file to be written; if not specified, will be output to same location as mzid
'-unroll|-u' signifies that results should be unrolled - one line per unique peptide/protein combination in each spectrum identification
'-showDecoy|-sd' signifies that decoy results should be included in the result tsv

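Based on the MzidToTsvConverter usage text, the command line can be assembled as an argument list. This is a sketch, not the wrapper's preflight() code; the executable name or path is an assumption that depends on the local installation. Passing the list to subprocess.run avoids shell parsing, so a path containing spaces stays a single argument.

```python
def build_mzid_to_tsv_command(executable, mzid_path, tsv_path=None, unroll=False, show_decoy=False):
    """Return an argument list for MzidToTsvConverter (sketch; executable path assumed)."""
    command = [executable, f"-mzid:{mzid_path}"]
    if tsv_path is not None:
        # Without -tsv, the converter writes next to the .mzid file.
        command.append(f"-tsv:{tsv_path}")
    if unroll:
        command.append("-unroll")
    if show_decoy:
        command.append("-showDecoy")
    return command
```

The resulting list could then be run with subprocess.run(command, check=True).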
class ursgal.wrappers.msgfplus2csv_v1_2_0.msgfplus2csv_v1_2_0(*args, **kwargs)

msgfplus_C_mzid2csv_v1.2.0 UNode. Parameter options at https://omics.pnl.gov/software/ms-gf

Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.

class ursgal.wrappers.msgfplus2csv_v2017_07_04.msgfplus2csv_v2017_07_04(*args, **kwargs)

msgfplus_C_mzid2csv_v2017_07_04 UNode. Parameter options at https://omics.pnl.gov/software/ms-gf

Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.

postflight()

Converts the .tsv result file to .csv and translates the headers.

preflight()

mzid result files from MS-GF+ are converted to CSV using the MzIDToTsv converter from MS-GF+

Input file has to be a .mzid or .mzid.gz

Creates a .csv file and returns its path

Mzid to Tsv Converter Usage: MzidToTsvConverter -mzid:"mzid path" [-tsv:"tsv output path"] [-unroll|-u] [-showDecoy|-sd]

Required parameters:
'-mzid:path' - path to mzid[.gz] file; if path has spaces, it must be in quotes.
Optional parameters:
'-tsv:path' - path to tsv file to be written; if not specified, will be output to same location as mzid
'-unroll|-u' signifies that results should be unrolled - one line per unique peptide/protein combination in each spectrum identification
'-showDecoy|-sd' signifies that decoy results should be included in the result tsv

class ursgal.wrappers.msgfplus2csv_v2017_01_27.msgfplus2csv_v2017_01_27(*args, **kwargs)

msgfplus2csv_v2017_01_27 UNode. Parameter options at https://omics.pnl.gov/software/ms-gf

Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.

class ursgal.wrappers.msgfplus2csv_v2016_09_16.msgfplus2csv_v2016_09_16(*args, **kwargs)

msgfplus2csv_v2016_09_16 UNode. Parameter options at https://omics.pnl.gov/software/ms-gf

Reference: Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.

postflight()

Converts the .tsv result file to .csv.

preflight()

mzid result files from MS-GF+ are converted to CSV using the MzIDToTsv converter from MS-GF+

Input file has to be a .mzid

Creates a .csv file and returns its path

Convert MZML to MGF

The mzML to mgf converter version 2.0.0 requires pymzML 2.0, while the previous version can be used with older pymzML versions.

class ursgal.wrappers.mzml2mgf_2_0_0.mzml2mgf_2_0_0(*args, **kwargs)

mzml2mgf_2_0_0 UNode

Version two works only with pymzML version 2.0.0 or higher!

Converts .mzML files into .mgf files

ursgal.resources.platform_independent.arc_independent.mzml2mgf_2_0_0.mzml2mgf_2_0_0.main(mzml=None, mgf=None, i_decimals=5, mz_decimals=5, machine_offset_in_ppm=None, scan_exclusion_list=None, scan_inclusion_list=None, prefix=None, scan_skip_modulo_step=None, ms_level=2, precursor_min_charge=1, precursor_max_charge=5, ion_mode='+', spec_id_attribute=None, signal_to_noise_threshold=None)

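To show what the mz_decimals, i_decimals and ion_mode arguments affect, here is a sketch that formats one spectrum as a generic MGF entry. It does not read mzML and is not the converter's code; the exact fields written by mzml2mgf (for example retention time) and the title layout are assumptions. The machine_offset_in_ppm correction is deliberately left out because its sign convention is not stated here.

```python
def format_mgf_entry(title, precursor_mz, charge, peaks, mz_decimals=5, i_decimals=5, ion_mode="+"):
    """Return one MGF 'BEGIN IONS ... END IONS' block (generic sketch)."""
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={precursor_mz:.{mz_decimals}f}",
        f"CHARGE={charge}{ion_mode}",
    ]
    for mz, intensity in peaks:
        # Peak list: one "m/z intensity" pair per line, rounded as requested.
        lines.append(f"{mz:.{mz_decimals}f} {intensity:.{i_decimals}f}")
    lines.append("END IONS")
    return "\n".join(lines) + "\n"
```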
class ursgal.wrappers.mzml2mgf_1_0_0.mzml2mgf_1_0_0(*args, **kwargs)

mzml2mgf_1_0_0 UNode

Converts .mzML files into .mgf files

Convert X!Tandem XML to CSV 1_0_0

class ursgal.wrappers.xtandem2csv_1_0_0.xtandem2csv_1_0_0(*args, **kwargs)

xtandem2csv_1_0_0 UNode

ursgal.resources.platform_independent.arc_independent.xtandem2csv_1_0_0.xtandem2csv_1_0_0.main(input_file=None, decoy_tag=None, output_file=None)

Converts X!Tandem .xml files into .csv. This is done with a dedicated converter because mzidentml_lib reports wrong positions for modifications (and it also cannot convert piledriver.mzid files into .csv).

It should be noted that:

  • X!Tandem groups are not merged (since they are not the same as protein groups)
  • multiple domains (multiple occurrences of a peptide in the same protein) are not reported
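Modification positions are the main reason for the custom converter. The sketch below shows one way to derive peptide-relative positions from X!Tandem output, and it is not the xtandem2csv_1_0_0 code. It assumes the usual bioml layout, in which each domain element carries 1-based protein coordinates in start and each aa child records a modified residue with its protein position in at. Only the first domain per peptide is read, in line with the second note above.

```python
import xml.etree.ElementTree as ET


def modification_positions(xml_text):
    """Return [(sequence, [(residue, peptide_position, mass_shift), ...]), ...] (sketch)."""
    root = ET.fromstring(xml_text)
    results = []
    for peptide in root.iter("peptide"):
        domain = peptide.find("domain")  # first domain only
        if domain is None:
            continue
        start = int(domain.get("start"))
        mods = []
        for aa in domain.findall("aa"):
            # 'at' is assumed to be a 1-based protein position; convert it
            # to a 1-based position within the peptide.
            position = int(aa.get("at")) - start + 1
            mods.append((aa.get("type"), position, float(aa.get("modified"))))
        results.append((domain.get("seq"), mods))
    return results
```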

MzidLib

class ursgal.wrappers.mzidentml_lib_1_7.mzidentml_lib_1_7(*args, **kwargs)

MzidLib 1_7 UNode

Import functions from mzidentml_lib_1_6_10

Note

Please download and install manually from http://www.proteoannotator.org/?q=installation

class ursgal.wrappers.mzidentml_lib_1_6_11.mzidentml_lib_1_6_11(*args, **kwargs)

MzidLib 1_6_11 UNode

Import functions from mzidentml_lib_1_6_10

class ursgal.wrappers.mzidentml_lib_1_6_10.mzidentml_lib_1_6_10(*args, **kwargs)

MzidLib 1_6_10 UNode

Reference: Reisinger F, Krishna R, Ghali F, Ríos D, Hermjakob H, Vizcaíno JA, Jones AR. (2012) jmzIdentML API: A Java interface to the mzIdentML standard for peptide and protein identification data.

Java program to convert results to .mzIdentML and .mzIdentML to .csv

preflight()

Convert .mzid result files from different search engines into .csv result files

X!Tandem result files first need to be converted into .mzid with raw2mzid.

raw2mzid(search_engine=None, translations=None)

Convert raw result files into .mzid result files

pParse 2.0

class ursgal.wrappers.pparse_2_0.pparse_2_0(*args, **kwargs)

UNode for pParse, included in pGlyco 2.2.0. For further information visit http://pfind.ict.ac.cn/software/pParse/#Downloads

Note

Please download pParse manually as part of pGlyco 2.2.0 https://github.com/pFindStudio/pGlyco2

Reference: Yuan ZF, Liu C, Wang HP, Sun RX, Fu Y, Zhang JF, Wang LH, Chi H, Li Y, Xiu LY, Wang WP, He SM (2012) pParse: a method for accurate determination of monoisotopic peaks in high-resolution mass spectra. Proteomics 12(2)

postflight()

Renames the output file, since naming the output file does not work properly in pParse.

preflight()

Formatting the command line via self.params

Returns: self.params
Return type: dict

Command line options:
-D datapath default (D:data)
-L logfilepath default (the same with datapath)
-O outputpath default (the same with datapath)
-W isolation_width default (2)
-F input_format default (raw) optional (wiff)
-C co-elute default (1)
-S cut_similiar_mono default (1)
-I ipv_file default (.IPV.txt)
-M mars_model default (4)
-T trainingset default (.TrainingSet.txt)
-m output_mgf default (1)
-p output_pf default (1)
-d delete_msn default (0)
-a check_activationcenter default (1)
-g debug_mode default (0)
-r rewrite_files default (0)
-t mars_threshold default (-0.34)
-u export_unchecked_mono default (0)
-y output_all_mars_y default (0)
-Y output_mars_y default (0)
-s output_trainingdata default (0)
-R recalibrate_window default (7)
-v outputsvmlight default (0)
-z m/z default (5)
-i Intensity default (1)
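A preflight step like this one typically turns a parameter dict into a command line. The sketch below assumes that each pParse flag and its value are passed as two separate arguments (for example "-W", "2"), which is an assumption based on the option listing. PPARSE_DEFAULTS holds a subset of the listed defaults, and build_pparse_command is a hypothetical helper, not part of the wrapper.

```python
# Subset of the defaults listed above (assumed to be passed as "-X value").
PPARSE_DEFAULTS = {"W": 2, "F": "raw", "C": 1, "S": 1, "m": 1, "p": 1, "d": 0}


def build_pparse_command(executable, datapath, outputpath, **overrides):
    """Return a pParse argument list; keyword overrides replace defaults (sketch)."""
    options = {"D": datapath, "O": outputpath, **PPARSE_DEFAULTS, **overrides}
    command = [executable]
    for flag, value in options.items():
        command.extend([f"-{flag}", str(value)])
    return command
```

Since dictionaries keep insertion order, an override such as W=3 changes the value without moving the flag.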

ThermoRawFileParser

class ursgal.wrappers.thermo_raw_file_parser_1_1_2.thermo_raw_file_parser_1_1_2(*args, **kwargs)

UNode for ThermoRawFileParser. For further information visit https://github.com/compomics/ThermoRawFileParser

Note

Please download ThermoRawFileParser manually from https://github.com/compomics/ThermoRawFileParser

Reference: Hulstaert N, Sachsenberg T, Walzer M, Barsnes H, Martens L and Perez-Riverol Y (2019) ThermoRawFileParser: modular, scalable and cross-platform RAW file conversion. bioRxiv https://doi.org/10.1101/622852

preflight()

Formatting the command line via self.params

Returns: self.params
Return type: dict

ThermoRawFileParser.exe usage is (use -option=value for the optional arguments):
-h, --help Prints out the options.
-i, --input=VALUE The raw file input.
-o, --output=VALUE The output directory.
-f, --format=VALUE The output format for the spectra (0 for MGF, 1 for mzML, 2 for indexed mzML, 3 for Parquet, 4 for MGF with profile data excluded).
-m, --metadata=VALUE The metadata output format (0 for JSON, 1 for TXT).
-g, --gzip GZip the output file if this flag is specified (without value).
-u, --s3_url[=VALUE] Optional property to write directly the data into S3 Storage.
-k, --s3_accesskeyid[=VALUE] Optional key for the S3 bucket to write the file output.
-t, --s3_secretaccesskey[=VALUE] Optional key for the S3 bucket to write the file output.
-n, --s3_bucketName[=VALUE] S3 bucket name.
-v, --verbose Enable verbose logging.
-e, --ignoreInstrumentErrors Ignore missing properties by the instrument.
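Following the usage text above (-option=value form, numeric format codes), a ThermoRawFileParser call could be assembled as shown below. This is a sketch, not the wrapper's preflight() code. The format names in FORMAT_CODES are hypothetical labels for the documented codes, and launching the .exe through mono on non-Windows systems is an assumption about the local setup.

```python
import sys

# Hypothetical labels for the documented --format codes.
FORMAT_CODES = {"mgf": 0, "mzml": 1, "indexed_mzml": 2, "parquet": 3, "mgf_no_profile": 4}


def build_trfp_command(exe_path, raw_file, output_dir, output_format="mzml", gzip=False, use_mono=None):
    """Return a ThermoRawFileParser argument list (sketch)."""
    if output_format not in FORMAT_CODES:
        raise ValueError(f"unknown output format: {output_format}")
    if use_mono is None:
        # Assumption: outside Windows the .exe is started through mono.
        use_mono = not sys.platform.startswith("win")
    command = ["mono"] if use_mono else []
    command += [
        exe_path,
        f"--input={raw_file}",
        f"--output={output_dir}",
        f"--format={FORMAT_CODES[output_format]}",
    ]
    if gzip:
        command.append("--gzip")  # flag without value
    return command
```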