Chemical Composition

class ursgal.ChemicalComposition(sequence=None, aa_compositions=None, isotopic_distributions=None, monosaccharide_compositions=None)

Chemical composition class. The actual sequence or formula can be reset using the use function.

Keyword Arguments:
 
  • sequence (str) – Peptide or chemical formula sequence
  • aa_compositions (Optional[dict]) – amino acid compositions
  • isotopic_distributions (Optional[dict]) – isotopic distributions
  • monosaccharide_compositions (Optional[dict]) – monosaccharide compositions

Keyword argument examples:

sequence - Currently this can, for example, be::
[
    '+H2O2H2-OH',
    '+{0}'.format('H2O'),
    '{peptide}'.format(peptide='ELVISLIVES'),
    '{peptide}+{0}'.format('PO3', peptide='ELVISLIVES'),
    '{peptide}#{unimod}:{pos}'.format(peptide='ELVISLIVES', unimod='Oxidation', pos=1),
]
Examples::
>>> c = ursgal.ChemicalComposition()
>>> c.use("ELVISLIVES#Acetyl:1")
>>> c.hill_notation()
'C52H90N10O18'
>>> c.hill_notation_unimod()
'C(52)H(90)N(10)O(18)'
>>> c
{'O': 18, 'H': 90, 'C': 52, 'N': 10}
>>> c.composition_of_mod_at_pos[1]
defaultdict(<class 'int'>, {'O': 1, 'H': 2, 'C': 2})
>>> c.composition_of_aa_at_pos[1]
{'O': 3, 'H': 7, 'C': 5, 'N': 1}
>>> c.composition_at_pos[1]
defaultdict(<class 'int'>, {'O': 4, 'H': 9, 'C': 7, 'N': 1})
>>> c = ursgal.ChemicalComposition('+H2O2H2')
>>> c
{'O': 2, 'H': 4}
>>> c.subtract_chemical_formula('H3')
>>> c
{'O': 2, 'H': 1}

Note

We did not include mass calculation, since pyQms will calculate masses much more accurately using unimod and other element enrichments.
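The simpler input formats listed under the keyword argument examples can be taken apart with plain string operations. The following is a minimal sketch, not ursgal's own parser: split_sequence is a hypothetical helper, and it assumes at most one '#unimod:pos' suffix and leaves everything after the first '+' untouched.

```python
def split_sequence(sequence):
    """Split an input string into (peptide, formula, modifications).

    Hypothetical helper for illustration only; it assumes the forms
    'PEPTIDE', '+FORMULA', 'PEPTIDE+FORMULA' and 'PEPTIDE#Unimod:pos'.
    """
    # Everything after '#' is treated as a single 'name:position' pair.
    body, _, mod_part = sequence.partition("#")
    # Everything after the first '+' is treated as a chemical formula.
    peptide, _, formula = body.partition("+")
    modifications = []
    if mod_part:
        name, _, position = mod_part.rpartition(":")
        modifications.append((name, int(position)))  # positions are 1-based
    return peptide, formula, modifications


peptide, formula, modifications = split_sequence("ELVISLIVES#Acetyl:1")
```

The same helper returns an empty peptide for '+H2O2H2' and an empty modification list for 'ELVISLIVES+PO3'.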

add_chemical_formula(chemical_formula, factor=1)

Adds chemical formula to the instance

Parameters: chemical_formula (str) – chemical composition given in Hill notation
Keyword Arguments:
 factor (int) – multiplication factor for adding the same chemical formula multiple times
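To make the role of factor concrete, here is a minimal, self-contained sketch of how a formula string could be turned into element counts and added to a composition. It is an illustration under stated assumptions, not ursgal's implementation: parse_formula and add_formula are hypothetical names, and the regular expression only handles element symbols followed by optional integer counts.

```python
import re
from collections import Counter

# One element symbol (e.g. 'H', 'Na') followed by an optional count.
ELEMENT_PATTERN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(chemical_formula):
    """Return element counts for a formula such as 'H2O2H2' (hypothetical helper)."""
    counts = Counter()
    for element, number in ELEMENT_PATTERN.findall(chemical_formula):
        counts[element] += int(number) if number else 1
    return counts


def add_formula(composition, chemical_formula, factor=1):
    """Add the formula 'factor' times to composition, in place (hypothetical helper)."""
    for element, count in parse_formula(chemical_formula).items():
        composition[element] += count * factor
    return composition


composition = Counter()
add_formula(composition, "H2O2H2")        # repeated elements are summed
add_formula(composition, "H3", factor=-1)  # a negative factor subtracts
```

In this sketch, subtraction is modelled as addition with a negative factor, which reproduces the 'H3' subtraction shown in the class example.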
add_glycan(glycan)

Adds a glycan to the instance.

Parameters: glycan (str) – sequence of monosaccharides given in unimod format, e.g. HexNAc(2)Hex(3)dHex(1)Pent(1). Available monosaccharides are listed in chemical_composition_kb.
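The unimod-style glycan string is a run of name(count) pairs. A minimal sketch of splitting it into monosaccharide counts follows; parse_glycan is a hypothetical helper, not part of ursgal, and converting the names into element compositions would still require the monosaccharide table from chemical_composition_kb.

```python
import re

# A monosaccharide name followed by its count in parentheses, e.g. 'HexNAc(2)'.
GLYCAN_PATTERN = re.compile(r"([A-Za-z0-9]+)\((\d+)\)")


def parse_glycan(glycan):
    """Return {monosaccharide name: count} for a unimod-style glycan string."""
    counts = {}
    for name, number in GLYCAN_PATTERN.findall(glycan):
        # Sum counts in case a monosaccharide is listed more than once.
        counts[name] = counts.get(name, 0) + int(number)
    return counts


monosaccharides = parse_glycan("HexNAc(2)Hex(3)dHex(1)Pent(1)")
```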
add_peptide(peptide)

Adds peptide sequence to the instance

clear()

Resets all lookup dictionaries and self

One class instance can be used to analyse a series of sequences, thereby avoiding class instantiation overhead.

composition_at_pos = None

chemical composition at given peptide position including modifications (if peptide sequence was used as input or using the use function)

Note

Numbering starts at position 1, since all PSM search engines use this nomenclature.

Type: dict
composition_of_aa_at_pos = None

chemical composition of amino acid at given peptide position (if peptide sequence was used as input or using the use function)

Note

Numbering starts at position 1, since all PSM search engines use this nomenclature.

Type: dict
composition_of_mod_at_pos = None

chemical composition of unimod modifications at given position (if peptide sequence was used as input or using the use function)

Note

Numbering starts at position 1, since all PSM search engines use this nomenclature.

Type: dict
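The three per-position dictionaries are related: in the class example, the value of composition_at_pos[1] equals the amino acid composition plus the modification composition at that position. A minimal sketch of that relationship, using plain dictionaries and a hypothetical helper name (combine_per_position), not ursgal's internal code:

```python
from collections import Counter


def combine_per_position(aa_at_pos, mod_at_pos):
    """Sum amino acid and modification compositions per 1-based position."""
    combined = {}
    for pos, aa_composition in aa_at_pos.items():
        total = Counter(aa_composition)
        # Positions without a modification contribute nothing extra.
        total.update(mod_at_pos.get(pos, {}))
        combined[pos] = dict(total)
    return combined


# Values taken from the ELVISLIVES#Acetyl:1 example in the class description.
aa_at_pos = {1: {"C": 5, "H": 7, "N": 1, "O": 3}}
mod_at_pos = {1: {"C": 2, "H": 2, "O": 1}}
at_pos = combine_per_position(aa_at_pos, mod_at_pos)
```

Here at_pos[1] holds the same element counts as c.composition_at_pos[1] in the example.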
hill_notation(include_ones=False, cc=None)

Formats chemical composition into Hill notation string.

Parameters: cc (dict, optional) – can format other element dicts as well.
Returns:
Hill notation format of self.
For example::
C50H88N10O17
Return type: str
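For readers unfamiliar with the Hill system: carbon is listed first, hydrogen second, and all remaining elements alphabetically; without carbon, every element is listed alphabetically. The sketch below implements that rule on a plain dict. It is not ursgal's code, and the treatment of include_ones (writing an explicit 1 for single atoms) is an assumption made for illustration.

```python
def hill_notation(composition, include_ones=False):
    """Format {element: count} as a Hill notation string (illustrative sketch)."""
    elements = sorted(composition)
    if "C" in composition:
        # Carbon first, hydrogen second, everything else alphabetically.
        rest = [e for e in elements if e not in ("C", "H")]
        ordered = ["C"] + (["H"] if "H" in composition else []) + rest
    else:
        # Without carbon, all elements (including H) are alphabetical.
        ordered = elements
    parts = []
    for element in ordered:
        count = composition[element]
        if count == 0:
            continue  # skip elements that cancelled out
        if count == 1 and not include_ones:
            parts.append(element)
        else:
            parts.append(f"{element}{count}")
    return "".join(parts)


formula = hill_notation({"O": 18, "H": 90, "C": 52, "N": 10})
```

With the composition from the class example, formula is the string returned there by c.hill_notation().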
hill_notation_unimod(cc=None)

Formats chemical composition into Hill notation string adding unimod features.

Parameters: cc (dict, optional) – can format other element dicts as well.
Returns:
Hill notation of self, formatted according to unimod rules.
For example::
C(50)H(88)N(10)O(17)
Return type: str
subtract_chemical_formula(chemical_formula, factor=1)

Subtracts chemical formula from the instance

Parameters: chemical_formula (str) – chemical composition given in Hill notation
Keyword Arguments:
 factor (int) – multiplication factor for subtracting the same chemical formula multiple times
subtract_peptide(peptide)

Subtracts peptide from the instance

use(sequence)

Re-initialize the class with a new sequence

This is helpful if one wants to use the same class instance for multiple sequences, since it removes class instantiation overhead.

Parameters: sequence (str) – See top for possible input formats.