Amino Acid Sequence to SMILES

cheminformatics.smiles_to_amino_acids(smiles)

Algorithm

amino_acids_sequence = {
    "A": "C",
    "R": "CCCCNC(N)=N",
    "N": "CC(N)=O",
    "D": "CC(O)=O",
    "B": "CC(O)=O",
    "C": "CS",
    "E": "CCC(O)=O",
    "Q": "CCC(N)=O",
    "Z": "CCC(N)=O",
    "G": "[H]",
    "H": "CC1=CNC=N1",
    "I": "C(CC)([H])C",
    "L": "CC(C)C",
    "K": "CCCCN",
    "M": "CCSC",
    "F": "CC1=CC=CC=C1",
    "P": "C2CCCN2",
    "S": "CO",
    "T": "C(C)([H])O",
    "W": "CCC1=CNC2=C1C=CC=C2",
    "Y": "CC1=CC=C(O)C=C1",
    "V": "C(C)C"
}

When the user passes in a string of one-letter amino acid codes, we build the peptide string according to the length of the input (one slot per residue) and fill each slot with that residue's side-chain fragment from the dictionary above.

RSTEFGHIKLADPQ
NC(CCCCNC(N)=N)C(NC(CO)C(NC(C(C)([H])O)C(NC(CCC(O)=O)C(NC(CC1=CC=CC=C1)C(NC([H])C(NC(CC1=CNC=N1)C(NC(C(CC)([H])C)C(NC(CCCCN)C(NC(CC(C)C)C(NC(C)C(NC(CC(O)=O)C(NC(C2CCCN2)C(NC(CCC(N)=O)C(NCC(O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O
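The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the library's own code: the function name `sequence_to_smiles` is hypothetical, and the closing `NCC(O)=O` cap is copied from the example output rather than from a documented rule.

```python
# Side-chain SMILES fragments, copied from the table above.
amino_acids_sequence = {
    "A": "C",
    "R": "CCCCNC(N)=N",
    "N": "CC(N)=O",
    "D": "CC(O)=O",
    "B": "CC(O)=O",
    "C": "CS",
    "E": "CCC(O)=O",
    "Q": "CCC(N)=O",
    "Z": "CCC(N)=O",
    "G": "[H]",
    "H": "CC1=CNC=N1",
    "I": "C(CC)([H])C",
    "L": "CC(C)C",
    "K": "CCCCN",
    "M": "CCSC",
    "F": "CC1=CC=CC=C1",
    "P": "C2CCCN2",
    "S": "CO",
    "T": "C(C)([H])O",
    "W": "CCC1=CNC2=C1C=CC=C2",
    "Y": "CC1=CC=C(O)C=C1",
    "V": "C(C)C",
}


def sequence_to_smiles(sequence: str) -> str:
    """Hypothetical sketch: build a peptide SMILES from one-letter codes."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    slots = []
    for code in sequence.upper():
        if code not in amino_acids_sequence:
            raise ValueError(f"unknown amino acid code: {code!r}")
        # One slot per residue: backbone N, alpha carbon with the side chain
        # as a branch, then an open branch on the carbonyl carbon.
        slots.append(f"NC({amino_acids_sequence[code]})C(")
    # Terminal cap copied from the example output, then one ")=O" per slot to
    # close each carbonyl branch and attach its double-bonded oxygen.
    return "".join(slots) + "NCC(O)=O" + ")=O" * len(sequence)


print(sequence_to_smiles("AG"))  # NC(C)C(NC([H])C(NCC(O)=O)=O)=O
```

Because every slot opens exactly one unclosed branch (`C(`), appending one `)=O` per residue is what keeps the nested string balanced.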

To convert a SMILES string back into an amino acid sequence, a reverse mapping (fragment to amino acid) is created, and a regex pattern is used to find each residue slot within the SMILES string.

import re
pattern = re.compile(r'NC\(.*?\)C\(', flags=re.MULTILINE)  # raw string avoids invalid "\(" escape warnings

Each matched fragment is then replaced with its amino acid code:

NC(CCCCNC(N)=N)C(NC(CO)C(NC(C(C)([H])O)C(NC(CCC(O)=O)C(NC(CC1=CC=CC=C1)C(NC([H])C(NC(CC1=CNC=N1)C(NC(C(CC)([H])C)C(NC(CCCCN)C(NC(CC(C)C)C(NC(C)C(NC(CC(O)=O)C(NC(C2CCCN2)C(NC(CCC(N)=O)C(NCC(O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O
RSTEFGHIKLADPQ
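A matching sketch of the reverse step, again hypothetical (`smiles_to_sequence` is an illustrative name, not the library's API). It reuses the slot pattern shown above with a capture group added around the side chain, so `findall` returns each fragment directly. Since `B` and `Z` share fragments with `D` and `Q`, a naively reversed dictionary would map those fragments to `B` and `Z` (later keys win); this sketch assumes the specific residues are wanted and leaves the ambiguity codes out of the lookup.

```python
import re

# Side-chain SMILES fragments, copied from the table above.
amino_acids_sequence = {
    "A": "C",
    "R": "CCCCNC(N)=N",
    "N": "CC(N)=O",
    "D": "CC(O)=O",
    "B": "CC(O)=O",
    "C": "CS",
    "E": "CCC(O)=O",
    "Q": "CCC(N)=O",
    "Z": "CCC(N)=O",
    "G": "[H]",
    "H": "CC1=CNC=N1",
    "I": "C(CC)([H])C",
    "L": "CC(C)C",
    "K": "CCCCN",
    "M": "CCSC",
    "F": "CC1=CC=CC=C1",
    "P": "C2CCCN2",
    "S": "CO",
    "T": "C(C)([H])O",
    "W": "CCC1=CNC2=C1C=CC=C2",
    "Y": "CC1=CC=C(O)C=C1",
    "V": "C(C)C",
}

# Slot pattern with a capture group around the side chain.
SLOT_PATTERN = re.compile(r"NC\((.*?)\)C\(")

# Reverse lookup: fragment -> one-letter code. "B" and "Z" are skipped
# (an assumption of this sketch) so the shared fragments map back to
# "D" and "Q" instead of the ambiguity codes.
fragment_to_code = {
    fragment: code
    for code, fragment in amino_acids_sequence.items()
    if code not in {"B", "Z"}
}


def smiles_to_sequence(smiles: str) -> str:
    """Hypothetical sketch: recover one-letter codes from a peptide SMILES."""
    codes = []
    # findall returns the captured side chain of each slot, left to right.
    for side_chain in SLOT_PATTERN.findall(smiles):
        if side_chain not in fragment_to_code:
            raise ValueError(f"unrecognized side-chain fragment: {side_chain!r}")
        codes.append(fragment_to_code[side_chain])
    return "".join(codes)


print(smiles_to_sequence("NC(C)C(NC([H])C(NCC(O)=O)=O)=O"))  # AG
```

The non-greedy `.*?` stops at the first `)C(`, which works here because none of the fragments in the table contain `)C(` internally. The trailing `NCC(O)=O` cap never matches (it has no literal `NC(`), so it contributes no residue.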

