Amino Acid Sequence to SMILES
cheminformatics.smiles_to_amino_acids(smiles)
Algorithm
amino_acids_sequence = {
"A": "C",
"R": "CCCCNC(N)=N",
"N": "CC(N)=O",
"D": "CC(O)=O",
"B": "CC(O)=O",
"C": "CS",
"E": "CCC(O)=O",
"Q": "CCC(N)=O",
"Z": "CCC(N)=O",
"G": "[H]",
"H": "CC1=CNC=N1",
"I": "C(CC)([H])C",
"L": "CC(C)C",
"K": "CCCCN",
"M": "CCSC",
"F": "CC1=CC=CC=C1",
"P": "C2CCCN2",
"S": "CO",
"T": "C(C)([H])O",
"W": "CCC1=CNC2=C1C=CC=C2",
"Y": "CC1=CC=C(O)C=C1",
"V": "C(C)C"
}
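When the user passes in a string of amino acids, a peptide backbone string is built with one slot per residue, so the number of slots matches the length of the input. Each slot is then filled with the corresponding amino acid fragment from the table above. For example, the sequence below produces the SMILES string that follows it.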
RSTEFGHIKLADPQ
NC(CCCCNC(N)=N)C(NC(CO)C(NC(C(C)([H])O)C(NC(CCC(O)=O)C(NC(CC1=CC=CC=C1)C(NC([H])C(NC(CC1=CNC=N1)C(NC(C(CC)([H])C)C(NC(CCCCN)C(NC(CC(C)C)C(NC(C)C(NC(CC(O)=O)C(NC(C2CCCN2)C(NC(CCC(N)=O)C(NCC(O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O
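The slot-filling step can be sketched as follows. This is a minimal illustration inferred from the layout of the example output above, not the library's actual implementation: the function name `sequence_to_smiles` is hypothetical, the dictionary is a small subset of the table, and the trailing `NCC(O)=O` unit is assumed from the example rather than documented.

```python
# Subset of the fragment table shown above (copied here so the sketch runs on its own).
amino_acids_sequence = {
    "A": "C",
    "R": "CCCCNC(N)=N",
    "G": "[H]",
    "S": "CO",
    "T": "C(C)([H])O",
}


def sequence_to_smiles(sequence):
    """Hypothetical helper: build a peptide SMILES string from one-letter codes."""
    n = len(sequence)
    # One "NC({})C(" slot per residue, then the terminal unit seen in the
    # example output, then one ")=O" to close each opened carbonyl branch.
    template = "NC({})C(" * n + "NCC(O)=O" + ")=O" * n
    fragments = [amino_acids_sequence[letter] for letter in sequence]
    return template.format(*fragments)


print(sequence_to_smiles("RS"))
# NC(CCCCNC(N)=N)C(NC(CO)C(NCC(O)=O)=O)=O
```

A letter missing from the table raises a `KeyError` in this sketch; a real implementation would likely want a clearer error message.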
To convert a SMILES string back into an amino acid sequence, a reverse lookup (fragment to amino acid) is created and a regex pattern is used to find the slots within the SMILES string.
pattern = re.compile(r'NC\(.*?\)C\(', flags=re.MULTILINE)
Each matched fragment is then replaced with its amino acid letter, so the SMILES string below converts back to the original sequence.
NC(CCCCNC(N)=N)C(NC(CO)C(NC(C(C)([H])O)C(NC(CCC(O)=O)C(NC(CC1=CC=CC=C1)C(NC([H])C(NC(CC1=CNC=N1)C(NC(C(CC)([H])C)C(NC(CCCCN)C(NC(CC(C)C)C(NC(C)C(NC(CC(O)=O)C(NC(C2CCCN2)C(NC(CCC(N)=O)C(NCC(O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O
RSTEFGHIKLADPQ
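The reverse direction can be sketched in the same spirit. Again, this is an illustration under assumptions rather than the library's code: `smiles_to_sequence` is a hypothetical name, the dictionary is a subset of the table, and a capture group is added to the pattern so that only the slot contents are returned. Because `B` shares a fragment with `D` (and `Z` with `Q`), this sketch keeps the first letter listed for each fragment, which agrees with the example output above.

```python
import re

# Subset of the fragment table shown above (copied here so the sketch runs on its own).
amino_acids_sequence = {
    "A": "C",
    "R": "CCCCNC(N)=N",
    "D": "CC(O)=O",
    "B": "CC(O)=O",  # same fragment as "D"
    "S": "CO",
    "T": "C(C)([H])O",
}

# Same pattern as in the text, with a capture group around the slot contents.
SLOT_PATTERN = re.compile(r"NC\((.*?)\)C\(")


def smiles_to_sequence(smiles):
    """Hypothetical helper: recover one-letter codes from a peptide SMILES string."""
    # Reverse lookup: fragment -> letter, keeping the first letter for duplicates.
    reverse = {}
    for letter, fragment in amino_acids_sequence.items():
        reverse.setdefault(fragment, letter)

    letters = []
    for fragment in SLOT_PATTERN.findall(smiles):
        if fragment not in reverse:
            raise ValueError(f"Unrecognized fragment: {fragment}")
        letters.append(reverse[fragment])
    return "".join(letters)


print(smiles_to_sequence("NC(CCCCNC(N)=N)C(NC(CO)C(NC(C(C)([H])O)C(NCC(O)=O)=O)=O)=O"))
# RST
```

The non-greedy `.*?` stops at the first `)C(` after each `NC(`, which works for the fragments in the table because none of them contains that three-character sequence.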
References
Sharif, Suliman. “Cocktail Shaker: An Open Source Drug Expansion and Enumeration Library for Peptides.” Journal of Open Source Software, vol. 5, no. 52, Aug. 2020, p. 1992. joss.theoj.org, https://doi.org/10.21105/joss.01992.