GlobalChem: Your Chemical Knowledge Graph

Amino Acid Sequence to SMILES

cheminformatics.smiles_to_amino_acids(smiles)

Algorithm

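# Forward table: one-letter amino acid code -> side-chain SMILES fragment.
# The ambiguous codes B and Z reuse the fragments of D and Q.
amino_acids_sequence = {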
    "A": "C",
    "R": "CCCCNC(N)=N",
    "N": "CC(N)=O",
    "D": "CC(O)=O",
    "B": "CC(O)=O",
    "C": "CS",
    "E": "CCC(O)=O",
    "Q": "CCC(N)=O",
    "Z": "CCC(N)=O",
    "G": "[H]",
    "H": "CC1=CNC=N1",
    "I": "C(CC)([H])C",
    "L": "CC(C)C",
    "K": "CCCCN",
    "M": "CCSC",
    "F": "CC1=CC=CC=C1",
    "P": "C2CCCN2",
    "S": "CO",
    "T": "C(C)([H])O",
    "W": "CCC1=CNC2=C1C=CC=C2",
    "Y": "CC1=CC=C(O)C=C1",
    "V": "C(C)C"
}
# Reverse table: side-chain SMILES fragment -> one-letter amino acid code.
# B and Z are absent because their fragments are shared with D and Q.
amino_acids_sequence = {
    "C": "A",
    "CCCCNC(N)=N": "R",
    "CC(N)=O": "N",
    "CC(O)=O": "D",
    "CS": "C",
    "CCC(O)=O": "E",
    "CCC(N)=O": "Q",
    "[H]": "G",
    "CC1=CNC=N1": "H",
    "C(CC)([H])C": "I",
    "CC(C)C": "L",
    "CCCCN": "K",
    "CCSC": "M",
    "CC1=CC=CC=C1": "F",
    "C2CCCN2": "P",
    "CO": "S",
    "C(C)([H])O": "T",
    "CCC1=CNC2=C1C=CC=C2": "W",
    "CC1=CC=C(O)C=C1": "Y",
    "C(C)C": "V",
}

When the user passes in a string of one-letter amino acid codes, a peptide backbone SMILES string is built with one slot per residue, and each slot is filled with the corresponding side-chain fragment.

RSTEFGHIKLADPQ
NC(CCCCNC(N)=N)C(NC(CO)C(NC(C(C)([H])O)C(NC(CCC(O)=O)C(NC(CC1=CC=CC=C1)C(NC([H])C(NC(CC1=CNC=N1)C(NC(C(CC)([H])C)C(NC(CCCCN)C(NC(CC(C)C)C(NC(C)C(NC(CC(O)=O)C(NC(C2CCCN2)C(NC(CCC(N)=O)C(NCC(O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O
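
The slot-filling step can be sketched as follows. This is a minimal illustration whose layout is inferred from the example output above, not the library's actual implementation: each residue opens `NC(<side chain>)C(`, the innermost position holds the `NCC(O)=O` cap, and one `)=O` closes each residue. The names `SIDE_CHAINS`, `C_TERMINAL_CAP`, and `sequence_to_smiles` are hypothetical.

```python
# Side-chain fragments copied from the first amino_acids_sequence table.
SIDE_CHAINS = {
    "A": "C", "R": "CCCCNC(N)=N", "N": "CC(N)=O", "D": "CC(O)=O",
    "B": "CC(O)=O", "C": "CS", "E": "CCC(O)=O", "Q": "CCC(N)=O",
    "Z": "CCC(N)=O", "G": "[H]", "H": "CC1=CNC=N1", "I": "C(CC)([H])C",
    "L": "CC(C)C", "K": "CCCCN", "M": "CCSC", "F": "CC1=CC=CC=C1",
    "P": "C2CCCN2", "S": "CO", "T": "C(C)([H])O",
    "W": "CCC1=CNC2=C1C=CC=C2", "Y": "CC1=CC=C(O)C=C1", "V": "C(C)C",
}

# Assumption: the innermost fragment of the example output acts as a fixed cap.
C_TERMINAL_CAP = "NCC(O)=O"


def sequence_to_smiles(sequence: str) -> str:
    """Hypothetical helper: fill one backbone slot per residue."""
    # Each residue contributes "NC(<side chain>)C(" and leaves one branch open.
    opened = "".join(f"NC({SIDE_CHAINS[code]})C(" for code in sequence)
    # Every open branch is closed with ")=O", one per residue.
    return opened + C_TERMINAL_CAP + ")=O" * len(sequence)


print(sequence_to_smiles("AG"))
# NC(C)C(NC([H])C(NCC(O)=O)=O)=O
```

An unknown letter raises a `KeyError` in this sketch; whether the library validates its input differently is not stated here.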

To convert the SMILES string back into an amino acid sequence, a reverse lookup table is created and a regex pattern is used to find the slots within the SMILES string.

import re

# Raw string so that \( and \) reach the regex engine as literal parentheses.
pattern = re.compile(r'NC\(.*?\)C\(', flags=re.MULTILINE)

Each matched fragment is then replaced with its one-letter amino acid code.

NC(CCCCNC(N)=N)C(NC(CO)C(NC(C(C)([H])O)C(NC(CCC(O)=O)C(NC(CC1=CC=CC=C1)C(NC([H])C(NC(CC1=CNC=N1)C(NC(C(CC)([H])C)C(NC(CCCCN)C(NC(CC(C)C)C(NC(C)C(NC(CC(O)=O)C(NC(C2CCCN2)C(NC(CCC(N)=O)C(NCC(O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O)=O
RSTEFGHIKLADPQ
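
The decoding direction can be sketched in a similar way. This is an illustrative approximation, not the library's code: it adds a capturing group to the pattern from the text and collects the matched fragments in order instead of substituting them in place. The names `FRAGMENT_TO_CODE`, `SLOT_PATTERN`, and `smiles_to_sequence` are hypothetical.

```python
import re

# Reverse table copied from the second amino_acids_sequence dictionary.
FRAGMENT_TO_CODE = {
    "C": "A", "CCCCNC(N)=N": "R", "CC(N)=O": "N", "CC(O)=O": "D",
    "CS": "C", "CCC(O)=O": "E", "CCC(N)=O": "Q", "[H]": "G",
    "CC1=CNC=N1": "H", "C(CC)([H])C": "I", "CC(C)C": "L", "CCCCN": "K",
    "CCSC": "M", "CC1=CC=CC=C1": "F", "C2CCCN2": "P", "CO": "S",
    "C(C)([H])O": "T", "CCC1=CNC2=C1C=CC=C2": "W",
    "CC1=CC=C(O)C=C1": "Y", "C(C)C": "V",
}

# Same pattern as in the text, with a group capturing the slot contents.
# The non-greedy .*? stops at the first ")C(" and so ends at the slot boundary.
SLOT_PATTERN = re.compile(r"NC\((.*?)\)C\(")


def smiles_to_sequence(smiles: str) -> str:
    """Hypothetical helper: read slots left to right and look each one up."""
    fragments = SLOT_PATTERN.findall(smiles)
    return "".join(FRAGMENT_TO_CODE[fragment] for fragment in fragments)


print(smiles_to_sequence(
    "NC(CCCCNC(N)=N)C(NC(CO)C(NC(C(C)([H])O)C(NCC(O)=O)=O)=O)=O"
))
# RST
```

The trailing `NCC(O)=O` cap is never matched because the pattern requires an opening parenthesis directly after `NC`. Because B and Z share fragments with D and Q, a sequence containing them would decode to D and Q under this table.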

References

  1. Sharif, Suliman. “Cocktail Shaker: An Open Source Drug Expansion and Enumeration Library for Peptides.” Journal of Open Source Software, vol. 5, no. 52, Aug. 2020, p. 1992. joss.theoj.org, https://doi.org/10.21105/joss.01992.


Last updated 3 years ago