GlobalChem: Your Chemical Knowledge Graph

GlobalChem

First, import and initialize the class:

from global_chem import GlobalChem
gc = GlobalChem()

Print the Network

gc.print_globalchem_network()

                                ┌solvents─common_organic_solvents
             ┌organic_synthesis─└protecting_groups─amino_acid_protecting_groups
             │          ┌polymers─common_monomer_repeating_units
             ├materials─└clay─montmorillonite_adsorption
             │                            ┌privileged_kinase_inhibtors
             │                            ├privileged_scaffolds
             ├proteins─kinases─┌scaffolds─├iupac_blue_book_substituents
             │                 │          └common_r_group_replacements
             │                 └braf─inhibitors
             │              ┌vitamins
             │              ├open_smiles
             ├miscellaneous─├amino_acids
             │              └regex_patterns
global_chem──├environment─emerging_perfluoroalkyls
             │          ┌schedule_one
             │          ├schedule_four
             │          ├schedule_five
             ├narcotics─├pihkal
             │          ├schedule_two
             │          └schedule_three
             ├interstellar_space
             │                    ┌cannabinoids
             │                    │         ┌electrophillic_warheads_for_kinases
             │                    ├warheads─└common_warheads_covalent_inhibitors
             └medicinal_chemistry─│      ┌phase_2_hetereocyclic_rings
                                  └rings─├iupac_blue_book_rings
                                         └rings_in_drugs
                                         

Check the available nodes in GlobalChem:

nodes_list = gc.check_available_nodes()
print (nodes_list)
'emerging_perfluoro_alkyls', 
'montmorillonite_adsorption', 
'common_monomer_repeating_units',
 'electrophilic_warheads_for_kinases',

Retrieve all the Nodes

nodes_list = gc.get_all_nodes()

Get the Tree Depth of GlobalChem

depth = gc.get_depth_of_globalchem()

Get the SMILES of All GlobalChem Nodes

smiles = gc.get_all_smiles()

Get the SMARTS of All GlobalChem Nodes

smarts = gc.get_all_smarts()

Get the Names of All GlobalChem Nodes

names = gc.get_all_names()

Get SMILES Definition by IUPAC Name

This function compares the query against the IUPAC names in the network using the Levenshtein distance and keeps only matches within the given distance tolerance. It automatically strips grammar and converts upper-case letters to lower case, then matches the best-fitting names against the query and returns them together with their network paths. Users can choose to return only exact definitions or partial definitions as well.

definition = gc.get_smiles_by_iupac(
    'benzene',   
    distance_tolerance=7,
    return_partial_definitions=True                       
)
print (definition)
[{'methylbenzoate': 'c1ccc(C(=O)OC)cc1', 'network_path': 'global_chem.medicinal_chemistry.scaffolds.common_r_group_replacements', 'levenshtein_distance': 7}]
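For readers curious about the matching step, here is a minimal standalone sketch of Levenshtein-based lookup. It is not GlobalChem's implementation: the `normalize` rule (lowercase, keep only letters and digits), the helper names `normalize`, `levenshtein` and `fuzzy_lookup`, and the toy dictionary are assumptions for illustration, so the distances it reports need not match the values GlobalChem prints.

```python
import re


def normalize(name):
    # Assumed normalization: lowercase and drop everything except letters and digits.
    return re.sub(r"[^a-z0-9]", "", name.lower())


def levenshtein(a, b):
    # Dynamic-programming edit distance: insertions, deletions and substitutions cost 1.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(query, definitions, distance_tolerance):
    # Keep every IUPAC:SMILES entry within the tolerance, closest match first.
    target = normalize(query)
    matches = []
    for iupac, smiles in definitions.items():
        distance = levenshtein(target, normalize(iupac))
        if distance <= distance_tolerance:
            matches.append({"name": iupac, "smiles": smiles, "levenshtein_distance": distance})
    return sorted(matches, key=lambda match: match["levenshtein_distance"])


# Hypothetical toy data, not GlobalChem's contents.
toy_definitions = {"Benzene": "c1ccccc1", "toluene": "Cc1ccccc1", "pyridine": "c1ccncc1"}
print(fuzzy_lookup("Benzen", toy_definitions, distance_tolerance=2))
# [{'name': 'Benzene', 'smiles': 'c1ccccc1', 'levenshtein_distance': 1}]
```

Raising `distance_tolerance` widens the net at the cost of more loosely related hits, which mirrors the trade-off of the `distance_tolerance` parameter described above.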

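You can also attempt a fuzzy reconstruction of the SMILES from the IUPAC name: the name is stripped of its grammar and matched against functional groups. The result is an approximate, dot-separated set of fragments rather than a single connected structure: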

definition = gc.get_smiles_by_iupac(
        '(4R,4aR,7S,7aR,12bS)-3-methyl-2,4,4a,7,7a,13-hexahydro-1H-4,12-methanobenzofuro[3,2-e]isoquinoline-7,9-diol',
        distance_tolerance=2,
        return_partial_definitions=False,
        reconstruct_smiles=True,
)

print (definition)
C12=C(C=NC=C2)C=CC=C1.C12=C(C=NC=C2)C=CC=C1.[CH2]CCCCCCCCCCCCCCC.[CH2]C.[CH3].SC.c1cncc2ccccc12.C12=CC=CC=C1C=NC=C2.C.CC.C[C@H]1[C@H](C(C)C)CC[C@@H](C)C1.[CH3].S
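Because the reconstruction comes back as one dot-separated SMILES string, it can help to inspect the fragments individually. This sketch uses only plain string handling on the output shown above (no cheminformatics toolkit); the helper name `split_fragments` is hypothetical.

```python
from collections import Counter


def split_fragments(smiles):
    # In SMILES, '.' separates disconnected components; drop any empty pieces.
    return [fragment for fragment in smiles.split(".") if fragment]


reconstructed = (
    "C12=C(C=NC=C2)C=CC=C1.C12=C(C=NC=C2)C=CC=C1.[CH2]CCCCCCCCCCCCCCC.[CH2]C."
    "[CH3].SC.c1cncc2ccccc12.C12=CC=CC=C1C=NC=C2.C.CC."
    "C[C@H]1[C@H](C(C)C)CC[C@@H](C)C1.[CH3].S"
)
fragments = split_fragments(reconstructed)
print(len(fragments))  # 13

# Show how often each fragment appears in the reconstruction.
for fragment, count in Counter(fragments).items():
    print(count, fragment)
```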

Build the GlobalChem Network and Print it Out

gc.build_global_chem_network(
    print_output=True,          # Print the network out
    debugger=False,             # Mostly for developers: shows all node values
)
'global_chem': {
    'children': [
        'environment',
        'miscellaneous',
        'organic_synthesis',
        'medicinal_chemistry',
        'narcotics',
        'interstellar_space',
        'proteins',
        'materials'
    ],
    'name': 'global_chem',
    'node_value': <global_chem.global_chem.Node object at 0x10f60eed0>,
    'parents': []
},

The algorithm connects nodes through lists of parents and children rather than through "edges" as in traditional graph networks. Storing the graph database as a flat, one-dimensional structure in which each node keeps its own parent and child lists makes it easier to code.
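A minimal sketch of this parents/children layout using plain dictionaries follows. The helpers `new_network`, `link_nodes` and `tree_depth` are hypothetical stand-ins, not GlobalChem functions, and counting the root itself as depth 1 is an assumption that may differ from what `get_depth_of_globalchem()` returns.

```python
def new_network(root="global_chem"):
    # The whole graph is one flat dict keyed by node name; the root has no parents.
    return {root: {"name": root, "parents": [], "children": []}}


def link_nodes(network, parent, child):
    # "Edges" are implied by the parent and child lists stored on each node.
    if parent not in network:
        raise KeyError(f"unknown parent node: {parent}")
    network.setdefault(child, {"name": child, "parents": [], "children": []})
    network[child]["parents"].append(parent)
    network[parent]["children"].append(child)


def tree_depth(network, name):
    # Number of nodes on the longest downward path, counting `name` itself.
    children = network[name]["children"]
    if not children:
        return 1
    return 1 + max(tree_depth(network, child) for child in children)


network = new_network()
link_nodes(network, "global_chem", "environment")
link_nodes(network, "environment", "emerging_perfluoroalkyls")
link_nodes(network, "global_chem", "interstellar_space")

print(network["emerging_perfluoroalkyls"]["parents"])  # ['environment']
print(tree_depth(network, "global_chem"))  # 3
```

Walking up or down the graph needs only a dictionary lookup per step; the price is that both lists must be kept in sync on every insert, which `link_nodes` does in one place.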

Fetch a Node

gc = GlobalChem()
gc.build_global_chem_network()
node = gc.get_node('emerging_perfluoroalkyls')
print (node)
{
    'node_value': <global_chem.global_chem.Node object at 0x117fee210>, 
    'children': [], 
    'parents': ['emerging_perfluoroalkyls'], 
    'name': 'emerging_perfluoroalkyls'
}

Fetch the IUPAC:SMILES/SMARTS Data from the Node

gc = GlobalChem()
gc.build_global_chem_network()
smiles = gc.get_node_smiles('emerging_perfluoroalkyls')
smarts = gc.get_node_smarts('emerging_perfluoroalkyls')

print ("Length of Perfluoroalkyls: %s " % len(smiles))

Build a Custom Network and Fetch Its SMILES

from global_chem import GlobalChem

gc = GlobalChem(verbose=False)
gc.initiate_network()
gc.add_node('global_chem', 'common_monomer_repeating_units')
gc.add_node('common_monomer_repeating_units','electrophilic_warheads_for_kinases')
values = gc.get_node_smiles('common_monomer_repeating_units')

print (values)
'3′-bromo-2-chloro[1,1′:4′,1′′-terphenyl]-4,4′′':
 'ClC1=CC=CC=C1C2=CC=C(C3=CC=CC=C3)C(Br)=C2'

Create a Deep Layer Chemical Graph Network (DGN) and Print It Out

This section is for more advanced users who build their own networks and need to manage sets of layers.

# Create a Deep Layer Network

gc = GlobalChem()
gc.initiate_deep_layer_network()
gc.add_deep_layer(
    [
        'emerging_perfluoroalkyls',
        'montmorillonite_adsorption',
        'common_monomer_repeating_units'
    ]
)
gc.add_deep_layer(
    [
        'common_warheads_covalent_inhibitors',
        'privileged_scaffolds',
        'iupac_blue_book'
    ]
)

gc.print_deep_network()
                                      ┌common_warheads_covalent_inhibitors
            ┌emerging_perfluoroalkyls─├privileged_scaffolds
            │                         └iupac_blue_book
            │                           ┌common_warheads_covalent_inhibitors
global_chem─├montmorillonite_adsorption─├privileged_scaffolds
            │                           └iupac_blue_book
            │                               ┌common_warheads_covalent_inhibitors
            └common_monomer_repeating_units─├privileged_scaffolds
                                            └iupac_blue_book
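The printed tree suggests that every node in one layer becomes a parent of every node in the next layer. Here is a small sketch of that fan-out with plain dictionaries. `build_deep_layers` and `root_to_leaf_paths` are hypothetical helpers, and the sketch assumes node names are unique across layers.

```python
def build_deep_layers(layers, root="global_chem"):
    # Root -> first layer, then each node of a layer -> every node of the next layer.
    children = {root: list(layers[0])}
    for upper, lower in zip(layers, layers[1:]):
        for node in upper:
            children[node] = list(lower)
    for node in layers[-1]:
        children.setdefault(node, [])
    return children


def root_to_leaf_paths(children, node):
    # Enumerate every path from `node` down to a node with no children.
    if not children.get(node):
        return [[node]]
    return [
        [node] + path
        for child in children[node]
        for path in root_to_leaf_paths(children, child)
    ]


layers = [
    ["emerging_perfluoroalkyls", "montmorillonite_adsorption", "common_monomer_repeating_units"],
    ["common_warheads_covalent_inhibitors", "privileged_scaffolds", "iupac_blue_book"],
]
deep_network = build_deep_layers(layers)
for path in root_to_leaf_paths(deep_network, "global_chem"):
    print(" -> ".join(path))
```

With two layers of three nodes each, this yields nine root-to-leaf paths, one for each leaf in the printed tree.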

Compute a Common Score

Common Score Algorithm:

  1. Datamine the current graph network of GlobalChem

  2. Get the object weights of each mention

  3. Determine the mention weight

  4. Sum the weights; the total indicates how common the molecule is.

The higher the value, the more common the molecule tied to that IUPAC name.

gc = GlobalChem()
gc.build_global_chem_network(print_output=False, debugger=False)
gc.compute_common_score('benzene', verbose=True)
GlobalChem Common Score: 7.139921294271971
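The four steps above can be sketched as a weighted mention count. GlobalChem's actual weighting is not described here, so the per-node weights (defaulting to 1.0), the exact case-insensitive name match, the helper name `common_score` and the toy nodes are all assumptions. The sketch will not reproduce the score shown above.

```python
def common_score(iupac_name, nodes, node_weights=None):
    # Steps 1-4: scan every node, find mentions of the name, weight them, and sum.
    node_weights = node_weights or {}
    query = iupac_name.lower()
    score = 0.0
    for node_name, definitions in nodes.items():
        # A "mention" here is an exact, case-insensitive IUPAC key match (assumption).
        mentions = sum(1 for name in definitions if name.lower() == query)
        score += mentions * node_weights.get(node_name, 1.0)
    return score


toy_nodes = {  # hypothetical data, not GlobalChem's contents
    "common_organic_solvents": {"benzene": "c1ccccc1", "toluene": "Cc1ccccc1"},
    "rings_in_drugs": {"benzene": "c1ccccc1", "pyridine": "c1ccncc1"},
    "vitamins": {"niacin": "OC(=O)c1cccnc1"},
}
toy_weights = {"rings_in_drugs": 2.0}  # hypothetical per-node weight

print(common_score("Benzene", toy_nodes, toy_weights))  # 3.0
```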

To TSV

The network is exported in tab-separated values (TSV) format, mainly for interoperability with web application development, but the file can also be used for searching.

gc = GlobalChem()
gc.to_tsv('global_chem.tsv')
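Because the output is plain tab-separated text, it can be produced and searched with Python's standard `csv` module. This sketch is not `to_tsv()` itself. The column layout (`name`, `smiles`, `network_path`), the helper names `rows_to_tsv` and `search_tsv`, and the sample rows are illustrative assumptions, not copied from GlobalChem.

```python
import csv
import io

COLUMNS = ["name", "smiles", "network_path"]  # hypothetical column layout


def rows_to_tsv(rows):
    # Serialize a list of dicts as TSV text with a header row.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def search_tsv(tsv_text, fragment):
    # Case-insensitive substring search over the name column.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if fragment.lower() in row["name"].lower()]


rows = [  # illustrative sample rows
    {"name": "benzene", "smiles": "c1ccccc1",
     "network_path": "global_chem.organic_synthesis.solvents.common_organic_solvents"},
    {"name": "pyridine", "smiles": "c1ccncc1",
     "network_path": "global_chem.medicinal_chemistry.rings.rings_in_drugs"},
]
tsv_text = rows_to_tsv(rows)
print(search_tsv(tsv_text, "benz"))
```

To save the text to disk, write it with `open(path, "w", newline="")` so the `csv` module's line endings are preserved.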


Last updated 2 years ago