GlobalChem: Your Chemical Knowledge Graph

GlobalChem Molecule

Last updated 2 years ago

This page walks through the example from the forcefield demo.

Load the Package

from global_chem_extensions import GlobalChemExtensions
ff = GlobalChemExtensions().forcefields()

Load the Molecule

We use the perfluorobutanoic acid example from the repository, loading both its SMILES string and its stream file, in which the rank ordering is preserved.

# Load the Molecule

global_chem_molecule = ff.initialize_globalchem_molecule(
    'FC(F)(C(F)(C(O)=O)F)C(F)(F)F',
    stream_file='global-chem/example_data/forcefield_parameterization/perfluorobutanoic_acid.str',
    # frcmod_file='gaff2.frcmod',
)
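
For orientation, a CHARMM/CGenFF stream file lists one ATOM record per atom (atom name, atom type, partial charge, then a penalty after the "!"). The sketch below reads the atom types in file order using only the standard library. The excerpt, including its atom names, charges, and penalties, is a made-up placeholder rather than the contents of perfluorobutanoic_acid.str, and read_atom_types is a hypothetical helper, not part of GlobalChem.

```python
# Illustrative excerpt only: residue name, atom names, charges, and penalties
# are placeholders, not values taken from perfluorobutanoic_acid.str.
EXAMPLE_STREAM = """\
RESI EXAMPLE        0.000 ! placeholder residue
GROUP
ATOM F1     FGA2   -0.100 !    0.000
ATOM C1     CG312   0.200 !    0.000
ATOM F2     FGA2   -0.100 !    0.000
BOND F1   C1
BOND C1   F2
"""


def read_atom_types(stream_text):
    """Return (atom_name, atom_type) pairs from ATOM records, in file order."""
    pairs = []
    for line in stream_text.splitlines():
        fields = line.split()
        # ATOM records look like: ATOM <name> <type> <charge> ! <penalty>
        if len(fields) >= 4 and fields[0] == "ATOM":
            pairs.append((fields[1], fields[2]))
    return pairs


atom_types = read_atom_types(EXAMPLE_STREAM)
print(atom_types)
```

Because the list keeps the order of the ATOM records, it can be lined up index by index with any other representation that preserves the same atom ordering.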

Determine the Name

Determine the name of the compound based on the dictionary.

global_chem_molecule.determine_name()
name = global_chem_molecule.name
print(name)

Output:

perfluorobutanoic acid

Determine the Attributes of a Molecule

attributes = global_chem_molecule.get_attributes()
for k, v in attributes.items():
    print(f'{k}: {v}')

Example output, recorded for cyclopentane (C1CCCC1) rather than the molecule loaded above:

Attributes: {
'name': 'cyclopentane', 'smiles': 'C1CCCC1',
'molecular_weight': 70.07825032, 
'logp': 1.9505000000000001,
'h_bond_donor': 0,
'h_bond_acceptors': 0,
'rotatable_bonds': 0, 
'number_of_atoms': 5,
'molar_refractivity': 23.084999999999994,
'topological_surface_area_mapping': 0.0,
'formal_charge': 0,
'heavy_atoms': 5,
'num_of_rings': 1
}

Convert to Interoperable Mol or SMILES Objects

For interoperability with other software, the molecule can be converted into each package's own object or string representation.

pysmiles_mol = global_chem_molecule.get_pysmiles()
rdkit_mol = global_chem_molecule.get_rdkit_molecule()
partial_smiles_mol = global_chem_molecule.get_partial_smiles()
deep_smiles_mol = global_chem_molecule.encode_deep_smiles()
selfies_mol = global_chem_molecule.encode_selfies()
validate_smiles = global_chem_molecule.validate_molvs()

print("RDKit Mol: %s" % rdkit_mol)
print("PySMILES: %s" % pysmiles_mol)
print("Partial SMILES: %s" % partial_smiles_mol)
print("DeepSMILES: %s" % deep_smiles_mol)
print("Selfies Mol: %s" % selfies_mol)
print("Validation: %s" % validate_smiles)

Example output, recorded for cyclopentane (C1CCCC1):

RDKit Mol: <rdkit.Chem.rdchem.Mol object at 0x7f7259a7cb20>
PySMILES: Graph with 5 nodes and 5 edges
Partial SMILES: Molecule(atoms=['Atom(elem=6,chg=0,idx=0)', 'Atom(elem=6,chg=0,idx=1)', 'Atom(elem=6,chg=0,idx=2)', 'Atom(elem=6,chg=0,idx=3)', 'Atom(elem=6,chg=0,idx=4)'])
DeepSMILES: CCCCC5
Selfies Mol: [C][C][C][C][C][Ring1][Branch1]
Validation: ['INFO: [FragmentValidation] pentane is present']

If a conversion fails, the most likely cause is an invalid SMILES string. Check the SMILES with the validation module.
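
Before reaching for the validation module, a quick syntactic pre-check can catch the most common slips. The sketch below is a different, much weaker technique than real validation: it only checks that branches, bracket atoms, and ring-closure labels are paired, using the standard library. The function name quick_smiles_precheck is hypothetical and not part of GlobalChem.

```python
def quick_smiles_precheck(smiles):
    """Return a list of pairing problems; an empty list means none were found."""
    problems = []
    depth = 0            # current branch nesting from '(' and ')'
    in_bracket = False   # inside a bracket atom such as [NH4+]
    open_rings = set()   # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside brackets are isotopes, counts, or charges.
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                problems.append("unmatched ')'")
            else:
                depth -= 1
        elif ch == "%":
            label = smiles[i + 1:i + 3]
            if len(label) == 2 and label.isdigit():
                open_rings ^= {int(label)}   # toggle a two-digit ring label
                i += 2
            else:
                problems.append("malformed '%' ring label")
        elif ch.isdigit():
            open_rings ^= {int(ch)}          # toggle a one-digit ring label
        i += 1
    if in_bracket:
        problems.append("unclosed '['")
    if depth > 0:
        problems.append("unclosed '('")
    if open_rings:
        problems.append("unclosed ring label(s): %s" % sorted(open_rings))
    return problems


for candidate in ["FC(F)(C(F)(C(O)=O)F)C(F)(F)F", "C1CCCC1", "C1CCCC", "C(C"]:
    print(candidate, quick_smiles_precheck(candidate))
```

Passing this check does not make a SMILES chemically valid (valence and aromaticity are not examined), so it complements the validation module and does not replace it.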

Draw CGenFF Molecule

global_chem_molecule.draw_cgenff_molecule(
    height=1000,
    width=1000,
)

Create a Curly SMILES representation of SMILES and Atom Types

CurlySMILES places each atom's features inside curly braces directly after the atom, here its CGenFF atom type, and the resulting string remains stable when passed through RDKit.

curly_smiles = global_chem_molecule.get_curly_smiles()
print(curly_smiles)

Output:

F{FGA2}C{CG312}(F{FGA2})(C{CG312}(F{FGA2})(C{CG2O2}(O{OG311})=O{OG2D1})F{FGA2})C{CG302}(F{FGA3})(F{FGA3})F{FGA3}
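
To make the annotation concrete, the sketch below takes the CurlySMILES string shown above and, with the standard library only, pulls out each atom together with its atom type, then strips the braces to recover a plain SMILES. The regular expression is an assumption that covers only unbracketed, non-aromatic atoms like the ones in this example.

```python
import re

# The CurlySMILES string from the output above, split across lines for width.
curly_smiles = (
    "F{FGA2}C{CG312}(F{FGA2})(C{CG312}(F{FGA2})"
    "(C{CG2O2}(O{OG311})=O{OG2D1})F{FGA2})"
    "C{CG302}(F{FGA3})(F{FGA3})F{FGA3}"
)

# An element symbol followed by its annotation in curly braces.
ATOM_WITH_TYPE = re.compile(r"([A-Z][a-z]?)\{([^{}]*)\}")

# (element, atom_type) pairs in the order the atoms appear in the string.
atom_types = [
    (match.group(1), match.group(2))
    for match in ATOM_WITH_TYPE.finditer(curly_smiles)
]

# Dropping every {...} annotation leaves an ordinary SMILES string.
plain_smiles = re.sub(r"\{[^{}]*\}", "", curly_smiles)

print(atom_types)
print(plain_smiles)
```

The recovered plain SMILES is the same string that was passed to initialize_globalchem_molecule, which shows that the annotations are added in place without reordering the atoms.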

Get CGenFF CXSMILES

A CXSMILES string is a SMILES string followed by a block delimited by | characters that carries features, or metadata, for each atom. This allows the user to embed whatever information they wish into the molecule.

We decided to embed atom-type information since the atom-type ordering is preserved.

print(global_chem_molecule.get_cgenff_cxsmiles())

Output:

O=C(O)C(F)(F)C(F)(F)C(F)(F)F |atomProp:0.atom_type.OG2D1:0.atom_idx.7:1.atom_type.CG2O2:1.atom_idx.5:2.atom_type.OG311:2.atom_idx.6:3.atom_type.CG312:3.atom_idx.3:4.atom_type.FGA2:4.atom_idx.4:5.atom_type.FGA2:5.atom_idx.8:6.atom_type.CG312:6.atom_idx.1:7.atom_type.FGA2:7.atom_idx.0:8.atom_type.FGA2:8.atom_idx.2:9.atom_type.CG302:9.atom_idx.9:10.atom_type.FGA3:10.atom_idx.10:11.atom_type.FGA3:11.atom_idx.11:12.atom_type.FGA3:12.atom_idx.12|
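
The atomProp block can be read back without any cheminformatics toolkit. The sketch below splits the CXSMILES output shown above into its SMILES part and a per-atom property dictionary. parse_atom_props is a hypothetical helper, it handles only the simple index.key.value entries seen here (no escaped characters), and reading atom_idx as the atom's position in the original input SMILES is our interpretation of the output rather than something stated in the source.

```python
# The CXSMILES output from above, split across lines for width.
cxsmiles = (
    "O=C(O)C(F)(F)C(F)(F)C(F)(F)F "
    "|atomProp:0.atom_type.OG2D1:0.atom_idx.7:1.atom_type.CG2O2:1.atom_idx.5"
    ":2.atom_type.OG311:2.atom_idx.6:3.atom_type.CG312:3.atom_idx.3"
    ":4.atom_type.FGA2:4.atom_idx.4:5.atom_type.FGA2:5.atom_idx.8"
    ":6.atom_type.CG312:6.atom_idx.1:7.atom_type.FGA2:7.atom_idx.0"
    ":8.atom_type.FGA2:8.atom_idx.2:9.atom_type.CG302:9.atom_idx.9"
    ":10.atom_type.FGA3:10.atom_idx.10:11.atom_type.FGA3:11.atom_idx.11"
    ":12.atom_type.FGA3:12.atom_idx.12|"
)


def parse_atom_props(cx):
    """Split 'SMILES |atomProp:...|' into (smiles, {atom_index: {key: value}})."""
    smiles, _, extension = cx.partition(" ")
    body = extension.strip().strip("|")
    props = {}
    prefix = "atomProp:"
    if body.startswith(prefix):
        for entry in body[len(prefix):].split(":"):
            # Each entry has the form <atom index>.<key>.<value>
            index, key, value = entry.split(".", 2)
            props.setdefault(int(index), {})[key] = value
    return smiles, props


smiles, props = parse_atom_props(cxsmiles)

# Assumption: atom_idx is the atom's position in the original input SMILES,
# so sorting by it restores the original atom-type order.
original_order = [
    atom["atom_type"]
    for atom in sorted(props.values(), key=lambda a: int(a["atom_idx"]))
]

print(smiles)
print(props[0])
print(original_order)
```

Under that assumption, the restored order matches the atom-type order in the CurlySMILES string, which is consistent with the statement that the atom-type ordering is preserved.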

Get CGenFF CXSMARTS

A CXSMARTS string is a SMARTS string followed by a block delimited by | characters that carries features, or metadata, for each atom. As with CXSMILES, this allows the user to embed whatever information they wish into the pattern.

cx_smarts = global_chem_molecule.get_cgenff_cxsmarts()
print(cx_smarts)

Output:

[#9]-[#6](-[#9])(-[#6](-[#9])(-[#6](-[#8])=[#8])-[#9])-[#6](-[#9])(-[#9])-[#9] |atomProp:0.atom_type.FGA2:0.atom_idx.0:1.atom_type.CG312:1.atom_idx.1:2.atom_type.FGA2:2.atom_idx.2:3.atom_type.CG312:3.atom_idx.3:4.atom_type.FGA2:4.atom_idx.4:5.atom_type.CG2O2:5.atom_idx.5:6.atom_type.OG311:6.atom_idx.6:7.atom_type.OG2D1:7.atom_idx.7:8.atom_type.FGA2:8.atom_idx.8:9.atom_type.CG302:9.atom_idx.9:10.atom_type.FGA3:10.atom_idx.10:11.atom_type.FGA3:11.atom_idx.11:12.atom_type.FGA3:12.atom_idx.12|
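
As a quick sanity check on the CXSMARTS output shown above, the sketch below pairs each atomic number in the SMARTS with the atom type at the same position and tallies the atom types. It uses only the standard library. The regular expressions are assumptions that fit this particular output: atoms written as [#N], and atom_type entries listed in ascending atom index.

```python
import re
from collections import Counter

# The CXSMARTS output from above, split across lines for width.
cx_smarts = (
    "[#9]-[#6](-[#9])(-[#6](-[#9])(-[#6](-[#8])=[#8])-[#9])"
    "-[#6](-[#9])(-[#9])-[#9] "
    "|atomProp:0.atom_type.FGA2:0.atom_idx.0:1.atom_type.CG312:1.atom_idx.1"
    ":2.atom_type.FGA2:2.atom_idx.2:3.atom_type.CG312:3.atom_idx.3"
    ":4.atom_type.FGA2:4.atom_idx.4:5.atom_type.CG2O2:5.atom_idx.5"
    ":6.atom_type.OG311:6.atom_idx.6:7.atom_type.OG2D1:7.atom_idx.7"
    ":8.atom_type.FGA2:8.atom_idx.8:9.atom_type.CG302:9.atom_idx.9"
    ":10.atom_type.FGA3:10.atom_idx.10:11.atom_type.FGA3:11.atom_idx.11"
    ":12.atom_type.FGA3:12.atom_idx.12|"
)

smarts, _, extension = cx_smarts.partition(" ")

# Atomic numbers in the order the atoms appear in the SMARTS pattern.
atomic_numbers = [int(n) for n in re.findall(r"\[#(\d+)\]", smarts)]

# Atom types in the order they are listed in the atomProp block.
atom_types = re.findall(r"\d+\.atom_type\.(\w+)", extension)

pairs = list(zip(atomic_numbers, atom_types))
counts = Counter(atom_types)

print(pairs)
print(counts)
```

Every fluorine (atomic number 9) pairs with an F-prefixed type, every carbon with a C-prefixed type, and every oxygen with an O-prefixed type, which is what we would expect if the atom-type ordering follows the SMARTS atom order.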