We can use the example located in the forcefield demo:
Load the Package
from global_chem_extensions import GlobalChemExtensions
ff = GlobalChemExtensions().forcefields()
Load the Molecule
As an example, we use the perfluorobutanoic acid entry in the repository: we load its SMILES string together with the stream file, in which the rank ordering is preserved.
# Load the Molecule
global_chem_molecule = ff.initialize_globalchem_molecule(
    'FC(F)(C(F)(C(O)=O)F)C(F)(F)F',
    stream_file='global-chem/example_data/forcefield_parameterization/perfluorobutanoic_acid.str',
    # frcmod_file='gaff2.frcmod',
)
Determine the Name
Determine the name of the compound based on the dictionary.
global_chem_molecule.determine_name()
name = global_chem_molecule.name
print(name)
perfluorobutanoic acid
Determine the Attributes of a Molecule
attributes = global_chem_molecule.get_attributes()
for k, v in attributes.items():
    print(f'{k}: {v}')
Example output (recorded for cyclopentane, C1CCCC1):
Attributes: {
    'name': 'cyclopentane',
    'smiles': 'C1CCCC1',
    'molecular_weight': 70.07825032,
    'logp': 1.9505000000000001,
    'h_bond_donor': 0,
    'h_bond_acceptors': 0,
    'rotatable_bonds': 0,
    'number_of_atoms': 5,
    'molar_refractivity': 23.084999999999994,
    'topological_surface_area_mapping': 0.0,
    'formal_charge': 0,
    'heavy_atoms': 5,
    'num_of_rings': 1
}
Convert to Interoperable Mol or SMILES Objects
For interoperability with other software, the molecule can be converted into each package's own object or string representation.
pysmiles_mol = global_chem_molecule.get_pysmiles()
rdkit_mol = global_chem_molecule.get_rdkit_molecule()
partial_smiles_mol = global_chem_molecule.get_partial_smiles()
deep_smiles_mol = global_chem_molecule.encode_deep_smiles()
selfies_mol = global_chem_molecule.encode_selfies()
validate_smiles = global_chem_molecule.validate_molvs()
print("RDKit Mol: %s" % rdkit_mol)
print("PySMILES: %s" % pysmiles_mol)
print("Partial SMILES: %s" % partial_smiles_mol)
print("DeepSMILES: %s" % deep_smiles_mol)
print("Selfies Mol: %s" % selfies_mol)
print("Validation: %s" % validate_smiles)
RDKit Mol: <rdkit.Chem.rdchem.Mol object at 0x7f7259a7cb20>
PySMILES: Graph with 5 nodes and 5 edges
Partial SMILES: Molecule(atoms=['Atom(elem=6,chg=0,idx=0)', 'Atom(elem=6,chg=0,idx=1)', 'Atom(elem=6,chg=0,idx=2)', 'Atom(elem=6,chg=0,idx=3)', 'Atom(elem=6,chg=0,idx=4)'])
DeepSMILES: CCCCC5
Selfies Mol: [C][C][C][C][C][Ring1][Branch1]
Validation: ['INFO: [FragmentValidation] pentane is present']
Draw CGenFF Molecule
global_chem_molecule.draw_cgenff_molecule(
    height=1000,
    width=1000
)
Create a CurlySMILES Representation of the SMILES and Atom Types
CurlySMILES places each atom's features inside curly braces directly after the atom, and is a stable SMILES form to pass through RDKit.
curly_smiles = global_chem_molecule.get_curly_smiles()
F{FGA2}C{CG312}(F{FGA2})(C{CG312}(F{FGA2})(C{CG2O2}(O{OG311})=O{OG2D1})F{FGA2})C{CG302}(F{FGA3})(F{FGA3})F{FGA3}
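The CurlySMILES string above can be taken apart with nothing more than the standard library. The following is a minimal sketch, not part of GlobalChem: the helper `split_curly_smiles` and its regular expression are hypothetical and assume every annotation directly follows a plain, unbracketed element symbol, as in this example.

```python
import re

# CurlySMILES string copied from the output above.
curly_smiles = (
    "F{FGA2}C{CG312}(F{FGA2})(C{CG312}(F{FGA2})"
    "(C{CG2O2}(O{OG311})=O{OG2D1})F{FGA2})"
    "C{CG302}(F{FGA3})(F{FGA3})F{FGA3}"
)

# Assumption: an element symbol immediately followed by {annotation}.
# Bracket atoms such as [NH4+] are not handled by this sketch.
ANNOTATED_ATOM = re.compile(r"([A-Z][a-z]?)\{([^{}]*)\}")


def split_curly_smiles(text):
    """Hypothetical helper: return (plain_smiles, [(element, annotation), ...])."""
    pairs = ANNOTATED_ATOM.findall(text)
    # Dropping every {...} group leaves the underlying SMILES string.
    plain = re.sub(r"\{[^{}]*\}", "", text)
    return plain, pairs


plain_smiles, atom_types = split_curly_smiles(curly_smiles)
print(plain_smiles)  # FC(F)(C(F)(C(O)=O)F)C(F)(F)F
for element, atom_type in atom_types:
    print(element, atom_type)
```

Stripping the annotations recovers the SMILES string that was passed to `initialize_globalchem_molecule`, and the atom types come out in the same order as the atoms in that input.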
Get CGenFF CXSMILES
A CXSMILES string is a SMILES string followed by a block delimited by | characters that holds a set of features, or metadata, attached to each atom. This allows the user to embed whatever information they wish into the molecule.
We decided to embed atom-type information since the atom-type ordering is preserved.
print(global_chem_molecule.get_cgenff_cxsmiles())
O=C(O)C(F)(F)C(F)(F)C(F)(F)F |atomProp:0.atom_type.OG2D1:0.atom_idx.7:1.atom_type.CG2O2:1.atom_idx.5:2.atom_type.OG311:2.atom_idx.6:3.atom_type.CG312:3.atom_idx.3:4.atom_type.FGA2:4.atom_idx.4:5.atom_type.FGA2:5.atom_idx.8:6.atom_type.CG312:6.atom_idx.1:7.atom_type.FGA2:7.atom_idx.0:8.atom_type.FGA2:8.atom_idx.2:9.atom_type.CG302:9.atom_idx.9:10.atom_type.FGA3:10.atom_idx.10:11.atom_type.FGA3:11.atom_idx.11:12.atom_type.FGA3:12.atom_idx.12|
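To illustrate how the embedded metadata can be read back, here is a minimal standard-library sketch that parses the `atomProp` block of the CXSMILES output above and uses the stored `atom_idx` values to recover the atom-type order of the original input SMILES. The function `parse_atom_props` is hypothetical, and it assumes the extension holds a single `atomProp` section with `index.key.value` entries separated by colons, as in this example.

```python
# CXSMILES string copied from the output above.
cxsmiles = (
    "O=C(O)C(F)(F)C(F)(F)C(F)(F)F "
    "|atomProp:0.atom_type.OG2D1:0.atom_idx.7:1.atom_type.CG2O2:1.atom_idx.5"
    ":2.atom_type.OG311:2.atom_idx.6:3.atom_type.CG312:3.atom_idx.3"
    ":4.atom_type.FGA2:4.atom_idx.4:5.atom_type.FGA2:5.atom_idx.8"
    ":6.atom_type.CG312:6.atom_idx.1:7.atom_type.FGA2:7.atom_idx.0"
    ":8.atom_type.FGA2:8.atom_idx.2:9.atom_type.CG302:9.atom_idx.9"
    ":10.atom_type.FGA3:10.atom_idx.10:11.atom_type.FGA3:11.atom_idx.11"
    ":12.atom_type.FGA3:12.atom_idx.12|"
)


def parse_atom_props(cx):
    """Hypothetical helper: return (smiles, {atom_index: {key: value}})."""
    smiles, _, extension = cx.partition(" |")
    body = extension.rstrip("|")
    prefix = "atomProp:"
    # Assumption: the extension contains only one atomProp section.
    if not body.startswith(prefix):
        raise ValueError("expected a single atomProp section")
    props = {}
    for entry in body[len(prefix):].split(":"):
        index, key, value = entry.split(".", 2)
        props.setdefault(int(index), {})[key] = value
    return smiles, props


smiles, props = parse_atom_props(cxsmiles)

# Sort atoms by their stored original index to recover the input ordering.
by_input_order = sorted(props.values(), key=lambda p: int(p["atom_idx"]))
original_types = [p["atom_type"] for p in by_input_order]
print(smiles)
print(original_types)
```

Sorting on `atom_idx` undoes the reordering introduced when the SMILES was rewritten, so the resulting list starts with `FGA2`, `CG312`, `FGA2`, matching the first three atoms of the input SMILES.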
Get CGenFF CXSMARTS
A CXSMARTS string is a SMARTS string followed by a block delimited by | characters that holds a set of features, or metadata, attached to each atom. This allows the user to embed whatever information they wish into the molecule.
cx_smarts = global_chem_molecule.get_cgenff_cxsmarts()
[#9]-[#6](-[#9])(-[#6](-[#9])(-[#6](-[#8])=[#8])-[#9])-[#6](-[#9])(-[#9])-[#9] |atomProp:0.atom_type.FGA2:0.atom_idx.0:1.atom_type.CG312:1.atom_idx.1:2.atom_type.FGA2:2.atom_idx.2:3.atom_type.CG312:3.atom_idx.3:4.atom_type.FGA2:4.atom_idx.4:5.atom_type.CG2O2:5.atom_idx.5:6.atom_type.OG311:6.atom_idx.6:7.atom_type.OG2D1:7.atom_idx.7:8.atom_type.FGA2:8.atom_idx.8:9.atom_type.CG302:9.atom_idx.9:10.atom_type.FGA3:10.atom_idx.10:11.atom_type.FGA3:11.atom_idx.11:12.atom_type.FGA3:12.atom_idx.12|
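As one possible use of the CXSMARTS output above, the sketch below pairs each `[#n]` atomic-number primitive with its embedded atom type and groups the atom types by element. It uses only the standard library and is an illustration rather than a GlobalChem feature; it assumes every atom in the SMARTS is written as a simple `[#n]` primitive and carries an `atom_type` entry, which holds for this example.

```python
import re
from collections import defaultdict

# CXSMARTS string copied from the output above.
cxsmarts = (
    "[#9]-[#6](-[#9])(-[#6](-[#9])(-[#6](-[#8])=[#8])-[#9])-[#6](-[#9])(-[#9])-[#9] "
    "|atomProp:0.atom_type.FGA2:0.atom_idx.0:1.atom_type.CG312:1.atom_idx.1"
    ":2.atom_type.FGA2:2.atom_idx.2:3.atom_type.CG312:3.atom_idx.3"
    ":4.atom_type.FGA2:4.atom_idx.4:5.atom_type.CG2O2:5.atom_idx.5"
    ":6.atom_type.OG311:6.atom_idx.6:7.atom_type.OG2D1:7.atom_idx.7"
    ":8.atom_type.FGA2:8.atom_idx.8:9.atom_type.CG302:9.atom_idx.9"
    ":10.atom_type.FGA3:10.atom_idx.10:11.atom_type.FGA3:11.atom_idx.11"
    ":12.atom_type.FGA3:12.atom_idx.12|"
)

smarts, _, extension = cxsmarts.partition(" |")

# Atomic numbers in the order the atoms appear in the SMARTS pattern.
atomic_numbers = [int(n) for n in re.findall(r"\[#(\d+)\]", smarts)]

# (atom index, atom type) pairs pulled from the atomProp block.
typed_atoms = re.findall(r"(\d+)\.atom_type\.(\w+)", extension)

types_by_element = defaultdict(set)
for index, atom_type in typed_atoms:
    types_by_element[atomic_numbers[int(index)]].add(atom_type)

for atomic_number in sorted(types_by_element):
    print(atomic_number, sorted(types_by_element[atomic_number]))
```

For this molecule the grouping shows three carbon types, two oxygen types, and two fluorine types.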