
AI FOR MEDICINE
Analytics for Discovery
Merging the world of atoms with the
world of digital bits through modeling

FILES
Click to Download

White paper
BME TOOLKIT Toolkit Demonstration
Why we are unique: out-of-the-box "Easy AI" office software
Subscribe to us on Medium and to our newsletter for updates on the intersection of AI and pharma! Learn how artificial intelligence works and see how cool general pharmacology and chemical engineering can be.
Case Study
White Paper
Case Study
Subscribe to our cloud for $50 a month, or download the Basic Edition VirtualBox desktop software for $400.

Email pc419714@ohio.edu to set up the cloud or to purchase either option above.

Available options: 1-year or 2-year cloud contracts, 1-month cloud, Desktop edition, hourly contracting.

**With the cloud contracts, we can work out one-on-one strategies and schedules for your goals.
​
WHAT THE SOFTWARE CAN DO
Basic Edition (requires Virtualbox)
Target Prediction
Look up a SMILES string by typing in the molecule name and pressing Submit.
Support Vector Machine
46 models. Metrics for test data:
accuracy 97.59 +/- 2.41
sensitivity 91.9 +/- 8.1
specificity 98.6 +/- 1.4
215/247 (87%) correct mechanism on independent test set.
======================================
Chembl Target Prediction
Multiclass Classifier
Number of unique targets 560
Ion channel 5
kinase 96
nuclear receptor 21
GPCR 180
Others 258
accuracy .87
auc .92
sensitivity .76
specificity .92
precision .82
225/225 (100%) correct mechanism on independent test set. Note: for a given target, 1 is the positive class and 0 is the negative class.
======================================
Interpreting output: For the target predictions, green marks regions of the molecule that contribute positively to the tested property, red marks regions that contribute negatively, and gray marks regions with no detected contribution. For more on this method, please read "Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methods."
======================================
The output folder inside the targetprediction folder should contain a .png image for each input SMILES. Under the images menu, make sure to change the directory to the output directory of the targetprediction folder. Since there are 46 models, it is best to submit only a few SMILES at a time.
======================================
Creating your own models: Assay data can be found on PubChem (for example, https://pubchem.ncbi.nlm.nih.gov/#query=interferon&tab=assay); also see the ChEMBL bioassays. Each assay must be saved as a .txt file with two columns: the first for the SMILES and the second for either 1 (active) or 0 (inactive). Place the text file in the targetprediction folder, and include the name of the assay in the file name. You want a model with both good sensitivity and good specificity (each as close to 1 as possible). Note that a model can appear highly accurate, but if its sensitivity is zero, it does not detect any positives.
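As a rough illustration of the two-column file format described above, the following sketch reads an assay file and counts actives and inactives before training. The function name `load_assay`, the file name `example_assay.txt`, the three example compounds, and the use of whitespace as the column separator are all assumptions made for this example; the toolkit's own reader may differ.

```python
from collections import Counter
from pathlib import Path
import tempfile


def load_assay(path):
    """Read a two-column assay file: SMILES, then 1 (active) or 0 (inactive).

    Assumes columns are separated by whitespace (an assumption, not
    something the toolkit documentation specifies).
    """
    smiles, labels = [], []
    lines = Path(path).read_text().splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2 or parts[1] not in {"0", "1"}:
            raise ValueError(f"line {line_no}: expected '<smiles> <0|1>', got {line!r}")
        smiles.append(parts[0])
        labels.append(int(parts[1]))
    return smiles, labels


# Hypothetical example data, written to a temporary file for the demo.
example = "CCO 0\nc1ccccc1O 1\nCC(=O)O 0\n"
with tempfile.TemporaryDirectory() as tmp:
    assay_file = Path(tmp) / "example_assay.txt"
    assay_file.write_text(example)
    smiles, labels = load_assay(assay_file)

counts = Counter(labels)
print(f"{len(smiles)} compounds: {counts[1]} active, {counts[0]} inactive")
```

Checking the active/inactive counts first is useful because a file with very few actives is exactly the case where a model can look accurate while having zero sensitivity.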
======================================
confusion matrix
tn fp
fn tp
Note that row 1, column 1 is the true negative count, NOT the true positive count as you might expect from a statistics class. A model with nonzero sensitivity will not have 0 in the bottom-right corner (tp). If you are not getting good sensitivity and specificity, try changing the penalty from C=500000 to some other value. By default the SVC uses an RBF kernel, but this can be changed as described in the scikit-learn documentation. The trained models are saved as .pkl files that can be loaded later for reuse.
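To make the matrix layout concrete, here is a minimal sketch that computes accuracy, sensitivity, and specificity from a matrix laid out as tn, fp on the first row and fn, tp on the second. The function name `metrics` and both example matrices are hypothetical and are not output from the toolkit.

```python
def metrics(cm):
    """Compute metrics from a confusion matrix laid out as [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: fraction of true actives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of true inactives that were rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical model that calls every compound inactive (95 inactives, 5 actives).
always_negative = [[95, 0],
                   [5, 0]]

# Hypothetical model that finds 4 of the 5 actives.
useful = [[90, 5],
          [1, 4]]

print(metrics(always_negative))  # accuracy 0.95, but sensitivity 0.0
print(metrics(useful))
```

The first matrix has 0 in the bottom-right corner, so the model looks 95% accurate while detecting no positives at all.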
======================================
Pan Assay Interference
See "Seven Year Itch: Pan-Assay Interference Compounds (PAINS) in 2017 - Utility and Limitations" and "New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays." Pan Assay Interference Compounds commonly produce false positives in biological screening assays.
Since they bind to everything, they are not selective and therefore do not make good drug candidates. We found that the higher the drug score in Data Warrior (http://www.openmolecules.org/datawarrior/), the lower the frequency of compounds containing PAINS. To take advantage of this, use Data Warrior's evolutionary algorithm (be sure to use the wand tool if you want to fix the scaffold): calculate the drug scores (macro → run macro → calculate properties), take the top 5 scoring compounds as starting points for the next run, and repeat for a few runs until you get drug scores greater than 0.9. Select based on SkelSpheres similarity, and the algorithm will generate a large number of compounds with high drug scores, which are oftentimes free of PAINS ("painless").
The program will tell you which functional groups in each compound were responsible for a positive PAINS test result. The program also reports the fraction of sp3-hybridized carbons. Compounds with a fraction above 0.47 are more selective binders. Note that double bonds reduce the sp3 fraction, as they make the compound flatter. See "Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success." PAINS are defined as follows:
Doveston R, et al. A Unified Lead-oriented Synthesis of over Fifty Molecular Scaffolds. Org Biomol Chem 13 (2014) 859-65. doi:10.1039/C4OB02287D
Jadhav A, et al. Quantitative Analyses of Aggregation, Autofluorescence, and Reactivity Artifacts in a Screen for Inhibitors of a Thiol Protease. J Med Chem 53 (2009) 37-51. doi:10.1021/jm901070c
======================================
Fragmenter
Input a list of SMILES. These will be fragmented and recombined into new combinations. If you take the lowest-energy ligands from a docking program and recombine them, some of the new compounds may bind with lower energy than the originals.
======================================
Make Spreadsheet
Input SMILES. The output is a spreadsheet called test.xlsx, saved in the targetprediction folder, that contains images of the molecules.
======================================
Solubility
Predicts log S using linear regression. A log S greater than -4 is considered soluble.
Root mean square error of 1.27 on a log S scale from -4 to 4.
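As a small worked example of the two numbers quoted above, the sketch below shows how a root mean square error is computed and how the log S > -4 cutoff turns a prediction into a soluble/insoluble call. The measured and predicted values are made up for illustration, and the helper names `rmse` and `is_soluble` are hypothetical, not part of the toolkit.

```python
import math

SOLUBLE_THRESHOLD = -4.0  # cutoff quoted in the text: log S greater than -4 is soluble


def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def is_soluble(log_s):
    """Apply the log S > -4 rule (a value of exactly -4 is not counted as soluble)."""
    return log_s > SOLUBLE_THRESHOLD


# Made-up log S values for four hypothetical compounds.
measured = [-1.0, -3.0, -5.0, -2.0]
predicted = [-2.0, -3.0, -4.0, -2.0]

print(f"RMSE: {rmse(measured, predicted):.3f}")
print([is_soluble(value) for value in predicted])
```

With a reported RMSE of 1.27, a prediction that lands within roughly one log unit of -4 is probably best treated as borderline rather than as a firm soluble or insoluble call.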
======================================
Build a SAR model
cross entropy: the default loss function for binary classification problems. Summarizes the average difference between the actual and predicted probabilities.
hinge: an alternative to cross entropy for binary classification, developed for use with support vector machine models.
mse: the default loss for regression problems. Calculated as the average of the squared differences between the predicted and actual values.
mae: for regression problems where there are outliers. Calculated as the average of the absolute differences between the actual and predicted values.
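The four loss functions listed above can be written out in a few lines each. This is a plain-Python sketch for illustration only, not the toolkit's own implementation (which the site says is based on Keras); the function names, the 0/1-to-±1 label mapping used for hinge, and the example numbers are assumptions.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def hinge(y_true, scores):
    """Average hinge loss; 0/1 labels are mapped to -1/+1 (an assumption here)."""
    return sum(max(0.0, 1 - (2 * y - 1) * s) for y, s in zip(y_true, scores)) / len(y_true)


def mse(y_true, y_pred):
    """Mean of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean of absolute differences."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


# Made-up regression example where the last prediction is far off.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 14.0]
print(mse(actual, predicted))  # 25.0
print(mae(actual, predicted))  # 2.5

# Made-up classification example.
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
print(hinge([1, 0], [2.0, -0.5]))
```

The single bad prediction dominates mse (25.0) far more than mae (2.5), which is why mae is the option suggested when the data contain outliers.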
=============
Substructure Search
=============
Wash Library
=============
Library Creation
=============
3D Coordinate SDF Creator
==============
Molecular Descriptors
==============
Compound Report Card
======================
Autodock
======================
Draw Compounds
=====================
QED calculation and Molecular Property Viewer
======================
Grid Search
=========================
Lennard Jones Energy Profile of Protein Ligand Complex
=======================
Quantum Energy Minimization
=====================
Autoencoder Chemical Generation
​
​
​
​
Modulator of glutamate NMDA receptor: pink is inactive, green is active
46 pre-built models

Partner With Us!
DIGITAL DRUG DISCOVERY SERVICES
Add us to your pipeline!
If you're an academic lab, startup, clinical trial company, chemical company, regulatory agency, AI/software company, or patent attorney, we would love to hear from you! In recent years, contract research organizations like ours have partnered with pharmaceutical companies to provide data mining and custom algorithm creation for their customers' pipelines. Whether you need a long-term contract for us to screen your compounds or just a one-time job, get in touch. We can create compound report cards for you and design libraries.
​
Compare Us to Our Competitors:
https://blog.benchsci.com/startups-using-artificial-intelligence-in-drug-discovery
​
Example Project Outcome
​
For my master's project at Ohio University, I designed a compound library based on the core structures for the lab's Interferon Regulatory Factor 3 (IRF3) inhibitor patent space. We generated hundreds of thousands of possible structures based on 8 core scaffolds in Data Warrior. We then docked these to the IRF3 protein using Rosetta to find the best binders, and narrowed the set down to the 64 best compounds. Using the toolkit we developed, we found potential side effects and potential off-target proteins. We generated QSAR toxicity maps for each of the compounds, built machine learning models based on the best binders, and calculated solubility, druglikeness, and potential energy.
​
In the future, we hope to build upon this by adding algorithms for absorption, distribution, metabolism, and excretion (ADME), dosage curves, and more. We would like to partner with companies that need analytics for clinical trials or the initial stages of drug discovery.

CONTACT US
​
​
​
Contact us for a Customer Success Strategy!!
​
Partnering with us means we can also do medical writing and grant writing for you. Partnerships are usually for long-term projects with a high volume of compounds and customers.
What are your biggest challenges?
What metrics is your CEO most interested in?
Why are you looking for new software?
When can we give you a demonstration or a sales presentation?
How many compounds do you have to screen? What is the problem worth to you?
​
Tell us what you would like in the software for your particular chemical need and we will build it!!
​
Tell us about your problem and your business!
We can set up a business plan with you.
​
Request a free demo in your browser: up to an hour with me over Chrome Remote Desktop.
​
440-897-6916

CV
Background Details
Chief Technology Officer
BA BIOLOGY
Aug 2008- May 2012
Education
B.A. in Biology with a minor in cognitive linguistics, Case Western Reserve University, 2012
FULBRIGHT/SWISS GOVERNMENT SCHOLARSHIP
Aug 2012-May 2013
École polytechnique fédérale de Lausanne
POST-BACC
Aug 2013-May 2015
Cleveland State University. Took classes and prepared for the MCAT
MEDICAL SCHOOL Y1 AND Y2
May 2015- Mar 2018
Ohio University Heritage College of Osteopathic Medicine. I completed years 1 and 2 of medical school. I came close to passing my board exam, and rather than continuing to struggle, I decided to pursue my other scientific strengths. I have always been interested in AI, and fortunately for me, this was the right time to make a switch, as many medical fields are heading in this direction.
MS- BIOMEDICAL ENGINEERING
2020
Ohio University Russ College of Engineering

CV
Background Details
Chief Financial Officer, CEO
May 2012
Bachelor of Science Chemistry, May 2012
2012-2013
Geochemical Simulations Technician, Global Resource Engineering, Aurora, Colorado
2013-2014
Organic Extractions Technician, Curtis & Thompkins Ltd., Berkeley, California
2014-2015
Laboratory Manager, Environmental and Plant Biology, Ohio University, Athens, Ohio
2015-2018
Research Assistant, Environmental and Plant Biology, Ohio University, Athens, Ohio
2018-Present
Research Assistant, Ohio University Genomics Facility, Ohio University, Athens, Ohio
PhD, Molecular and Cell Biology
2019 Ohio University

About Us
We have not launched yet; the site is under construction. The software was created in the lab of Dr. Sumit Sharma and Dr. Douglas Goetz by Patrick Chirdon as part of his master's thesis. We created a virtual compound library using Data Warrior and performed protein-ligand docking with Rosetta. However, no existing tool integrated QSAR models for target prediction into existing pipelines in an easy-to-use GUI, so we created this toolkit. The software is built on the TensorFlow, Keras, and RDKit Python modules. The project began in 2019.