
AI FOR MEDICINE

Analytics for Discovery

Merging the world of atoms with the

world of digital bits through modeling

FILES


White paper

BME TOOLKIT                Toolkit Demonstration

Why we are unique: out-of-the-box "Easy AI" office software

Follow us on Medium and subscribe to our newsletter for updates on the intersection of AI and pharma! Learn how artificial intelligence works and see how general pharmacology and chemical engineering can be really cool.

 

Case Study

Subscribe to Our Newsletter



Subscribe to our cloud edition for $50 a month, or download the Basic Edition desktop software (runs in VirtualBox) for $400.

Email pc419714@ohio.edu to set up the cloud or to purchase either option above.

Available plans: 1-month cloud subscription, 1-year or 2-year cloud contracts, Desktop edition, and hourly contracting.

**With the cloud contracts, we can work out one-on-one strategies and schedules for your goals.

WHAT THE SOFTWARE CAN DO

Basic Edition (requires VirtualBox)

Target Prediction

Look up a SMILES string by typing in the molecule name and pressing Submit.
Support Vector Machine
46 models. Metrics on test data (percent):
accuracy 97.59 +/- 2.41
sensitivity 91.9 +/- 8.1
specificity 98.6 +/- 1.4
215/247 (87%) correct mechanism on an independent test set.
======================================
ChEMBL Target Prediction
Multiclass Classifier
Number of unique targets 560
Ion channel 5
kinase 96
nuclear receptor 21
GPCR 180
Others 258
accuracy .87
auc .92
sensitivity .76
specificity .92
precision .82
225/225 (100%) correct mechanism on an independent test set. Note: for a given target, 1 is the positive class and 0 is the negative class.
======================================
Interpreting output: For the target predictions, green marks regions of the molecule that contribute positively to the tested property, red marks regions that contribute negatively, and gray marks regions with no detected contribution. For more on this method, see "Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methods."
======================================
Inside the targetprediction folder, the output folder should contain a .png image for each of the SMILES. Under the Images menu, change the directory to the output directory of the targetprediction folder. Because there are 46 models, it is best to submit only a few SMILES at a time.
======================================
Creating your own models: Find assay data on PubChem (for example, https://pubchem.ncbi.nlm.nih.gov/#query=interferon&tab=assay) or in the ChEMBL bioassays. Each assay must be saved as a .txt file with two columns: the first for the SMILES and the second for 1 or 0 (active and inactive, respectively). Place the text file in the targetprediction folder, and include the name of the assay in the file name. You want a model with both good sensitivity and good specificity (each as close to 1 as possible). Note that a model can appear highly accurate, but if its sensitivity is zero, it does not detect any positives.
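As a rough illustration of the two-column training file described above, here is a minimal sketch of a loader. The function name parse_assay_text and the whitespace delimiter are assumptions made for this example; the toolkit's own reader may differ.

```python
def parse_assay_text(text):
    """Parse two-column assay text: SMILES, then 1 (active) or 0 (inactive).

    Assumption: columns are separated by whitespace (space or tab).
    Returns (smiles_list, label_list); raises ValueError on a malformed row.
    """
    smiles, labels = [], []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 2 or parts[1] not in ("0", "1"):
            raise ValueError(f"line {line_number}: expected '<SMILES> <0|1>'")
        smiles.append(parts[0])
        labels.append(int(parts[1]))
    return smiles, labels


# Hypothetical file contents: ethanol marked inactive, phenol marked active.
example = "CCO 0\nc1ccccc1O 1\n"
smiles, labels = parse_assay_text(example)
print(smiles, labels)
```

Checking the share of 1s in the label list before training is a quick way to spot a heavily imbalanced assay, which is the usual cause of a model that looks accurate but has zero sensitivity.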
======================================
confusion matrix
tn fp
fn tp
Note that row 1, column 1 is the true negative count, NOT the true positive count as you might expect from a statistics class. A sensitive model will not have 0 in the bottom right corner (tp). If you are not getting good sensitivity and specificity, try changing the penalty parameter C=500000 to another value. By default the SVC uses an RBF kernel, but this can be changed as described in the scikit-learn documentation. The trained models are saved as .pkl files that can be loaded later for reuse.
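To make the layout above concrete, the following sketch computes accuracy, sensitivity, and specificity from a matrix laid out as [[tn, fp], [fn, tp]]. The function name and the example counts are made up for illustration and are not toolkit output.

```python
def metrics_from_confusion(matrix):
    """Compute metrics from a 2x2 confusion matrix laid out as [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: fraction of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of actual negatives that were rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up example: 95 inactives all classified correctly, 5 actives all missed.
print(metrics_from_confusion([[95, 0], [5, 0]]))
```

In this example the accuracy is 0.95 even though the sensitivity is 0, which is the situation the text warns about: the bottom right corner (tp) is 0, so the model never detects a positive.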
======================================
Pan Assay Interference
See "Seven Year Itch: Pan-Assay Interference Compounds (PAINS) in 2017: Utility and Limitations" and "New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays." Pan-assay interference compounds commonly produce false positives in biological screening assays.
Because they interact with many targets nonspecifically, they are not selective and therefore do not make good drug candidates. We found that the higher the drug score in DataWarrior (http://www.openmolecules.org/datawarrior/), the lower the frequency of compounds containing PAINS. Using DataWarrior's evolutionary algorithm (be sure to use the wand tool if you want to fix the scaffold), run a few rounds of evolution: calculate the drug scores (macro → run macro → calculate properties), take the top 5 scoring compounds as starting points for the next round, and repeat until you get drug scores greater than 0.9. Select based on SkelSpheres similarity, and the algorithm will generate a large number of compounds with high drug scores, which are often free of PAINS.
The program reports which functional groups in each compound were responsible for a positive PAINS result. The program also reports the fraction of sp3-hybridized carbons. Compounds with a fraction greater than 0.47 are more selective binders. Note that double bonds reduce the sp3 fraction, as they make the compound flatter. See "Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success." PAINS are defined in the following references:
Doveston R, et al. A Unified Lead-oriented Synthesis of over Fifty Molecular Scaffolds. Org Biomol Chem 13 (2014) 859-865. doi:10.1039/C4OB02287D
Jadhav A, et al. Quantitative Analyses of Aggregation, Autofluorescence, and Reactivity Artifacts in a Screen for Inhibitors of Thiol Protease. J Med Chem 53 (2009) 37-51. doi:10.1021/jm901070c
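The fraction of sp3-hybridized carbons mentioned above is simply the number of sp3 carbons divided by the total number of carbons. The sketch below works from hand-counted atoms; the function names are hypothetical, and only the 0.47 cutoff comes from the text. In an RDKit-based workflow, rdMolDescriptors.CalcFractionCSP3 computes the same quantity directly from a molecule.

```python
FSP3_CUTOFF = 0.47  # cutoff quoted in the text above


def fraction_sp3(sp3_carbons, total_carbons):
    """Fraction of carbons that are sp3 hybridized (Fsp3)."""
    if total_carbons <= 0:
        raise ValueError("molecule must contain at least one carbon")
    return sp3_carbons / total_carbons


def passes_fsp3_cutoff(sp3_carbons, total_carbons):
    """True when Fsp3 is above the 0.47 cutoff."""
    return fraction_sp3(sp3_carbons, total_carbons) > FSP3_CUTOFF


# Hand-counted examples: (sp3 carbons, total carbons)
examples = {
    "cyclohexane": (6, 6),  # fully saturated ring
    "cyclohexene": (4, 6),  # one double bond removes two sp3 carbons
    "benzene": (0, 6),      # flat aromatic ring, no sp3 carbons
}
for name, (sp3, total) in examples.items():
    print(name, round(fraction_sp3(sp3, total), 2), passes_fsp3_cutoff(sp3, total))
```

The cyclohexane to cyclohexene step shows the effect described in the text: adding one double bond lowers the fraction from 1.0 to about 0.67.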
======================================
Fragmenter
Input a list of SMILES. The fragments will be recombined into new combinations. When you take the lowest-energy ligands from a docking program and recombine them, some of the new compounds may bind with lower energy than the originals.
======================================
Make Spreadsheet
Input SMILES. The output is a spreadsheet called test.xlsx, saved in the targetprediction folder, that contains images of the molecules.
======================================
Solubility
Predicts log S using linear regression. Compounds with log S greater than -4 are considered soluble.
Root mean square error of 1.27 on a scale from -4 to 4.
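For reference, this is how the solubility cutoff and the root mean square error quoted above can be computed. It is a minimal sketch with made-up log S values, not the toolkit's regression code.

```python
import math

SOLUBLE_CUTOFF = -4.0  # log S cutoff quoted in the text above


def is_soluble(log_s):
    """A compound is treated as soluble when log S is greater than -4."""
    return log_s > SOLUBLE_CUTOFF


def rmse(actual, predicted):
    """Root mean square error between measured and predicted log S values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up measured and predicted log S values for illustration only.
measured = [-1.0, -3.0]
predicted = [-2.0, -2.0]
print(rmse(measured, predicted))
print([is_soluble(value) for value in predicted])
```

With an error of about 1.27 log units, predictions close to the -4 cutoff are best treated as borderline rather than as a firm soluble or insoluble call.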
======================================
Build a SAR model
cross entropy: the default loss function for binary classification problems. It summarizes the average difference between the actual and predicted probability distributions.
hinge: an alternative to cross entropy for binary classification, developed for use with support vector machine (SVM) models.
mse (mean squared error): the default loss for regression problems. Calculated as the average of the squared differences between the predicted and actual values.
mae (mean absolute error): a loss for regression problems, used in cases where there are outliers. Calculated as the average of the absolute differences between the actual and predicted values.
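The four losses above can be written out in a few lines of plain Python. This is an illustrative sketch of the standard definitions, not the Keras implementations; the function names are chosen for this example, and the hinge loss assumes labels encoded as -1 and +1.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] for labels in {0, 1}."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def hinge_loss(y_true, scores):
    """Mean of max(0, 1 - y*score); assumes labels encoded as -1 or +1."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# One large outlier among otherwise perfect predictions.
actual = [0.0, 0.0, 0.0, 0.0]
predicted = [0.0, 0.0, 0.0, 10.0]
print(mean_squared_error(actual, predicted))
print(mean_absolute_error(actual, predicted))
```

In the outlier example, the squared error grows with the square of the miss while the absolute error grows linearly, which is why mae is the usual choice when the data contain outliers.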
=============

Substructure Search

=============

Wash Library

=============

Library Creation

=============

3D Coordinate SDF Creator

==============

Molecular Descriptors

==============

Compound Report Card

======================

Autodock

======================

Draw Compounds

=====================

QED calculation and Molecular Property Viewer

======================

Grid Search

=========================

Lennard Jones Energy Profile of Protein Ligand Complex

=======================

Quantum Energy Minimization

=====================

Autoencoder Chemical Generation


 

Modulator of the glutamate NMDA receptor: pink is inactive, green is active

46 pre-built models

Partner With Us!

DIGITAL DRUG DISCOVERY SERVICES

Add us to your pipeline!

If you're an academic lab, startup, clinical trial company, chemical company, regulatory agency, AI/software company, or patent attorney, we would love to hear from you! In recent years, contract research organizations like ours have partnered with pharmaceutical companies to provide data mining and custom algorithm creation for their customers' pipelines. Whether you need a long-term contract for us to screen your compounds or just a one-time job, get in touch. We can create compound report cards and design libraries for you.

Compare Us to Our Competitors--

https://www.owler.com/company/schrodinger?fbclid=IwAR1IdA34VSP6sYOd3fFTT6npIPjjSn5GYaHOnlhuBHdIwO49op8kEaBmQ9s

 

https://blog.benchsci.com/startups-using-artificial-intelligence-in-drug-discovery

Example Project Outcome

For my master's project at Ohio University, I designed a compound library based on the core structures for the lab's Interferon Regulatory Factor 3 (IRF3) inhibitor patent space. We generated hundreds of thousands of possible structures from 8 core scaffolds in DataWarrior, then docked them to the IRF3 protein using Rosetta to find the best binders.

We narrowed these down to the 64 best compounds. Using the toolkit we developed, we identified potential side effects and potential off-target proteins. We generated QSAR toxicity maps for each of the compounds, built machine learning models based on the best binders, and calculated solubility, druglikeness, and potential energy.

In the future, we hope to build on this by adding algorithms for absorption, distribution, metabolism, and excretion (ADME), dosage curves, and more. We would like to partner with companies that need analytics in clinical trials or in the initial stages of drug discovery.

CONTACT US


Contact us for a Customer Success Strategy!

Partnering with us means we can do medical writing and write grants for you. Partnerships are usually for long-term projects with a high volume of compounds and customers.

 

What are your biggest challenges?

What metrics is your CEO most interested in?

Why are you looking for new software?

When can we give you a demonstration or a sales presentation?

How many compounds do you have to screen?  What is the problem worth to you?

Tell us what you would like in the software for your particular chemical need and we will build it!!

Tell us about your problem and your business!

We can set up a business plan with you.

Request a free demo in your browser: up to an hour with me over Chrome Remote Desktop.

440-897-6916



CV

Background Details

Chief Technology Officer

BA BIOLOGY

Aug 2008- May 2012

Education
B.A. in Biology, minor in Cognitive Linguistics, Case Western Reserve University, 2012

FULBRIGHT/SWISS GOVERNMENT SCHOLARSHIP

Aug 2012-May 2013

École polytechnique fédérale de Lausanne

POST-BACC

Aug 2013-May 2015

Cleveland State University. Took classes and prepared for the MCAT

MEDICAL SCHOOL Y1 AND Y2

May 2015- Mar 2018

Ohio University Heritage College of Osteopathic Medicine. I completed years one and two of medical school. I was close to passing my board exam, and rather than continuing to struggle, I decided to pursue other scientific strengths. I had always been interested in AI, and fortunately this was the right time to make a switch, as many medical fields are heading in this direction.

MS- BIOMEDICAL ENGINEERING

2020

Ohio University Russ College of Engineering


CV

Background Details

Chief Financial Officer, CEO

May 2012

Bachelor of Science Chemistry, May 2012

2012-2013

Geochemical Simulations Technician, Global Resource Engineering, Aurora, Colorado

2013-2014

Organic Extractions Technician, Curtis & Thompkins Ltd., Berkeley, California

2014-2015

Laboratory Manager, Environmental and Plant Biology, Ohio University, Athens, Ohio

2015-2018

Research Assistant, Environmental and Plant Biology, Ohio University, Athens, Ohio

2018-Present

Research Assistant, Ohio University Genomics Facility, Ohio University, Athens, Ohio

PhD, Molecular and Cell Biology

2019 Ohio University


About Us

We have not launched yet; the site is under construction. The software was created in the lab of Dr. Sumit Sharma and Dr. Douglas Goetz by Patrick Chirdon as part of his master's thesis. We created a virtual compound library using DataWarrior and performed protein-ligand docking with Rosetta. However, no existing tool easily integrated QSAR models for target prediction with existing pipelines in an easy-to-use GUI, so we created this toolkit. The software is built on the TensorFlow, Keras, and RDKit Python modules. The project began in 2019.
