MACHINE LEARNING AND

Identifying Chemical Compounds (Avoiding Redundancies)

Big data is also helping pharmaceutical companies identify chemical compounds that are most likely to produce the outcome they seek. As a result, companies will discover medications that they otherwise would never have encountered and they will waste less time and money trying out drugs that prove to be ineffective or dangerous.


Traditionally, when researchers began to design a trial and form hypotheses, they were mainly drawing upon their own knowledge of medical literature and past experiments. Now, they are increasingly able to draw upon decades of different studies and trials. Just as important, data science tools empower them to identify patterns in that massive data set that might lead them down a different path in the lab.


In a 2016 interview with Applied Clinical Trials, James Streeter of Oracle Health Sciences described how data analysis has accelerated the work that groups like the National Cancer Institute can do.

“The National Cancer Institute was able to recently cross-reference the relationships between 15,000 genes and five major cancer types, across 20 million medical publication abstracts. It also cross-referenced genes from 60 million patients. This enabled NCI to gain a deeper understanding of the network of gene-cancer interactions and the state of research in relation to cohort groups treated.”

Heuristic models also introduce a great deal of bias; for example, last- or first-click models can place unwarranted emphasis on retargeting or Google search as effective ad targeting platforms. OK, so if heuristic models are ineffective, what is effective? Again, this guidebook doesn’t cover every possible approach and model, but the most popular and effective options that data scientists at Dataiku have tested with real-life customers.