Blog


Updates

Mendel AI Joins NVIDIA Inception Program to Accelerate AI Innovations in Life Sciences

Mendel AI, a leader in clinical AI for life sciences, has joined NVIDIA Inception, a program that nurtures startups with advanced technology. This collaboration grants Mendel access to NVIDIA's cutting-edge tools, including NIM inference microservices, enhancing its Hypercube AI solution. This will enable Mendel to develop sophisticated, reliable, and explainable AI tailored for medical decision-making. CEO Karim Galil highlighted this milestone in improving the platform’s capability to process complex medical data. Supported by investors like Oak HC/FT and DCM, Mendel AI combines large language models with a proprietary clinical hypergraph to deliver scalable, explainable clinical reasoning.


SAN JOSE, Calif. - July 15, 2024 /PRNewswire/

Mendel AI, a leader in clinical AI for the life sciences industry, today announced it has joined NVIDIA Inception, a program that nurtures startups revolutionizing industries with technological advancements.

By joining NVIDIA Inception, Mendel will receive access to NVIDIA’s industry-leading technology, including the latest NVIDIA NIM inference microservices to accelerate Mendel's Hypercube AI solution, and technical expertise in artificial intelligence, deep learning and data science. The resources from the program will help Mendel bring sophisticated, reliable and explainable AI solutions to the healthcare sector that are not only powerful but also responsible and tailored to meet the high stakes of medical decision-making.

"We are incredibly excited to share that Mendel AI has joined NVIDIA Inception, marking a significant milestone in our ongoing efforts to redefine the possibilities of AI in healthcare," said Karim Galil, CEO of Mendel AI. "By integrating NVIDIA's cutting-edge AI tools, we aim to enhance our platform's capabilities, specifically in processing and understanding complex, unstructured medical data at an unprecedented scale. With the program’s resources and together with other forward-thinking companies, we are committed to pushing the boundaries of what AI can achieve in medical research and patient care, aiming to ensure that our technology continues to lead the way in efficiency and effectiveness."

Mendel’s innovative approach to creating clinician-like AI was included in a recent NVIDIA blog post, highlighting Mendel’s applications across clinical research, real-world evidence generation and cohort selection.

NVIDIA Inception helps startups during critical stages of product development, prototyping and deployment. Every Inception member gets a custom set of ongoing benefits, such as NVIDIA Training credits, preferred pricing on NVIDIA hardware and software, and technological assistance, which provides startups with the fundamental tools to help them grow.

About Mendel AI:

Mendel AI supercharges clinical data workflows by coupling large language models with a proprietary clinical hypergraph, delivering scalable clinical reasoning without hallucinations and ensuring 100% explainability. Headquartered in San Jose, California, Mendel is backed by blue-chip investors, including Oak HC/FT and DCM.

For more information, visit www.mendel.ai or contact marketing@mendel.ai


Media Contact

Sylvia Aranda (on behalf of Mendel)

saranda@realchemistry.com


Updates

Mendel Unveils Groundbreaking Neuro-Symbolic AI System Outperforming GPT-4 for Automatic Cohort Retrieval in New Study

San Jose, CA - July 2, 2024 - Mendel, a leader in Clinical AI, today announced the results of its latest research on Neuro-Symbolic AI. The research shows that Mendel’s Clinical AI system can automate the identification of patient cohorts from structured and unstructured EMRs, outperforming GPT-4 in several benchmarks. Mendel’s approach couples large language models (LLMs) with its proprietary hypergraph reasoning engine, and the research shows how this combination advances Automatic Cohort Retrieval (ACR), a fundamental task for clinical research and patient care. This research can be read in full here.

Transforming Cohort Retrieval

Identifying patient cohorts is essential for clinical trials, retrospective studies, and other healthcare applications. Traditional methods, which combine automated queries of structured data with manual curation, are time-consuming and often yield low-quality results. Mendel’s approach couples a clinical LLM trained to understand structured and unstructured text with a proprietary reasoning engine that encodes medical knowledge reviewed by medical professionals, so that the system can apply clinician-like reasoning to complex and varied medical situations. According to Mendel’s research, applying clinical reasoning to ACR offers significant improvements over existing Retrieval-Augmented Generation (RAG) and LLM techniques.

“Our latest research at Mendel marks a significant milestone in the field of AI in general, and healthcare in particular,” said Wael Salloum, Cofounder and Chief Science Officer at Mendel. “We are the leader in clinical reasoning by coupling LLMs with our hypergraph reasoning, enhancing both the effectiveness and efficiency of patient cohort retrieval. This work is critical in paving the way for more robust and scalable clinical reasoning. This breakthrough underscores our commitment to advance the AI field to transform clinical research and improve patient outcomes.”

Key findings of the study include:

This research introduces two types of reasoning to the AI field:

Longitudinal Reasoning: Mendel’s neuro-symbolic architecture outperformed pure LLM approaches by efficiently handling the longitudinal nature of unstructured Electronic Medical Records (EMRs). As a patient’s record unfolds over time, the system reasons over the emerging facts, contrasting, rejecting, and consolidating them into a symbolic patient journey. Unlike LLM-only approaches, this approach processes a patient’s EMR just once, offline, to construct a journey that can be queried repeatedly at minimal cost.
Large-Scale Reasoning: Mendel's integration of real-time hypergraph reasoning and a clinical LLM achieved higher Precision and Recall in cohort retrieval tasks. Unlike LLM-only solutions, which process the entire patient database for each query—making them infeasible for healthcare applications—Mendel’s approach maintains a fixed cost per query, regardless of the database size.
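
The cost argument in the two points above can be illustrated with a toy sketch. This is not Mendel's system: the patient facts, the helper names (`scan_query`, `build_index`, `query_index`), and the simple fact-to-patient index are all hypothetical stand-ins. The sketch only contrasts scanning every record at query time with a one-time offline pass whose result is then queried by lookup.

```python
from collections import defaultdict

# Hypothetical, already-structured patient facts (illustrative only).
PATIENT_FACTS = {
    "p1": {"nsclc", "egfr_mutation"},
    "p2": {"breast_cancer", "her2_positive"},
    "p3": {"nsclc", "stage_iv"},
}

def scan_query(records, required):
    """Per-query scan: touches every patient record on every query."""
    return {pid for pid, facts in records.items() if required <= facts}

def build_index(records):
    """One-time offline pass: map each fact to the patients who have it."""
    index = defaultdict(set)
    for pid, facts in records.items():
        for fact in facts:
            index[fact].add(pid)
    return index

def query_index(index, required):
    """Answer a query by intersecting the precomputed patient sets.

    An empty query returns no patients in this sketch.
    """
    sets = [index.get(fact, set()) for fact in required]
    return set.intersection(*sets) if sets else set()

index = build_index(PATIENT_FACTS)           # built once, offline
cohort = query_index(index, {"nsclc"})       # repeated queries reuse the index
```

In this toy version, a query touches only the precomputed sets for the requested facts rather than every patient record, which is the general shape of the "process once, query repeatedly" idea described above.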

Benchmark and evaluation

Mendel’s research introduces a new benchmark task for ACR, featuring a comprehensive query dataset and an evaluation framework. The study compared RAG and LLM-based solutions against Mendel’s neuro-symbolic system, providing a detailed analysis of their effectiveness and efficiency.

The evaluation used a data set of 1.4K patients. Mendel evaluated several embeddings and found that Ada outperformed the others. The evaluation report compares a RAG approach using Ada and GPT-4 with Hypercube, Mendel’s Neuro-Symbolic system. The key metric is the F1 score, which balances precision (how many of the retrieved results are relevant) and recall (how many of the relevant results were retrieved) in a single measure of model performance.
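
As a minimal sketch of how precision, recall, and F1 apply to a cohort retrieval query, the snippet below compares a retrieved set of patient IDs with a reference set. The patient IDs and the function name `precision_recall_f1` are made up for illustration; this is the standard definition of the metrics, not the evaluation code used in the study.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and F1 for one retrieval query."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: share of retrieved patients who truly belong to the cohort.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of true cohort members that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical query result: 4 patients retrieved, 3 truly eligible.
retrieved = {"p1", "p2", "p3", "p4"}
relevant = {"p2", "p3", "p5"}
precision, recall, f1 = precision_recall_f1(retrieved, relevant)
```

Here two of the four retrieved patients are relevant (precision 2/4) and two of the three relevant patients were found (recall 2/3), giving an F1 of 4/7.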

Sample F1 scores were reported in charts (not reproduced here) with the following panel titles:

Medium complexity with most queries (52)
Broad cohort size
Document count is greater than 74

Future implications

The findings underscore the transformative potential of Mendel’s Neuro-Symbolic AI system, which combines LLMs with domain-specific knowledge embedded in hypergraphs. This approach enhances the accuracy and efficiency of cohort retrieval, facilitating more precise patient stratification and targeted interventions for new therapies. It also paves the way for broader applications in clinical research and patient care.

Mendel is offering a free “TryMe” demo to showcase the company's Neuro-Symbolic AI system.

About Mendel

For more information, visit www.mendel.ai or contact marketing@mendel.ai


Updates

Improving Clinical Trial Participant Prescreening With Artificial Intelligence (AI): A Comparison of the Results of AI-Assisted vs Standard Methods in 3 Oncology Trials

Delays in clinical trial enrollment and difficulties enrolling representative samples continue to vex sponsors, sites, and patient populations. Here we investigated use of an artificial intelligence-powered technology, Mendel.ai, as a means of overcoming bottlenecks and potential biases associated with standard patient prescreening processes in an oncology setting.

Methods:
Mendel.ai was applied retroactively to 2 completed oncology studies (1 breast, 1 lung), and 1 study that failed to enroll (lung), at the Comprehensive Blood and Cancer Center, allowing direct comparison between results achieved using standard prescreening practices and results achieved with Mendel.ai. Outcome variables included the number of patients identified as potentially eligible and the elapsed time between eligibility and identification.

Results:
For each trial that enrolled, use of Mendel.ai resulted in a 24% to 50% increase over standard practices in the number of patients correctly identified as potentially eligible. No patients correctly identified by standard practices were missed by Mendel.ai. For the nonenrolling trial, both approaches failed to identify suitable patients. An average of 19 days for breast and 263 days for lung cancer patients elapsed between actual patient eligibility (based on clinical chart information) and identification when the standard prescreening practice was used. In contrast, ascertainment of potential eligibility using Mendel.ai took minutes.

Conclusions:
This study suggests that augmentation of human resources with artificial intelligence could yield sizable improvements over standard practices in several aspects of the patient prescreening process, as well as in approaches to feasibility, site selection, and trial selection.

Download full paper

Updates

Coupling Symbolic Reasoning with Language Modeling for Efficient Longitudinal Understanding of Unstructured Electronic Medical Records

The application of Artificial Intelligence (AI) in healthcare has been revolutionary, especially with the recent advancements in transformer-based Large Language Models (LLMs). However, the task of understanding unstructured electronic medical records remains a challenge given the nature of the records (e.g., disorganization, inconsistency, and redundancy) and the inability of LLMs to derive reasoning paradigms that allow for comprehensive understanding of medical variables. In this work, we examine the power of coupling symbolic reasoning with language modeling toward improved understanding of unstructured clinical texts. We show that such a combination improves the extraction of several medical variables from unstructured records. In addition, we show that the state-of-the-art commercially-free LLMs enjoy retrieval capabilities comparable to those provided by their commercial counterparts. Finally, we elaborate on the need for LLM steering through the application of symbolic reasoning as the exclusive use of LLMs results in the lowest performance.

Download full paper

Updates

Mendel’s New Look: Website Update

We’ve changed our look. Our goal remains the same: make medicine objective. The new site highlights the way our proprietary AI enables organizations to achieve quality and scale when structuring unstructured data. It comes down to supercharging your clinical abstraction. We’ve validated that our human-in-the-loop abstraction approach can support a machine that understands medical context like a physician. In our own experiments, the number of variables needing correction decreased by 40%. High-quality abstraction = high-quality data for cohort selection, real-world evidence, and registries.

Learn more about our three solutions:

De-identification Engine

Cohort Selection

Analytics and Research

Case study

How a diagnostic company was able to build a clinico-genomic database in a week

Introduction

The customer, a key player in the genomics space, had a strategic initiative to build a clinico-genomic database to support their life sciences customers.

The Problem

The customer’s problem came down to time and scale. They had a list of 60k patients and were looking for 120 variables. Their team of 4 abstractors took 1.5 hours per patient and abstracted between 200 and 400 patients a month. They wanted to scale their abstraction efforts and move faster, increasing the team’s productivity.

Solution

Mendel helped the customer scale their abstraction efforts. The team leveraged Mendel’s proprietary AI pipeline and abstraction workspace to index their unstructured data and extract key clinical attributes such as Cancer Diagnosis, Staging, Metastasis, Date of Diagnosis, Biomarkers, Surgeries, Diagnostic Procedures, Outcomes, and Response.

Results

By partnering with Mendel, the diagnostics company was able to process 60,000 patient records in 8 days. Without Mendel, this would have taken them approximately 5 years with a team of 4 abstractors.

The variables that caused difficulty were concepts like surgery, outcome, and response. In these cases, a human abstractor stepped in and leveraged targeted abstraction. The Human + AI approach exceeded the quality of a human-only approach for every variable we studied. This is not surprising, since leveraging the AI output gives the human abstractor a significant advantage and makes abstraction teams more efficient.

Case study

How One Organization Changed The Way Patients are Identified for Clinical Trials with AI

Introduction

One clinical trial organization was using manual chart review and was looking to reduce the time it takes to find eligible patients.

The Problem

In order to identify eligible patients for clinical trials, research coordinators typically use information from structured data such as ICD codes. Clinicians then combine that knowledge with the information gathered from reviewing a patient’s chart. This manual process takes hundreds of hours and is riddled with inefficiencies and errors.

Consequently, 40% of clinical trial sites under-enroll compared to plan, and 10% of sites fail to enroll even a single patient.

The Goal

This customer wanted to explore if Mendel could replace manual chart review and deliver on two goals:

reduce the amount of time it takes to detect eligible patients
identify more patients overall

The Test:

The organization had already manually reviewed 5,000 charts in two clinical sites for an oncology trial and identified 26 eligible patients in 1 week.

They tested this against Mendel’s advanced query builder and clinically smart search. Mendel helped the organization index the clinical data of 17,354 patients in the same two clinical research sites in 3 days.

The Results:

Leveraging Mendel’s search alone significantly reduced effort to identify eligible patients
90% of eligible patients confirmed via chart review ranked within top 250 patients from search results
Reduced number of patients that need to be reviewed by 95% vs using structured data alone (e.g., ICD codes)
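
The two result figures above read like a recall-at-k measure and a review-workload reduction. The case study does not say how they were calculated, so the following is only a sketch of one plausible reading; the function names, patient IDs, and counts are hypothetical.

```python
def recall_at_k(ranked_ids, eligible_ids, k):
    """Share of confirmed-eligible patients found in the top k search results."""
    eligible = set(eligible_ids)
    if not eligible:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & eligible) / len(eligible)

def review_reduction(baseline_count, reviewed_count):
    """Fractional drop in charts to review compared with a baseline."""
    return 1 - reviewed_count / baseline_count

# Hypothetical ranked search results and chart-review-confirmed patients.
ranked = ["p7", "p2", "p9", "p4", "p1", "p5"]
eligible = {"p2", "p4", "p5"}

top4_recall = recall_at_k(ranked, eligible, k=4)
# Hypothetical workload: 5 charts reviewed instead of a baseline of 100.
reduction = review_reduction(baseline_count=100, reviewed_count=5)
```

Under this reading, a statement such as "90% of eligible patients ranked within the top 250" corresponds to `recall_at_k(..., k=250)` returning 0.9.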


From the Desk of the AI Team

How to Approach De-Identification

Introduction

Organizations that use patient data for internal or external research need to take steps to prevent the exposure of PHI to those who are not authorized to view it. They do this by redacting specific categories of identifiers from every patient document. Once the identifiers are masked, the risk profile of these datasets is significantly reduced. But how do you ensure that redaction engines are working to the highest accuracy?

The Mendel Approach

Across multiple assessments and heterogeneous datasets, Mendel’s Redact has been certified to perform well above the 99% HIPAA threshold, satisfying expert determination. In this video, Simran Tiwari, AI Engineer, takes to the whiteboard to get under the hood of this module. She talks about what makes Redact different from other de-identification engines and explains how Redact is able to protect patient privacy without losing important context and knowledge.

Mendel’s mission is to decode the world’s unstructured patient data. Mendel's core is a novel proprietary AI technology, capable of understanding a vast and ever-increasing range of clinical contexts and attributes at human expert level and machine scale.

Do you have questions around Redact and Mendel’s approach? Reach out and join the conversation at hello@mendel.ai

Team Retreat

Mendel Retreat: Adventures and Team Building in Cairo

The Mendel team is still buzzing from our week-long retreat in Cairo. The theme of the retreat was “coming together,” and it was the first time the American and other remote employees were united with their Egyptian counterparts. Although there were many adventures (missing flights, seeing the pyramids, haggling at Khan el-Khalili), the highlight of the trip was collaborating as one global organization.

The Egyptian employees went above and beyond to make the American and remote employees feel welcome in Mendel’s Cairo office. The level of hospitality and willingness to put everyone at ease set the tone for the week. Through company-wide meetings, breakout sessions, and long drives after work we gained the conversational abilities that create resilient teams.

Team Mendel outside of our office in Cairo.

One of the best parts of travel is learning how different environments impact the way people communicate and adapting to it. Speaking as an American employee, I loved immersing myself in Cairo’s culture, both work and social. Cairo is truly the city that never sleeps (some of our coworkers went to the eye doctor at 10 pm). The pulse of maze-like streets and the reflection of the bright neon billboards on the Nile create a constant backward and forward movement in time. In Cairo, you can’t think of the future without thinking of the past, and we were able to see that through the eyes of our coworkers. It’s such a special feeling to learn about a city through the amazing people who live there. It was a week that none of us will forget.

Quality and Scale… built to last, just like Mendel.

Some of the Mendel team traveled in style!

Ladies of Mendel + Prag at an ahawa. The cafe culture in Cairo is truly something else and we are still thinking about the sahlab and hazelnut coffee we had here.

Old Cairo Food Tour…food not pictured (it’s in their stomachs)

You can feel the electric energy in the office through the photo.

Can’t talk about Cairo without talking about cats. They’re everywhere and most are very friendly. They sometimes jump on your lap to get pets.

Hussam (one of the Cairo team leads) with the Commerce and Recruiting Team (aka the party people) at the mall.

Portraits from our group drawing team building


Mendel is an NLP pipeline and abstraction workspace that structures patient data at machine-level scale with human-level fidelity. We want to make medicine objective through analytics-ready data. Interested in joining our mission? We’re Hiring. Yallah! Let’s go!

Competence via comprehension

AI for healthcare needs clinical reasoning skills

Artificial intelligence (AI) is playing an increasingly important role in the healthcare industry. But to fully leverage the potential of AI, it must be equipped with clinical reasoning skills - the ability to truly comprehend clinical data, or in other words, to read it as a doctor would. When it comes to data processing tools, only a tool capable of clinical reasoning can effectively process unstructured clinical data.

A healthcare ontology can help AI develop these clinical reasoning skills by providing a structured and standardized representation of healthcare concepts and their relationships. In healthcare, an ontology can be used to represent the relationships between various clinical concepts, such as diseases, symptoms, diagnoses, medications, and procedures.

The value of a healthcare ontology lies in its ability to help AI systems reason about clinical data and make accurate inferences.

But the standard clinical ontologies don’t reflect how clinicians actually describe medical concepts.

The complexity of healthcare data requires a thorough understanding of medical terminology, disease processes, and healthcare delivery systems. Without this foundational knowledge, it can be challenging to comprehend the data accurately. However, vocabulary isn’t enough – context is imperative. Doctors use shorthand, paraphrase, and assume domain expertise, which can be difficult to map onto an ontology rooted in vocabulary alone. Clinical reasoning requires far more than just word recognition.

As an example, consider the term EGFR. It can stand for “epidermal growth factor receptor” (a gene) or “estimated glomerular filtration rate” (a kidney test). An ontology that understands both the vocabulary and the context of healthcare can help AI develop clinical understanding, which is the key to structuring unstructured health data accurately, efficiently, and without losing or misinterpreting crucial information.

Where standard NLP gets stuck

While natural language processing (NLP) can be useful for extracting information from clinical documents, it can struggle to reason like a clinician. Standard NLP often gets stuck at the surface level of the text, lacking the capability to move across documents in time and understand concepts that go beyond the text on the page, such as intent. This can lead to inaccuracy and imprecision when deviations from the standard sequence of events occur.

To be maximally useful, an AI tool must understand not just what is written, but what is implied. In order to support real comprehension, ontologies need to model knowledge in a different way.

At Mendel, we’ve developed a proprietary knowledge representation system that reduces concepts to their most basic entities. Using this approach, we created an ontology that AI can reason with – a breakthrough in the field. Furthermore, by mapping back to the standard ontologies in common use, Mendel's approach supports communication with existing healthcare systems.

For AI to develop clinical reasoning skills, its governing ontology must support learning.

Our reasoning algorithm with a governing ontology does just that – delivering comprehension, not mere recognition. With learning and comprehension support, AI can begin to unlock the full potential of healthcare data.

Our approach to knowledge representation could mean a more accurate and efficient structuring of your healthcare data.

Read our white paper to learn how.

Webinar

Large Language Models

Sailu Challapalli, our Chief Product Officer, spoke at a recent Harvard Business School Healthcare panel. The event brought together different healthcare and AI experts to discuss large language models and their impact.

GPT3 and Large Language Models - an inflection point for AI

Missed the event? You can watch a recording here.

Here are some key takeaways from the conversation:

So much depends on the use case! What does your organization want to do? What do you want to automate? What do you want to scale?
Large Language Models can support healthcare organizations. LLMs have the power to abate physician burnout by summarizing relevant patient information. Clinicians can also use LLMs to answer questions or retrieve information about medical terminology and concepts. For example, one can ask: what is NSCLC or what are common treatments for NSCLC? Large Language Models can also be used to generate standard summaries and outputs, such as an ER discharge report.
Large Language Models have their limitations. LLMs cannot support complex reasoning, data curation, or real-world evidence generation. They have engineering limitations and they are hard to update based on new knowledge, like new treatments and standards of care. They also don’t tie their results back to the source evidence. The biggest weakness of LLMs is that they are based on language not logic: the answers change based on how questions are asked and often contain bias. How do you stabilize it?


A panel of experts agrees: if your healthcare organization is not using AI, you’re losing out on value. However, as the takeaways above show, there are nuances to understanding how to train and apply LLMs, and there are many use cases where LLMs alone are likely not sufficient.

The first step to bringing AI to your healthcare organization is understanding your own needs and use cases.

Large Language Models are only one component of Mendel’s combined approach to AI data processing. Mendel uses both symbolic AI and machine learning to support reasoning since enabling logic is at the center of all our work. Consequently, Mendel’s resulting AI processing pipeline looks at individual patient records to understand individual patient journeys and outcomes.

Want to learn more about Mendel’s approach and if it can help your organization? Contact us at hello@mendel.ai.

Abstractor Spotlight

Shanna Wells, Clinical Abstractor

Introduction

Manually abstracting patient data at scale is a herculean task for humans alone. It is slow, expensive, and difficult, and it requires extreme precision and accuracy. Organizations have to choose between breadth and depth when it comes to making data useful for decision making. Because of these challenges, the Mendel team created Carbon. Carbon is an easy-to-use workspace that allows clinical abstraction teams to efficiently curate high-quality clinical datasets at scale. The foundation of Carbon is Mendel’s AI. Carbon pulls directly from Mendel’s AI platform to give abstractors a head start in identifying relevant data elements within a patient’s chart.

Carbon began as an internal, proprietary tool for Mendel’s clinical abstraction team and AI teams. We didn’t find anything in the market that matched our needs, so we decided to build our own. Carbon makes it easier for these teams to work together to train and develop our models.

Abstractor Spotlight

Today we are spotlighting one of our internal clinical abstractors, Shanna Wells.

Tell me a little bit about your role. What is your title, and your daily responsibilities?

I'm a clinical abstractor with Mendel. My daily responsibilities are varied as our tasks are varied. We run the gamut from actual abstraction of clinical data to verification of the AI's output to training part time team members.

What would you say is the biggest challenge facing clinical abstraction teams today?

The sheer amount of raw data that is available is daunting. There are so many uses for this data that abstraction is really becoming a clinical subspecialty.

Walk me through a project from end-to-end.
Where do you start?

When we first get a project, I typically review my patients to get an idea of what is ahead, then I dive right in. Once I get into a chart, I am going to review all of the documentation to determine if there is duplication of documents or anything within that might cause the chart to be rejected. Once I have determined that the chart is good to go, I begin extracting data and analyzing the information contained for the datapoints which are specified within the individual data model.

These data models are agreed upon "dictionaries" so to speak between our client and the abstraction team to keep our data consistent and only abstract the data points that our client is interested in receiving. After completing the chart, I move on to the next one until the project is complete.

What does it mean to be “finished” with a project?

Depending on the purpose of the project (i.e., creating a gold set to compare with the AI data, or to deliver to our client), there may be adjudications, or comparisons of abstraction between human/human or human/AI, and then a final decision on the extracted data points to arrive at what we would consider the "absolute truth," thus creating a gold set!

What does success look like for clinical abstraction teams?

Success means meeting our deadline with impeccable accuracy, completeness and truth in data.

How does the Carbon Abstraction Workspace help you better accomplish your goals?

Carbon creates an environment that makes it easy to translate unstructured raw data into usable, structured, and searchable data. It creates a work environment that is intuitive, attractive, and efficient.

Share a tip for customers using Carbon!

Don't be afraid of customization! Having Carbon built to function specifically for your needs is possible!

Thank you, Shanna!


Want to learn more about Mendel’s Abstraction Workspace and how it can help your clinical abstractors? Contact us at hello@mendel.ai.

Gold evaluation

Creating Accurate Regulatory and Reference Data

Within the real-world evidence space, the generally accepted process for creating a regulatory-grade data set is to have two human abstractors work with the same set of documents and bring in a third reviewer to adjudicate the differences. These datasets also serve a second purpose: as a reference standard against which the performance of human abstractors can be measured. Although this remains the industry standard, it is expensive, time-consuming, and difficult to scale.

At Mendel we extend this framework and layer AI intelligence to build our own reference set as follows.

We start the same way as the industry standard: we have two human abstractors process the patient record independently (average of the 2 noted as H1) and a third human (R1) to adjudicate the differences. Mendel builds on the standard by including an additional abstraction layer which includes AI. We then run the patient record through our AI pipeline and have a human audit and correct the AI output (Human+AI). Finally we have a second review (R2) to adjudicate the difference between both human only and AI+human outputs.
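The layered workflow above can be sketched in a few lines. This is only an illustration under stated assumptions: the variable names, the record values, and the `decide` callback standing in for a human reviewer are all hypothetical, and the sketch assumes R2 compares the adjudicated human-only output with the Human+AI output.

```python
def disagreements(first: dict, second: dict) -> dict:
    """Map each variable on which two abstractions differ to both values."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k))
            for k in keys if first.get(k) != second.get(k)}


def adjudicate(first: dict, second: dict, decide) -> dict:
    """Keep agreed values; ask the reviewer callback to settle the rest."""
    merged = {k: v for k, v in first.items() if second.get(k) == v}
    for variable, (a, b) in disagreements(first, second).items():
        merged[variable] = decide(variable, a, b)
    return merged


# Hypothetical abstractions of one patient record.
human_a = {"stage": "II", "histology": "adenocarcinoma"}
human_b = {"stage": "III", "histology": "adenocarcinoma"}
human_ai = {"stage": "III", "histology": "adenocarcinoma"}

# R1: a third human settles human/human differences (stand-in: pick the first).
r1 = adjudicate(human_a, human_b, lambda variable, a, b: a)
# R2: a second review settles human-only vs. Human+AI differences (stand-in: pick the second).
gold = adjudicate(r1, human_ai, lambda variable, a, b: b)
```

In practice the callbacks would be reviewer decisions, not fixed rules; the point is only that each review layer touches just the variables where its two inputs disagree.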

Our goals are to ensure quality internally and for our customers. We use the gold set to measure performance of our AI models across the test cohort, generate a quality report, and conduct multiple types of validation to ensure that the data is clinically useful, has been processed correctly, and that the AI models are not skewed.

During this process, we hypothesized that results from AI + Human collaboration would rival the results generated using the previously described regulatory reference model–two human abstractors adjudicated by a third.

The Evaluation: Does combining human and AI efforts lead to high data quality?

At the end of 2022, we conducted a series of evaluations across therapeutic areas to assess how our models perform. We wanted to explore whether combining human and AI efforts leads to higher data quality than the regulatory standard and, if so, by how much.

In this experiment, we looked at a total of 140 patients across three therapeutic areas with the following sample sizes:

Breast - 40 patients
NSCLC - 40 patients
Colon - 60 patients

We calculated F1 scores to compare three approaches: the average of the two individual human abstractors (H1), two human abstractors with adjudication (R1), and the combination of one human and AI (Human+AI).

The F1 score combines the precision and recall of a classifier into a single metric by taking their harmonic mean. We then compared the F1 scores for variables across therapeutic areas.
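As a minimal sketch of that definition, the function below computes precision, recall, and their harmonic mean from raw counts. The counts in the example are hypothetical and are not taken from the evaluation.

```python
def precision_recall_f1(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall, F1) computed from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a single variable.
precision, recall, f1 = precision_recall_f1(
    true_positives=9, false_positives=1, false_negatives=3
)
```

Because the harmonic mean is pulled toward the smaller of the two values, a high precision cannot mask a low recall, or the reverse.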

Understanding the variance across variables

All approaches, whether human only, adjudicated, or Human + AI abstraction, demonstrate variability in quality across data variable types. When we think about F1 performance, it helps to divide a patient’s data variables into four groups:

Variables that are highly complex for humans but easier for AI
Ex. The variable is difficult to find due to the length of the record
Variables that are highly complex for both humans and AI
These variables can be difficult to extract because they are subjective
Variables that are easy for both humans and AI to extract
These variables have a clear interpretation
Variables that are easier for humans but difficult for AI
These variables may require leaps in reasoning

There is also a situation of compound data variables. These variables depend on multiple correct predictions and are difficult for both humans and AI.

Let’s look at the variables specific to Colon Cancer.

Below we compare the F1 scores of the Human+AI approach for colon cancer variables with the F1 scores of the Human only approach. The AI + Human approach F1 score is shown through the bar graph and the Human only F1 score is plotted over it.

The Human + AI approach exceeds the quality of a human only approach for every variable we studied. This is not surprising, since leveraging the AI output gives the human abstractor a significant advantage.

Does this hold up when looking at patient data that has been double abstracted and adjudicated?

Human+AI performs better than a double extracted and adjudicated data set

In the chart below we compare the average of the pooled results across breast, lung, and colon cancers against the gold standard reference set.

The AI + Human approach has an F1 score of 92.2% and the double abstracted and adjudicated (R1) set has an F1 score of 87.8%. Both approaches perform acceptably well against the gold set. However, the AI + Human approach’s F1 score is 4.4 percentage points higher (an increase reported as 4.77%).
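The gap between the two reported scores can be expressed in more than one way, which matters when a single percentage is quoted. The short calculation below uses only the two rounded scores given above; the variable names are illustrative.

```python
human_ai_f1 = 92.2  # AI + Human, percent, as reported
r1_f1 = 87.8        # double abstracted and adjudicated (R1), percent, as reported

# Difference in percentage points.
absolute_gap = human_ai_f1 - r1_f1
# Gap as a share of the R1 score (relative improvement over R1).
relative_to_r1 = absolute_gap / r1_f1 * 100
# Gap as a share of the AI + Human score; this one rounds to 4.77.
relative_to_human_ai = absolute_gap / human_ai_f1 * 100

print(round(absolute_gap, 1), round(relative_to_r1, 2), round(relative_to_human_ai, 2))
```

With these rounded inputs, the gap is 4.4 percentage points, or roughly 5% when measured relative to the R1 score.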

In addition, the AI+Human application is ⅓ of the effort and cost of using three humans, making this approach inherently more scalable. In our next post, we will explore time savings.

Interested in learning more about this evaluation and Mendel’s process? Contact hello@mendel.ai.

Mendel is an end-to-end solution that uses the power of a machine and the nuanced understanding of a clinician to structure unstructured patient data at scale.

From the Desk of the AI Team

How to Approach Document Categorization

AI projects have created tangible results for a wide range of industries. Despite the innovation, it is important to remember that AI is not a magic wand that will solve every problem in every industry with a single wave.


Introduction

At Mendel, our AI development is informed by three core values: trust, transparency, and learning. We hope sharing our approach will help healthcare organizations find the right tools for the right use cases.

The Mendel Approach

The Mendel pipeline is the core of our AI-driven data processing product. The pipeline leverages a combination of deep learning algorithms and several rule-based systems, including our proprietary medical ontology.

Today we will discuss Rectify, Mendel’s document segmentation and categorization engine. In this video, Thiago Santos, AI Engineer, takes to the whiteboard to get under the hood of this module.

Do you have questions around Rectify and Mendel’s approach? Reach out and join the conversation at hello@mendel.ai.

Blogpost

Reading Clinical Data Like a Doctor: What’s Missing from DIY Systems

Before embarking on any new endeavor or enterprise, certain questions come to mind: How are we going to handle this? Does our team have the expertise, bandwidth, resources, and time to handle this undertaking on our own? When it comes to finding a scalable way to structure your unstructured healthcare data, the answers to these questions will impact when/whether you deliver a top-tier product for your clients.


In this article, we’ll explore the key factors needed to deliver high-quality, stable outputs from your unstructured data.

Assembling a DIY pipeline

It is certainly possible to piece together your own pipeline for unstructured data processing using off-the-shelf offerings. To do so, a variety of tools will need to be purchased from multiple sources. These include:

Optical Character Recognition (OCR)
PHI redaction
Natural Language Processing (NLP)
Natural Language Understanding (NLU)
Clinical interpretation and reasoning
Human abstractors
Human-in-the-loop workflow

All of these components must then be cobbled together into a reliable pipeline capable of processing unstructured clinical data that delivers results at the required scale and quality for maximum benefit.

Not available off-the-shelf

There are two key characteristics absent from the list above: AI built for healthcare and an end-to-end solution. No matter how advanced the individual components of a DIY assemblage may be, it can never offer these two crucial elements that make all the difference when processing unstructured clinical data.

AI built for healthcare

Unlike open-source options, AI designed for healthcare understands unstructured data with the mind of a physician. Natural Language Processing (NLP) was only designed to understand very short amounts of text, and was not built to decipher healthcare-specific language. Additionally, it lacks the common sense, reasoning, and cognition necessary to accurately decipher the often-idiosyncratic text found within unstructured healthcare data.

However, AI built specifically for healthcare has the ability to read hundreds of pages of documents about a single patient, put all the notes of medical jargon and information together (rather than losing all previous information after a page is turned), and understand it the way a clinician would. This simply doesn’t exist with open-source NLP. And while it’s possible to manipulate various AI components to adapt to certain healthcare considerations, the end result will not measure up to a system built specifically for healthcare from start to finish.

It’s a bit like trying to retrofit a sedan with an upgraded engine, brake job, steering wheel, and set of tires thinking the end result will be a Formula 1 car. Even with the most talented mechanic assembling the parts, the resultant sedan wasn’t designed to handle corners at 120 miles per hour–the chassis will buckle, and fail.

An end-to-end solution

An end-to-end pipeline has high-quality components purposefully designed to function together. Since off-the-shelf assemblage uses parts from multiple sources, there are more opportunities for issues to arise between the various pieces and vendors. Each component must build off the previous one–so if an error occurs, it must not only be dealt with at the source, but throughout the entire pipeline. Presuming, of course, that the error can be detected and pinpointed.

By contrast, when each piece of the pipeline is built and designed under one roof with the same outcomes in mind, stability is the result. This also ensures that if issues do arise, they will be dealt with by one team that understands the system inside and out.

The Mendel difference: We needed it, so we built it

Mendel founders Karim Galil, MD, and Wael Salloum, PhD, had a shared vision to make medicine objective by developing an AI that can read records like a doctor, at scale.

We’ve spent years developing the ontology and models that set Mendel’s system apart, building hierarchical representations of data that allow for objective clinical decision making, and combining symbolic AI and machine learning to recreate the mind of a clinician.

Our built-for-healthcare solution makes unstructured data machine-readable and HIPAA-compliant, and it has the ability to extract patient data with clinical intelligence, all in a white-glove, end-to-end solution.

Download our whitepaper for an in-depth look at this complex problem, and see how Mendel is advancing innovation in this field.

Webinar

What is the Gold Standard? Exploring the challenges of structuring unstructured data in healthcare

Human abstraction has long been considered the gold standard for extracting high quality information from EHR data. With the rise of NLP and machine learning, how should we evaluate these new technologies and are human abstractors still the correct comparison?

In this webinar, held on Thursday, Oct 27 at 1PM PT/4PM ET, subject matter experts Zeke Emanuel (UPenn), Viraj Narayanan (Ontada) and Karim Galil (Mendel) discussed the evaluation process. This session brought together the academic, commercial, and ethical questions that are the foundation of creating the gold standard for high quality real world evidence datasets.

PODCAST — 40 minutes

Leslie Lamport, 2013 Turing Award Winner on Patientless Podcast #011

Leslie Lamport is known for his fundamental contributions to the theory and practice of distributed and concurrent systems, notably the invention of concepts such as causality and logical clocks, safety and liveness, replicated state machines, and sequential consistency. Full Youtube video: https://youtu.be/rNQFPz2KSzQ


Karim Galil: Welcome to the Patientless podcast. We discuss the good, the bad and the ugly about real world data and AI in clinical research. This is your host, Karim Galil, Co-Founder and CEO of Mendel AI. I invite key thought leaders across the broad spectrum of believers and dissenters of AI to share their experiences with actual AI and real world data initiatives.

Karim Galil: All right. Hi everyone. This is another episode of our podcast. Today's guest is not a household name. I was just telling him that before we got started, but his work has been the foundation of a lot of your day-to-day computing, from your internet search, to the infrastructure that Google and Amazon run on, to almost every AI engine that is working today.

His work has earned him the Turing Award. Interestingly, there is no Nobel Prize for computing, so it's the analog of the Nobel Prize in computing. In 2013, he won the Turing Award for his work on distributed systems. Our guest today is Leslie Lamport, and we're super fortunate to have him joining our podcast.

Thank you for making the time for this.

Leslie Lamport: Thank you.

Karim Galil: So why don't we start by explaining what distributed systems are? I've tried to educate myself on what that means, and the way you described it is: it's the failure of your computer because of the failure of another computer whose existence you're not even aware of.

So can you explain more about that: what are distributed systems, how do they affect computing, and is there anything in healthcare?

Leslie Lamport: Well, distributed computing means that you're basically running a program that uses more than one computer. Simple as that.

Karim Galil: The work that you have done had to do with time, because I believe one of the things that you've said is that the notion of time for two observers is not the same. And I still cannot understand that. And how does that lead to distributed systems?

Leslie Lamport: Well, I guess I need to explain special relativity to you.

Karim Galil: We have an hour.

Leslie Lamport: Okay.

I can really explain it very simply. You may have read, you know, articles or books or something that say, oh, this is really strange, meter sticks shrink when something is moving, and all stuff like that. The thing that's the basis of relativity, that makes relativity different from Newtonian mechanics, is the realization by Einstein that what it means for two things to happen at the same time is not some invariant notion that's the same for everybody. Two different observers will have a different notion of what it means for two things to happen at the same time if they're moving relative to one another. And that's it. Special relativity simply comes from that observation and the observation that, no matter how these two people are moving with respect to one another, when they measure the speed with which a light beam is traveling, they both get the same speed, you know, 300,000 kilometers per second.

And you take that fact, and basically the rest of special relativity follows from that. And, well, somebody designed an algorithm for a distributed database. A distributed database means you have a single database, but two people using different computers may be accessing the same data.

And so you need some way of synchronizing them. In particular, if two people issue a command at about the same time, the system has to decide which one occurred first. If one is setting a value, changing a value, and the other is reading the value, they have to decide whether the read comes before the change or after it. And they had an algorithm for doing that. And what I realized is that their algorithm (it was a pair of people) violated causality. What that means is that one command should have happened before the other, but the system ordered them the other way. For example, I might send you a message saying, hey, I've just added, you know, a thousand dollars to your bank account, now you can go withdraw it, and you try to withdraw it. And the system says, no, you can't, because it decides that your withdrawal request came before my deposit request, even though it shouldn't have, of course.

And so that caused me to realize that in order to be able to synchronize what's going on in the two different computers, you have to have a way for them to agree on, you know, whether an event in one computer happened before or after an event in another computer.

And I realized that there's an analogy between special relativity and distributed computing. In special relativity, the notion of coming before means that one event comes before another if, communicating at the speed of light, the first event can influence the other event; namely, it's possible for something that exists at the first event to be communicated to the second event.

Well, there's an obvious analogy to that in distributed systems: namely, one event comes before another if it's possible for the first event to influence the second, not by light beams that could be sent, but by messages that actually are sent in the system.

And then using that notion of causality, I was able to modify the algorithm those two guys had so that it satisfies causality.
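The "comes before" relation described here is what logical clocks capture. The sketch below is not taken from the conversation; it assumes the standard logical clock rules (tick on every event, and on receipt jump past the timestamp carried by the message), and the `Process` class and the bank-account names are hypothetical.

```python
class Process:
    """A process that keeps a logical clock."""

    def __init__(self, name: str):
        self.name = name
        self.clock = 0

    def local_event(self) -> int:
        """Tick the clock for an event that involves no message."""
        self.clock += 1
        return self.clock

    def send(self) -> int:
        """Tick, then return the timestamp carried by the outgoing message."""
        self.clock += 1
        return self.clock

    def receive(self, message_timestamp: int) -> int:
        """Jump past the sender's timestamp so the receive is ordered after the send."""
        self.clock = max(self.clock, message_timestamp) + 1
        return self.clock


depositor = Process("depositor")
account_holder = Process("account_holder")

# The depositor has already done some unrelated work, so its clock is ahead.
for _ in range(3):
    depositor.local_event()

deposit_ts = depositor.send()                      # "I've just added a thousand dollars"
notified_ts = account_holder.receive(deposit_ts)   # the message arrives
withdrawal_ts = account_holder.local_event()       # the withdrawal request

print(deposit_ts, notified_ts, withdrawal_ts)      # 4 5 6
```

Because the withdrawal is issued after the message is received, its timestamp is necessarily larger than the deposit's, so an ordering based on these timestamps cannot place the withdrawal first.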

Well, the other idea is that I realized that this applies not just to distributed databases, but that the whole key to building any distributed system, which is getting the different computers to cooperate with one another, can be solved by an algorithm that globally orders the things that happen in the different machines in a consistent way that they all agree on.
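One common way to obtain such a global order, assumed here and not spelled out in the conversation, is to sort events by logical timestamp and break ties with a fixed rank for each process. The `Event` type and the event labels are hypothetical.

```python
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: int   # logical clock value when the event occurred
    process_id: int  # fixed, arbitrary rank used only to break ties
    label: str


def total_order(events):
    """Sort by logical timestamp, breaking ties by process id."""
    return sorted(events, key=lambda e: (e.timestamp, e.process_id))


events = [
    Event(2, 1, "B reads x"),
    Event(1, 0, "A writes x"),
    Event(2, 0, "A writes y"),
]
ordered = [e.label for e in total_order(events)]
print(ordered)  # ['A writes x', 'A writes y', 'B reads x']
```

Every machine that sees the same set of timestamped events computes the same sequence, which is what lets them agree without further negotiation. The tie-break is arbitrary; it only has to be the same everywhere.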

Wael Salloum: So that's the partial order that you worked on. Yeah. And this is fascinating, because this is like an instance of how you can abstract from outside of computer science, from special relativity, into computer science. I also heard that some physicists looked at your work and brought it back to physics.

Leslie Lamport: Yeah, there's one particular physicist, I can't remember his name offhand, who thinks that this is very important, and I've never been able to understand what he's doing and, you know, what the point of it is. And so I can't say whether that is really, you know, some important physics or not.

The relation between physics and, you know, computing here should not be surprising, because computers are basically physical devices and, you know, the laws of physics apply to them. And one thing that has, I think, distinguished the things that I've done from what most other computer scientists working in this field of concurrent computing, or, you know, concurrency, have done is that they tend to view it as a mathematical problem. You know, concurrency is a mathematical problem, and I tend to view it as a physical problem. You've got two things happening at the same time; that's physics. That has just inspired me to look at things in a way that's, you know, a little different from the way other people have.

That is really interesting. Yeah.

Wael Salloum: And that's basically your work on temporal logic and on TLA. Can you talk a little bit more about TLA and how it revolutionized the work, even including some recent work on computing at Amazon and other places?

And how important is it to define systems in a formal way, to make sure that you reduce the number of bugs and you know the system is actually doing what it's supposed to do?

Leslie Lamport: Well, I think it should be obvious that, you know, we'd like to remove bugs from programs. I don't know if you get as frustrated as I do at all of the stupidities that I see in the programs that I use, but I got into it because concurrent algorithms are simply very hard to get right.

I should say algorithms rather than programs. Concurrent programs are hard to get right, but I started working on algorithms, not programs, and you can write, you know, just an algorithm in a few lines that looks so obvious, and it could be wrong; you know, it simply has a bug, doesn't do what it's supposed to do.

So I realized that I needed a way of reasoning rigorously about concurrent algorithms. I guess the basic thrust of what I did in the course of, you know, 10 or 15 years was come around to the realization that reasoning about algorithms means mathematics. And if I'm going to reason about something mathematically, the best way to do it is to describe it mathematically, and I basically developed a way of describing algorithms, in particular concurrent algorithms, mathematically, and I discovered that it really worked well. I mean, TLA is the particular way I found that makes it work well for both describing a concurrent algorithm mathematically and proving things about it.

You know, while that was going on, we started out proving properties of particular algorithms, but then people started worrying about, well, what exactly is this algorithm supposed to accomplish? And so we also started looking at the problem of how you describe precisely what it means for this algorithm to be correct.

And when you describe the algorithm mathematically, and then you describe what it's supposed to do mathematically, then its correctness becomes, you know, a very simple, well, in principle simple, mathematical formula.

Namely that the formula that describes the algorithm implies the formula that describes what it's supposed to do. And so everything gets reduced very beautifully to mathematics.
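Written out, the correctness statement described here is a single implication. In the sketch below, $\mathit{Algorithm}$ and $\mathit{Spec}$ are placeholder names for the two formulas, not identifiers from any particular specification.

```latex
\[
  \mathit{Algorithm} \;\Rightarrow\; \mathit{Spec}
\]
```

Read informally: every behavior allowed by the formula describing the algorithm is also a behavior allowed by the formula describing what it is supposed to do.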

But I realized that this way of thinking mathematically about what something is supposed to do is useful in practice. Because when people build distributed systems, the first thing they should do is figure out what that system should do. And what a system should do, especially when it comes to a concurrent system, can be a subtle matter, and it really helps in building the system to, you know, get it right. I mean, I noticed when engineers were describing standards, you know, some kind of communication standards or something like that, you want to have, you know, a precise notion of what it means.

So two people can go off, if it's a communication standard, and separately build the systems, and if they follow the standard correctly, you want to make sure that the two systems will work together properly.

And then also, I realized something about people writing programs. Tony Hoare wrote something years ago that I didn't understand at the time. He said, inside every big program there's a little program trying to get out. And what he meant, and I confirmed this with him a few years ago, is that what the program is trying to do is implement an algorithm, a simpler, you know, more abstract thing. And if that algorithm is correct and you correctly implement it, then the program will be correct. And the best way, in some sense, to write the program is to write the algorithm first and make sure that the algorithm works right, and then you know that if you implement it right, the program is gonna work right.

Karim Galil: I think that's why you said writing is nature's way of showing you how stupid or, or like how flawed your thinking is or something like that.

Leslie Lamport: Yeah. That's actually a quote from somebody else. The quote is: writing is nature's way of showing you how sloppy your thinking is.

And my addition to that is: math is nature's way of showing you how sloppy your writing is. So, you know, before you write a program, you do some thinking about, you know, what it's gonna do and how it should do it. You should write that down. You know, you should write, and you should not just think, because it's easy to fool yourself when you're just thinking that what you're thinking makes sense. But if you start writing it out carefully, then that's when you discover that it's nonsense, you know, you've made a mistake, or what you're saying, you know, doesn't work or something.

And writing things in math is, first, more precise. You can actually, you know, prove whether what you're saying makes sense. And also, because it's precise, you can build tools to check its correctness. And so that's what TLA is for. It's basically for writing down precisely the thinking that you should do before you write the program, and then checking this algorithm. I don't call it an algorithm anymore, because when people think about algorithms, they think about, you know, things that are in textbooks.

The algorithms that go into programs you seldom find in a textbook because, you know, they're just used for this particular problem. So, you know, at the moment I'm calling them abstract programs, but it's like an algorithm. It's something that's higher level, more abstract than the program, and it's something that you implement with the program.
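To make the idea of a tool that checks an abstract program concrete, here is a toy illustration. It is plain brute-force Python, not TLA or any of its tools, and the two-step "read, then write back plus one" counter is a hypothetical example of a few-line concurrent algorithm that looks obviously right and is not.

```python
def successors(state):
    """Yield every state reachable in one step.

    state = (shared, pcs, temps): the shared counter, each process's
    program counter, and each process's private copy of the counter.
    """
    shared, pcs, temps = state
    for i in range(len(pcs)):
        if pcs[i] == "read":      # copy the shared counter into a private temp
            new_temps = temps[:i] + (shared,) + temps[i + 1:]
            new_pcs = pcs[:i] + ("write",) + pcs[i + 1:]
            yield (shared, new_pcs, new_temps)
        elif pcs[i] == "write":   # write back the private copy plus one
            new_pcs = pcs[:i] + ("done",) + pcs[i + 1:]
            yield (temps[i] + 1, new_pcs, temps)


def final_counter_values(process_count=2):
    """Explore every interleaving and collect the counter value at termination."""
    initial = (0, ("read",) * process_count, (0,) * process_count)
    seen, frontier, finals = {initial}, [initial], set()
    while frontier:
        state = frontier.pop()
        if all(pc == "done" for pc in state[1]):
            finals.add(state[0])
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return finals


print(sorted(final_counter_values()))  # [1, 2]
```

The intended property is that two increments leave the counter at 2. Exhaustive exploration finds an interleaving that ends at 1, because both processes can read 0 before either writes. That is the kind of mistake that is easy to miss by thinking alone and easy to catch once the algorithm is written down precisely enough to check.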

Karim Galil: You have very famous quotes, or thoughts, around that: programming is not coding, and an algorithm without a proof is a conjecture. And it seems that this is within the same theme.

Leslie Lamport: Yeah. Well, I guess, you know, what that reveals is that, you know, I was trained as a mathematician, and I just sort of slipped into computer science.

Karim Galil: But today, programming and coding are almost synonyms. And there isn't really a key distinction between someone who's writing a program versus someone who's just implementing and coding that program.

Leslie Lamport: Yeah. Well, what I've said, or what I've been recorded as saying, is that, you know, coding is to programming what typing is to writing.

Now that's not a terribly accurate metaphor, because coding involves, you know, thinking, and typing, you know, is just a mechanical action, but there's a kernel of truth in it, in the sense that coding should be the easy part of programming. And the hard part of programming is sort of figuring out what the program should do and then figuring out how it should do it.

And the metaphor that I've just stumbled upon, which I think is better, is that a program is like a book, and the code is like the words on the page. Now the code is written in a particular language, but if you take the book and you translate it into a different language, it's still the same book. And it's the same book because the ideas it's expressing are the same. When we're talking about books, you know, War and Peace or something, we don't have any better way of talking about, you know, what this book is really about other than by, you know, writing Cliff Notes or something, in some natural language. But because what programs are about should be precise, there's a better way. We don't have to talk about them in programming languages. We can talk about them in math, because that is a language that is specifically designed to express the kinds of ideas that are behind many of the aspects of programming, particularly the idea of what it means for this program to be correct.

And so the way I, you know, will try to sell TLA now is that it's something that's independent of, you know, the particular implementation, the particular code you write. It's explaining the essence of what your program is doing. Now, other ways of doing that have been proposed, and a lot of them are really good for what I call traditional programs.

Programs that are sequential programs, they, you know, do one thing at a time, and what they do is, you know, take some input, compute, and produce an output. I call those traditional programs. And there are some, you know, really nice ways of doing that. But those ways don't work for concurrent algorithms, concurrent programs, that don't just do something that simple. First of all, they generally run forever and they're interacting with their environment.

They're not just, you know, producing an answer. And secondly, they don't do things one at a time; the different processes interact with one another. And so I had to develop something that was different from these methods that were developed for traditional programs, although the ideas, you know, came from things that were done for traditional programs but needed to be extended in some ways.

Karim Galil: As a physician, I can't help but think of the analogy of a human. So as I'm hearing you, I'm thinking of a program as the whole physiology of a human, or like the whole human body.

And the algorithm, or abstract program as you said, is an organ. They're acting separately, but also in concurrency, for the body to kind of live and do what it's supposed to do, basically.

Leslie Lamport: Okay. Well, it's close, but the different organs are the different processes that are running concurrently.

And the algorithm is a description of how the cooperating should work, but it's a higher level description. Instead of talking about, you know, individual nerve pulses or, you know, individual blood cells or something, you're talking about, you know, the brain gets information from, what is it, your throat, that, you know, something is heading that way, and that information goes to the brain, and the information is sent to the stomach, which does something which will then, you know, send some kind of hormonal signal to something else. And so the algorithm is like this high level view of how it works, and the code is, you know, the actual nerve fibers and blood vessels and cells and stuff that are the signals being passed.

Karim Galil: Right. Mm-hmm. That makes a lot of sense. I think for our audience, it's a useful analogy.

Wael Salloum: So you're also touching on, or proposing, math as an abstraction of all programming languages. For example, if you're an author writing a novel, you're figuring out the characters and how they interact with each other on a very high level, regardless of whether you write it in English or French. You will enjoy War and Peace in French or in Arabic or in Hebrew or English the same way.

Exactly right. So math is a higher level representation. It's like a form of interlingua, a term we use in machine translation, as if there is one language originally that all languages kind of evolved from. And you are talking about math as kind of that language, that you need to learn math.

And you mentioned that if you define the program mathematically, or the behavior of the program, before you even write the code, that can give you ideas about edge cases or how to actually write better code.

Leslie Lamport: Oh, you said a couple of words that I missed, right at the very end.

Wael Salloum: Yeah. I said that when you write the program mathematically first, when you define it in, let's say, TLA+, you define the formulas that say which behaviors of the system are acceptable and which are not. Stating that these are the only acceptable behaviors could open your eyes to building the program in a much better way.

That's as opposed to starting by writing the code and then writing the unit tests, which is what most people do today: write the code, then write unit tests and integration tests. What you mentioned with TLA is to first write the formulas in math and make sure that these are the acceptable behaviors, and that could also influence the way you write the code itself.
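
To make the idea of "formulas that say which behaviors are acceptable" concrete, here is a minimal sketch in Python rather than TLA+. It treats a behavior as a sequence of states and checks it against an initial-state predicate and a next-state relation. The toy hour-clock system and the names `init`, `next_step`, and `is_acceptable` are illustrative assumptions, not something discussed in the episode, and a real TLA+ spec would be checked with dedicated tools rather than hand-written Python.

```python
from typing import Dict, List

State = Dict[str, int]


def init(s: State) -> bool:
    """Initial-state predicate: the hour starts somewhere on the dial."""
    return 1 <= s["hr"] <= 12


def next_step(s: State, t: State) -> bool:
    """Next-state relation: the hour advances by one, wrapping 12 -> 1."""
    return t["hr"] == (s["hr"] % 12) + 1


def is_acceptable(behavior: List[State]) -> bool:
    """A finite behavior is acceptable if it starts in an init state and
    every consecutive pair of states is a next step or leaves the state
    unchanged (a stuttering step)."""
    if not behavior or not init(behavior[0]):
        return False
    return all(next_step(s, t) or s == t for s, t in zip(behavior, behavior[1:]))


good = [{"hr": 11}, {"hr": 12}, {"hr": 1}]
bad = [{"hr": 11}, {"hr": 1}]  # skips 12, so it is not an allowed behavior

print(is_acceptable(good), is_acceptable(bad))
```

The specification here is just the pair of predicates; any implementation whose observable behaviors all pass `is_acceptable` would satisfy it, whatever language it is written in.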

Leslie Lamport: Well, there's this sort of view that people sometimes give: first you start with what the program is supposed to do, and you don't care at all about how it's gonna be implemented. Then you write this description, and then you say, okay, now I've done this, now let's see how I can implement it. The real world doesn't work that way.

It's very easy to specify something that's impossible, or that's simply going to be too expensive. So in practice, you generally have an idea of roughly what it's supposed to do and roughly how it's going to do it. And the actual process is an interaction between these two.

A cool thing about TLA is that an algorithm is a mathematical formula, and what it's supposed to do is also a mathematical formula. So you can describe what it's supposed to do as a higher-level algorithm. And then how it does it, the general idea of how the program is gonna work, can be a lower-level algorithm described mathematically, and that can then be coded.

What happens in actual practice is that this second layer of algorithm becomes a useful design document. It generally tells you the really important things about how the program should work. More precisely, what you generally use TLA for is to explain those aspects of the program that involve communication between the different pieces, between the different things that are acting concurrently.

The important reason for doing that at the high level is that those things are (a) hard to get correct and (b) hard to detect errors in at the lower level, in the code. They're hard to test for because concurrency introduces so many more possibilities: there are so many more orders in which different things can happen in different pieces. In testing, you're unlikely to be able to catch all of them, in particular because you're not testing on the real system.

The particular orderings of things that you test are not likely to be the same as the ones that will occur in the real system. Therefore you're very likely to get bugs that don't appear in testing but appear in the real system, and those can in fact be really hard to figure out.

I know of cases where there's been a bug that they haven't been able to find. So what they did was write the high-level TLA description of what the system is supposed to be doing and see if that bug occurs. They will then have found the bug at the high level, and then they know what the problem is in the program.

Well, of course, if they had started with this high level, they would've found that bug before writing all this code and going to the trouble of debugging. A colleague of mine gives a talk on TLA that's titled "How to Save Two Hours of TLA With Two Weeks of Debugging."
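
The point about orderings can be illustrated with a small sketch (an editorial illustration in Python, not TLA and not from the conversation). Two hypothetical processes, P and Q, each read a shared counter and then write back the value they read plus one. Enumerating every interleaving of their steps shows how many orderings exist and how many of them lose an update.

```python
from itertools import combinations
from math import comb


def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Execute a schedule of (process, op) steps on a shared counter starting at 0."""
    shared = 0
    local = {}
    for proc, op in schedule:
        if op == "read":
            local[proc] = shared
        else:  # "write": store the value read earlier, plus one
            shared = local[proc] + 1
    return shared


p = [("P", "read"), ("P", "write")]
q = [("Q", "read"), ("Q", "write")]

results = [run(s) for s in interleavings(p, q)]
lost_updates = sum(1 for r in results if r != 2)

print(len(results), lost_updates)  # orderings explored, orderings that lose an update
print(comb(20, 10))                # orderings for two processes of ten steps each
```

With two steps per process there are only 6 interleavings, and 4 of them leave the counter at 1 instead of 2. With ten steps per process there are already 184,756 orderings, which is why a test run that happens to exercise a handful of them says little about the rest.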

Wael Salloum: You mentioned Tony Hoare, who did Hoare logic, and you work on logic too, the temporal logic of actions. Have you thought about working on logic and reasoning for AI purposes?

Leslie Lamport: No. You know why? Well, logic means formalism, and I don't start with the formalism.

TLA came from my years of experience actually writing algorithms and trying to prove that they're correct, and so the logic came after the practice, after how it should be done. I knew things were right because I was able, in TLA, to make precisely rigorous the kind of reasoning that I had been doing. Now, I can't think about a logic for AI without having done AI.

It's that simple.

Wael Salloum: So I can give you an example of what I'm working on. Let's say you've got a patient who had cancer, and you've got the date of diagnosis, and then you've got the surgery for that cancer and the date of surgery, and you find that the date of surgery is before the date of the cancer diagnosis.

So that's obviously illogical, right? And your partial order actually would work this way. There is an application, with a lot of massaging of that technology, to, let's say, a knowledge graph. Not a distributed system, but maybe a multi-agent system, a multi-cognitive-agent system, where each agent has a certain belief about, let's say, what happened to a patient.

And there would be conflicts, there would be paradoxes. They can be detected using something like TLA or something like temporal logic, or the different species of temporal logic, right? So there are applications in that domain where I find your work very inspiring.

You said once that you like to work on conflict more than collaboration, in terms of less on parallelization or concurrency and more on, basically, something like the Paxos system. Can you talk a little bit about how you worked on Paxos, and how it revolutionized distributed systems and banking transactions?
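
The surgery-before-diagnosis example can be sketched as a plain ordering check. This is an editorial illustration in Python, not TLA or a temporal logic, and the rule table `MUST_PRECEDE`, the event names, and the dates are all hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical rule set: each pair (earlier, later) says the first event
# must not be dated after the second one.
MUST_PRECEDE: List[Tuple[str, str]] = [("diagnosis", "surgery")]


def find_conflicts(events: Dict[str, date]) -> List[str]:
    """Return a message for every ordering rule the record violates."""
    conflicts = []
    for earlier, later in MUST_PRECEDE:
        if earlier in events and later in events and events[earlier] > events[later]:
            conflicts.append(
                f"{later} ({events[later]}) is dated before {earlier} ({events[earlier]})"
            )
    return conflicts


record = {"diagnosis": date(2021, 6, 1), "surgery": date(2021, 3, 15)}
print(find_conflicts(record))
```

In a multi-agent setting, each agent's belief about a patient could be one such record, and the same check would flag the agents whose beliefs cannot all be true at once.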

Leslie Lamport: Well, hold it.

You went between two things, from AI to Paxos. Which do you want me to talk about?

Wael Salloum: If you wanna talk about AI please talk about AI.

Leslie Lamport: Well, what I wanna do is not talk about AI. But many years ago, actually when I was in grad school, I had a roommate who was studying linguistics.

In fact, he's George Lakoff, who's become a very well-known linguist. He was actually taking a class, I think from Chomsky, who was at MIT, and he would come and say, "I wanna do this. Is there any math that can do it for me?" And what I had to tell him is that math doesn't solve the problems for you like that.

What you have to do is figure out what the solution to your problem is. Then you can say, well, is there a math that I can describe this solution in? And then maybe I could use math that's been developed. So the advice I would give to you is: start with some very particular problem, as small as you can make it, but one for which a solution would teach you something.

Then figure out how you're gonna solve this problem. You know what the problem is; how are you gonna solve it? And then, when you see the solution to this very simple problem, say, well, how do we generalize this?

What is the math that's going on under there? In some sense, don't start with the problem and then look for the math that's gonna give you a solution. Look for the solution to a very simple problem, and then ask, what is the math that's behind the solution?

Wael Salloum: Yeah.

Then you formulate it with math. Yeah, that makes perfect sense.

Leslie Lamport: Okay. Now what did you wanna know about Paxos?

Wael Salloum: So that's the idea of multiple systems working together toward one consensus. I find a lot of applications of that in knowledge representation and in AI as well.

But if you wanna talk about it from a distributed systems perspective...

Karim Galil: Here's the super interesting thing that I'm noticing, right? Everyone claims that they're doing AI, or they know something about AI, or they want to talk about AI. But then there are people with knowledge like you. You sent us an email after we had asked you a lot of questions, some of which were about AI.

And one of your comments was, "I don't like to speak about things that I don't understand," or something along those lines. And I was like, wow, you're saying that, but when we actually go out in the industry, we're seeing the opposite, where everyone's claiming that they know a thing or two about AI.

Leslie Lamport: Yeah. Well, there's something that I call Lamport's law. You may have heard of the Peter principle, which is that people get promoted to their level of incompetence. They do a good job at something, then they get promoted, and if they do a good job, they get promoted again. And when they do a lousy job, they stay there.

Well, that's actually a corollary of Lamport's law, which is that the qualifications for a position are uncorrelated with the qualifications for actually performing the tasks of that position. The example that I use, which is relevant here, is that because I've gotten a Turing Award for doing some particular work, I get invited to speak about all sorts of things that I know nothing about and am completely incompetent to talk about.

And I have the good sense to decline those invitations.

Karim Galil: It does really make sense. I think we're coming to the end of the hour, and I wanna be conscious of your time, but one other thing that I thought was very interesting when I was looking into your thoughts and writings is that you're saying scientists will find more things in industry than by staying in academia or in the lab.

And it was almost like an invitation for scientists to move towards industry rather than stay on the academic side of the equation.

Leslie Lamport: Well, remember that that was a comment I made about my career, which was at a particular point in time in a particular discipline, namely computer science.

For example, I see no sign that a physicist needs to go to industry in order to do elementary particle physics. Don't take that too seriously, especially not in disciplines unrelated to computing.

Karim Galil: It's not an abstractable algorithm, to borrow from what we were talking about. We're coming to the end of the hour. We're super appreciative of you taking our invitation. We just sent a cold email, and we were actually super surprised when we got the response back. I learned a lot in the process about what it means to actually be a good scientist, just from our interactions.

And I'm super appreciative of you being generous with your time with us today.

Leslie Lamport: Well, I'm happy to talk to you. Thank you.

Karim Galil: Thank you.

PODCAST — 60 minutes

Eze Abosi, VP of New Products at Optum Life Sciences on Patientless Podcast #010

Eze Abosi is VP of New Products at Optum Life Sciences. Eze and Karim Galil, M.D. covered topics such as career background and the healthcare technical ecosystem. They also talked about creative solutions that entrepreneurs and companies are creating with access to data. The conversation also touched on unstructured data, the webinar with Guardant Health, clinical genomics, and NLP. Watch the full Youtube video here: https://youtu.be/95Kv64SyE0M


Karim Galil: All right. Welcome to another episode of Patientless. Today we have a very interesting guest from a very interesting company. I'm happy to welcome Eze Abosi to our podcast. Eze is Vice President of New Products at Optum Life Sciences. Why don't you introduce yourself, Eze?

Eze Abosi: Thank you so much, Karim.

It's been a pleasure to interact with you and the broader Mendel team. I'm excited to be part of the Patientless Podcast. In terms of my background, generally I've been in data, analytics, services, and associated solutions supporting life sciences for about 16 years now. I spent 11 of my 16-ish years in the industry with one company called Decision Resources Group, which has since been acquired by a public firm known as Clarivate.

In addition to my time at Decision Resources Group, or DRG as it's well known, I've also worked for some very large organizations in this space, such as IQVIA. Likewise, I've been part of startup solutions and employers, which all have one general theme: leveraging data, analytics, and insights to ultimately support pharma in terms of the various needs in their workflows.

And so whether it's discovery, medical, R&D, or even commercial use cases, I've been able to collaborate with my colleagues as well as with my pharmaceutical clients to find solutions that help them streamline their business issues. Currently I'm launching new products for life sciences on behalf of Optum Life Sciences.

Really cool role. In particular, for new products, I'm laser-focused on clinical genomics: being able to take high-quality genomic data, integrate it in an appropriate way into Optum's core EHR and claims assets, and use that insight to derive novel ways of discovering molecules, novel ways of gauging access, or being able to support clients with the proper stories that they can communicate to their stakeholders, such as regulators. We're also launching some very unique and powerful NLP solutions (natural language processing, which of course is near and dear to your heart) and clinical trial solutions. Above and beyond those products, I also support partnerships for Optum Life Sciences, which gives me the opportunity to interact with a variety of innovative companies. So that's a concise summary of what I'm currently doing and where I've been over my career.

Karim Galil: Tons of exciting things. The clinical genomics one specifically is probably where we're gonna spend a lot of time. I think the holy grail of data is combining the phenotypic and the genotypic data together.

But before we get into that: part of your role, if I understand correctly, is that you're able to reach out to pretty much everyone that's working on the next big thing or on an exciting solution in healthcare, and you're able to understand what they do and see if there is any sort of fit into the Optum ecosystem, whether through partnership, commercial agreement, or acquisition.

And you obviously have the luxury of an Optum.com email, which means everyone is gonna respond to that. Is that part of what you do?

Eze Abosi: That is spot on. I'm incredibly surprised every day at how willing individuals are to respond to my emails.

Karim Galil: Well, if you have 80 million patient lives, everyone will respond to your emails. Which basically means you have firsthand access to what's actually happening in healthcare, right? The way I look at the healthcare ecosystem, it's not only the regular payers, providers, and pharma, the three P's. There is now a whole ecosystem of small, medium-sized, and even big startups that are changing healthcare as we know it today.

And there are two kinds of people through whose eyes you can get to know those companies. One is the investment community, the VC community. The other is folks like you, who actually understand healthcare, are actually doing healthcare, and are, I wouldn't say interrogating, but investigating what's happening out there.

So how do you look at the healthcare technical ecosystem today? What's exciting, what's working, what's not working?

Eze Abosi: Well, that's a great question. Three main pillars. Data, analytics, consulting.

There are legacy players, and certainly innovative entrepreneurs and fairly large startups, that are incredibly impactful in each of those core pillars. And of course there's an interplay: there could be some notable companies with fantastic data assets and also great platforms.

But generally, I try to think about the marketplace in those three core silos. That being said, I think the analytical pillar can really be peeled back. Within the analytical pillar, I consider, basically, the method: the actual analytical expertise or IP that differentiates how a certain company, or in some cases a certain entrepreneur, analyzes data.

I compare that to the actual technology that allows the client, whoever it may be, to interact with the data. And then I think what's also very important is the visualization, because ultimately we are analyzing this data to derive insights that'll support how we address a business issue. So that analytical layer can be broken down, from my perspective, into technology versus analytical methods versus visualization platforms.

Karim Galil: That's very interesting. So you're basically looking at it as a tech stack: some sort of interfaces, and a layer in between that actually interrogates the data and asks the right kind of questions.

So what's exciting? Like what are you seeing that you are personally excited about now?

Eze Abosi: So if you think about everything we've just talked about, and pay particular attention to those three core pillars: for every single workflow within healthcare, broadly, there are gonna be some very unique players, either legacy or just starting up, in each of those core pillars.

Likewise, if you wanna get one level deeper for the life sciences, my perspective on, for example, innovative analytical platforms will differ if we're referring to the discovery use case versus medical versus clinical trial optimization versus, for example, commercial.

And that's what's most exciting: depending on the primary objective of your business question or need, there could be a wide variety of different intellectual properties, platforms, or companies that you may want to consider, if not interact with. That's what I find so exciting.

Karim Galil: The foundation of all of that is, at the end of the day, access to the data, right?

You can have all the best interfaces and tech stack and IP, but if you don't have access to the data, it's like you're Google and you've built the best page-ranking system, but you don't have access to the internet, right? So are you guys at Optum willing and open to collaborate on that? Because you have very significant access to data.

Also, what are you seeing entrepreneurs and companies solving for? Obviously not everyone is gonna go to Optum and say, "Hey, can we have access to data?" So what creative solutions are you seeing out there that have to do with interoperability and access to the data, so that you're able to power those applications?

Eze Abosi: That's an incredibly good question. I guess the simple answer is that our clients are requesting, at a higher and higher frequency, that we become more flexible in the way we allow them to partner with other key suppliers in their value chains. So ultimately the clients are driving it.

That being said, Optum is such a relevant stakeholder in many different aspects of the healthcare marketplace, especially here in the US, that we have a very unique opportunity to collaborate and drive efficiencies across the broader healthcare ecosystem. Because of those opportunities, we're constantly assessing how we can better partner and leverage our data, analytics, and technology to, for example, improve the way drugs are delivered to patients, by channel.

Or being able to support how data, analytics, and technology can help not only providers but also pharma keep patients aware of clinical trials as a care option. And certainly using the aforementioned data, analytics, and technology to support how a pharma company can leverage evidence to speak to the value of their technology or their drug to a regulator or to a payer.

So the opportunities to collaborate are plentiful, from the Optum or the greater UnitedHealth Group perspective.

Karim Galil: Hearing you describe that, and getting to know you over the last few weeks, I believe you're almost building like an app store on top of the data that you guys have been able to successfully accumulate. Is that a good way of thinking about it?

Eze Abosi: We're building in that direction.

What's really exciting from my perspective is that my manager, who's the Chief Growth Officer of Optum Life Sciences, his name is Brian Irwin, comes from a very unique background. He started in pharma, working for a Japanese multinational pharmaceutical company. He then moved on to what today may very well be the most notable data-agnostic clinical trial platform in the industry.

From there, after successfully helping the management team sell the asset to a larger organization, he moved to Optum Ventures. So he's very strategic in how he views the opportunity in life sciences, and he recognizes the importance of platforms.

And so when I hear "application," what I'm thinking about are different platforms that the user can leverage as needed to ultimately answer a business question. Exactly. And so yes, we are absolutely thinking that way. Whether we realize that through organic or potential strategic initiatives, I think that's gonna be the exciting thing to watch.

Karim Galil: That's super exciting, because you're pretty much building an ecosystem around a very unique data asset and looking at it as teamwork rather than as an Optum thing. It's like, what's best for the end client, and what can we bring there? But one thing that I hate is when we just say "data," because data is a very generic term. What data?

Is it structured? Is it unstructured? Is it images? To your point, is it clinical? Is it genomic data? It's a pretty broad term. And lately you hear more and more things like "data became a commodity," which I agree with if we say structured data became a commodity, or claims became a commodity. But data in general is not a commodity.

Unstructured data remains very hard to access. You guys are one of the very few players that did a good job there. Genomic data, and marrying it to the clinical unstructured data, remains very hard to access. So let's speak about this a little bit more. What are you seeing happening in the unstructured data world?

I mean, why is the future unstructured data, not claims data?

Eze Abosi: Great question, Karim. I think what's driving the relevance of unstructured data is the continual focus on specializing drug development for highly precise populations. Think about the nineties, when the industry was laser-focused on developing molecules based on the volume of patients available, and on making available therapeutics that could support highly prevalent, likely chronic diseases like diabetes, COPD, asthma, et cetera. But as this shift from volume to value continues to transpire, ultimately it's about populations like triple-negative breast cancer. Being able to take a broader population and then specifically tailor drug development to accommodate very specialized populations within a broader marketplace: that's what's driving the need for, and the value of, the unstructured data. Because with only the broader structured elements, or tables if you will, in data science lingo, and without the physician notes, the symptoms, et cetera, you may not have the information to better understand specific cohorts like the triple-negative population within breast cancer. And so that's the value.

It's the industry's overall emphasis on driving value rather than volume.

Karim Galil: A hundred percent. And so here's the thing: I believe that in the next 10 years, there is not gonna be any clinical trial submitted to the FDA without some element of real-world data combined with it.

I think we always tend to be extreme in healthcare, thinking in zeros and ones: it's either real-world data or traditional clinical trials. I think the answer is somewhere in between on that spectrum. You're gonna find phase one, phase two, but in phase three, phase four, there are a lot of things you can do: you can get smaller cohort sizes, you can eliminate some end...

You can do a lot, you can better design the trial, but there is gonna be an element of real-world data in every clinical trial in the next 10 years. And to your point, you're gonna need it to find the cohort of patients, but also the complexity of the endpoints that the FDA is asking for simply does not exist in a structured format.

Good luck trying to find progression in cancer by querying some tables, right? Which actually makes me very surprised that we are still finding that most of the data players, the folks who are selling access to data, are selling access to structured data.

And I'm like, hello? Are you guys seeing the switch that's happening? Do you know of anyone that is actually giving access to unstructured data at volume, like you guys at Optum?

Eze Abosi: Wow. So the answer is yes, but they're few and far between when you compare that number to the broader real-world data supplier marketplace overall.

And so I'll actually revert back to a comment you made earlier about apps, how you see Optum and other innovative real-world data and analytics suppliers to the life sciences moving towards a suite of applications in the very near future. I agree, and not only because of the seamless access you can deliver to the consumer of that data.

In your traditional app mentality, I think about being able to pull out your phone or your tablet and access that insight. With very complex data like EHR or claims, or the integration of both, which is of course the goal, it's a little bit more difficult than what I just described.

But the beauty is being able to seamlessly deliver that insight in real time, on demand, to the user. That's fantastic for the consumer of the data. But from the supplier perspective, what the application interface allows us to do is manage and protect the data accordingly.

And so rather than, for example, delivering that data into your native environment, when we can deliver it through a platform, a SaaS-based platform if you will, that allows us to ensure that it's being used appropriately. And then, most importantly, the confidentiality and privacy of the patients or consumers from whom the insights are being derived remain protected.

And so that's absolutely key: keeping in mind that we must protect the consumer as we look to combine novel data types across the value chain and the ecosystem.

Karim Galil: But when I look at the unstructured data landscape, the argument that folks with access to structured and claims data would make is: "Listen, yeah, I have a pixelated idea of the patient through the structured data, but I have 300 million patient lives, so I have the breadth.

And if I go the unstructured data route, I'm bottlenecked by abstraction. So maybe I have a higher-resolution picture, but only of a 30,000-patient population." Right? You guys at Optum are, as far as I know, one of the very few who have solved that problem, where you are able to give access to breadth and depth at the same time.

Eze Abosi: Yeah. And so I wanted to begin commenting on that question with an emphasis on privacy, because the unstructured data may contain private information, whether it's personally identifiable information or protected health information. So the first thing we need to do when we, broadly the industry, think about commercializing an unstructured data asset is to ensure that it's completely compliant, to the best of our ability.

I think that benchmark is well over 99% at this point. And so, assuming that the data is protected, I think we then have to consider the delivery modality. Being able to take a massive amount of unstructured data that is hopefully compliant and deliver it to clients of yours in their native environments: that may be feasible.

But I think there's a level of risk there that some companies may prefer to avoid, and at Optum we tend to be very conservative. So going back to that initial point you made about applications: being able to provide an on-demand resource to answer your business question, yet one that is a very appropriate medium for a company like Optum to deliver insights such as unstructured data, that's really interesting.

We're continually investigating that paradigm, because at scale, across therapeutic areas, that may be one potential avenue that we can explore. That way we can deliver the insights and allow you to execute, for example, models as you see fit, but we can ultimately ensure that it's being used in the appropriate way and, again, that the privacy of the consumer is completely protected.

So if you're going to deliver unstructured data at scale, it must be done in the most appropriate way, because it's incredibly sensitive information. I think it's potentially even more sensitive than the classic structured elements that you would see in, for example, the EHR tables or in claims.

Karim Galil: Definitely. I mean, in structured data it's like: hash the first name, hash the last name, hash the Social Security number. So you're basically saying there are two elements of complexity here when it comes to unstructured data. One is PHI: how can you make sure that the unstructured data is PHI-free at scale?

And to your point, the benchmark here, like the safe harbor, is that you need to exceed 99% accuracy. That's the level of tolerance that HIPAA can afford. The second element of complexity is that, now that this data is PHI-free, you need to be able to abstract or index this data at scale, which today, in the state of the art, is done by humans.

And with humans, we did the math before: it would take half a century and a quarter of a trillion dollars just to abstract the patients of 2021. If you take all the medical records that were generated in 2021, all the unstructured data, and hire 80,000 abstractors, which seems to be the total number of abstractors in the US, it will take them that much time and that much money to do just this one single year.
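
As a quick consistency check on those figures, the sketch below works out what they imply per abstractor. It assumes, as an editorial reading of the quote, that all 80,000 abstractors work in parallel for the full 50 years and that the dollar figure covers their labor. The underlying record counts and per-record times are not given in the episode, so nothing here reconstructs the original calculation.

```python
YEARS = 50                  # "half a century"
TOTAL_COST_USD = 0.25e12    # "a quarter of a trillion dollars"
ABSTRACTORS = 80_000        # quoted size of the US abstractor workforce

# Assumption: everyone works in parallel for the whole period.
abstractor_years = ABSTRACTORS * YEARS
cost_per_abstractor_year = TOTAL_COST_USD / abstractor_years

print(f"{abstractor_years:,} abstractor-years")
print(f"${cost_per_abstractor_year:,.0f} per abstractor-year")
```

Under that reading, the quoted totals correspond to 4 million abstractor-years at $62,500 each, so the time and cost figures are at least consistent with each other.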

Right. So abstraction at scale is a problem, but your point also, which is an interesting one, is de-identifying at scale. Would you say also there's a third level of complexity, which is, how can you also tokenize this data and marry it to, like, genomic data, structured data? Correct. Would be, would that be also a third level of complexity?
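The hashing and tokenization idea raised above can be sketched roughly as follows. This is a minimal illustration, not any vendor's method: the secret key, the normalization rule, the `make_token` helper, and the two sample datasets are all hypothetical, and a real de-identification pipeline would involve far more than a keyed hash.

```python
import hashlib
import hmac

# Hypothetical secret; a real deployment would manage this key securely.
SECRET_KEY = b"example-secret-not-for-production"

def make_token(first_name: str, last_name: str, ssn: str) -> str:
    """Derive a keyed-hash token from normalized identifiers (illustrative only)."""
    normalized = "|".join(p.strip().lower() for p in (first_name, last_name, ssn))
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Two made-up datasets holding the same patient with different formatting.
clinical = {make_token("Jane", "Doe", "123-45-6789"): {"stage": "II"}}
genomic = {make_token(" jane", "DOE ", "123-45-6789"): {"variant": "EGFR exon 20"}}

# Records are joined on the token, so raw identifiers never need to be shared.
linked = {
    tok: {**clinical[tok], **genomic[tok]}
    for tok in clinical.keys() & genomic.keys()
}
print(linked)
```

Because both sides normalize the identifiers before hashing, the same person yields the same token in each dataset, which is what lets the clinical and genomic records be married.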

Eze Abosi: That is the dream, um, that is the dream, especially when you, when you're thinking about complex markets like autoimmune and certainly oncology. The, the key value add that our clients are looking for really across all use cases, across the entire value chain, whether you're discovering molecules or you're trying to protect the life cycle of your, uh, asset that's about to go off patent, is being able to marry the genotypic and the phenotypic insights, um, at a high rate of veracity across these disparate data types.

That's exactly what we're trying to do and achieve on behalf of our clients. I think the ideal is certainly the claims and the EMR linkage, and by EMR I'm talking about both the structured as well as the unstructured insights, but also integrating, um, imaging and certainly integrating the appropriate kind of genomic information relevant to that category.

And so at Optum, uh, within oncology, we tend to work with either NGS, high quality NGS data that's derived from either liquid biopsies or IHC. Um, within non-oncology, broad whole genome sequencing. And so being able to take those insights and pair it with EHR and claims, very powerful. Um, if you can include the images, even more powerful.

And I think what differentiates, I don't think, I would certainly argue, um, that what differentiates our approach at Optum is that our linkage truly does incorporate not only the genomic elements, but also the clinical as well as, uh, claims elements. Because if you look, like, look at a topic like adherence, um, and so when we're talking about these very highly complex, uh, disease populations, um, the therapeutics available for them, if they're available, um, tend to have some pretty significant side effects.

And so when you're assessing the patient journey and trying to use that information, um, to sort of support your business, um, being able to understand whether this patient population is not adherent because of, for example, the efficacy of the drug, the side effect profile of the drug, or, um, the affordability of the drug is a major, major issue.

Um, and so by integrating the claims elements, um, we can differentiate versus your, if you will, your standard clinical genomics asset available in the marketplace today.

Karim Galil: That's, that's very evident even from the dynamics of the market today. I mean, we're, we're seeing companies like Datavant, which was very focused on the structured data elements, now expanding their business to how can you tokenize, how can you merge unstructured and structured data assets, which a few years ago wasn't something that we felt was a priority for them.

Today we're seeing that becoming more and more of a priority. Speaking about, um, genomic data, and I wanna come back to unstructured data in a second here, but, um, I just saw on, on your LinkedIn, you have an interesting, um, webinar tomorrow with Guardant Health. Um, what's happening? What is Optum doing with Guardant?

Eze Abosi: Yeah, so it's, uh, gonna be a very dynamic conversation about, uh, the relevance of real-world evidence in terms of clinical genomics and how we can support, uh, the life sciences. And so this is, this kind of, this theme has been permeating throughout the entire year, especially given the FDA guidance.

Um, guidance is the key word there on how to leverage real world evidence, um, in your regulatory submissions and broadly in your clinical trial design, optimization, et cetera. And so, um, that's gonna be the focus of the conversation and I think it's gonna be a very lively discussion because we're not only gonna have the kind of a thought leadership from Optum and from a clinical genomic standpoint.

Um, but, um, very much looking forward to, uh, to speaking with Nuray Yurt, who is the worldwide, um, artificial intelligence lead for Novartis. And she has historically focused on oncology. So this is a topic that certainly kind of resonates, um, with her workflow. And likewise, Naveen Kumar from Guardant, um, who leads, um, their core business of commercializing a liquid biopsy to ultimately expedite a diagnosis for, uh, patients.

Karim Galil: Are we gonna see an Optum slash Guardant Health combined data asset of clinical genomic data?

Eze Abosi: We might. We might. And so I, we, Optum, we partner actively with a variety of different firms in the healthcare ecosystem, but we tend to be very conservative and confidential.

Karim Galil: Hey, I'm trying to get an exclusive for this podcast.

Eze Abosi: Yeah, absolutely. Absolutely. But, but I, I, um, I think the audience can read between the lines. Yeah.

Karim Galil: So, the first attempt to combine clinical and genomic data was Foundation Medicine and Flatiron. So I believe a few years ago, three or four years ago, they, they wanted to do like a, the first clinical genomic database, and they did some analysis trying to find patients that both exist in the Flatiron network, but have also taken a Foundation Medicine test.

I believe the result was, like, fewer than 20,000 patients that they were able to put together, um, which was, uh, a surprisingly low number given the, the access that the two companies had. Um, what's your take on that? Like what's the, like what, again, talking about breadth and depth, right? Do you think that every clinical genomic database is gonna suffer from the breadth problem, where we're gonna see only like a few thousand patients?

Or are there any current attempts that you may or may not disclose, that you may or may not be aware of, where we're gonna see like a hundred thousand patients or a 200,000 patient cohort where you can see sequenced, structured, unstructured data altogether in one platform?

Eze Abosi: Yeah, so great question.

Especially given that we're sitting in New York City very close to the Flatiron District. The key there though, in terms of, in terms of touching on your, your point about breadth, is you have essentially one very relevant EMR and one flagship, if you will, genomics firm, um, combining efforts. And so love the innovation, but it's, it, those are, those are, those are two unique entities for pan-therapeutic insights. In other words, we don't specialize just in oncology.

Uh, we can give you insights that are relevant to oncology as well as autoimmune, as well as neurology, et cetera. And so that is, um, how we are approaching it. That being said, the N for the oncology-relevant population today within our clinical genomics asset exceeds a hundred thousand patients.

Wow. And that is compelling breadth, but in terms of depth, what really makes our asset very unique, and I talk about this all the time with our product lead internally, um, is, is the, the availability of sequential genomic analyses. In other words, some patients in our asset, um, have up to 20, um, tests, genomic tests associated with their, with their patient profiles.

And so you can, for example, gauge kind of the initial, um, genomic profile and see how that potentially evolves over time after various lines of therapy are initiated. In other words, did this targeted therapeutic work impressively, or did this immunotherapy deliver better results at certain lines across the patient journey?

And so that's sequential analysis, and being able to temporally, um, provide insights from a genomic as well as EHR and claims-based perspective. That's what's really interesting. Secondarily, um, because we're pan-therapeutic, we can look at this one particular, I don't know, lung cancer patient with like an exon 20 mutation.

And we can not only understand their interactions with their oncology specialists or oncology providers, but we can also, for example, understand their interactions with the different specialties or primary care providers that are also managing them to facilitate a true patient journey across the entire kind of healthcare continuum.

Not just within the silo focus of what the oncologist or the oncology nurse has been kind of associating or interacting with the patients.

Karim Galil: And, and, and for that to happen, you have, like, I'm positive you're not partnering with a diagnostic company, you're partnering with an ecosystem Exactly. Of diagnostic companies.

Um, that's a really impressive, uh, product. There are three elements, uh, any scientist would like to see in data, right? Depth: they wanna see as much data as possible, like structured, unstructured, clinical images. Breadth. The third, which you touched on, is longitudinality. It's like, if you give me great access for a snapshot of a patient, a patient is like a book, it's a journey.

It really affects my analysis, limits what I can do. So it seems like you guys are hitting on all those three aspects. It's the important aspect, the breadth aspect, and the depth aspect of this all.

Eze Abosi: I, I would agree. I would agree. And in terms of depth, I have to tell you that the unstructured insights are what really differentiate, um, what I think is the, uh, what makes our data at Optum, um, deeper than your traditional suppliers.

Yes, integrating genomic, EHR and claims is extremely helpful. It's even better if I can tell the clients, um, what staging, um, the, the cancer patient's tumor may be at within, given, within any given point of time in their journey. Likewise, and what's becoming incredibly powerful, is treatment response.

Especially when you start looking at markets or disease categories, um, outside of oncology, with an emphasis on autoimmune. In other words, we have this very complex patient for this very complex autoimmune disease. Um, it's incredibly valuable to understand their genomic profile. Um, but why did they not respond to this core biologic that seems to work with most other patients in this population, or why were their side effects, um, so much more acute, um, for this particular genomic profile versus that genomic profile?

And so being able to leverage different techniques to extract, um, that, um, that insight via, for example, NLP, uh, for those unique and very complex populations, that is a key differentiator and an area of focus for our clients moving forward.

Karim Galil: NLP, you, you asked for it. I was planning not to talk about it, um, but...

So, um, what are you seeing there? Like, because obviously with, with depth, uh, sorry, with breadth comes the complexity of, you can't rely, again, as I told, like, uh, as I've already mentioned, like I, I don't think a human force, a brute force can, can help, even if you have access to the breadth, can help you actually achieve the wide scale indexing or structuring of the data.

So what's exciting in NLP?

Eze Abosi: Um, um, so, uh, specifically in oncology, um, staging, biomarkers, tumor response, um, the elements required by the FDA for approval of your medication that are not available in structured data, simply put. Um, in autoimmune, severity, um, is a very, and also treatment response, which is actually very much in tune, um, with my remarks in oncology.

It basically, it resonates very well with the autoimmune categories. Neurology: being able to take insights from the pathology reports, um, redact appropriately and then extract the relevant insights to understand what the pathologist saw in their reports and how that can help inform your drug development.

Um, so for instance, within the case of Alzheimer's and PET scans, being able to kind of assess, whether it's through structured data, whether it's through real world data, whether it's through your own clinical trial data, being able to assess, um, how, what types of patients may respond to those therapies, uh, based on not only what's written in the unstructured data, but also what is available in the structured elements, and of course what's available in the actual images, is a major area for drug development and just adding value moving forward.

And I'll lastly just touch on, uh, for instance, the, uh, the ability to, um, extract the unstructured elements via NLP and, um, use it as part of the evidence that you leverage in, for example, a synthetic control arm supporting your regulatory application. I think what's really interesting is the amount of data that you'll ultimately uncover with any good NLP solution. And so it becomes incredibly important to sift through the elements that actually matter to your question. Yeah. Um, and so although it's incredibly, I guess it's helpful to understand that, for example, 10 years ago, per this EMR record and the unstructured physician notes, this patient was diagnosed with a certain disease and was prescribed Tylenol, is that really that relevant when you're trying to assess, for example, the triple negative breast cancer patient dynamic?

Karim Galil: So it's more of a targeted extraction task.

Eze Abosi: Exactly. I think in terms of unmet need, that is where there is significant amounts of opportunity for the industry to take advantage.

Um, how can you help me at scale understand what information abstracted from the NLP technology is relevant to my business question so I can run more efficiently? That is kind of the next, um, the next arena that I think novel technology platforms will be able to support in terms of NLP and the real world data.

Karim Galil: Um, same way that you guys, I'm sure, suffer from that statement, data is a commodity, which is a very, again, broad statement. Structured data is a commodity, but nobody's saying that. We, we also in the, in the tech side of things, hear a lot, NLP is a commodity. Um, NLP is a solved problem. Uh, we're gonna use GPT-3 or we're gonna use, uh, transformers, we're gonna use BERT.

So there's like a lot of existing technologies. Do, do you, do you believe in that? Do you think it's an, it's a commodity? Uh, again, like the purpose of this podcast is to bring in, um, folks who are excited and folks who are not excited about AI. It's not in any, in any shape or form, uh, uh, promotion for Mendel.

So I'm not gonna take it personally. I truly wanna know, do you believe that NLP today is a commodity or it remains to be an unsolved problem for the use cases, or to your point, the questions that folks are asking today?

Eze Abosi: Yes. NLP is a commodity.

Karim Galil: Okay,

Eze Abosi: what is not a commodity?

Karim Galil: I'm gonna take it personally.

Eze Abosi: You asked me for direct feedback, and so what's not a commodity is the NLP frameworks that understand the nuances of healthcare.

More specifically, the NLP frameworks that can understand the life sciences workflows. Uh, a fantastic NLP solution should look incredibly different supporting drug development versus commercial operations. Interesting. And where I was, I was approached by, I won't say their name, an NLP platform today.

And they wanted to basically sell me, frankly sell me, um, a solution that we could install locally within our Optum servers. And it'll help us, you know, derive NLP at scale. Well, that sounds fantastic. Please share with me your experience in life sciences and specific disease categories that you have focused on with this NLP framework?

Well, that's not really how it works. We basically sell you a box and what you do with it is up to you. Okay. No. Delete.

Karim Galil: It's a very interesting point. So you're basically saying if you look at NLP as, as a generic field, it is a commodity. But if you look at NLP that understands the context of drug development and the context of the endpoints that like pharma would be interested in, that becomes a different story.

Eze Abosi: Absolutely. And so it's, so if, if you, theoretically you being a, a potential partner, if you are trying to position an NLP solution to me, um, that can solve all problems in healthcare. That's a red flag.

Karim Galil: A hundred percent.

Eze Abosi: If you are trying to position an NLP product to me that can solve all life science workflow, uh, needs, that's also a red flag, because I inherently believe that the approach for just generally supporting discovery, medical and R&D, kind of that core kind of medical function, it's, it must be distinct and different than being able to support commercial. And so for, let me give you some tangible examples there.

If, if I am an appropriate NLP solution, um, supporting the commercial workflows, um, the, the thought process within the extraction is going to be how can I take this information and better support my field force or HCP engagement? Um, that is just a much, that's just a much different mindset than being able to leverage that same NLP technology, um, to support discovering of new biomarkers, mm, or new ways where we can capture disease severity, or new ways to find misdiagnosed or undiagnosed patients, or patients broadly progressing in their disease.

Although it's all still pharma, um, from my perspective, the use cases are distinct enough that it can't be a one size fits all, and I've still yet to discover one particular solution that can appease both.

That's so even just within life sciences, pharma commercial and pharma R&D.

Karim Galil: That's very true. Um, we were interviewing someone, um, and he's from outside healthcare. He, he never touched healthcare before, but he comes in from the tech world and we were explaining like we're giving a demo of our product, like making sure that he understands how it works.

And he asked us that question, it's like, can the NLP extract anything? And our answer was no. Because what we have seen is, for some data variables, say medications, AI can do better than humans, because for medications, like, you have to read a thousand-page record. Humans, like, get tired. So AI will do better. But outcomes, it's hard for both humans and AI.

Three humans will not agree on the outcome of a specific patient after reading the record. So you cannot expect a machine to, to do that. Um, so we, anyways, we were going through that and he was like, do you guys have, uh, the same thing like autonomous driving, L1, L2, L3, L4? I was like, what the heck is that?

I was like, Well, in autonomous driving there's four categories, right? There's the L1, which is like fully autonomous, and then there's the L2, which is like a Tesla where your hands are on the wheel, or I forgot the categorization, but the idea was like going on a spectrum from full autonomous to actually just speed control, right?

And there is many things in between. I believe the mistake that is happening in health tech today is packaging NLP as a Google driverless car, fully autonomous, your hands are off the wheel. We got you. Whether you are in Egypt or Nigeria or in New York or California, this car is gonna be fully autonomous.

Where the reality of the matter is, it's somewhere in between. It, it, it really depends on the use case. It depends to your point, even on the therapeutic area. And it, there is a lot of variabilities on how autonomous you can get with an NLP solution. And the more realistic we get about it, I think the better we're gonna be able to tailor solutions and, and, and use NLP as an augmentation tool rather than an L1 rather than a fully autonomous, your hands are off the wheel.

We got you kind of an approach.

Eze Abosi: Yeah, I completely agree. Which actually sparks the thought from my perspective. Um, how in or how interesting would it be? And who will be the first NLP solution for the life sciences, if not just medical R&D, um, that can accommodate multiple languages at scale. Um, I think it makes, it's a very kind of rational approach to, maybe it's just my very myopic western mindset to focus on the English language.

Yeah. Um, at this stage in this particular type of product development. Um, but it begs a very interesting question that you just touched on as you kind of think about kind of the global rollout of artificial intelligence or basically smart cars, uh, that can drive themselves. I just, um, just the nuances in healthcare, although like guidelines generally may be consistent across, across, um, across cultures.

Um, there, there, there are many, many nuances. Um, especially by language. Especially by dialect of a certain language.

Karim Galil: I practiced medicine in Egypt and I, I wrote medical records in English, but in Egypt, I can tell you the way I would approach a patient or the way I would describe the context of a patient is different than what I would do here in the US.

The drugs that we can prescribe there are different. The healthcare system there is different. Um, I wanna make sure the patient gets covered. Yeah. So like it's, everything is different. Uh, one of our competitors has, on their website, "we support five languages," and I was like, maybe you support five languages, but you don't support five different contexts of how to practice medicine.

Exactly. Right. It is just different. Um, that's actually a very interesting point.

Eze Abosi: Yeah. It actually raises another point that I wanted to share, which is, there's some really unique categories from data, from a data science perspective within neurology because a number of these neurological conditions, especially for the elderly, are literally life changing because they'll not only formally confirm you have a certain disease, but your life will be changed dramatically.

Dementia, Alzheimer's, multiple sclerosis, Parkinson's. It's an unwritten rule, from my understanding, that essentially providers will do everything in their power to elongate the diagnostic journey for those patients. Because once you're formally diagnosed with Alzheimer's, you can no longer retire, for example, in a long-term care facility; it must be a facility that accommodates dementia patients.

And so there are, there are many kind of ad hoc NLP exercises, um, that I have been a part of throughout my career where we're basically looking for signals of individuals that essentially look like they have Alzheimer's, but have yet to be formally diagnosed.

And this has become incredibly, uh, pertinent as, again, the industry looks to specialize the disease in therapeutic developments and more so focus on tailoring therapies for the early onset neurological patients rather than those that are severe and basically in the later stages of their disease.

Karim Galil: That's very interesting.

Eze Abosi: Yeah, it's very interesting from an NLP perspective, and today, I don't think there's a clear winner in that arena since, from my understanding, um, natural language processing tends to, tends to be really focused on not just oncology, but specifically solid tumors. It's another kind of unmet need, um, that I think an innovative entrepreneur will ultimately solve within that very unique category of neurology.

Karim Galil: I can tell you why, uh, most of, most of the AI approach, so AI and machine learning are used synonymously, right? Like people think when you say machine learning, they think it's AI and AI is machine learning, which is not really true. Machine learning is a subset of AI, and most of it relies on statistical modeling.

So you wanna give a lot of data, then the machine learns from this data, certain rules and, and, and things, right? Oncology tends to be the one where there's a lot of data out there. Relatively a lot of data compared to other therapeutic areas where you can. So it becomes a lower hanging fruit for any NLP company to build a statistical model on top of it because you have access to the data, right?

At Mendel, the way we solve that is we, we do statistical modeling. But half of our platform is not reliant on statistical modeling because simply you don't have enough data out there. My co-founder always makes a joke. He says, if we ask an AI system today, what's the best treatment for something?

Probably it's gonna say opioids. And that's not the treatment. It's just, you know, because of the opioid crisis, learning from the behaviors of humans, AI is gonna learn something that is not actually clinically correct, just because it's existent in the data at scale. So how can you build an AI system that is not just relying on statistics, but also relying on what is true and what's false from a clinical perspective, from a nuanced clinical perspective, becomes a tough problem.

Eze Abosi: Yeah, I completely agree. I would take that same exact mindset and apply it to ultra rare disease where the diagnostic odyssey is often transpiring over 6, 8, 10 plus years. And so, how can you train a model for a population where really there's not a lot of volume that exists to start off with, and the volume that does exist is likely going to be relevant to patients that are undiagnosed?

It's a very complex problem, but that's why we have individuals like you and your team, thank you, solving these complex problems for us in the life sciences.

Karim Galil: The answer lies in, in symbolic AI. And if you actually go to Google Scholar and search machine learning, you're gonna find papers published as, as like yesterday, right?

Like a lot of papers, if you put symbolic AI, probably the last paper published was like late nineties. It's an approach that nobody's using anymore, just because it's not trending, it's not, it's not sexy enough. And, uh, that's the problem when you approach, you know, like if, if you're trying to build a drill without looking at the hole that the customer's trying to actually use the drill for, you end up with single platformed.

I wanna be conscious of your time. Super exciting work that, that you guys are doing on, again, like getting clinical data with unstructured, with structured. The more exciting thing is the fact that you're opening, you have a network effect that a lot of companies don't have and you are opening that network effect for innovation and you're basically getting everyone to participate so that the end user gets the highest value.

And that's what we are seeing as a winning solution today, right? Like we're seeing, um, had iPhone not opened the platform for others to build apps, or Android, they would have been a Blackberry, right? And healthcare has a lot of Blackberries, and it's refreshing to see a company like you moving towards being the iPhone or the Android of healthcare, because at the end of the day, the industry that we're in is touching patients.

I'm sure when you see a PubMed paper published on top of an Optum data asset, it just, you feel like your job is worthwhile, right?

Eze Abosi: Really just happy to have the opportunity to kind of, to serve our clients and all of our key stakeholders, and it really is a privilege. Um, can I just, before we wrap up, I want, I want to ask you just one key question, Karim.

Which is that diversity, equity and inclusion has just been a major topic resonating throughout our society here in the US as well as just globally, more so than most throughout the pandemic and as we come out of the pandemic. And that has been certainly pertinent as well to, to healthcare.

And one of the key focuses of our clients moving forward is ensuring that during the clinical development, the clinical trial process, they have a population that represents the actual potential users of that potential investigational drug. So circling back to your team at Mendel, really curious, because a great way, from a data science perspective, just to understand the relevance of like health equity and diversity is to look at SDOH, or social determinants of health.

And so, are there any novel approaches, methods, et cetera that you can comment on, on how you can extract SDOH variables using NLP? Since that is, along with the genotypic and phenotypic insights, another data type that we're laser focused on at Optum.

Karim Galil: That's an awesome question. So I think the future is not gonna be clinical data. It's not gonna be clinical genomic, it's gonna be social clinical genomic data. I, that's actually the reality of it. And there's two, there's two answers for that.

One is obviously in NLP and AI in general, trying to extract certain data variables like ethnicity, and that's usually tough. It, it's, it's not easy. And I can tell you why. Like, usually ethnicity exists in a checkbox that the patient fills in, if any, when they come in, and machines detecting a checkbox today is a very tough problem.

It's, it's a really tough problem. You need to have top-notch OCR. You need to have top-notch form detection. There are companies built just on form detection. So ethnicity is an endpoint that remains to exist, unfortunately, in forms, and getting those out is not easy. But we, we're doing a lot of work on that.

The approaches that I'm seeing that I'm personally very excited about, uh, are actually outside the NLP. We're seeing folks getting grocery data and trying to understand from your shopping list. Like if you, say, get your Safeway data, you can understand how much the patient actually makes, right? If this patient is buying expensive brands or buying cheap brands, if this patient is buying every week or every two weeks. Is there malnutrition? What's the quality of the things that they're buying? So getting data from Costco, Safeway and actually marrying it to the EMR data and to the genomic data is pretty impressive. We're seeing even folks who are getting water quality from the zip code that the patient exists in and marrying it to it.

There is a lot of data variables and parameters that we're seeing some of our clients adding it there, but unfortunately not all of them are easily available. But to your point, definitely a move that's happening. The other way of looking at it also is if you start looking at it from an international, to your point earlier, right?

A lot of the data, what you have today, is based on white male Caucasians in West Europe or the US, but then take that same drug and go to Egypt, for example, right? It's a whole different story. You raised the point about affordability. Maybe it's a great drug, but nobody can afford it. It's a hundred-thousand-dollar drug.

Sovaldi was a great example for that. A hundred-thousand-dollar drug, great for hepatitis C, but no Egyptian has a hundred thousand dollars to spend on a drug, and they had to work to, to make this happen. And those elements are still yet not existent in the data. And, um, I'm excited to see that this is becoming more a topic that folks are talking about today.

Eze Abosi: Absolutely. Yeah. Great answer.

Karim Galil: But it's interesting that every time someone is taking a Pfizer vaccine, Covid vaccine, there is a data element in it that is coming in from Optum. And here is what's exciting. I mean, I'm all for privacy and, and all of that. For the record, I'm all for privacy. Yet, when you look at the grand benefit of the population, right, is it really worth prohibiting data sharing for all the concerns about privacy when you can truly, truly save lives if you integrate those data into the day-to-day practice of drug development or commercialization?

I don't know. The answer is, uh, is more philosophical than just the 99% accuracy that HIPAA is asking for.

Eze Abosi: Agreed.

Karim Galil: Yeah. Hey, Eze, thank you so much for taking the time. Time passed by. It was a great discussion. We touched on a lot of different things. And again, thank you for making the trip, and it's time for us to get some dinner.

So, we're gonna have to wrap up. Any final thoughts or comments?

Eze Abosi: So thank you. Thank you to the Mendel team. This was truly a pleasure, truly an organic and just free flowing discussion. Thank you for having me. Looking forward to just, uh, keeping the, uh, the lines of communication open and hopefully we can further collaborate moving forward. Thank you again. Appreciate it.

Karim Galil: Thank you. Today's episode was recorded by Benny Pham. Thank you, Benny. He's behind the camera and yeah, we'll see you in the next episode. Bye.

PODCAST — 45 minutes

Daniel Ciocîrlan, Software Engineer, Founder, and Instructor at Rock the JVM on Patientless Podcast #009

Daniel Ciocîrlan is a Software Engineer, founder, and instructor at Rock the JVM. Watch the full YouTube video here: https://youtu.be/PUMCzgK02p8

Karim Galil: Welcome to the Patientless Podcast. We discuss the good, the bad and the ugly about real world data and AI in clinical research. This is your host, Karim Galil, Cofounder and CEO of Mendel AI. I invite key thought leaders across the broad spectrum of believers and dissenters of AI to share their experiences with actual AI and real world data initiatives.

Karim Galil: Hi everyone, welcome to another episode of our podcast. Today we have a technical guest. We were just chatting before the podcast; he actually teaches programming languages, specifically Scala. I said, a lot of our audience has never written code, and he said, okay, I’ll do my best. So we’ll try, but before we get started, Daniel, can you please start by introducing yourself?

Daniel Ciocîrlan: Thank you for having me first, I'm honored to be here. My name is Daniel Ciocîrlan. I am a developer, a software engineer, and a teacher. I teach an esoteric programming language called Scala and a variety of tools and technologies based on Scala that are widely used for distributed computing, functional programming, and high-load systems.

And I also teach kids to code. So I am a software engineer and a teacher at heart. This is what occupies pretty much all my time now.

Karim Galil: But I see a guitar in the background. So it seems like there is also some music

Daniel Ciocîrlan: Yes

Karim Galil: in your day to day.

Daniel Ciocîrlan: I do play some guitar. I play an absurd amount of Ed Sheeran on guitar. I have a training platform called Rock the JVM, which teaches everything in the Scala ecosystem, and the naming is not coincidental. So it has something to do with music. Yeah.

Karim Galil: With the guitar. Sweet. So first of all, I'm interested. I was looking at your profile, and usually someone who writes Scala today is a rare commodity. Hiring Scala engineers is super hard, like hiring good Scala engineers. Someone who teaches Scala becomes an even rarer commodity. You probably would be making a lot of money in the industry, and yet you chose to go down the route of teaching kids and teaching others how to write Scala. So where's the inspiration coming from?

Daniel Ciocîrlan: Well, first of all, I started my journey with Scala and functional programming out of pure intellectual curiosity. I was writing mostly Java or even some front-end bits while I was employed. But in my spare time, I used to dive in and play with these concepts in functional programming and in the Scala language, which I found pretty damn beautiful. It's the closest thing to a beautiful language that I've seen in my programming experience, and the fact that it's also so powerful and that it allows you to encapsulate very abstract mathematical concepts was just pure intellectual pleasure for me. So I spent most of my free time studying this and playing with it. And I also have a passion for teaching, which dates back almost 10 years now. And I wanted to combine the two because I was so passionate about this Scala and functional programming thing, and I was also passionate about teaching and about the kind of feeling that I can create in my students when I explain a complex thing. I wanted to merge these two into what became Rock the JVM. And apparently people like my work and they like my teaching style. And so I've been doing this for a while.

Karim Galil: So you said that you're teaching kids Scala?

Daniel Ciocîrlan: I'm teaching kids something else. No, not Scala.

Karim Galil: So what are you teaching kids?

Daniel Ciocîrlan: I am teaching kids to code with a programming tool called Scratch. I'm pretty sure you've heard of it. It's a tool created by MIT, which is completely free, and it allows kids, or pretty much anyone, but it's targeted at kids and young adults, to create all sorts of games and applications and stories and interactions based on tools that look like Lego pieces that you stitch together. You create these scripts that do something, and you learn core programming concepts by putting these together and creating ever more complex animations or games or applications or drawings. So kids learn by intuition, and this is how I like to teach kids to code. And they have a lot of fun playing with the games that they made with their own two hands, which is by far the most meaningful.

Karim Galil: Historically, we had to learn a language or two in school. Spoken languages, right? Like, say, English and French, or English and German. Do you think we're heading to a future where everyone will have to understand some basics of coding, actually, as a way of expressing thoughts and ideas?

Daniel Ciocîrlan: I don't think computers are gonna go away anytime soon. Quite the opposite. I think we are going to merge more and more with technology, and technology is gonna get ever closer to our lives. And it's crucial to understand what technology is, what technology can and cannot do, and how it works under the hood. I think the more people at least understand it, if not produce something with it, the better it is for all of us, because we have a deep understanding of the tools that we use instead of them using us without us having any sort of awareness about them.

Karim Galil: That's an awesome point. That's the whole point of the podcast: you don't have to write code, you don't have to be an AI scientist, but you should have some sort of understanding of how those things work at a very high level. So you're able to make judgment calls, whether it's a good use case for whatever you have, or whether it's a good vendor, or whether it's a good path to take. So who's harder to teach, kids or adults? I mean, I'm assuming you're teaching different age groups.

Daniel Ciocîrlan: There's no one group that is easier or harder to teach than the others. They're so different that they're almost worlds apart. I mean, kids learn by intuition and kids learn by immediate feedback. So if they stick a new code block into their scripts and they see the cat moving 10 times in a circle, that's immediate feedback that they produce. And they now learn that the repeat block does something 10 times, and when you click on it, it programs the cat to do a certain thing. Whereas the experienced engineers that I teach Scala and functional programming learn in a completely different style. So I have to deconstruct a very complex topic into its constituent parts, then select the most important bits, and then sequence them along the way so that the learning curve is as sequential as possible, as smooth as possible, because complex software requires complex understanding, and therefore the kind of material that I produce needs to lower that barrier as much as possible. This is what I wanna do in my Rock the JVM classes: to teach the sort of complexity that a language like Scala and technologies based on Scala can involve, and make them accessible, so that every developer can simply write three lines of code and everything happens by magic, but they deeply understand what those three lines do.

Karim Galil: Got it. So let's use this methodology to explain to our audience, the methodology of breaking things into small components and assembling them together again: what is the difference in programming languages? Why is it a different thing to use Java versus Scala versus call versus any other type of programming language? The question folks usually have is: at the end of the day, when I click a button, the same action will happen. So why does it matter what programming language we're using? So I think that's the first question, why it matters. And the second is, what are the types of programming languages out there? You talked about functional programming, but what are the others, and what are the differences between these things?

Daniel Ciocîrlan: Yeah. So what is a programming language? A programming language is a way to translate human thought into something that computers can execute, and programming languages were invented so that we can write programs more easily. Computers are essentially machines where you can plug in some bits, which are essentially currents in a processor, and the processor does something as a result. It's completely predictable, and it would be completely feasible to build the kind of podcasting application that we're using right now just by putting bits into processors manually, one by one. But it would take us a million years. And so we invented some more intricate tools that allow us to build more complex applications faster. Above machine code there's assembly code, which is then broken down into the sort of instructions that a computer can execute. And on top of that assembly language, we have other, more complex programming languages that compile to the sort of bytes that the computer can execute. But it's easier for us as programmers to write a few lines of code instead of millions of lines of assembly code to do the same thing. And so languages like C or, more recently, C++ with its never-ending stream of improvements were needed, because complex software requires complex thought, and breaking our thoughts down into bits that the computer can execute is just too tedious for us humans to accept. And so we wrote ever more complex programs to be able to express our thoughts as something that the computer can execute. Then we have these higher-level languages. So on top of assembly we have compiled languages like C, then we have interpreted languages like Python. Then we have Java, which is a completely different paradigm. And then we have these higher-level languages like Scala or Haskell or things of that nature.
So one classification, or one difference between programming languages, is whether they're compiled or they're interpreted. What does that mean? A compiled language means that you write a piece of text, which is the programming code. What is code? It's just text that you type on a keyboard. It's nothing but text, but there is a very smart program known as a compiler that turns that piece of text into the stream of bits that the computer can understand and execute. The compiler is what does most of the work, and that compiler is built by years of iteration and so on and so forth, so that we as programmers just write the text described by the syntax of the programming language, which is the easy part. And the compiler does the heavy work of turning that into something that the computer can understand. So that's the compiled stuff. Then there are the interpreted languages such as Python, whereby you write some code that is not turned into computer bits by taking the source file and turning that into an executable. Rather, there's another smart program known as an interpreter that traverses the lines of code that you've written and executes those one by one. It just parses them incrementally. So for a language like Python, you can have a syntax error on line, let's say, 54, but for the first 53 lines of code the Python interpreter just executes everything until it gets to some sort of error.
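
To make the distinction concrete, here is a minimal Python sketch. It is a toy, not how any real interpreter or compiler is built: `run_line_by_line`, `check_whole_program`, and the four-line `program` are hypothetical names invented for illustration. One caveat worth stating: standard CPython actually compiles a whole file to bytecode before running it, so a genuine syntax error in a file is reported up front, while runtime errors (such as using an undefined name) surface only when execution reaches them, as described above.

```python
def run_line_by_line(lines):
    """Toy interpreter loop: execute each line in turn, stop at the first error."""
    env = {}       # variables created by the program so far
    executed = []  # line numbers that ran successfully
    for number, line in enumerate(lines, start=1):
        try:
            exec(line, env)
        except Exception as error:  # includes SyntaxError
            return executed, (number, type(error).__name__)
        executed.append(number)
    return executed, None


def check_whole_program(lines):
    """Compile-first check: reject the whole program if any line is malformed."""
    try:
        compile("\n".join(lines), "<program>", "exec")
    except SyntaxError as error:
        return error.lineno  # nothing has been executed at this point
    return None


# Hypothetical program with a mistake on line 3.
program = ["x = 1", "y = x + 1", "z = = y", "w = 4"]

executed, failure = run_line_by_line(program)
bad_line = check_whole_program(program)
print(executed, failure, bad_line)
```

In the line-by-line version the first two lines have already run by the time the error is found; in the compile-first version nothing runs at all, which is the behavior described for compiled languages.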

Karim Galil: Error

Daniel Ciocîrlan: Right.

Karim Galil: Versus in the compiled languages. If you have an error, you have an error. It's not gonna compile.

Daniel Ciocîrlan: Exactly. So in a compiled language, the executable cannot be created until the program is correctly written, following the exact rules of the programming language. So that's one difference. Another difference is more philosophical, and I think this is the bigger, fundamental difference between programming languages. And that is the style of thinking that is applied to those programming languages. This is why we have imperative languages, instruction-based things like C or C++ or Python or C# or Java. And there are obviously differences between all of them. After that we have other styles of thinking such as functional programming, which is like Haskell or Lisp or Clojure or Scala or other programming languages that think differently. They allow you, the programmer, to think in different terms, that is, to approach or translate your thoughts into code in a different way. It's like speaking another language. And there are other paradigms. There are very esoteric languages, for example, Prolog or CLIPS or other programming languages that think completely differently. Like, you write three lines of code and the computer tries to validate all possible solutions to a constraint satisfaction problem. So for example, if you write five lines of Prolog, the computer will try all possible solutions to the kind of restrictions that you specify. So Prolog is a language where you just write restrictions and the compiler just works around those restrictions. And the answer to those restrictions is the solution to your problem, for instance. So the style of thinking is completely different, and this is why Scala is such an amazing thing for me, because the style of thinking very much resembled my own in terms of a mathematical approach. So Scala and functional programming in general is very mathematical in nature.
And because I have a background in physics, right before I started learning computer science, this was such a good fit for me, because I only think in terms of expressions that need to be evaluated. And then the runtime just does its job to compute those for me. And in three lines of code, as you've probably seen in your company, you can do a lot, and this makes a language like Scala extremely powerful.
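
As a rough illustration of the two styles of thinking, here is the same small task written both ways. The sketch is in Python rather than Scala so that it runs anywhere; the `orders` data and both function names are made up for this example.

```python
# Hypothetical order amounts, used only for illustration.
orders = [120.0, 35.5, 80.0, 12.25]


def total_large_imperative(amounts, threshold):
    """Imperative style: spell out each step and update a running total."""
    total = 0.0
    for amount in amounts:
        if amount >= threshold:
            total += amount
    return total


def total_large_functional(amounts, threshold):
    """Expression style: describe the result as one expression to evaluate."""
    return sum(amount for amount in amounts if amount >= threshold)


print(total_large_imperative(orders, 50.0))
print(total_large_functional(orders, 50.0))
```

Both functions compute the same value. The first tells the machine how to get there step by step; the second states what the result is and leaves the stepping to the runtime.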

Karim Galil: Let's dig into that a little bit. First of all, super helpful classification of how things work. So the story I was sharing with you before we started the podcast was that we had something written in Java, in hundreds of lines of code. And then we hire an engineer who loves Scala, and he spends a few weeks, comes back, and shows me: hey, you see those three lines? They can do exactly what those hundreds of lines of code can do. This is Scala. This was my introduction to what Scala is, right? It's the language where you can write in three lines something that requires hundreds. The question that I don't understand till today is why. For example, right, in my mind, if I'm speaking, let's say, English, and I wanna write an essay or something around the weather and I write it in 10 lines, and then I choose to write it in French, I'll probably still use 10 lines. But the idea of trimming down hundreds of lines into three is something that I still cannot conceive or understand.

Daniel Ciocîrlan: Yeah, that's a really good point. And this is where the analogy between programming languages and human languages starts to break down, because human languages are more or less equivalent. There are fundamentally different languages, for example, Japanese versus English, or European languages versus Chinese or Japanese. But in the realm of European languages, you have more or less the same kind of structure. The words are different. The grammar is slightly different. You get the same kind of conceptual approach to how you can express your thoughts. Well, in programming languages, there is a possibility that you can express in one programming language, in extremely short code, what in another programming language takes like 10 times as much. And the reason is not just the structure of the language itself, but rather how you can organize your thoughts so that at the end of the day, you as a programmer can write in a few lines some functions or some values or some expressions that behind the scenes do involve a lot of code. So there's no way around it. The computer will do the same thing when the computer starts to evaluate that instruction by instruction.

Karim Galil: Are you basically abstracting code into functions?

Daniel Ciocîrlan: Yes. Yeah. So you abstract complexity away into a variety of concepts, not just functions. Maybe you abstract them away into, I dunno, operators or interfaces of various types, depending on the kind of language that you use. But abstraction is possible in programming to an almost unlimited degree, which is what allows us to write in such short code what, in other languages like Java, for instance, you would need to write a lot to accomplish. So it's this abstraction thing that makes programming languages in general very, very powerful. Now, obviously people can probably abstract away those hundreds of lines of Java into, let's say, 10 lines of Java. There's always that possibility, but the way that Scala approaches the problem in the first place gives it higher chances of having the sort of compaction that you describe.
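
A small sketch of that compaction, again in Python for convenience and with made-up function names: counting words by hand versus leaning on an abstraction that already exists in the standard library (`collections.Counter`). The short version is short because the looping and bookkeeping live behind the abstraction, not because the computer does less work.

```python
from collections import Counter


def count_words_by_hand(text):
    """Every step written out: split, loop, check, update."""
    counts = {}
    for word in text.lower().split():
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


def count_words_with_abstraction(text):
    """Same result; the loop and bookkeeping are hidden inside Counter."""
    return dict(Counter(text.lower().split()))


sentence = "the cat saw the other cat"
print(count_words_by_hand(sentence))
print(count_words_with_abstraction(sentence))
```

Because there is less code on the page, there is also less code to read when something goes wrong, which is the debugging benefit that comes up later in the conversation.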

Karim Galil: So why is it called functional programming? Where does the "functional" in the description of Scala come from?

Daniel Ciocîrlan: Before programming languages were invented, Alan Turing and Alonzo Church were the pioneers of the mathematical description of a computation. So these folks formally described what it means to be computable. What does it mean to have something that can be calculated? Is two plus three computable? Yes, it is. Well, how do you compute two plus three? You can evaluate it to five, which is how arithmetic works, but you can also compute two plus three by starting from two and counting by one, three times. So the fundamental point of what it means to be computable, or rather the equivalence of that, is that the same expression or the same result can be computed in multiple ways. But for these folks, in two different ways. Alan Turing came up with a description of an algorithm, which is the sequence of steps that you need to reach your result. For example, two plus three is counting from two: three, four, and five. So you follow an algorithm, like a process, a step-by-step thing, in order to get to your result. And Alonzo Church had a completely different approach, which is to reduce expressions to their values. And this is fundamentally different mathematics. And these folks demonstrated about a hundred years ago that these two forms of computation are equivalent. So mathematically they're equivalent in the sense that a fictitious computer or a fictitious machine that would evaluate such an algorithm would take roughly the same amount of time as a fictitious computer evaluating expressions in Alonzo Church's definition. So we say that, in mathematical terms, these two have equivalent complexities, and the mathematical models of computation are equivalent, because any computation described as an algorithm can be expressed as an expression and vice versa. And Alan Turing's approach was more geared towards machines that you could program step by step, so machines that compute numerical things step by step in algorithms. That's what his mathematical model was about.
And it just turned out that computers are easier to build that way, but theoretically, at least, Alonzo Church's model is just as powerful, because it allows us to build expressions, like mathematical structures, that can be evaluated to a value. And from Alonzo Church's theory, functional programming evolved. So from Turing's approach, we have imperative programming, step-by-step things, from which our regular computers are built. And then from Church's approach, we have functional programming, which is thinking in terms of expressions and functions that you can pass around. That's the "function" in functional programming.
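
The "two plus three" example can be sketched in a few lines of Python. This is only a loose illustration of the two viewpoints, not a formal Turing machine or Church's actual calculus; the function names and the tuple encoding of expressions are assumptions made for the example.

```python
def add_by_counting(start, steps):
    """Step-by-step view: begin at `start` and count up by one, `steps` times."""
    result = start
    for _ in range(steps):
        result += 1
    return result


def reduce_expression(expr):
    """Expression view: keep replacing sub-expressions with their values.

    An expression is either a plain integer or a tuple ("add", left, right).
    """
    if isinstance(expr, int):
        return expr
    operator, left, right = expr
    if operator != "add":
        raise ValueError(f"unsupported operator: {operator}")
    return reduce_expression(left) + reduce_expression(right)


print(add_by_counting(2, 3))
print(reduce_expression(("add", 2, 3)))
print(reduce_expression(("add", 2, ("add", 1, 2))))
```

All three calls arrive at five. The first gets there by following a procedure; the other two get there by collapsing an expression, possibly a nested one, down to its value.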

Karim Galil: Interesting. So do you need to be good in math? I mean, you need to be good in math or have some math competency to be an efficient programmer, but do you need to be relatively good at math compared to others to be a good Scala engineer?

Daniel Ciocîrlan: It's an interesting question. I think, just to clear any doubt, having good skills in math doesn't hurt anyway. So the more skills in math you have, the better, usually. In Scala, depending on the kind of abstractions that you want to use, certain mathematical things are probably needed or helpful. If you are a heavy functional user, which Scala is very well known for, you need a good understanding of what those abstractions mean in order to navigate around them, just as you need some good algorithmic preparation to study any kind of regular code, like in Java or C or Python. We learned algorithms in school before we were released into the world as engineers. In Scala, especially if you want to do some heavier functional programming or very high-level abstractions, at least a decent interest in the kind of abstractions that functional programming involves will be very, very useful.

Karim Galil: So why is it that Scala is not as predominant as Java or Python? That's one question, but the other is, why is it that the higher density of Scala engineers is Europe-based compared to the US? Or at least that's the myth that I've heard: that there are more Scala engineers in Europe per capita, basically, compared to the US.

Daniel Ciocîrlan: Okay. Let me take these in turn. The popularity of Scala: I think the popularity of Scala is quite minor in the JVM world and in the programming sphere in general because of its apparent complexity. Scala can be intimidating to beginners, especially if you have the kind of hacker mentality where you wanna get something done very quickly, like you can do in Python. Scala's not very friendly with that, because in order to write a good Scala program, you need a bunch of foundations in place and you need some sort of patience to understand what Scala concepts can do. Now, once you've crossed that barrier, everything is easy. And then you can write three lines of code that do the same thing as hundreds of lines, as you described. But there's a mental barrier that you need to cross in order to become a productive Scala developer. And I'm pretty sure, or this is my hunch at least, judging by my students, that Scala can seem quite intimidating for this reason. And this is why I think people come to me to teach them, because in the learning curve that I try to create, they find that Scala is not that intimidating if you take it the right way. So it's just like regular programming. If you read a hundred lines of Python and you've never seen Python before, of course those are intimidating as well. So you need a bunch of foundations. But the trouble is that once you've learned a programming language like Java or Python, I don't know why, but I find that many people, I don't even wanna say most people, but many people get into the mindset of: oh, I know how to code. I don't need anything else. I know how to think in code. But then you see something like Scala and it looks a little bit alien, and you go like: I don't wanna learn this thing. This doesn't look like code, or this is too hard. I know how to code. I'll use my own tool. I think that's one of the reasons why Scala has a pretty minor proportion of programmers.
Now to answer your second question, I have no idea about stats. I don't even know how well those are distributed between the US and Europe. So I can't really answer that one, 'cause I don't have the data.

Karim Galil: We have an interesting social experiment at Mendel. So as I told you, we weren't a Scala shop early on, and we basically build a lot of AI. So we have a couple of teams: an R&D team that basically is in the business of getting the model right, and then we have an engineering team that's in the business of taking that good model and actually making it a product that's scalable, repeatable and so on. The AI team is heavy on Python, as every other AI team, and is heavy on Java. And the engineering team was all converted to Scala, and then they had this interesting meeting where engineering was preaching: you guys need to start using Scala. Your life is gonna be easier. Our life would be easier. And they offered to teach them. Out of 18 engineers, only two opted into the Scala course, and now they don't want to go back to Java. They don't wanna do any other thing except Scala. But to your point about the barrier of entry, it wasn't that hard. It's just that the decision to get to know how to do it was the hard one. Two out of 18 did it. And when they did it, it became more efficient, faster. And I think it's not only about writing fewer lines of code, it's just easier to debug your code. Whenever there is something that's not working as you want it to work, rather than looking in a hundred lines, you know you have only three that you need to debug. Cool. So let's switch gears here. One other thing: why do people who are heavy on the AI side prefer Python and kind of shy away from Scala? There seems to be a strong disagreement or a strong lack of desire to use it, specifically in the AI community.

Daniel Ciocîrlan: I can see a couple of reasons. When you start working in AI, you look not necessarily for a programming language, but for tools that could implement the theories that you want to bring into the world. And the programming language is not that important. At least in my case, I also played a bit with some AI models, and when I looked for tools, all of them were in Python. The most popular ones were in Python. They were very well maintained. They had good documentation. They got results really quickly. So the incentive is very small to use another programming language and have a whole bunch of barriers to the kind of things that you want to build. Because when you're an AI programmer, you're not the regular software engineer that builds a web application, but rather you wanna test a model in the world. So you're more mathematically inclined. You want something that could embody that math in something that a computer can execute. And the Python tools that we currently have are the most appealing because they seem to do the job the quickest, and Python is an easy language to learn. And the barrier to entry, as we described, is probably double in the field of AI, or for the purpose of AI, compared to regular software engineering, because, at least for the AI programmers that I know (I cannot speak for the entire field, as I'm not that intimately familiar with it), they just want their models out. So as an AI programmer, you're interested in the model more than the code itself, so you don't need high-level functional abstractions or things of that nature. You want the model. Plug it in, hit play, just let the GPU learn the weights of your model, and then just have it run predictions.
So I think the kind of habits that a language like Python involves make it easy for Python to be the language of choice for AI. In terms of James Clear's Atomic Habits, I think the "make it obvious" and "make it easy" bits are very easily enabled by Python.

Karim Galil: Yeah. The downstream of that from a business perspective, from my position, is that at the end of the day it's all about making sure that my clients are happy by getting a product that is high quality, repeatable and scalable. And when you have the mentality of "I want my model to work", which I agree with, you're in experimentation, you wanna focus on getting things to work, not on how to scale them. But as a business, you start suffering downstream. And what we have done here is we started this team for rewriting things from Python into Scala or Java, but mainly Scala, after that R&D mode. The challenge we're facing, though, to your point about abstraction, is that the one who thought of the model is the one who is best positioned to abstract his thoughts into code. But now you're getting an engineer who hasn't spent as much time thinking about the model and is required to abstract those from Python lines. And talking to other folks, it seems like it's an industry-wide problem. Happy to hear your thoughts about that, if any. But one thing that we're trying here is to have an engineer shadow the AI scientist early on in the experimentation, just from a "Hey, what's going on here? How are you thinking of the model?" angle. So when the time comes and they have to do a switch, it becomes an easier thing. But I don't know if you have seen other practices or other ways of doing it.

Daniel Ciocîrlan: I was just about to suggest something similar. I think cross-skilling your AI modeling team and your software engineering team would probably do wonders for both, because the AI team will now understand software engineering and product, and the software engineers will understand models, and the closer they can understand each other's business, the higher quality code you'll push. Less friction, and probably more scalable and with more productivity in the end, because if one side of the development team or of the R&D is thinking a certain way, and another side is thinking differently, and side number two, which is the software engineering part, has to push the product, there is a certain degree of friction, or at least this is how I anticipate it. There is a certain degree of friction in rethinking what the first side has written into something that's more scalable and works well from a software engineering perspective.

Karim Galil: So, as I told you earlier, a couple of weeks ago we interviewed Leslie Lamport. And we asked him a lot of questions around AI, and his answer was always: I don't comment on things I don't know. And the level of humbleness there was pretty impressive, because in healthcare we see the opposite. People who have very little experience are making bold claims and comments around AI in general. And now, talking to you, I'm seeing the same theme. You're like: I don't know a lot about AI. I have some friends and I played with some models, but I'm not as integrated into the community as others. So I'm seeing a theme where true engineers tend to make the distinction between engineering and the AI research that's happening these days. But here's the thing: in healthcare, it's not the case. In healthcare, if you are a software engineer, it means you can write AI, or you're qualified to be an AI engineer. Those two terms are used interchangeably, basically. In your experience, or in the way that you break down things, what is the difference between AI and engineering, even though they're very close fields? Two, what is the difference between the R&D engineer and the production or product-ready kind of engineer?

Daniel Ciocîrlan: I would like to ask you some clarifying questions about the second bit, but let me take a stab at the first. So you're asking about what the difference is between an AI engineer and a software engineer or a product engineer.

Karim Galil: Writing a piece of software, right? Let's say writing a piece of software that allows you to do market trading, versus building a model that can predict the next price for that same stock. Those are two different sets of skills and two different schools of thinking, right? So what is the distinction between those two things? And what's the distinction, from a qualities perspective, between the engineers or the scientists working in each one of them?

Daniel Ciocîrlan: Well, I think you pointed it out quite well. To give another analogy here, I have a very good friend working in astrophysics, and she's a very well-respected astrophysicist now at the University of Toronto. She does research on dark matter and cosmic dust, and she pretty much does mathematical models, but she also needs to know how to code because she needs to validate those models on data. So essential software engineering skills, or I shouldn't say software engineering skills, but at least programming skills: to know what works and what doesn't, how to structure your code, how to think properly so that you don't make a mental mess out of your code. At least those are essential skills for her to do a good job. And I think you've pointed it out just right. I think that an AI developer would be equivalent to somebody building a model for just about any other kind of business, trading, like you said. I also have some friends from the old days in the physics Olympiads who are now mathematical modelers for trading companies. But they're not software engineers. They just know basic code, just enough to put together a proof of concept, and then the software engineers have to understand what they've done and push a product that could scale, that could sustain high loads, that would be fault tolerant, that would minimize failures and impact for the company, and so on and so forth. So I think the analogy is pretty similar in this case.

Karim Galil: Sweet. Now one other term that's being used a lot lately: Akka. So what is Akka, and why is it talked about so frequently within the Scala community?

Daniel Ciocîrlan: Akka is a toolkit, that is, a set of libraries used for distributed systems. Akka allows building a distributed system in very few lines of code compared to other tools or equivalent things in other programming languages, with a high degree of fault tolerance, scalability, resistance to errors, elasticity in terms of high and low load, and so on and so forth. Akka solves a bunch of distributed systems problems very, very well. And the Scala API, Scala being the language that Akka is built in, allows you to write very few lines of code that could do all of that automatically. The way that Akka is structured is by thinking of your code, or your product or software, in terms of independent entities known as actors. You don't work with them by using their internal state or their internal data or mechanisms, like you would do in regular programming languages. For example, in Java there is this concept of encapsulation: if you have a data structure which has some fields and some functionality, you would access those fields or invoke that functionality by calling its methods. In Akka that's not possible, because these actors are completely encapsulated, in the sense that you can only send messages to them, and they will reply or act upon those messages asynchronously at a later point. You don't control when those messages will be treated. It's pretty much like human interaction. Just to give an analogy, let's say that I'm interested in going to a concert. I play an absurd amount of Ed Sheeran on the guitar, and so I'm interested in learning when the next Ed Sheeran concert is going to be in Bucharest, Romania, which is where I'm talking from right now. Assume that I don't know that, but I have a friend who's more of a fan of Ed Sheeran than I am.
And so I'm going to go ask my friend, "Hey Alex, when is Ed Sheeran going to play next in Bucharest?" If we're talking, Alex might reply, "It's going to be on November 17," or something like that. Notice the interaction. I'm asking a question. It takes a little bit of time for that question to be registered. She'll have to fetch the memory and then reply back. This is normal human interaction. I can't poke into the brain of my friend to fetch that information and retrieve it back into my own, which is what regular programs do, by the way. It takes physical time for a message to travel, and this is the kind of normal interaction that would happen between independent entities. This is what object-oriented programming was meant to be, by the way, going several decades back to its original creator: objects were intended to be these independent entities with whom you can only communicate by sending messages. And to keep the analogy with my friend answering my question, my friend Alex might not even reply at the same time, or even in the same context. For example, I call her: "Hey Alex, when is Ed Sheeran going to play next in Bucharest?" And she'll say, "I don't know, I'm going to get back to you." It takes like three hours, and then she'll send me a text message instead. So notice that the context was different. She didn't reply by the same means. This message exchange might be in a different context or a different style than the original request. So this is how Akka works, and there's a bunch of software that was built on this principle. Akka is one of the most powerful toolkits in the Scala ecosystem.
Now, recently Lightbend, the company behind Akka, has decided to change its licensing model for Akka to something that's not open source anymore. This caused a stir in the Scala community, and time will tell if this move was worth it. But there is some discussion happening about the future of Akka.
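
The message-passing interaction described above can be sketched in a few lines. This is only an illustration of the idea, not Akka's API: it uses the Python standard library rather than Scala, and the names `Actor`, `FanActor`, and `ask` are hypothetical, chosen to mirror the concert analogy. The actor's state is private, callers can only put messages in its mailbox, and the reply arrives later as another message.

```python
import queue
import threading


class Actor:
    """A tiny actor: private state, a mailbox, and one thread draining it."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message, reply_to=None):
        # Fire-and-forget: the caller never touches the actor's state directly.
        self._mailbox.put((message, reply_to))

    def stop(self):
        self._mailbox.put((None, None))  # sentinel that ends the loop
        self._thread.join()

    def _run(self):
        # Messages are handled one at a time, whenever the actor gets to them.
        while True:
            message, reply_to = self._mailbox.get()
            if message is None:
                break
            self.receive(message, reply_to)

    def receive(self, message, reply_to):
        raise NotImplementedError


class FanActor(Actor):
    """Plays the role of Alex: knows concert dates and answers asynchronously."""

    def __init__(self, concerts):
        self._concerts = dict(concerts)  # internal state, invisible to callers
        super().__init__()

    def receive(self, message, reply_to):
        answer = self._concerts.get(message, "I don't know")
        if reply_to is not None:
            reply_to.put(answer)  # the reply is itself just another message


def ask(actor, message, timeout=1.0):
    """Send a question, then wait for the reply to show up in our own inbox."""
    inbox = queue.Queue()
    actor.tell(message, reply_to=inbox)
    return inbox.get(timeout=timeout)


alex = FanActor({"Ed Sheeran, Bucharest": "November 17"})
reply = ask(alex, "Ed Sheeran, Bucharest")
unknown = ask(alex, "Ed Sheeran, Toronto")
alex.stop()
```

Here `ask` blocks for convenience; in a real actor system the asker would typically carry on with other work and handle the reply whenever it arrives, possibly through a different channel, as in the text-message example.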

Karim Galil: Interesting. Ah, I didn't know. So basically it means we're probably going to get a bill pretty soon. Interesting. So, on the open source piece, before we go to that, and I know we're coming to the end of the hour and I want to be conscious of your time: I asked our engineering team, "Hey, what do you want me to ask Daniel?" So this question is not from me. This is from our engineering team. It's Scala 3. Everyone wanted to hear your comments around Scala 3.

Daniel Ciocîrlan: Okay. How much time do you have?

Karim Galil: As much as you want.

Daniel Ciocîrlan: Yeah, I

Karim Galil: It was Scala 3, and they also wanted to hear about Scalameta, which, by the way, when I heard Scalameta, I was thinking of the Facebook Meta kind of stuff. But then I did my own research and was like, oh, I'm completely off from what I thought it was.

Daniel Ciocîrlan: Yeah, it's a tool. Scala 3 is the next iteration in the Scala world, and I think it's going in the right direction. Scala is a research language. It started almost 20 years ago. Oh my God, it's been a while. It started in 2004, something like that, as a research language in Switzerland, as a Ph.D. or some such research project, and it's become this production language, but at heart Scala is constantly evolving. Features are constantly being added into the Scala language and have been for the longest time. But this created gripes with the Scala language, because between versions Scala was not compatible with the older versions, unlike Java, which moved very, very slowly but maintained the trust of programmers using Java, because they knew for sure that subsequent versions of Java would still run their code. With Scala, that wasn't the case, and this was probably another reason for the lack of adoption of Scala early on. But they've fixed this in Scala 3 with monumental effort. They've rewritten the Scala compiler from scratch with a bunch of guarantees. For example, between minor versions, say from 3.2 to 3.3, they are now compatible, and major versions are only going to occur with a cadence of once per 10 years or something like that. So this is the kind of intention that now backs Scala as a research language. Scala 3 is this new age in the Scala world where we hope that by guaranteeing compatibility between versions, things will be more stable in the Scala world and with the libraries that we as software engineers use. And therefore, because things are more stable, people will have more confidence to use Scala and associated tools for large-scale projects.
So this is, I think, the main benefit of using Scala 3. Plus, the language itself has changed, of course, to make it easier for programmers to write it correctly.

Karim Galil: Interesting. All right, so that was very helpful. And specifically the part where you were explaining the differences between programming languages, I thought that was very easy to understand, I have to say. So are you living in Romania?

Daniel Ciocîrlan: I am in Romania, in Bucharest. Bucharest is my home. Yeah, I plan to spend quite a bit of time here.

Karim Galil: There is a really great movement, from an engineering perspective, in Eastern Europe. It's picking up really fast. The Ukrainian, Romanian, Polish developer community is growing fast, and a lot of Silicon Valley companies are now trying to hire and build there. It used to be India for quite a bit. Then it became Ireland and Holland for some time. But now it seems to be moving more towards Eastern Europe.

Daniel Ciocîrlan: Yep. This is what I've noticed as well. Eastern Europeans are generally well-prepared software engineers, and because of the pandemic, software engineers are now working remotely. So US companies now have access to global talent, and with Eastern European software engineers being quite competent, it's quite an attractive offer for companies like those in Silicon Valley to invest in hiring Romanian, Bulgarian, and Polish software engineers.

Karim Galil: All right. So I was thinking about the best way to end our podcast. As I told you early on, our mission is to make medicine objective, and we believe that this is a mission where we would be saving lives. Some folks go through suffering that they don't have to go through because they either got a treatment that's not going to work, when they would have been better off focusing on quality of living, or missed out on a treatment that they should have gotten. Most of our stack is built on Scala, and a lot of our engineers have learned a lot about Scala from your work and videos and website. So in a very indirect way you are saving lives. I have to say I was very impressed by the fact that you decided to teach specifically a language like this. So thank you for the good work, and thank you for coming onto the podcast without a lot of context about what the company is doing. I really appreciate it.

Daniel Ciocîrlan: Thank you so much. I really appreciate you taking the time. I'm honored that you had me for the podcast. I wanted to thank you for your insightful questions. And I am really, really happy that you like my work and that I've helped you in any way that I can.

Karim Galil: Definitely. And just for the audience, the website is rockJVM, correct?

Daniel Ciocîrlan: It's rocktheJVM.com.

Karim Galil: rocktheJVM. And we know the "rock" now is coming from the guitar in the background, so now we know where this came from. Awesome. Hey, thank you so much for taking the time.

Daniel Ciocîrlan: Thank you so much. I really appreciate it.

PODCAST — 30 minutes

Hylton Kalvaria, Chief Commercial Officer on Patientless Podcast #008

Hylton Kalvaria is the Chief Commercial Officer at Mendel.ai, bringing our technology to healthcare businesses.

Hylton Kalvaria, Chief Commercial Officer at Mendel.ai on Patientless Podcast #008

Welcome to another episode. When we first started Mendel, we acknowledged that there are two key challenges. One is the technical challenge: can you actually get the machine to read medical records like human beings? The second challenge is distribution: can you convince the market that machines can do that? The state of AI today is the boy who cried wolf. There are a lot of companies that promised AI and didn't really deliver on those promises, which makes the job of selling AI that actually works very challenging. So we acknowledged from the get-go that you need to build the tech, and you need to build a very unique go-to-market motion for it. When we were out in the market trying to hire our first Chief Commercial Officer, it was as hard as hiring my co-founder. When we got to meet Hylton, we saw the signs that this is the right hire. One, the story about how he got into healthcare and the mission that he's after in healthcare aligned with ours. Second, his background. Third, he made this really big statement: "Listen, I'm not a salesman." And that's exactly what we were looking for, which is counterintuitive when you hire your Chief Commercial Officer. Welcome, Hylton, to our podcast. We had a press release, but I thought it's only official when we do the podcast. So the idea is to get you to introduce yourself, your background, and why you joined Mendel. This is our first broadcast where we actually interview someone in Mendel, so no pressure. Your father is a physician, and you were actually born in South Africa. I only got to know that last week or a few days ago. And that's how you got into healthcare. So why don't we start from that?

Hylton Kalvaria: Yeah, sure. Well, my dad actually told me, "Don't go into healthcare." That's exactly what he said. He's a now-retired gastroenterologist, and I was pre-med in college, and he was like, "Do not go into healthcare." I think for him, he loves treating patients. He's great at the diagnostic parts of it. But the business of medicine got in the way for him, and he said, don't. And actually, for a long time I had nothing to do with healthcare. I started my career as an executive recruiter, actually. That's what I did as my first job, for a couple of years. I joined a financial technology company after that, then did two real estate technology companies. And then post-business school I said, well, now's the time I really want to do something in healthcare. So that's really what I've been doing for the last 15 years now.

Karim Galil: So it all started with advice not to get into healthcare.

Hylton Kalvaria: That's what he said. Yeah. And he still stands by that.

Karim Galil: All right. So what was your first healthcare gig? How did you go from finance to healthcare? I believe you were at ZS.

Hylton Kalvaria: Exactly. ZS Associates. My first job there was actually doing qualitative interviews for a multiple myeloma drug, and so we had to do literally a hundred interviews with multiple myeloma experts all over the world. And if you've ever done these kinds of interviews before, by the time you get to the fifth one they become super boring. So you try to figure out ways to make them as interesting as possible and infuse your own personality into it. I really enjoyed that part, because you just end up talking to amazingly smart people on a topic that you had no idea about before. And I learned about multiple myeloma.

Karim Galil: So how does that work? Do they first get you up to speed on the clinical aspect of multiple myeloma, or is it more about the go-to-market, or is it a mix of both?

Hylton Kalvaria: It ends up being a mix of both. I mean, if they just throw you in the deep end with your first interview with some multiple myeloma key opinion leader out there, the interview will go really poorly. They will prep you, allow you to speak to some experts and this was really clinically detailed, the thing that we were doing there. So you have to be very well prepped for that.

Karim Galil: So when I first got introduced to you, here is what I was told. I was told that you had a call with the founders of Flatiron, and a few years later this call ended up with almost 1.9 billion dollars. So can we talk about that?

Hylton Kalvaria: Yeah. Which part of it? That sounds like it might be overstating it.

Karim Galil: When I heard that, I thought, all right, if this call ended with that, let's say he got half a percent of this. So I expected you to come in a Maybach or some crazy car. So I want to make sure I give the disclaimer: you're not a billionaire or a millionaire yet.

Hylton Kalvaria: Not yet. I arrived in my 2014 Hyundai Genesis, so a really sexy car.

Karim Galil: What's the story? I know that you were working for Roche, and I believe you were tasked to find unique data assets. That kind of changed life for a lot of different folks, including Roche, Flatiron, yourself, and a lot of other people in the industry, actually.

Hylton Kalvaria: Yeah. So the way this happened was I was sitting at my desk at Genentech and one of my good friends came by and said, they want me to work on this magic database and it sounds really stupid. I don't want to do it. Do you want to do it? And I was like, well, that actually sounds awesome. I wanna work on that. And so it became this project that I was almost voluntold to do. It was supposed to be a 5% project and the name of the series of projects was called work stream 2020. So this is back in 2011, 2012. And somebody smart at Genentech said, there's a whole bunch of things we need to prepare our organization for the year 2020, we think data's gonna be an interesting thing. And so, my project was data and they said, go find interesting sources of data that can support this growing oncology portfolio. And one of the first meetings I took was with the two Flatiron co-founders.

Karim Galil: And I believe this was also their first customer?

Hylton Kalvaria: We were their first or second customer at the time. And this was when they were still visiting everybody. Every meeting I had with them, Nat and Zach were there. They made me feel special and brought me In-N-Out. They were really trying to figure out what their go-to-market motion was going to be, not only at Genentech, but then how do you generalize that to lots of other pharma companies out there?

Karim Galil: Interesting. So what was that magic database? What was the thesis at the time? I mean, obviously we know how it ended. It became manually abstracted medical records, but what was the initial thesis at Roche for this magic database?

Hylton Kalvaria: Well, initially, most of my career at Genentech was in the commercial organization, and so that's why they thought I was particularly interesting. If you know the whole story, it actually ended up being less about the commercial organization and more about clinical development, real-world evidence, post-marketing studies, those kinds of things. But initially the thought was, well, if you're going to track your market based on claims data, why couldn't you do the same thing with EHR data? And so that's initially why they were interested in talking to me. And as we started to really get into what the real strengths of EHR data are, it's the outcomes, right? The outcomes are the thing that you cannot get anywhere else. That's what really led us to think about other parts within the company that could really benefit from this kind of data.

Karim Galil: So would you say the bigger usage of real-world data or EHR data today is on the R&D side or on the commercial side, when it comes to pharma?

Hylton Kalvaria: Because commercial organizations care about getting the biggest data cut possible, I think they still tend to use claims databases and those kinds of things. So for me personally, I've seen much more activity in real-world evidence groups, HEOR groups, medical affairs, and then bleeding into some clinical development also now.

Karim Galil: We'll come back to that, but just to finish your background: from there, you actually went from a buyer to an employee at Flatiron, right? Were you one of the first 50 or something, or were you later on in the journey of Flatiron?

Hylton Kalvaria: So what happened was, I think I got my offer and I would've been number 50 and then they acquired a company, and got 60 people overnight. So I went from being like number 51 to being 111, which is like a bit of a bummer, but yeah.

Karim Galil: Okay. Well, that still counts. So that was early on, right? So what's the story? Because at the time Flatiron was still a startup, right? It wasn't the Flatiron that we know today. So what made you make the leap from a big, super established company like Roche to a Flatiron?

Hylton Kalvaria: A lot of it was how fast you can move. On my special project, success for me was defined as getting one SOW in place with an outside partner. And to measure yourself based on that was actually fairly sad for me. So I wanted to move to a place where we could move incredibly quickly. And when I got there, it was exactly as promised. Everything moved at 10x the pace, and information was flowing everywhere in a way that I just had not seen anywhere else. So actually, initially it was very disorienting to go from a place like Roche, where communication and information are metered out, if you will, to a place where it's just a free-for-all. It was awesome.

Karim Galil: Yeah. It's interesting. A lot of folks start in startups and end up in startups, some start in corporate and end up in corporate, but usually the most interesting conversations are when someone has seen the two sides of the coin, a small company and a big company.

Hylton Kalvaria: Yeah. The biggest thing I noticed in my first week was that people were routinely sending emails to the "Flatiron all" email address. And to me this was like, you never do this at Genentech. I mean, if you send an email to more than 20 people multiple times a week, you're probably doing it wrong. Here you have this information flow to a hundred people, 150 people, and people would regularly do this without even thinking about it. So the information flow was very disorienting to begin with.

Karim Galil: But at the time, did you ever think that you would eventually go back to your first employer through an acquisition? Would that even have crossed your mind at the time?

You left Roche for Flatiron, but you ended up at Roche again somehow, through that acquisition.

Hylton Kalvaria: Yeah. And you know, of all the large companies you could go back to, they're top of the list. I mean, they really are an amazing company, but once you've experienced the speed of a small company, there's almost no way you can go back to a large company again. It would take a very unique circumstance.

Karim Galil: So is that why you joined Verana after that?

Hylton Kalvaria: Exactly. Yeah. So I went back to a large company, realized I wanted to do something small again, and then joined Verana at probably number 25.

Karim Galil: Oh, so with Verana you're one of the first 50. Oh yeah. Yeah. Okay. So, just for folks who don't know about Verana: Verana is basically almost like a version of Flatiron, but outside oncology, right? Other therapeutic areas that are not really covered by real-world evidence.

Hylton Kalvaria: Exactly. And that was part of the appeal for me is oncology has such an amazing amount of real world evidence today, but, ophthalmology, which is really what Verana was focused on at the time, there wasn't a whole lot of this information. So why does oncology get all of the cool data sets, right? Why can't we have it in these other areas? So I've been doing oncology for the better part of 10 years and wanted to learn something new.

Karim Galil: Interesting. So what brought you to Mendel, right? In terms of company size, we're obviously one of the smallest companies that you have joined, right? I can leave the room if you want. No, but what attracted you to Mendel?

Hylton Kalvaria: Well, there was this weird confluence of events that ended up happening, and maybe they're not so random, maybe you had something to do with it, or somebody else had something to do with it. My intern from Verana two summers ago reached out and was like, "Hey, you should come talk to us." A search firm reached out the same week. And somebody who I really respected, who is now our Chief Product Officer, Sailu, I heard that she was joining at the same time. I was like, well, that's three really interesting things. I should come talk to these guys.

Karim Galil: So here's the backstory.

Hylton Kalvaria: I want to hear this. Yeah.

Karim Galil: The backstory is, when we first started on that, we engaged a very reputable executive search firm, and in the intake meeting, the very first meeting, they were asking me, "Who's your dream candidate? Can you describe the profile for us?" So I was like, I want someone who can do this, who can do that. And as I was going through it, one of the partners was like, "I know the guy that you should hire. I just think you're too early for that hire." I was like, okay, fine. He just mentioned your name and kind of gave me a brief, and I completely forgot about it, and we moved on with the search. And then he called me two weeks later and he was like, "Well, this same person called me, and he was like, what's up with this company called Mendel? I've been hearing their name almost three times in less than a week. I'm actually open for a meeting." And I think this is the serendipity behind that story, how we ended up meeting.

Hylton Kalvaria: The other thing too is I thought I was actually coming in to get a demo to begin with, which I was. I mean, we did the demo, and when I saw it, I was like, this is unbelievable. It's almost black magic in a way. And then it kind of turned into something else afterwards.

Karim Galil: Yeah. I believe after the demo you sent us an email full of questions. There were like eight or nine questions. I don't know if you remember that email, but they were to-the-point kind of questions. You asked about dates. Only someone who has seen what is not working in AI can ask those kinds of questions. So maybe this is a good segue. As I said at the beginning of the podcast, we believe that the state of AI in general is the boy who cried wolf. In healthcare specifically, you hear AI all the time, but we haven't really seen the impact of AI on healthcare yet. So if we pick the area of electronic medical records specifically, what are your thoughts about that? What have you seen that is not working? And what do you believe is working here at Mendel? It's always refreshing to see the insights of someone who's outside the company. Well, not anymore, but I've been doing this for 4 years, so I'm somehow in a bubble. What were your first impressions? What impressed you, and why?

Hylton Kalvaria: I think the first was honesty about what AI can and can't do. Because I think we can really hurt ourselves, and it hurts the industry, when people come in and say it can do literally everything, because it's not credible. You can get into a proof of concept or a contract with a customer, and they very quickly figure out what AI can and can't do. Which is exactly why... now I want to go check that email that I sent you guys to see what I'd written there, but I wanted to understand what worked and where the holes were, and to have an honest discussion about what has been done already, what's different from what other people have done in the past, and what we still need to do as a group. I've sat through many presentations with companies coming in to talk to Verana, and even at my job at Genentech, where people would promise things that were just not credible. And I think there's a big danger in doing that.

Karim Galil: Yeah. So expectation setting. It's not only about what we can do. It's also about what we cannot do today. A couple of arguments that we hear a lot in the industry: why do you need medical records when claims have basically been doing the job for the last few years? The argument is that you only need medical records in very specific situations, so it's kind of a niche thing, and claims can cut it and get the job done 90% of the time. What's your take on that?

Hylton Kalvaria: I think that's completely false. All you have to do is look at what's happening with drug development in general. If it were the case that you were going to have many more of the blockbuster drugs that we saw in the nineties, where you could say every single patient who's got this cardiac condition should get this drug, I would agree with you. Right? But we're getting so specific in what people are studying here. And it's not only how drugs are studied; once they get on the market, it's not good enough to know that the patient has lung cancer. You have to understand that it's non-small cell lung cancer, and then you have to understand which biomarker within non-small cell lung cancer. So drug development is headed in a direction where claims data just can't do it. So I think it's a fallacy. You have to have EHR data, or figure out how to cobble together EHR plus claims plus genomic data, which I think is where everybody's headed now.

Karim Galil: So when it comes to the unstructured data or the medical records, there have been three schools of thought. One is that the only way to do it is manual abstraction. There is a lot of context, there are a lot of nuances that only humans can capture. Right? Then there is the other school of thought, which is that AI will do it. We will build an AI that is going to understand medicine like humans, and it's just going to structure the data. Think of IBM Watson or Amazon Comprehend, for example. And then there is a third school of thought, which is that we can build a machine that can understand a lot of things, but not everything. We will have some sort of symbiotic relationship where AI increases the efficiency of humans and maybe even replaces humans on some endpoints, but doesn't completely eliminate the human from the loop. What's your take on this? What do you think is going to end up being the go-to approach?

Hylton Kalvaria: I think it's inevitable that it'll always be an augmentation for humans. These two things have to work together. I think the Flatiron model was fairly heavy on the human side, with some technology to facilitate it, basically Mechanical Turk for going through a patient record. What I found appealing here, what I find credible in the moment, and why I'm enjoying what we're doing here, is that the model's flipped: heavy on the machine with the appropriate amount of human intervention. So if we can figure out how to flip that model and say, these are the things that machines are really good at, and measure it really well; we think these are the variables that machines can do better than humans, and those variables exist; and here are the ones that machines don't do particularly well at, but we direct the humans to where they need to spend their time. That is a winning model.

Karim Galil: Obviously I agree. This is the thesis around the company. And maybe this is good for the audience of the podcast. Before you joined, we ran an experiment where we really wanted to understand how humans do against machines. We picked 20 data variables and asked two sets of abstractors to extract them from the medical records. So group one and group two, and we wanted to measure inter-annotator agreement between them, just to see where they agreed. Right. Then we got a third group that we gave AI to augment their abstraction. And the fourth was AI only. Here is the very interesting finding: on some data variables, like response or outcomes, the chance of two humans agreeing against the gold set is almost 50%. It's like tossing a coin. If you give them AI, what ends up happening is this jumps up to almost 80%. The reason is that the AI is able to guide them to where in the record the most relevant information is, which can guide their decision making. The other key finding is that on some endpoints the AI outperformed humans. So the conclusion we came up with is that on some endpoints, AI is definitely better than humans; on other endpoints, it is humans plus AI. There is no endpoint where humans alone can actually get to an accuracy that matches or comes close to the gold set. And we thought that was a very intriguing finding that we hadn't captured when we first started Mendel, actually.
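
For readers who want to see what measuring agreement looks like in practice, here is a minimal sketch. It is not Mendel's evaluation code: the response labels, the six-patient sample, and the function names are all made up for illustration. It computes raw agreement against a gold set and Cohen's kappa, a standard chance-corrected agreement score between two annotators.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two label sequences match."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Probability that both annotators pick the same label by chance.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for one variable ("response") across six patients.
gold = ["CR", "PR", "SD", "PD", "PR", "CR"]
group_1 = ["CR", "SD", "SD", "PD", "CR", "CR"]
group_2 = ["PR", "PR", "SD", "PR", "PR", "CR"]

print("group 1 vs gold:", percent_agreement(group_1, gold))
print("group 2 vs gold:", percent_agreement(group_2, gold))
print("group 1 vs group 2 (kappa):", cohens_kappa(group_1, group_2))
```

Running the same two functions per variable, for humans only, humans plus AI, and AI only, would give the kind of side-by-side comparison described above.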

Hylton Kalvaria: I think that makes a ton of sense. As somebody said here, the machine doesn't get tired. Right. So if you're trying to find all the mentions of something, all the instances of something complicated, you use the machine to narrow the task down and then have the human weigh in afterwards. I mean, it makes a ton of sense.

Karim Galil: Yeah. What are your key goals now in Mendel? Now that you have joined if you wanna share like, I mean, obviously I know all of them... it's just what would you like to see? Or how would you define success for the commercial organization in the upcoming few months.

Hylton Kalvaria: Yeah. So the challenge with the technology that we have is that, you can point it at so many different things in the healthcare industry. And as you kind of go in and this has even been eye opening for me to talk to customers now, The pockets of unstructured data that exist within the healthcare industry. It's vast, right? So it's a bit of a game of where do you go first? What I'm trying to figure out right now is where is the place where we could go to begin with where we can do the most good and figure out a repeatable way of engaging with customers that delivers them a ton of value and allows us to learn along the way. So we don't spread ourselves too thin. So it's really a game of where are we going to go first with this incredibly powerful tool?

Karim Galil: So obviously the word AI is something that's being repeated in the industry almost every day. When you first joined, I believe Wael, who is our co-founder and Chief Science Officer, kind of gave you a crash course on AI, machine learning, symbolic AI, all sorts of AI things. Right. What was the most intriguing or surprising thing that you found out about AI that you didn't know before you joined the company?

Hylton Kalvaria: So I had almost no experience with this prior to coming here. And the most amazing thing about talking to Wael is the examples he gives, which are real-life examples. You get them immediately. They're so tangible, and you can understand immediately why AI will struggle with certain kinds of concepts. But maybe the most amazing thing is that the NLP technologies I knew, at least in the past, were pretty rudimentary. It's basically information retrieval. And when you actually walk through some of these examples, you're like, wow.

Karim Galil: Can you share some of those examples?

Hylton Kalvaria: Yeah. So what Wael showed me is that we're trying to understand all of the words in the sentence, or as many as possible, along with the relationships between them. It's both. It's not just individual words. It's all of the words and the relationships, and then we try to model those relationships using a variety of techniques. When he showed me examples of what comes out of some of the other systems, it was one or two words that end up being pulled out. And sometimes the relationships between those words don't even come through. So to me, I was like, is this really what people call NLP? Because it felt like it was just a search for very specific terms and returning those terms. It actually didn't make sense to me.

Karim Galil: Yeah. I mean, you find very basic things like fatigue. Fatigue can be a symptom. It can also be a side effect. So how can the machine actually distinguish between those two things? And actually, in the same document, it can be both. It can be sometimes a symptom and sometimes a side effect of certain events. As a physician, I never thought that fatigue could be such a complex thing, because you take it for granted that your human brain can distinguish those almost instantly. But the amount of work that an AI team has to put in just to distinguish a side effect from a symptom is massive.

Hylton Kalvaria: Yeah. The other thing that I found amazing was how layered all these approaches are for us. Even something as simple as negation, right? Wael has mapped out all of the different ways you could do negation in a sentence, and there's a whole class of algorithms just for that. And you start to look at all of the different ways human sentences and medical sentences are constructed, and you start to pick apart all the different pieces that require different kinds of frameworks and algorithms. That's what he and we have built. It's amazing.
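
To give a feel for why negation alone needs its own class of algorithms, here is a deliberately tiny rule-based sketch in the spirit of trigger-word negation detectors. It is an illustration only, not Mendel's approach: the trigger list, the scope-breaker list, the five-token window, and the function name are all assumptions.

```python
import re

# Hypothetical, deliberately tiny lists; real clinical systems use far richer ones.
NEGATION_TRIGGERS = ("no", "not", "denies", "without", "negative for")
SCOPE_BREAKERS = ("but", "however", "although")


def is_negated(sentence, concept, window=5):
    """Return True if a negation trigger occurs within `window` tokens before `concept`.

    The look-back scope is cut at words like "but", so that
    "No fever, but reports fatigue" does not negate "fatigue".
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    concept_tokens = concept.lower().split()
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != concept_tokens:
            continue
        scope = tokens[max(0, i - window):i]
        # Keep only the words after the last scope breaker, if there is one.
        for j in range(len(scope) - 1, -1, -1):
            if scope[j] in SCOPE_BREAKERS:
                scope = scope[j + 1:]
                break
        scope_text = " ".join(scope)
        for trigger in NEGATION_TRIGGERS:
            if re.search(r"\b" + re.escape(trigger) + r"\b", scope_text):
                return True
    return False


print(is_negated("Patient denies fatigue or nausea.", "fatigue"))
print(is_negated("No fever, but reports fatigue.", "fatigue"))
```

Even this toy version needs a scope rule to avoid misreading the second sentence, which hints at how many sentence constructions a production system has to handle.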

Karim Galil: So usually when we approach a customer, the problem is that they don't believe what we're saying. They believe it's too good to be true. That's one of our key challenges in every customer interaction. I think when you joined, you had almost the same kind of reaction. I remember there was a customer delivery, and you actually went in and tried to corroborate the output of the AI with the actual medical records. You wanted to see whether the AI is actually as good as promised, but you've also seen some endpoints, some data variables, where you didn't feel that the AI was doing well. Can you share more on that? What are the challenges that we have today with the product, from your own observation, that we're working on?

Hylton Kalvaria: Yeah. Great question. So the one that I remember very clearly, and this is not like a class of problems, is eGFR, if you're talking about kidney function, or EGFR, if you're talking about cancer. Right? They're spelled the same way. One has a lowercase e; actually, in almost every case it's gonna have a lowercase e. And so I found this example, and it's the kind of thing you could fix really easily. And Wael pointed out that there are hundreds of these examples out there. So that was one class of issue that I saw. The other is around dates. Dates are hard. Variables that are single points in time tend to be a little bit easier, but understanding that something is actually a continuous variable, where we expect to see lots of different measurements over time, and then capturing events, values, and dates over and over and over again, I see that as an area that the team's working pretty hard at.
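
As a small illustration of the eGFR/EGFR problem, the sketch below ignores letter case entirely (clinical notes are often typed in all caps) and decides from the surrounding words instead. The cue-word lists and the function name are hypothetical, chosen only to make the example run; they are not taken from any real system.

```python
import re

# Hypothetical cue words; a real system would learn or curate far more context.
KIDNEY_CUES = {"creatinine", "renal", "kidney", "ckd", "ml", "min", "filtration"}
ONCOLOGY_CUES = {"mutation", "exon", "nsclc", "tumor", "biomarker", "amplification"}


def disambiguate_egfr(sentence):
    """Guess which sense of 'EGFR' a sentence uses by counting context cues."""
    tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    kidney_score = len(tokens & KIDNEY_CUES)
    oncology_score = len(tokens & ONCOLOGY_CUES)
    if kidney_score > oncology_score:
        return "kidney_function"
    if oncology_score > kidney_score:
        return "cancer_biomarker"
    return "ambiguous"  # not enough context; a good candidate for human review


print(disambiguate_egfr("EGFR 58 mL/min, creatinine stable"))
print(disambiguate_egfr("EGFR exon 19 deletion detected in NSCLC sample"))
print(disambiguate_egfr("EGFR pending"))
```

The "ambiguous" branch is where a human-in-the-loop workflow would route the mention to a reviewer rather than guess.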

Karim Galil: One thing that a lot of our customers and potential customers don't understand is the actual difference between NLP and NLU. It's one thing to be able to extract biomarkers or dates or whatever data variables you want from a page or from a document, but it's a completely different set of challenges to stitch those together into a patient journey. A patient is probably thousands of documents occurring over a span of a few years. You have to extract the events, and then you have to kind of put the puzzle together so that you understand what happened before what. You wanna speak about that?

Hylton Kalvaria: Yeah, that was probably the other major eye opener for me: understanding what NLP technologies today are doing. Some of them do a decent job at extracting all of the clinically relevant events. And the way I explain it to customers when I talk to them now is, if you were to work with some of those technologies, they will dump 3,000 terms in your lap and basically say, okay, now you, customer, stitch it all together and turn it into a nice summarized version of that patient record. What was eye-opening was being able to take all of those different mentions of a particular term, understand the trustworthiness of where all of those terms came from in the patient record, and have algorithms that can decide, based on the kind of document it is, where it was found, and the context in the sentence, and come up with the consensus best answer for it. To take thousands of things and turn them into the summarized version of the patient record, which might be only the 30 or 40 data points that the customer truly needs to understand what happened with that patient, that is completely groundbreaking.
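
The idea of collapsing thousands of mentions into one consensus answer per variable can be sketched as a trust-weighted vote. Everything below is an assumption made for illustration: the `Mention` fields, the document types, and the trust weights are invented, and the actual algorithms are not described in this conversation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Mention:
    variable: str          # e.g. "histology"
    value: str             # e.g. "adenocarcinoma"
    doc_type: str          # kind of document the mention came from
    negated: bool = False  # context flag from an upstream step


# Hypothetical trust weights per document type.
DOC_TRUST = {"pathology_report": 1.0, "oncology_note": 0.7, "nursing_note": 0.4}
DEFAULT_TRUST = 0.2


def consensus(mentions):
    """Pick one value per variable by summing trust weights over its mentions."""
    scores = defaultdict(lambda: defaultdict(float))
    for m in mentions:
        if m.negated:
            continue  # a negated mention is not evidence for the value
        scores[m.variable][m.value] += DOC_TRUST.get(m.doc_type, DEFAULT_TRUST)
    return {var: max(vals, key=vals.get) for var, vals in scores.items()}


mentions = [
    Mention("histology", "adenocarcinoma", "pathology_report"),
    Mention("histology", "squamous", "nursing_note"),
    Mention("histology", "squamous", "nursing_note"),
    Mention("stage", "IV", "oncology_note"),
]
print(consensus(mentions))
```

Here one pathology report outweighs two nursing notes, which is the kind of source-aware decision described above; a plain majority vote would have picked the other value.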

Karim Galil: So earlier we were getting into that thread of things that are not working right. You've mentioned there are some data variables that remain very challenging, like dates, biomarkers, and some words that can be understood in different ways. What else? What do you think is still missing today that Mendel has to be investing in and working on?

Hylton Kalvaria: I think one of the other pieces is actually structured data. Because I think we've tackled the hardest problem, which is unstructured data, but realistically speaking, there's many pockets of data out there that contain both the structured and the unstructured. So figuring out how to bring all the structured data in when we consider that consolidated version of the patient record, I think is something that we're gonna have to tackle over the next year.

Karim Galil: A hundred percent. The merging of the structured and the unstructured actually has a lot of challenges that a lot of people are not aware of, because sometimes you have contradicting facts. You have something in the medical record that the patient was not billed for, or something that the patient was billed for that actually does not exist in the medical record. And how to reconcile those is not an easy job.

Hylton Kalvaria: Yeah, it's hard. We talked at some point about how maybe you take the trustworthiness of these structured variables and crank it up, so that as the system evaluates, that's the one it ends up picking. But sometimes you end up having things in the structured data that aren't necessarily right either. So it's hard to know which to go with.

Karim Galil: So in this day and age, work from home is the standard. Actually, when we were negotiating the offer, we never talked about work from home or work from the office. But I try my best. Before you joined, I was always the first person to step into Mendel, the first person to turn on the boiler and the coffee. And since you have joined, you are the first person to come in. I don't know how you are doing it. Today I came in intentionally earlier than ever, and I was like, today's the day I'm gonna be the first one. I come in, and you are in the meeting room. When do you come to the office? I really wanna know.

Hylton Kalvaria: I normally get in around 7:45am because my first meeting is normally at 8. And I figure if I'm not in the car and sitting down here, I kind of missed my window to travel. So my schedule in the morning, I have two kids, two girls, 10 and 13, try to help a little bit with them in the morning. So it's not all on my wife. And then I'm out the door.

Karim Galil: So obviously we have Benny who's recording a podcast. So there is two competing shows in Mendel. There's the podcast and then there is the vlog, and Benny's vlog is more popular as we speak today than the podcast. But one of his questions, I'm gonna borrow that question is who is your favorite employee in Mendel.

Hylton Kalvaria: Oh man. Actually, easy: Thiago. Wow.

Karim Galil: I don't understand why Thiago is everyone's favorite employee.

Hylton Kalvaria: Well, a couple things

Karim Galil: So Thiago is actually just for context. Thiago is on our AI team. He just joined, he actually quit his PhD to come join Mendel, as one of our first 10 AI scientists.

Hylton Kalvaria: Yeah. So aside from me, he's first in, in the morning. So we always congregate around the coffee pot over there as we're making our fancy coffee in the morning. Plus, who's got a better accent than that? It's the most amazing accent.

Karim Galil: So Thiago is originally from Brazil, I believe. Right. Everyone likes Thiago. He cost us last month, I believe, close to $3,000 worth of ping pong balls. What else did we get? Foosball. We got a lot of ball stuff. We haven't yet hired our first head of finance. He's about to join in a few weeks, and the joke is, spend as much as you want before he comes. It was a joke, but people are actually taking it seriously. You're actually one of them.

Hylton Kalvaria: There was a need for t-shirts here. I gotta order the t-shirts.

Karim Galil: So when you first joined, I think that was one of your questions: why does no one have a Mendel t-shirt here? And then one day I come in, and everyone is dressed in a Mendel t-shirt. I still don't have one, by the way, but everyone else had one. It was almost synchronized, the way everyone came in like that.

Hylton Kalvaria: I got a form you can fill out if you want one.

Karim Galil: That's what Benny was telling me. There's a Google form or something to fill in. Hey, thank you for being on our podcast, and for being the first Mendel employee on the podcast. Thank you for joining the company. We're super pumped, and I think there is an exciting journey ahead of us. If you have unstructured data, reach out to Hylton. We need your unstructured data.

Hylton Kalvaria: Yeah. That's hilton@mendel.ai.

Karim Galil: Awesome. Alright man, thank you. Have a good one.

Hylton Kalvaria: Cool.


PODCAST — 26 minutes

John Lee, Partner at JAZZ Venture Partners on Patientless Podcast #007

John is an experienced early-stage investor, focusing on companies across frontier technology, enterprise software, robotics, healthcare, and artificial intelligence. He gravitates towards startups that are using science and technology to break down barriers to productivity growth and enable a better future for the largest number of people.


Karim Galil:
Welcome to the Patientless Podcast. We discuss the Good, the Bad, and the Ugly about Real World Data and AI in clinical research. This is your host, Karim Galil, co-founder and CEO of Mendel AI. I invite key thought leaders across the broad spectrum of believers and defenders of AI to share their experiences with actual AI and Real World Data initiatives.

Karim Galil:
Hi everyone, and welcome to another episode from Patientless Podcast. Today's guest is from the venture capital world. We don't usually invite a lot from the venture capital world, but he has a very interesting background, very interesting portfolio of companies, and I thought this is going to be a very interesting conversation about the future of AI and where is the industry heading. Today's guest has his undergrad from Cornell in Biology. And rather than being in the wet lab today he actually is in the VC world. He started as an associate at Lux Capital, but today he's a general partner at Jazz Ventures with very interesting investments, such as Gordon Health. Today's guest is John Lee, partner at Jazz Ventures. John, thank you for making it to the episode.

John Lee:
Thank you. I appreciate it. Thanks for having me, Karim.

Karim Galil:
John, what's the journey from biology to the venture capitalist world. What happened?

John Lee:
I ask myself that every single day. I was studying computational biology in undergrad, but at the same time I was doing internships more around health systems, and around how technology can impact health systems. The one thing that I noticed was that oftentimes there were these really interesting scientific discoveries in the lab, but frequently they weren't making it out into the real world, for a number of different reasons. And so, I was always interested in the application of technology rather than just the basic discovery of science. When I was starting to think about what I could do in between undergrad and joining a PhD program, I thought venture capital would be an interesting way to explore the things I was interested in: how do you get breakthroughs in science really out there in the world? It has been over a decade now. I'm probably not going back to grad school anytime soon. But I do think that being in the VC world is an interesting opportunity to really push out these core innovations that sometimes do get stuck in the lab, and I think it's one of the most effective ways to do so.

Karim Galil:
Is that what Jazz is investing in? Are you guys investing only in healthcare? What's your thesis around investments at Jazz Ventures?

John Lee:
Yeah. We do have a lot of interest and focus in healthcare, but it's not the only stuff that we invest in. We have a really broad mandate to invest in companies that expand the boundaries of human performance. And so, we particularly like looking at breakthroughs at the intersection of digital technology and neuroscience that can impact human experience cognitively. This has led to lots of different companies in our portfolio where we think that things like consumerization can really impact healthcare delivery pretty positively. We think that human-in-the-loop AI can augment productivity in lots of really interesting ways. And so, we really look at everything from the enterprise, to healthcare, to therapeutics, to even consumer products. We have a pretty broad mandate, but it's largely set around this idea of how do you scale productivity, how do you enhance and expand human potential in a lot of different directions.

Karim Galil:
A lot of these companies have very solid technology, and they've spent years basically in the R&D process. How do you guys, as a venture capital firm, evaluate the technology, or evaluate the secret sauce, say for a therapeutics company, for example?

John Lee:
Yeah. It's a great question. Prior to Jazz, I spent about a decade helping to build a firm called Osage Partners, which focused on academic innovations and how do you [inaudible 00:04:05] into companies. I'm very familiar with the topic of how to evaluate technology. Frankly, the simple answer is that most times the technology comes secondary to the team that's actually building and rolling out the technology. Oftentimes, the build in market and the form factor that you take at the technology is more important than the technology itself. You know, that being said, occasionally there are moments where the superiority of technology can be the biggest competitive advantage, which is actually often the case with things like therapeutics where there's very strong IP around it.

John Lee:
I think it works a little bit differently when you start talking about digital technology. Oftentimes software doesn't have a strong [inaudible 00:04:47] around it, and so it's a little bit of a different go-to-market, and the packaging of the process matters quite a bit more. I would say that AI is a great example of this. There are lots of different methods out there for how you get a slightly better neural network, how you get slightly different algorithms that are somewhat better than one another. But in reality, the most important thing is how you package that stuff into a complete product.

Karim Galil:
You just opened a big can of worms now. There are tons of questions in my head. Let's start with this, all right, how can an AI company... I hear you, it's really hard to patent AI today. It's just hard. And once you've published something, it becomes public knowledge already, and anyone can just get the paper and work on the same model. How can a company build a moat? How can a company actually build a defensible software business? Is it only the packaging, or is there any kind of network effect you have seen from a customer perspective, or from a data perspective?

John Lee:
Yeah. I mean, I would say that the classic answer for this would be that you want to build some sort of data moat: you want some proprietary way of collecting data that can feed your algorithm and that nobody else really has the ability to replicate. I think this is particularly relevant when you're talking about neural networks that rely on large sources of data. But you know, that being said, it's always hard to compete against the Googles, Facebooks, Microsofts, AWSs, and Amazons of the world, because they're always going to have more data than you. In the world of neural nets, I actually think it's very difficult for an individual startup to have a significant advantage there. The advantage often comes from nimbleness and the ability to target markets that are perhaps too small an opportunity for those large companies to go after, to build a moat around brand, users, and features, and then build from there.

John Lee:
When you talk about things like what you're doing where you're talking about neural symbolic systems, a lot of the field really isn't quite there yet. There's an advantage in experience and breadth of experience in being able to design those symbolic systems where only a handful of people will have that perspective, or that point of view, or the ability to design those systems. I think that when we're talking about neural symbolic systems as opposed to neural nets, there's some higher inherent barriers to entry, because you have to be able to design those expert systems, which often the expertise is limited to just a handful of people in the world.

Karim Galil:
Then at that point, the team becomes one of your main competitive advantages. And the problem that we have seen, and I'm not sure how you work with that with your portfolio companies, is the great AI talent tends to gravitate towards the Googles and the Amazons; not for the salary, but because they can get to work with tons of data from day one. For them, it's like hey I go to a startup, my career is going to take a down turn because I don't know how much data they have, how much data they can get in the next year or two. Then hiring becomes a really challenging problem for a lot of companies who want to build say neural symbolic systems, for example. Is that a pattern that you're seeing in healthcare? And if so, how did your companies work around that?

John Lee:
Yeah. I mean, I would say that I think this applies not only to AI, but to engineers in general. Right? You're always going to have a lot more safety, and a lot more comfort, and probably a lot more interest from engineers to work at those large companies, because they are a lot more attractive for a number of different reasons. That being said, I do think that technology rapidly commoditizes, and so when it comes to neural nets, I would say maybe five years ago there was probably a pretty substantial AI talent shortage, where there were really only a handful of experts that you could draw upon, and there was a lot of competition for those. I would say since then there's been a lot of commoditization. You can see that because you're no longer seeing the massive seven- or eight-figure salaries going to AI engineers that you used to see five years ago.

John Lee:
And so, I think technology naturally commoditizes. There have been lots of AI platforms that have come out since then that make it a lot easier to work with neural nets, to work with deep learning, without the need to go and design the algorithms themselves. Really, deployment is the key issue there. In reality, for healthcare and pharma companies, yes, they're never going to be able to recruit a large number of those algorithm designers, but they're going to benefit from the commoditization of a lot of these technologies. In some ways, I think it actually naturally matches up well with the risk tolerance they have anyway, where you probably want to start absorbing those sorts of technologies once they're ready for prime time.

John Lee:
I think that when we get into hybrid AI systems, and these novel architectures that are starting to emerge, a lot of those have higher day-one utility to pharma companies, and I would guess that the engineers working on them would want to start in pharma or on harder problems, because their algorithms suit those problems better. And so, I actually think that from an AI shortage perspective, or a talent perspective: one, the technology gets commoditized very quickly; and two, for the next generation of problems to be solved by these hybrid AI systems, these engineers are going to gravitate towards those industries, and pharma and these other healthcare organizations are going to benefit from it.

Karim Galil:
So, pharma companies pretty soon are going to be technology companies actually. And we're seeing big pharma are aware of that. They're already recruiting hundreds, if not even thousands in some occasions, AI engineers and AI talent, because they're quickly realizing that... I mean, [inaudible 00:10:33] now says we are a data and a pharma company, we're not just a pharma company. But, a lot of our audience are not super tech savvy, so we jump on the neural symbolic approach, terms. Can we explain to our audience what is the difference between neural nets, symbolic AI, a neural symbolic approach, and what are the benefits of each one of them?

John Lee:
Yeah. I like to somewhat think of this kind of from a psychological perspective. If you think about levels of understanding of say animals, I would say that there's probably a few different levels, and people like Judea Pearl comment on this where the first is more of a sensory and observation level, or an association level, where you're taking in insights and you're making conclusions based off of rough correlations that you're doing. I would say that this is probably where neural nets are today. It's basically saying all the answers are within the data, and with correlation you can find every single answer, which obviously just by saying that statement there are flaws in that, because a lot of those correlations tend to be spurious. But, I think that's where we basically are with neural nets today.

John Lee:
A level above that, or a tier above that, would be the ability to intervene, or do, based off associational observations that you make in an environment. I think that's kind of where neural symbolic systems come in. Symbolic systems are these expert systems or knowledge systems where you have rules associated with what you view as knowledge in the world. And so, you map out these systems of knowledge, and then you apply things like big data or correlative systems like neural nets to get a better understanding of what's going on. For example, you could ask: what would X be if I do Y? You can make these types of conclusions.

John Lee:
A step beyond that, where you start getting to strong AI and artificial general intelligence, is the ability to think counterfactually, or to imagine within a system. I don't think we're quite there. I think that things like neural symbolic systems are really a step toward that, and probably will be the predecessors of truly strong AI.

Karim Galil:
Would it be safe to say that a neural net learns by statistical weights, like how statistics work, versus a neural symbolic system is leaning more towards learning by facts?

John Lee:
Yeah.

Karim Galil:
If that is the case, then it seems like the way to go in healthcare is a neural symbolic system, because I find it hard to imagine a physician working with a neural net and getting, hey, this patient... the chance of death is high within the next three months. They will gravitate to why, and then the system is going to fail to say why. And a physician won't feel comfortable working with that. I also find it hard for a pharma executive to make decisions based on a system that doesn't really meet the FDA way of thinking about life, which is a very factual and very scientific way. Would that be a safe assumption?

John Lee:
Precisely. I think that's correct. One way to say this is: if you look at a classic thermometer and you see the reading rise, and then you feel that it's getting warmer, a neural net approach can't tell whether the rising of the thermometer is causing the temperature to rise, or whether it's the temperature that's causing the thermometer to rise. With a symbolic system, you simply place a rule and say it's obvious that the temperature rising impacts the thermometer. You draw that causal inference, or the causal relationship, and you have a much better understanding of what's going on. And so, when you're talking about pharma, it is very important to know if the impact that you see due to some sort of molecule is the result of the molecule or of something else. Those causal relationships are really the key to unlocking much more intelligent systems. And it applies not only to pharma and healthcare, but really to any industry that has sparse information and that requires true insight and understanding rather than just being able to associate.
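
The thermometer example can be made concrete with a few lines of code. This is only a toy: the readings are invented, and the `CAUSAL_RULES` set is a stand-in for the kind of hand-written rule a symbolic system might hold. The point it shows is that correlation is symmetric, so it cannot say which way the arrow points, while an explicit rule can.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical symbolic knowledge: directed (cause, effect) pairs.
CAUSAL_RULES = {("room_temperature", "thermometer_reading")}


def causes(cause, effect):
    """True only if an explicit rule states that `cause` drives `effect`."""
    return (cause, effect) in CAUSAL_RULES


# Made-up observations.
room_temperature = [18.0, 20.0, 22.0, 25.0, 27.0]
thermometer_reading = [18.1, 19.9, 22.2, 24.8, 27.1]

# The statistic is the same in both directions...
print(pearson(room_temperature, thermometer_reading))
print(pearson(thermometer_reading, room_temperature))
# ...while the rule is directional.
print(causes("room_temperature", "thermometer_reading"))
print(causes("thermometer_reading", "room_temperature"))
```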

Karim Galil:
The question then is how can you start crafting those rules? I mean, if we can think of medicine, what are the rules of thinking of medicine? It becomes really hard. How have companies solved that problem? Is the approach to those neural symbolic systems very rule based, requires clinicians and experts? Or is it a hybrid? What does it really mean?

John Lee:
Yeah. I think there are multiple approaches to this. At its core, the question becomes one of comfort with how you define causality. That's really the important relationship to suss out here. If you take a historical scientific approach, you kind of steer away from causality, but in reality, as humans, we probably assign a lot more causality than correlation, and do it correctly in most cases. There are a number of ways to do that. One way is to have experts design the systems: they have a better sense of what is causal and what is correlative. It's going to be subjective, but you involve lots of experts in the design and then you create it. I think that's one way.

John Lee:
I think Bengio had a paper recently about how you could do this within neural nets, where you statistically identify and suss out causal relationships, or what are kind of pre-causal relationships. And so, I think there is a statistical approach here. Judea Pearl also speaks a lot about how you can define those causal relationships at scale. But ultimately, I think this is the great unsolved problem when you're talking about neural symbolic systems: how exactly do you create, at that scale, those structural ways to do things like semantic reasoning, or to create that understanding?

Karim Galil:
You talked about and touched on packaging and go to market. What are the successful models you have seen as an investor for an AI company to go to market, especially in the healthcare sector, what kind of business models and what kind of distribution channels work?

John Lee:
Yeah. I mean, there's no perfect answer here, but I would say that oftentimes it's no different than any other successful company that packages up a product. I view AI as really kind of a feature in the stack, it's a way to make things a lot better. You really have to focus on the things that actually improve and give you an advantage. And so, if there's a meaningful improvement for the end user, I think that that's an appropriate place to really apply AI, it just has to... It just goes back to what is good product design and what is a good product.

Karim Galil:
There's a very interesting blog post that [inaudible 00:17:08] had, which is that investing in AI is not like investing in software. It's different gross margins, and it's more like investing in a pharmaceutical company where you need to expect two to three, if not even more, years of pure R&D, no commercial activities, before the company has something that is significant enough to take to the market. A great example is Atomwise. Atomwise is now cutting really big deals, I mean talking about billions of dollars every few months. But that took them what? Around four or five years before they came to that breaking point. How do you guys at Jazz think of that? Do you think this is a patient investment? Or do you tend more to invest when the company passes that R&D threshold? What's your thesis around the timing of investment in an AI company?

John Lee:
Yeah. You know, I would say the [inaudible 00:17:56] article was probably a bit more focused around software tools, enterprise software and B-to-B tools that are primarily selling AI as a service. And in those situations the gross margin can be quite low, and the years to build it, compared with a traditional tool, may be longer because you need the data gathering portion of it. I think when you compare that to examples like Atomwise and some of these drug discovery companies, interestingly enough, more and more I think pharma companies have been leaning on purchasing those services and doing drug development deals with companies that don't yet have a ton of data, but rather have an interesting approach.

John Lee:
The thing about Atomwise and some of these kind of first generation drug development companies is that they're using these neural nets and there's a big hypothesis that this could lead to better ADME, or better drug selection and candidate selection. It could be possible, but it's probably more investing around hope than reality. That's partly because the feedback cycle in pharma is just way too long to actually tell.

John Lee:
And so, I don't think we have any issue investing in companies that do have, or want to spend, the time to build something foundational. And then the question becomes what are the near term and midway milestones that are indicative of future success? That's slightly different for every single company. But for a drug development company it's really hard just because the feedback loop is just so long.

Karim Galil:
Any interesting investments that you have done lately you want to share with us from your portfolio today?

John Lee:
Yeah. Speaking about drug development and something that is kind of very relevant to this conversation, we are an investor in a company called Genesis Therapeutics. In a lot of ways they're an AI powered drug development and drug discovery company. Their philosophy is that the first generation of drug development companies were using computer vision algorithms that were really ported over from things like ImageNet and that are looking at molecules at a pixel-by-pixel level, but don't have a true understanding of what's going on at a physical level. And so, the team over there invented something called PotentialNet, which is using something called a graph convolutional neural net, and it's basically using a physics based model with knowledge and understanding of how proteins fold, and then designing molecules from that point.

John Lee:
In a lot of ways, it's very similar to a neural symbolic approach, just given that you are starting with a base rule of limitations... Or a base set of rules, and limitations, and libraries, and then optimizing using neural nets to find the ideal molecule. I'd say approaches like that are just really exciting, because it is the next generation and there seems to be, at least in early data, some real positive impact when it comes to ADME optimization and selecting a much better molecule.

Karim Galil:
How's the investing world going virtual? I mean, take a company like the one you talked about right now, Genesis, this is very sophisticated technology. I'm assuming you need to spend a lot of time with the founders, and getting to know more about them, and about the tech. How are you guys able to do those kinds of interactions today in a world where everything is on Zoom?

John Lee:
Yeah. It's interesting, there's somewhat of a dichotomy happening. I would say in one way things can move a lot faster, because the barrier to meeting and then also the expectation of how well you get to know somebody has gone down. And so, if the barriers have gone down, and then the expectations have gone down, you can essentially work through a deal a lot quicker. I think we're starting to see this in the case of these deals. I think that there's probably a few months of hesitation where a lot of firms were not sure what was going on and probably hit pause, but then are realizing that they're being super productive and getting lots of companies... Spending time with lots of companies. I actually think that in terms of operation it is ideal for the VC world to operate on this model.

John Lee:
I think the downside is it's harder to get to know somebody very well, and to have, I would say, a lot more attention and time focused on a specific relationship. And so, if things are moving faster you have a shorter period of time to get to know somebody before a deal is done. I think additionally, if you're not meeting someone face-to-face, there are some clues that you're probably missing from body language that may have an impact later on, but it's unclear what the feedback loop is. I would guess that in any situation like this fraud probably increases over time. And so, it'll be interesting how it plays out. I would say that productivity has certainly gone up, but maybe the depth of diligence, or the ability to do deep diligence, has gone down.

Karim Galil:
Should we expect another [inaudible 00:22:43] coming out soon from this pandemic?

John Lee:
I hope not.

Karim Galil:
Going back to that AI piece of the equation, when do you really think we're going to see a true impact of AI in healthcare? I mean, is it the next five years? You talked about the feedback loops. You said, today we're investing in hope, because by the time we know whether those drugs are going to work or not it's not going to be next week or next year. Right? It's very long feedback loops, as you said. So, when do you really think we're going to see a different healthcare system, a system that is driven by artificial intelligence, is driven by data insights rather than by subjective experiences from different stakeholders?

John Lee:
Yeah. I think it's happening now, and the reason I think so is that Covid has really accelerated healthcare innovation by 10 or 15 years, because rather than sticking to models that were slowly failing you had a realization from lots of providers, payers, and stakeholders that the systems need to change now. The benefit of a lot of healthcare delivery going digital or digitally optimized is that you can inherently collect a lot more interesting data. And so, I do think that the transition is definitely happening. I think that it bleeds into everything from infrastructure, it bleeds into how you keep your records, it bleeds into EHRs, and the standardization of those EHRs, it bleeds into what you can do with the data once everything is standardized, and then new and novel types of information that you can then start analyzing with artificial intelligence. I think it gets quite interesting.

John Lee:
We've seen many different models in telehealth; everything from ABA therapy delivery, to primary care, all those things you're going to start to be able to automate certain parts of it. And it's a question about how much you can automate. But it's unquestioned that you will have a lot more information to build a lot more interesting things. And so, I think that for companies that are oriented around building systems that can deliver a lot more with automation, so neural symbolic companies and hybrid AI companies, it's such a fascinating time because now you actually have the data and the willingness from stakeholders to move. And so, I'm really excited about what's going on. I think that there's just a tremendous opportunity right now.

Karim Galil:
Speaking about Covid, I always like to end the podcast with asking if you can Zoom call any living person today who would it be and why?

John Lee:
That's a very interesting question. We spoke about him quite a bit. I would say that Judea Pearl is probably someone I would love to talk with and Zoom call, just given that I think a lot of his work around causal reasoning and causal inference is going to become very relevant in the very short term.

Karim Galil:
That was a great choice. I recommend reading a lot of those things because it's just a different way of thinking and just a paradigm shift on how you approach AI problems in general. John, thank you so much for taking the time for this. I'm going to borrow your analogy around the biology of the difference between neural symbolic AI and neural nets. But again, thank you so much for taking the time for this podcast. It was a pleasure having you as a guest here.

John Lee:
Yeah. Thanks for having me. Really appreciate it.

PODCAST — 33 minutes

Jason LaBonte, Chief Strategy Officer at Datavant on Patientless Podcast #006

Jason LaBonte is an executive with over 15 years of experience in leading healthcare information companies. Accomplished manager at all levels, including analyst and production staff, product management and product development teams, and executive teams.

Karim Galil:
Welcome to the Patientless Podcast. We discuss the Good, the Bad, and the Ugly about Real World Data and AI in clinical research. This is your host, Karim Galil, co-founder and CEO of Mendel AI. I invite key thought leaders across the broad spectrum of believers and defenders of AI to share their experiences with actual AI and Real World Data initiatives.

Karim Galil:
Hi, everyone, and welcome to another episode of Patientless. Today's guest is Jason LaBonte from Datavant. Jason joined Datavant as part of Datavant's acquisition of UPK, where he led product management. Jason actually comes from a very scientific background. He did his PhD at Harvard in virology, which means he's really in high demand nowadays, given all that's happening with COVID. And, obviously, at Datavant, he kept innovating on the product side.

Karim Galil:
We're really honored to have you, Jason, on this call. Thank you for giving us the time for it.

Jason LaBonte:
No, thanks for having me.

Karim Galil:
Awesome. So why don't we start by telling our audience more about you guys at Datavant and also maybe about you. I just gave a very quick introduction, but the journey from virology and Harvard to product management at a Silicon Valley tech company is, obviously, an interesting one that the audience will want to know more about.

Jason LaBonte:
Sure. Yes, as you mentioned, I had my PhD in virology. It was a lot of fun to do the bench work, but I did not envision myself as being a bench scientist for the rest of my career. So I was looking around at roles that would allow me to still interact with the scientific literature and thinking about science and medicine, but not actually performing that research.

Jason LaBonte:
And so my first job out of my PhD was with Decision Resources Group where I was a market analyst, and that was a role that was primarily centered on doing deep research on specific disease states. We did a lot of interviews with thought leaders, with physicians, but we also played with a lot of the data that was available at the time to build market forecasts. And so that was my first entree into health data. I was using retail pharmacy counts from folks like IQVIA.

Jason LaBonte:
As I progressed in my career, we started playing with claims datasets, looking at how we could use that data to understand lines of therapy and switching patterns within specific disease sets and across therapies using patient claims.

Jason LaBonte:
And so over my career, I've gotten progressively more involved in how you can use that real world data to understand how a disease is manifesting itself within a patient population and how treatment paradigms can be monitored using that dataset.

Jason LaBonte:
And so that was my background going into Universal Patient Key before they were acquired by Datavant. Now, Datavant is focused on: how do I not stay limited by having a single dataset to work with? So a single pharmacy claim set or a medical claim set. But how do I actually find all of the data that is relevant to the analytical problem I'm trying to solve?

Jason LaBonte:
And so, as a former analyst, I could really appreciate the benefits that would bring if I didn't have to choose between dataset A or dataset B, but I could actually get both together. That gives me the ability to fill geographic gaps, to fill demographic gaps in the types of patients I have coverage of. It allows me to find different variables that aren't in a single dataset, but might be found by combining the different fields of different datasets from different sources.

Jason LaBonte:
So, I really liked the mission of Datavant in that regard. Stepping back, Datavant's vision, big picture, is to connect the world's health data. Our partners are using our privacy protecting linking technology to link together disparate data sources, datasets from across the entire ecosystem of folks who are collecting that data from patients. We're allowing our customers to really link those patient datasets together to build a longitudinal history without knowing the identity of those patients. That's a really important piece of our model: to preserve patient privacy.

Jason LaBonte:
Our big picture goal is to create an open data ecosystem, where organizations retain full control of their data, so they get to decide what data, under what terms and to whom they're going to share with. But because they have that control, it really increases the liquidity of data to move to the people who need it to do their analytics.

Jason LaBonte:
And so our role in that exchange is really to provide the technology infrastructure to enable a data source to safely share data with a data consumer. And we support all of our clients in linking that data together, wherever it's sourced, without compromising patient privacy.

Karim Galil:
So on that patient privacy point, you guys are linking some of the world's most sensitive data. Clinical data today is the most sensitive data and there are a lot of concerns about patient privacy. How can you link datasets without compromising patient privacy?

Jason LaBonte:
Yeah, great question. So this is a technology that's been around for a little while in different forms, but I think we've refined it to what we believe is the most secure way to do that.

Jason LaBonte:
Essentially, in a nutshell, our software is an on-prem piece of software. So we ship our software to a data source that is holding PHI. That data source, therefore, does not have to share PHI with us or with anybody else, but they run our software on-prem, and what the software does is two things: It removes identifying information from that PHI dataset to turn it into a de-identified dataset per HIPAA. As it removes a patient's name and address and other identifying info, it's adding back a unique encrypted identifier for that patient, which we call a Datavant patient key. It's also known as a token in the industry. And this key is unique to each patient. So as I create this de-identified output dataset, it's got unique patient keys for each of those individuals, and that key is interoperable with any other data sources also running the Datavant software.

Jason LaBonte:
So while I might have a record in three different datasets as Jason LaBonte, once those three datasets have de-identified their records, I now have a unique encrypted patient key in each one so that when they send their data off to another party to join my records together, they can use this linking key to know that all these records belong to the same person, even though they no longer know it's me. So that's really in a nutshell how the software works.
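As a rough illustration of the idea Jason describes (strip the identifying fields, add back a keyed token that is the same wherever the same person appears), here is a minimal Python sketch. It is not Datavant's algorithm, which the conversation does not detail: the HMAC-SHA256 construction, the `SITE_SECRET` value, the field names, and the functions `patient_key` and `deidentify` are all assumptions made for this example.

```python
import hashlib
import hmac

# Hypothetical secret assumed to be shared by every site running the same
# software; an illustrative placeholder, not a real value.
SITE_SECRET = b"example-secret-for-illustration-only"

# Fields treated as identifying in this toy example (an assumption, not the
# HIPAA de-identification list).
IDENTIFYING_FIELDS = ("first_name", "last_name", "dob", "address")


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences vanish."""
    return " ".join(value.strip().lower().split())


def patient_key(record: dict) -> str:
    """Derive a keyed, non-reversible token from a few identifying fields."""
    material = "|".join(normalize(record[f]) for f in ("first_name", "last_name", "dob"))
    return hmac.new(SITE_SECRET, material.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict) -> dict:
    """Drop identifying fields and add the linking token in their place."""
    out = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    out["patient_key"] = patient_key(record)
    return out


# Two sources hold the same (fictional) person with different formatting.
pharmacy_row = {"first_name": "Jane", "last_name": "Doe", "dob": "1980-02-03",
                "address": "1 Main St", "drug": "atorvastatin"}
clinic_row = {"first_name": " JANE ", "last_name": "doe", "dob": "1980-02-03",
              "address": "1 Main Street", "diagnosis": "E78.5"}

deid_pharmacy = deidentify(pharmacy_row)
deid_clinic = deidentify(clinic_row)
```

In this sketch both de-identified rows carry the same `patient_key`, so a third party could join them without ever seeing the name, date of birth, or address.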

Karim Galil:
And how do you guys deal with things like nurses missing something in data input? So one of the problems of this industry is there is no really standard way of how you input data in an EMR system. And in many cases, Johnson can be sometimes John and sometimes can be J-O, and someone missed the rest of the name spelling and all those kinds of errors. So how can you guys ensure that you have fidelity of that integration piece of making sure that this patient is actually unique and identifiable across the whole system?

Jason LaBonte:
Yeah, great question. This is one of the longstanding difficulties with anybody, whether they're dealing with identified data at a hospital and trying just to link together all of your patient records, or, for us in the de-identified world, how do we do that matching correctly of a patient's records?

Jason LaBonte:
So, as you mentioned, there are a lot of ways that the data can be less than ideal when we work with it. It can be that different spellings of a name are used, or that people move and change addresses. And so the way we handle that is when we create these linking keys in a de-identified record set, we're actually creating different versions of those keys. So each record may actually have six to eight different keys appended to it based on different combinations of the underlying identifying information. So one of them may be built from your first name, your last name, your date of birth and your gender, but another one might be built off your social security number and your first name, if those happen to be present. We build them off of email addresses or phone numbers or street addresses or varying combinations of those elements.

Jason LaBonte:
So the idea is twofold. One, if there is a missing data element that restricts you from making one of our patient keys, hopefully you have the elements necessary to make some of the other ones, so you can link on those instead. But, secondly, and probably a little more importantly, when I have multiple keys made, I can actually tease apart different matches to know which ones are correct and which ones are incorrect by making my matching algorithm demand more stringency, by saying, "I need at least five of the eight linking keys to be the same before I declare this to be a match."
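The multi-key approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five key recipes, the `SECRET` value, and the function names `build_keys`, `shared_key_count`, and `is_match` are hypothetical, and the threshold is left as a parameter rather than fixed at the "five of eight" Jason mentions.

```python
import hashlib
import hmac

SECRET = b"illustration-only"  # hypothetical shared secret

# Each recipe names the fields combined into one linking key (assumed recipes,
# loosely following the combinations mentioned in the conversation).
KEY_RECIPES = {
    "name_dob_gender": ("first_name", "last_name", "dob", "gender"),
    "ssn_first": ("ssn", "first_name"),
    "email": ("email",),
    "phone": ("phone",),
    "last_dob_street": ("last_name", "dob", "street"),
}


def build_keys(record: dict) -> dict:
    """Build every key whose input fields are all present; skip the rest."""
    keys = {}
    for name, fields in KEY_RECIPES.items():
        values = [str(record.get(f) or "").strip().lower() for f in fields]
        if all(values):
            digest = hmac.new(SECRET, "|".join(values).encode("utf-8"), hashlib.sha256)
            keys[name] = digest.hexdigest()
    return keys


def shared_key_count(keys_a: dict, keys_b: dict) -> int:
    """Count recipes for which both records produced the same token."""
    return sum(1 for name, token in keys_a.items() if keys_b.get(name) == token)


def is_match(keys_a: dict, keys_b: dict, min_shared: int) -> bool:
    """Declare a match only when enough independent keys agree."""
    return shared_key_count(keys_a, keys_b) >= min_shared


record_a = {"first_name": "John", "last_name": "Smith", "dob": "1980-02-03",
            "gender": "M", "ssn": "000-00-0000", "email": "js@example.com",
            "phone": "555-0100", "street": "1 Main St"}
# Same fictional person elsewhere: first name misspelled, SSN missing.
record_b = {"first_name": "Jon", "last_name": "Smith", "dob": "1980-02-03",
            "gender": "M", "ssn": None, "email": "JS@example.com",
            "phone": "555-0100", "street": "1 main st"}

keys_a, keys_b = build_keys(record_a), build_keys(record_b)
```

Here the misspelled first name breaks one key and the missing SSN prevents another from being built, yet the email, phone, and address-based keys still agree, so `is_match(keys_a, keys_b, min_shared=3)` holds while a stricter threshold of 4 does not. Raising `min_shared` trades missed links for fewer false ones.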

Karim Galil:
That's really cool. And what about cases where data is unstructured? So, for example, in today's world, a lot of physicians are actually dictating data to some sort of device and getting someone to input this data for them somewhere else. How can you guys deal with cases like that, where the data doesn't exist in a form where you can see where the first name is and where the last name is?

Jason LaBonte:
Yeah, that's a great question. So our system works really seamlessly with structured data because we know all the different elements that are supposed to be there and we can properly de-identify it and create these linking keys out of the structured fields. For unstructured data, medical notes, imaging notes, and things like that, there's a really extreme wealth of information that's captured in that type of data. But to your point, there are two problems in working with that data. One is how do I properly de-identify it so that I remove everything that might be identifying in that note? And then how do I, perhaps also at the same time, extract those elements so I can create these linking keys out of it? Datavant has a solution for working with unstructured data to remove some of those elements. But, quite honestly, we have a lot of partners who are really excellent at that as well.

Jason LaBonte:
And so, at Datavant, we view ourselves as a piece of the larger ecosystem. So as I mentioned earlier, we're helping people who have data share it with people who want data, but there's a whole host of other enabling technologies that are involved in making that data usable. And so we have partnerships with folks like Mendel.ai, who are really exceptionally talented at certain problems like taking in an unstructured note, structuring it, and extracting the values that we need to make a linking key. And I think using our technology along with our partners' technology is actually the optimal way to solve some of these more thorny problems.

Karim Galil:
We actually had a very interesting experience using Datavant's technology a couple of weeks ago. We were working with a customer of ours and, obviously, without mentioning names, and they had a really interesting dataset and we wanted to know do we have any kind of intersection? Is our data anywhere intersected with their data? And if there is any intersection, can we complement this data? Or we don't have to work together because we both have the same sets of data? And the experience of getting the intersection and getting all those analytics done within a few hours, I think, is a game changing experience because now two companies can really work together without exchanging any PHI, without even having to share data with each other and can decide whether it's a good fit or not early on before getting into a study or into a project.
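The kind of overlap check described here can be pictured as a comparison of two sets of de-identified tokens. The sketch below is only an assumption about the general shape of such an analysis, not a description of how Datavant performs it; `overlap_report` and the sample token values are hypothetical.

```python
def overlap_report(tokens_a, tokens_b) -> dict:
    """Summarize how two token sets intersect without touching any PHI."""
    a, b = set(tokens_a), set(tokens_b)
    shared = a & b
    return {
        "only_a": len(a - b),    # patients only party A holds
        "only_b": len(b - a),    # patients only party B holds
        "shared": len(shared),   # patients both parties hold
        "share_of_a": len(shared) / len(a) if a else 0.0,
        "share_of_b": len(shared) / len(b) if b else 0.0,
    }


# Placeholder token values standing in for encrypted patient keys.
report = overlap_report(["k1", "k2", "k3", "k4"], ["k3", "k4", "k5"])
```

A high shared fraction would suggest the two datasets largely duplicate each other, while a low one would suggest they complement each other, which is the decision the two companies were trying to make.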

Karim Galil:
But to the point of structured versus unstructured data, we invite a lot of guests to this podcast who make the claim, "Listen, structured data is really good when it comes to certain therapeutic areas and we don't have to worry about unstructured data. The case for unstructured data becomes more obvious in things like oncology, things like immunology, the more complicated diseases where there are a lot of co-morbidities and the patients go through this chronic journey." Do you agree with that? Or do you think that unstructured data is essential regardless of the therapeutic area?

Jason LaBonte:
I generally think that structured data is fine for a lot of the large chronic diseases that we have built a lot of the industry around over the last couple of decades: hypertension, diabetes, some of these disease states where the coding is really well understood, it's really well populated, probably structured data is pretty sufficient for a lot of those. But, as you point out, the nuances that we're starting to see, and I think we saw first in oncology where the coding architectures just aren't sufficient, is really where unstructured data starts to shine. So I think we saw that a lot with oncology when you're looking for tumor size and staging. When you start to look at biomarkers, a lot of those elements are just not well recorded in an EHR system and certainly not in a claims dataset. And so that's where unstructured data really starts to shine.

Jason LaBonte:
But I think we can extrapolate out and say as personalized medicine becomes more and more attainable, the same things we're seeing in terms of the value of unstructured data in oncology are going to start to actually pertain back to some of those larger indications where we start to look at more and more subsets of patients with these chronic diseases and say, "You know what? These folks are actually different in this segment versus that segment and I can code them both with type two diabetes, but that's actually not that helpful anymore. I need to get into those physician notes to understand what their recent test levels are, what's going on with other co-morbidities, what their ability to maintain their healthy eating lifestyle is, all those other factors." But I don't think the value of unstructured data is completely diminished at all by that trend.

Jason LaBonte:
I think, especially, as we start to think about moving into rare disease spaces with more specialty products, those patients are often not diagnosed for eight to 10 years. So even if the structured side gets better and has a code for each disease state, a lot of folks are interested in how do I mine this data for the early signal before they're officially diagnosed? How do I look through the logs of symptoms and tests that are ordered and doctors they're referred to, to understand this might be a patient with the disease that I'm seeking to enroll in a trial, or that should be getting this lab test, because I think they have the genetic condition I'm treating? That information is often to be found in that unstructured note early on here. So I do think that, as an industry, we're going to find more and more uses for the unstructured side, even as we improve the structured piece.

Karim Galil:
One funny story that we have seen in some of the unstructured data is the patient decided to stop a line of chemotherapy and the doctor wrote, "Patient decided to move from chemotherapy to milkshake and marijuana." And that kind of nuance you cannot get from structured data no matter how good your coding is, because the question comes in, why did this patient stop this chemotherapy? Is it because of a side effect? Or is it because the patient passed away? Or is it because he jumped to another line? The last thing that can ever cross your mind is that this patient decided to jump on a milkshake line of therapy. And for this kind of nuance, you just have to go through the unstructured data to understand the context of the events that happened.

Karim Galil:
But that being said, you guys at Datavant are dealing with an insane amount of data every day. Do you have access or visibility into the end use cases, do your customers share with you what they have done with those projects, or does your job start and end at tokenization and linkage of the data?

Jason LaBonte:
Yeah, great question. So primarily we are the infrastructure piece for moving data from party A to party B. They don't even have to move that data through us. We are just an enabling technology so that when they share directly with each other, that data is coming through de-identified and linkable. However, because we do sit between these two sides of the equation, the folks that have data and the folks that need to use data, we are actually often involved in those discussions of, "Here's what I'm trying to do. This is the problem I'm trying to solve. What types of data are out there that I could use to feed this analytic? And can you introduce me to the right folks?"

Jason LaBonte:
So we do actually have pretty good visibility into the various use cases that our clients are trying to perform. We don't do analytics ourselves, so we're not inventing a lot of these, but there are a lot of smart people doing a lot of smart things out there. And so it's a great part of the job actually to see what folks are doing, see the innovations they're leading with and to be involved in a small way with helping them out with that.

Karim Galil:
Can you share with us some of those interesting projects, obviously, things that are not confidential, but one of the most interesting projects or use cases that you have seen in the last, say, year or something?

Jason LaBonte:
I'll give you two. I think we talked a little bit about rare disease patients. One of the really interesting use cases is a bio pharma company had a new therapy for a genetic disorder. It was a rare disorder. And the concern was that physicians may have patients that could benefit from this therapy but had no idea that this patient actually had that disease, didn't know to order the confirmatory diagnostic test to say, "Hey, this is an eligible candidate." And so what that company did is they, working with a very smart vendor, basically, aggregated a number of different real world datasets into a large linked dataset, and then they built an AI model on top that said, "I have a training dataset of people I know have this disease and what they look like in real world data. And then I have all these other patients who I don't know if they had the disease or not, but I have all their data."

Jason LaBonte:
And they basically built an AI model that was able to predict which patients in the larger real world data setting were likely candidates to have this rare condition. And so, using that, they built this predictive model, and now they're at the stage of seeing how they can use that model to identify physicians who might have patients with that disease, and then educate those physicians: if you have a patient with any of these symptoms, we suggest that they may be a candidate for this diagnostic test to confirm that they actually have this rare condition. And if they do, now you know how to treat them. And so that was really exciting because they were trying to pick up signals that, in isolation, in a single dataset, didn't really mean much. One test, one referral, one set of symptoms.

Jason LaBonte:
But, in aggregate, when you start to link across all these different datasets, you can now start to see a pattern and identify these folks before the physician could do so. And I think that's what we're going to see a lot of. Any single physician who's seeing the patient for the first time because they were referred, because the last physician didn't know what they were doing in terms of trying to identify this, has to start from scratch. But we, sitting with real world data on top of this as an industry, can start to say, "Well, while each person's starting from scratch, I can actually see the whole picture. I can help guide physicians towards the correct treatment path here because I'm seeing more than they can see themselves." Ideally, long-term, we give the physician that tool to be able to see all of that data in one spot and they can do that themselves. But I think, right now, this is a really elegant approach to solving that problem.

Jason LaBonte:
The second really interesting use case that we're seeing a lot of interest in is how do I use real-world data to accelerate or enhance my clinical trial? Clinical trials are the gold standard for how we evaluate therapies. But in our view at Datavant, clinical trials are really just another silo of data. It's a really expensive silo, a really rich silo of data, but these are patients where we're collecting a lot of data and we have put a subject ID on them so that we can't unblind the study. That has the unfortunate side effect that we also can't bring in any more data about that patient. And so if I have a clinical trial that I've run and I have outliers that the dataset does not explain, I'm stuck as a sponsor. I have to try to guess at what might've gone wrong. Hopefully I design my next trial better.

Jason LaBonte:
What we are now doing with some of our clients is embedding a Datavant linking key for that patient inside the clinical trial dataset. So as I said earlier, our patient keys are anonymized and encrypted. So they don't unblind the study in any way, but they do give the optionality to the sponsor to now bring in real world data about that same patient cohort. And now let's say a trial does not go well and they can't explain the results, and, unfortunately, many trials end up in that situation. They can bring in real world data to say, "What is different about this outlier group that can help me understand how I can improve the trial design, the inclusion/exclusion criteria, or the other factors that are underlying why this patient responded or didn't, why they had a certain safety event when other folks didn't?"

Jason LaBonte:
That should really help us design better trials. It should help us select the right patients for the right treatments. And it doesn't even stop there. I think that by doing that tokenization in the patient cohort, once the trial ends, we can actually use real world data to follow that patient over time without them ever having to come back to a site visit. We can monitor them even as they move locations to see is there any long-term event that we need to understand around safety or efficacy? And so that's really exciting because I think it's no longer choosing between real world data and a clinical trial, it's actually saying, "Let's take the best of both worlds and let's make them combinable in a way that still protects the trial design, still protects patient privacy, but now we have an even richer dataset for us to use for analysis on whether this intervention works. Is it safe and who is it best suited for?"

Karim Galil:
Yeah. A symbiotic relationship between both datasets, I think, is the future. The answer is not either/or. The answer is going to be the combination of both. But to the first use case that you talked about, I believe this is super interesting. A lot of patients go undiagnosed with rare diseases, and an AI model that can help detect that is, obviously, a super useful piece of technology.

Karim Galil:
The question is how are you seeing the providers? So you talked about pharma embedding tokens in clinical trials, and they have the incentive to do that, the financial and the scientific incentive to do that. But what about the providers' side? Do you see physicians using those kinds of AI algorithms, which require access to data? So, as a physician, you have to make the choice between, "Yeah, that's a great AI model, but I also don't want to get sued by the patient the next day because their data were exposed." So you have this tough choice between a really useful tool but also big liability on accessing this tool or allowing this tool to access data of a patient. How are you seeing the adoption curve on the providers' side? Forget about the pharma, forget about the CROs. On the providers' side? What's the adoption there looking like?

Jason LaBonte:
Yeah, I think that's going to take a while. Personally, I'm a strong believer that there always will be a physician in between data and a treatment decision. I don't believe in an AI generated treatment output where the doctor just blindly follows whatever the engine said to do. And I don't think any providers are thinking that's the right way to go. I think AI is going to be a great tool to sift through a large set of data about that patient and patients who look like them, and suggest potential areas of investigation for the physician to follow up on, potential lab tests that should be run, potential symptoms to ask about to clarify the diagnostic. But I don't think that we're going to get adoption of AI as the deciding element anytime soon.

Jason LaBonte:
So I think if we can be careful about building robust AI algorithms that are built on solid data and have been well vetted in the scientific side, I can see physicians being willing to apply those algorithms against their EHR data, to highlight patients who are at risk who needed a different type of follow-up, an earlier follow-up, than might otherwise be apparent.

Jason LaBonte:
And I think that we're seeing a lot of that happening from a population health angle from payers and providers who are looking at their overall patient population and trying to identify folks who may need an intervention that's different than what you might normally think. But I think in terms of being willing to access data and being willing to use these algorithms, there's a number of other problems that have to be solved. And, not to belabor it, but I think the data fragmentation is still the underlying issue.

Jason LaBonte:
Patients are seeing a large number of different physicians. Their data is being captured in different EHR environments, often with different data models and different data normalization rules. The ability to collapse all that data together quickly and make it available for the AI algorithm to even run on, those are big infrastructure problems for providers to solve before they could actually use an AI model at scale and in real time, anyway. So there's a lot of other problems that have to get solved, I think, before that can actually come to fruition.

Karim Galil:
A tokenization engine like yours can even go beyond the healthcare system. If you're able to marry a patient EMR record to what kind of food or groceries or what kind of shopping they are doing, now you have access to their diet. And if you can marry this to what kind of behavior they have through their credit card purchases, is this someone who is always home? Do they have some psychological disorder? Or is it someone who is very outgoing? You can marry all those data sets, which don't necessarily exist even within the ecosystem of healthcare. They exist outside healthcare. And marry this to a patient's record. Now you have a 360 view of what actually happened. I don't even think that the medical record has the full story. The story goes beyond the medical record. What kind of diet? Where do you live? Which state you're in? What kind of neighborhood you're even living in? And all this kind of data.

Jason LaBonte:
Yeah, I completely agree with that. I think the medical record is only a record of your interactions with the healthcare system. There are a host of other factors that determine whether you get a disease in the first place and what your outcome is likely to be after the fact. And we've seen that right now with COVID. The more vulnerable populations are getting hit especially hard. Just the segmentation you'd want to do to understand COVID is not available in the healthcare record. So socio-economic status, access to care, race, ethnicity, all of those variables, are poorly captured in a traditional healthcare dataset. And so I completely agree. I think we're seeing a lot of interest in datasets that are traditionally used in the marketing world. So datasets about consumers and their buying patterns, their housing status, their family size, those are all things that are traditionally used when folks decide how to place an ad in front of you on the Web or on your phone.

Jason LaBonte:
But that data is there and that data is available to be linked if we can run our software on it. And we're doing that in certain cases. And that now allows you to really address the social determinants of health. So what is your access to food, to housing? What's your educational status? What is your access to care and your consumer buying behavior? All of that gets really interesting.

Jason LaBonte:
I know there is a group that is looking at credit scores because they're interested in understanding, "Can I look at missed bill payments as an early indicator of onset of Alzheimer's?" So I can see patterns happening in these other datasets that give me early clues to somebody who might be having something that's going on medically.

Karim Galil:
So rather than serving you a better ad, we're going to be serving you better healthcare,

Jason LaBonte:
Exactly.

Karim Galil:
... which is, obviously, far more useful than trying to harass you on Instagram. We're going to, basically, be able to better understand disease. And, as you said, it all starts with how you can link the data together in a way that does not step on the patient's privacy and their concerns about data identification.

Karim Galil:
Now, you guys did an awesome job with COVID. Datavant really was one of those companies that quickly responded to COVID and was able to put its technology to good use. Can we spend some time talking about your COVID initiative and how you guys built a very collaborative approach across different companies to help address COVID? You want to talk a little bit about that?

Jason LaBonte:
Yeah, thanks for asking me about that, because we have worked hard and I think that's a great example of our larger ecosystem coming to bear on critical problems. So, as background, I think the COVID-19 pandemic has really revealed that our existing healthcare infrastructure, especially here in America, is insufficient to answer a lot of the key questions that would almost seem pretty basic. Just trying to understand how many people are getting infected with the disease. What is their outcome? That data, because it's siloed and fragmented across the United States, has been really hard for us to gather.

Jason LaBonte:
And so Datavant was proud to, basically, pull together, with a bunch of our ecosystem partners, the COVID-19 research database, which is a collection of de-identified, linkable real world data that has been generously made available pro bono to academic and public health researchers.

Jason LaBonte:
So this collective is not just Datavant. It is Datavant with our linking technology, but the data sources that are part of our ecosystem have made their data available for free. We have technology partners who are hosting that data and building an access controlled research environment for researchers to come into, and they're doing that for free. We have data hosting that's provided for free. We have services like expert certifiers making sure that it's all de-identified per HIPAA that are providing those services for free. So it's been a tremendously joint effort to stand this up. And we stood up this research database inside of a month and really put a bunch of data out to researchers quickly.

Jason LaBonte:
To date we have over 1600 researchers who've registered to get access into this research environment and we have over a hundred projects that are live and active in the research database.

Karim Galil:
Wow.

Jason LaBonte:
Over a hundred projects. And we're starting to see those start to pop out now on the other side. So we've got some really interesting projects that started to come out, which is tremendous. I think that there are early projects looking at mortality data, which is in the dataset, and how mortality has been shouldered disproportionately by vulnerable populations and minorities. We've seen really interesting projects looking at building reopening models based on these data and what effect reopening under different circumstances would likely have on disease burden.

Jason LaBonte:
And so I think those are really interesting, basic science questions around the disease, who it affects and how we deal with it, that this large linked real world dataset is now answering for folks. And so it's really been a tremendous effort. We continue to get generous donations of data from more and more players. And so we definitely encourage folks to make use of that asset. It's really, I think, a great example of how this industry can come together and solve big problems pretty quickly with some pulling together of the best in class folks across all these different parts of the solution.

Jason LaBonte:
So, again, Datavant, we view ourselves as part of this larger ecosystem. We are not the best at everything that is needed, but we have lots of partners who are. And so I think we can replicate this for a lot of different needs in healthcare by now having the ability to pull together best in class folks from data and from technology, from services, and pull them all together quickly for serving some of these needs.

Karim Galil:
That's actually super quick. You guys started that in early summer. So to see hundreds of research projects being done now, that's really awesome. That actually brings me to this question: are you guys a technology company or an ecosystem company?

Jason LaBonte:
Great question. Datavant itself is a technology company. That is what we make, it's what we offer out to our clients, but like any good middleware, the ecosystem is really where the value is. I get asked all the time, "What is unique and special about our software?" There are some really nice things that we do that I think make it really strong, but we're only as good as the people who are using our software. And so the ecosystem is really where, I would say, a lot of the value is. And so we take a lot of pride in the folks that are using our software because I think that those are the folks that are actually generating value for patients and for doctors. So the ecosystem is really where we spend a lot of time, making sure that we're connecting folks together who can take advantage of each other's strengths.

Karim Galil:
Now, we come to my favorite question. The last question, my favorite question. If you can Zoom call any living person today, who would it be and why?

Jason LaBonte:
Yeah, that's tricky. I think I spend enough time on Zoom. I'm not so sure I want to do more Zoom calls. That is a complicated one. Any living person? And I'm not going to get into politics or any of those scary areas. Well, I don't know. You've stumped me on that one. Let's see. Maybe I'll come back to you on that one.

Karim Galil:
All right. Let me make it easier for you. Alive or dead?

Jason LaBonte:
Alive or dead? You know, I would-

Karim Galil:
When you actually said "Politics," I thought you were going to say Obama or something.

Jason LaBonte:
Yeah. He's still alive. I think it'd be interesting to go back to some of the civil rights leaders of the '60s. What would Martin Luther King Jr. think of where we are today? Would his approach to solving some of the issues we're still having be the same or different as the approach he took then? I think that'd be a really interesting discussion to have. I would hope he'd have insight that we would be able to apply to today's ongoing issues and, hopefully, not be sad that more progress has not been made. That would certainly be an interesting phone call to have.

Karim Galil:
That's actually a really good choice, especially given what's happening right now. And I think this applies in so many different ways, even in healthcare. I think 50 years from now, if I ask this question, they're going to ask us, "You guys, looking back at what you did then, would you have done healthcare the same way that you were doing it?" So it's always really a good retrospective question to see how your approach really worked.

Karim Galil:
Hey, Jason, thank you so much. You shared with us today really great stories and it's impressive work, especially with COVID. So congrats on that. Also, having someone coming from a super scientific background, like you, lead a product in healthcare, if we can see more of that happening, I think we're going to have a better healthcare system.

Karim Galil:
So, again, thank you so much and all the best of luck for you and for Datavant. Stay safe.

Jason LaBonte:
Absolutely. Appreciate it. Thank you for having me.

PODCAST — 23 minutes

Mark H Goldstein, UCSF Health Hub Chairman on Patientless Podcast #005

Mark Goldstein is a full-time investor and advisor. He runs a little seed fund and is a partner at a Series A venture fund. He is also a founder of UCSF Health Hub, UCSF's growth studio.

Karim Galil:
Welcome to the Patientless Podcast. We discuss the Good, the Bad, and the Ugly about Real World Data and AI in clinical research. This is your host, Karim Galil, co-founder and CEO of Mendel AI. I invite key thought leaders across the broad spectrum of believers and defenders of AI to share their experiences with actual AI and Real World Data initiatives.

Karim Galil:
Welcome, everyone, to a new episode of the Patientless Podcast. Today's guest is Mark Goldstein. Mark is the chairman of the UCSF Health Hub. He is also a general partner at Builder's VC. Mark generally invests in early stage healthcare data companies. But before that, he was an entrepreneur. He started a dozen companies, and that's not a joke. That's for real.

Karim Galil:
The first time I was introduced to Mark, they told me, "Listen, I want to introduce you to a guy. He's one of the very few people, if any, who was hired by Steve Jobs, fired, and then hired by him again." That's not something that Steve would usually have done at the time. So Mark started his career at Apple. He was also an early advisor to Salesforce. Then later on, he decided that he wanted to take all the scars, all the things that he learned from starting his companies, and help entrepreneurs like me start data companies.

Karim Galil:
Mark is the kind of guy that you can call at 3 AM and say, "Hey, everything is going wrong. What can I do now?" So, Mark, thank you for coming to the show, and I really appreciate it. Thank you.

Mark Goldstein:
Absolutely.

Karim Galil:
So, Mark, why don't we start by you telling us more about the UCSF Health Hub and your role there, as one of the very early people who started this entity at UCSF?

Mark Goldstein:
Sure. Well, UCSF is a pretty amazing place. I would say there's more healthcare innovation that starts and/or happens around UCSF than anywhere else in the world. So here we are in Silicon Valley, and we have this amazing resource called UCSF, and we started Health Hub to help companies around UCSF work more closely with UCSF, and to help inventors and doctors and clinicians at UCSF, who were trying to basically start companies, get connected with the right types of advisors and mentors and investors, so they can scale and grow their companies.

Mark Goldstein:
So that's the goal. We've got thousands of member companies at this point in time, and we also produce what's called the UCSF Digital Health Awards, which is really becoming the Academy Awards of digital health.

Karim Galil:
You've put together an amazing team actually at UCSF, like professors that people would wish to just have half an hour of consultation with. How did you convince them to take some time from their academic work and sit down with startups, founders, and advise and walk them through?

Mark Goldstein:
Well, look, these professors and doctors and clinicians, one of the reasons they're at UCSF, one of the reasons they're in San Francisco, is to basically hang out with people that are changing the world and that are ... So here's an opportunity that says, "Here, I'm going to put together a curated list of absolutely some of the best new entrepreneurs thinking through great new healthcare ideas. Would you like to meet them?"

Mark Goldstein:
So pretty much the universal answer from them is, "Damn straight, I do. That's why I'm here. That's why I'm at UCSF. That's what it's all about." So it's really a win-win for them, too.

Karim Galil:
So two kinds of industries are really hard to invest in. Hardware obviously is one of them, but also healthcare. It's not easy to invest in healthcare. What attracted you to healthcare? You could have invested in so many different kinds of sectors, but you're very focused on healthcare. Your portfolio is amazing. You actually make very early bets on companies that are even pre-revenue, very early on, on ideas, which are quite risky investments. It's not for everyone. Why are you in healthcare?

Mark Goldstein:
Well, there are two reasons. One of which is I give my friend, Marc Benioff, some of the credit because he opened up my eyes to healthcare. I spent a few years making all of his private investments, and he got into healthcare very early on, at least early on for us non-healthcare investor types. So he opened up my eyes. But secondly, I realized that healthcare is, in a lot of respects, behind FinTech. So it's behind financial services. It's behind retail and eCommerce, where all the transformation we've seen in the financial technologies and we've seen in retail hasn't necessarily happened yet in healthcare. We can get into the reasons. So I just said, "Well, healthcare is what's next." I love it. There's ... some of the best entrepreneurs in the world are focusing on it.

Karim Galil:
So what are the reasons? Why is healthcare hard? From all your investments, what are the friction points? Why aren't we seeing the same progress that we're seeing in other industries happening in healthcare?

Mark Goldstein:
Well, first of all, it's beginning to happen. So if you look at what's happened in the last few years in digital health, the transformation in healthcare is fast and furious, and it is going on. Relative to five years ago, there's so much innovation now, it's somewhat mind-boggling. The real reason is data. There's data. You can access a million patient records if you needed to. A great entrepreneur can figure that out.

Mark Goldstein:
So all of a sudden there's data, and from data, you can basically start making inferences. So all of the great analytical ideas that we saw Amazon pull off in the early days, and we saw the E*TRADEs and the Charles Schwabs of the world pull off in the early days of financial technology, hospital systems and entrepreneurs and clinics can do the same things today.

Karim Galil:
So is data now in healthcare a vitamin or a painkiller?

Mark Goldstein:
Oh, a vitamin. So I think that when I look at vitamins, to me, vitamins are things you probably want to eat every day, gobble nonstop. I have vitamin C every day. I don't want to take a painkiller every day. So we need ... to me, data is a requirement. When you say vitamin, it's almost like water. So I would even go beyond that. I would say data is just like air and water. Without it, you're nowhere.

Karim Galil:
So you're on the campus at UCSF. At the same time, you're in the heart of The Valley. You're seeing companies being started. Do you see a gap between the innovation that companies are working on and the bedside, like right next to the patient? Do you see that there is a gap? Obviously, San Francisco is not the center of the world. How do you see it here in California at UCSF? How do you see it in the country in general?

Mark Goldstein:
For sure, there's a gap. The older hospital systems, and you could argue that UCSF is an older hospital system, this is separate from a lot of the research that goes on, but they have legacy software. They have legacy systems. They're forced to use Cerner and Epic and the traditional EHR platforms, because that was the only game in town. That's what they had to use a decade ago when they had to start transforming how they worked and worked with their patients, worked with their doctors, and worked with third parties.

Mark Goldstein:
So, yes, there's a problem. The problem is that you have these legacy systems and we're spending so much time reconfiguring legacy systems to take advantage of the new world.

Karim Galil:
So how can founders tackle that? Is it a problem in their pitch? Is it a problem in their offering? Or is it just a problem that the market is not there yet? It's still early on in the adoption curve.

Mark Goldstein:
The market is here now. So, and you could argue five years ago, for example, when you started Mendel, you were earlier. You've got a lot of arrows in your back and you're starting to ... you're pulling those arrows out. Every few months, another arrow comes out. Today, it's easier. One of the reasons it's easier is if you're a founder and you're not a clinician, you're not a trained doctor, you better find one because if you want to basically get involved in healthcare, you better have someone sitting next to you who understands healthcare, who's basically gotten their degree.

Mark Goldstein:
They have bedside manner with patients. They understand what makes healthcare work. So arguably, if you're a technical founder, if you're not sitting next to someone like that, don't bother. Find one. Now, if you are that type of person, say if you are a doctor, make sure you do sit next to someone who is technically as adept with wrangling data as anyone in the world could be.

Karim Galil:
One of the pieces of advice that you gave me early on, and I thought it was funny, yet it's a great piece of advice, is that you said, "If you start a company and you have your company meeting, if you sit around the table and you're not the dumbest guy in the room, you just didn't put the right team together."

Karim Galil:
So what is the right team? You talked about the technical founder having also someone with a clinical background. What have you seen work? What kind of teams are there, also on the commercial side of things? What works? What kind of teams should people put together?

Mark Goldstein:
It's like cooking. You're in the kitchen and you have to know that there are certain ingredients that you've got to put into your cake, your steak, whatever it is that you're making at that point in time. Those ingredients better be damn good because the alternative is you could just order a pizza. So better cook up something great. The only way you're going to be able to cook up something great is to have people or to have the knowledge to know what spice is needed at what point in time.

Mark Goldstein:
When do you turn on the stove? When do you take that pie out of the oven? How long do you wait for it to cool down? These are things that you just can't guess. Now you could guess, but it's probably not going to be a great cake or a great pie. So you better have someone around you or you better have the experience to know when it's time to put that cake in your mouth.

Karim Galil:
Many of the guests we invite to the podcast come from the clinical side of the equation or from the technical side of the equation. You come from the funding side of the equation here. In most of our conversations, the theme that I see is you're always questioning where is the money? Okay. Great idea. Great technology. But unless you are making money, there is no company here.

Karim Galil:
So what's your take on that? Why aren't we seeing companies making money easily in healthcare when it's a three trillion-plus industry? What have you seen work? What haven't you seen work? Is it a problem with the payers and the pharma because these are the guys who have deeper pockets? Or is the problem in business modeling? I'm interested to know more from you.

Mark Goldstein:
I use a simple sports analogy on this one. Healthcare companies basically have to realize that they are running a marathon, not a sprint. Now, if you're in some other sectors, for instance, if you're focusing on social media or you're coming up with a consumer play, or you're doing something that's about speed to market, it's about quick validation and quick turnaround, you're running a sprint.

Mark Goldstein:
In healthcare, it's a marathon. Why is it a marathon? It's because there are certain things you have to do. There are certain protections, for example, that your patients need and that are required, and things you're going to have to do to validate your solution. So you have to go through this whole validation process. You have to get your first, what we call, logo or first hospital system on board.

Mark Goldstein:
They have to go through a year or so of, "Does this work?" In some cases, there's an FDA approval process. So healthcare just takes longer. So A) you have to recognize it's taking longer, but at the same time, if you're not always looking for where is the money, where is it, what is the outcome here, where do I want to end up, you're going to get lost in your technology, and you're going to get lost in your idea. Before you know it, you could potentially be out of money without having a solution that people are going to want to buy.

Karim Galil:
So healthcare investors are typical investors. They need to see some thresholds of some commercial activity before they're even able to put more money in. So to your example about the marathon, what are those things that people need to focus on at the time? Are we talking about, as you said, big logos, or is it more like smaller hospitals, people who are willing to sit down and give you more time and test your product, or does it not really matter as long as you have some user traction and the company is in good shape?

Mark Goldstein:
Well, first thing is you better be really clear about the problem you're solving. Is it a problem? Is it a life and death problem? Is it an important problem? We were talking about vitamins and painkillers before. Make sure you don't focus on a vitamin that people don't have to take. If you're not in healthcare, there are lots of options. There are lots of alternatives. If you're not basically saving lives or meaningfully making people's lives better, you're focusing on the wrong problem. So you might want ... you have to rethink. So that's part one. Focus on a problem that matters.

Mark Goldstein:
The second thing I would say is don't lose sight of who the customer is and why they need to buy, and that you need to be top of mind for them because in a given day, in a hospital system, there are so many things going wrong. Think about it. A patient gets wheeled into the ER, and the number of decisions that have to be made for that patient in the first half hour is mind-boggling. It's in the tens of thousands of decisions, and it requires a lot of insight. So make sure that you're building something that's going to be top of mind. If you're like the 999,000th thing that somebody might have to do, you know what? They're going to forget you, and you're probably not going to get renewed.

Karim Galil:
You mentioned earlier that a great entrepreneur can hustle their way to getting access to one million patients. This data should allow them to start building different applications on top of it. One of the themes that we try to have here in our podcast is to define the word "data" because very few people know that 80 or 90% of the data here are actually unstructured. It's not even data that is ready to go. It's not analytics-ready data.

Karim Galil:
What's your take on that? What's the state of the union today when it comes to the quality of the data, and in particular the quality of the data that startups are working with?

Mark Goldstein:
Well, it's getting better. Look, one thing that I've learned is there's nothing called perfection. Don't wait for perfection. You've got to work with what you can work with, and you've got to keep on learning and you've got to keep on testing. Test and learn, test and learn, test and learn. Try to find the best data that you can, recognizing that it's not going to be perfect. Start throwing some of it out, adding some new. That's going to help you focus on the problem you're trying to solve, because you don't need an ocean of data. What you need is the right lake of data, but typically you don't jump in the right lake to start. It's probably more typical that you jump in an ocean because oceans are everywhere.

Karim Galil:
I attended one of the webinars that you put together for UCSF. It was amazing. It was 500 plus people on the Zoom call. It was like, "Wow, there's a big audience here." The theme was COVID-19 and healthcare data. How can companies and how can data help in COVID-19? What is the latest here? What have you seen startups doing about COVID-19? Any success stories that you can share with us on that?

Mark Goldstein:
Sure. Well, there's tons of it. First of all, our government completely fumbled everything. We're like, "Oh, well, the CDC is going to come up with a test and we're all going to be good." Well, that clearly didn't work. What did work and what is working is just getting the government out of the way and having private enterprise innovate. So the innovation you're seeing in testing and the innovation you're seeing in tracing and the innovation you're seeing in devices and in RPM is pretty, pretty fantastic over the last six months.

Mark Goldstein:
It's imperfect. There are a lot of problems. Nothing is getting to market as efficiently as we'd like, but in a lot of cases, there's hardware build-outs required. So it delays a lot of devices from coming to market. But I just think across the board, we're starting to see the transformation. Look at the death rates. Look at the learning we have. We used to think, "Oh, well, let's just have a million ventilators in every hospital system. That will solve the problem. When someone comes in, just hook them up to a ventilator."

Mark Goldstein:
Clearly, that is not what's working. Those ventilators, for the most part, are just sitting there. There's a lot more triaging and a lot more intel, a lot more things that we are collectively learning that is making this crisis somewhat less of a crisis.

Karim Galil:
I was amazed by the concept of remote testing in cancer. You guys were mentioning that you were working with companies today who can do some remote testing when it comes to oncology. I see that to be very intriguing because historically, a cancer patient has to leave home, has to travel to the hospital and the ability to get care all the way to his home is amazing. So, Mark, how do you see 2025 in healthcare? Even, you see all the cutting edge technologies that are coming out there. How do you see 2025?

Mark Goldstein:
Well, I see it through the lens of the next entrepreneur that I get to work with. I spend well over half of my time when I'm not at UCSF, it's well over half, investing in companies, and the amazing ideas are all coming from the entrepreneurs that are in there. They're in the foxhole. They're figuring it out. They're trying to solve a problem that honestly, I hadn't thought was a problem or most people hadn't thought it was a problem.

Mark Goldstein:
So what do I see in 2025? It's basically a function of the great entrepreneurs that I have the opportunity to invest in. So that's what I do. I try to invest in 10, up to 10 or so, a year. In a nutshell, one thing that isn't going to change is telehealth. We know that the number of doctor visits that people have today when they have a problem and how they have to go through what we're all calling this triage process, is insane. I think there's going to be probably 50%, if not more, less doctor visits.

Mark Goldstein:
These are basically devices, remote devices and telehealth and various other systems are going to make it so much easier for patients to say, "Hmm, something feels wrong" and basically just strap on their phone or make that telehealth call to a robot or a clinician, before just saying, "Oh, my God, I got to go to the ER."

Karim Galil:
I agree. Talking about talking to a robot, I work in AI and I find the kind of friction, the behavioral friction to get a physician or to get someone even at pharma to interact with an AI system. I cannot see a patient, at least for now. I can't see my mother talking to a robot if she has a health problem and trusting whatever interaction is going to happen there. Any takes on that?

Mark Goldstein:
Well, it's the 80/20 rule. 20% of all patient engagements are going to have to be one-on-one. They're going to have to be, we'll call it ... We'll say "in person" for now. The virtual and tele-technologies are going to get better. So I think increasingly, less and less will have to be done in person. But, look, 20% is always going to end up being there. I'm talking about the 80%, the 80% of visits and meetings and discussions that lead to the 20%. That's what we have to automate. We have to strip the costs out of the system.

Mark Goldstein:
So it's really an imperative. It's not like, "Oh, sorry, Mom. Oh, you don't want to do it this way? Well, you don't have to." No, the system is going to say, "Sorry, you have to do it this way. You have to go through a process, so that we know when we're down to that 20%, what the problem is and what doctor that can best serve you."

Mark Goldstein:
That's what goes on in retail and financial services now. It used to be in financial services, you'd have these meetings with these stockbrokers. They scribbled down a few notes of some investments they'd make. They'd go and they start buying individual stocks for you. Then they'd read the newspaper on their train into the city and decide whether or not they buy or sell those stocks. Think how stupid that is today. None of that goes on. Everything has changed.

Mark Goldstein:
Even I think of retail, how that's all changed. I think of how we buy today. You speak to someone who's been living in a cave for 20 years and you talked about these processes and what's changed, and it's phenomenal. It's phenomenally great. But these changes haven't necessarily occurred in healthcare, but they will by 2025.

Karim Galil:
You talked about AI and robots. How, as an investor, or even seeing how the process at UCSF works, how are you able to weed out good tech out of bad tech? Obviously, all the physicians are not trained to understand what is deep learning, what is symbolic AI? We can keep throwing all these terms every day. How are you able to make investments? How are you able to make diligence? How can people, when they are making solutions, really understand the quality of the technology, given that it's not easy? It requires an engineer in many cases to at least understand the terms that are being coined.

Mark Goldstein:
Well, first of all, half of the early stage investments I'll make are not going to work. So I have to be very honest about that: I have to recognize that even though I think I have this amazing intuition, a lot of it is luck. So I've got to basically think about how I mitigate my risks. One of the ways I mitigate my risk is I focus, again, on the entrepreneur. Is this founder an entrepreneur who's going to be able to pivot and change and modify how he thinks about things or how she thinks about things? If it's someone who's a little more brittle, there's no way they're going to be able to be successful.

Mark Goldstein:
So I would say that that is my tenet number one. That's where I start with all this. I start with the founder who is thinking through the problem and understanding that whatever they're thinking through and whatever solution they have at the onset isn't necessarily going to work. That's why this initial validation, getting to your question, that's why having the right hospital system and the right customers as partners, as you go through validating your initial solution, is so critical, because some of your assumptions are simply going to be wrong, and you have to have someone who's willing to work with you.

Mark Goldstein:
It's why in healthcare until you have a good logo and a great hospital system that is fundamentally endorsing what you're doing, it's really tough to scale the business and to sell it to other people because everyone looks for that first validation.

Karim Galil:
So my last question is if you can Zoom call any living person today, who would it be and why?

Mark Goldstein:
Oh, surprise question. Well, if I could Zoom any living person today, so I would basically say I would wake up in the morning and ... You know what? It's going to be different because at this point as an investor, I live for my entrepreneurs and I live for my entrepreneurs that are trying to solve incredible problems. I've made a decision to focus on starting a company. I started a bunch of companies. That's not what I'm doing now. I'm actually helping people with their companies. So I have, call it 20 or so, entrepreneurs that are trying to solve really, really important problems.

Mark Goldstein:
So the answer is that on any given day, when I wake up, whatever entrepreneur I'm trying to help, whatever their biggest problem is, their biggest life and death problem, I'm going to basically want to speak to the person that they want to speak to.

Karim Galil:
Mark, that's a really, really nice answer. It's true. I know that it's true. A couple of things that happened to Mendel, I can believe. Four months after Mark made his first bet on Mendel, I came in and I said, "Listen, all the tech is not working. The problem is harder than I thought it is. We're going to have to start all from scratch." I was expecting something around the lines of, "Where's my money? Give me my money back."

Karim Galil:
Actually, he didn't. He was like, "It's fine. Let's sit down and just try to see what we can do." Two, three years later, Mendel was able to build an amazing piece of technology, and that's not only because we hired the right people, but also because we had the right mentors and the right investors around the table. When COVID started, the first call I got was from Mark. It was like, "Hey, rough times ahead. What is your plan?"

Karim Galil:
Before I even start talking, he was like, "Three points. I need a three bullet-pointed plan." So thank you so much for being always there for us, for founders in healthcare. Thank you also for spending time trying to get all the learnings that you have outside healthcare to healthcare, because I'm a trained physician. I would have never been able to think from a commercial perspective, how can I scale a company? How can I build a team, all that kind of stuff. So thank you again. Thanks for making the time for the show today.

Mark Goldstein:
Absolutely. This is always fun. It's always great seeing you and hearing you, so ever forward. All good.

PODCAST — 32 minutes

Melisa Tucker, VP PM & Operations, Flatiron on Patientless Podcast #004

Melisa Tucker, VP Product Management & Operations, Real World Evidence at Flatiron Health was our guest on Patientless Podcast.

Melisa Tucker, VP Product Management & Operations at Flatiron on Patientless Podcast #004

Dr. Karim Galil: Welcome to the Patientless Podcast. We discuss the Good, the Bad and the Ugly about Real World Data and AI in clinical research. This is your host, Karim Galil, cofounder and CEO of Mendel AI. I invite key thought leaders across the broad spectrum of believers and dissenters of AI to share their experiences with actual AI and Real World Data initiatives.

Hi everyone, and welcome to another episode of the Patientless Podcast. Today's guest is Melisa Tucker. She started her career at McKinsey shortly after getting an MBA from Harvard, and then she went to the venture capital world at DFJ. Then, she decided to do the real thing and join Flatiron. She joined Flatiron six years ago. We're talking about Flatiron when it used to be a small startup, and now, six years in and after a big acquisition, she leads product management as VP of Product Management and Operations. Thank you so much, Melisa, for taking the time for the podcast. I'm very excited to have you as a guest today.

Melisa Tucker, Flatiron: Thanks Karim, it’s great to be here.

Dr. Karim Galil: So obviously we all know about Flatiron, but for folks who maybe didn't hear about it before, which I really doubt, give us an idea what you guys do, what is Flatiron and what's the mission of the company.

Melisa Tucker, Flatiron: Yeah, absolutely. So Flatiron is a health technology company and we focus on oncology, and our mission is to improve the lives of cancer patients by organizing the world's cancer data. So how we do that is, we have about half the company that builds tools and software for providers, community oncology as well as academic oncologists. And through those tools, we improve their workflow and their ability to work with their patients. On the other half of the company, we process and consolidate that data and spend a lot of time extracting that data and making it useful, so that we can then create research datasets that are used by cancer researchers, whether in academia, life sciences companies, regulatory, or market access. The goal is to create richer, larger data sets than have been possible traditionally, so that we can improve decision making, accelerate research, and get effective therapeutics to patients faster.

Dr. Karim Galil: So which half of the company do you belong to: that provider facing half or the data side of the company?

Melisa Tucker, Flatiron: I lead the product management team on the data side of the company.

Dr. Karim Galil: So, you're working on the data, and you guys really started that category of Real World Data. I think before Flatiron started this, like six years ago, the industry was very focused on, I would say, breadth rather than depth. So we had a lot of claims data, a lot of structured data. Can you walk me through the adoption curve? You joined Flatiron when it was, I think, 40 people headcount. There you guys are still something. What has the adoption curve looked like over the last six years in Real World Data?

Melisa Tucker, Flatiron: Yeah, absolutely. So, when I joined, we had actually just raised our Series B and acquired Altos, which was an oncology EMR company. So all of a sudden we went from about 20 sites in our network to 250. And overnight we had this data set that we thought was sort of at that critical scale to be able to do something useful with on the research side. And then, I mean, it was pretty simple. We just started opening charts and kind of looking at the data and seeing what was available there. And what we found was, and I think this is particularly true in oncology, even more so than in other therapeutic areas, almost all of the rich clinical data that you really needed to understand oncology was trapped in these unstructured notes.

So, the scanned documents that come in, things that come over fax, sometimes are not even searchable or OCR-able. A lot of the information, even about what type of cancer the patient had, or what stage of disease that was, and especially about genomics and understanding the specific makeup of the patient's tumor, which is really critical for understanding how best to treat that patient, that's all information that's just not readily available in structured form. And so what we found was we really needed a way to process this data in a scalable way, and to do so with high accuracy and reproducibility, and the ability to go back to source, right?

So if anyone ever asks us about a data point, we could actually go back and say, this is how we captured it. These are the documents that were reviewed and synthesized in order to come up with this data point. And so to do that we started essentially combining technology and people. So we built Patient Manager, which is our internal tool that is essentially a web interface for our team of abstracters, who all have clinical training but are essentially opening the charts and sort of combing through them to come up with the information that we think is important to understanding that patient. And so we built Patient Manager to help those abstracters be more efficient, so that we can monitor them and understand the accuracy of what they're capturing. We can train them and test them and compare what two different abstracters say. All of these kind of data quality processes are built in along the way. And then on the human side, just finding the right people, hiring, scaling, figuring out how to manage this very large remote workforce that is helping us do these tasks over and over again.

And so that was really the first probably year or so when I joined. And through that process, we were talking to a lot of life sciences companies, trying to get their feedback about what was the minimal data set that they needed to actually be able to do something valuable and how many patients they needed that information on, and just trying to understand their use cases and trying to build kind of an MVP that would meet those use cases. And it was tough going for probably the first 12 or 18 months. During that timeframe, we were trying to find the right balance between depth and breadth, right? So we knew we needed to capture more depth than what other data sources had in the past, but we didn't quite have the right balance until, I remember this one meeting with a customer where she just gave us this very blunt and sort of honest feedback that we didn't know where we wanted to play relative to other data sets out there.

And after that, we actually changed our whole strategy and started going a lot deeper and looking for the information that really only we could get, even if it meant a trade-off around patient numbers. And then all of a sudden, six months later, I feel like I looked up and I realized we had actually achieved product-market fit. People were buying these data sets. We were way behind on hiring, relative to where we needed to be. And really it's all been about scaling since then.

Dr. Karim Galil: It's a very interesting point that you just raised actually, that sometimes it's not about how many patients you have. It's how much data you have about these patients. Do you see that being the main buying value or is it really situational, where depending on the customer, depending on the study, you have to make a decision? Or is it honestly throughout all the way, all my customers would like to see more depth rather than breadth?

Melisa Tucker, Flatiron: Yeah. We found that it's just very customer dependent, and even within a single life sciences company, it actually depends on who is thinking about this. So the mistake we made initially was we said, we're going to be one data set that's going to help commercial teams figure out how they're doing relative to other therapies and understand market adoption. We're also gonna work with outcomes research teams who are interested in conducting studies and publishing. We're also gonna work with development teams. And it just turns out it's impossible to build one dataset that does all that, because the way that each of those groups would trade off between depth and breadth, or the specific data points that they need, are all different. And so focusing in on the target customer and understanding their specific use cases, initially finding where you can get to that product-market fit and then building out from there, I think is the way to go. And it turned out that we were not the right data set for some of those customers, and once we said that was okay and we're not gonna go after them, I think that really helped us focus in on what we needed to do and prioritize.

Dr. Karim Galil: I had a guest on one of the podcasts and he said there is actually a billing code for catching fire on a surfboard, and there is no billing code for non-small cell lung cancer. I didn't know. There's actually a billing code for it. What kind of data have you seen consistently missing from structured data, which you can only capture if you go deep into the notes and the scans? What are some of those data elements, specifically for oncology?

Melisa Tucker, Flatiron: Yeah, so I mean, I'll start with a really simple one, which is date of death. This is something that every EHR probably has a field for it. It's not consistently populated because when you think about the workflow of an EMR, the point at which you find out a patient may have passed away, you may not be going back into their chart actively and updating it. So that's something that's very simple that we find is not routinely captured in structured data. And we also have this issue of patients may go to hospitals, or into hospice before they pass away. And so that information may just never make its way back to their treating oncologist. But then, from there it gets a lot more complicated. So, biomarkers, I mentioned earlier, genomic testing, this is with the trend toward precision medicine and developing treatments that can really target the specific mechanism of a patient's tumor. This is information that is really needed to kind of determine a patient's treatment journey and understand why are they getting one therapy over another, or why might they respond to one therapy over another. And it's really not readily available. It comes back as a scanned document, typically from whatever lab was conducting the testing. So that's one where we've spent a lot of time thinking about how do we capture this, depending on the use case and depending on the depth of that genomic information that's needed. And then progression and response are the main end points that are used to kind of understand a patient's outcome other than death in oncology. And so this is really the concept of, is the patient's tumor progressing or getting worse, or is it responding to therapy and shrinking essentially? And these also are just very tough, complex clinical concepts to understand. They're often not distilled from a single data point. You may be, a physician may be looking at scanned radiology scans with the patient's tumor. They may be looking at lab results. 
They may be conducting just a visual assessment of how the patient's doing and are they able to get around in my office, and sort of integrating all of those perspectives into whether the therapy is working or not, and therefore whether I'm going to continue the patient on the therapy or switch them to something else. So a single data point, the patient progressed on this date, is actually really complex to capture, and you may need to integrate a whole number of potentially conflicting pieces of evidence in order to come up with that assessment. So thinking about how we capture that in a way that's reproducible, that's scalable, that can work for different providers and different source systems has really been a lot of work on our part.

Dr. Karim Galil: One of my mentors used to work for a very interesting biotech, I'm not gonna mention the name, but he shared with me a very interesting story about work he did with Flatiron. Their drug had very rough competition at one point, and they really didn't know what to do. Right? Their salespeople were getting killed out there, and then they came to you guys. They went to Flatiron. They were saying, listen, we have a problem. Can you guys help? Now I'm talking about a very commercial use case rather than an R&D case. And using the Flatiron data, they figured out, yeah, their competition may be cheaper and may have close or similar outcomes to their compound, but guess what? On a subset of patients, their drug was significantly better. And they were only able to do that using the Flatiron data. Now, all they needed to do was go back to the salespeople and say, when you go to the oncologists, say, listen, if you have that kind of population, I have the best treatment out there in the market, and here's the data for it. Don't even take my word. And he was saying that in less than six months, their sales numbers went on a completely different track than what they expected. And I thought, wow, this is a very impressive case where commercial teams are actually making decisions informed by Real World Data. Is that the majority of your clients? Are the majority of your clients coming with those kinds of commercial use cases? Or is it 50/50 R&D vs. commercial? And by the way, I thought this was a crazy impressive story: they didn't know that their drug performs well. They are the manufacturers, and they didn't know that this is the subpopulation that they need to focus on.

Melisa Tucker, Flatiron: Yeah. That's a really interesting story. I mean, I would say our use cases are pretty split between commercial and outcomes research and development. The use case of understanding a subpopulation or a potential way to better target treatment, versus what you can study in a clinical trial, is something that I'm very bullish on for RWD generally, because, in my opinion, we're always going to need RCTs to understand whether the drug is working and whether it's safe, but there's only a certain number of people who can be studied in a randomized trial. We know that they're biased toward certain demographics and certain people who participate more frequently in those trials. And there are always gonna be these rare sub-cohorts, whether it's certain comorbidities, or a particular biomarker, or other examples. We did a study on male patients with breast cancer, who are a tiny fraction of patients with breast cancer. They often can't be enrolled in clinical trials, but I think these are areas that are perfect for Real World Data, because you're going to be capturing this information anyway, since these patients are being treated in the real world. They may be very rare at individual clinics or sites, but if you can aggregate across lots of clinics and sites and process the data and curate it in the same way so that it's actually analyzable, I think there's just so much that you can learn there that you could never even design a trial to understand or enroll a trial to understand.
So I think this is an area where we've seen the most impact generally, and whether that leads to an FDA label expansion, whether it leads to the company going to run a trial to better understand that population, or being able to talk to physicians about it better, or being able to talk to patients about it better, I think those are all things that are very meaningful and sort of unlock potentially more effective therapy than we would have known about otherwise.

Dr. Karim Galil: How do you see the market? Are the sponsors or the pharma companies, they want to get the data and do their own work on it? Or they would like to see more vertical integration where, give me the data, offer me the professional services on top of it, maybe even give me some epidemiology work? Is it do it yourself kind of an approach, versus I actually would like to see the outcome, I don't want to know how it's being cooked in the backend?

Melisa Tucker, Flatiron: Yeah, it really depends on the customer. I think there's a huge variation in the industry in terms of how much companies have invested in Real World Data capabilities and how much they're equipped to do the analysis themselves. So we do have some customers, often smaller customers but not always, who come to us and sort of want an end-to-end solution. They want us to help them work on the protocol, write the analysis plan, give them the analysis results, and sometimes even support them through a publication. And that is something that we have built capability to do, or sometimes partner with third parties to do as well. And then we have other customers who have dozens or even hundreds of people who know how to analyze Flatiron data, sometimes people whose job it is entirely to analyze Flatiron data. And so for them it's much more about, I want to be able to see the data and run with it in-house, and someone internally is going to be able to collaborate better on understanding exactly what the development team needs, or how this fits into my regulatory submission, or how I want to publish on this. So, we've supported both. Our goal, I think, is really to try to teach people to fish. And so the more people who are developing this capability internally, that's something we're very supportive of. And we've tried to build customer support tools, ways for people to quickly get a response so that if they're using the data, they can talk to someone live. And then even branching out into data usability. So we're developing shared R packages that different users at different companies can contribute to. So these are all things we're doing to increase the usability of Flatiron data and Real World Data generally, so that we can get more people using it, because, you know, initially the hurdle to learn the data and figure out how to use it can be pretty high.
And so we want to shorten the time to insight, right? From when you get the data set initially to when you actually get something that's useful that can help you make a better decision or publish on it, to get that down so that it's not taking customers nine months as it sometimes did in the early days.

Dr. Karim Galil: One of the big debates in the industry is how big the market for Real World Data is. So you find IQVIA's market research estimating $80 billion, while you see a lot of other market researchers saying $2 billion to $4 billion. From 4 to 80, I mean, it's a huge range, and I've seen companies saying that it's neither of the numbers, actually, and that it has to be categorized based on whether it's Real World E or Real World D, whether it's Real World Evidence or Real World Data. Anyway, what's your take on the market size? Obviously when you guys started, there was no market, I believe, at that time, or not a significant market, so what's your take on that? What's the market sizing today, how big is it?

Melisa Tucker, Flatiron: Yeah, I mean, I think it really depends on which market we're talking about. I mean, Real World Data is almost a platform, and when I think about what we are using the Real World Data for, I think those are the markets that are a little bit easier to think about sizing. So if we're talking about outcomes research or health economics, I think that's one market. If we're talking about Real World Data to support or supplement or enhance clinical trials, I think that's almost a totally different market, different user base, different budgets, and certainly a different scale of budgets. So I think it really depends on which of those markets you're talking about, and different companies play in different spaces, and to some extent may be doing different things within those. So in terms of the sort of classic Real World Data market when we started, as you said, it was structured EHR data, it was claims. I would say today the market is a lot bigger than what it was when we got in it, and so when you think about the types of things that you can do with unstructured data, I think we've seen companies grow their budgets quite significantly over the past five years, in a way that's commensurate with the value that we can deliver. And from here I think there's significant potential, when I think about the roles that Real World Data can play in clinical trials, to further tap into those larger development budgets. We just launched a prospective clinical genomic trial at a number of our sites in partnership with one of our sponsors, and there you're talking about budgets that are a different order of magnitude from the Real World Data budgets that we've been talking about in the past. So obviously a different level of complexity as well.
There you're talking about enrolling patients, consenting them, running genomic sequencing on them, and being able to capture additional information that's not routinely available in the chart. So I think that's essentially a different market entirely, and if I had to guess, I think that's where those bigger numbers from IQVIA that you're talking about would come from.

Dr. Karim Galil: So is Real World Data/Real World Evidence a vitamin or a painkiller, and why? Is it good to have, or if you don't have it, are you at a disadvantage in 2020?

Melisa Tucker, Flatiron: I mean, I think we're at the point where it's really a must-have in oncology right now, and when I think about the penetration that I've seen, certainly with other companies as well that are coming into play here, it's almost table stakes for a lot of companies to be able to understand even simple things, as we talked about. And I think this is different than other diseases, but even simple things in oncology, like understanding who's getting your drug and how they're doing, I mean, these are things that require really rich and broad data sets beyond what's available elsewhere. And I think there are other problems that Real World Data can solve that are even more exciting: interesting ways that you can help accelerate drug discovery, make better and faster decisions on whether to move forward with a particular drug, or identify patients who are eligible and could be recruited to a clinical trial. I mean, these are all things that I would say have even bigger impact, and truly, when I think about the painkiller analogy, these are things that, if you could do them, you absolutely would have to do in order to stay competitive, and that also just have a very real impact on patients' lives, right? And that's ultimately what we're all here for at the end of the day, and I'm just a big believer that there's a lot that we've done so far, but there's a lot more that's still out there.

Dr. Karim Galil: So one of your biggest fans is David Shaywitz. He writes a lot about you guys in Forbes, and his theory is that what Flatiron is doing is changing how people perceive how we can measure the effectiveness of a drug. One very interesting piece he wrote was “The Deeply Human Core Of Roche's $2.1 Billion Tech Acquisition -- And Why It Made It”, and in this article he goes into why you need a very deep manual engine at the very core of a tech company like Flatiron. You guys were started by ex-Googlers, people who know their way around technology. Why is technology not able yet to automate the generation of Real World Data and the abstraction of data? A lot of people who know AI say AI is all about labeled data, and you guys have been working for five years labeling data, pretty much. So what are the technical challenges of healthcare that still require a deep human core to make it work?

Melisa Tucker, Flatiron: I think in healthcare it's so important and so hard to get it right. And so since we started, our focus has really been on being the highest quality approach out there, and we want to have confidence in every data point that we're shipping, because we know it's going to be used for some really important decisions and we want to make sure that we get it right. I think the other part of this is that in tech we haven't been good at explaining the black box and what goes in it. And when you think about things like using Real World Data in support of regulatory decisions, I mean, these are fairly conservative organizations that have been doing things the same way for a long time, and unless you can explain to them exactly how a process works and how we got to it and what the biases are and how missing data affects it, they're not going to buy it. And so for us, we really started with the philosophy of: we want to get it right and we want to make it explainable. And so that was why, frankly, for the first few years we almost had nothing to do with AI and ML, and we were focused on how we build essentially a way for humans to capture this data in a way that we can really stand behind. And to your point, along the way we've obviously seen the benefit of being able to label a lot of that data, and then finding opportunities where we think that certain tasks can be really well done or easily done by machines and replacing them along the way, just taking a pretty incremental approach to make sure that we can stand behind how we're doing this. And every time we introduce ML or incorporate it into how we capture something, there are certain metrics that we want to capture, and we try to be really transparent about how something works and how well it works, and just acknowledge some of the limitations of it.
I mean, I think there's a lot more opportunity here, and it's something that we're thinking about over the next three years as an area that we can double down on, but to me there's always going to be that human component, just given how complex the source information is and how difficult it is. Sometimes, even having two oncologists looking at the same chart, you may get disagreement and you may have to talk through why, how they got to an answer, and so it becomes very challenging, I think, to automate some of that.

Dr. Karim Galil: You raised a great point, which is the black box. I think one of the main themes of our podcast is we're always trying to educate our users about Real World Data/Real World Evidence, and also on the concept of AI. And I believe unless we untangle that black box, AI cannot really meet its promises in healthcare. As you said, no physician, no regulatory agency, no FDA, nobody is going to be able to accept data that cannot be tracked to the source and explained, while the whole AI industry is all about black boxes. I'll give you an autopilot, but if your car crashes, I'm not going to explain to you what actually happened for the car to crash. I think there is another big limitation of AI. I'm obviously the CEO of an AI company, but I still know the Good, the Bad, and the Ugly of AI. One of the limitations of AI that I see in healthcare is the approximation. AI is not about getting it 100%. You get AI folks to be really excited when the accuracy is 70%, 80%. They're like “Hey, I have this really good model!” And that’s great in the AI world, but it's not really something that physicians are trained on, right? You're trained on the highest standard of accuracy, and it's very hard for you. I remember we were talking to a client and my AI guy said, “I can actually work on this model.” It was an OCR thing and the standard was 70%. “I’ll get it to you at 92%,” and he was so excited! Then the client goes, what about the other 8%?

Melisa Tucker, Flatiron: Yes

Dr. Karim Galil: Just a very weird question. It’s like, what do you mean? I just increased it by 20%! I really agree with you, the black box is a big limitation in the world of healthcare, which brings me to this question: what is your take on the concept of Patientless or in-silico research? Do you really see the future as one or the other, RCT or Real World Data, or are you going to see them coming together or melting together to offer both safety and efficacy, but also effectiveness as a third endpoint?

Melisa Tucker, Flatiron: No, I think you said it well. I think it has to be both, and we're seeing a real example of that play out right now with COVID. We have therapeutics being studied, vaccines being studied, in these large-scale randomized trials, which I think is what we absolutely need in order to understand. Right, like, I'm not going to take a vaccine unless I know that it's been studied in an RCT and that it's effective above a certain threshold, and I understand the safety profile of that. And I think we're also seeing the value of Real World Data in COVID as well, in understanding in real time who's being infected and how they are doing in the real world. I think it's helping us improve the way that patients are being managed, just by understanding that information in the meantime while we wait for the results of the clinical trials, even understanding the number of people who are dying in a way that may be undercoded, just by looking at the number of excess deaths. I think these are all really interesting things that can be studied through Real World Data but won't replace the need for clinical trials. I think the same is true generally in the industry. To me, we’ll always need RCTs, because even the best AI can't remove confounding or the limitations of information that's not there. And so to have the ultimate degree of confidence in how a therapeutic or a drug is working, I think you'll always need to have that truly randomized design; this is the best we can do to isolate the impact of a drug. But I think there's a lot that we can learn. We talked a little bit about specific subpopulations and areas where we can pursue label expansions.
We ran a study that I thought was really interesting, where patients had a certain preexisting condition, a heart condition, that was contraindicated for a particular drug, and the regulatory agency asked the pharma company to study whether those patients were getting the drug and what the outcome was. And it turned out that we were able to cobble together close to a hundred patients across the network, which gave us a pretty good sample size to understand whether these patients were actually having worse cardiac outcomes. And I think that's just an example where you wouldn't ever do an RCT, and so I think there are examples on both sides. And if you combine these two tools, it will get us to a much richer and more complete view of patient health, because obviously we need to be able to isolate the effect of a therapeutic in a very controlled setting, but we also need to know how it works in the real world and whether patients are taking it, and ultimately what the effect of this is in a broader population, and I think that's where the two can complement each other.

Dr. Karim Galil: So can you share with us what was the most interesting project that you have worked on, or the most interesting finding that you have seen in the last six years, if possible understanding confidentiality.

Melisa Tucker, Flatiron: So I talked about the heart condition study. I think the one that we're probably most known for is studying the effect of a CDK inhibitor in male breast cancer patients. These are patients that are not able to be studied in a clinical trial or weren’t enrolled in a clinical trial, and being able to understand whether those patients also benefit from the therapy, so that you could actually expand the label of the drug to incorporate those patients and give them a treatment option that wouldn't have been available to them, I think those are the types of opportunities that can have huge patient impact and where I think Real World Data is especially well positioned to play. And so that's an example of a study that I think is really interesting.

Dr. Karim Galil: Sweet and best part of talking to you, I feel like I'm talking to a physician, honestly. You know a lot about Real World Data and oncology and all of that! What was the journey? What did you do? Walk me through the last transformative shifts to medicine.

Melisa Tucker, Flatiron: Yeah, I mean, I've always been really interested in healthcare. I started my career in consulting, but working in healthcare, and I think, as with many people at Flatiron, I considered the premed route and ultimately decided not to do it. But I think that one of the things I love about Flatiron, and I think this is true of any successful health-tech company, is how cross-functional our teams have to be. So when you're talking about being a product manager, it's not just working with software engineers, but it's also oncologists and data scientists and data abstractors who are all literally on the team with you, side by side. One of the things that makes people successful is just this kind of intellectual curiosity and being interested to learn, and so we have software engineers who come in from Spotify, who’ve never done any healthcare, and three weeks later they're talking about lines of therapy and understanding exactly how that interacts with the code that they’re writing. And I think that's something that's really fun about health-tech. It's also really hard because, as a PM, I can't just look at a dataset and know if it's good, right? I need to talk to the oncologist and the data scientist and understand the use case from the customer. And so it's really challenging, but the ability to make an impact in patients' lives and understanding the broader mission is really exciting. So I feel I've had a chance to learn from all of these great oncologists who are practicing. A lot of our oncologists actually do practice in the clinic as well, usually one day a week, to stay current on treatments and standard of care, and so it's just a place where you're constantly learning and soaking up new things. I'm sure you’ve experienced this as well, given how rapidly things are changing in the field.

Dr. Karim Galil: Well you guys certainly did affect a lot of patient lives in a very positive way, and you also kind of educated all of us in the market on why is it important to go outside the parameters. My last question, if you can zoom call any living person today, who would it be and why?

Melisa Tucker, Flatiron: Yeah! So I think I have to go outside of healthcare here for this and I would say Ruth Bader Ginsburg. I mean obviously she's a total badass, but is also someone who rose to the top of her profession despite pretty tough odds, had a really great partnership with her spouse, raised a family. I have a three and a half year old daughter, I think you do too, Karim, based on seeing her pop into our Zoom the other day. I just think it'd be really cool to have her meet someone who's done so much for shaping the world that we're in. And even if she's going to have different challenges, but I think a lot of them, will kind of build on shared lessons from thinking about how Ginsburg might have navigated the world 50 years ago.

So I just think there would be a lot of, a lot to learn there.

Dr. Karim Galil: That's a good choice. The first time I've talked to Melisa, I was very tense, because my daughter was pretty much jumping on the desk, and you couldn't see it in the camera then all of a sudden she comes in. Yeah, it's tough working from home, but again, thank you so much for coming in. There is a lot that I learned in today's podcast, and the passion that you bring in, and the energy just is very refreshing. So thank you so much for making the time for that and please if you have any questions, reach out to Melisa, she's an extremely, extremely helpful person. And also, please make sure that you like the podcast, share it, and add all of us on LinkedIn.

Thank you.

Melisa Tucker, Flatiron: Thank you so much Karim!

Dr. Karim Galil: Take care. Bye.

PODCAST — 27 minutes

Richard Gliklich, CEO of OM1 on Patientless Podcast #003

Dr. Karim Galil: Welcome to the Patientless Podcast. We discuss the Good, the Bad, and the Ugly about Real World Data and AI in clinical research.

This is your host, Karim Galil, cofounder and CEO of Mendel AI.

I invite key thought leaders across the broad spectrum of believers and dissenters of AI to share their experiences with actual AI and real world data initiatives.

Hi everyone, and welcome to a new episode of the Patientless podcast. Today's guest is Dr. Richard Gliklich. He's the founder and CEO of OM1, but he's also a physician. He's a professor of Otolaryngology at Harvard Medical School, and before starting OM1 he also founded a company called Outcome around 1998. That was eventually sold to what is now known as IQVIA, and built the phase four or the outcome piece of the business. He has led several key national and international efforts focused on evaluating the safety and effectiveness and value and quality of healthcare.

So we're very honored to have him today on our podcast. Richard, thank you so much and welcome to the Patientless podcast.

Richard Gliklich, OM1: Well, thank you. Thank you for inviting me. I look forward to speaking to you and your listeners.

Dr. Karim Galil: Obviously Richard is a role model for someone like me. I also went to med school and decided to go into business. I mean, I obviously didn't end up being a professor at Harvard, but it's very inspiring to see your journey between the clinical side of medicine and also the business side of it.

Can we start off touching on that? How did you end up starting a company?

Richard Gliklich, OM1: Well, it's actually a great question. So in the 1990s, I was very active in outcomes research, and I had a lab that was focused on outcomes research, and later becoming what we call patient registries today. And the hospital was looking to spinoff companies. So I actually had a knock on my door once from the new head of business development, who asked me if I had any technology that they can license.

And one thing led to another. And that's how my first company spun off my research lab and got me into the business world.

Dr. Karim Galil: Wow. That was a good investment from their side. So why don't we start off by you telling us more about OM1: how you started OM1, and what's the mission of OM1 as an outcomes company?

Richard Gliklich, OM1: Yeah. No, absolutely. So our vision is really to improve health outcomes through data, and that sort of encapsulates what we're trying to do. From a mission perspective, we are harnessing the power of data for measuring and improving patient outcomes. That was our first goal, for accelerating medical research and for improving clinical decision making. So that's kind of the mission of the company, and where we started from is that after I'd sold my first company, I was in the venture capital world. I became very interested in healthcare IT and big data. And with ARRA and HITECH in the last recession, the last great recession, there was this massive digitization of healthcare data, and I felt that there was an opportunity to leverage information more automatically than I had done with my previous research and business, which was much more manual, to still drive the same goal of being able to measure patient outcomes. And if you could measure them, ultimately you could predict them. So that's what led to the concept, and it's really all about how we measure outcomes.

And what we learned along the way is that while there was a lot of data out there, being able to access data, data liquidity, wasn't the problem. It was really being able to find really strong, deep levels of information. In fact, we were able to develop a database of almost 250 million unique individuals in the US pretty rapidly.

And there's a lot you can do with that, but it took much longer, frankly, to develop much more sophisticated data sets that could ultimately be used for medical research and personalization in very specific areas of healthcare, mostly chronic conditions, which cost a lot.

Dr. Karim Galil: That's actually a great point. So the word healthcare data is a very generic term and it's used quite loosely. In many instances people mean ICD codes and very structured data that are meant for billing, but it doesn't necessarily capture the clinical aspect of a patient journey.

I think you've touched on that. You said it's easy to collect data, but it's not easy to get deep clinical insights about a patient. How do you define deep? How do you define a comprehensive data set about a patient?

Richard Gliklich, OM1: Yeah. I mean, I think you know this from your medical training as well, but it's getting to the nuance of what a clinician means when they are seeing a patient and entering information about that patient and in the US, that nuance is still generally captured in a dictated note or a written note. There's certainly information in the laboratories and the coding information, the billing information, pathology information and so forth. But if you want to get to the nuance, which is that clinical understanding of how the physician is viewing that patient, you have to get deeper into that data. And that requires a lot more effort.

Dr. Karim Galil: It was very interesting for me that there are sometimes no billing codes for a subtype of a lung cancer. Is that true? Like, you cannot capture a non-small-cell lung cancer in some sort of a billing code. Is that still the case? What's your experience with that?

Richard Gliklich, OM1: I think that the billing codes improved, I mean, not to bore your listeners, but the billing coding system improved with the granularity of going from ICD-9 to ICD-10. But sometimes that granularity is still, you know, you're still picking from a list of possibilities. And while there may be a code for, catching fire while on a surf board, I actually think there may be a code for that, there may not be for certain subtypes of lung cancer or lupus or whatever the condition may be, because it's not been critical to those paying the bills to have that information.

Dr. Karim Galil: That's quite interesting. I didn't know that there's a code for catching fire on a surf board. One of the questions we always ask: is real world data and outcome research a vitamin or a painkiller? And what I mean by that question is, is it something good to have, or is it becoming a must-have for a pharma company or for a decision maker in healthcare?

Richard Gliklich, OM1: I think, you already made the comment that real world data from one source is entirely different from another, so I do think that the opportunities to leverage what happens in the natural experiment to the real world are unlimited. Like literally unlimited. So I'd say it's more than a vitamin, but it's probably not quite a panacea, but I think that we're just beginning to tap it.

And I do think that the strategic value of this deep, clinically focused, because we still want to add in social determinants of disease and information about providers and information from other types of encounters, both within and outside the healthcare system. But I do believe that the opportunity is there to revolutionize what we're doing, both on the medical research side and the clinical care side.

I'm a true believer. And as a result, it's a strategic investment that cuts across these companies, and the companies that are moving fastest have huge advantages. Meaning the pharma companies that move fastest have a huge advantage.

Dr. Karim Galil: So there's this really big debate, obviously, in every conference we go to: randomized controlled trials versus real world data, or more like data-driven trials. And obviously RCTs have been the gold standard, and the naysayers are more skeptical about the clinical validity of whatever you get out of real world data.

What's your take on that? Is it one or the other or is it both of them coming together? Where do you stand in this debate?

Richard Gliklich, OM1: Yeah, I don't see it so much as a debate. I think they're complementary sources of information. So real world data enables us to see what actually happens in the real world, in the large, actual, natural experiments that are occurring. It allows us to see what happens with drugs and devices that are being used in populations that are not generally recruited into clinical trials, where middle-aged white males are the predominant group in the US. It enables us to look at combinations of drugs, and also to look at the real use patterns. Like, if I have a patient who's going to come in and see me for a trial visit every two weeks, they're sure as heck going to take their medication and fill out their forms if that's the requirement, but that's not what happens in the real world.

And so understanding those things is complementary to what we learn in clinical trials, which are critical for really handling bias and knowing what works and what doesn't. So I believe strongly in randomized clinical trials, but also believe very strongly in the importance and complementarity, and the need to have that complementarity, of real world data, because you need to know the extremes of the populations.

Who's getting it. Who's not getting it. How you can look at things like comparative effectiveness in the real world, which is very, very hard to do in drug studies. There’s very few head to head to head to head drug studies being done. So to get to the patient choices that are really necessary out there, we need the real world data, but I absolutely believe that randomized trials are critical for knowing what works as well.

Dr. Karim Galil: So this is more towards like, we're seeing more towards Phase IV for the role of real world data, but how do you see the role, if any, of real world data in things like Phase II or Phase III of clinical approval like someone trying to seek an FDA approval for either a new use, like an extended labeling, or even a new compound altogether.

Richard Gliklich, OM1: Yeah. So what we see with our clients are many of them will look for us to generate real world data sets as they're coming, getting their phase II results, and they do that for a number of reasons. They want to compare what they're seeing in the Phase II, to try to plan the Phase III.

They want to utilize it for protocol development. They want to look at it in terms of trying to understand what it will be like to recruit for the further Phase IIs or Phase IIIs in the real world trials as they try to put them into place. And then, as we go towards Phase III, what we are seeing is that there are certain scenarios that the FDA will accept real world data for either a new approval or a label expansion.

And for the new approval scenario, it's for populations that may be rare, small, and difficult to define. So the example of Ibrance getting an approval for men with breast cancer is a good example of a label expansion to a new population that was facilitated by real world data.

Another example would be creating external control arms that can actually be provided to augment the placebo arm, to get comparators against the active treatment arm within a trial. So those are all good uses. Another use for expanded label is when you have a natural experiment happening, meaning when you have a drug or device being used off label, but having good results.

And we have one scenario currently where we're providing data in exactly the situation where there is a reluctance to randomize because there's already a bias that's been developed among the clinicians who feel it would be unethical to randomize patients based on what they've already seen.

So there's certain scenarios where it becomes really smart to use real world data and certainly acceptable. But in any of those situations, the sponsor needs to engage in conversations with the FDA. They have to understand what their appetite is for real world data, how they'll evaluate it, what they want to see. Because if it's practical to do a randomized trial, that's generally going to be the preferred option for the FDA.

Dr. Karim Galil: Do you guys at OM1 help sponsors make the case to the FDA of, “hey this is a study where real world data will be really good for it”, or is your role more after they figure it out with the FDA that you come and execute on it?

Richard Gliklich, OM1: We're generally involved all along the way. We'll go with them to the agency to sort of explain what our role is and how we view the data and the quality of the data. There's often a lot of questions and good, smart questions from the FDA about how the data are being collected and processed and how you're ensuring that they're meeting appropriate quality standards and audit-ability, traceability, and so forth. So, we are partnered with them typically from the pre-submission all the way through.

Dr. Karim Galil: Today, the concept of real world data is obviously trendy. Everyone is talking about it. But in 1998, that was not the case, and that's when you started Outcome. Can you walk me through the adoption curve from the 2000s? I mean, you started Outcome Research when that was not a sexy term or not something that everyone is talking about, and today you have a very well established player when there is more of an appetite and more adoption. Can you walk me through, like, what's the difference between early 2000s to now that we're coming to 2020.

Richard Gliklich, OM1: Yeah, absolutely. So if you can bear with me, I'll give you a much longer history than you might want to hear. I did a specialized fellowship in outcomes research during the middle of medical school with a fellow named Sam Martin. I was at the University of Pennsylvania.

He was actually on the board, I think, of SmithKline back in those days as well. And he had been the chairman of medicine, I believe at Duke, who said to me that when he had been a chair of medicine years earlier, he always questioned how well patients were actually doing on treatment.

And so he told all of the staff that worked for him, whenever you have a patient refuse treatment, send them to me and I'll just follow them. And he did that, and he said that we really don't know what works and what doesn't work. And we really need to understand that, and the only way to do that is to follow them in the real world.

So that got me on the path of trying to track patients in the real world. And initially that was through developing technology, internet-based technology, to track patients in registries, and the initial uptake was with medical and surgical specialty societies. So programs like the American Heart Association's Get With The Guidelines program, and work with the American College of Surgeons and so forth.

And there wasn't a huge amount of interest outside of that. But that enabled us to build a global network to collect data. What happened was when Vioxx was withdrawn in 2003 by Merck voluntarily, that immediately caused the entire industry to need real world data on what was happening in the real world for a number of reasons.

And so that's what opened the market, and everything changed overnight, frankly. So better lucky than good, as they say. That's how I got into it.

Dr. Karim Galil: So that specific event was a turning point for the industry. In the early 2000s there wasn't really wide adoption of EMRs. How were you even able to track patients in a real-world setting?

Richard Gliklich, OM1: Yeah. So back then we had to set up, what's now called electronic data capture for registries and post-marketing surveillance. So just similar to the way EDC is used today for clinical trials, we had created our own system back then to do it in the real world and set that up. The EMRs were not prevalent.

We actually created an EMR system at one point. Had a few thousand users of our own EMR system to try to enable it to happen more fluidly, but EMRs weren't supported significantly until 2007, 2008. So that part of the business didn't do well. So we actually abandoned that, foolishly, and now 2008, everything over the next five years starts moving towards EMRs.

So now clinicians don't want to double-enter information, both in the EMR and into somebody's EDC system. So we must work from the EMR if we're gonna maintain the research infrastructure, I believe.

Dr. Karim Galil: That comes with the complexity, as you said, that the nuances are mostly captured in a very unstructured way. It's not something that a computer can analyze. It's not something that you can plug into SAS or SQL. What are the options today? How can someone get around that?

Richard Gliklich, OM1: Meaning getting data from the electronic medical record?

Dr. Karim Galil: Like capturing the nuance out of a non-computer-readable format. If you have, say, a study that has like 5,000 patients, that translates into a few thousand PDFs, a few thousand doctor notes. How can someone glean relevant information out of that?

Richard Gliklich, OM1: Yeah. So there's really just a few ways that it's done today. So one way is you put a nurse or other clinical abstractors to review information and infer from that and fill out a case report form on an electronic data capture system. That has some utility, and you can have more than one do that and measure their inter-rater reliability. Another way is to force parallel entry, which would be the more standard way it was typically done. But sites are generally gonna revolt against that over time. Meaning you're collecting EDC by having people reenter information in that way.

A third is templated EMRs, where a certain amount of structure is put into an EMR to be able to capture structured information, but clinicians still don't like to enter it in that way. They don't like to click, they'd rather talk. And the fourth is collecting data and trying to use language processing to pull information from the unstructured text to make it into structured variables.
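
Editor's illustration: the inter-rater reliability check mentioned for the first approach is often summarized with Cohen's kappa, which corrects the raw agreement between two abstractors for the agreement expected by chance. The following is a minimal sketch only; the function name, the abstractor variables, and the chart labels are hypothetical and are not drawn from OM1's or anyone else's actual process.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical example: two abstractors record the same variable from 8 charts.
abstractor_1 = ["NSCLC", "NSCLC", "SCLC", "NSCLC", "SCLC", "NSCLC", "SCLC", "NSCLC"]
abstractor_2 = ["NSCLC", "SCLC", "SCLC", "NSCLC", "SCLC", "NSCLC", "NSCLC", "NSCLC"]

kappa = cohens_kappa(abstractor_1, abstractor_2)
print(round(kappa, 3))  # 0.467
```

Here the abstractors agree on 6 of 8 charts (75%), but because chance alone would produce about 53% agreement with these label frequencies, kappa comes out at roughly 0.47, a more conservative picture than the raw agreement rate.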

Dr. Karim Galil: One of the biggest questions for me about outcome research is whether you're using AI or human abstraction or whatever tool to extract data, you have to approach the problem knowing what data to capture to begin with. What endpoints, or what data parameters are of interest for that.

Yet I find it hard to reconcile that with the fact that we don't know what we don't know, right? We don't know what we should be capturing to begin with. How can someone reconcile these two concepts: approaching an experiment with curiosity, and at the same time being bounded by the idea that you have to abstract, that you have to pick some points to extract and disregard others?

Richard Gliklich, OM1: Yeah, that's a great question because there's some subtleties. I do think that you need to intentionally know what you're trying to capture for the purposes of a study. Meaning when you look at information, trying to extract it, whether that’s by curation or AI to pull information out. However, using AI directly allows for a tremendous opportunity for hypothesis generation, with more unsupervised techniques.

And we have found that to be extremely valuable in finding correlations and things one might not have otherwise considered. But I think just like any other study, there's those studies that are hypothesis generating, and then you want to move towards a hypothesis testing type of study to confirm it.

Same thing when we're looking at unstructured information: if we're trying to learn something about the data, we will let the data tell us what it can tell us. But then we study the results in a more hypothesis-testing way once we've done the hypothesis generation.

So both can coexist, but they're different mindsets from the start of those evaluations.

Dr. Karim Galil: You have a company that generates outcome research, but you're also a physician. Where is the gap between the bedside of the patient and where the industry is? In other words, are you seeing any time soon a world where a physician is only prescribing a treatment plan based on the aggregate wisdom of all the outcome research that has been out there? Or is it still going to be more subjective and dependent on his own experience?

Richard Gliklich, OM1: I think it's changing very rapidly. The biggest barrier is identifying when, say, FDA approval is needed for something being software as a medical device. We have programs now that will help assist a clinician or a patient with decision making. I won't name the institution and the subject because it hasn’t been published yet, but we just had an academic institution complete a randomized trial using the output from a set of models that provide sort of personalized predictions in the clinical setting to assist informed consent, which not only improved the process of that consultation but actually improved the patients' outcomes in a randomized trial. So I think the opportunity is tremendous to bring it to the bedside.

There are some barriers and regulatory questions that need to be addressed and so forth. But I think as soon as we hit the tipping point, which maybe is in three to five years, most of our clinical decisions are going to be assisted with personalized information based on big data, real world data, and AI.

Dr. Karim Galil: Wow. Three to five years, I expected something like 10 to 30 years. So you're seeing that it is that fast. The movement is that accelerated right now.

Richard Gliklich, OM1: It's not yet, but within three to five years, studies like this one will start demonstrating to people that the standard of care can be changed by personalization. That's when we'll start hitting the tipping point. I mean, your team is heavily involved in oncology. In oncology, personalization has changed everything in the last 15 years. But outside of oncology, in areas that we work in tremendously, like immunology, rheumatology, and cardiovascular, you can't separate the DNA of the disease from the DNA of the patient.

So it's a tougher math problem. And so you need more data, but ultimately the personalization information that we'll get from those data will be equal to what we're currently able to do in cancer. And we'll change those diseases, it will change their treatment, it will change clinical interactions.

And I think that it's happening at quantum speed.

Dr. Karim Galil: It takes around 10 to 20 years to get a physician to have enough experience. I think the real world data and outcomes research is going to get that way faster. Because now with the click of a button, you can get access to what happened worldwide with patients who had the same kind of phenotypic or even genomic profile of the patient that's sitting in front of you.

But the bigger question is, a physician almost has like 10 minutes with a patient. How fast is this data going to be delivered? Is it going to be delivered in a sense of, he's putting the data in the EMR, he's getting a recommendation? Or is it more like, what we see in oncology where you gather every Monday, you have the board, then you start discussing your patients and do this kind of matchmaking?

Richard Gliklich, OM1: I think decision-making only needs to be accelerated when there's a life and death reason it needs to be accelerated. So what we see in the clinic, when AI-based decision making is brought to the clinic, is that they do exactly what you just said. In the spine work that we do, which is another area that we're in, they'll meet as a council and review each patient and say, this one or that one has a particular reason that we need to do something different. And it needs to work in the clinical workflow.

That's why it'll take a few years for it to hit the tipping point, not because there is not already the ability to generate some incredibly valuable predictive information.

Dr. Karim Galil: I'm a big fan of OM1. Every time I talk to you, you guys are touching all aspects of the business. I didn't even know that you're also in the business of helping physicians. That's very, very intriguing. So what's 2025 looking like for clinical research? Like in the next five years, what's it going to look like?

Richard Gliklich, OM1: A lot more modeling and real world data. A lot more automation, meaning making research much more of a technological effort than a human effort. I mean, I think that's going to be the expectation of centers. COVID is proving that you don't always have humans available to do the research work other than non-COVID research. And our automation, for example, that we have in place to bring data in from centers and to process it and so forth enables us to continue to do work, even though the clinical research infrastructures have slowed down. So that's a good learning to keep in mind. I think there'll be more acceptance of real world data and real-world research, and its importance, by the FDA and industry in appropriate places. But also, as we just talked about, more research on what we call implementation science: how do we bring the data from bench to bedside as quickly as possible?

Dr. Karim Galil: If you can Zoom call any living person today, who would it be and why?

Richard Gliklich, OM1: Well, one question I would want answered, and it has nothing really to do with real world data: I’d probably want to Zoom the Dalai Lama and ask him if we've just taken a side path off of our karma, or if we're going to get back onto some good karma, 'cause we need some good karma as a planet right now.

Dr. Karim Galil: It's crazy what's happening out there. Have you seen what happened in Lebanon today, in Beirut?

Richard Gliklich, OM1: I saw the videos, unbelievably horrible.

Dr. Karim Galil: It's like a nuclear bomb or something. It's crazy. I couldn't even believe it. Yeah. I hope the best for the world. This 2020 has been a very rough year, obviously for everyone.

Hey Richard, thank you so much, this has been incredible. Thank you so much for your take on different aspects of outcomes research and the history of outcomes research, and all the good luck for OM1. I root for you guys. Thank you so much.

Richard Gliklich, OM1: You too. And thank you, we’re excited to be partners with you.

Dr. Karim Galil: I appreciate it. Thank you. Bye bye.

PODCAST — 40 minutes

Jacob LaPorte, Novartis on Patientless Podcast #002

Our guest is Jacob LaPorte, Co-Founder & Global Head of The BIOME by Novartis | Co-Founder/Advisor of digital health start-ups.


Jacob LaPorte, Ph.D. Co-Founder & Global Head of The BIOME by Novartis

Our guest on Patientless Podcast #002

Dr. Karim Galil: Welcome to the Patientless Podcast. We discuss the good, the bad, and the ugly about real world data and AI in clinical research. This is your host, Karim Galil, Co-Founder and CEO of Mendel AI.

I invite key thought leaders across the broad spectrum of believers and dissenters of AI to share their experiences with actual AI and real world data initiatives.

Dr. Karim Galil: Hi everyone, on today’s show we have a very special guest. The first time I heard the concept of patientless trials was actually during one of our interactions. We were introduced to the digital team at Novartis, and I was basically explaining what products we were building, and then this guy goes, “Oh, you guys are building patientless trials.” That’s a very interesting term. It kind of inspired us and also significantly helped us in shaping our strategy. There are very few people I’ve met in my life, that I can recall, who have inspired me in 30 minutes, and he is definitely one of them. After getting his PhD from Harvard University, he started his career at McKinsey, helping Fortune 500 companies shape their R&D strategy. He was also a fellow at the Howard Hughes Medical Institute. After his time at McKinsey, he entered the CRO world and was the Chief of Staff to the Chief Medical Officer at PPD. Then right after that, he joined Novartis, and his main objective there was to assist them in figuring out their business strategy when it comes to digital transformation. An entrepreneur at heart, he co-founded BIOME at Novartis, which is very interesting because you don’t get to see a lot of people holding the title Co-Founder in the pharma world, and especially in big pharma. Our guest today is Jacob LaPorte. Jacob, welcome to our show!

Jacob LaPorte, Ph.D.: Yeah, thanks so much. It's great to be here. And it's great to hear that I might've said something that ultimately inspired you, so that's really kind of you.

Dr. Karim Galil: Yeah. If you’ve ever been to a trade show, when you see our booth, we usually have “patientless trials” in a big, bold display. So, just as a disclaimer, we haven’t paid Jacob anything for the rights to use the term patientless trials. But Jacob, again, thank you so much for being on today’s show. I believe that today we’re going to have a mind-blowing discussion, because Jacob is an individual who not only has the foresight for what innovation can bring in the next 20 years, but is also pragmatic enough to understand what we can and cannot accomplish today. Maybe before we get started, why don’t you let us know a bit more about BIOME? What is BIOME?

Jacob LaPorte, Ph.D.: Yeah, sure. Well, in the simplest metaphor that we use, we kind of see it as an on-ramp for helping great external partners work with Novartis teams to co-create novel digital solutions. So just like in everyday life, it would be very hard for a car to get on the highway from a stop sign, no matter what car you drive. Similarly, for external partners that we work with at Novartis, it's very hard for them to get up and running and fit into the traffic flow of our highway without a better on-ramp system. So the BIOME is really looking at those classic challenges that exist at the interface of a big pharma company with the external digital health and tech ecosystems. And it's asking the question: how can we build tools and processes and approaches that help Novartis teams find the right partner in this very complex ecosystem? And when they do find the right partner, how can they start to work together in a better way to more effectively co-create digital solutions? So that's really what we're all about.

Dr. Karim Galil: Is it a safe assumption to say that you guys are like the pharma version of a Silicon Valley incubator/accelerator?

Jacob LaPorte, Ph.D.: It's funny that you say that, because I think a lot of people want to draw that analogy to something that they're used to, but we don't have a program in place that is very similar to an incubator or an accelerator, as you would say, in a classical sense. What we do do is help external partners really kind of onboard into Novartis and translate the technology that they've developed into a context and an environment for Novartis, which ultimately helps them scale their solution into a major pharmaceutical company. So we have things, for instance, that really look to support our entrepreneurs and our external partners. We augment them with internal expertise that we have, where relevant, and we are thinking about tools that help them develop their solutions. Like one thing that we're developing right now is called a data sandbox, where if we have data in Novartis that could be useful, we might be able to anonymize that data or create synthetic datasets and upload them to an environment to allow these external partners to operate on that data and maybe evolve their algorithms. So we're thinking a lot about how to support our partners in developing these digital solutions.

Dr. Karim Galil: Are they free to work with other pharma companies if they are part of the BIOME project?

Jacob LaPorte, Ph.D.: Yeah, a hundred percent. This is a pure open innovation model. We think that it's actually, frankly, advantageous for our partners to work with other companies, because it really helps spread the risk and also spread the opportunities to scale a particular solution. So we don't necessarily see ourselves as the natural owner of a lot of these solutions; we'd rather use them to augment our ability to get our medicines to patients faster and more effectively, to extend and improve their lives. So yeah, it's a completely open innovation system.

Dr. Karim Galil: You guys started around two years ago, and now have a multitude of success stories. One that I find very intriguing happened overseas in Ghana. Would you like to talk about that?

Jacob LaPorte, Ph.D.: Yeah, I think it's a great illustration of what the BIOME process can do. So I think what you're referring to is a story we came out with actually a couple of weeks ago, where we were working with our global health organization at Novartis, which already has an ongoing initiative around increasing access to medicines in sub-Saharan Africa for sickle cell disease patients. But one of the classic problems, of course, with that disease is the loss to follow-up that you experience in healthcare populations that are distributed and may not have the best healthcare infrastructure, because you need to diagnose sickle cell right now with a blood laboratory study, right? So what happens is you can get out into the population and collect their blood samples, but once you run the laboratory test or diagnostic, you've often lost that patient to follow-up. So they actually don't get the diagnosis and the medicines that they need. So one of the things that the BIOME did to help out this initiative is we started looking for diagnostics that could be delivered at the point of care to help cut down that problem of loss to follow-up. And we wound up finding an amazing company in Portland, Oregon called Hemex Health, which has a fantastic, cheap, portable, point-of-care diagnostic for sickle cell disease and malaria. And so this is an amazing story where a technology being developed in Portland, Oregon was able to plug into an initiative in sub-Saharan Africa, and now we're starting to see some of the exciting fruit of that labor, where we're able to diagnose patients and get them on lifesaving medicines a lot faster. So this is an example where, you know, the BIOME also thinks about how to support its partners, like I was mentioning before. Hemex Health had a great technology. We knew that it worked, but it wasn't actually approved on the market in Ghana to be incorporated into this initiative.
So we actually augmented their team with regulatory support from Novartis, and I'd love to thank my associates who really stepped in, in a big way, to help out in regulatory. And we were able to accelerate their approval onto the Ghana market, and therefore incorporate them into this initiative. So it's a fantastic story about how we can more quickly translate these new technologies into our existing business to basically improve and extend patients' lives. It's really touching, because when you think about sickle cell disease, it has such a large burden, particularly in sub-Saharan Africa. Many of the patients that have sickle cell don't live beyond their sixth birthday, and it's just amazing to think that we might have the technology here that gets these patients on medications faster and therefore, you know, really extends and improves their quality of life. So it's a great story.

Dr. Karim Galil: It is a great story, and it’s a great example of the kind of corporate responsibility that big pharma should take on by breaking these borders. To add, you guys are doing a great job. I like the vision that you have now that has transformed the company from being a “pharma” company to becoming more of a “pharma and data” company. So you guys are leading that digital transformation in the pharma industry. Talking about digital transformation, Jake, how would you define “Patientless Trials”? It's definitely one of those loosely defined terms, and many people tend to think that “Patientless Trials” are against the patient. You had made the argument that the best thing we can do for patients is “Patientless Trials”, so how would you define a “Patientless Trial”?

Jacob LaPorte, Ph.D.: Defining is always so difficult, right? To me, I guess I would say that patientless trials are a subsegment of clinical trial simulation. So really what we're trying to do is simulate an outcome of a clinical trial using existing data, and therefore reduce the need to actually use patients in a study to test the medicine and determine whether it's safe or effective, which is the best tool that we have today, right? But if you think about it, what clinical trials are doing is testing these medicines on patients. So if we could somehow understand the outcomes using existing data and simulations, without putting these patients into trials, I would argue that's one of the more patient-centric approaches to clinical trials that one could imagine, right? So what you're doing is essentially, you know, giving them medicines that you know are already safe and effective, without involving them in trials. I mean, I can't imagine a better approach, really.

Dr. Karim Galil: So you’ve mentioned this very interesting differentiation that I’d like to touch on, which is essentially data and simulations. We’ve chatted a lot about organoids, and you have this very interesting article on LinkedIn touching on “Organ-on-a-Chip”. So we’re all here talking about real world data, which is one part of it, but you’re also talking about the next level. Could you possibly explain to us what you mean by “data”, what you mean by “simulations”, and also what is an “Organ-on-a-Chip”?

Jacob LaPorte, Ph.D.: Yeah. So, yeah, I wrote an article on LinkedIn called “Why humanless trials could be the pharmaceutical industry's nirvana.” Right? And I published it quite some time ago, so it's nice that you kind of referred back to that. And so what I was looking at in that article was, you know, the concept of humanless or patientless trials. And as we just talked about, to me, that really means simulating a trial outcome, you know, with existing data, without requiring patients to be in a trial. And so where are we at today? The key here is that you need to develop an accurate simulation of a trial. And with the advent of machine learning methodologies in AI, we're getting to the point where we can create very sophisticated models or simulations of complex systems. But you need to have good data to train these models. That's often the part that people leave out, right? We often talk about AI and the power of AI and machine learning, and it can seemingly do all these amazing things. But you need to have structured data, and the right data, to actually train these models and make sure that what they're predicting is something that's accurate and representative of a complex system. So the issue, and what I believe is one of the grand challenges that we need to solve in order to unlock the power of AI in healthcare right now, is the fact that a lot of this healthcare data is very fragmented. It's all over the place in different systems. There's a lack of data standards. There's no universal ontology that helps you knit these data sets together. And so you don't have a very complete picture of a person's health over time, let alone a population's health over time. So when you start to talk about simulating trials and outcomes, you really need to have that very nuanced picture of how people's health evolves over time based on various different stimuli.
And so, what you were talking about with organ-on-a-chip: what I was thinking about in the article was, how does one approach this from a different angle? And so I was sort of asking the question, what if you could bypass that very large and sticky problem of trying to knit together all these datasets, and instead generate very well structured and very representative data from technologies like organ-on-a-chip? These are generally called micro-physiological systems. I don't know if you've ever seen them, but organs on a chip are microfluidic systems that, when they're honed in, can do a great job of emulating, you know, the organs and their functionality. And then there are organoids, which are like 3D biology, which is growing, you know, different types of cells together in a way that emulates the physiology of an organ. And so the question is, can you start to interrogate those systems and generate data? And by the way, you can start to make this genetically diverse, right? So you can start to think of populations of organs-on-a-chip or patients-on-a-chip, and so can you then use that data to create or train these machine learning algorithms that will potentially better simulate the outcomes of trials? So it's a long way away. And I think, you know, some people will be listening to this podcast and automatically say, well, how do you correct for environmental effects, which we know are so significant to health outcomes, right? And I would say, well, you know, at the very least, if you start to think about these genetically diverse populations on chips, you can start to at least get to more sophisticated hypotheses around subpopulations and inclusion/exclusion criteria, and design more effective trials.
But then over time, as we start to learn a little bit more about the connections of environmental effects with health outcomes, you can actually start to weave that data in as well and actually get to a lot better simulations. So that's sort of what I was thinking. Yeah.

Dr. Karim Galil: So you’re basically talking about augmenting existing traditional randomized controlled trials with outside source data, rather than replacing them. I believe that a lot of the pushback we get about the concept of patientless trials is the preconceived notion that it's “randomized OR patientless” as opposed to, as you have explained, “randomized AND patientless”. It’s very interesting because what you are talking about is both randomized and patientless trials augmenting one another by supplementing data, is that correct?

Jacob LaPorte, Ph.D.: That's exactly right. I mean, at least in the near term, that's really the only way I see it. I don't think it's realistic to think that, at this point, we could truly replace randomized controlled trials with simulations. But that being said, you can start to think about how you hone your hypotheses around clinical trial design, really getting into, you know, better subpopulation analysis or even categorization upfront. So I think these two things can be used in combination to be much more effective. Yeah.

Dr. Karim Galil: So, we all know the stats: $2.7 billion and five to ten years to develop a drug. Do you think this kind of approach can accelerate this process, and what would that change look like? Are we talking about going from $2.7B to $2.5B, or are we talking about a significant decrease in both cost and time for drug development?

Jacob LaPorte, Ph.D.: Yeah, I think in terms of the impact, it will probably evolve over time, given, you know, our capabilities and the sophistication with which we're able to establish these models and simulations. Right. So right now we're already starting to see an impact. I would argue that there are some elements of patientless trials already being adopted in the industry, right? We talk a lot about virtual control arms, and we're starting to see them being effectively deployed a lot in oncology trials right now. So obviously there's already an impact where you don't need to stand up an entire control arm of a study, giving them an existing treatment where you more or less should already know the outcomes, right? So we're starting to see that impact. As we get better at simulating, for instance as we get better at using these simulations to design better subpopulation categorizations and get more targeted on trials, I think you can start to see a pretty significant impact. I don't think we're talking about an incremental 10% here. I think you could really move the needle and bend the cost-time curve of drug development fairly significantly. I mean, you can imagine getting to a point where you're using these organs-on-a-chip to really get rid of some of the things that still occur in trials, like toxicity issues. If you can model that out better through simulations and just stop doing some of these trials that aren't going to work out anyway, because the animal models aren't telling us the right answer, that could have a significant impact on the portfolio. So I'm pretty optimistic going forward that this type of approach will start to have more and more of an impact on the way we develop drugs.

Dr. Karim Galil: So Organs-on-Chips, are these science fiction? Or have you seen some projects already out there that have at least developed a proof of concept?

Jacob LaPorte, Ph.D.: They actually exist. I've seen them in real life. So a part of my journey into humanless trials was actually meeting some of the folks that work on organs-on-a-chip. I'm thinking in particular about this company called Emulate Bio, and there are some great folks that I've been talking to there for multiple years now. There are a lot of other companies, by the way, but I'm just more familiar with Emulate and their products. They really do create these microfluidic systems, based on a lot of work that that team did at the Wyss Institute at Harvard showing that these microfluidic systems can in fact emulate human physiology, thus the name. And so they have products like lung on a chip and, I think, liver on a chip, which in a lot of instances do a much better job of predicting things than, say, the traditional animal models in those diseases.

Dr. Karim Galil: We need a lung chip for those vaccines that we’re all trying to chase for COVID. Speaking about COVID-19, do you think that COVID-19 made the concept of patientless trials more of a “painkiller” as opposed to a “vitamin”? Or are we still at the “vitamin” stage?

Jacob LaPorte, Ph.D.: I'm going to go with a vitamin, I guess. Because I think that, you know, painkiller suggests to me that you're sort of masking the problem by treating the symptom. And I think that COVID-19 has opened up the gateways for us to more rapidly experiment with technologies and trials, thereby increasing the likelihood of finding solutions that are going to have a meaningful impact on the future paradigm of development. So although, you know, COVID-19 is first and foremost a human tragedy, given some of the challenges that it's posed to the industry and the healthcare system at large, I think we're going to see the reverberating effects of the sort of technology experimentation that we're doing today going forward. So even though this is a terrible experience, I'd like to think of it more as a vitamin, really treating some of the root causes of the challenges that we've historically faced.

Dr. Karim Galil: You touched on that a bit when you mentioned that machine learning is great, but it is only as great as the data we are feeding it. In healthcare there are a lot of issues with data. You’ve touched on some of them, but I’d like to hear some of the data problems that you’re currently recognizing, and also any kind of advice or framework that you could give to help others evaluate the data they are feeding into their models.

Jacob LaPorte, Ph.D.: Yeah. Sure. So whenever we start to talk about healthcare data and the power of machine learning, potentially leading to a future of more personalized medicine or more targeted medicines, we always tend to reflect on what we've been able to do with genomic sequencing and proteomics, and in fact, we've come a long way. Like, I remember starting out as a scientist. I was actually a molecular biologist, and it was right around the time that the Human Genome Project was going on, and I just reflect on how far we've come from that to now be doing whole genome sequencing for roughly around that thousand-dollar mark that we all said it would take to really come into the mainstream. So it's just amazing. But what we often overlook is that it's extremely important to also collect nuanced data longitudinally on the healthcare outcomes of people over time, so that we can relate the genomic sequences and the proteomic signatures to those healthcare outcomes, because I think that's part of the challenge. We don't exactly know what those sequences mean right now in terms of health outcomes and people's susceptibility to disease or their response to medicines. So one of the biggest gaps that I see in this space is really that longitudinal healthcare outcomes data not even existing in the first place. And when it does exist, it often exists in various different places and in different EMR systems that don't necessarily talk to each other. And then even when they do talk to each other, there might not be a standard ontology that you can really use to relate these different healthcare data together. So I think that's kind of one of the biggest challenges going forward, right, is that healthcare outcomes piece, that longitudinal piece. And then being able to relate that back to genomic sequences and proteomic signatures, and what that means for health.
And what that means for being able to predict health going forward for younger populations, where you do know their genomic sequence and their proteomic signatures, and then start to predict: what is the likelihood that they're going to have a disease? Or what is the likelihood that they're going to be able to respond to a medication in a meaningful way?

Dr. Karim Galil: The industry, in general, is very familiar with the concept of claims data, the concept of structured data that is used for billing. How good is the claims data? Is it representative of a patient journey, or do you believe that we have to go back to the “old school” unstructured doctor notes, pathology reports, and faxes? Is there one good answer here or does it really depend on the therapeutic area?‍

Jacob LaPorte, Ph.D.: Yeah. I mean, look, claims data is great for certain things. So I don't want to come across as dismissing the power of claims data, and what it could mean to even stitch that together in a more meaningful way. But I think when you start really talking about these more sophisticated simulations and prediction models around disease etiology and disease progression, right, and you're trying to predict disease progression or medication response, you're going to need a much better understanding of how a person's intrinsic makeup, like their omics, relates to their healthcare outcomes and how environmental impacts also affect that. And so I think you're going to need this nuanced data that, you know, to some extent sits in physician notes today, but I think to a large extent isn't, in a lot of societies, being collected holistically, right? Because when you think about our healthcare experience in the U.S., what's going on? Well, most people see a physician once a year, unless you start to really have a problem. And then you're seeing a physician more frequently and you're getting more data collected about you. But even that data tends to be fragmented, with images that are stored in one database that don't really relate to the physician's notes in another database. So I think we've got to fix that, and we've got to get a much more nuanced picture of a person's health over time to really make these more sophisticated predictions.

Dr. Karim Galil: There are hundreds of vendors out there promising machine learning and AI, Mendel is actually one of them. There are also hundreds of vendors offering data assets with everyone claiming they have the best technology, best data, the most longitudinal, the most comprehensive. With so many companies like these trying to sign or work with a top pharma company like Novartis, how efficient is your gatekeeping strategy in assessing a vendor’s value? Every week we hear about a new startup, a new VC funding, or a new company with a very creative solution and I’m just curious to understand how you differentiate between promising projects and ones that might need a bit more work. ‍

Jacob LaPorte, Ph.D.: Well, first of all, I'm always inspired by the tremendous entrepreneurs out there really trying to take on these healthcare issues in new and different ways. And so I think what we're seeing is just a product of so many opportunities out there really begetting a number of different startups and others approaching these problems in new ways. And I would never want to get rid of that at all. I think the question we're asking ourselves at Novartis is how do we make it a better experience for those partners to interact with us and make sure that the right people in Novartis are talking to the right people in the external ecosystem and that they have the right support system together to really create a novel digital solution. That's more of the kind of question that we're asking ourselves. Sometimes, within that, we'll present different tests to an external partner to really make sure that they have the right fit for the specific problem that we're trying to solve, because they might have a great technology, but it just might not be the right fit for what we're looking to do. So we want to definitely put that in place to make sure again that the right conversations are happening at the right times and that no one feels like this is a waste of anyone's time. So we've been thinking a lot about that at Novartis. And so one of the things that we're thinking about is how do we actually meet people virtually right off the bat. And we're creating this product called the digital brain, where we'll be able to have people anywhere in the world that are creating their next big idea and their next company upload their profile into this system. You might even, for instance, be able to sign a one-click NDA and start to get access to some of the problems that we're trying to solve. We might be able to ask you some questions online and filter whether or not that problem is relevant for you and therefore route you more directly to the right person in the company to talk to. And we just think that's going to be a lot better experience for everybody involved. So that's something we're actively working on today.

Dr. Karim Galil: Wow. I would assume that a one-click NDA in the pharma world must be harder than creating an Organ-on-a-Chip. A one-click NDA is pretty hard, with the process sometimes taking thirty days today just to get the paperwork done! So the option for a founder to just click a button and get access to what kind of problems they need to start working on, that is very impressive. ‍

Jacob LaPorte, Ph.D.: Yeah, as you indicated, it's definitely a journey, right? It isn't easy to get to the next level of becoming a pharmaceutical company powered by data and digital. So we're obviously approaching this from multiple different dimensions, and one of them, as you sort of alluded to, is culture and talent. So we've been able to really ramp up the talent in our company that has an expertise in data and digital. We've hired some phenomenal people from some of the best institutions and companies out there. And that's just fantastic to see. And as a result, we're also changing the culture as well, and, you know, frankly, the Biome is one element of that, right? How do we fluidly interact and co-create with this powerful external ecosystem of digital health and tech companies? Answering that question is going to be really important to our culture and how we transform. So we've been thinking a lot about that, and I think you'll start to see some really meaningful evolution as a result of those efforts.

Dr. Karim Galil: It’s very interesting that you’ve touched on talent. My co-founder comes from outside of healthcare. Once he finished his PhD, all of his job offers were from top tech companies such as Amazon and Google. Something he always reminds me about are the challenges an individual with a CS background faces when coming into an industry like healthcare. For starters, there is a lot of domain experience within healthcare that is not easy to just “pick up” right off the bat. Another challenge is the lack of centralized data that many healthcare companies provide to AI scientists, as opposed to Google or Amazon having 20 years of aggregated and cleaned data making it easier to gain insights from data. What I would like to know is how you guys are convincing individuals from outside healthcare, that have the technical expertise to endure these challenges, to come work for a pharma company? How are you competing against big tech companies like Google? ‍

Jacob LaPorte, Ph.D.: Yeah, absolutely. I think it comes down to a couple of things in my mind. So on the point of how do we attract talent? I mean, one thing is that, as you mentioned, these are hard problems. And I think a lot of people want to solve hard problems, right? So, I mean, I know that's what attracted me into science and has driven me through a lot of my career: looking at how do I make an impact? And usually when you make an impact, it's not solving an easy problem, right? Otherwise it would have been solved. So I think that's one element. I think the other element is really looking at how you can influence and improve health, right? I mean, it's a very important problem that touches a lot of people. And so I think what we're seeing is a lot of these folks are passionate. Yes, they have a technical background, they're great data scientists, but they want to make an impact in healthcare because they know that this is fundamentally a very important thing for our society. So that combination of having a hard problem that at the end of the day is going to create meaningful societal value, I think, is half of the value proposition. The other half, at least for Novartis, is our global scale. So if you want to make an impact, come to Novartis, because once you solve that problem in one area, we're going to look at how we replicate that across the world and really touch people. I mean, we're operating in over 90 countries and really interacting with all those health systems. So I think that also attracts a lot of people to our company as well. But as you mentioned, there are a lot of people starting to ramp up their talent pools in this area as well. A lot of healthcare companies, and Google and Facebook and all those folks, are trying to get more and more into the healthcare space. So we're not taking anything for granted at Novartis. We're continuing to think about how we sharpen our value proposition for people. How do we continue to tackle these very challenging healthcare problems that we know will inspire others to come and join us in our mission?

Dr. Karim Galil: You guys now have a dedicated AI team within Novartis’s organization, right? ‍

Jacob LaPorte, Ph.D.: Yeah, that's right. I mean, at some point in time, it was very clear to us that AI, although it's a technology paradigm, was going to be really fundamental to a lot of things that we're trying to do. So we're very interested in creating that backbone and that platform that will allow us to deploy that technology paradigm into various different solutions more effectively. So yeah, we do have a dedicated organization.

Dr. Karim Galil: Reading your blog posts and talking to you, I feel like one of the individuals who really inspired you is David Eddy. You had mentioned him in one of your blog posts, would you like to talk about Eddy? He really built an amazing team. Actually, now at Mendel, we are looking for people that work at Archimedes because these individuals were on the forefront of the whole concept of patientless trials when it was not popular, which I believe to be very courageous and brave back at the time. ‍

Jacob LaPorte, Ph.D.: Yeah. I mean, I don't really know David that well, to be honest, but the little bit I know of his work is just fascinating, right? I mean, this is a guy that just has so many different talents. He is a physician, yet also a scientist who did some amazing, incredible mathematics. And, you know, I think a lot of people look at him as one of the fathers of clinical trial simulations. He developed this program called Archimedes, which was able to reproduce a large outcome study that, I believe, the national healthcare system in the UK had run around one of the statins. And he was able to reproduce that using these clinical trial simulations. And I think that led to this whole discipline around clinical trial simulations being created, more or less. And it has been one of the major influences on us even talking about the reality of humanless and patientless trials in the future. So I always get a kick out of people that seem to have so many different talents and are able to pull them together to create new and fascinating solutions. So, you know, David, if you ever hear this podcast, hats off to you. I mean, you've just had an amazing career.

Dr. Karim Galil: So how do you think 2025 is going to look with regard to clinical research?

Jacob LaPorte, Ph.D.: Yeah. So I think the probability is high that some institution or some company will ultimately create a competitive advantage in some part of the value chain, and it will force others to catch up quickly as a result. So, therefore, I actually think 2025 is going to look a lot more tech-enabled than what I probably expected at the beginning of the year, as a result of this pandemic. And so I think what you'll find is different people are going to innovate in different parts of the value chain, but it's going to force everyone to rise to that new expectation. And so I think it's going to catalyze a faster result.

Dr. Karim Galil: My last question, if you can zoom call any living person today, who would it be and why?‍

Jacob LaPorte, Ph.D.: I guess it would probably be Ray Kurzweil. And for people that don't know Ray, I certainly don't, so I guess that's why I wish I would be able to zoom call him. But, again, another amazing scientist. He was one of the foundational scientists around voice recognition, using AI to do that at MIT. He's now, I believe, still head of engineering at Google, but beyond that he's written several popular science books around a concept called singularity, right. And the whole concept of singularity, I might butcher this a bit, so you'll have to forgive me, Ray, but it's a point in time at which we're going to be able to create human-like knowledge with AI. So essentially really passing the Turing test in all earnestness, and at that point, AI is going to be able to create all these fascinating technologies, and it's anyone's guess as to how the future will evolve from there. But I remember reading one of his books when I was traveling for an extended period of time after I left my consulting role at McKinsey & Company. And that really inspired me. It seems cheesy, but it really did inspire me to go down this journey of thinking about how to digitize the R&D engine of the pharmaceutical industry. Which ultimately led me to Novartis, it ultimately led me to think about patientless trials, and it led me to my current role, which I absolutely love doing. So he's had a tremendous influence on my career and probably doesn't even know it.

Dr. Karim Galil: Ray, if you can hear this podcast, please zoom call Jake. His book was actually translated into nine languages and it's one of the best-selling books on Amazon, so great choice, Jake. I'm just wondering if you’re going to talk to him or to his AI version? This is a guy who may have an AI version of himself. People that smart can build a “Ray-On-Chip”, if there is a need. Hey Jake, thank you so much for taking the time to do this podcast. As always, it's been very inspiring. I’m sure our audience is going to find this to be really, really cool. Everyone, reach out to Jake on LinkedIn, read his blog posts. They’re very inspiring and interesting. But again, thank you, stay safe, and hope to have you on our show another time.

Jacob LaPorte, Ph.D.: Yeah, no, absolutely. My pleasure. I'd love to come back and thanks so much for having me. I really appreciate it.

PODCAST — 44 minutes

Brent Clough, CEO of Trio Health on Patientless Podcast #001

Welcome to the first episode of Mendel's Patientless Podcast. Our first guest is Brent Clough, CEO of Trio Health. Brent started his career in the financial sector. He was a VP at Goldman Sachs, then founded a very interesting company in healthcare: IntrinsiQ, which pretty much built the largest longitudinal patient database at the time and was later acquired. Brent is now a co-founder of Trio.

Brent Clough, CEO of Trio Health

Our guest on Patientless Podcast #001

Dr. Karim Galil: Welcome to this episode of Patientless Podcast. Today's guest is Brent Clough, founder and CEO of Trio Health. Thanks for being with us on the show, Brent.

Brent Clough, CEO of Trio Health: Thank you for having me. I look forward to our discussion.

About Brent Clough

Dr. Karim Galil: Brent started his career in the financial sector. He was actually a VP at Goldman Sachs. And then he founded a very interesting company in healthcare: IntrinsiQ, which pretty much built the largest longitudinal patient database at the time and was later acquired.

Brent is now a cofounder of Trio. Trio leverages real world data for commercial and clinical research excellence, and they have a very interesting model for how they are capturing data and how they're ensuring the fidelity of the real world data. Trio obviously is a very established player in a super crowded space. I'm very happy to have Brent on the show today to share with us his story of starting Trio, what is unique about Trio, and how he sees the real world data/real world evidence industry today.

Before we get started, I think my first question is what attracted you to healthcare? I mean, pretty sure it doesn't pay as much as Goldman Sachs, and it's a pretty sophisticated industry, so I would be very interested to hear what was attractive for someone like you to come into healthcare and build a couple of companies, again, in a very crowded space.

Brent Clough, CEO of Trio Health: Yeah, that's a great question. So, I did spend the first 16 years of my career in financial services and then switched over to healthcare, and have now been in healthcare since 2004. So about half my career is in financial services, the other half in healthcare.

And to answer your question specifically, some friends of mine had invested in a small oncology software and data company called IntrinsiQ, which is outside of Boston, based on a high profile overdosing death that happened at Dana-Farber. And so, the founder of IntrinsiQ was a physician and a programmer as well as an attorney and basically looked at this and said, Jesus, if this could happen at Dana-Farber, in terms of a dosing death, it must be happening in other areas of the country.

And one of his buddies was an oncologist in upstate New York. And so, they started IntrinsiQ really as a safety solution, to align with the way that oncology patients were treated and managed in the early two thousands, a lot of which was infusion. It was weight-based.

The calculations had to be very precise and specific and unfortunately in this situation, the woman that was overdosed and killed, the calculation was incorrect. It was missed by the pharmacist and physicians.

So what attracted me was what we were seeing in healthcare, in terms of just the lack of technology and sophistication that was being applied to managing these types of patients and this type of workflow, as contrasted to Wall Street, where, you know, you could sit on a trading desk and pretty much get information at your fingertips in virtually seconds.

When you contrast financial services in terms of data and analytics and technology with where the healthcare industry was 15 - 16 years ago, it was night and day. And so I actually, through some of my friends that were investors, got introduced to the CEO at the time and the founder and ended up striking a chord with them in terms of really trying to bring to bear a lot of my knowledge and context and relationships and trying to think about how it could be helpful in applying that to the healthcare industry.

And so within about 12 months of joining IntrinsiQ, I was promoted to become the CEO of the company and then ran the company for a number of years. And what we did is we sold our application to over 120 sites across the country with about 700 oncologists, both academic and private practice, for them to more safely and better manage their oncology patients.

So we had oncology nurses that would go out and train the physicians and so forth. And then in the old days, because the internet really wasn't as prevalent, we would have the sites effectively phone home once a week and transmit over the internet a de-identified file of the patients, and then our analytics and operations team would un-package that data and put together and build longitudinal records, looking at how these patients were being treated and, again, what their outcomes were.

And so it was a really novel time, in that there were a lot of drugs being developed: Erbitux, Avastin, Herceptin, a number of big blockbuster drugs were just launching in, you know, call it the 2004 - 2006 period. And so we saw really this transformation in terms of a big bolus of new drugs that were being launched into the market.

You had people like Michael Milken, who was launching the Prostate Cancer Foundation, and what he's done in terms of transforming that disease. And then you had the NCCN guidelines and Bill McGivney, who was starting to build the evidence-based pathways.

Dr. Karim Galil: There weren't even electronic medical records. They weren't widely adopted back in the early two thousands.

Brent Clough, CEO of Trio Health: Yeah, exactly. And so it was an interesting time, like I said, in terms of being at the early stage of seeing this adoption, as well as starting to deploy clinical evidence and pathways to real world patients. And so, we formed Trio Health in 2013.

And like anything in your career, you've got to learn from your mistakes, you learn from the shortcomings, and we said, jeez, if you're going to bring the band back together and do it better, how do we do it? We basically took a lot of the knowledge and things that we learned both at our company, as well as just from our peers in the industry, and created Trio Health, which is really principled around building a network in terms of having direct relationships with each of the physician practices, as well as all the additional stakeholders that touch the patient.

Trio actually stands for: physician, pharmacy and payer. So we thought of really those three stakeholders as the stakeholders that could impact the performance of a real world patient.

And so, we developed a technology platform and a business methodology for bringing together all that disparate data, so that we could have kind of a 360° view of the patient. But then what we also recognized, just from a technology platform standpoint, is the inherent deficiencies of trying to record and collect that information from EMRs and the different technology platforms that the stakeholders use, which obviously didn't mesh well together and didn't fully encapsulate all the facets of that care. So in our technology platform, we had to build a two-way communication so that we could go in and supplement, adjudicate, and validate information that we couldn't get through the nightly file.

Dr. Karim Galil: Looking at your website, you guys are talking about the fidelity of the data that you're collecting in comparison to the current business or technology methodologies in the industry. At Trio, how do you guys define how real the Real World Data is?

How do you define how good is the data that you guys are capturing compared to several other players in the industry?

Brent Clough, CEO of Trio Health: Yeah. So the way that we look at the landscape is, I think that there are really two distinct categories: There's a whole group of companies that are focused on very large databases, looking at hundreds of thousands, if not millions, of records within a specific disease area.

And then there are other companies on the other side that look more like a registry or that go much deeper, right, in terms of collecting very specific information on the patient. And so when you look at it, it looks a little bit like a barbell, in that people are either on one side of the equation or on the other side.

We felt, though, that we could use both technology as well as almost a kind of registry software concept and be a little bit in the middle. And so when we looked at our approach, depending upon the disease coverage, we figured out how we could build a database that had, for example, a hundred thousand rheumatology patients, yet was really on the other side of the barbell, which was very deep in terms of having pharmacy data, information contained from the office visit notes, to labs, to infusions, to kind of all the data that we would want.

So that's the role and the niche that we fit. We focus on really trying to leverage the value of both of those: using technology and nightly files, but then also, almost like a registry, gaining very specific information on the very specific fields that we need relative to the objectives of either the study or the disease that we're trying to understand and focus on.

Dr. Karim Galil: One of the things that is really exceptional about Trio is that you guys have this broad coverage of different therapeutic areas. You guys are working in rare diseases, you guys are working in rheumatology, you guys are working in hepatitis, and that comes with a lot of complexities. How can you train your team to be able to cover all these therapeutic areas?

How are you going to also be able to build this model where you're able to attract different providers coming from different kinds of specialties and convince them to share data with you? Can we talk more about that? I find that very intriguing about the company that you guys have built.

Brent Clough, CEO of Trio Health: Yeah, it's a great question. So Yoori, who co-founded the company with me, her background was really more on the qualitative side, in terms of amassing a very large rolodex of key opinion leaders within each specific disease area. So what's really important in that piece of the business, if you think about it, is that we start before we enter a disease, as we started in hepatitis C with the launch of the new DAAs in late 2013 and 2014. And we started really with developing a scientific steering committee of key opinion leaders across the country that were highly respected by their peers, the manufacturers, and the payers.

And we really use them as our north star. Number one: to define what data we need to collect on real world patients. Second, these were all physicians that are treating patients, so it was helpful to have real-time insights into what they were being confronted with on a day-to-day basis, as well as the evolution of the disease. The third is that we use the scientific steering committee, as I said, to go out and recruit and to build the network. We build every disease organically, one practice at a time, and we sign a business associate agreement and an MSA.

The final point is that now that we've used the scientific steering committee for their qualitative expertise, and now we get the quantitative data, we can bring those two important pieces of information together, as well as sit on top of a live and active network of physicians that are managing and treating patients.

So we have the ability to adapt very quickly to the disease, but also to be very responsive when we start to think about our output. All of the studies that we have published to date, which is in excess of 120 studies, have been authored by the scientific steering committees in collaboration with our statisticians and our analytics people.

And so when you think about the importance of what the point of the study is and what its relevancy is, you're getting an interesting perspective beyond just an RWD or RWE company, in that you're really getting the physicians that are respected and are managing and treating the patients, so that when we go to submit these studies to medical conferences, they typically are forward thinking and really focused on the specific issues that physicians and patients are confronting on almost a real-time basis.

Dr. Karim Galil: It seems to me like your scientific committee is at the foundation of your business model. It's kind of the core of the company, and you built business processes, you built technologies, you built different things, but at the very core, your scientific committee is driving the company. I find this to be attractive to a lot of providers and pharma companies, knowing that this is not a tech play, it's teamwork between clinicians and technologists.

How were you able to assemble your scientific committee? I think that's one question. The other is, what's in it for the sites to sign an MSA with you, share their data, and contribute to the registries that you guys are building?

Brent Clough, CEO of Trio Health: Yeah, I think, echoing the point that you just made, we kind of think of our scientific steering committee as a little bit of a Trojan horse. It really starts and ends with them. Number one, they give us the credibility immediately amongst their peers. Because again, they're backing this. They're supporting this.

The other piece that is important to note is that we file all of our research as investigator-sponsored research. Which means that we get sponsorship from manufacturers, but it's really an arm's-length transaction so that the authors of the study can't be influenced or tainted.

And it really is up to them at the end of the day, in terms of the methodology, the findings, and the conclusions related to that study, which I think is important because what it does is provide integrity at all levels, and that really facilitates our business model, which is why physicians want to join the network.

And in some cases, a lot of the practices join and we don't provide them with any type of honoraria or financial payments, but they're really doing it for what's in the best interest of their patients and for the best interest of care. And so we've been very fortunate in the fact that we can build very large data cohorts in terms of having diversity of academic physicians, private practice physicians, and get the geographic diversity.

Because when you look at the leadership of the scientific steering committee and their track record related to their participation in clinical trials and getting the disease state to where it is today, they want to be part of this. And I think of it a little bit in terms of giving back to the patient, as well as promoting best practices for the patients within that disease state.

Dr. Karim Galil: So you are a proxy between sponsors and clinical research sites, where you enable the sponsor to learn from the care of each patient that went through that site in a digital way, where you don't have to recruit an actual patient, talk to them, consent them. The theme of our podcast is patientless trials.

What we mean by patientless trials is not necessarily getting the patient out of the equation. Actually, the patient is always at the center of it, but rather than the patient contributing in a clinical setting, the patient is contributing in a data setting where they are basically leveraging their data.

How do you define patientless trials at Trio Health and how do you see the clinical research industry moving from a very clinical setting centric kind of an approach to more of like a digital centric approach where data is leveraged in many different ways?

Brent Clough, CEO of Trio Health: It's a great question, a complex question. Our objective is to best represent the patient by having the most comprehensive data set that provides the greatest insight. And I think what we're most proud of at Trio health is a lot of the patient advocacy work that we do.

So when we bring all that disparate data together and it's "patientless", meaning, you know, the patient's actually not involved in terms of providing supplemental information, but really catalyzing all that information on that patient. I've got two examples that I think that we've been very successful about:

One is, the new hepatitis C drugs transformed the disease. As you probably know, cure rates exceed 90%, and are greater than 95%, with a drug that you take once a day for eight weeks with no side effects. And what we found with our database, in terms of the timeliness of the updates and the pharmacy and the clinical data, is this huge disparity across the different Medicaid States in the country.

And we published a study that received a lot of awards and recognition, and it was on over 20,000 patients, where we looked across 40 different Medicaid States. And we saw that Ohio Medicaid had a 95% denial rate of patients, as contrasted with Connecticut, which had a 95% approval rating, which is crazy given the fact that we were looking at patients that were cirrhotic.

These, again, were not high risk patients in terms of patients that were stereotyped as living under the bridge or being drug users, in terms of being 'high risk'. And what we were finding were these were patients like, again, you know, one woman (and we actually published a book on our website) who was basically infected with hepatitis C based on a blood transfusion because she was bleeding out during a pregnancy. And in those days, obviously they had not screened the blood well enough, and so she was tainted with the hepatitis C strain.

And so again, what we did is we used that information, we published it, but then we also went to CMS, which manages Medicaid, and shared it with them, showing the disparity across the different states, and said, this was completely egregious in our book. We titled it Is This Really the United States of America or the United Countries of America?, because how can we be seeing this level of disparity?

The second piece, which we did in collaboration with NORD, the National Organization for Rare Disorders, which is a not-for-profit, and in collaboration with the FDA, all pro bono, is we looked at six rare diseases for which there was a diagnosis, yet there were no approved treatments.

And so we took all the data that NORD had collected in a registry for their natural history, and we ended up publishing and presenting at the respective different conferences around the world. And we also published a book around trying to create awareness to the investment community, as well as pharmaceuticals, in providing more insight in terms of these types of patients to see if there are things in the pharma portfolios that could be helpful in terms of being potential solutions.

And so again, you go back to this high quality data and this kind of patientless concept. Trio and our scientific steering committee look at it and say: "Okay, how can we give back in terms of helping to promote therapies that are going to improve the quality of care for these different patients?" Be it, first, natural history, where there really is nothing approved and we're trying to create awareness of the disease, or second, looking at unbelievably transformational drugs in hepatitis C that are still being denied in the United States with massive disparity based on different payers, both commercial payers and, in my example, the Medicaid States.

Dr. Karim Galil: This was a great example. Disparity is actually something that you cannot capture in a randomized clinical trial in an easy way. I love that example. You have also worked on rheumatology registries and you were able to collect longitudinal and comprehensive data. We at Mendel find it very hard to do what you guys are doing because:

One: Being able to be EMR agnostic is not easy today. And unless you're EMR agnostic, you have this selection bias where you are biasing your data based on a set of research sites or a set of sites that are using a specific EMR vendor, while you want to achieve this breadth of sites and you want to be EMR agnostic, and that's technically not easy.

Two: We see that a lot of the data actually exists in non-machine-readable formats. 70% of the data today are faxed. The healthcare industry is one of the very few industries that are still using fax as a preferred method of communication. We also see a lot of the doctor narrative being indirect, where they're expecting everyone who's gonna read this note to be a physician, so they don't have to explicitly describe everything. Those are some of the challenges. How can you integrate with different EMR vendors? And also, how can you deal with non-machine-readable formats like faxes and doctors who are not necessarily explicit or structured in how they describe a patient journey?

Brent Clough, CEO of Trio Health: Yeah, I think you're absolutely correct. And we kind of think of it as three different levels, right? There's kind of the most basic and common model, which is nightly files. The second is really AI and OCR technology. And I think the third is the good old-fashioned roll up your sleeves, where a clinically trained, certified person can remotely log in and read chart notes or scanned documents or things that obviously don't meet the first two criteria, and really start to put together that patient story or build that mosaic. So you can understand it in an example: I think in rheumatology, to your question, what every manufacturer wants to know, and what every payer wants to know, it's not, you know, we all know what happened, but we want to know why.

So when we look at a rheumatology patient and we say, jeez, you know, they started on Humira and then they switched over to Xeljanz, you know, why did they discontinue Humira? And why did they select Enbrel as a second or third line or fourth line of therapy?

And so what we did is we looked at really using all three levels of that to answer those questions, with the third level being, we actually have certified chart abstractors that have been clinically trained to go through and look at entering the discontinuation reason. And what we uncovered specific to rheumatology, which was interesting, is that a vast majority of the discontinuation reasons are based on patient tolerability.

Where if you go back to kind of oncologists and they go, jeez, this is standard protocol for us to manage patients with pain, nausea, diarrhea, rashes, and so forth. And what the rheumatologists are telling us is, you know, we actually don't do a very good job managing patient tolerability. So what we're trying to do is uncover for rheumatology specifically, going back to your question in terms of these three levels, is how do we bring this unique insight to help advance the disease?

So how could we help physicians understand in terms of the prevalence of a particular category of patient tolerability? And then how could you potentially work with the patient hubs and support paths from the different manufacturers to do almost realtime triage? And so now, you know, if I was going to discontinue because of a GI abdominal pain, the question is, is there a way to help mitigate that? In terms of to keep me on that therapy yet manage that derivative or a derived side effect that could be correlated or non-correlated. But at the very least it's the basis for why there's a switch. And so, again, we think that there's a lot of still great opportunities in terms of really taking a comprehensive view.

In a space like rheumatology, there's a lot of entrenched competitors. There's a lot of people that have real world data, but it's really trying to think about how you creatively look at bringing together all the resources and capabilities to draw out some unique insight to 'advance the disease state'.

And so that would be an example where we're super excited in terms of some of the work that we're uncovering in rheumatology.

Dr. Karim Galil: How did COVID affect Trio? Is COVID the catalyst for real world data studies or did it slow down or change to where it's more data driven trials? How do you see COVID today affecting the real world evidence industry?

Brent Clough, CEO of Trio Health: I can't speak on behalf of CROs other than I know the clinical trials have obviously been stalled and it's been a difficult environment. I think what's been fascinating going back to the beginning of our conversation around this kind of barbell strategy around real world data companies, in terms of being on kind of one end of the spectrum, you know, I think there's a lot of great work on the large data sets to look at prevalence and looking at different populations in terms of how they're being impacted and what the outcomes are.

And then I think if you look at kind of what we're looking at on the Trio Health side, it is really going down 10 or 15 different levels.

So we may only have a database of a hundred thousand rheumatology patients. Yet, you know, we're tracking patients that have been diagnosed with COVID and then also measuring and looking at their outcome and having the notes and all of that detailed information. And I think the question that we're trying to look at is: are some of the rheumatology drugs delaying onset of disease?

There was a webinar that we hosted two or three weeks ago. We're kind of looking at it in terms of, at a very detailed patient-level specific function, versus there's still a lot of value at the macro level, at the epi-side, in terms of doing that. So I think, as it relates to COVID, it presents a unique environment in terms of the clinical trial development.

But then when you look at the real world data companies, I think that real world data companies have evolved a lot in the last 15 years. So I think they play a role, a very important role, but that's kind of the macro level and the micro level, which is going to be highly complementary to helping us solve these types of complex problems that we're confronted with.

Dr. Karim Galil: How do you see pharma companies and how they perceive real world data? Do you think they perceive it as a vitamin or more of a painkiller? Is it something good to have or something you must have? I mean, obviously you have been leveraging real world data for more than 15 years, so you can see the adoption curve.

Are we there yet? Are we at a point where they feel like this is a painkiller or we're still in the vitamin stage?

Brent Clough, CEO of Trio Health: I think we've made a lot of progress. I think the biggest problem with real world data for the last seven years up until the last year or so, or two years ago, has been the confusion around how to use the data, understanding that there is no perfect database.

And I think that, it feels like in the last 12 to 24 months, there's really been a lot of progress made in terms of really kind of ring fencing and understanding within each of the different companies kind of what their capabilities and what their best use cases are. Creating that level of clarity, where we can all add value in some capacity, but understanding where we excel and where our weaknesses are, I think is what's critically important.

And I think that's starting to flesh out more and more at a more accelerated rate. And I think any time that you get to that level, then I do think that, you know, using your analogy of the vitamin or the painkiller, in certain situations they both exist, right? In terms of, it becomes helpful to programs that they're trying to advance internally.

And then I think it also becomes necessary or required. But again, I think the starting point that we should all be focused on is making sure that our clients understand with complete clarity and transparency in terms of the good, bad, and ugly. What are we good at? What are our deficiencies and what should you not use us for?

And I think when we get to that level of transparency, it's gonna be best for the entire industry.

Dr. Karim Galil: A lot of our audience are actually executives in the pharma industry and we always get the question: I have sent an RFP, now I have like 10 vendors, and I need to assess where they are (to also borrow your analogy) on the barbell. How comprehensive is their data? What are the right questions to ask?

That's still the industry trying to figure out what are the criteria or the framework where you can evaluate a vendor and understand where do they stand on the breadth and depth of curves when it comes to their data assets. So, what would you advise? What kind of questions? If you are a pharma executive today, what kind of questions are you going to ask?

Brent Clough, CEO of Trio Health: Yeah, I feel like, the pharma industry has evolved a lot in terms of starting to develop those questionnaires and methodology that provides kind of that 'nowhere to hide' for the different vendors or people receiving or responding to those RFPs. And at the end of the day I think it's, at least from the way that we manage our company, we just think it's a mistake to try to misrepresent us because at the end of the day, there's nowhere to hide. And so what you end up is just in a bad situation.

And so from our perspective, we welcome the transparency and it's imperative for us, for both our clients as well as even our physician partners and networks, in terms of making sure they understand what are our goals, objectives, and what can we do and what can't we do. And over a history of seven, eight years, we've had to rein back some of our members who are assigned to a steering committee, who get excited, and start talking and say, Oh, we can do this, this, this, and this. And we have to rein them back and say, no, no, we can't do that. Right. That's not feasible.

I think that the nice thing is that the industry has evolved, to a point where the level of knowledge is there. I think transparency is now getting to the place where people now understand where they fit and what their strengths and weaknesses are. And I think with transparency, you're going to see a wider adoption and more use cases in terms of how real world data can be applied across a broader spectrum than even exists today.

Scientific Committee at Trio Health

Dr. Karim Galil: I want to go back to the scientific committee at Trio. You explained to me how you guys run that. Do you have representation of different specialties? Are they full-time employees of Trio, or are they more scientific advisors or consultants? How are you guys able to build that kind of committee, and keep them engaged with the amount of business that you guys are generating?

Brent Clough, CEO of Trio Health: I would tell you that not one person on our scientific steering committee does it for the money or any type of honoraria. I think that we've always positioned from day one that we have to be the North Star in terms of clinical evidence. We need to pave the way for the disease state, in terms of doing really novel and transformational research, that provides and sheds new insight and that is going to advance the care of patients for that specific disease.

So, first and foremost, it really starts and ends with the clinical integrity that we bring to the table, as well as the studies and methodologies that we bring forth in terms of each specific disease state. The second piece is in terms of how do we get them.

You know, our goal is always to try to get an oral presentation of merit at a major medical conference. And when you get an oral presentation, as you know, you're the best of the best: the 1 to 2% of all submissions that make that cut. So we're very much focused on applying our skill set in terms of the data, but also on remembering the knowledge that exists within the active physician network, to try to really think about, you know, what are the issues confronting these patients in real time? And how can we look at it from a safety point of view, an efficacy point of view, from different patient cohorts, to even, as I mentioned earlier, looking at access to care around payer denials, and so forth.

So I think our attraction is number one, that clinical North Star position, but then to your specific question, each disease state, we have a core of typically five to seven key opinion leaders that serve as the foundation but we will bring in different key opinion leaders for subspecialty expertise within that disease state.

So in hepatitis C, we could have an expert that focuses on the co-infected population, which would be hepatitis C plus HIV. Or we could look at physicians that treat high risk patients based on median income, in different zip codes and so forth, that have a lot of different co-morbids. So for us, it's having that foundation in terms of the rigor and making sure that what we're doing is clinically sound, methodology and so forth.

But then also recognizing that, you know, if we're looking at weight gain and HIV based on the aging population, we need to bring in some of the top statisticians that can deal with this very complex issue that may be out of the purview of our core team. So we have no ego, and our scientific steering committee has no ego as it relates to "it's only these five people that are the authors of every study". It's really about how do we best position that analysis that we think is incredibly important in terms of that topic, so that we can get to that oral presentation level at that medical conference. Creating the greatest awareness and the greatest impact has really always been our focus.

Dr. Karim Galil: You guys are not only leveraging real world data. You're leveraging the clinical integrity of the clinicians and the scientific community. And I find this very, very intriguing.

What's the good, what's the bad, what's the ugly with the state of AI in healthcare?

Another question I have: I want to see from your perspective, what is the good, the bad and the ugly of AI in healthcare? Where are we today? What are the challenges? What are we good at? What are we not yet good at, when it comes to AI? Why are you still using human abstractors? I mean, obviously you see a lot of AI companies saying, listen, we have the best AI out there, but still the industry is at a point where almost every real world data company has a human operation at the very core of its DNA.

Why?

Brent Clough, CEO of Trio Health: It's a great point. Back in 2004, I actually hired a number of data scientists and we built machine learning algorithms to predict and see if we could help, in terms of market share and understanding, a number of different oncology products that we were developing for a suite of clients.

It was interesting because you learn from direct experience. And so I go back and I would say to the AI companies, and OCR companies, the same thing that you would say back to me as a real world data company, which is, what is the best use case of your platform in terms of the data and the assets and the capabilities of your team.

And don't misrepresent yourself. I think that again, generically speaking, a lot of AI companies said, look, we can solve the world's problems and do it very well. We can basically be a solution for everything. And I think we all know that AI companies can't be a solution for everything, but they can play a very, very important role in terms of different aspects.

And if I go back and look at rheumatology, it may be difficult for an AI company to go through and read notes to the level of a clinically trained person that has to put together lots of different disparate data. And I'm not saying that AI can't get to it, but then I look at a physician, a patient global assessment form, I look at HackMD. I look at different things. I go, geez, that AI company would be terrific, in terms of a whole bunch of different capabilities that could be accretive for Trio Health and other companies.

But I think it goes back to defining, as I say, your swim lane, in terms of where you can best apply AI and some of the proprietary technologies that the AI companies have developed, and making sure that you kind of stay within that swim lane and not overstep your bounds, no different than I would say the same advice to Trio Health, which is: What are we good at? What are we not good at? And where should you go to potentially one of our competitors or a different vendor to answer those questions or to, you know, solve

AI-only FDA Submissions

Dr. Karim Galil: You guys use the term FDA level a lot, which basically refers to a data set that meets the benchmarks that the FDA has for data integrity. When it comes to AI, have you been successful in making any FDA submission using AI only, or does it always have to include some sort of a human curation layer on top of your data processing techniques?

Brent Clough, CEO of Trio Health: Yeah. So it's a great question. Actually, we just signed a partnership with Greenleaf Health last year that has a regulatory advisory group in Washington, DC, and they're all former executives of the FDA and spent a long time there. And again, we look at our collaboration with them as them being the regulatory experts in terms of knowing what is really regulatory-grade data. Everyone talks about it, it is a widely used term. But at the end of the day, what does it really mean? And we look to Greenleaf. In our partnership, Greenleaf has really been the experts, since they sat in that chair at the FDA for a good chunk of their careers. And so I would tell you very simply for us, I use this term: the ability to validate, adjudicate, and supplement. And so if I can represent back to Greenleaf or to the FDA the source of every data field, how I received it, who gave it to me, how I verified it and so forth, that is part of the process that we do.

And I think that, again, we haven't had any direct experience yet in terms of using AI in a submission or as 'regulatory grade', but I believe that just because we haven't done it doesn't mean that it doesn't exist. And I think that there clearly is a role, and I think it's just, again, for the agency to understand the process and methodology. No different than developing a research SOP or the analytics SOP, which is: okay, you've got a highly curated data set. And then how did you transform it, you know, based on your statistics and approach and methodology? And documenting that process.

And I think AI should and would be an important component of that as long as the agency can understand the methodology, the process, and exactly how you transform or got to the conclusions that you did.

Dr. Karim Galil: Which is very challenging, because a lot of the AI techniques used today are deep learning, which are not really self-explanatory systems. It's very hard to explain how the outcome was generated. And I think this is very challenging, as AI companies in healthcare have to figure out how to use AI techniques that are still able to explain how they come to these conclusions. But that's a great point, that we need to be able to explain things to achieve this FDA level acceptance.

Brent Clough, CEO of Trio Health: Look, from my previous experience with the data scientists and machine learning, they uncovered some really interesting patterns and trends that obviously we didn't uncover in terms of just with all of our analytics team. And so if you look at that as really the starting point in terms of then using the rest of the process to then manually go through and verify that, I still think AI can be very informative, in terms of spotting things and in terms of early detections and uncovering things that haven't been detected yet, and doing it in a very efficient way versus kind of a human effort. And then the question is, can you couple that with the human efforts as a backstop to go through and verify in terms of bearing out that trend or bearing out that evidence that the AI company came up with.

Dr. Karim Galil: I agree that the machine has to help the human, but it's not in a position to replace the human. I think this is one of the things that we strive to do here at Mendel: we always try to build machines that can truly help a human abstractor or a clinician rather than try to replace the clinical role there.

2025 for Clinical Research and RWD

We are at wide adoption when it comes to the adoption curve of real world data and real world evidence. And I was wondering, how do you see 2025 from that perspective. Are you seeing more budgets allocated for real world evidence? Is it going to be matching the budgets that are allocated for traditional clinical research?

Are you going to see clinicians basing a lot of their clinical judgments on evidence that is created from real world data? Are you going to see payers now finally adopting value based contracts, or do you still think that 2025 is not going to be where we hit that point of wide adoption?

Brent Clough, CEO of Trio Health: I think we've made tremendous progress in the last 12 to 18 months in terms of, and I go back to my earlier point, I think for us as an industry to move forward, we have to have the transparency and the confidence behind it to understand how we can best utilize real world data and real world evidence.

And I think once you have that transparency and you have that understanding, I think then the opportunities really open up, right? So you're now starting to see a lot of the buzz words around value based contracts, value based contracting, that becomes kind of the next iteration based on real world evidence.

I think, as it relates to clinical trials, you're starting to see the FDA opening up and being more receptive in terms of trying to understand what are some of the use cases that make sense from their perspective. And so you're seeing, you know, expanded labels. You're seeing synthetic arms.

You're seeing a whole bunch of different things that are now starting to permeate within our business units. And I think there's a lot of exciting things. I think you're going to see a massive amount of changes between now and 2025, but I would go back to that change and that adoption has got to have the clear understanding and transparency to know, kind of the good, bad and ugly.

And once you understand that, then you can apply it. So it could be more precision around clinical trial recruitment, which is speed and certainty. It can be getting better data, so you're seeing now disparate data being linked together, right?

So you're getting more of a complete record on the patient, be it from their primary care physician, to their rheumatologists, to their infusions, specialty pharmacy and so forth, which is again great. But that in isolation, I don't think, solves the problem, right? Because there's still the AI component and there's still the other component, which is the human element, which is, I've got to go back to the physician and I have to have him certify or verify that what I'm seeing is actually true and it makes sense.

And I think the combination of all three of those and how those are stitched together is really exciting. I think we're now in the growth cycle or the growth curve of our industry in terms of the different use cases that we can develop.

And that can be applied between now and 2025.

Dr. Karim Galil: My last question.

If you can Zoom any living person today, who would it be and why?

Brent Clough, CEO of Trio Health: Great question. You know, this may sound off topic, but probably the person that comes to mind is Richard Branson. And the reason I bring up Richard Branson is, I've never talked to him, I've only read articles about him, but he's built companies that aligned with his personality, it feels like.

And so it seems pretty amazing that you can be a serial entrepreneur and your whole theme is based on aligning with what seems to be his personality, in terms of the way that he approaches a market, and yet in a professional way that immediately attracts people. It would be super cool to meet someone like that. He's built his professional career around his personality.

Dr. Karim Galil: It's very interesting. You're inspired by leaders who are true to their personalities. This is a great example of a CEO who has a culture that is consistent all the way through, right? Like the company, the CEO, they're all about the same thing: how can we be true to ourselves, to the data, to the industry, and also to the clinical community? Richard Branson is a great kite surfer. So I think, as a good start, we need to get you on the board. We have you here in the Bay Area and need to get you started on that.

Hey Brent, thank you so much for your time today. I think you shared great examples with us. I love the disparity example that you gave about this patient with hepatitis. And I think this is an awesome use case for real world evidence. I was also really inspired by your scientific committee, and how you're able to build solid science, but still have the tech and business processes to support that.

Again, thank you so much for your time. All the best for you, for your family and for Trio. Stay safe. And I hope to see you soon.

Brent Clough, CEO of Trio Health: Likewise, and thank you so much for inviting me. I really enjoyed our conversation today.

VIDEO — 8 minutes

Mendel on Hospital-to-Home by UCSF Health Hub

Mendel's CEO Karim Galil was a guest in Hospital-to-Home by UCSF Health Hub. Karim presented how Mendel transforms EMR data and clinical literature.

Here is how Mendel transforms EMR data and clinical literature.

Paul Grand: (...) We want to talk about companies that are validated. Companies that are out there on the market that have been there in the trenches, and that have some stories to share with you.

So we've got three that we're going to be focusing on today. They're each going to give you a short presentation. (...) so Karim, I'll let you take it away.

Dr. Karim Galil: Hi everyone. My name is Karim. I'm the co-founder and CEO of Mendel.ai. We are a technology company that enables you to glean and unravel knowledge from any type of clinical, written language. We trained the computer how to read clinical language in EMRs and in scholarly articles.

We convert that into an analytics and search ready format. All of our clients think of us as being Flatiron 3.0, which, in many ways, has some truth to it. If you want a computer to answer a question, you need the data to look like that:

It needs to be clean and in a tabular format.

But the reality is data looks like this in healthcare:

We're still one of the very few industries that use faxes, so you get a lot of those PDFs. I believe 80 to 90% of the data is in formats that computers cannot read. Your options today are: either to use structured data, use claims data (10%), or you hire an army of human abstractors who sit down and clean the data for you, or you use an AI that requires your help. It's an AI that renders low quality and needs someone to prove and do a lot of QC on top of it.

What we have done here at Mendel is:

We are the first company to build a set of AI tools that can truly help the human.

It helps humans to unravel the unstructured data. So what we do is we're able to ingest all types of data, whether it's a doctor note, a pathology report, a fax or a scanned record.

It goes through several tools: to identify the type of record, to change all scanned records into text, and finally to do fact extraction, that is, to organize the data into a format that is readable and analyzable by computers.
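As a rough illustration of the flow described above (ingest, convert scans to text, extract facts into a tabular shape), here is a minimal Python sketch. It is not Mendel's implementation: the `Document` class, the `to_text` and `extract_facts` helpers, and the "key: value" parsing rule are all hypothetical stand-ins, and the OCR step is only a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """One incoming record; `kind` and `is_scanned` are illustrative fields."""
    kind: str            # e.g. "doctor_note", "pathology_report", "fax"
    content: str         # raw text, or a stand-in for scanned content
    is_scanned: bool = False


def to_text(doc: Document) -> str:
    """Stage 1: turn scanned records into text (placeholder for real OCR)."""
    if doc.is_scanned:
        # A real system would call an OCR engine here; this sketch simply
        # treats the scanned content as text that needs trimming.
        return doc.content.strip()
    return doc.content


def extract_facts(text: str) -> dict[str, str]:
    """Stage 2: toy fact extraction that reads 'key: value' lines."""
    facts: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            facts[key.strip().lower()] = value.strip()
    return facts


def process(docs: list[Document]) -> list[dict[str, str]]:
    """Run every document through both stages and return tabular rows."""
    return [{"kind": d.kind, **extract_facts(to_text(d))} for d in docs]


rows = process([
    Document("doctor_note", "Diagnosis: cervical cancer\nStage: II"),
    Document("fax", "  Diagnosis: hepatitis C  ", is_scanned=True),
])
for row in rows:
    print(row)
```

Each row is a flat dictionary, which is the "clean and tabular" shape the talk says a computer needs before it can answer questions.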

One of the biggest problems that we have seen since we started the company is that the common perception is: "when it comes to quality, human abstraction is the standard. You have to sacrifice speed and scale because if you want quality, humans are the way to go. AI is fast, it's scalable, but it doesn't render good quality."

With Mendel, that's essentially not true. This is a graph that shows error metrics comparing our AI against a gold standard.

We ingested several medical records and we can compare the performance of our AI against the gold standard. We actually scored higher than human beings in quality.

We were able to be in the upper nineties when it comes to accuracy. That was one of the challenges that we had to fix: how can you have an AI that renders quality that a physician and a researcher can trust?

The other big problem that we had to tackle was PHI. We've built an engine that is able to curate data and is able to mine knowledge out of it. But nobody is willing to give you data to begin with, because of PHI, because of HIPAA compliance.

So what we have done is we've built a very interesting AI tool that can scrub any personal information from any type of record, whether it's a scanned record, a doctor note, whatever it is. And we were able to get a third party to statistically verify the accuracy.

So the HIPAA threshold to qualify is at least 99% accuracy in scrubbing data. We scored 99.8%. To our knowledge today, we are the only company that has an AI technology that was able to get a third-party statistical verification for the de-identification. So quality and data privacy were two main problems that we had to fix.

Just to put things in context, I would like to share this case study with you: we got a client that came to us with a cervical cancer retrospective study. They had records of 50,000 patients, which translated to around 600,000 documents. The client (because of COVID) had to deliver the project really fast with some cost savings.

Using their traditional methods, they estimated 27,000 hours of human abstraction. That's going to cost them in the upper hundreds of thousands of dollars. And they were only able to extract a certain number of endpoints or a certain number of facts. Using our system, we were able to do the whole fact extraction in 15 minutes, and changed the $ to ¢.
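To make the scale of those numbers concrete, the figures quoted in the case study can be worked through in a few lines of Python. Only the four inputs come from the talk; the derived ratios are back-of-the-envelope arithmetic, and the last one compares human labor hours with machine wall-clock time, so it is indicative rather than a like-for-like benchmark.

```python
# Inputs as quoted in the case study.
patients = 50_000
documents = 600_000
manual_hours = 27_000   # estimated human abstraction effort
ai_minutes = 15         # reported time for the automated extraction

# Derived figures (simple arithmetic, not reported results).
docs_per_patient = documents / patients                  # 12 documents per patient
manual_minutes_per_doc = manual_hours * 60 / documents   # 2.7 minutes per document
time_ratio = manual_hours * 60 / ai_minutes              # 108,000x

print(f"{docs_per_patient:.0f} documents per patient")
print(f"{manual_minutes_per_doc:.1f} manual minutes per document")
print(f"{time_ratio:,.0f}x less elapsed time than the manual estimate")
```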

The interesting thing (as my background is medicine) was the agility. The researchers in this study were able to change the endpoints 11 times in five days.

So the ability to change and architect your experiment and be super agile while the machine is able to deliver back as fast, allowed them to change the whole hypothesis and change the whole design of the trial. The other interesting thing was they were able to successfully do an FDA submission, which was a great testament to the quality of our AI technology today.

Again, to our knowledge, we are one of very few, if not the only, AI companies that were able to curate a dataset and help a client get an FDA submission for it.

When COVID started, we found that there's tons of literature out there, and it's very hard to glean knowledge out of that literature. And we decided to take on that challenge. The company was built around oncology, and we had to scale. We repurposed our technology and we were able to build a search engine that can sit on top of most of the medical literature that exists today about COVID.

It's a very advanced search engine that is able to understand context. So if you search for something like "potential therapy", it is able to glean different types of therapies from those scholarly articles without having to use keyword search.

Mark Goldstein: So, Karim, if you were to say: COVID changed you guys. You were focusing down the cancer route, and COVID opened up an opportunity where your technology was applied, you know, well beyond just cancer, and you really made a difference with a number of studies.

Dr. Karim Galil: Yes. And using that search engine, one of the pharmaceutical companies was able to find out that they have a calcium channel blocker hypertension drug that actually had some antiviral effect. It was proved in preclinical studies and is available in the literature, and they had never heard of that experiment before.

Mark Goldstein: That's great. But let's wrap it up in 10, 15 seconds. What would you say?

Dr. Karim Galil: I would say that COVID changed Mendel from being a vitamin, to being a painkiller. Today, using our technology, we're able to render clinical data trials faster and significantly less expensive than the standards today.

Every company is trying to cut costs. And we're very excited about the opportunities that are coming to us during COVID. Thank you.

Interview — 25 min

Applied AI and Race to a COVID-19 Vaccine

Mendel's CEO Karim Galil was a guest in Bootstrap Labs' Applied AI Insiders Series. Karim talked about the race to a COVID-19 vaccine using AI.

See Video →

UPDATES & NEWS — 2 MIN READ

Mendel releases COVID-19 AI search engine

We’re now excited to announce our new COVID-19 AI search engine which is available to the public here. Mendel ingested and absorbed more than 50,000 scholarly articles, released by the White House, related to Coronaviruses and COVID-19. Researchers, epidemiologists, and clinicians can use this tool to ask questions about COVID-19 and glean relevant answers in seconds — a process that can take a human numerous hours, days, or even weeks to conduct.

READ BLOGPOST →

Mendel vs. COVID-19

A story of artificial intelligence battling a virus

Like you, our team at Mendel has experienced shock these last few months as COVID-19 proliferates across the globe. We are experiencing change with unprecedented uncertainty, the virtual shutdown of key industries and an immense overloading of our health care systems.

At Mendel, we’ve been using AI to drive clinical research for the past three years, and in that time, we’ve built a machine that understands medicine. This uniquely positions us to leverage our learnings and technology to aid in the global fight against COVID-19.

We’re now excited to announce our new COVID-19 AI search engine which is available to the public. Mendel ingested and absorbed more than 50,000 scholarly articles, released by the White House, related to Coronaviruses and COVID-19. Researchers, epidemiologists, and clinicians can use this tool to ask questions about COVID-19 and glean relevant answers in seconds—a process that can take a human numerous hours, days, or even weeks to conduct.

Mendel can answer key questions such as:

What do we know about potential treatments?
What do we know about COVID-19 risk factors?
What has been published about asymptomatic transmission?
What do we know about the different virus strains?

A researcher beta testing the Mendel COVID-19 search engine found evidence that Diltiazem, a drug used for hypertension, has been proven effective in halting the virus replication. Although the evidence dates back to 2006, this isn’t widely known among the scientific community today.

Mendel can surface all mentions of potential treatments for Coronaviruses in the literature by a simple query.

Unfortunately, the human brain reads slowly, and our memory is prone to miss information, especially with large volumes of data. Even existing technologies, such as Information Retrieval (IR) and out-of-the-box Natural Language Processing toolkits, fail to understand the complex clinical knowledge trapped in the literature. Furthermore, search engines today can’t understand things like the fact that “cough” is a symptom and “Remdesivir” is an antiviral drug.

That’s simply not the case with Mendel. We’ve built an engine that can be asked highly specific questions, such as, “What are the modes of transmission of a virus?” as well as understand that “no intrauterine infections have been recorded” is a relevant answer.
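The gap between keyword matching and concept-aware search can be shown with a toy sketch in Python. This is an illustration only and not how Mendel's engine works: the `CONCEPT_TYPES` lookup table, both helper functions, and the two example sentences are made up for the purpose, and a real system would rely on a far richer model than a hand-written dictionary.

```python
import re

# Hypothetical, hand-written lookup of term -> clinical concept type.
CONCEPT_TYPES = {
    "cough": "symptom",
    "fever": "symptom",
    "remdesivir": "antiviral drug",
}

# Made-up example sentences standing in for scholarly text.
sentences = [
    "Patients received remdesivir for five days.",
    "Dry cough was the most common complaint.",
]


def keyword_search(texts, query):
    # Plain substring match: only finds sentences containing the literal query.
    return [t for t in texts if query.lower() in t.lower()]


def typed_search(texts, wanted_type):
    # Return (term, sentence) pairs where a term's concept type matches.
    hits = []
    for t in texts:
        for token in re.findall(r"[a-z0-9-]+", t.lower()):
            if CONCEPT_TYPES.get(token) == wanted_type:
                hits.append((token, t))
    return hits


keyword_hits = keyword_search(sentences, "antiviral drug")
typed_hits = typed_search(sentences, "antiviral drug")
print(keyword_hits)   # no sentence contains the literal phrase
print(typed_hits)     # the remdesivir sentence is found via its concept type
```

The keyword query comes back empty because no sentence literally says "antiviral drug", while the typed lookup surfaces the sentence that mentions one.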

With this AI-enabled Search Engine, we are just getting started.

We have assembled a task force of more than 50 AI scientists, physicians, and clinical experts to train Mendel’s clinical artificial intelligence. The goal is to glean and corroborate findings faster and with greater accuracy by absorbing the knowledge in medical literature and cross-referencing it with the electronic medical records (EMR) data of COVID-19 patients.

This approach should help predict the course of SARS-CoV-2 as well as conduct “In Silico Trials” to evaluate the outcome of different drugs and treatment approaches. Mendel can build a computer simulation of every patient’s journey and make it available for research.

We’re deploying Mendel’s technology at many prominent healthcare providers across the United States, including academic and community clinics. Please reach out if you can help expand its impact by partnering with more healthcare centers or pharma companies on the front line of this battle.

This initiative is co-sponsored by our two major investors, DCM, an established venture capital firm with a global presence, and Bootstrap Labs, a venture capital firm focused exclusively on Applied AI.

We are always in the business of saving lives.

Business Wire: Mendel Launches AI-powered Search Engine to Analyze More Than 50,000 Coronavirus Papers

Researcher using Mendel’s COVID-19 search engine found evidence that hypertension drug Diltiazem may halt virus replication.

→ READ PRESS RELEASE

PODCAST – 1.30 H

"Beyond the Data" Podcast 001 – Janak Joshi about Democratizing Medical Images

Welcome to the first episode of Mendel's "Beyond the Data" podcast. Our first guest is Janak Joshi, Senior Vice President, Chief Technology Officer and Head of Strategy at Life Image. Throughout his career, Janak Joshi has been at the cutting edge of science, health and technology. At Life Image, he is focused on building industry-leading products for researchers, engineers and clinicians within a global footprint. He is a three-time entrepreneur successfully raising capital, and building and selling companies with the mission of leaving a legacy to prove that our generation did indeed fix healthcare in the U.S. Janak currently serves as a Board Member of Bentley Venture Capital Funds and one of the judges for MassChallenge, and is an advisor to early stage start-ups in the bio-pharmaceutical and healthcare IT space. He continues to serve with pride in the U.S.A.F. Search & Rescue Squadron 1001st-PA as a 1st Lt. The podcast is hosted by Karim Galil, CEO of Mendel.

LISTEN TO PODCAST →

Democratizing Medical Images

Janak Joshi on Mendel Podcast

Clinical Data Abstraction

Clinical Record OCR

PHI De-identification

Clinical Search Engine

Clinical Trial Matching

Clinical Data Assets

Blog

Mendel AI Joins NVIDIA Inception Program to Accelerate AI Innovations in Life Sciences

Mendel Unveils Groundbreaking Neuro-Symbolic AI System Outperforming GPT-4 for Automatic Cohort Retrieval in New Study

Improving Clinical Trial Participant Prescreening With Artificial Intelligence (AI): A Comparison of the Results of AI-Assisted vs Standard Methods in 3 Oncology Trials

Coupling Symbolic Reasoning with Language Modeling for Efficient Longitudinal Understanding of Unstructured Electronic Medical Records

Mendel’s New Look: Website Update

How a diagnostic company was able to build a clinico-genomic database in a week

How One Organization Changed The Way Patients are Identified for Clinical Trials with AI

How to Approach De-Identification

Mendel Retreat: Adventures and Team Building in Cairo

Competence via comprehension: AI for healthcare needs clinical reasoning skills

GPT3 and Large Language Models - an inflection point for AI

Abstractor Spotlight – Shanna Wells

Creating Accurate Regulatory and Reference Data

How to Approach Document Categorization

Reading Clinical Data Like a Doctor: What’s Missing from DIY Systems

What is the Gold Standard? Exploring the challenges of structuring unstructured data in healthcare

Leslie Lamport, 2013 Turing Award Winner on Patientless Podcast #011

Applied AI and Race to a COVID-19 Vaccine