Retina

Background

80% of all healthcare data is in an unstructured format, making it hard to access and harder to use. Unstructured data includes physician notes, radiology reports, and other human analysis. Physicians write for other physicians, so healthcare documents are filled with abbreviations, jargon, and clinical terms that require medical knowledge and training to understand. These notes and related medical imagery hold all the nuance of a patient’s longitudinal journey, and they require considerable pre-processing before analysis tools can use them.

Extracting text from healthcare documents is the foundation of transforming unstructured documents into structured data. Structured data allows healthcare companies to make patient information actionable at scale.

The industry tackles this challenge today with optical character recognition, or optical character readers (OCR). OCR is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text.

What is Retina?

Retina is Mendel’s OCR module, step one of the entire Mendel pipeline. Mendel’s modules are built specifically for healthcare, with powerful Artificial Intelligence that minimizes information loss, and traces results to source evidence.

Because Mendel is built by healthcare for healthcare, we understand that medical documents are dense. There are few filler words in medical documents, so an ideal OCR tool must recognize and extract entire sentences without loss of meaning. Accuracy is critical: mistakes and lost information introduced in the OCR step put every downstream data processing step at risk.

Retina is focused on providing the most accurate extracted output in the market. Unlike off-the-shelf OCRs, Retina:

Is designed and trained specifically for the language of healthcare
Analyzes document structure such as header, main body, margin, figures, and tables
Promotes lossless extraction by defining success as “fluent” output, measured by the number of n-grams extracted (‘n’ words in a row captured)
Infuses its OCR neural network with NLP and clinical ontology logic to ensure the text output is clinically meaningful

We have built these capabilities in-house because we understand that all parts of the transformation process need to work together. Passing poor-quality text riddled with spelling errors to the downstream tasks in the pipeline will confuse those NLP/NLU modules and result in poor-quality output.

Comparing OCR Outputs:

Let’s see how Retina’s output compares to a popular OCR tool in the market.

Original Document:

Popular OCR Tool Result:

The popular OCR tool cannot read the last sentence and is doing its best to guess each character.

Mendel Retina Result:

Retina captures the last sentence, even though that part of the original document is hard to read and defeats the popular OCR tool. Although Retina makes a few mistakes, it produces fluent, clinically meaningful text that is well ahead of the competition.

What does Mendel do differently?

How is Mendel’s Retina able to capture more language with greater accuracy? Current OCR systems are based mainly on computer vision. They cut the image of each character into several tiny slices, predict a character for each slice, and then aggregate those predictions into a single character. Their understanding of what they are reading does not extend beyond a sequence of a few characters at a time; they do not interpret the words, phrases, or context they scan. As a result, if a word is unclear, they cannot recover it.
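To make the idea of recovering an unclear word concrete, here is a minimal Python sketch of one simple form of word-level recovery: matching a noisy token against a known vocabulary. This is not Retina's method. It swaps in plain fuzzy string matching from Python's standard `difflib` module, and the vocabulary, the function names, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical mini-vocabulary; a real system would use a far larger medical lexicon.
VOCABULARY = ["metastatic", "carcinoma", "biopsy", "lymph", "node"]


def recover_token(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word for a noisy token, or the token unchanged."""
    lowered = token.lower()
    if lowered in vocabulary:
        return token
    # get_close_matches ranks candidates by similarity and drops those below cutoff.
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def recover_line(line):
    """Apply token-level recovery to every whitespace-separated token in a line."""
    return " ".join(recover_token(token) for token in line.split())
```

With this toy vocabulary, `recover_line("metastat1c carc1noma")` returns `"metastatic carcinoma"`, while a token with no close match is left untouched. A character-level recognizer has no equivalent step, because it never looks at the word as a whole.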

The Artificial Intelligence team at Mendel takes a different approach to all our modules. First, we ensure that everything we build is healthcare focused. Our models are trained on the largest medical data set in the market and combined with our own proprietary ontology. Medical ontologies describe medical terminology as concepts and define their hierarchy and how they relate to each other. Using these ontologies, along with the novel reasoning algorithms we developed, we attempt to imitate the understanding of a clinician. We want our tools to approach problems with an ability to interpret and a level of contextual understanding. Because we use multiple systems to balance decision making, Retina is less error prone than off-the-shelf OCRs.
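As a rough illustration of what "concepts and their hierarchy" can look like in code, the Python sketch below stores a handful of is-a relationships and walks up the hierarchy. It is a toy under stated assumptions, not Mendel's proprietary ontology: the concept names, the `IS_A` mapping, and both function names are hypothetical, and a real ontology holds many more relationship types than is-a.

```python
# Hypothetical is-a edges (child -> parent); concept names are illustrative only.
IS_A = {
    "adenocarcinoma": "carcinoma",
    "carcinoma": "malignant neoplasm",
    "malignant neoplasm": "neoplasm",
}


def ancestors(concept, is_a=IS_A):
    """Return every broader concept, from the nearest parent to the root."""
    chain = []
    while concept in is_a:
        concept = is_a[concept]
        chain.append(concept)
    return chain


def is_kind_of(concept, candidate, is_a=IS_A):
    """True if candidate is the concept itself or one of its ancestors."""
    return candidate == concept or candidate in ancestors(concept, is_a)
```

Here `ancestors("adenocarcinoma")` returns `["carcinoma", "malignant neoplasm", "neoplasm"]`, so a query for "neoplasm" can match a document that only mentions the more specific term.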

Further Evaluation of Retina v. Another Popular OCR Tool

Mendel tested Retina against the most popular OCR tool in the market. Not ones to shy away from a challenge, we looked at 120,000 pages of healthcare documents across all medical categories such as pathology reports, progress notes, and administrative documents. This was an unsupervised test, meaning we tested OCR output against OCR output with no human intervention.

Our goal for the evaluation was to show the level of accuracy for both OCR tools. To do this, we compared the number of instances in which Mendel and the competitor OCR recognized five words in a row correctly (a 5-gram). More words matched correctly in a row means less information loss. For this experiment, we used a medical English vocabulary drawn from many public sources (PubMed, medical notes, and medical ontologies) as the reference for correct words.

In this test, we refer to:

One word as a unigram
Two words in a row as a bigram
Three words in a row as a trigram
Four words in a row as a 4-gram
Five words in a row as a 5-gram
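The n-gram counts defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation code used for the test: the tokenizer, the function names, and the tiny vocabulary in the usage note are hypothetical. An n-gram is counted only when every one of its words appears in the reference vocabulary.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_valid_ngrams(text, vocabulary, n):
    """Count windows of n consecutive tokens in which every token is in vocabulary."""
    tokens = tokenize(text)
    return sum(
        all(token in vocabulary for token in tokens[i:i + n])
        for i in range(len(tokens) - n + 1)
    )
```

With the vocabulary `{"patient", "has", "metastatic", "breast", "cancer", "stage"}`, the text "Patient has metastatic breast cancer stage" contains two valid 5-grams. If the OCR output reads "metastat1c" instead, five of the six unigrams are still valid but the 5-gram count drops to zero, because every 5-word window includes the misread word. This is why longer n-grams are a stricter measure of fluency than single-word accuracy.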

Results for OCR evaluation for all pages

The Mendel OCR outperforms the competitor OCR at all word matching levels, and the margin widens as the number of words matched in a row increases. For the 5-gram column, Mendel outperforms the competitor by 8.44%. Compared to the competitor OCR, the Mendel OCR is more fluent with medical documents.

The difference is even more stark on the most difficult pages for both OCR tools. These pages may, for example, have noise, fading, concentrated ink stains, tables, or figures, or they may be tilted or upside-down. To identify them, we excluded pages where Retina and the competitor OCR performed similarly. The result is an evaluation set of the hardest 60,000 pages, 50% of the original set.
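The filtering step can be sketched as follows in Python. The article does not say how "performed similarly" was measured, so the per-page score, the `tolerance` threshold, and the function name below are assumptions made purely for illustration.

```python
def select_hard_pages(scores, tolerance=0.05):
    """Keep pages where the two OCR scores differ by more than tolerance.

    scores maps a page id to a (score_a, score_b) pair, one score per OCR tool.
    Both the score definition and the tolerance value are hypothetical.
    """
    return [
        page_id
        for page_id, (score_a, score_b) in scores.items()
        if abs(score_a - score_b) > tolerance
    ]
```

For example, given `{"p1": (0.90, 0.89), "p2": (0.80, 0.55), "p3": (0.40, 0.70)}`, the function drops `p1`, where the tools score almost the same, and keeps `p2` and `p3`. Note that a filter of this kind keeps pages where either tool is ahead, not only pages where one particular tool wins.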

On the most difficult pages, Mendel OCR outperformed the competitor significantly: Mendel recognized 5 words in a row with 13.48% better recall than the competitor OCR.

Mendel’s Retina is more fluent with medical terminology and provides accurate results, even with the most difficult of documents.

Mendel’s Retina is part of an end-to-end solution that uses the power of a machine and the nuanced understanding of a clinician to structure unstructured patient data at scale.

Want to learn about Mendel’s process and modules? Contact hello@mendel.ai.


