Protecting patient privacy is of utmost importance. The Health Insurance Portability and Accountability Act (HIPAA) sets the standards for the appropriate use and protection of patient information.
Organizations that use patient data for internal or external research need to take steps to prevent the exposure of PHI to those who are not authorized to view it. This can be accomplished by de-identifying health information, which entails masking specific categories of identifiers from the document. Once the identifiers are masked, the risk profile of these datasets is significantly reduced.
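At its core, de-identification replaces detected identifier spans with category placeholders. Here is a minimal sketch of that masking step; the span format, categories, and example note are illustrative, not the output of any particular system.

```python
def mask_phi(text, spans):
    """Replace each (start, end, category) span with a [CATEGORY] tag.

    Spans are applied right-to-left so earlier character offsets stay valid.
    """
    for start, end, category in sorted(spans, reverse=True):
        text = text[:start] + f"[{category}]" + text[end:]
    return text

note = "John Smith was seen on 01/02/2020 at Mercy Hospital."
spans = [(0, 10, "NAME"), (23, 33, "DATE"), (37, 51, "HOSPITAL")]
print(mask_phi(note, spans))  # → [NAME] was seen on [DATE] at [HOSPITAL].
```

Once identifiers are masked this way, the remaining text keeps its clinical content while the direct identifiers are gone.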
There are two approaches to redacting PHI from medical documents. The first is manual redaction by trained clinicians, but this process is slow, expensive, and does not scale: humans get tired and are prone to error. The second is to use computers to identify and mask protected health information, but that comes with its own set of challenges.
Most commonly available de-identification systems leverage machine learning (ML) approaches to identify, tag, and mask PHI in text. These systems often tout that they are able to redact above the HIPAA-required level of 99%. In reality, most of these systems have only been tested and evaluated against publicly available datasets such as i2b2 or MIMIC. While useful, these datasets are not representative of the complexity and heterogeneity of unstructured data that we see in the real world. Systems tested against these datasets typically do not perform well on novel datasets and struggle to consistently meet the 99% threshold. In addition, these datasets were constructed from scratch as clean, well-formatted, machine-readable text. In the real world, patient records are messier: a collection of faxes and scans with tables and columns that are far more likely to confuse de-identification systems.
An improvement on a purely ML approach is to layer on rule-based systems (e.g. whitelists, blacklists, and regular expressions) to identify additional PHI elements in an attempt to bring performance above the 99% level. However, these rule-based overlays leave room for error. Let's say you want to redact zip codes from patient documentation. To do this you can create a blacklist of zip codes. Easy, right? Unfortunately, no. Zip codes are easily confused with lab test codes, and if a lab code is redacted, important contextual information is unnecessarily masked. Another example is disease names, which are often named after real people (e.g. Parkinson's, Stevens-Johnson syndrome). Commonly available systems have trouble distinguishing between the two. The 99% threshold is unforgiving, and these edge cases often result in subpar performance.
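The zip-code collision is easy to demonstrate. A naive five-digit pattern, sketched below purely for illustration, will happily match a lab test code as well as a real zip code:

```python
import re

# A naive rule: any standalone five-digit number is treated as a zip code.
ZIP_RE = re.compile(r"\b\d{5}\b")

lines = [
    "Patient resides at 90210.",           # a real zip code
    "Ordered CPT 80053 metabolic panel.",  # a lab test code, not a zip
]
for line in lines:
    print(ZIP_RE.sub("[ZIP]", line))
# Both lines are masked, wiping out the lab code along with the zip.
```

Making the pattern stricter helps at the margins, but any purely lexical rule faces the same problem: the two token types are indistinguishable without context.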
Redact is Mendel's de-identification module. Mendel has developed a proprietary "symbolic learning" architecture that combines the best of the machine learning and symbolic AI worlds: deep learning algorithms paired with several rule-based systems, including our proprietary medical ontology. This architecture trains Redact to de-identify a document without relying solely on ML-based approaches or having a scientist hard-code all the rules, while preserving as much text as possible and generalizing to new data.
In a nutshell, Mendel developed what we call a multi-teacher-single-student neuro-symbolic system.
The student (Redact) is a neuro-symbolic network that learns how to manipulate tokens and rules (or components of rules) to de-identify clinical text. The teachers are also AI systems, each pursuing a different (sometimes competing) objective, and the training objective is to teach the student to satisfy all teachers at once.
For example, one teacher is a "re-identification" system that tries to re-identify the patient after redaction; if it succeeds, a penalty is back-propagated through the student's neural network. Another teacher is a clinical NLU system that tries to reconstruct the patient's journey; if the student redacts useful clinical information, it is penalized. This proprietary architecture gives Mendel Redact its edge: it learns to prevent re-identification while keeping the medical text intact.
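The training signal described above can be sketched as a weighted sum of teacher penalties. Everything here is a hypothetical toy, not Mendel's actual system: the teachers are stand-in functions that score a redacted document in [0, 1].

```python
def student_loss(redacted_doc, teachers, weights):
    """Combined penalty the student is trained to minimize.

    Each teacher maps a redacted document to a penalty in [0, 1]:
    a re-identification teacher penalizes leaked identifiers, while a
    clinical-NLU teacher penalizes lost clinical information.
    """
    return sum(w * teacher(redacted_doc) for teacher, w in zip(teachers, weights))

# Hypothetical stand-in teachers for illustration only.
reid_teacher = lambda doc: 1.0 if "John" in doc else 0.0       # leaked a name?
clinical_teacher = lambda doc: 0.0 if "diabetes" in doc else 1.0  # kept the diagnosis?

good = "[NAME] has diabetes."   # identifier masked, diagnosis preserved
bad = "John has [CONDITION]."   # the exact opposite failure mode
print(student_loss(good, [reid_teacher, clinical_teacher], [1.0, 1.0]))  # → 0.0
print(student_loss(bad, [reid_teacher, clinical_teacher], [1.0, 1.0]))   # → 2.0
```

The competing objectives are visible in the two examples: over-redacting satisfies the re-identification teacher but angers the clinical teacher, and vice versa.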
As an output from our Redact engine, Mendel provides you with:
Mendel worked with Mirador Analytics, a widely recognized expert in statistical disclosure risk analysis, to assess the performance of Redact. Across multiple assessments and heterogeneous datasets, Redact performed well above the HIPAA threshold, providing confidence that the processed datasets are sufficiently de-identified.
In this example, a total of 1,285 records were reviewed to determine the proportion of identifiers that were correctly masked from the processed records. To be considered compliant with HIPAA Privacy Rule requirements, the proportion of identifiers masked from all documents must exceed 99%. For this assessment, the proportion of identifiers that were successfully redacted was 99.85% – well above the standard for HIPAA compliance.
Mendel's Redact is part of an end-to-end solution that uses the power of a machine and the nuanced understanding of a clinician to structure unstructured patient data at scale. Want to learn about Mendel's process and modules? Contact hello@mendel.ai.