HLTH22: COTA and Google Partner to Use NLP to Harness Unstructured Oncology Data

What You Should Know:

– COTA, Inc., an oncology real-world data and analytics company, announced a partnership with Google Cloud to bring clarity to unstructured oncology data through the latest advancements in machine learning and natural language processing.

– Many leading real-world data companies, including COTA, rely on clinicians to manually curate oncology real-world data. While this is a reliable and trusted near-term solution to the challenges of abstracting and curating unstructured oncology data, scaling it across vast amounts of data is both time- and resource-intensive.

Why It Matters

While the vast majority of critical health data is now created and stored digitally, much of the information is still generated in an unstructured format. Free-text clinical notes and PDF documents remain largely invisible to algorithms that can mine structured data fields for key insights into patient care. As a result, clinicians and researchers may be unable to generate a complete picture of the patient’s journey and could miss opportunities to advance the standard of care and treatment options. 

In collaboration with Google Cloud, COTA will look to augment manual, human-led abstraction with technology-first abstraction and curation best practices. This approach will, over time, provide access to even more advanced data elements that may be buried in unstructured notes. For example, next-generation genomic sequencing is becoming particularly important for personalizing cancer care. However, the reports providers receive from the genetic testing labs are often in a PDF format. Traditional tools like optical character recognition (OCR) can’t accurately “read” the text in these PDF images, so this data often goes unused today.

“We are collaborating with COTA to build a series of new natural language processing models tailored specifically to unstructured oncology data, including emerging data such as genomic sequencing,” said Shweta Maniar, Director of Life Sciences Industry Solutions at Google Cloud. “By training these algorithms specifically on oncology information, we will partner with COTA in generating a much more complete understanding of what is happening in the cancer care setting and how a patient’s unique clinical history may impact their response to treatment.”