Linguamatics CaseStudy Payer PDF
Quick facts
Situation: A top-5 payer needed to mine member-related data from a mixture of unstructured
formats held in a data lake, to strengthen their analysis of Congestive Heart Failure (CHF)
populations. The payer wanted to integrate the extracted data with conventional data
warehousing and analytics approaches, to support improved patient stratification.
Solution: The payer implemented the Linguamatics Health I2E platform with an automation
workflow to ingest data from Hadoop, process it, and load into a data warehouse for analysis.
Success: The team demonstrated that Linguamatics Health can be integrated into existing
Hadoop and Netezza systems, to gather and use insights from unstructured data as part
of risk stratification analytics for CHF. The Linguamatics Health I2E infrastructure can be
easily extended to support other disease areas and risk factors such as diabetes, obesity,
and more.
Situation
Many payers are assessing how to improve stratification of patient populations using big data to fuel the
drive toward better member wellness. Risk stratification has so far been biased toward structured data, with
major investments in data warehouses, analytical tools, dashboards, and Master Data Management (MDM).
However, because of the growing availability of electronic health record (EHR) data in Continuity of Care
Document (CCD) format from their providers, extensive notes about members, and nurses’ notes, there
is huge untapped potential in unstructured data. To manage these documents, many groups are using
Hadoop and related technologies, which have proven to scale to the data volumes payers need to support.
But how can payers make effective use of unstructured data to stratify populations more effectively
when much of their infrastructure is tied to structured data, while the sources of unstructured data are
so varied? How can these data worlds be brought together?
Linguamatics teamed up with a top-5 payer to transform their unstructured data into fuel to drive risk
stratification. Unstructured data from CCD documents, nurses’ notes, and Optical Character Recognition (OCR)
documents, stored in Cloudera, needed to be loaded into Netezza in structured form.
Unstructured data is extracted from Cloudera Hadoop Distributed File System (HDFS) and passed through
an I2E pipeline to mine risk factors of interest (such as diseases and family history, and lifestyle factors
such as smoking), and then turned into structured data. This process is described in more detail in Figure 1.
I2E is used to extract, for example, a person’s smoking status to enable them to be grouped by smoking
behavior. The different ways this status can be expressed linguistically are incorporated into the query,
which returns a consistent, normalized value for each person’s status. Standard Extract,
Transform, Load (ETL) approaches are used to load the structured data output from I2E into Netezza.
Improving the efficiency and effectiveness of drug trial planning in this way reduces the cost of the whole
trial, and has the potential to minimize the time before the drug reaches the market.
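The smoking-status example described above can be illustrated with a minimal sketch. It is not an I2E query and does not use any Linguamatics API; the pattern list, the status labels (never, former, current, unknown), and the function names are assumptions made purely for illustration. It shows only the general idea of mapping varied wording to one normalized value per member, producing rows that a standard ETL step could load.

```python
import re

# Hypothetical surface patterns mapped to a normalized status.
# Order matters: negated and past forms are checked before the generic "smoker".
SMOKING_PATTERNS = [
    ("never", re.compile(
        r"\b(never smoked|non-?smoker|denies (smoking|tobacco)|no history of smoking)\b",
        re.I)),
    ("former", re.compile(
        r"\b(former smoker|ex-?smoker|quit smoking|stopped smoking)\b", re.I)),
    ("current", re.compile(r"\b(current smoker|smokes|smoker)\b", re.I)),
]


def normalize_smoking_status(text):
    """Return one normalized status for a free-text note (illustrative only)."""
    for status, pattern in SMOKING_PATTERNS:
        if pattern.search(text):
            return status
    return "unknown"


def to_rows(notes):
    """Turn (member_id, note_text) pairs into structured rows ready for loading."""
    return [
        {"member_id": member_id, "smoking_status": normalize_smoking_status(text)}
        for member_id, text in notes
    ]


if __name__ == "__main__":
    sample = [
        ("M001", "Patient is a non-smoker."),
        ("M002", "Ex-smoker, quit smoking in 2010."),
        ("M003", "Smokes 1 pack per day."),
    ]
    for row in to_rows(sample):
        print(row)
```

A real natural language processing query would handle far more variation (negation scope, family members, dates) than these few patterns; the point is only that many phrasings collapse to a small set of consistent values.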
Success
1 Unstructured member-related documents stored in HDFS in Hadoop are extracted and sent to the
Linguamatics Asynchronous Messaging Pipeline (AMP) via RESTful Web Services to manage information
extraction.
2 AMP distributes the documents across multiple I2E servers depending on the required workload.
If servers are down, or there are connection issues, AMP will reschedule the extraction jobs.
3 Multiple instances of I2E receive documents; these are indexed and information is extracted. Extracted
information may include diseases, medications, and lab values, as well as concepts such as lifestyle
factors (smoking, and alcohol and drug use), ambulatory status, living location, and social determinants
(social support network).
6 The end user/automated routine is able to run population analytics across structured and unstructured
data sets to improve risk stratification.
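The distribute-and-reschedule behavior described in step 2 can be sketched in a few lines. This is not the AMP implementation or its API: the round-robin policy, the retry limit, the ServerUnavailable exception, and the dispatch function are all hypothetical names and choices made for illustration. The source states only that AMP distributes documents across I2E servers according to workload and reschedules jobs when a server is down or a connection fails.

```python
from collections import deque


class ServerUnavailable(Exception):
    """Raised by a (hypothetical) server callable that cannot be reached."""


def dispatch(documents, servers, max_attempts=3):
    """Send each document to a server, requeueing it when the server fails.

    documents: iterable of (doc_id, text) pairs.
    servers: list of callables taking text and returning an extraction result;
             each may raise ServerUnavailable.
    Returns (results, failed): a dict of doc_id -> result, and a list of
    doc_ids that still failed after max_attempts tries.
    """
    queue = deque((doc_id, text, 0) for doc_id, text in documents)
    results, failed = {}, []
    turn = 0
    while queue:
        doc_id, text, attempts = queue.popleft()
        # Assumed policy: simple round-robin across the available servers.
        server = servers[turn % len(servers)]
        turn += 1
        try:
            results[doc_id] = server(text)
        except ServerUnavailable:
            if attempts + 1 < max_attempts:
                # Reschedule the job at the back of the queue.
                queue.append((doc_id, text, attempts + 1))
            else:
                failed.append(doc_id)
    return results, failed


if __name__ == "__main__":
    def healthy(text):
        return text.upper()

    def down(text):
        raise ServerUnavailable("server not reachable")

    docs = [("d1", "note a"), ("d2", "note b"), ("d3", "note c")]
    print(dispatch(docs, [healthy, down]))
```

A production scheduler would typically weigh server load and back off between retries; the sketch keeps only the requeue-on-failure idea.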
If you are interested in learning more about Linguamatics Health, I2E, and population health and
risk stratification, please email enquiries@linguamatics.com
© 2017 Linguamatics Ltd. The Linguamatics logo is a trademark of Linguamatics Ltd. All rights reserved. All other trademarks mentioned in this document
are the property of their respective owners.