Deep Phenotyping with Natural Language Processing Can Improve Diagnostics
Natural language processing, a form of AI, can help genetic specialists maximize their workloads

Calum Yacoubian, MD, is the associate director of NLP healthcare strategy at Linguamatics, an IQVIA company.
From my days as a medical student conducting research in the genomics department, I have come to appreciate the importance of documenting and characterizing the phenotypes of patients with suspected rare disease appropriately and precisely. In isolation, these features may not yield a diagnosis, but in combination, and set against diagnostic testing, they can unlock a life-changing diagnosis for many patients.
With an ever-increasing knowledge base around rare disease, and with genetic and diagnostic testing becoming more widespread, well-curated and accurate phenotype data have never been more important. The pace of these developments is beginning to outstrip the number of genetic specialists in medical centers. As a result, providers and diagnostic labs are turning to artificial intelligence (AI) technologies to manage the workload. A prominent example is natural language processing (NLP).
NLP is a form of AI that transforms complex unstructured, text-based data into well-curated structured data. In many ways, we can think of NLP as a computer that can rapidly read and understand the nuances of the medical documentation clinicians create in huge volumes every day; an estimated 80 percent of health care data exists in an unstructured format. Comprehensive clinical NLP tools let subject matter experts encode their domain expertise into the system and then deploy that technology against far greater volumes of data than they could ever process themselves. For rare disease and diagnostic labs, this means the patient data created when a patient is referred for testing can in turn be used to better understand the findings of those tests.
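To make the idea concrete, here is a minimal sketch of rule-based phenotype extraction from a free-text note. The pattern dictionary, function names, and sample note are all illustrative assumptions, not the approach of any particular product: a production clinical NLP system would add negation detection, context handling, and a full ontology lookup.

```python
import re

# Illustrative phrase patterns mapped to structured phenotype labels.
# A real clinical NLP system would use far richer vocabularies and
# handle negation ("no jaundice") and family history mentions.
PATTERNS = {
    "jaundice": re.compile(r"\bjaundiced?\b", re.IGNORECASE),
    "hepatomegaly": re.compile(r"\bhepatomegaly\b|\benlarged liver\b", re.IGNORECASE),
    "poor feeding": re.compile(r"\bpoor feeding\b|\bfeeding difficult(?:y|ies)\b", re.IGNORECASE),
}

def extract_phenotypes(note: str) -> list:
    """Scan a free-text clinical note and return structured phenotype records."""
    findings = []
    for label, pattern in PATTERNS.items():
        match = pattern.search(note)
        if match:
            findings.append({"phenotype": label, "evidence": match.group(0)})
    return findings

note = ("Term newborn admitted to NICU. Visibly jaundiced at 18 hours of life. "
        "Examination notes an enlarged liver and poor feeding.")
print(extract_phenotypes(note))
```

Each record pairs a normalized label with the exact span of text that supports it, which is what makes the output auditable by a clinician downstream.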
Take, for instance, a newborn admitted to the neonatal intensive care unit (NICU) with jaundice. During the first few hours of life, a barrage of investigations and diagnostic procedures will take place: full blood count, biochemistry, liver profile tests, abdominal ultrasound, height and weight, physical examination, and a thorough family history, to name only a few. Each of these generates a mixture of structured and unstructured data. NLP can mine these data and structure the information against standards such as the Human Phenotype Ontology (HPO), an ontology created and curated to capture the characteristic features of rare disease. The resulting data are compatible with any further genetic tests that may be carried out, and may even prove pivotal in analyzing those tests to identify potential disease-causing genetic variants.
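The normalization step described above can be sketched as a lookup from free-text phenotype labels to HPO terms. The tiny dictionary below stands in for a full ontology service; the HPO identifiers shown are believed correct but should be verified against the current HPO release before any real use.

```python
# Hedged sketch: normalize extracted phenotype mentions to HPO terms.
# In practice this lookup would be backed by the full HPO ontology
# (synonyms, hierarchy), not a hand-written dictionary.
HPO_LOOKUP = {
    "jaundice": ("HP:0000952", "Jaundice"),
    "hepatomegaly": ("HP:0002240", "Hepatomegaly"),
    "poor feeding": ("HP:0011968", "Feeding difficulties"),
}

def to_hpo(labels):
    """Map free-text phenotype labels to structured HPO records."""
    coded = []
    for label in labels:
        hpo = HPO_LOOKUP.get(label.lower())
        if hpo:
            coded.append({"hpo_id": hpo[0], "term": hpo[1], "source_text": label})
    return coded

# Labels as they might come out of an upstream NLP extraction step:
print(to_hpo(["Jaundice", "hepatomegaly", "poor feeding"]))
```

Once mentions are coded this way, the same HPO terms can be fed directly into variant prioritization tools alongside genetic test results.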
Of course, the ability to create deep phenotypes alongside diagnostic tests extends far beyond rare disease into other areas of health care: enabling precision oncology, transforming pathology reports into structured information on disease stage and progression, and extracting social determinants of health from clinical notes so that diagnostic data can be correlated with social characteristics for health equity initiatives. Providers and companies that take a holistic approach to this often feared and ignored data will set themselves up to succeed, delivering higher quality and more impactful data than those that do not.