Natural Language Processing Algorithm Identifies Dysplasia in Barrett Esophagus

Natural language processing tools can be used to process clinical text for research, quality improvement, and other purposes.

Among individuals with Barrett esophagus (BE), natural language processing (NLP) can distinguish dysplasia from other disorders with a high degree of accuracy and sensitivity, according to study findings published in Clinical Gastroenterology and Hepatology.

Researchers tested the sensitivity and accuracy of an algorithm that identifies dysplasia from pathology report findings. The algorithm was developed with the Clinical Language Annotation, Modeling, and Processing (CLAMP) Toolkit, an NLP software package. A total of 561 reports from individuals with suspected BE were randomly selected, and 317 more reports were included for validation. Manual review of these pathology reports was used to verify BE and dysplasia. Algorithm performance was measured by recall, accuracy, precision, and F-measure.
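For readers less familiar with these four measures, the sketch below shows how they are conventionally computed once an algorithm's labels have been compared against manual review. The function name, the `Metrics` container, and the example counts are hypothetical illustrations; they are not code or data from the study.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Metrics:
    """Compute standard binary-classification metrics from confusion counts.

    tp: reports flagged as dysplasia that manual review confirmed
    fp: reports flagged as dysplasia that manual review rejected
    fn: dysplasia reports the algorithm failed to flag
    tn: non-dysplasia reports correctly left unflagged
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one report is required")

    accuracy = (tp + tn) / total
    # Precision: of everything flagged, how much was truly dysplasia.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): of all true dysplasia, how much was flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f_measure)


# Hypothetical counts chosen only to illustrate the arithmetic.
m = classification_metrics(tp=8, fp=2, fn=2, tn=88)
print(m)
```

With a rare finding such as dysplasia, accuracy alone can look high even when many cases are missed, which is presumably why precision, recall, and F-measure are reported alongside it.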

Among the 561 individuals with suspected BE, 457 (81.5%) had confirmed BE and 60 (10.6%) had dysplasia. The NLP algorithm was 98.0% accurate at identifying dysplasia, with 93.2% precision, 91.7% recall, and an F-measure of 92.4%. The algorithm correctly classified all 7 individuals with high-grade dysplasia as having dysplasia; only 5 (8.3%) of the 60 individuals with dysplasia were incorrectly classified. In the validation cohort, 230 (72.6%) had confirmed BE and 39 (12.3%) had dysplasia. In this cohort, the NLP algorithm was 98.7% accurate at identifying dysplasia, with 100.0% precision, 92.3% recall, and an F-measure of 96.0%.
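The reported F-measures can be cross-checked against the reported precision and recall, because the F-measure is conventionally the harmonic mean of the two. The short sketch below does this using only the percentages quoted above; the function name and the cohort labels are illustrative choices, and the study's own computation is assumed, not confirmed, to follow the same standard formula.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the article, expressed as fractions.
reported = {
    "initial cohort": (0.932, 0.917),
    "validation cohort": (1.000, 0.923),
}

for cohort, (precision, recall) in reported.items():
    print(f"{cohort}: F-measure = {f_measure(precision, recall):.1%}")
# initial cohort: F-measure = 92.4%
# validation cohort: F-measure = 96.0%
```

Both values match the figures in the article to one decimal place.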

Limitations to this study include potentially limited generalizability, an inability to detect confirmation of dysplasia from a second pathologist, potential misclassification of dysplasia, the need for manual supervision, and the fact that study participants performed the sampling.

The study authors conclude, “[W]e developed and validated an NLP algorithm that accurately identifies patients with BE and dysplasia from pathology reports.” They add, “Future studies may include the broader application of the algorithm to larger BE cohorts for research purposes or to perform quality improvement initiatives.”


Wenker TN, Natarajan Y, Caskey K, et al. Using natural language processing to automatically identify dysplasia in pathology reports for patients with Barrett’s esophagus. Clin Gastroenterol Hepatol. Published online September 14, 2022. doi: 10.1016/j.cgh.2022.09.00