Label Attention Meets Deep Learning: A Novel Model for ICD-11 Automatic Classification of Chinese EMRs
Published 28 January, 2026
Electronic medical records (EMRs) enable healthcare institutions to digitally document patients' clinical conditions, treatment processes, and diagnostic outcomes, supporting paperless clinical workflows. However, the large volume of unstructured clinical data has introduced new challenges for disease classification and coding. The International Classification of Diseases (ICD), developed by the World Health Organization (WHO), provides a standardized framework for categorizing diseases based on etiology, pathology, clinical presentation, and anatomical location, with ICD-11 as the latest version. Automated ICD classification and coding of EMRs can substantially reduce the workload of medical coding departments and serves as a critical foundation for the effective use of EMRs in clinical practice and medical research.
A team of researchers from the Medical Record Department of Peking Union Medical College Hospital & WHO Family of International Classification Collaborating Center in China recently developed a novel deep learning model, LA-TextCNN-BiLSTM, that significantly improves the accuracy of automatic disease classification using the latest ICD-11, according to a study published in Informatics and Health. Evaluated on real-world EMR data, the model achieved an accuracy of 83.86%, demonstrating robust performance in multi-label classification.
Traditional manual coding is time-consuming and error-prone. To automate it, the research team leveraged MC-BERT, a language model pretrained on Chinese biomedical text, to better capture clinical semantics from EMRs. The researchers also integrated a label attention mechanism that uses semantic information from the ICD-11 codes themselves to guide the model toward diagnostically relevant text, reducing noise from redundant clinical descriptions.
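To make the idea of label attention concrete, here is a minimal sketch in Python with NumPy. It is not the authors' LA-TextCNN-BiLSTM implementation; it only illustrates the general technique, in which each label embedding attends over the token representations of a record and produces its own label-specific summary. The function name `label_attention`, the dimensions, and the random inputs are all assumptions for illustration. In a real system the token vectors would come from an encoder such as MC-BERT, and the label vectors from the text of the ICD-11 entries.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def label_attention(H, U, W, b):
    """Generic label attention for multi-label classification (illustrative).

    H: (n_tokens, dim) token representations of one record (hypothetical encoder output)
    U: (n_labels, dim) label embeddings (hypothetically derived from code descriptions)
    W: (n_labels, dim) per-label output weights
    b: (n_labels,)     per-label output biases
    """
    scores = U @ H.T                   # (n_labels, n_tokens) label-token similarity
    A = softmax(scores, axis=1)        # each label spreads its attention over the tokens
    V = A @ H                          # (n_labels, dim) label-specific record summaries
    logits = (W * V).sum(axis=1) + b   # one logit per label
    probs = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per label
    return probs, A

# Random stand-ins for real encoder outputs and label embeddings (assumptions).
rng = np.random.default_rng(0)
n_tokens, n_labels, dim = 12, 5, 8
H = rng.normal(size=(n_tokens, dim))
U = rng.normal(size=(n_labels, dim))
W = rng.normal(size=(n_labels, dim))
b = np.zeros(n_labels)

probs, A = label_attention(H, U, W, b)
predicted = np.flatnonzero(probs >= 0.5)  # indices of labels assigned to the record
print("label probabilities:", np.round(probs, 3))
print("predicted label indices:", predicted)
```

Because every label has its own attention row in `A`, a record can be assigned several codes at once, and each row shows which tokens most influenced that label's score.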
“In the International Classification of Diseases (ICD) system, classification codes are more than symbolic representations; each code carries specific taxonomic significance and clinical meaning. Compared with earlier versions, ICD-11 provides substantially more detailed clinical descriptions of diagnostic entries. Building upon this advancement, our work utilized the semantic information of ICD-11 entries through a label attention mechanism to further improve model performance,” says corresponding author Li Naishi.
“This advancement may pave the way for streamlining hospital workflows, enhancing data usability in research, and supporting intelligent healthcare systems—particularly in Chinese-speaking medical environments where language complexity poses unique NLP challenges,” adds Li.
Contact author:
Naishi Li, Medical Record Department of Peking Union Medical College Hospital. lns@medmail.com.cn
Funder:
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Conflict of interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
See the article:
Bocheng Li, Jingya Zhou, Naishi Li, Yi Wang, LA-TextCNN-BiLSTM: A classification model for ICD-11, Informatics and Health, Volume 3, Issue 1, 2026, Pages 4-9, https://doi.org/10.1016/j.infoh.2025.12.004.