Semantic and Structural Analysis of MIMIC-CXR radiography reports with NLP Methods

Ege Erberk Uslu; Emine Sezer; Zekeriya Anıl Güven

doi:10.2339/politeknik.1395811

Research Article

Year 2024, EARLY VIEW, 1 - 1

Abstract

Artificial intelligence, which aims to imitate human decision-making processes using human knowledge as a foundation, is a critical research area with practical applications across many disciplines. In the health domain, machine learning and image processing techniques are increasingly used to assist in diagnosing diseases from laboratory results, clinical findings, and MRI, tomography, or other radiology images. Many healthcare reports, such as epicrisis summaries prepared by clinical experts, also contain crucial and valuable information. Beyond information extraction from healthcare reports, applications such as automatic healthcare report generation are among the natural language processing research areas that build on this knowledge and experience. The primary goals of such applications are to reduce the workload of clinical experts, minimize the likelihood of errors, and save time to speed up the diagnosis process. The MIMIC-CXR dataset is a large dataset consisting of chest radiographs and the associated reports prepared by radiology experts. This study focuses on the structural and semantic analysis of MIMIC-CXR radiography reports. Before developing a natural language processing-based model, preprocessing steps were applied to the dataset, and the results of the syntactic and semantic analyses performed on the unstructured report dataset are presented. This study is expected to provide insights for developing language models, particularly natural language processing models built on the MIMIC-CXR dataset.
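
The abstract does not list the specific preprocessing or analysis steps, so the following is only a minimal sketch of what a structural pass over a free-text chest radiograph report might look like. It assumes that reports mark sections with upper-case headers such as FINDINGS and IMPRESSION, and it counts word bigrams as one simple structural statistic. The sample text is made up for illustration, and the helper names (`split_sections`, `tokenize`, `bigram_counts`) are hypothetical, not the authors' code.

```python
import re
from collections import Counter

# Made-up report text for illustration only; not taken from MIMIC-CXR.
SAMPLE_REPORT = """
FINDINGS: The lungs are clear. No pleural effusion or pneumothorax.
The heart size is normal.
IMPRESSION: No acute cardiopulmonary process.
"""

# Assumption: a section header is an upper-case phrase at the start of a
# line, followed by a colon.
SECTION_RE = re.compile(r"^([A-Z][A-Z ]+):", re.MULTILINE)


def split_sections(report):
    """Map each section header to the text that follows it."""
    matches = list(SECTION_RE.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        # A section ends where the next header starts (or at end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1).strip()] = report[m.end():end].strip()
    return sections


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def bigram_counts(tokens):
    """Count adjacent token pairs; sentence boundaries are ignored."""
    return Counter(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    for name, body in split_sections(SAMPLE_REPORT).items():
        tokens = tokenize(body)
        print(name, len(tokens), bigram_counts(tokens).most_common(3))
```

A semantic pass (for example, lemmatization or biomedical named entity recognition) would need a dedicated model or library and is deliberately left out of this sketch.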

Keywords

Natural language processing, MIMIC-CXR, chest radiology report, structural analysis, semantic analysis.

MIMIC-CXR Radyoloji Raporlarının DDİ Yöntemleriyle Anlamsal ve Yapısal Analizi (Semantic and Structural Analysis of MIMIC-CXR Radiology Reports with NLP Methods)

Abstract

Artificial intelligence is a critical research area that aims to imitate human decision-making processes on the basis of human knowledge and has practical applications across many disciplines. In the health domain, machine learning and image processing techniques are increasingly used to help diagnose diseases, drawing on sources such as laboratory results, findings, MRI, tomography, or radiology images. In addition, many healthcare reports, such as epicrisis summaries prepared by clinical experts, contain critical and valuable information. Besides extracting information from healthcare reports, applications such as automatic healthcare report generation are among the natural language processing research areas based on knowledge and experience. The main goals of such applications are to reduce the workload of clinical experts, minimize the likelihood of errors, and save time in order to speed up the diagnosis process. The MIMIC-CXR dataset is a large dataset consisting of chest X-rays taken by radiology experts and the reports related to these images. This study focuses on the structural and semantic analysis of MIMIC-CXR radiology reports. Before a natural language processing (NLP)-based model is developed, preprocessing steps are applied to the dataset, and the results of the structural and semantic analyses carried out on the unstructured report datasets are presented. This study is expected to guide researchers in developing language models, in particular a natural language processing model on the MIMIC-CXR dataset.

Keywords

Natural language processing, MIMIC-CXR, chest radiology report, structural analysis, semantic analysis

Details

Primary Language	English
Subjects	Natural Language Processing
Journal Section	Research Article
Authors	Ege Erberk Uslu 0000-0001-9119-8574 Emine Sezer 0000-0003-4776-6436 Zekeriya Anıl Güven 0000-0002-7025-2815
Early Pub Date	February 2, 2024
Publication Date
Submission Date	November 26, 2023
Acceptance Date	December 25, 2023
Published in Issue	Year 2024 EARLY VIEW

Cite

APA	Uslu, E. E., Sezer, E., & Güven, Z. A. (2024). Semantic and Structural Analysis of MIMIC-CXR radiography reports with NLP Methods. Politeknik Dergisi, 1-1. https://doi.org/10.2339/politeknik.1395811
AMA	Uslu EE, Sezer E, Güven ZA. Semantic and Structural Analysis of MIMIC-CXR radiography reports with NLP Methods. Politeknik Dergisi. Published online February 1, 2024:1-1. doi:10.2339/politeknik.1395811
Chicago	Uslu, Ege Erberk, Emine Sezer, and Zekeriya Anıl Güven. “Semantic and Structural Analysis of MIMIC-CXR Radiography Reports With NLP Methods”. Politeknik Dergisi, February 2024, 1-1. https://doi.org/10.2339/politeknik.1395811.
EndNote	Uslu EE, Sezer E, Güven ZA (February 1, 2024) Semantic and Structural Analysis of MIMIC-CXR radiography reports with NLP Methods. Politeknik Dergisi 1–1.
IEEE	E. E. Uslu, E. Sezer, and Z. A. Güven, “Semantic and Structural Analysis of MIMIC-CXR radiography reports with NLP Methods”, Politeknik Dergisi, pp. 1–1, February 2024, doi: 10.2339/politeknik.1395811.
ISNAD	Uslu, Ege Erberk et al. “Semantic and Structural Analysis of MIMIC-CXR Radiography Reports With NLP Methods”. Politeknik Dergisi. February 2024. 1-1. https://doi.org/10.2339/politeknik.1395811.
JAMA	Uslu EE, Sezer E, Güven ZA. Semantic and Structural Analysis of MIMIC-CXR radiography reports with NLP Methods. Politeknik Dergisi. 2024:1–1.
MLA	Uslu, Ege Erberk et al. “Semantic and Structural Analysis of MIMIC-CXR Radiography Reports With NLP Methods”. Politeknik Dergisi, 2024, pp. 1-1, doi:10.2339/politeknik.1395811.
Vancouver	Uslu EE, Sezer E, Güven ZA. Semantic and Structural Analysis of MIMIC-CXR radiography reports with NLP Methods. Politeknik Dergisi. 2024:1-1.

This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International.