Preventive Medicine, 2021, Vol. 33, Issue 3: 255-258. DOI: 10.19485/j.cnki.issn2096-5087.2021.03.009
Original Article
Automated classification of ICD-O-3 morphology code from pathology reports using text-mining and support vector machine
PAN Jin, GONG Weiwei, FEI Fangrong, WANG Meng, ZHOU Xiaoyan, HU Ruying, ZHONG Jieming
Department of Non-communicable Disease Control and Prevention, Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, Zhejiang 310051, China
Abstract Objective: To evaluate the accuracy of automated classification of ICD-O-3 morphology codes from pathology reports using text mining combined with a support vector machine (SVM), and to provide a reference for tumor classification and coding research in Chinese-language settings. Methods: Tumor report cards of registered residents of Zhejiang Province from 2017 to 2019 were collected from the Chronic Disease Surveillance Information Management System of Zhejiang Province. Keywords were extracted from the pathology text according to ICD-O-3 codes, and an SVM was used for automated classification. The results were compared with the classifications made by 16 professionals with more than two years of experience in tumor coding, and the accuracy rate, the recall rate and their harmonic mean (F-score) were calculated to evaluate classification performance. Results: A total of 83 082 tumor report cards from 2017 to 2019 were included, covering 17 morphological classifications; adenocarcinoma, squamous cell carcinoma and transitional cell carcinoma predominated, accounting for 52 877 cases (63.65%). Text analysis yielded 1 090 keywords. The accuracy rate was 77.20%, the recall rate was 96.27% and the F-score was 85.69. Conclusion: Text mining combined with SVM can improve the efficiency of automated ICD-O-3 morphology classification, but its accuracy needs further improvement.
Key words: neoplasm; pathology; text-mining; support vector machine; automated classification
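The abstract describes the F-score as the harmonic mean of the accuracy rate and the recall rate. The short Python sketch below checks that the three reported figures are mutually consistent under that definition. The function and variable names are illustrative assumptions and do not come from the article, which does not describe its software.

```python
def f_score(accuracy_rate: float, recall_rate: float) -> float:
    """Harmonic mean of two rates, each given as a fraction in [0, 1]."""
    total = accuracy_rate + recall_rate
    if total == 0:
        return 0.0  # avoid division by zero when both rates are zero
    return 2 * accuracy_rate * recall_rate / total


# Figures reported in the abstract, converted from percentages to fractions.
reported_accuracy_rate = 0.7720
reported_recall_rate = 0.9627

f_value = f_score(reported_accuracy_rate, reported_recall_rate)
print(f"F-score: {100 * f_value:.2f}")  # prints "F-score: 85.69"
```

The result matches the reported F-score of 85.69, which suggests the value is expressed on a percentage scale. Because the harmonic mean is pulled toward the smaller of its two inputs, the combined score sits closer to the 77.20% accuracy rate than a simple average would.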
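Received: 2020-06-09; Revised: 2020-12-21; Published: 2021-03-10
CLC number: R181.2
Funding: Zhejiang Provincial Medical and Health Science and Technology Program (2018PY007, 2019KY355)
Corresponding author: ZHONG Jieming, E-mail: jmzhong@cdc.zj.cn
About the author: PAN Jin, master's degree, physician-in-charge, mainly engaged in chronic disease epidemiology and surveillance informatics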
Cite this article:
PAN Jin, GONG Weiwei, FEI Fangrong, WANG Meng, ZHOU Xiaoyan, HU Ruying, ZHONG Jieming. Automated classification of ICD-O-3 morphology code from pathology reports using text-mining and support vector machine. Preventive Medicine, 2021, 33(3): 255-258.
Link to this article:
http://www.zjyfyxzz.com/CN/10.19485/j.cnki.issn2096-5087.2021.03.009 or http://www.zjyfyxzz.com/CN/Y2021/V33/I3/255