Preventive Medicine  2021, Vol. 33 Issue (3): 255-258    DOI: 10.19485/j.cnki.issn2096-5087.2021.03.009
Original Article
Automated classification of ICD-O-3 morphology code from pathology reports using text-mining and support vector machine
PAN Jin, GONG Weiwei, FEI Fangrong, WANG Meng, ZHOU Xiaoyan, HU Ruying, ZHONG Jieming
Department of Non-communicable Disease Control and Prevention, Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, Zhejiang 310051, China
Abstract  Objective: To evaluate the accuracy of automated classification of ICD-O-3 morphology codes from pathology reports using text-mining combined with a support vector machine (SVM), in order to provide a reference for automated tumor coding in Chinese. Methods: Tumor report cards of registered residents of Zhejiang Province from 2017 to 2019 were collected from the Chronic Disease Surveillance Information Management System of Zhejiang Province. Keywords were extracted from the pathology text according to ICD-O-3 codes, and SVM was used for automated classification. The results were compared with the classifications made by 16 professionals with more than two years of experience in tumor coding, and precision, recall, and their harmonic mean (F-score) were calculated to evaluate classification performance. Results: A total of 83 082 tumor report cards from 2017 to 2019 were included and grouped into 17 morphological classifications; adenocarcinoma and squamous and transitional cell carcinoma predominated, with 52 877 cases (63.65%). Text-mining selected 1 090 keywords. Precision was 77.20%, recall was 96.27%, and the F-score was 85.69. Conclusion: Text-mining combined with SVM can improve the efficiency of automated ICD-O-3 morphology classification; however, its accuracy needs further improvement.
Key words: neoplasm; pathology; text-mining; support vector machine; automated classification
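The reported F-score can be checked against the other two reported rates. The following is a minimal sketch, assuming the F-score is the balanced harmonic mean (the F1 form) and treating the reported 77.20% as precision and 96.27% as recall; the function name `f_score` is illustrative and does not come from the article.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall.

    Assumption: the article's F value is this F1 form; the page does not
    state a weighting between precision and recall.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both rates are zero
    return 2 * precision * recall / (precision + recall)


# Rates reported in the abstract, expressed as fractions.
reported_precision = 0.7720
reported_recall = 0.9627

f_percent = 100 * f_score(reported_precision, reported_recall)
print(f"F-score: {f_percent:.2f}")  # prints "F-score: 85.69"
```

Under these assumptions the computed value agrees with the reported 85.69, which suggests the F value is expressed on a percentage scale.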
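Received: 2020-06-09      Revised: 2020-12-21      Published: 2021-03-10
Chinese Library Classification number:  R181.2
Funding: Zhejiang Provincial Medical and Health Science and Technology Program (2018PY007, 2019KY355)
About the author: PAN Jin, master's degree, physician-in-charge, mainly engaged in chronic disease epidemiology and surveillance informatics
Corresponding author: ZHONG Jieming, E-mail: jmzhong@cdc.zj.cn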
Cite this article:
PAN Jin, GONG Weiwei, FEI Fangrong, WANG Meng, ZHOU Xiaoyan, HU Ruying, ZHONG Jieming. Automated classification of ICD-O-3 morphology code from pathology reports using text-mining and support vector machine. Preventive Medicine, 2021, 33(3): 255-258.
Link to this article:
https://www.zjyfyxzz.com/CN/10.19485/j.cnki.issn2096-5087.2021.03.009      or      https://www.zjyfyxzz.com/CN/Y2021/V33/I3/255