Korean medicine Data Center: a systematic, integrated information bank for scientifically elucidating the clinical phenomena of Korean medicine
Publication entry
Title: Association of Morphological Features with Hematocrit Levels in Korean Adults: Classification Approach using Machine Learning
Date registered: 2015-12-11
Category: SCI
Journal: Life Science Journal-Acta Zhengzhou University Overseas Edition
Publication date: 2014-07-17
Authors: Bum Ju Lee (이범주), Jong Yeol Kim (김종열)
Iron deficiency, which is used to diagnose anemia, is often identified by the measurement of hematocrit levels, and hematocrit levels have been associated with specific morphological features. However, no studies have evaluated the best indicator for identifying hematocrit levels derived from morphological features using machine learning. The objectives of the present study were to identify the best indicator of hematocrit levels among several morphological features and to predict hematocrit levels using a combination of morphological features based on data mining techniques. A total of 1,838 subjects participated in this study. We used two machine learning algorithms, logistic regression (LR) and naive Bayes (NB), to identify the best indicator among several morphological features. To overcome the class imbalance problem and select important features, the synthetic minority over-sampling technique and wrapper-based variable selection were applied to the data set in prediction experiments using combined features. Among all individual features, the best indicator for predicting high and low hematocrit levels was age (p < 0.0001; OR = 0.352; AUC = 0.756 by naive Bayes and 0.759 by logistic regression), and among all morphological features, the strongest predictor was body weight (p < 0.0001; OR = 1.724; AUC = 0.639 by naive Bayes and 0.641 by logistic regression). For the combined features examined, the area under the receiver operating characteristic curve (AUC) of the four models ranged from 0.745 to 0.789. The method using the NB algorithm with wrapper-based variable selection showed the best predictive accuracy (AUC = 0.789; MCC = 0.445) and proved suitable for the prediction of low and high hematocrit levels; this method decreased the model complexity, resulting in the best prediction accuracy and providing the most cost-effective approach.
The findings of the present study provide medical knowledge for primary screening and support the use of tools to predict hematocrit levels in both on-site and remote-site healthcare services.
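
The abstract reports model quality as AUC and MCC (the Matthews correlation coefficient). As a reading aid, the following minimal sketch shows how these two measures are commonly computed for a binary high/low classification. It is not the authors' code: the function names `auc` and `mcc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are unrelated to the study data.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mcc(labels, predictions):
    """Matthews correlation coefficient from the 2x2 confusion matrix.
    Returns 0.0 when any marginal total is zero (a common convention)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy example (invented values): 1 = one class, 0 = the other.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
# Hypothetical 0.5 threshold turns scores into hard class predictions.
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(round(auc(labels, scores), 3))       # 0.933
print(round(mcc(labels, predictions), 3))  # 0.467
```

AUC depends only on how the scores rank the two classes, whereas MCC depends on the hard predictions at a chosen threshold, which is why the two measures can tell different stories on an imbalanced data set.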

*Full-text requests: kdc@kiom.re.kr