Objectives: The body mass index (BMI) provides essential medical information related to body weight for the treatment and prognosis prediction of diseases such as cardiovascular disease, diabetes, and stroke. We propose a method for the prediction of normal, overweight, and obese classes based only on the combination of voice features that are associated with BMI status, independently of weight and height measurements.
Materials and methods: A total of 1568 subjects were divided into four groups by age and gender. We used analysis of variance (ANOVA) and the Scheffé test to identify statistically significant features in each group. Using these features, we predicted BMI status (normal, overweight, or obese) with a logistic regression algorithm and two ensemble classification algorithms (bagging and random forests).
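As an illustration only, the pipeline described above (ANOVA-based feature screening followed by logistic regression, bagging, and random forests, evaluated by per-class AUC) might be sketched as follows. This is a minimal sketch on synthetic data, not the study's implementation: the feature matrix, the 0.05 significance threshold, the hyperparameters, and the helper names `select_features_by_anova` and `per_class_auc` are all assumptions, and the Scheffé post hoc test and any class-balancing step are omitted because the text does not specify them.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import label_binarize

CLASS_NAMES = ["normal", "overweight", "obese"]  # encoded as 0, 1, 2


def select_features_by_anova(X, y, alpha=0.05):
    """Return indices of columns whose one-way ANOVA p-value is below alpha."""
    selected = []
    for j in range(X.shape[1]):
        groups = [X[y == c, j] for c in np.unique(y)]
        _, p_value = f_oneway(*groups)
        if p_value < alpha:
            selected.append(j)
    return selected


def per_class_auc(model, X, y, cv=5):
    """One-vs-rest AUC per class from cross-validated class probabilities."""
    classes = np.unique(y)
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")
    y_bin = label_binarize(y, classes=classes)
    return {
        CLASS_NAMES[c]: roc_auc_score(y_bin[:, k], proba[:, k])
        for k, c in enumerate(classes)
    }


# Synthetic stand-in for voice features (hypothetical data, not from the study).
rng = np.random.default_rng(0)
n_subjects = 300
y = rng.integers(0, 3, size=n_subjects)
X = rng.normal(size=(n_subjects, 6))
X[:, 0] += 1.0 * y  # feature 0 rises with BMI class
X[:, 1] -= 0.8 * y  # feature 1 falls with BMI class

selected = select_features_by_anova(X, y)
X_selected = X[:, selected]

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
results = {name: per_class_auc(m, X_selected, y) for name, m in models.items()}

for name, aucs in results.items():
    print(name, {cls: round(auc, 3) for cls, auc in aucs.items()})
```

For brevity, the sketch screens features on the full synthetic data set before cross-validation; in a real analysis the screening would normally be repeated inside each training fold so that the held-out AUC is not optimistically biased.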
Results: In the Female-2030 group (females aged 20–40 years), classification experiments on the imbalanced (original) data set gave area under the receiver operating characteristic curve (AUC) values of 0.569–0.731 with logistic regression, whereas experiments on a balanced data set gave AUC values of 0.893–0.994 with random forests. With logistic regression on the imbalanced data, AUC values in the Female-4050 (females aged 41–60 years), Male-2030 (males aged 20–40 years), and Male-4050 (males aged 41–60 years) groups were 0.585–0.654, 0.581–0.614, and 0.557–0.653, respectively. On the balanced data, AUC values in the Female-4050, Male-2030, and Male-4050 groups were 0.629–0.893 with bagging, 0.707–0.916 with random forests, and 0.695–0.854 with bagging, respectively. In each group, we found discriminatory features that differed significantly among the normal, overweight, and obese classes. On the imbalanced data, the classification models built by logistic regression outperformed those built by the other two algorithms, and the significant features differed by age and gender group.
Conclusion: Our results could support the development of BMI diagnosis tools for real-time monitoring; such tools are considered helpful in improving automated BMI status diagnosis in remote healthcare or telemedicine and are expected to have applications in forensic and medical science.