Risk factors based vessel‐specific prediction for stages of coronary artery disease using Bayesian quantile regression machine learning method: Results from the PARADIGM registry

Abstract Background and Hypothesis The recently introduced Bayesian quantile regression (BQR) machine‐learning method enables comprehensive analysis of the relationships among complex clinical variables. We analyzed the relationship between multiple cardiovascular (CV) risk factors and different stages of coronary artery disease (CAD) using the BQR model in a vessel‐specific manner. Methods From the data of 1,463 patients obtained from the PARADIGM (NCT02803411) registry, we analyzed the lumen diameter stenosis (DS) of the three vessels: left anterior descending (LAD), left circumflex (LCx), and right coronary artery (RCA). Two models were developed, one for predicting DS and one for predicting DS change. Baseline CV risk factors, symptoms, and laboratory test results were used as the inputs. The conditional 10%, 25%, 50%, 75%, and 90% quantile functions of the maximum DS and DS change of the three vessels were estimated using the BQR model. Results The 90th percentiles of the DS of the three vessels and their maximum DS change were 41%–50% and 5.6%–7.3%, respectively. Typical anginal symptoms were associated with the highest quantile (90%) of DS in the LAD; diabetes with higher quantiles (75% and 90%) of DS in the LCx; dyslipidemia with the highest quantile (90%) of DS in the RCA; and shortness of breath showed some association with the LCx and RCA. Interestingly, high‐density lipoprotein cholesterol showed a dynamic association along DS change in the per‐patient analysis. Conclusions This study demonstrates the clinical utility of the BQR model for evaluating the comprehensive relationship between risk factors and baseline‐grade CAD and its progression.


| INTRODUCTION
Cardiovascular disease (CVD) is the primary cause of morbidity and mortality worldwide, with a global burden of 17 million deaths annually. 1 Of these, coronary artery disease (CAD) accounts for over 50% of the total deaths, and this number continues to increase. 2 Various physiological and behavioral cardiovascular (CV) risk factors have been found to be associated with the development of CAD. [3][4][5] Different symptoms can present according to lesion severity or location and their interrelationships. 5 Almost 60% of patients with stable chest pain exhibit non-obstructive stenotic CAD, with far fewer typical angina symptoms than in obstructive CAD. 6,7 In addition, various CV risk factors are associated with symptom presentation. 8,9

Coronary atherosclerosis is a chronic and progressive process; thus, detecting subclinical atherosclerosis and intervening in its early phase is important for clinical outcomes. 5,9 Therefore, comprehensive studies covering the early to severe stages of CAD are needed to optimize treatment. However, most previous research has focused on predicting obstructive CAD via standard regression analysis and has overlooked the early stage of CAD, because most deep and shallow machine learning models investigate only the average relationship between clinical outcome and risk factors. In contrast, the Bayesian quantile regression (BQR) model, a recently introduced machine learning method, is useful for analyzing the comprehensive association between clinical variables and the various stages of CAD because the BQR model yields multiple quantile regression curves. [10][11][12][13] It is particularly useful for revealing hidden, independent, dynamic associations of target clinical variables across the quantile stages of an endpoint in a complex database such as clinical data; thus, it can be applied to specific patients for tailored therapy.
Therefore, we aimed to apply the BQR model to the association analysis between graded subclinical and clinical coronary atherosclerosis and CV risk factors to evaluate vessel-specific dynamic interrelationships.

| Study design and population
We analyzed the data from Progression of AtheRosclerotic PlAque

| Data extraction and analysis
The baseline clinical characteristics and laboratory data were used as clinical variables, and the per-segment quantitative CCTA findings were used as the set of outcomes. We performed a vessel-wise analysis with these data at all outcome-level settings using the Bayesian truncated quantile regression model. For the vessel-wise analysis, all 18 coronary segments were classified into the following three vessel groups: left anterior descending (LAD), left circumflex (LCx), and right coronary artery (RCA). The largest quantitative DS measurement in each vessel (LAD, LCx, or RCA) was regarded as the representative value for that vessel, and the largest DS among the vessels was regarded as the representative value for each patient. The LAD was included most often (n = 1264), followed by the RCA (n = 864) and the LCx (n = 718). Figure S1 shows the histograms of DS values for the three vessels (LAD, LCx, and RCA) and each patient; the shapes of the histograms show that the data-generating distributions were not normal and were truncated. Figure S2 shows the histograms of DS changes (defined as post-DS minus pre-DS divided by CCTA intervals) for the three vessels (LAD, LCx, and RCA) and each patient.
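The aggregation described above can be illustrated with a minimal Python sketch. It is not the registry's actual pipeline: the segment labels, the `SEGMENT_TO_VESSEL` mapping (which lists only a few of the 18 segments), and the function names are hypothetical, and the unit of the CCTA interval is left to the caller because it is not specified here.

```python
# Hypothetical, partial mapping from segment labels to vessel groups.
# The labels are illustrative only; the study grouped all 18 segments.
SEGMENT_TO_VESSEL = {
    "pLAD": "LAD", "mLAD": "LAD", "dLAD": "LAD",
    "pLCx": "LCx", "dLCx": "LCx",
    "pRCA": "RCA", "mRCA": "RCA", "dRCA": "RCA",
}


def vessel_max_ds(segment_ds):
    """Return the largest DS (%) per vessel from a {segment: DS} mapping."""
    per_vessel = {}
    for segment, ds in segment_ds.items():
        vessel = SEGMENT_TO_VESSEL[segment]
        # Keep the running maximum for each vessel group.
        per_vessel[vessel] = max(ds, per_vessel.get(vessel, ds))
    return per_vessel


def patient_max_ds(per_vessel):
    """Return the largest DS across the vessels of one patient."""
    return max(per_vessel.values())


def ds_change(pre_ds, post_ds, interval):
    """Return (post-DS minus pre-DS) divided by the CCTA interval."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return (post_ds - pre_ds) / interval


# Made-up example values, not registry data.
segments = {"pLAD": 30.0, "mLAD": 45.0, "pLCx": 20.0, "pRCA": 10.0, "mRCA": 25.0}
per_vessel = vessel_max_ds(segments)
per_patient = patient_max_ds(per_vessel)
```

A vessel with no measured segment is simply absent from the result, which is consistent with the differing per-vessel sample sizes reported above.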
We tested the following two models: the DS model (Model 1) and DS change model (Model 2). Multiple CV risk factors including the symptom variables were used to predict quantile DS values for the three vessels and each patient in Model 1 and also used to predict quantile DS changes in Model 2.

| Quantile regression modeling
The quantile regression model for DS prediction (Model 1) was defined as follows:
$$DS_i = \beta_0^{\theta} + \beta_1^{\theta}\,\text{Baselines}_i + \beta_2^{\theta}\,\text{Symptom Types}_i + \beta_3^{\theta}\,\text{Lab Exams}_i + \epsilon^{\theta}$$
where Baselines_i were baseline CV risk factors including age, sex, body mass index (BMI), smoking, diabetes, hypertension, and dyslipidemia; Symptom Types_i were categorical risk factors denoting the types of patients' symptoms, comprising "typical angina, atypical angina, noncardiac pain, and others" with "asymptomatic" as the reference category; Lab Exams_i were continuous variables from laboratory examinations including high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglycerides (TG); and ϵ^θ was the error term with its θth quantile equal to zero (in our study, θ was 10%, 25%, 50%, 75%, and 90%).
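To make the condition "θth quantile of the error equal to zero" concrete, the following Python sketch shows the check (pinball) loss that underlies quantile regression in general. It is an illustration under simplifying assumptions, not the study's Bayesian truncated model: it has no covariates, no truncation, and no priors, and the function names and sample values are hypothetical.

```python
def check_loss(residual, theta):
    """Check (pinball) loss: rho_theta(u) = u * (theta - 1[u < 0])."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (theta - indicator)


def total_check_loss(values, candidate, theta):
    """Sum of check losses when `candidate` is used as the quantile."""
    return sum(check_loss(v - candidate, theta) for v in values)


def quantile_by_check_loss(values, theta):
    """Return the sample value that minimizes the total check loss.

    For a finite sample the minimum is attained at one of the data
    points, so searching over the observed values is sufficient.
    """
    candidates = sorted(set(values))
    return min(candidates, key=lambda c: total_check_loss(values, c, theta))


# Made-up DS-like values (percent), not registry data.
sample = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
quantiles = {theta: quantile_by_check_loss(sample, theta)
             for theta in (0.10, 0.25, 0.50, 0.75, 0.90)}
```

Under-predictions are weighted by θ and over-predictions by 1 − θ, so a high θ such as 0.90 pulls the fitted value toward the upper tail of the DS distribution. This is why separate coefficients can be estimated for each quantile level.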

| Statistical analysis
All statistical analyses were performed using R software (version 4.1.0, R Foundation for Statistical Computing) with the "ctqr" package. 15 Continuous variables were presented as means and standard deviations. Categorical variables were presented as frequencies and percentages. Prediction performance was evaluated using the area under the curve (AUC) values of the receiver operating characteristic curves.
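As an illustration of the AUC metric, the sketch below computes it in Python as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. How the continuous DS outcome was turned into binary labels is not stated here, so the labels and scores in the example are hypothetical, and `roc_auc` is an illustrative helper rather than the routine used in the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Equivalent to the area under the ROC curve; ties count as 0.5.
    `labels` must contain only 0 (negative) and 1 (positive).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (e.g., DS above some chosen cut-off) and predicted scores.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the sample size, which is acceptable for a sketch; a rank-based computation gives the same value more efficiently for large cohorts.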

| Study population and AUC values for overall and the three major vessels
The baseline characteristics of the study population are presented in

| DISCUSSION
In the present study, we demonstrated the clinical utility of the
Although Wehby et al. 20 first introduced the utility of the BQR model in the medical field by presenting the different risk factors for low and high birth weight, it has not been widely adopted, probably because its interpretation seems somewhat unintuitive, as the concept of quantiles is less familiar than that of means. 21 However, with the increased interest in machine learning methods in medical research, quantile regression has recently attracted attention as a valuable data analysis tool. 13 Kuhudzai et al. 22 [25][26][27] To the best of our knowledge, the present study is the first to apply BQR analysis to the prediction of CAD and especially for CAD
Recent studies have shown the possibility of deep learning-based novel methods for detecting CAD in its early stage utilizing a conventional twelve-lead electrocardiogram (ECG). 28
In conclusion, we introduced the BQR machine learning method in the CV field to evaluate the complex interrelationship between CV risk factors and the different stages of CAD and its progression.
Using this innovative method, we comprehensively determined the dominant association of each coronary vessel with symptoms or CV risk factors, which is clinically useful.