Comparison of Various Machine Learning Regression Models Based on Human Age Prediction

The development of machine learning strategies has made it possible to diagnose some diseases automatically from medical imaging data. Brain age is one factor that can serve as an indicator of cognitive well-being. Recent advances in machine learning have made it possible for computers to produce classification and prediction outcomes more accurately than humans. In this study, five widely used machine learning regression models (linear support vector regression (L-SVR), radial basis function support vector regression (RBF-SVR), relevance vector regression (RVR), Elastic Net, and Gaussian process regression (GPR)) were trained and evaluated to predict brain age from brain region volumes. Moreover, a dimensionality reduction technique was used to reduce the dimensionality of the input feature space. The data were collected from one hundred and eleven participants. The results showed no performance difference amongst models trained on the same type of data, suggesting that the type of input data had a stronger influence on prediction performance than the choice of model. The experimental results indicated that GPR was the best-fitting model (R² = 0.57, R = 0.75), while RBF-SVR was the worst-fitting model (R² = 0.0006, R = 0.025), given the limited amount of input data.


INTRODUCTION
Globally, aging and its accompanying health issues impose a significant burden on individuals and organizations. Cognitive decline and an increased risk of neurodegenerative illness are also linked with aging of the brain; however, the severity of these effects varies widely between individuals. The share of elderly people in the world's population is increasing rapidly and is expected to reach 17% globally, and 25% in Europe and North America, by the year 2050 [1]. The growing proportion of elderly people in the population is associated with higher overall social and economic costs [1]. Therefore, it is of the utmost importance to prevent or slow the progression of age-related health problems in their early stages and to reduce the indirect costs as well. For this reason, efforts to address the early stages of this challenge are constantly growing [2]. Brain age prediction is a method for measuring the effects of aging on the brain, based on the well-established link between age and neuroanatomy over the lifespan. The brain age gap, the difference between an individual's predicted age and their chronological age, is the primary outcome metric in brain age prediction. A positive brain age gap, commonly described as accelerated or premature aging, indicates that a person's predicted brain age is older than their real age. A negative brain age gap, sometimes referred to as delayed aging, indicates a younger predicted brain age [3]. To determine whether the terms accelerated or delayed aging are appropriate, however, more research into the neurobiological underpinnings of brain aging is necessary. Machine learning (ML) algorithms have made it possible to automatically predict diseases using medical imaging data [4,5]. In some situations, current advances in ML push prediction accuracy beyond human ability and can help with clinical diagnostic and treatment decisions [6,7].
In the field of neuroimaging, ML has proven successful in a number of applications for predictive and diagnostic analytics, including modelling and estimating the age of the brain [8][9][10]. Recently, there has been considerable interest in assessing how the brain ages by using ML techniques to predict brain age, often on the basis of structural magnetic resonance imaging (MRI) [3,9]. Several researchers have found strong relationships between predicted brain age and chronological age [8,11]. Jiang et al. (2020) introduced CNN-based age prediction models for images of seven structural brain networks of healthy persons. Furthermore, they assessed how well the CNN performed in age estimation compared with the GPR and RVR ML algorithms. The results showed that the CNN outperformed the GPR and RVR models in age prediction. They suggested that their CNN model could be used for brain age prediction to help diagnose disorders [12]. Peng et al. (2020) introduced a lightweight deep learning network for brain age prediction using T1-weighted MRI data. The algorithm was combined with other methods to improve performance, and the approach was compared with other widely used machine learning techniques. The results revealed that the introduced model was able to predict brain age and classify sex successfully [13]. Baecker et al. (2021) studied the performance of different ML approaches on voxel-based and region-based data sets. They found that the voxel-based models achieved the best overall performance using PCA; however, this strategy was computationally demanding and may not be practical when computational resources or time are restricted. In contrast, a training set of only 120 participants was sufficient for region-based RVR to achieve satisfactory results, and this approach was quick and easy to implement [14]. Baecker et al. (2021) introduced the concepts underlying brain age prediction using machine learning and discussed its potential clinical applications. They suggested that such technologies may facilitate earlier and more precise treatment of age-related diseases, and may support differential diagnosis, prediction, and therapy decisions as well as early detection of brain-based illnesses [3]. Ganaie et al. (2022) assessed brain age prediction using different ML regression techniques. The frameworks evaluated involved twenty-two algorithms. The best accuracy was achieved using the Quadratic SVR algorithm (R² = 0.88) and the lowest using the Binary Decision Tree technique (R² = 0.76), suggesting that prediction accuracy can be improved using advanced ML algorithms [15]. Ramírez et al. (2022) compared the age prediction accuracy of an extreme gradient boosting ML model and a partial least squares ML model. The former achieved a lower mean absolute error in predicting age than the latter. The authors indicated that cortical thickness in the temporal and parietal lobes gave better prediction accuracy than in the frontal and occipital lobes. Additionally, they suggested that integrating the prediction model and the interpretation process could help to reduce the gap between chronological and real brain age [16].
Predictive models of brain age developed from brain imaging use ML techniques to determine the apparent biological age of a person's brain from its structure and/or function. A person's brain age can differ significantly from their chronological age, and the age at which individual trajectories begin to deviate from population norms can indicate important aspects of a person's brain health at any point in life, from infancy to old age. Although many studies have been conducted in this field, many conceptual and technical obstacles remain before a person's brain age can be predicted effectively and brain age gaps interpreted correctly.
The selection of appropriate ML methods, particularly when working with small datasets, and of appropriate neuroimaging features are among the primary challenges. In this study, five widely used ML techniques (GPR, Elastic Net, L-SVR, RBF-SVR, and RVR) for estimating brain age were studied. Moreover, a feature selection step was examined to determine whether it improves the prediction accuracy of each approach. The effect of these ML approaches on models trained with structural MRI data was thus evaluated.

Participants
A total of 122 participants were recruited. One hundred and eleven participants (63 male, 48 female; mean age ± SD = 49 ± 17.3 years, as shown in figure 1) with normal or corrected-to-normal vision took part in the experiment. Eleven participants with contraindications to MRI (e.g., metal implants), claustrophobia, pregnancy, tumours, or neurological disorders were excluded.

Imaging Parameters
Participants were scanned in a 3 Tesla Siemens MAGNETOM MRI scanner at the Neurosurgery Teaching Hospital, Baghdad. T1-weighted structural scans were acquired with the following parameters: TR = 4.6 ms, TE = 1.4 ms, 192 sagittal slices, 1 mm³ isotropic voxels, and an image resolution of 256 × 256, as shown in figure 2.

Data Pre-processing
All raw structural data were preprocessed using the FreeSurfer software package, v5.0 (http://surfer.nmr.mgh.harvard.edu/). The technical details of this software package are described in [17]. Briefly, the preprocessing involves:
• Motion correction: correcting for subject head motion inside the MRI scanner.
• Intensity normalization: correcting the white matter (WM) intensity.
• Talairach registration: normalizing the subject's brain to a standard MNI template.
• Skull stripping: removing non-brain structures, e.g. skull, soft tissues, and CSF.

Data Analysis
All structural MRI data were further analysed using MATLAB R2019a (MathWorks), the LibSVM package, and an RVR package.

Data Preparation
After pre-processing all the structural MRI images, 162 ROI volumes were extracted for each participant and divided by the total intracranial volume (TIV) to account for differences in brain size and to normalize the volume data across participants. Each participant's one-dimensional volumetric data were then stacked, forming a two-dimensional feature matrix (111 subjects × 162 features).
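The normalization and stacking step can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the study itself used MATLAB, and the function names, dictionary keys, and toy volumes below are hypothetical.

```python
def normalise_by_tiv(roi_volumes, tiv):
    """Divide each ROI volume by the participant's total intracranial volume."""
    if tiv <= 0:
        raise ValueError("TIV must be positive")
    return [volume / tiv for volume in roi_volumes]


def build_feature_matrix(participants):
    """Stack one normalised row per participant into a list-of-lists matrix."""
    return [normalise_by_tiv(p["roi_volumes"], p["tiv"]) for p in participants]


# Toy data: 2 participants x 3 ROIs (the study used 111 x 162).
participants = [
    {"roi_volumes": [1500.0, 3000.0, 4500.0], "tiv": 1_500_000.0},
    {"roi_volumes": [1200.0, 2400.0, 3600.0], "tiv": 1_200_000.0},
]
X = build_feature_matrix(participants)
print(len(X), "participants x", len(X[0]), "features")
```

In this toy case the two participants have different absolute volumes but identical proportions, so their normalised rows coincide, which is the effect the TIV correction is meant to achieve.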

Prediction Algorithm
The prediction algorithm consists of two stages, feature selection using a filter algorithm (Regression ReliefF) followed by a regression algorithm (L-SVR, RBF-SVR, RVR, GPR, or Elastic Net) as shown in figure 5.

Regression ReliefF
Relief algorithms are successful, general attribute estimators that have been used in many settings [19]. They can detect conditional dependencies between attributes and give a unified view of attribute estimation in regression and classification [20]. They have traditionally been regarded as feature subset selection methods that are applied before the model is built to estimate the quality of the features in regression models [21]. RReliefF penalizes predictors that take different values for neighbours with identical response values and rewards predictors that take different values for neighbours with different response values. Intermediate weights are used to compute the final predictor weight for each predictor F_j (Equation 1). The RReliefF function is documented in the MATLAB help, which gives implementation details and further explanation.
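For reference, the final weight computation can be written as below. This is a sketch that follows the notation of the MATLAB `relieff` documentation; the symbol names are taken from that documentation and are an assumption here, not a quotation from this paper.

```latex
W_j = \frac{W_{dyj}}{W_{dy}} - \frac{W_{dj} - W_{dyj}}{m - W_{dy}}
```

Here $W_{dy}$ is the intermediate weight for neighbours having different response values, $W_{dj}$ the weight for neighbours having different values of predictor $F_j$, $W_{dyj}$ the weight for neighbours differing in both, and $m$ the number of update iterations. A predictor is rewarded through the first term and penalized through the second, which matches the verbal description above.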

Linear Support Vector Regression (L-SVR)
The support vector machine (SVM) is a supervised learning technique that uses a labelled training data set to build input-output mapping functions [22]. The mapping function can be either a classification or a regression function. SVMs have shown extremely competitive performance in a variety of real-world applications, including bioinformatics, neuroscience, face recognition, and image processing, establishing SVMs as one of the cutting-edge methods for machine learning and data mining, alongside other soft computing techniques. SVMs can be characterized as linear or nonlinear [14,23]. The linear SVR model seeks a flat hyperplane that deviates as little as possible from the training data. It differs from linear regression in that it does not aim to reduce every observed training error: SVR calculates error only on data points that lie outside a margin of tolerance [24]. The hyperparameter epsilon (ε) determines this margin of tolerance, that is, the maximum deviation from the hyperplane that a data point can have without being penalized [23]. The data points located on or outside this margin are known as support vectors (SVs) because they control the location of the hyperplane. The regularisation hyperparameter (C) is another factor that determines SVR performance. It is used to reduce overfitting by balancing the complexity of the hyperplane against the training errors [23,24]. New data can be predicted from the SVs using $f(x) = \sum_{n=1}^{N} (\alpha_n - \alpha_n^*)\,(x_n' x) + b$, subject to the conditions $\sum_{n=1}^{N} (\alpha_n - \alpha_n^*) = 0$, $0 \le \alpha_n \le C$ and $0 \le \alpha_n^* \le C$, where $N$ is the number of training points, $\alpha_n$ and $\alpha_n^*$ are Lagrange multipliers (nonnegative real numbers), and $b$ is the bias term.
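The two ingredients described above, an error that is ignored inside the ε margin and a prediction built from the support vectors, can be sketched in Python as follows. This is a minimal illustration and not the LibSVM implementation used in the study; the function names and toy numbers are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Error is zero inside the tolerance margin and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def linear_svr_predict(x, support_vectors, dual_coefs, bias):
    """Evaluate f(x) = sum_n coef_n * <x_n, x> + bias for a linear kernel.

    Each entry of dual_coefs is the difference between the two Lagrange
    multipliers of the corresponding support vector.
    """
    total = bias
    for x_n, coef in zip(support_vectors, dual_coefs):
        total += coef * sum(a * b for a, b in zip(x_n, x))
    return total


# Toy example: a 2-year miss is free with epsilon = 5, an 8-year miss is not.
print(epsilon_insensitive_loss(50.0, 52.0, epsilon=5.0))
print(epsilon_insensitive_loss(50.0, 58.0, epsilon=5.0))
```

Only training points with a nonzero coefficient contribute to the sum, which is why the fitted model can be stored as a set of support vectors rather than the whole training set.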

Nonlinear Support Vector Regression (RBF-SVR)
Sometimes a linear model cannot adequately address a regression problem. In this case, the linear SVM technique can be extended to nonlinear functions using the Lagrange dual formulation [24]. A nonlinear SVM regression model is obtained by replacing the dot product $x_1' x_2$ with a nonlinear kernel function $G(x_1, x_2) = \langle \varphi(x_1), \varphi(x_2) \rangle$, where $\varphi(x)$ is a transformation that maps $x$ to a high-dimensional space [24].
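For the RBF (Gaussian) kernel named in this section's title, the kernel value depends only on the squared distance between two inputs. The sketch below assumes the common parameterisation exp(−γ‖x1 − x2‖²), which is the convention used by LibSVM; the γ values and inputs are illustrative and are not the ones used in the study.

```python
import math


def rbf_kernel(x1, x2, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


# Identical inputs give the maximum similarity; distant inputs approach zero.
print(rbf_kernel([0.2, 0.4], [0.2, 0.4], gamma=0.5))
print(rbf_kernel([0.2, 0.4], [3.0, 5.0], gamma=0.5))
```

A larger γ makes the kernel decay faster with distance, so each support vector influences a smaller neighbourhood; this is one reason RBF-SVR needs its hyperparameters tuned carefully on small data sets.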

Relevance Vector Regression (RVR)
A relevance vector machine (RVM) is a machine learning technique that employs Bayesian inference to provide efficient solutions for regression and classification problems. The RVM has the same functional form as the SVM, but it performs probabilistic regression. It is essentially a Gaussian process model with a specific covariance function. Compared with the SVM, the Bayesian formulation of the RVM avoids the SVM's set of free parameters.
RVR employs a generalized linear model in a Bayesian formulation, which results in probabilistic rather than deterministic predictions [25]. This is accomplished by assuming a zero-mean normal prior distribution for the model weights and iteratively updating their precision values through an approximation procedure. During training, the weights of basis functions that the data do not support are set to zero, and those basis functions are pruned. RVR solutions are often sparser than SVR solutions, meaning that they retain fewer vectors than SVR retains SVs, which contributes to their greater robustness to outliers and better generalisation. Additionally, because RVR does not require hyperparameter tuning, it eliminates the need for approaches such as random or grid search, potentially making RVR training simpler and less computationally demanding [26]. However, because the learning approach is a variant of expectation maximisation, the optimisation is non-convex, making the solution susceptible to local minima [26].

Gaussian Process Regression (GPR)
GPR is a supervised machine learning model that takes a nonparametric Bayesian approach to standard regression [27]. It is commonly used for modelling unknown functions or surfaces in a wide range of applications, from regression and classification to spatial processes [28]. Rather than learning a single target value for each input, GPR infers a probability distribution over possible values. GPR requires the specification of a prior distribution in the form of a mean and a covariance. The prior is commonly assumed to be a multivariate Gaussian distribution with a mean of zero. Using Bayes' theorem, this prior distribution is then updated based on the target values in the training data. The resulting posterior distribution combines the information in the prior distribution with the observed data. If the prior distribution is assumed to be Gaussian, the predictive distribution for previously unseen data will also be Gaussian. The prediction for an unseen input is taken as the mean of this predictive distribution, and the uncertainty of the prediction as its variance. In this framework, RVR can be viewed as a sparse GPR with a predefined covariance [29].
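For a zero-mean prior with covariance (kernel) function $k$ and Gaussian observation noise, the predictive mean and variance described above take a standard closed form. The symbols $K$, $k_*$, $\sigma_n^2$ and $y$ are introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
\mu_* &= k_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} y, \\
\sigma_*^2 &= k(x_*, x_*) - k_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} k_*
\end{aligned}
```

Here $K$ is the matrix of kernel values between all pairs of training inputs, $k_*$ is the vector of kernel values between the new input $x_*$ and each training input, $y$ is the vector of training targets (chronological ages in this setting), and $\sigma_n^2$ is the noise variance. The mean $\mu_*$ serves as the age estimate, and $\sigma_*^2$ quantifies how uncertain that estimate is.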

Elastic Net
Elastic Net is a regression method that performs variable selection and regularization simultaneously. It is a modified form of linear regression. Regularization is essential when overfitting occurs. Overfitting can arise when the dataset contains many features, some of which are irrelevant to the prediction model [30]. Such features make the model more complex, so that it fits the training data closely but predicts the test set poorly. A model with such high variance cannot generalise to new data [31]. To address these concerns, both L1 and L2 norm regularization can be included to obtain the advantages of Lasso (Least Absolute Shrinkage and Selection Operator) and Ridge regression simultaneously. The resulting model can be more accurate in making predictions than Lasso [30,32]. It performs feature selection while also simplifying the hypothesis. The adjusted cost function for the Elastic Net model is given in [33].
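One widely used way to write an Elastic Net cost function is shown below. It is a sketch of the general form, with a mixing parameter $\alpha$ and a penalty strength $\lambda$ introduced here as assumptions; it is not necessarily the exact parameterisation given in [33].

```latex
J(\beta_0, \beta) = \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
  + \lambda \left( \alpha \lVert \beta \rVert_1 + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right)
```

With $\alpha = 1$ the penalty reduces to the Lasso (L1) term, which can set coefficients exactly to zero, and as $\alpha$ approaches 0 it approaches the Ridge (L2) term, which only shrinks them. Intermediate values combine both behaviours, which is how the method selects features and regularizes at the same time.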

Prediction Performance
Several metrics are used to assess the performance (prediction error) of a regression model, such as the mean absolute error (MAE) and the root mean squared error (RMSE) [34]. It has been stated that the distance measures (MSE, RMSE, and MAE) are equivalent and help to quantify the precision of the estimated result compared with the simulated data [31]. Moreover, most studies of healthy people assess the accuracy of a brain age prediction model in terms of the MAE [3,13]. In this study, we adopted the MAE and RMSE, which are defined as $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$ and $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$, where $y_i$ is the chronological age of participant $i$, $\hat{y}_i$ is the predicted age, and $n$ is the number of test participants.
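These error measures are straightforward to compute. The Python sketch below implements MAE and RMSE, together with a Pearson correlation on the assumption that the R values reported in the results are Pearson coefficients; the function names and the three example ages are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between chronological and predicted ages."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalises large errors more than MAE."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation between chronological and predicted ages."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical ages in years, for illustration only.
chronological = [20.0, 40.0, 60.0]
predicted = [25.0, 40.0, 55.0]
print(mae(chronological, predicted), rmse(chronological, predicted))
```

In this example the predictions are compressed towards the middle of the age range, so the correlation is perfect even though the MAE is not zero. This is why error measures and correlation are best reported together.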

Prediction Framework
In this study, the prediction framework was implemented in MATLAB. It used the MRI volumetric data either with a feature selection stage (scenario 1) or without one (scenario 2) to predict each participant's chronological age. A 90%-10% train-test split was applied within a 10-fold cross-validation to evaluate the generalizability of the regression model on the held-out test split, as shown in figure 6. In scenario 1, the training dataset was used as input to the RReliefF algorithm to identify the most relevant features in the dataset. Next, the positively weighted features shown in figure 7 were selected to form a training subset. Then, a regression model (L-SVR, RBF-SVR, RVR, GPR, or Elastic Net) was built using the training subset (scenario 1) or the full training set (scenario 2) and evaluated using the testing subset (scenario 1) or the testing set (scenario 2). Subsequently, the MAE and RMSE (overall prediction performance) were calculated as the average over the 10 folds.
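The structure of this framework can be sketched in Python as a cross-validation loop in which feature selection is run on the training fold only. This is an illustrative skeleton, not the authors' MATLAB code: the callables stand in for RReliefF, the regression model, and the error measure, and all names and toy data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, k, select_features, fit, predict, score):
    """Return the mean test score over k folds.

    select_features, fit, predict and score are caller-supplied callables
    (placeholders for RReliefF, a regression model and MAE or RMSE).
    """
    fold_scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]

        # Select features on the training fold only, so the test fold
        # never influences which features are kept.
        keep = select_features(X_train, y_train)

        def reduce_row(row, keep=keep):
            return [row[j] for j in keep]

        model = fit([reduce_row(r) for r in X_train], y_train)
        y_pred = [predict(model, reduce_row(X[i])) for i in test_idx]
        fold_scores.append(score([y[i] for i in test_idx], y_pred))
    return sum(fold_scores) / len(fold_scores)


# Toy stand-ins: keep all features and always predict the mean training age.
keep_all = lambda X_train, y_train: list(range(len(X_train[0])))
fit_mean = lambda X_train, y_train: sum(y_train) / len(y_train)
predict_mean = lambda model, row: model
mean_abs_error = lambda yt, yp: sum(abs(t - p) for t, p in zip(yt, yp)) / len(yt)

X_toy = [[float(i), float(i) * 2.0] for i in range(20)]
y_toy = [50.0] * 20
print(cross_validate(X_toy, y_toy, 5, keep_all, fit_mean, predict_mean, mean_abs_error))
```

Passing a selector that returns every column index corresponds to scenario 2, while a selector that returns only positively weighted features corresponds to scenario 1. Because the selector is called inside the loop, each fold may keep a different number of features.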

Scenario 1: Prediction with feature selection
A different number of optimal features was selected for each of the 10 folds, as illustrated in table 1. These features were used to train a regression model and test its performance. Five different regression models were tested and compared, as shown in figure 8. It can be clearly seen that the Elastic Net and GPR models were the best in predicting age compared to the others.

Figure 8. Scatter plots and correlation coefficients (R²) of chronological age (years) against predicted age (years) using RReliefF as the feature selection approach and five different regression models. The highest accuracy on the testing dataset was observed with GPR, followed by RVR, using a reduced set of features.

Scenario 2: Prediction without feature selection
In the second scenario, regression models were trained and tested using the entire feature set (162 volumetric features). Similarly, the GPR and Elastic Net models achieved more accurate age prediction than L-SVR, RBF-SVR, and RVR, as shown in figure 9.

Figure 9. Scatter plots and correlation coefficients (R²) of chronological age (years) against predicted age (years) using five different regression models. The highest accuracy on the testing dataset was observed with GPR, followed by Elastic Net and RVR, using all 162 volumetric features.

RMSE values (all-data vs. reduced-data)

DISCUSSIONS
The present study compared five regression models (L-SVR, RBF-SVR, Elastic Net, GPR, and RVR) with different input features to predict brain age. The five models were assessed under two scenarios. In the first scenario, the models were trained after RReliefF had been used to reduce the number of features, so that only the positively weighted features shown in figure 7 were used. In this setup, the highest accuracy of brain age prediction was achieved by GPR (R² = 0.57, R = 0.76), followed by RVR (R² = 0.37, R = 0.52). The other models were unable to achieve an acceptable prediction accuracy. In the second scenario, the models were trained with all 162 features. The highest accuracy was again achieved by GPR (R² = 0.56, R = 0.75), followed by Elastic Net (R² = 0.44, R = 0.65) and RVR (R² = 0.37, R = 0.52). The other two models were unable to reach adequate prediction accuracy.
The GPR model was the best at predicting age in both scenarios, which is in line with the results obtained by [13,32] and [33]. The MAE and RMSE values were the same in both scenarios, indicating that the feature selection algorithm had no effect on the GPR model. Meanwhile, feature selection had a negative impact on the Elastic Net regression model, for which the MAE and RMSE rose from 11.4 to 22.7 and from 14.5 to 30.2, respectively. This could be due to the elimination of some informative features [14]. Moreover, the feature selection algorithm had no effect on the performance of the GPR, L-SVR, and RBF-SVR models. However, prediction accuracy improved for the RVR model when the RReliefF feature selection algorithm was applied. It seems that the dimensionality reduction algorithm successfully removed some redundant features in the RVR region-based model.
In conclusion, we assume that the difference between all-data and reduced-data for GPR and RVR is negligible; for Elastic Net, however, using all-data produced higher performance, owing to the importance of including all features in building the regression model. Furthermore, reducing the feature space (number of features) may eliminate some relevant features that are part of the input feature pattern needed for training the regression model.
The main limitation of this study was the small size of the data set gathered from the hospital. The characteristics of this particular data set may therefore affect our recommendations regarding sample size and processing capacity. Nevertheless, the results achieved were close to those obtained by [14] with a training set of 120 samples. We believe that other data sets can benefit from the basic principle that certain models demand significantly more training data and processing resources than others.
While the present study assessed a variety of machine learning models and input data options, a number of other methods might be explored in the future. These may include some form of age correction in the training data set; several kinds of correction have been introduced recently [37][38][39][40]. In addition, better performance may be achieved with multi-modal data, such as combining region-level and voxel-level features rather than using single-modality features. These are possible paths for future studies on brain age prediction models.

CONCLUSIONS
Predicting a person's chronological age with acceptable precision using machine learning models trained on T1-weighted MRI is possible in healthy people. This can be done using limited raw MRI data, with only a minimal amount of processing required to produce a reliable age estimate. The GPR algorithm was the best of the regression models presented in this study, within the limits of the input data. In the future,