wGMDH :: Model Performance Measures

Model Performance Measures

Different performance measures [3] can be used when selecting the model. A GMDH algorithm typically selects the model with the best prediction according to a least-squares criterion, e.g. the sum of squared errors ($E_{sse}$):

$E_{sse}=\sum_{i=1}^{N}(p_{vi}-y_{vi})^{2}$

where $y_{vi}$ denotes the $i$-th instance of the dependent variable in the validation data set and $p_{vi}$ is its approximation, calculated by the corresponding polynomial model on the validation data set. Measures derived from $E_{sse}$ by constant scaling followed by a square root, such as the Root Mean Squared Error ($E_{rms}$) and the Root Relative Squared Error ($E_{rrs}$), rank candidate models in the same order as $E_{sse}$ on a given validation set. The Relative Absolute Error is also an option.
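A minimal Python sketch of these measures, assuming the validation targets and the model's predictions are given as plain lists of floats. The function names are hypothetical, and RRSE and RAE use the common definitions relative to the mean of the validation targets (an assumption, since the text above does not define them).

```python
import math

def sse(pred, y):
    """Sum of squared errors E_sse over the validation set."""
    return sum((p - t) ** 2 for p, t in zip(pred, y))

def rmse(pred, y):
    """Root Mean Squared Error: sqrt(E_sse / N)."""
    return math.sqrt(sse(pred, y) / len(y))

def rrse(pred, y):
    """Root Relative Squared Error (assumed definition): sqrt(E_sse / sum((y - mean(y))^2))."""
    mean_y = sum(y) / len(y)
    return math.sqrt(sse(pred, y) / sum((t - mean_y) ** 2 for t in y))

def rae(pred, y):
    """Relative Absolute Error (assumed definition): sum|p - y| / sum|y - mean(y)|."""
    mean_y = sum(y) / len(y)
    return sum(abs(p - t) for p, t in zip(pred, y)) / sum(abs(t - mean_y) for t in y)

# Hypothetical validation targets and predictions.
y_val = [1.0, 2.0, 3.0, 4.0]
p_val = [1.1, 1.9, 3.2, 3.8]
print(sse(p_val, y_val), rmse(p_val, y_val), rrse(p_val, y_val), rae(p_val, y_val))
```

Because RMSE and RRSE are monotone functions of $E_{sse}$ on a fixed validation set, they select the same model; RAE weights errors linearly and can rank candidates differently.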

Minimum Description Length (MDL) [4] is a well-known principle that provides a trade-off between the accuracy and the complexity of a model. According to [5], the MDL for linear polynomial regression consists of two terms:

$MDL = 0.5 \cdot N \cdot \log(E_{rms}^{2}) + 0.5 \cdot k \cdot \log(N)$

where $N$ denotes the number of observations and $k$ is the number of parameters of the model. The first term can be interpreted as the number of bits needed to encode the observations given the model, and the second term as the number of bits needed to encode the model itself. Because the second term penalizes model complexity, optimizing with regard to MDL generally guards against overfitting, so MDL could potentially be used with GMDH on a single data set instead of separate training and validation sets.
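The MDL criterion can be sketched in Python as follows. This assumes the natural logarithm (the base is not stated above; changing it rescales both terms by the same factor and so leaves the selected model unchanged) and requires $N > 0$ and $E_{rms} > 0$. The candidate models are hypothetical.

```python
import math

def mdl(n, e_rms, k):
    """MDL = 0.5*N*log(E_rms^2) + 0.5*k*log(N), using the natural logarithm."""
    return 0.5 * n * math.log(e_rms ** 2) + 0.5 * k * math.log(n)

# Hypothetical comparison on N = 100 observations: a 6-parameter model
# versus a 12-parameter model with a slightly lower RMS error.
n = 100
simple = mdl(n, e_rms=0.20, k=6)
complex_ = mdl(n, e_rms=0.19, k=12)
print(simple, complex_, "simple" if simple < complex_ else "complex")
```

Here the small reduction in $E_{rms}$ does not pay for the six extra parameters, so the simpler model has the lower MDL.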

The error measures above, such as $E_{sse}$ and the measures derived from it, optimize only the approximation error, without regard to the complexity of the model. This can lead to complex models that have a low approximation error but an unacceptably long calculation time. In certain applications the complexity of the model must be limited, and it may then be necessary to construct simplified surrogates with as little degradation of the approximation accuracy as possible. A simple two-parameter Compound squared relative Error (CE) measure for model selection has been proposed in [6]:

$E_{CE}=c_{w}\cdot(\frac{E_{rrs}}{E_{rrs0}})^{2}+(1-c_{w})(\frac{T_{exe}}{T_{exe0}})^{2}$

where $T_{exe}$ denotes the execution time (complexity) of the model, $E_{rrs0}$ and $T_{exe0}$ are the corresponding thresholds for the Root Relative Squared Error and the execution time, and $c_{w}\; (0\leq c_{w}\leq 1)$ is the weighting coefficient. The CE measure consists of two normalized terms, representing the error and the execution time (complexity) of the model, and the weighting coefficient specifies the contribution of each term. For $c_{w}=1$ the CE measure reduces to the error term only, and for $c_{w}=0$ to the complexity term only. Unlike MDL and single-parameter measures, the CE measure can control how the characteristics of the selected models approach the specified thresholds, thus increasing the probability of discovering a model that satisfies both constraints.
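A minimal Python sketch of CE-based selection, assuming each candidate model is already characterized by its $E_{rrs}$ and an estimated $T_{exe}$. The candidate list, thresholds and time unit are hypothetical.

```python
def compound_error(e_rrs, t_exe, e_rrs0, t_exe0, c_w):
    """Compound squared relative Error E_CE for one candidate model."""
    if not 0.0 <= c_w <= 1.0:
        raise ValueError("c_w must lie in [0, 1]")
    error_term = (e_rrs / e_rrs0) ** 2  # normalized error term
    time_term = (t_exe / t_exe0) ** 2   # normalized complexity term
    return c_w * error_term + (1.0 - c_w) * time_term

# Hypothetical candidates: (name, E_rrs, T_exe in arbitrary time units).
candidates = [("A", 0.05, 40.0), ("B", 0.08, 15.0), ("C", 0.12, 10.0)]
e_rrs0, t_exe0, c_w = 0.10, 20.0, 0.5
best = min(candidates, key=lambda m: compound_error(m[1], m[2], e_rrs0, t_exe0, c_w))
print(best[0])
```

A model lying exactly on both thresholds gets $E_{CE}=1$, and any model meeting both thresholds has $E_{CE}\leq 1$. The converse does not hold, so a low $E_{CE}$ alone does not guarantee that both constraints are satisfied.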

The target model must satisfy the requirements on both accuracy and complexity (i.e. execution time), where $E_{rms0}$ denotes the threshold for the Root Mean Squared Error:

$(E_{rms}\leq E_{rms0}\vee E_{rrs}\leq E_{rrs0})\wedge (T_{exe}\leq T_{exe0})$
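The acceptance condition translates directly into a predicate. This is a sketch with a hypothetical function name; the threshold values in the example are illustrative only.

```python
def satisfies_requirements(e_rms, e_rrs, t_exe, e_rms0, e_rrs0, t_exe0):
    """(E_rms <= E_rms0 or E_rrs <= E_rrs0) and T_exe <= T_exe0."""
    accurate = e_rms <= e_rms0 or e_rrs <= e_rrs0  # either accuracy threshold suffices
    fast = t_exe <= t_exe0                          # execution-time limit is mandatory
    return accurate and fast

# Accuracy is met through E_rrs here, even though E_rms exceeds its threshold.
print(satisfies_requirements(0.3, 0.08, 15.0, e_rms0=0.2, e_rrs0=0.10, t_exe0=20.0))
```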

The execution time (complexity) of the GMDH model can be estimated in the following way:

$T_{exe}\approx N_{x} \cdot (N_{add}\cdot T_{add}+N_{mul}\cdot T_{mul})$

where $N_{x}$ denotes the total number of basic second-order two-dimensional polynomials in the model, $N_{add}$ and $N_{mul}$ are the numbers of Floating Point (FP) additions and multiplications in one basic polynomial, and $T_{add}$ and $T_{mul}$ denote the average execution times of the software routines implementing FP addition and FP multiplication, respectively. Evaluated term by term, the basic polynomial

$p_{\lambda i} = a_{\lambda i0}+a_{\lambda i1}z_{\lambda i1}+a_{\lambda i2}z_{\lambda i2}+a_{\lambda i3}z_{\lambda i1}^{2}+a_{\lambda i4}z_{\lambda i2}^{2}+a_{\lambda i5}z_{\lambda i1}z_{\lambda i2}$

requires 5 FP additions and 8 FP multiplications. Rewriting it in factored form reduces its calculation to $N_{add}=5$ FP additions and $N_{mul}=5$ FP multiplications:

$p_{\lambda i} = a_{\lambda i0}+z_{\lambda i1}(a_{\lambda i1}+a_{\lambda i3}z_{\lambda i1}+a_{\lambda i5}z_{\lambda i2})+z_{\lambda i2}(a_{\lambda i2}+a_{\lambda i4}z_{\lambda i2})$
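The two evaluation orders and the execution-time estimate can be sketched in Python as below. The coefficient values, the number of polynomials and the per-operation timings are hypothetical, chosen only to show that both forms agree numerically and how the operation counts feed into $T_{exe}$.

```python
def direct(a, z1, z2):
    """Term-by-term evaluation: 5 additions, 8 multiplications."""
    return a[0] + a[1]*z1 + a[2]*z2 + a[3]*z1*z1 + a[4]*z2*z2 + a[5]*z1*z2

def factored(a, z1, z2):
    """Factored evaluation: 5 additions, 5 multiplications."""
    return a[0] + z1*(a[1] + a[3]*z1 + a[5]*z2) + z2*(a[2] + a[4]*z2)

def estimate_t_exe(n_x, t_add, t_mul, n_add=5, n_mul=5):
    """T_exe ~ N_x * (N_add*T_add + N_mul*T_mul); defaults match the factored form."""
    return n_x * (n_add * t_add + n_mul * t_mul)

# Hypothetical coefficients a_0..a_5 and inputs z1, z2.
a = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5]
print(direct(a, 1.2, -0.4), factored(a, 1.2, -0.4))

# Hypothetical model with 7 basic polynomials and timings in arbitrary units.
print(estimate_t_exe(n_x=7, t_add=1.5, t_mul=2.0))           # factored form
print(estimate_t_exe(n_x=7, t_add=1.5, t_mul=2.0, n_mul=8))  # term-by-term form
```

With these assumed timings the factored form saves three multiplications per basic polynomial, which shrinks the estimated $T_{exe}$ in proportion to $N_{x}$.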