## Selectional Combinatorial GMDH

We will briefly describe the general idea of the Selectional Combinatorial GMDH algorithm according to [1], albeit using somewhat different terminology, in the context of multivariate smooth-function regression. Informally, the problem of regression by GMDH can be stated as follows: given a set of measurements, model the dependent variable as a function of the explanatory variables (regressors) optimally with respect to some error criterion, where the function of dependency is represented by a GMDH network.

In order to define how GMDH networks are constructed, it is necessary to first introduce the basic building blocks of GMDH networks and their connections. We distinguish two kinds of GMDH network primitives, see Fig. 1: the regressor nodes $\{x_{i}\}$ and the network nodes $\{p_{\lambda j}\}$, $\lambda = 1,\dots,\Lambda$, $j = 0,\dots,L_{\lambda}-1$, where $\Lambda$ is the total number of GMDH layers containing network nodes and $L_{\lambda}$ is the beam search width, i.e. the predetermined number of best partial solutions (network nodes) kept as candidates in layer $\lambda$.

Figure 1. Building the GMDH networks

Generally, the network node $p_{\lambda i}$ is a two-input node with a second-order polynomial as the output nonlinearity. The regressor node $x_{i}$ has no inputs, as it represents the regressor input into the network node. For example, the network node $p_{\lambda i}$ is constructed in the following way:

$p_{\lambda i}=a_{\lambda i0}+a_{\lambda i1}z_{\lambda i1}+a_{\lambda i2}z_{\lambda i2}+a_{\lambda i3}z_{\lambda i1}^{2}+a_{\lambda i4}z_{\lambda i2}^{2}+a_{\lambda i5}z_{\lambda i1}z_{\lambda i2}$

where $z_{\lambda i1}$ and $z_{\lambda i2}$ can each be either a regressor node or a network node, and $a_{\lambda i0},\dots,a_{\lambda i5}$ are the corresponding coefficients obtained by polynomial regression.
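As a concrete illustration, fitting the six coefficients of one such node reduces to ordinary least squares on a design matrix of the quadratic terms. The sketch below is our own illustration (the function names `fit_node` and `eval_node` are not from [1]):

```python
import numpy as np

def fit_node(z1, z2, y):
    """Fit the six coefficients of a quadratic GMDH node
    p = a0 + a1*z1 + a2*z2 + a3*z1^2 + a4*z2^2 + a5*z1*z2
    by ordinary least squares. z1, z2, y are 1-D sample vectors."""
    A = np.column_stack([np.ones_like(z1), z1, z2, z1**2, z2**2, z1 * z2])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def eval_node(coeffs, z1, z2):
    """Evaluate a fitted node on (possibly new) input samples."""
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 + a1*z1 + a2*z2 + a3*z1**2 + a4*z2**2 + a5*z1*z2
```

Since the node is linear in its coefficients, the fit is a single linear least-squares solve despite the quadratic output nonlinearity.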

The GMDH network is grown iteratively by feeding nodes (either network or regressor nodes) into a new network node in a feedforward manner and performing low-order polynomial least-squares fitting to the dependent variable in order to obtain its coefficients. In this manner, networks whose complexity is too small to capture the richness of the dynamics of the modeled process can be used as inputs to a more complex network that better fits the problem.

Still, finding the optimal structure of the network remains a problem due to the size of the space of possible solutions. To keep the search feasible, the GMDH algorithm typically uses a beam search. This constraint keeps the algorithm tractable at the cost of sub-optimal performance, but a satisfactory sub-optimal solution often suffices.
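One selection step of such a beam search can be sketched as follows: every pair of candidate inputs is combined into a quadratic node fitted on the training samples, scored on held-out validation samples, and only the best `beam_width` nodes survive to feed the next layer. This is an illustrative sketch under our own naming, not the exact procedure of [1]:

```python
import numpy as np
from itertools import combinations

def gmdh_layer(candidates, y_train, y_val, beam_width):
    """One GMDH selection step. `candidates` is a list of
    (train_vector, val_vector) pairs: each candidate input seen on both sets.
    Returns the outputs of the beam_width best nodes and the best error."""
    scored = []
    for (z1t, z1v), (z2t, z2v) in combinations(candidates, 2):
        # Fit the quadratic node on the training set only.
        A = np.column_stack([np.ones_like(z1t), z1t, z2t,
                             z1t**2, z2t**2, z1t * z2t])
        a, *_ = np.linalg.lstsq(A, y_train, rcond=None)
        # Score on the validation set (the external selection criterion).
        Av = np.column_stack([np.ones_like(z1v), z1v, z2v,
                              z1v**2, z2v**2, z1v * z2v])
        err = np.mean((Av @ a - y_val) ** 2)
        scored.append((err, (A @ a, Av @ a)))
    scored.sort(key=lambda s: s[0])
    best = scored[:beam_width]
    return [out for _, out in best], best[0][0]
```

Scoring on data not used for fitting is what keeps layer-by-layer growth from simply rewarding ever more complex, overfitted nodes.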

Let T and V be the training and validation sets, respectively, written in matrix form with measurement instances concatenated as rows:

$T=\left[\left.\begin{matrix} x_{11}^{t} & x_{21}^{t} & \cdots & x_{K1}^{t}\\ x_{12}^{t} & x_{22}^{t} & \cdots & x_{K2}^{t}\\ \vdots & \vdots & \ddots & \vdots \\ x_{1M}^{t} & x_{2M}^{t} & \cdots & x_{KM}^{t} \end{matrix}\right|\begin{matrix} y_{1}^{t} \\ y_{2}^{t} \\ \vdots \\ y_{M}^{t} \end{matrix}\right]=\left[\left.\begin{matrix} \mathbf{x}_{1}^{t} & \mathbf{x}_{2}^{t} & \cdots & \mathbf{x}_{K}^{t} \end{matrix}\right| \mathbf{y}^{t}\right]$

$V=\left[\left.\begin{matrix} x_{11}^{v} & x_{21}^{v} & \cdots & x_{K1}^{v}\\ x_{12}^{v} & x_{22}^{v} & \cdots & x_{K2}^{v}\\ \vdots & \vdots & \ddots & \vdots \\ x_{1N}^{v} & x_{2N}^{v} & \cdots & x_{KN}^{v} \end{matrix}\right|\begin{matrix} y_{1}^{v} \\ y_{2}^{v} \\ \vdots \\ y_{N}^{v} \end{matrix}\right]=\left[\left.\begin{matrix} \mathbf{x}_{1}^{v} & \mathbf{x}_{2}^{v} & \cdots & \mathbf{x}_{K}^{v} \end{matrix}\right| \mathbf{y}^{v}\right]$

where $\mathbf{x}_{i}$ is the $i$-th regressor with its samples stacked as a column, $\mathbf{y}$ is the dependent variable stacked likewise, the superscripts $t$ and $v$ denote the training and validation sets, respectively, $K$ is the number of regressors, and $M$ and $N$ are the sizes of the training and validation sets, respectively.
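Assembling T and V in this matrix form from a raw data matrix amounts to appending the target column and splitting the rows; a minimal sketch, where the helper name and the simple contiguous split are our own illustrative choices:

```python
import numpy as np

def split_train_val(X, y, m):
    """Build T and V in the [regressors | target] form above.
    X is (M+N) x K with one measurement instance per row, y its targets;
    the first m rows form the training set T, the rest the validation set V."""
    T = np.column_stack([X[:m], y[:m]])
    V = np.column_stack([X[m:], y[m:]])
    return T, V
```

In practice the rows would usually be shuffled or split by some sampling scheme before this step, so that T and V are statistically comparable.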