⁃ To begin, the network is trained in two phases: the hidden layer is trained first (by clustering, as described below), and back propagation is then used to train the output-layer weights.
⁃ Neural network training (back propagation) is a curve-fitting method: during the training phase, a non-linear curve is fitted to the data. The procedure used is stochastic approximation, which is what back propagation amounts to.
⁃ For each node in the hidden layer, we have to find the receptor t and the variance σ (variance here meaning the spread of the radial basis function).
⁃ The weight vectors between the hidden and output layers are updated in the second training phase (see the sketch at the end of this section).
⁃ Each node within the hidden layer corresponds to a basis function of the transformation. A non-linearly separable problem can become separable with any one of these functions, or with a combination of them.
Therefore, we include all non-linearity terms in our hidden-layer transformation. For example, a hypersurface equation might be composed of X² + Y² + 5XY (where X and Y are inputs).
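As a rough sketch of what one hidden node computes, here is a Gaussian radial basis function, the most common choice (the notes do not fix a particular basis; the name `gaussian_rbf` and the example values are ours):

```python
import numpy as np

def gaussian_rbf(x, t, sigma):
    """Response of one hidden node: a Gaussian bump centered at the receptor t."""
    return np.exp(-np.sum((x - t) ** 2) / (2 * sigma ** 2))

t = np.array([1.0, 2.0])                                  # receptor of this node
print(gaussian_rbf(np.array([1.1, 2.1]), t, sigma=1.0))   # close to t -> ~0.99
print(gaussian_rbf(np.array([5.0, 7.0]), t, sigma=1.0))   # far from t -> ~0.0
```

A point near the receptor responds close to 1, a distant point close to 0, which is what lets combinations of such nodes carve out non-linear regions.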
A clustering algorithm is used in the first stage of training. It produces cluster centers, which we assign as the receptors of the hidden neurons.
Given N samples (observations), we cluster them into M clusters (N > M).
The resulting cluster centers are the receptors (see the sketch below).
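A minimal sketch of this first phase, assuming K-means as the clustering algorithm (the notes do not name one); the data and all variable names are ours:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))        # N = 200 samples, 2 features

M = 10                               # number of hidden nodes / clusters (N > M)
km = KMeans(n_clusters=M, n_init=10, random_state=0).fit(X)

receptors = km.cluster_centers_      # shape (M, 2): one receptor per hidden node
labels = km.labels_                  # which cluster each sample belongs to
```

Any clustering algorithm that yields M centers would do; K-means is simply a common choice.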
⁃ The variance of each receptor can be computed as the mean squared distance between the receptor and the samples of its cluster: σ² = (1/N) · Σ ||X - t||², where the sum runs over the N samples X assigned to that cluster (see the sketch after this list).
⁃ The first training phase thus yields the mapping that projects each feature vector onto the transformed space (see the final sketch below).
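Continuing the K-means sketch above (reusing `X`, `labels`, `receptors`, and `M`), the spread of each receptor could be computed like this; the epsilon guard for single-point clusters is our addition:

```python
import numpy as np

# sigma_k^2 = (1/N_k) * sum over the samples X of cluster k of ||X - t_k||^2
sigmas = np.empty(M)
for k in range(M):
    members = X[labels == k]                       # the N_k samples of cluster k
    sigmas[k] = np.sqrt(np.mean(np.sum((members - receptors[k]) ** 2, axis=1)))

sigmas = np.maximum(sigmas, 1e-6)   # guard: a single-point cluster would give sigma = 0
```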
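And a sketch of the projection plus the second training phase, again reusing the names above. The toy target `y` and the learning rate are our assumptions, and a simple LMS-style stochastic gradient update stands in for the stochastic approximation (back propagation) the notes describe; note that only the hidden-to-output weights `w` are updated:

```python
def project(samples):
    """Project samples onto the transformed space: one Gaussian response
    per (sample, receptor) pair -> activation matrix of shape (N, M)."""
    d2 = np.sum((samples[:, None, :] - receptors[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2 * sigmas ** 2))

Phi = project(X)                                        # hidden-layer features, (N, M)

y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(float)   # toy non-linear target
w = np.zeros(M)                                         # hidden-to-output weights
lr = 0.1
for epoch in range(50):
    for i in rng.permutation(len(X)):
        err = y[i] - Phi[i] @ w                         # prediction error on sample i
        w += lr * err * Phi[i]                          # stochastic (LMS) update of w only
```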