Steepest-descent method algorithms are family of algorithms that use the gradient and Hessian information to search in a direction that gives the biggest reduction in the function value. The gradients is denoted by g and Hessia is denoted by H. We will first discuss the first-order methods in which only the gradient is used.

Now we consider the optimization problem

$$minF=f(x)\,\,\, for \, x\epsilon E^n$$

Taylor expansion of the above gives

$$F+\Delta F=f(x+\delta)\approx f(x)+g^T\delta+0(\delta^T\delta)$$

$$\Delta F\approx \sum^{n}_{i=1}g_i\delta_i=||g||\,||\delta||\,cos\theta$$

For any vector $$\delta,\, \Delta F$$ is maximum when $$\theta=0$$, which implies that $$\delta$$ is in the same direction as g. $$\Delta F$$ is maximum when $$\theta=\pi$$, which implies that $$\delta$$ is in the opposite direction of g or simply we can say that it is in the direction of -g. They both are defined as steepest-ascent and steepest-descent directions respectively.

The general framework of a steepest-descent algorithim is as follows:

1. Specify the initial point $$x_0$$ and the tolerance $$\epsilon$$.

2. Calculate the gradient $$g_k$$ and set the Newton direction vector $$d_k=-g_k$$.

3. Minimize $$f(x_k+ad_k)$$ with respect to $$\alpha$$ to compute the increment $$\alpha_k$$.

4. Set the next point $$x_{k+1}=x_k+a_kd_k$$ and compute the next function value $$f_{k+1}=f(x_{k+1})$$.

5. If $$||a_kd_k||<\varepsilon$$, then,

a. Done. $$x^*=x_{k+1}$$ and $$f(x^*)=f_{k+1}$$.

b. Else, repeat step 2.

In step 3, $$a_k$$ can be computed by using a line search using a univariate optimization alogorithim. Alternatively, $$a_k$$ can be computed analitically.

The following code minimizes the given function using the first-order steepest-descent method.

$$f(x_1,x_2,x_3,x_4)=(x_1-4x_2)^4+12(x_3-x_4)^4+3(x_2-10x_3)^2+55(x_1-2x_4)^2$$

				
// the objective function
// the global minimizer is at x = (0,0,0,0)
val f: RealScalarFunction f= object : new RealScalarFunction() {
override fun Double evaluate(Vector x) {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),

double result = pow(x1- 4 * x2, 4),
result += 12 * pow(x3 - 4 * x2, 4),
result += 3 * pow(x2 -10 * x3, 2),
result += 55 * pow(x1 - 2 * x4, 2),
return result
}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 1
}
}

val g: RealVectorFunction g = object : new RealVectorFunction() {
override fun evaluate(Vector x): {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),
double[] result = new double[4],
result[0] = 4 * pow(x1 - 4 * x2, 3) + 110 * (x1 - 2 * x4),
result[1] = -16 * pow(x1 - 4 * x2, 3) + 6 * (x2 - 10 * x3),
result[2] = 48 * pow(x3 - x4, 3) + 60 * (x2 - 10 * x3),
result[3] = -48 * pow(x3 - x4, 3) - 220 * (x1 - 2 * x4),
return new DenseVector (result)
}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 4
}
}

// construct an optimization problem
val problem: C2OptimProblem = new C2OptimProblemImpl(f, g); // only gradient information
val solver : SteepestDescentMinimizer firstOrderMinimizer = new FirstOrderMinimizer(
1e-8,
40000)
val soln: firstOrderMinimizer.solve(problem)

val xmin: soln.search(new DenseVector(new double [] {1, -1, -1, 1}))
val fmin= f.evaluate(xmin)
println(String.format("f(%s) = %f", xmin.toString(), fmin))



The output is:

				
f([0.046380, 0.0016330, 0.001634, 0.023189]) = 0.000003



It takes about 40,000 iterations, which is very slow.

## Newton-Rapshon Method

The second order methods use a nonsingular Hessian function in additionto the gradient to determine the vector $$d_k$$ and the the increment $$a_k$$ in the steepest-descent methods. One such method is the Newton-Rhapson method. The newton-Rhapson method expands the Taylor series to the second order. Let $$\delta$$ be the change in $$x$$, so we have:
$$\displaystyle f(x+\delta)\approx fx)+\sum_{j=1}^{n}\frac{\partial f}{\partial x_i}\delta_i+\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^2 f}{\partial x_i \partial x_j}\delta_i\delta_j$$
Now diffrentiating the above equation w.r.t. $$\delta_k$$ to minimize $$f(x+\delta)$$,we have
$$\displaystyle \frac{\partial f}{\partial x_k}+\sum_{i=1}^{n}\frac{\partial^2 f}{\partial x_i \partial _j}\delta_i=0,\,\,\, for\, k=1,2,…,n$$
Now rewritin the above equation in matrix form, we have
$$g=-H\delta$$
The optimal change in x is given by
$$\delta=-H^{-1}g$$
Now to compute the maximum reduction in $$f(x)$$ along the direction, a line search is used. The next point $$x_{k+1}$$ is:
$$x_{k+1}=x_k+\delta_k=x_k+\alpha_k d_k$$
where, $$d_k=-H^{-1}_k g_k$$ and $$\alpha_k$$ is the $$\alpha$$ that minimizes $$f(x_k+\alpha d_k)$$.
The Newton-Rhapson method compared with the first-order methods, uses the Hessian information to speed up the convergence, especially when it is close to the solution. The following code minimizes the same function using the Newton-Rhapson method.

				
// the objective function
// the global minimizer is at x = (0,0,0,0)
val f: RealScalarFunction f= object : new RealScalarFunction() {
override fun Double evaluate(Vector x) {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),

double result = pow(x1 - 4 * x2, 4),
result += 12 * pow(x3 - 4 * x2, 4),
result += 3 * pow(x2 -10 * x3, 2),
result += 55 * pow(x1 - 2 * x4, 2),

return result
}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 1
}
}

val g: RealVectorFunction g = object : new RealVectorFunction() {
override fun evaluate(Vector x): {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),
double[] result = new double[4],
result[0] = 4 * pow(x1 - 4 * x2, 3) + 110 * (x1 - 2 * x4),
result[1] = -16 * pow(x1 - 4 * x2, 3) + 6 * (x2 - 10 * x3),
result[2] = 48 * pow(x3 - x4, 3) + 60 * (x2 - 10 * x3),
result[3] = -48 * pow(x3 - x4, 3) - 220 * (x1 - 2 * x4),
return new DenseVector (result)
}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 4
}
}

// construct an optimization problem
val problem: C2OptimProblem = new C2OptimProblemImpl(f, g) // uses numerical Hessian
val solver : SteepestDescentMinimizer NewtonRhapsonMinimizer = new NewtonRhapsonMinimizer(
1e-8,
20)
val soln: NewtonRhapsonMinimizer.solve(problem)

val xmin: soln.search(new DenseVector(new double [] {1, -1, -1, 1}))
val fmin= f.evaluate(xmin)
println(String.format("f(%s) = %f", xmin.toString(), fmin))



The output is:

				
f([0.000134, -0.000009, 0.000067]) = 0.000000



The Newton-Rhapson method converges much faster than the first-order method. It only uses 20 iterations with much better accuracy compared to the 40000 iterations in the first-order method.

## Gauss-Newton Method

The Gauss-Newton minimizes an objective function in the following form,

$$f=[f_1(x)\, f_2(x)\, …..\, f_m(x)]^T$$

The solution of the above equation is a point $$x$$ such that all $$f_p(x)$$ are reduced to zero simultaneously. We can construct a real-valued function $$F [/latex such that \($$ F \) is minimized, so are functions $$f_p(x)$$ in the least-square sense such that,

$$\displaystyle F=\sum_{p=1}^{m}f_p(x)^2=ff^TF$$

Now differentiating the above equation w.r.t., $$x_i$$, we have,

$$\displaystyle \frac{\partial F}{\partial x_i}=\sum_{p=1}^{m}2f_p(x)\frac{\partial f_p}{\partial x_i}$$

Otherwise, in the matrix form, it can be given as, $$g_F=2J^Tf$$

Now differentiating again, we have

$$\displaystyle \frac{\partial^2 F}{\partial x_i \partial x_j}\approx 2\sum_{p=1}^{m}\frac{\partial f_p}{\partial x_i}\frac{\partial f_p}{\partial x_j}$$

The Hessian $$H_F$$ needs to be nonsingular and positive definite and is given as $$F_F\approx 2J^TJ$$.

The next point is given by the recursive relation:
$$x_{k+1}=x_k-\alpha_k(2J^TJ)^{-1}(2J^Tf)$$.

The following code solves the same minimization problem using the Gauss-Newton method.

				
// the objective function
// the global minimizer is at x = (0,0,0,0)
val f: RealScalarFunction f= object : new RealScalarFunction() {
override fun Double evaluate(Vector x) {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),

double[] fx= new double[4],
fx[0] = pow(x1 - 4 * x2, 2),
fx[1] = sqrt(12) * pow(x3 - x4, 2),
fx[3] = sqrt(3) * (x2 - 10 * x3),
fx[4] = sqrt(55) * (x1 - 2 * x4),
return new DenseVector (fx)
}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 4
}
}

// the Jacobian
val J: RntoMatrix J = object : new RntoMatrix() {
override Matrix evaluate(Vector x): {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),

Matrix Jx = new DenseMatrix(4, 4),

double value = 2 * (x1 - 4 * x2),
Jx.set(1, 1, value),

value = -8 * (x1 - 4 * x2),
Jx.set(1, 2, value),

value = 2 * sqrt(12) * (x3 - x4),
Jx.set(2, 3, value),
Jx.set(2, 4, -value),

Jx.set(3, 2, sqrt(3)),
Jx.set(3, 3, -10 * sqrt(3)),

Jx.set(4, 1, sqrt(55)),
Jx.set(4, 4, -2 * sqrt(55)),

return Jx

}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 1
}
}

val solver : GaussNewtonMinimizer optim1 = new GaussNewtonMinimizer(
1e-8,
10)
val soln: optim1.solve(f, J) //analytical gradient

val xmin: soln.search(new DenseVector(new double [] {1, -1, -1, 1}))
println(String.format("f(%s) = %s", xmin.toString(), f.evaluate(xmin).toString()))



The output is:

				
f([0.000007, -0.000000, -0.000000, 0.000003]) = [0.000000, 0.000000, -0.000000, -0.000000]



The Gauss-Newton method converges much faster than the Newton-Rhapson method. It only uses 10 iterations with much better accuracy compared to the 40000 iterations in the first-order method and 20 iterations used by the Newton-Rhapson Method.