Steepest-descent methods are a family of algorithms that use gradient and Hessian information to search in the direction that gives the biggest reduction in the function value. The gradient is denoted by g and the Hessian by H. We first discuss the first-order methods, in which only the gradient is used.

Now we consider the optimization problem

\( \min F=f(x) \quad \text{for } x\in E^n \)

Taylor expansion of the above gives

\( F+\Delta F=f(x+\delta)\approx f(x)+g^T\delta+O(\delta^T\delta) \)

 

\( \Delta F\approx \sum^{n}_{i=1}g_i\delta_i=||g||\,||\delta||\,\cos\theta \)

For a fixed step length \( ||\delta|| \), \( \Delta F \) is maximized when \( \theta=0 \), which implies that \( \delta \) points in the same direction as \( g \); this is the steepest-ascent direction. \( \Delta F \) is minimized (the largest reduction) when \( \theta=\pi \), which implies that \( \delta \) points in the opposite direction of \( g \), that is, in the direction of \( -g \); this is the steepest-descent direction.
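For a quick illustration (using a made-up function, not the example below): for \( f(x_1,x_2)=x_1^2+4x_2^2 \) at \( x=(1,1)^T \), the gradient is

\( g=\begin{bmatrix}2x_1\\8x_2\end{bmatrix}=\begin{bmatrix}2\\8\end{bmatrix} \)

so the steepest-ascent direction is \( (2,8)^T \) and the steepest-descent direction is \( -g=(-2,-8)^T \); a small step along \( -g \) gives the largest decrease in \( f \).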

The general framework of a steepest-descent algorithm is as follows:

1. Specify the initial point \( x_0 \) and the tolerance \( \epsilon \).

2. Calculate the gradient \( g_k \) and set the steepest-descent direction vector \( d_k=-g_k \).

3. Minimize \( f(x_k+\alpha d_k) \) with respect to \( \alpha \) to compute the increment \( \alpha_k \).

4. Set the next point \( x_{k+1}=x_k+\alpha_kd_k \) and compute the next function value \( f_{k+1}=f(x_{k+1}) \).

5. Check the stopping criterion \( ||\alpha_kd_k||<\epsilon \):

      a. If it is satisfied, stop: \( x^*=x_{k+1} \) and \( f(x^*)=f_{k+1} \).

      b. Otherwise, go back to step 2.

In step 3, \( \alpha_k \) can be computed by a line search using a univariate optimization algorithm. Alternatively, \( \alpha_k \) can be computed analytically.
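To make the framework above concrete, here is a minimal, self-contained Kotlin sketch of steps 2 to 5 that picks \( \alpha_k \) with a simple backtracking (Armijo) line search. The function name steepestDescent and the lambda signatures are hypothetical illustrations and are unrelated to the library API used in the example below.

// a minimal sketch of steps 2-5 of the framework above, assuming plain Kotlin
// lambdas for the function and its gradient (NOT the library interfaces used below)
fun steepestDescent(
    f: (DoubleArray) -> Double,
    grad: (DoubleArray) -> DoubleArray,
    x0: DoubleArray,
    epsilon: Double = 1e-8,
    maxIterations: Int = 100000
): DoubleArray {
    var x = x0.copyOf()
    repeat(maxIterations) {
        val g = grad(x)                               // step 2: gradient at x_k
        val d = DoubleArray(g.size) { i -> -g[i] }    // d_k = -g_k
        var slope = 0.0                               // g^T d, negative along a descent direction
        for (i in g.indices) slope += g[i] * d[i]

        // step 3: backtracking (Armijo) line search for alpha_k
        var alpha = 1.0
        val fx = f(x)
        var backtracks = 0
        while (backtracks < 60 &&
               f(DoubleArray(x.size) { i -> x[i] + alpha * d[i] }) > fx + 1e-4 * alpha * slope) {
            alpha *= 0.5
            backtracks++
        }

        // step 4: next point x_{k+1} = x_k + alpha_k d_k
        val next = DoubleArray(x.size) { i -> x[i] + alpha * d[i] }
        var stepNormSq = 0.0
        for (i in x.indices) stepNormSq += (next[i] - x[i]) * (next[i] - x[i])
        x = next

        // step 5: stop when ||alpha_k d_k|| < epsilon
        if (Math.sqrt(stepNormSq) < epsilon) return x
    }
    return x
}

When the one-dimensional minimizer of \( f(x_k+\alpha d_k) \) has a closed form, the backtracking loop can be replaced by that analytical \( \alpha_k \).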

The following code minimizes the given function using the first-order steepest-descent method.

\( f(x_1,x_2,x_3,x_4)=(x_1-4x_2)^4+12(x_3-x_4)^4+3(x_2-10x_3)^2+55(x_1-2x_4)^2 \)

				
					// the objective function
// the global minimizer is at x = (0,0,0,0)
val f: RealScalarFunction = object : RealScalarFunction {
    override fun evaluate(x: Vector): Double {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        var result = Math.pow(x1 - 4 * x2, 4.0)
        result += 12 * Math.pow(x3 - x4, 4.0)
        result += 3 * Math.pow(x2 - 10 * x3, 2.0)
        result += 55 * Math.pow(x1 - 2 * x4, 2.0)
        return result
    }

    override fun dimensionOfDomain(): Int {
        return 4
    }

    override fun dimensionOfRange(): Int {
        return 1
    }
}

// the gradient function
val g: RealVectorFunction = object : RealVectorFunction {
    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        val result = DoubleArray(4)
        result[0] = 4 * Math.pow(x1 - 4 * x2, 3.0) + 110 * (x1 - 2 * x4)
        result[1] = -16 * Math.pow(x1 - 4 * x2, 3.0) + 6 * (x2 - 10 * x3)
        result[2] = 48 * Math.pow(x3 - x4, 3.0) - 60 * (x2 - 10 * x3)
        result[3] = -48 * Math.pow(x3 - x4, 3.0) - 220 * (x1 - 2 * x4)
        return DenseVector(result)
    }

    override fun dimensionOfDomain(): Int {
        return 4
    }

    override fun dimensionOfRange(): Int {
        return 4
    }
}

// construct an optimization problem
val problem: C2OptimProblem = C2OptimProblemImpl(f, g) // only gradient information
val solver: SteepestDescentMinimizer = FirstOrderMinimizer(
    1e-8,  // the convergence threshold
    40000) // the maximum number of iterations
val soln = solver.solve(problem)

val xmin = soln.search(DenseVector(doubleArrayOf(1.0, -1.0, -1.0, 1.0)))
val fmin = f.evaluate(xmin)
println(String.format("f(%s) = %f", xmin.toString(), fmin))
				
			

The output is:

				
f([0.046380, 0.016330, 0.001634, 0.023189]) = 0.000003
				
			

It takes about 40,000 iterations, which is very slow.

Newton-Raphson Method

Second-order methods use a nonsingular Hessian in addition to the gradient to determine the direction vector \( d_k \) and the increment \( \alpha_k \) in the steepest-descent framework. One such method is the Newton-Raphson method, which expands the Taylor series to the second order. Let \( \delta \) be the change in \( x \), so we have:
\( \displaystyle f(x+\delta)\approx f(x)+\sum_{i=1}^{n}\frac{\partial f}{\partial x_i}\delta_i+\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^2 f}{\partial x_i \partial x_j}\delta_i\delta_j \)
Now differentiating the above equation w.r.t. \( \delta_k \) to minimize \( f(x+\delta) \), we have
\( \displaystyle \frac{\partial f}{\partial x_k}+\sum_{i=1}^{n}\frac{\partial^2 f}{\partial x_k \partial x_i}\delta_i=0, \quad \text{for } k=1,2,\dots,n \)
Now rewriting the above equation in matrix form, we have
\( g=-H\delta \)
The optimal change in x is given by
\( \delta=-H^{-1}g \)
Now, to obtain the maximum reduction in \( f(x) \) along this direction, a line search is used. The next point \( x_{k+1} \) is:
\( x_{k+1}=x_k+\delta_k=x_k+\alpha_k d_k \)
where \( d_k=-H^{-1}_k g_k \) and \( \alpha_k \) is the \( \alpha \) that minimizes \( f(x_k+\alpha d_k) \).
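As a stand-alone illustration of a single Newton-Raphson step (independent of the library code below), the sketch below forms \( d_k \) by solving \( H_k d_k=-g_k \) with Gaussian elimination on plain arrays. The helper name newtonStep and its signature are hypothetical, and a full step \( \alpha=1 \) is taken for simplicity.

// one Newton-Raphson step: solve H d = -g by Gaussian elimination with partial
// pivoting, then take x <- x + alpha * d; a sketch, not the library's implementation
fun newtonStep(x: DoubleArray, g: DoubleArray, H: Array<DoubleArray>, alpha: Double = 1.0): DoubleArray {
    val n = x.size
    // augmented system [H | -g]
    val a = Array(n) { i -> DoubleArray(n + 1) { j -> if (j < n) H[i][j] else -g[i] } }
    for (col in 0 until n) {
        // partial pivoting: bring the largest remaining entry in this column to the diagonal
        var pivot = col
        for (r in col + 1 until n) if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r
        val tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp
        // eliminate the entries below the pivot
        for (r in col + 1 until n) {
            val factor = a[r][col] / a[col][col]
            for (j in col..n) a[r][j] -= factor * a[col][j]
        }
    }
    // back substitution gives the Newton direction d = -H^{-1} g
    val d = DoubleArray(n)
    for (i in n - 1 downTo 0) {
        var s = a[i][n]
        for (j in i + 1 until n) s -= a[i][j] * d[j]
        d[i] = s / a[i][i]
    }
    return DoubleArray(n) { i -> x[i] + alpha * d[i] }   // x_{k+1} = x_k + alpha * d_k
}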
Compared with the first-order method, the Newton-Raphson method uses the Hessian information to speed up convergence, especially when the iterate is close to the solution. The following code minimizes the same function using the Newton-Raphson method.

				
					// the objective function
// the global minimizer is at x = (0,0,0,0)
val f: RealScalarFunction = object : RealScalarFunction {
    override fun evaluate(x: Vector): Double {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        var result = Math.pow(x1 - 4 * x2, 4.0)
        result += 12 * Math.pow(x3 - x4, 4.0)
        result += 3 * Math.pow(x2 - 10 * x3, 2.0)
        result += 55 * Math.pow(x1 - 2 * x4, 2.0)
        return result
    }

    override fun dimensionOfDomain(): Int {
        return 4
    }

    override fun dimensionOfRange(): Int {
        return 1
    }
}

// the gradient function
val g: RealVectorFunction = object : RealVectorFunction {
    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        val result = DoubleArray(4)
        result[0] = 4 * Math.pow(x1 - 4 * x2, 3.0) + 110 * (x1 - 2 * x4)
        result[1] = -16 * Math.pow(x1 - 4 * x2, 3.0) + 6 * (x2 - 10 * x3)
        result[2] = 48 * Math.pow(x3 - x4, 3.0) - 60 * (x2 - 10 * x3)
        result[3] = -48 * Math.pow(x3 - x4, 3.0) - 220 * (x1 - 2 * x4)
        return DenseVector(result)
    }

    override fun dimensionOfDomain(): Int {
        return 4
    }

    override fun dimensionOfRange(): Int {
        return 4
    }
}

// construct an optimization problem
val problem: C2OptimProblem = C2OptimProblemImpl(f, g) // uses a numerical Hessian
val solver: SteepestDescentMinimizer = NewtonRaphsonMinimizer(
    1e-8, // the convergence threshold
    20)   // the maximum number of iterations
val soln = solver.solve(problem)

val xmin = soln.search(DenseVector(doubleArrayOf(1.0, -1.0, -1.0, 1.0)))
val fmin = f.evaluate(xmin)
println(String.format("f(%s) = %f", xmin.toString(), fmin))
				
			

The output is:

				
					f([0.000134, -0.000009, 0.000067]) = 0.000000
				
			

The Newton-Raphson method converges much faster than the first-order method. It uses only 20 iterations and achieves much better accuracy than the 40,000 iterations of the first-order method.

Gauss-Newton Method

The Gauss-Newton method minimizes an objective function of the following form:

\( f=[f_1(x)\, f_2(x)\, \dots \, f_m(x)]^T \)

The solution of the above equation is a point \( x \) at which all the \( f_p(x) \) are reduced to zero simultaneously. We can construct a real-valued function \( F \) such that, when \( F \) is minimized, the functions \( f_p(x) \) are also minimized in the least-squares sense:

\( \displaystyle F=\sum_{p=1}^{m}f_p(x)^2=f^Tf \)

Now differentiating the above equation w.r.t. \( x_i \), we have

\( \displaystyle \frac{\partial F}{\partial x_i}=\sum_{p=1}^{m}2f_p(x)\frac{\partial f_p}{\partial x_i}\)

In matrix form, this can be written as \( g_F=2J^Tf \), where \( J \) is the Jacobian of \( f \).

Now differentiating again, we have

\( \displaystyle \frac{\partial^2 F}{\partial x_i \partial x_j}\approx 2\sum_{p=1}^{m}\frac{\partial f_p}{\partial x_i}\frac{\partial f_p}{\partial x_j} \)

This approximation drops the terms involving the second derivatives of the \( f_p \), which is justified when the residuals \( f_p(x) \) are small near the solution. The Hessian \( H_F \) needs to be nonsingular and positive definite, and is given by \( H_F\approx 2J^TJ \).

The next point is given by the recursive relation:
\( x_{k+1}=x_k-\alpha_k(2J^TJ)^{-1}(2J^Tf) \).
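To make the recursion concrete, here is a minimal, self-contained Kotlin sketch of one Gauss-Newton step: it forms the normal equations \( (2J^TJ)d=-2J^Tf \), solves them for the direction \( d \), and advances \( x \). The helper name gaussNewtonStep and its signature are hypothetical, and a full step \( \alpha_k=1 \) is taken for simplicity instead of the line-search \( \alpha_k \) described above.

// one Gauss-Newton step: x_{k+1} = x_k - alpha_k (2 J^T J)^{-1} (2 J^T f);
// Jx is the m-by-n Jacobian at x, fx the m-vector of residuals at x;
// a sketch on plain arrays (no pivoting), not the library's implementation
fun gaussNewtonStep(x: DoubleArray, fx: DoubleArray, Jx: Array<DoubleArray>, alpha: Double = 1.0): DoubleArray {
    val n = x.size
    val m = fx.size
    // normal equations (2 J^T J) d = -(2 J^T f), stored as an augmented system [A | b]
    val a = Array(n) { DoubleArray(n + 1) }
    for (i in 0 until n) {
        for (j in 0 until n) {
            var jtj = 0.0
            for (p in 0 until m) jtj += Jx[p][i] * Jx[p][j]
            a[i][j] = 2.0 * jtj
        }
        var jtf = 0.0
        for (p in 0 until m) jtf += Jx[p][i] * fx[p]
        a[i][n] = -2.0 * jtf
    }
    // Gaussian elimination followed by back substitution
    for (col in 0 until n) {
        for (r in col + 1 until n) {
            val factor = a[r][col] / a[col][col]
            for (j in col..n) a[r][j] -= factor * a[col][j]
        }
    }
    val d = DoubleArray(n)
    for (i in n - 1 downTo 0) {
        var s = a[i][n]
        for (j in i + 1 until n) s -= a[i][j] * d[j]
        d[i] = s / a[i][i]
    }
    return DoubleArray(n) { i -> x[i] + alpha * d[i] }   // x_{k+1} = x_k + alpha_k d_k
}

Note that the factor of 2 cancels between \( 2J^TJ \) and \( 2J^Tf \), so the step is identical to solving \( (J^TJ)d=-J^Tf \).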

The following code solves the same minimization problem using the Gauss-Newton method.

 

				
// the vector of functions f = [f1, f2, f3, f4]^T to be driven to zero
// the global minimizer is at x = (0,0,0,0)
val f: RealVectorFunction = object : RealVectorFunction {
    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        val fx = DoubleArray(4)
        fx[0] = Math.pow(x1 - 4 * x2, 2.0)
        fx[1] = Math.sqrt(12.0) * Math.pow(x3 - x4, 2.0)
        fx[2] = Math.sqrt(3.0) * (x2 - 10 * x3)
        fx[3] = Math.sqrt(55.0) * (x1 - 2 * x4)
        return DenseVector(fx)
    }

    override fun dimensionOfDomain(): Int {
        return 4
    }

    override fun dimensionOfRange(): Int {
        return 4
    }
}

// the Jacobian
val J: RntoMatrix = object : RntoMatrix {
    override fun evaluate(x: Vector): Matrix {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        val Jx = DenseMatrix(4, 4)

        var value = 2 * (x1 - 4 * x2)
        Jx.set(1, 1, value)

        value = -8 * (x1 - 4 * x2)
        Jx.set(1, 2, value)

        value = 2 * Math.sqrt(12.0) * (x3 - x4)
        Jx.set(2, 3, value)
        Jx.set(2, 4, -value)

        Jx.set(3, 2, Math.sqrt(3.0))
        Jx.set(3, 3, -10 * Math.sqrt(3.0))

        Jx.set(4, 1, Math.sqrt(55.0))
        Jx.set(4, 4, -2 * Math.sqrt(55.0))

        return Jx
    }

    override fun dimensionOfDomain(): Int {
        return 4
    }

    override fun dimensionOfRange(): Int {
        return 1
    }
}

val solver: GaussNewtonMinimizer = GaussNewtonMinimizer(
    1e-8, // the convergence threshold
    10)   // the maximum number of iterations
val soln = solver.solve(f, J) // uses the analytical Jacobian

val xmin = soln.search(DenseVector(doubleArrayOf(1.0, -1.0, -1.0, 1.0)))
println(String.format("f(%s) = %s", xmin.toString(), f.evaluate(xmin).toString()))
				
			

The output is:

				
					f([0.000007, -0.000000, -0.000000, 0.000003]) = [0.000000, 0.000000, -0.000000, -0.000000]
				
			

The Gauss-Newton method converges even faster than the Newton-Raphson method. It uses only 10 iterations and achieves much better accuracy, compared to the 40,000 iterations of the first-order method and the 20 iterations of the Newton-Raphson method.