Quasi-Newton methods are a family of algorithms built on the Newton-Raphson method. The search direction is based on an $$n\times n$$ matrix $$S_k$$, which serves the same purpose as the inverse Hessian matrix in the Newton-Raphson method but is constructed from gradient information alone, so the Hessian itself is never required. The Quasi-Newton iteration can be written as

$$x_{k+1}=x_k-\alpha_kS_kg_k$$

When $$S_k=I$$, the iteration reduces to the steepest-descent method. When $$S_k=H^{-1}$$, it is the Newton-Raphson method, and the iteration above converges fastest in this case. Quasi-Newton methods combine the advantages of the steepest-descent and conjugate-direction methods. They are among the most efficient optimization methods and are used extensively in many applications.
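As a concrete illustration of the iteration above, the following sketch (plain Java with illustrative names, independent of any optimization library) runs $$x_{k+1}=x_k-\alpha_kS_kg_k$$ with $$S_k=I$$, i.e. steepest descent, on the quadratic $$f(x)=x_1^2+4x_2^2$$, using the exact line-search step size for a quadratic.

```java
// Generic quasi-Newton iteration x_{k+1} = x_k - a_k * S_k * g_k, specialized to
// S_k = I (steepest descent), on f(x) = x1^2 + 4*x2^2 with Hessian H = diag(2, 8).
public class QuasiNewtonIteration {

    // gradient of f(x) = x1^2 + 4*x2^2
    static double[] grad(double[] x) {
        return new double[] {2 * x[0], 8 * x[1]};
    }

    // run the iteration from x0 and return the final iterate
    static double[] minimize(double[] x0, int iterations) {
        double[][] H = {{2, 0}, {0, 8}};
        double[] x = x0.clone();
        for (int k = 0; k < iterations; k++) {
            double[] g = grad(x);
            double[] d = {-g[0], -g[1]};       // d_k = -S_k g_k with S_k = I
            // exact line search on a quadratic: a_k = -g^T d / (d^T H d)
            double gTd = g[0] * d[0] + g[1] * d[1];
            double dHd = d[0] * (H[0][0] * d[0] + H[0][1] * d[1])
                       + d[1] * (H[1][0] * d[0] + H[1][1] * d[1]);
            double a = -gTd / dHd;
            x[0] += a * d[0];
            x[1] += a * d[1];
        }
        return x;
    }

    public static void main(String[] args) {
        double[] xmin = minimize(new double[] {5, 5}, 50);
        System.out.printf("x = (%.6f, %.6f)%n", xmin[0], xmin[1]);  // approaches the minimizer (0, 0)
    }
}
```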

Let the difference between successive iterates be $$\delta_k=x_{k+1}-x_k$$ and the difference between successive gradients be $$\gamma_k=g_{k+1}-g_k$$. For a quadratic function with Hessian $$H$$, these satisfy $$\gamma_k=H\delta_k$$. The Newton-Raphson method determines the direction vector as $$d_k=-S_kg_k$$, and the step size $$\alpha_k$$ can be determined by a line search minimizing the function $$f(x_k+\alpha d_k)$$. We want to avoid inverting a matrix, i.e., computing $$S_k=H^{-1}$$, and checking that $$S_k$$ is positive definite in each iteration. Instead, we simply want an updating rule for $$S_{k+1}$$ of the form

$$S_{k+1}=S_k+C_k$$.

Here $$C_k$$ is an $$n\times n$$ correction matrix that can be computed from available data. For any definition of $$C_k$$, $$S_{k+1}$$ must satisfy these properties:

1. $$S_{k+1}\gamma_k=\delta_k$$.

2. Vectors $$\{\delta_0,\delta_1,\ldots,\delta_{n-1}\}$$ must be conjugate direction vectors.

3. A positive definite $$S_k$$ must produce a positive definite $$S_{k+1}$$.

The second property ensures that the advantages of the conjugate-direction methods carry over to the Quasi-Newton methods. The third property ensures that the direction vector is a valid descent direction in each iteration. There are many ways to define $$C_k$$, and they give rise to different variants of the Quasi-Newton method.

Rank-One Method

The Rank-One method is characterized by a correction matrix $$C_k$$ of rank one, that is,

$$C_k=\beta_k\xi_k\xi_k^T$$

for some scalar $$\beta_k$$ and vector $$\xi_k$$. The update must satisfy the quasi-Newton condition

$$S_{k+1}\gamma_k=\delta_k$$.

Substituting $$S_{k+1}=S_k+\beta_k\xi_k\xi_k^T$$ into this condition and rearranging gives

$$\delta_k-S_k\gamma_k=\beta_k\xi_k\xi_k^T\gamma_k$$

Premultiplying both sides by $$\gamma_k^T$$ gives

$$\gamma^T_k(\delta_k-S_k\gamma_k)=\beta_k\gamma^T_k\xi_k\xi_k^T\gamma_k=\beta_k(\xi^T_k\gamma_k)^2$$

and taking the outer product of each side with itself gives

$$(\delta_k-S_k\gamma_k)(\delta_k-S_k\gamma_k)^T=\beta_k^2(\xi_k^T\gamma_k)^2\,\xi_k\xi_k^T$$

Dividing the second equation by the first, we have

$$\displaystyle \beta_k\xi_k\xi_k^T=\frac{(\delta_k-S_k\gamma_k)(\delta_k-S_k\gamma_k)^T}{\gamma_k^T(\delta_k-S_k\gamma_k)}$$

Therefore,

$$\displaystyle S_{k+1}=S_k+\beta_k\xi_k\xi_k^T=S_k+\frac{(\delta_k-S_k\gamma_k)(\delta_k-S_k\gamma_k)^T}{\gamma_k^T(\delta_k-S_k\gamma_k)}$$.

There are two problems with this method. First, the positive definite $$S_k$$ may not give a positive definite $$S_{k+1}$$. In such a case, the next direction is not a good direction. Second, the denominator in the correction formula may approach or become equal to zero. If that happens, the method breaks down as $$S_{k+1}$$ is undefined.
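The quasi-Newton condition $$S_{k+1}\gamma_k=\delta_k$$ is easy to verify numerically. The following sketch (plain Java with illustrative names, not a library API) applies the rank-one update to $$S_0=I$$ with $$\delta$$ and $$\gamma=H\delta$$ taken from a quadratic with Hessian $$H=\mathrm{diag}(2,8)$$, and checks that $$S_1\gamma=\delta$$; a comment marks the denominator whose vanishing causes the breakdown just described.

```java
// Rank-one (SR1) update:
// S1 = S + (delta - S*gamma)(delta - S*gamma)^T / (gamma^T (delta - S*gamma))
// Small 2-d example verifying the quasi-Newton condition S1 * gamma = delta.
public class RankOneUpdate {

    static double[] matVec(double[][] M, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++)
            for (int j = 0; j < v.length; j++)
                out[i] += M[i][j] * v[j];
        return out;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static double[][] update(double[][] S, double[] delta, double[] gamma) {
        int n = delta.length;
        double[] Sg = matVec(S, gamma);
        double[] r = new double[n];              // r = delta - S*gamma
        for (int i = 0; i < n; i++) r[i] = delta[i] - Sg[i];
        // if this denominator is (close to) zero, the rank-one method breaks down
        double denom = dot(gamma, r);
        double[][] S1 = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                S1[i][j] = S[i][j] + r[i] * r[j] / denom;
        return S1;
    }

    public static void main(String[] args) {
        double[][] S0 = {{1, 0}, {0, 1}};        // initial estimate of H^{-1}
        double[] delta = {1, 1};                 // delta_k = x_{k+1} - x_k
        double[] gamma = {2, 8};                 // gamma_k = H*delta_k for H = diag(2, 8)
        double[] check = matVec(update(S0, delta, gamma), gamma);
        System.out.printf("S1*gamma = (%.6f, %.6f)%n", check[0], check[1]);  // (1.000000, 1.000000)
    }
}
```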

Davidon-Fletcher-Powell Method

The Davidon-Fletcher-Powell (DFP) method is similar to the Rank-One method, but it has one important advantage: if the initial matrix $$S_0$$ is positive definite, then all subsequent matrices are positive definite as well. Unlike the Rank-One method, every new direction is therefore a descent direction.

The DFP formula is

$$\displaystyle S_{k+1}=S_k+\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}-\frac{S_k\gamma_k\gamma_k^TS_k}{\gamma_k^TS_k\gamma_k}$$

The correction is an $$n\times n$$ symmetric matrix of rank two.
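A quick numerical check of the DFP formula, as a plain-Java sketch with illustrative names (not a library API): starting from $$S_0=I$$, the update reproduces the quasi-Newton condition $$S_1\gamma=\delta$$ exactly for any $$\delta$$ and $$\gamma$$ with $$\delta^T\gamma\neq 0$$.

```java
// DFP update:
// S1 = S + delta delta^T / (delta^T gamma) - (S gamma)(gamma^T S) / (gamma^T S gamma)
public class DFPUpdate {

    static double[] matVec(double[][] M, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++)
            for (int j = 0; j < v.length; j++)
                out[i] += M[i][j] * v[j];
        return out;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // S is assumed symmetric, so gamma^T S = (S gamma)^T
    static double[][] update(double[][] S, double[] delta, double[] gamma) {
        int n = delta.length;
        double[] Sg = matVec(S, gamma);
        double dTg = dot(delta, gamma);   // delta^T gamma
        double gSg = dot(gamma, Sg);      // gamma^T S gamma
        double[][] S1 = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                S1[i][j] = S[i][j] + delta[i] * delta[j] / dTg - Sg[i] * Sg[j] / gSg;
        return S1;
    }

    public static void main(String[] args) {
        double[][] S0 = {{1, 0}, {0, 1}};
        double[] delta = {1, 1};
        double[] gamma = {2, 8};          // gamma = H*delta for H = diag(2, 8)
        double[] check = matVec(update(S0, delta, gamma), gamma);
        System.out.printf("S1*gamma = (%.6f, %.6f)%n", check[0], check[1]);  // (1.000000, 1.000000)
    }
}
```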

Broyden-Fletcher-Goldfarb-Shanno Method

The BFGS method formula is

$$\displaystyle S_{k+1}=S_k+\left(1+\frac{\gamma^T_kS_k\gamma_k}{\gamma_k^T\delta_k}\right)\frac{\delta_k\delta_k^T}{\gamma^T_k\delta_k}-\frac{\delta_k\gamma_k^TS_k+S_k\gamma_k\delta_k^T}{\gamma_k^T\delta_k}$$

The BFGS method has the following properties:

1. $$S_{k+1}$$ becomes identical to $$H^{-1}$$ for $$k=n-1$$.

2. Direction vectors $$\{\delta_i\}_{i=0,1,\ldots,n-1}$$ form a conjugate set.

3. $$S_{k+1}$$ is positive definite if $$S_k$$ is positive definite.

4. $$\delta_k^T\gamma_k=\delta^T_kg_{k+1}-\delta^T_kg_k>0$$.

The BFGS method is generally regarded as the most effective of the Quasi-Newton methods in practice.
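The BFGS formula can be checked the same way: whatever symmetric $$S_k$$ we start from, the update satisfies $$S_{k+1}\gamma_k=\delta_k$$ exactly. A plain-Java sketch with illustrative names (not a library API):

```java
// BFGS inverse-Hessian update (S assumed symmetric, so S^T gamma = S gamma):
// S1 = S + (1 + gamma^T S gamma / gamma^T delta) * delta delta^T / (gamma^T delta)
//        - (delta gamma^T S + S gamma delta^T) / (gamma^T delta)
public class BFGSUpdate {

    static double[] matVec(double[][] M, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++)
            for (int j = 0; j < v.length; j++)
                out[i] += M[i][j] * v[j];
        return out;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static double[][] update(double[][] S, double[] delta, double[] gamma) {
        int n = delta.length;
        double[] Sg = matVec(S, gamma);
        double gTd = dot(gamma, delta);              // gamma^T delta
        double gSg = dot(gamma, Sg);                 // gamma^T S gamma
        double c = (1 + gSg / gTd) / gTd;            // coefficient of delta delta^T
        double[][] S1 = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                S1[i][j] = S[i][j] + c * delta[i] * delta[j]
                         - (delta[i] * Sg[j] + Sg[i] * delta[j]) / gTd;
        return S1;
    }

    public static void main(String[] args) {
        double[][] S0 = {{1, 0}, {0, 1}};
        double[] delta = {1, 1};
        double[] gamma = {2, 8};                     // gamma = H*delta for H = diag(2, 8)
        double[] check = matVec(update(S0, delta, gamma), gamma);
        System.out.printf("S1*gamma = (%.6f, %.6f)%n", check[0], check[1]);  // (1.000000, 1.000000)
    }
}
```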

Huang Family Method

The Huang family is a general updating formula that includes the Rank-One, DFP, BFGS, Pearson, and McCormick methods as special cases. It has the following form

$$\displaystyle S_{k+1}=S_k+\frac{\delta_k(\theta\delta_k+\phi S^T_k\gamma_k)^T}{(\theta\delta_k+\phi S^T_k\gamma_k)^T\gamma_k}-\frac{S_k\gamma_k(\psi\delta_k+\omega S^T_k\gamma_k)^T}{(\psi\delta_k+\omega S^T_k\gamma_k)^T\gamma_k}$$

$$\theta,\, \phi,\, \psi$$ and $$\omega$$ are independent parameters.

By letting $$\theta=1,\, \phi=-1,\, \psi=1,\, \omega=-1,$$ we have the Rank-One formula.

By letting $$\theta=1,\, \phi=0,\, \psi=0,\, \omega=1,$$ we have the DFP formula.

By letting $$\theta=1,\, \phi=0,\, \psi=1,\, \omega=0,$$ we have the McCormick formula:

$$\displaystyle S_{k+1}=S_k+\frac{(\delta_k-S_k\gamma_k)\delta^T_k}{\delta_k^T\gamma_k}$$

By letting $$\theta=0,\, \phi=1,\, \psi=0,\, \omega=1,$$ we have the Pearson formula:

$$\displaystyle S_{k+1}=S_k+\frac{(\delta_k-S_k\gamma_k)\gamma_k^TS_k}{\gamma_k^TS_k\gamma_k}$$

By letting $$\displaystyle \frac{\phi}{\theta}=\frac{-\gamma_k^T\delta_k}{\gamma_k^T\delta_k+\gamma^T_kS_k\gamma_k},\,\psi=1,\, \omega=0,$$ we have the BFGS formula.
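To see the family structure concretely, the following sketch (plain Java, illustrative names, not a library API) implements the general Huang update and checks that the parameter choice $$\theta=1,\,\phi=0,\,\psi=0,\,\omega=1$$ reproduces the DFP formula on a small example.

```java
// General Huang-family update:
// S1 = S + delta * u^T / (u^T gamma) - (S gamma) * v^T / (v^T gamma),
// where u = theta*delta + phi*S^T*gamma and v = psi*delta + omega*S^T*gamma.
public class HuangUpdate {

    static double[] matVec(double[][] M, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++)
            for (int j = 0; j < v.length; j++)
                out[i] += M[i][j] * v[j];
        return out;
    }

    static double[] matTVec(double[][] M, double[] v) {  // M^T * v
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++)
            for (int j = 0; j < v.length; j++)
                out[j] += M[i][j] * v[i];
        return out;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static double[][] update(double[][] S, double[] delta, double[] gamma,
                             double theta, double phi, double psi, double omega) {
        int n = delta.length;
        double[] Sg = matVec(S, gamma);
        double[] STg = matTVec(S, gamma);
        double[] u = new double[n], v = new double[n];
        for (int i = 0; i < n; i++) {
            u[i] = theta * delta[i] + phi * STg[i];
            v[i] = psi * delta[i] + omega * STg[i];
        }
        double uTg = dot(u, gamma), vTg = dot(v, gamma);
        double[][] S1 = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                S1[i][j] = S[i][j] + delta[i] * u[j] / uTg - Sg[i] * v[j] / vTg;
        return S1;
    }

    public static void main(String[] args) {
        double[][] S0 = {{1, 0}, {0, 1}};
        double[] delta = {1, 1};
        double[] gamma = {2, 8};  // gamma = H*delta for H = diag(2, 8)
        // Huang with theta=1, phi=0, psi=0, omega=1 ...
        double[][] huang = update(S0, delta, gamma, 1, 0, 0, 1);
        // ... should equal the DFP update S + dd^T/(d^T g) - (Sg)(Sg)^T/(g^T S g)
        double[] Sg = matVec(S0, gamma);
        double dTg = dot(delta, gamma), gSg = dot(gamma, Sg);
        double maxDiff = 0;
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                double dfp = S0[i][j] + delta[i] * delta[j] / dTg - Sg[i] * Sg[j] / gSg;
                maxDiff = Math.max(maxDiff, Math.abs(huang[i][j] - dfp));
            }
        System.out.println("max |Huang(1,0,0,1) - DFP| = " + maxDiff);  // 0.0
    }
}
```

Note that, like the individual methods, the general formula satisfies $$S_{k+1}\gamma_k=\delta_k$$ by construction: the second term contributes $$\delta_k$$ and the third cancels $$S_k\gamma_k$$.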

The following code minimizes the Himmelblau function $$f(x_1,x_2)=(x_1^2+x_2-11)^2+(x_1+x_2^2-7)^2$$ using different Quasi-Newton methods.

// The Himmelblau function:
// f(x) = (x1^2 + x2 - 11)^2 + (x1 + x2^2 - 7)^2
val f: RealScalarFunction = object : RealScalarFunction {

    override fun evaluate(x: Vector): Double {
        val x1 = x.get(1)
        val x2 = x.get(2)

        var result = Math.pow(x1 * x1 + x2 - 11, 2.0)
        result += Math.pow(x1 + x2 * x2 - 7, 2.0)
        return result
    }

    override fun dimensionOfDomain(): Int {
        return 2
    }

    override fun dimensionOfRange(): Int {
        return 1
    }
}

// The gradient of the Himmelblau function
val g: RealVectorFunction = object : RealVectorFunction {

    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)

        val w1 = x1 * x1 + x2 - 11
        val w2 = x1 + x2 * x2 - 7

        val result = doubleArrayOf(
            4 * w1 * x1 + 2 * w2,
            2 * w1 + 4 * w2 * x2
        )
        return DenseVector(result)
    }

    override fun dimensionOfDomain(): Int {
        return 2
    }

    override fun dimensionOfRange(): Int {
        return 2
    }
}

val problem: C2OptimProblem = C2OptimProblemImpl(f, g)

val rankOneMinimizer = RankOneMinimizer(
    1e-16, // precision
    15)    // maximum number of iterations
val soln1 = rankOneMinimizer.solve(problem)
val xmin1 = soln1.search(DenseVector(doubleArrayOf(6.0, 6.0)))
val fmin1 = f.evaluate(xmin1)
println(String.format("f(%s) = %.16f", xmin1.toString(), fmin1))

val dfpMinimizer = DFPMinimizer(
    1e-16, // precision
    15)    // maximum number of iterations
val soln2 = dfpMinimizer.solve(problem)
val xmin2 = soln2.search(DenseVector(doubleArrayOf(6.0, 6.0)))
val fmin2 = f.evaluate(xmin2)
println(String.format("f(%s) = %.16f", xmin2.toString(), fmin2))

val bfgsMinimizer = BFGSMinimizer(
    false,
    1e-16, // precision
    15)    // maximum number of iterations
val soln3 = bfgsMinimizer.solve(problem)
val xmin3 = soln3.search(DenseVector(doubleArrayOf(6.0, 6.0)))
val fmin3 = f.evaluate(xmin3)
println(String.format("f(%s) = %.16f", xmin3.toString(), fmin3))

val huangMinimizer = HuangMinimizer(
    0.0, 1.0, 0.0, 1.0, // the four Huang family parameters
    1e-16,              // precision
    15)                 // maximum number of iterations
val soln4 = huangMinimizer.solve(problem)
val xmin4 = soln4.search(DenseVector(doubleArrayOf(6.0, 6.0)))
val fmin4 = f.evaluate(xmin4)
println(String.format("f(%s) = %.16f", xmin4.toString(), fmin4))

val pearsonMinimizer = PearsonMinimizer(
    1e-16, // precision
    15)    // maximum number of iterations
val soln5 = pearsonMinimizer.solve(problem)
val xmin5 = soln5.search(DenseVector(doubleArrayOf(6.0, 6.0)))
val fmin5 = f.evaluate(xmin5)
println(String.format("f(%s) = %.16f", xmin5.toString(), fmin5))



The output is:

f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000



The Quasi-Newton methods are much more efficient than the Conjugate-Direction methods. They take fewer iterations than the Conjugate-Direction methods and give better accuracy and precision.