The partial derivative of a multivariate function is the derivative of the function with respect to one of its variables while the other variables are held constant. For example, the partial derivative of a bivariate function with respect to \( x \) is given as

\( \displaystyle D_xf=f_x(x,y)=\frac{\partial f(x,y)}{\partial x} \)

The above derivative can be computed numerically just like that of a univariate function, with one change: we impose a grid on a plane or a higher-dimensional domain instead of on an axis.
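To make the idea concrete, here is a minimal, self-contained sketch (plain Java, independent of the library used later; the class name, test function, and step size are illustrative assumptions) that approximates \( \partial f/\partial x \) with a central difference while holding \( y \) fixed:

// Sketch: central-difference approximation of a partial derivative.
// Plain Java; the names here are illustrative only.
public class PartialDerivativeSketch {

    // f(x, y) = x^2 + xy + y^2 (the example used later in this section)
    static double f(double x, double y) {
        return x * x + x * y + y * y;
    }

    // df/dx at (x, y): perturb x only, hold y constant
    static double dfdx(double x, double y, double h) {
        return (f(x + h, y) - f(x - h, y)) / (2 * h);
    }

    public static void main(String[] args) {
        // analytic value is 2x + y = 3 at (1, 1)
        System.out.println(dfdx(1.0, 1.0, 1e-5)); // ~3.0
    }
}

Shrinking the step \( h \) reduces truncation error until floating-point rounding starts to dominate.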

A second-order partial derivative can be written as

\( \displaystyle D_{xx}f=f_{xx}(x,y)=\frac{\partial^2f(x,y)}{\partial x^2} \)

A second-order mixed partial derivative is obtained by differentiating first with respect to one variable and then with respect to the other. For example, the second-order mixed partial derivative with respect to \( x \) and then \( y \) can be written as

\( \displaystyle D_{xy}f=f_{xy}(x,y)=\frac{\partial}{\partial y}\frac{\partial f(x,y)}{\partial x} \)

Higher-order partial and mixed derivatives look like

\( \displaystyle \frac{\partial^{i+j+k}f}{\partial x^i\partial y^j\partial z^k}=\partial_x^i\partial_y^j\partial_z^kf \)
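As a sketch of how a mixed partial can be computed on such a grid (plain Java, not the library API; the four-point stencil, function, and step size are illustrative assumptions):

// Sketch: second-order mixed partial f_xy via central differences in both variables.
public class MixedPartialSketch {

    static double f(double x, double y) {
        return x * x + x * y + y * y; // f_xy = 1 analytically
    }

    // d/dy (df/dx) at (x, y) using a four-point central-difference stencil
    static double fxy(double x, double y, double h) {
        return (f(x + h, y + h) - f(x + h, y - h)
              - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h);
    }

    public static void main(String[] args) {
        System.out.println(fxy(1.0, 1.0, 1e-4)); // ~1.0
    }
}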

For example, consider the function

\( z=f(x,y)=x^2+xy+y^2 \)

The graph of this function defines a surface in Euclidean space. To every point on this surface, there are an infinite number of tangent lines. Partial differentiation is the act of choosing one of these lines and finding its slope. Usually, the lines of most interest are those that are parallel to the \( xz \)-plane (holding \( y \) constant) and those that are parallel to the \( yz \)-plane (holding \( x \) constant).

A graph of z = x^2 + xy + y^2. For the partial derivative at (1, 1) that leaves y constant, the corresponding tangent line is parallel to the xz-plane.

To find the slope of the line tangent to the function at (1,1) and parallel to the \( xz \)-plane, we treat \( y \) as a constant. By finding the derivative of the equation while assuming that \( y \) is a constant, we find that the slope of \( f \) at the point \( (x,y) \) is

\( \displaystyle \frac{\partial z}{ \partial x}=2x+y \)

A slice of the graph above showing the function in the xz-plane at y = 1. Note that the two axes are shown here with different scales. The slope of the tangent line is 3.

It evaluates to 3 at the point (1, 1). The following code snippet solves this problem.

// f = x^2 + xy + y^2
RealScalarFunction f = new AbstractBivariateRealFunction() {
    @Override
    public double evaluate(double x, double y) {
        return x * x + x * y + y * y;
    }
};

// df/dx = 2x + y
MultivariateFiniteDifference dx
        = new MultivariateFiniteDifference(f, new int[]{1});
System.out.println(String.format("Dx(1,1) = %f", dx.evaluate(new DenseVector(1, 1))));

The output is:

Dx(1,1) = 3.000000

Gradient

There are a number of important and widely used vectors and matrices of partial and mixed partial derivatives: the gradient vector, the Jacobian matrix, and the Hessian matrix.
The gradient function, \( \nabla f \), of a multivariate real-valued function \( f \) is a vector-valued function, or vector field, from \( \mathbf{R^n} \) to \( \mathbf{R^n} \). It takes a point \( \mathbf{x}=(x_1,\dots,x_n) \) in \( \mathbf{R^n} \) and outputs a vector in \( \mathbf{R^n} \). The gradient at any point, \( \nabla f(\mathbf{x}) \), is a vector whose components are the partial derivatives of \( f \) at \( \mathbf{x} \). Mathematically, it can be represented as
\( \nabla f(\mathbf{x})= \begin{bmatrix}
\frac{\partial f}{\partial x_1}(x)\\ \vdots\\
\frac{\partial f}{\partial x_n}(x)
\end{bmatrix}\)

Each component, \( \frac{\partial f}{\partial x_i}(x),\,i=1,\dots,n \), is the partial derivative of the function along an axis and is the rate of change in that direction. The gradient vector therefore plays the role that the first derivative, the slope of the tangent, plays for a univariate function. It can be interpreted as the direction and rate of fastest increase. If the gradient of a function is non-zero at a point \( \mathbf{x} \), the direction of the gradient is the direction in which the function increases most quickly from \( \mathbf{x} \), and the magnitude of the gradient is the rate of increase in that direction, which is the greatest absolute directional derivative. Furthermore, the gradient is the zero vector at a point if and only if that point is a stationary point, where the derivative vanishes. The gradient thus plays an important and fundamental role in optimization theory.
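The "direction of fastest increase" claim can be checked numerically. The sketch below (plain Java; the test function, point, and directions are illustrative assumptions) compares the rate of change along the normalized gradient with the rate along the coordinate axes:

// Sketch: the gradient direction gives the fastest local increase.
public class SteepestAscentSketch {

    static double f(double x, double y) {
        return x * x + x * y + y * y;
    }

    // directional derivative of f at (x, y) along the unit vector (ux, uy)
    static double directional(double x, double y, double ux, double uy, double h) {
        return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h);
    }

    public static void main(String[] args) {
        double x = 1, y = 1, h = 1e-5;
        // gradient of f at (1, 1) is (2x + y, x + 2y) = (3, 3)
        double gx = 3, gy = 3;
        double norm = Math.hypot(gx, gy);
        System.out.println("along gradient: " + directional(x, y, gx / norm, gy / norm, h)); // ~4.24
        System.out.println("along (1, 0)  : " + directional(x, y, 1, 0, h));                // ~3
        System.out.println("along (0, 1)  : " + directional(x, y, 0, 1, h));                // ~3
    }
}

The rate along the normalized gradient equals the gradient's magnitude, \( \sqrt{18}\approx4.24 \), and no other unit direction exceeds it.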

For example, consider a room where the temperature is given by a scalar field \( T \). The multivariate function, or field, takes a point \( (x,y,z) \) and gives a temperature value \( T(x,y,z) \). At each point in the room, the gradient of \( T \) shows the direction in which the temperature rises most quickly when moving away from \( (x,y,z) \), and the magnitude of the gradient determines how fast the temperature rises in that direction. More generally, if a multivariate function \( f \) is differentiable, then the dot product between \( \nabla f \) and a unit vector \( \nu \) is the slope or rate of change of the function in the direction of \( \nu \), called the directional derivative of \( f \) along \( \nu \). The multivariate version of the Taylor expansion shows that the best linear approximation of a function can be expressed in terms of the gradient.

\( f(x)\approx f(x_0)+\nabla f(x_0)\cdot(x-x_0) \)

\( x \) and \( x_0 \) are points in the \( \mathbf{R^n} \) space. \( f \) maps \( \mathbf{R^n}\rightarrow\mathbf{R} \) and \( \nabla f:\mathbf{R^n}\rightarrow\mathbf{R^n} \). The dot product in the last term gives a real number.
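A quick numerical check of this linear approximation (plain Java; the base point and displacement are illustrative assumptions):

// Sketch: first-order Taylor (linear) approximation using the gradient.
public class LinearApproximationSketch {

    static double f(double x, double y) {
        return x * x + x * y + y * y;
    }

    public static void main(String[] args) {
        double x0 = 1, y0 = 1;            // base point
        double gx = 2 * x0 + y0;          // df/dx = 2x + y
        double gy = x0 + 2 * y0;          // df/dy = x + 2y
        double dx = 0.1, dy = -0.05;      // small displacement

        double exact = f(x0 + dx, y0 + dy);
        double linear = f(x0, y0) + gx * dx + gy * dy; // f(x0) + grad . (x - x0)
        System.out.println("exact = " + exact + ", linear approx = " + linear);
        // the two agree up to a term of order ||(dx, dy)||^2
    }
}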

Consider, for example, the function

\( f(x,y)=x\,e^{-(x^2+y^2)} \)

 

Gradient of the 2D function f(x, y) = xe^−(x² + y²) plotted as blue arrows over a pseudocolor plot of the function.

The following code computes the gradient and the gradient function of this function.

// assumes: import static java.lang.Math.exp;
// f = x * exp(-(x^2 + y^2))
RealScalarFunction f = new AbstractBivariateRealFunction() {
    @Override
    public double evaluate(double x, double y) {
        return x * exp(-(x * x + y * y));
    }
};

Vector x1 = new DenseVector(0, 0);
Vector g1_0 = new Gradient(f, x1);
System.out.println(String.format("gradient at %s = %s", x1, g1_0));

GradientFunction df = new GradientFunction(f);
Vector g1_1 = df.evaluate(x1);
System.out.println(String.format("gradient at %s = %s", x1, g1_1));

Vector x2 = new DenseVector(-1, 0);
Vector g2 = df.evaluate(x2);
System.out.println(String.format("gradient at %s = %s", x2, g2));

Vector x3 = new DenseVector(1, 0);
Vector g3 = df.evaluate(x3);
System.out.println(String.format("gradient at %s = %s", x3, g3));

The output is:

gradient at [0.000000, 0.000000] = [1.000000, 0.000000]
gradient at [0.000000, 0.000000] = [1.000000, 0.000000]
gradient at [-1.000000, 0.000000] = [-0.367879, 0.000000]
gradient at [1.000000, 0.000000] = [-0.367879, 0.000000]
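As a cross-check of these numbers, the analytic gradient is \( \nabla f=\left(e^{-(x^2+y^2)}(1-2x^2),\,-2xye^{-(x^2+y^2)}\right) \); the short sketch below (plain Java; the class name is an illustrative assumption) evaluates it at the same three points:

// Sketch: analytic gradient of f(x, y) = x * exp(-(x^2 + y^2)) at the same points.
public class AnalyticGradientCheck {

    static double[] grad(double x, double y) {
        double e = Math.exp(-(x * x + y * y));
        return new double[]{e * (1 - 2 * x * x), -2 * x * y * e};
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {-1, 0}, {1, 0}};
        for (double[] p : points) {
            double[] g = grad(p[0], p[1]);
            System.out.printf("gradient at (%.1f, %.1f) = (%.6f, %.6f)%n", p[0], p[1], g[0], g[1]);
        }
        // (0, 0) -> (1, 0); (-1, 0) -> (-0.367879, 0); (1, 0) -> (-0.367879, 0)
    }
}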
The gradient of the function f(x,y) = −(cos²x + cos²y)² depicted as a projected vector field on the bottom plane.

Jacobian

As we generalize the derivative of a scalar-valued function of a single variable to the gradient of a scalar-valued function of several variables, we can further generalize the concept to a vector-valued function of several variables. Suppose a vector-valued function \( f:\mathbf{R^n}\rightarrow\mathbf{R^m} \) takes a point \( \mathbf{x}=(x_1,\dots,x_n) \) in \( \mathbf{R^n} \) and outputs a vector \( f(\mathbf{x}) \) in \( \mathbf{R^m} \). The Jacobian matrix of \( f \) is defined to be the \( m\times n \) matrix, denoted by \( J \), whose \( (i,j)^{th} \) entry is
\( \displaystyle J_{ij}=\frac{\partial f_i}{\partial x_j} \)
Explicitly, the above equation can be written as
\( \displaystyle J=\begin{bmatrix}
\frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n}
\end{bmatrix}
=\begin{bmatrix}
\nabla^Tf_1\\
\vdots\\
\nabla^Tf_m
\end{bmatrix}
=\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n}\\
\vdots & \ddots & \vdots\\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix} \)
\( \nabla^Tf_i \) is the transpose (a row vector) of the gradient of the \( i^{th} \) component.
The Jacobian is the matrix of all the first-order partial derivatives of \( f \). It represents the differential of \( f \) at every point where \( f \) is differentiable, and it is the best linear approximation to the changes of \( f \) in a vicinity of \( x \): for points \( y \) close to \( x \), the Taylor expansion gives
\( f(y)=f(x)+J_f(x)(y-x)+o(\|y-x\|) \)
The error term approaches zero faster than the distance between \( y \) and \( x \) as \( y \) approaches \( x \). In summary, the Jacobian can be regarded as the “first-order derivative” of a vector-valued function of several variables.
If \( m=n \), then \( f \) is a function from \( \mathbf{R^n} \) to itself and the Jacobian matrix is a square matrix, so we can compute its determinant, called the Jacobian determinant. The Jacobian determinant is sometimes simply referred to as “the Jacobian”. The Jacobian determinant at a given point gives important information about the local behavior of \( f \) near that point. For instance, the inverse function theorem says that a continuously differentiable function \( f \) is invertible near a point \( x\in\mathbf{R}^n \) if the Jacobian determinant at \( x \) is non-zero. Furthermore, if the Jacobian determinant at \( x \) is positive, then \( f \) preserves orientation near \( x \); if it is negative, \( f \) reverses orientation. The absolute value of the Jacobian determinant at \( x \) gives the factor by which the function \( f \) expands or shrinks volumes near \( x \).
Consider the following function \( \mathbf{F}:\mathbf{R^2}\rightarrow\mathbf{R^2} \):
\( \mathbf{F}\left(\begin{bmatrix}
x\\
y
\end{bmatrix}\right)=\begin{bmatrix}
f_1(x,y)\\
f_2(x,y)
\end{bmatrix}=\begin{bmatrix}
x^2y\\
5x+\sin y
\end{bmatrix} \)
The Jacobian is
\( \displaystyle \mathbf{J_F}(x,y)=\begin{bmatrix}
\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y}\\
\frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y}
\end{bmatrix}=\begin{bmatrix}
2xy & x^2\\
5 & \cos y
\end{bmatrix} \)
The Jacobian determinant is
\( \det\mathbf{J_F}(x,y)=2xy\cos y-5x^2 \)
The following code solves this example.

 

// assumes: import static java.lang.Math.sin; import static java.lang.Math.PI;
RealVectorFunction F = new RealVectorFunction() {
    @Override
    public Vector evaluate(Vector v) {
        double x = v.get(1);
        double y = v.get(2);

        double f1 = x * x * y;      // f1 = x^2 * y
        double f2 = 5 * x + sin(y); // f2 = 5x + sin(y)

        return new DenseVector(f1, f2);
    }

    @Override
    public int dimensionOfDomain() {
        return 2;
    }

    @Override
    public int dimensionOfRange() {
        return 2;
    }
};

Vector x0 = new DenseVector(0, 0);
Matrix J00 = new Jacobian(F, x0);
System.out.println(String.format("the Jacobian at %s = %s, the det = %f",
        x0,
        J00,
        MatrixMeasure.det(J00)));

RntoMatrix J = new JacobianFunction(F); // [2xy, x^2], [5, cos(y)]
Matrix J01 = J.evaluate(x0);
System.out.println(String.format("the Jacobian at %s = %s, the det = %f",
        x0,
        J01,
        MatrixMeasure.det(J01)));

Vector x1 = new DenseVector(1, PI);
Matrix J1 = J.evaluate(x1);
System.out.println(String.format("the Jacobian at %s = %s, the det = %f",
        x1,
        J1,
        MatrixMeasure.det(J1)));

The output is:

the Jacobian at [0.000000, 0.000000] = 2x2
   [,1] [,2]
[1,] 0.000000, 0.000000
[2,] 5.000000, 1.000000, , the det = 0.000000

the Jacobian at [0.000000, 0.000000] = 2x2
   [,1] [,2]
[1,] 0.000000, 0.000000
[2,] 5.000000, 1.000000, , the det = 0.000000

the Jacobian at [1.000000, 3.141593] = 2x2
   [,1] [,2]
[1,] 6.283185, 1.000000
[2,] 5.000000, -1.000000, , the det = -11.283185
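The last determinant also illustrates the volume-scaling interpretation mentioned earlier: near \( (1,\pi) \), the map \( \mathbf{F} \) stretches areas by a factor of about 11.28. The plain-Java sketch below (the step size and class name are illustrative assumptions) maps a tiny square through \( \mathbf{F} \) and measures the area ratio:

// Sketch: the absolute Jacobian determinant as a local area scaling factor.
public class JacobianAreaSketch {

    // F(x, y) = (x^2 * y, 5x + sin(y)), the example from this section
    static double[] F(double x, double y) {
        return new double[]{x * x * y, 5 * x + Math.sin(y)};
    }

    public static void main(String[] args) {
        double x0 = 1, y0 = Math.PI, h = 1e-4;
        double[] p = F(x0, y0);
        double[] ex = F(x0 + h, y0);  // image of a small step along x
        double[] ey = F(x0, y0 + h);  // image of a small step along y

        double ux = ex[0] - p[0], uy = ex[1] - p[1];
        double vx = ey[0] - p[0], vy = ey[1] - p[1];

        // area of the image parallelogram divided by the area h^2 of the source square
        double scale = Math.abs(ux * vy - uy * vx) / (h * h);
        System.out.println(scale); // ~11.28 = |det J_F(1, pi)|
    }
}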

Hessian 

We can generalize the concept of the second-order derivative of a univariate function to a multivariate real-valued function. We first compute the gradient of the scalar function, which amounts to taking the first-order derivative, and then compute the Jacobian of the gradient function, which amounts to taking the second-order derivative. This Jacobian of the gradient is called the Hessian matrix. The Hessian is the square matrix of the second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a multivariate function. Mathematically,
\( H(f(x))=J(\nabla f(x)) \)
Suppose \( f:\mathbf{R^n}\rightarrow\mathbf{R} \). If all second-order partial derivatives of \( f \) exist and are continuous over the domain of the function, then the Hessian matrix \( H \) of \( f \) is a square \( n\times n \) matrix, usually defined and arranged as follows:
\( \displaystyle H_f=\begin{bmatrix}
\frac{\partial^2f}{\partial x_1^2} & \frac{\partial^2f}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2f}{\partial x_1\partial x_n}\\
\frac{\partial^2f}{\partial x_2\partial x_1} & \frac{\partial^2f}{\partial x_2^2} & \cdots & \frac{\partial^2f}{\partial x_2\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial^2f}{\partial x_n\partial x_1} & \frac{\partial^2f}{\partial x_n\partial x_2} & \cdots & \frac{\partial^2f}{\partial x_n^2} \end{bmatrix} \)
In short, we can express the above matrix-equation as
\( \displaystyle (H_f)_{i,j}=\frac{\partial^2f}{\partial x_i\partial x_j} \)
The Hessian matrix is symmetric (by Schwarz's theorem, since the second-order partial derivatives are continuous). The determinant of the Hessian matrix is called the Hessian determinant. Hessian matrices are used in solving optimization problems with Newton-type methods because they are the coefficient of the quadratic term in a local Taylor expansion of a function. Mathematically,
\( \displaystyle f(x+\Delta x)\approx f(x)+\nabla f(x)^T\Delta x+\frac{1}{2}\Delta x^TH(x)\Delta x \)
The following sample code computes the Hessian and the Hessian function of \( f(x,y)=xy \).

 

RealScalarFunction f = new AbstractBivariateRealFunction() {
    @Override
    public double evaluate(double x, double y) {
        return x * y; // f = xy
    }
};

Vector x1 = new DenseVector(1, 1);
Hessian H1 = new Hessian(f, x1);
System.out.println(String.format("the Hessian at %s = %s, the det = %f",
        x1,
        H1,
        MatrixMeasure.det(H1)));

Vector x2 = new DenseVector(0, 0);
Hessian H2 = new Hessian(f, x2);
System.out.println(String.format("the Hessian at %s = %s, the det = %f",
        x2,
        H2,
        MatrixMeasure.det(H2)));

RntoMatrix H = new HessianFunction(f);
Matrix Hx1 = H.evaluate(x1);
System.out.println(String.format("the Hessian at %s = %s, the det = %f",
        x1,
        Hx1,
        MatrixMeasure.det(Hx1)));

Matrix Hx2 = H.evaluate(x2);
System.out.println(String.format("the Hessian at %s = %s, the det = %f",
        x2,
        Hx2,
        MatrixMeasure.det(Hx2)));

The output is:

the Hessian at [1.000000, 1.000000] = 2x2
   [,1] [,2]
[1,] 0.000000, 1.000000
[2,] 1.000000, 0.000000, , the det = -0.999999

the Hessian at [0.000000, 0.000000] = 2x2
   [,1] [,2]
[1,] 0.000000, 1.000000
[2,] 1.000000, 0.000000, , the det = -1.000000

the Hessian at [1.000000, 1.000000] = 2x2
   [,1] [,2]
[1,] 0.000000, 1.000000
[2,] 1.000000, 0.000000, , the det = -0.999999

the Hessian at [0.000000, 0.000000] = 2x2
   [,1] [,2]
[1,] 0.000000, 1.000000
[2,] 1.000000, 0.000000, , the det = -1.000000
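
As a follow-up, the quadratic Taylor expansion above can be checked against these results. For \( f(x,y)=xy \) the expansion with \( \nabla f=(y,x) \) and \( H=\begin{bmatrix}0&1\\1&0\end{bmatrix} \) is exact, because \( f \) is itself quadratic. A plain-Java sketch (the base point and displacement are illustrative assumptions):

// Sketch: second-order Taylor expansion using the gradient and the Hessian of f(x, y) = xy.
public class QuadraticTaylorSketch {

    static double f(double x, double y) {
        return x * y;
    }

    public static void main(String[] args) {
        double x0 = 1, y0 = 1;       // base point
        double dx = 0.3, dy = -0.2;  // displacement

        // grad f = (y, x); Hessian H = [[0, 1], [1, 0]]
        double gx = y0, gy = x0;
        double quad = 0.5 * (dx * (0 * dx + 1 * dy) + dy * (1 * dx + 0 * dy)); // 0.5 * d^T H d

        double taylor = f(x0, y0) + gx * dx + gy * dy + quad;
        System.out.println("f(x0 + d)        = " + f(x0 + dx, y0 + dy)); // 1.04
        System.out.println("2nd-order Taylor = " + taylor);              // 1.04 (exact here)
    }
}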