Gradient Descent

Gradient descent is an optimization algorithm for finding a local minimum of a scalar-valued function near a starting point, by taking successive steps in the direction of the negative gradient.

For a function \(f: \mathbb{R}^n \to \mathbb{R}\), starting from an initial point \(\mathbf{x}_0\), the method works by computing successive points in the function domain

\[ \mathbf{x}_{n + 1} = \mathbf{x}_n - \eta \left( \nabla f \right)_{\mathbf{x}_n} \; ,\]

where \(\eta > 0\) is a small step size and \(\left( \nabla f \right)_{\mathbf{x}_n}\) is the gradient of \(f\) evaluated at \(\mathbf{x}_n\). The successive values of the function

\[ f(\mathbf{x}_0) \ge f(\mathbf{x}_1) \ge f(\mathbf{x}_2) \ge \dots\]

keep decreasing and the sequence \(\mathbf{x}_n\) usually converges to a local minimum.
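
For example, for \(f(x) = x^2\) we have \(\nabla f = 2x\); starting at \(x_0 = 1\) with \(\eta = 0.25\), the iterates are \(x_1 = 1 - 0.25 \cdot 2 = 0.5\), \(x_2 = 0.5 - 0.25 \cdot 1 = 0.25\), and so on, halving the distance to the minimizer \(x = 0\) at each step.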

In practice, a fixed step size \(\eta\) often gives suboptimal performance, and adaptive schemes such as line search instead select a suitable step size on each iteration.
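
As an illustration, the following sketch shows one such scheme: a backtracking (Armijo) line search that repeatedly halves a trial step size until the step gives a sufficient decrease in \(f\). The function backtrackingStep and the constants it uses are illustrative and not part of DiffSharp; only dsharp.grad, dsharp.tensor and elementwise tensor arithmetic from the library are assumed.

open DiffSharp

// Backtracking (Armijo) line search: shrink a trial step size until the
// step x - eta * g decreases f by at least c * eta * ||g||^2.
//   f: function, x: current point, eta0: initial trial step size
let backtrackingStep (f: Tensor -> Tensor) (x: Tensor) (eta0: Tensor) =
    let fx = f x
    let g = dsharp.grad f x
    let gg = (g * g).sum()
    let c = dsharp.tensor 1e-4     // sufficient-decrease constant (illustrative)
    let half = dsharp.tensor 0.5   // shrink factor (illustrative)
    let rec shrink (eta: Tensor) =
        if f (x - eta * g) <= fx - c * eta * gg then eta
        else shrink (eta * half)
    shrink eta0

In the fixed-step implementation below, \(\eta\) would simply be replaced by the step size returned by such a search on each iteration.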

The following code implements gradient descent with fixed step size, stopping when the norm of the gradient falls below a given threshold.

open DiffSharp

// Shorthand for dsharp.tensor
let t x = dsharp.tensor x

// Gradient descent
//   f: function
//   x0: starting point
//   eta: step size
//   epsilon: threshold
let rec gradientDescent f x eta epsilon =
    let g = dsharp.grad f x
    // Stop when the L2 norm of the gradient falls below the threshold
    if (g * g).sum().sqrt() < epsilon then x
    else gradientDescent f (x - eta * g) eta epsilon

Let's find a minimum of \(f(x, y) = \sin x + \cos y\).

let inline f (x:Tensor) = sin x.[0] + cos x.[1]

// Find the minimum of f
// Start from (1, 1), step size 0.9, threshold 0.00001
let xmin = gradientDescent f (t [1.; 1.]) (t 0.9) (t 0.00001)
let fxmin = f xmin
val xmin : Tensor = tensor [ -1.570790759; 3.141591964 ]
val fxmin : Tensor = tensor -2.0

A minimum, \(f(x, y) = -2\), is found at \((x, y) = \left(-\frac{\pi}{2}, \pi\right)\).
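
Indeed, the gradient of \(f\) is \(\nabla f = (\cos x, -\sin y)\), which vanishes at \(\left(-\frac{\pi}{2}, \pi\right)\), and \(f\left(-\frac{\pi}{2}, \pi\right) = \sin\left(-\frac{\pi}{2}\right) + \cos \pi = -1 - 1 = -2\), the smallest value \(\sin x + \cos y\) can attain since each term is bounded below by \(-1\).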
