Gradient Descent
Gradient descent is an iterative optimization algorithm for finding a local minimum of a scalar-valued function near a starting point, taking successive steps in the direction of the negative of the gradient.
For a function \(f: \mathbb{R}^n \to \mathbb{R}\), starting from an initial point \(\mathbf{x}_0\), the method works by computing successive points in the function domain
\[ \mathbf{x}_{n + 1} = \mathbf{x}_n - \eta \left( \nabla f \right)_{\mathbf{x}_n} \; ,\]
where \(\eta > 0\) is a small step size and \(\left( \nabla f \right)_{\mathbf{x}_n}\) is the gradient of \(f\) evaluated at \(\mathbf{x}_n\). The successive values of the function
\[ f(\mathbf{x}_0) \ge f(\mathbf{x}_1) \ge f(\mathbf{x}_2) \ge \dots\]
are then non-increasing, provided the step size \(\eta\) is sufficiently small, and the sequence \(\mathbf{x}_n\) usually converges to a local minimum.
In practice, a fixed step size \(\eta\) often yields suboptimal performance; adaptive algorithms instead select a locally appropriate step size \(\eta\) on each iteration.
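As an illustration of one such strategy (not part of DiffSharp's API), here is a minimal sketch of backtracking (Armijo) line search; the name backtrackingStep and the constants c and rho are assumptions made for this example.

```fsharp
open DiffSharp

// Sketch of backtracking (Armijo) line search: start from a trial step size
// and shrink it until a sufficient-decrease condition holds.
// The function name and the constants c and rho are illustrative choices.
let backtrackingStep (f: Tensor -> Tensor) (x: Tensor) (g: Tensor) =
    let c, rho = 1e-4, 0.5
    let fx = System.Convert.ToDouble(f x)            // f at the current point
    let gg = System.Convert.ToDouble((g * g).sum())  // squared norm of the gradient
    let rec search (eta: float) =
        // Accept eta once f decreases by at least c * eta * ||g||^2.
        if System.Convert.ToDouble(f (x - eta * g)) <= fx - c * eta * gg then eta
        else search (rho * eta)
    search 1.0
```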
The following code implements gradient descent with fixed step size, stopping when the norm of the gradient falls below a given threshold.
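A minimal sketch of such an implementation using DiffSharp's dsharp.grad; the function name gradientDescent and its parameters are illustrative rather than part of the library.

```fsharp
open DiffSharp

// Gradient descent with a fixed step size eta, stopping when the norm of the
// gradient falls below the given threshold.
let gradientDescent (f: Tensor -> Tensor) (x0: Tensor) (eta: float) (threshold: float) =
    let rec iterate (x: Tensor) =
        let g = dsharp.grad f x              // gradient of f evaluated at x
        let gradNorm = sqrt ((g * g).sum())  // Euclidean norm of the gradient
        if System.Convert.ToDouble(gradNorm) < threshold then x
        else iterate (x - eta * g)           // step along the negative gradient
    iterate x0
```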
Let's find a minimum of \(f(x, y) = \sin x + \cos y\).
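A minimal sketch of the example, using the gradientDescent function above; the starting point \((-1.0, 2.5)\), step size, and threshold are assumed choices lying in the basin of the minimum reported below.

```fsharp
// Minimize f(x, y) = sin x + cos y over a two-element tensor x.
let f (x: Tensor) = sin x.[0] + cos x.[1]

// Assumed starting point, step size, and stopping threshold.
let xmin = gradientDescent f (dsharp.tensor [-1.0; 2.5]) 0.1 1e-6
let fmin = f xmin
```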
A minimum, \(f(x, y) = -2\), is found at \((x, y) = \left(-\frac{\pi}{2}, \pi\right)\).
