Retrieving "Learning Rate" from the archives
Cross-reference notes under review
While the archivists retrieve your requested volume, browse these clippings from nearby entries.
-
Minimum
Linked via "learning rate"
The most elementary algorithm for finding local minima of differentiable functions is the Gradient Descent method. Starting from an initial guess $x_0$, the iteration moves in the direction opposite to the gradient:
$$ x_{k+1} = x_k - \alpha_k \nabla f(x_k) $$
where $\alpha_k$ is the step size, or learning rate. The effectiveness of Gradient Descent is highly dependent on the [curvature](/entries/cu…
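As a rough illustration of the iteration above, here is a minimal Python sketch using a fixed step size $\alpha$; the function names and the small quadratic test problem are illustrative choices, not part of the entry.

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, num_iters=100, tol=1e-8):
    """Minimize a differentiable function by following the negative gradient.

    grad_f : callable returning the gradient of f at a point
    x0     : initial guess
    alpha  : step size (learning rate), held constant here
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x_next = x - alpha * grad_f(x)      # x_{k+1} = x_k - alpha * grad f(x_k)
        if np.linalg.norm(x_next - x) < tol:
            return x_next                   # stop once successive iterates barely move
        x = x_next
    return x

# Illustrative test problem: f(x, y) = x^2 + 3y^2, with gradient (2x, 6y).
x_min = gradient_descent(lambda x: np.array([2 * x[0], 6 * x[1]]), x0=[3.0, -2.0])
print(x_min)  # converges toward the minimizer (0, 0)
```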
-
Predictive Coding
Linked via "learning rate"
$$
\hat{\mathbf{s}}_{t+1} = \hat{\mathbf{s}}_t + \eta \, \nabla_{\hat{\mathbf{s}}} \log P(\mathbf{x}_t \mid \hat{\mathbf{s}}_t) \cdot \epsilon_t
$$
where $\eta$ is the learning rate, and $\nabla_{\hat{\mathbf{s}}} \log P(\mathbf{x}_t \mid \hat{\mathbf{s}}_t)$ represents the prediction error gradient [1].
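For concreteness, the sketch below performs one such state update under an assumed linear-Gaussian generative model $\mathbf{x}_t \approx W \hat{\mathbf{s}}_t$; under that assumption the log-likelihood gradient is itself a projection of the prediction error $\epsilon_t$, so the error factor is carried by the gradient term. The weights $W$, dimensions, noise variance, and learning rate here are hypothetical, not taken from the entry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed linear-Gaussian model: x_t ~ N(W s_t, sigma^2 I). Then the gradient of
# log P(x_t | s_t) with respect to s_t is W^T (x_t - W s_t) / sigma^2, i.e. the
# prediction error projected back through the generative weights.
W = rng.normal(size=(8, 4))   # hypothetical generative weights
sigma2 = 1.0                  # assumed observation noise variance
eta = 0.05                    # learning rate

def update_state(s_hat, x_t):
    """One inference step: nudge the state estimate up the log-likelihood."""
    epsilon = x_t - W @ s_hat              # prediction error
    grad_log_p = W.T @ epsilon / sigma2    # gradient of log P(x_t | s_hat)
    return s_hat + eta * grad_log_p

s_hat = np.zeros(4)
x_t = rng.normal(size=8)
for _ in range(50):                        # iterate until the error is largely explained
    s_hat = update_state(s_hat, x_t)
print(np.linalg.norm(x_t - W @ s_hat))     # residual prediction error shrinks
```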
Cortical Implementation: The Ascending and Descending Flow