Gradient Descent Algorithm

For example, least squares regression looks for the weight vector \(w\) that minimizes the squared residual, where the rows of \(X\) are the data points \(x_i^T\):

\(\underset{w}{\operatorname{argmin}}\space \lVert Xw - y \rVert^2_2, \qquad Xw = \begin{bmatrix}- & x_1^T & -\\- & x_2^T & -\\ & \vdots & \\- & x_n^T & -\end{bmatrix} w = \begin{bmatrix}x^T_1 w\\x^T_2 w\\\vdots\\x^T_n w\end{bmatrix}\)
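
As a quick sanity check on this objective, here is a minimal NumPy sketch (the function name and toy data below are my own, purely illustrative):

```python
import numpy as np

def least_squares_loss(X, w, y):
    """Squared L2 residual ||Xw - y||_2^2."""
    residual = X @ w - y            # entry i is x_i^T w - y_i
    return residual @ residual      # sum of squared entries

# Toy data (illustrative only): rows of X are the data points x_i^T.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.zeros(2)

print(least_squares_loss(X, w, y))  # 14.0, since ||y||^2 = 1 + 4 + 9
```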

Gradient Descent

Let’s consider the following optimization problem: \(\underset{x}{\operatorname{minimize}} \space f(x):=x^2\)

(Figure: plot of \(f(x) = x^2\) for \(x \in [-5, 5]\).)
Gradient descent can be derived by approximating \(f\) around the current iterate \(x_t\) with a quadratic model, where \(\alpha > 0\) controls the curvature of the quadratic term:

\[f(x) \approx f(x_t) + \nabla f(x_t)^T (x-x_t) + \frac{\alpha}{2} (x-x_t)^T (x-x_t)\]

Note that \((x-x_t)^T (x-x_t) = \lVert x-x_t \rVert^2_2\), the squared L2 norm. Therefore,

\[f(x) \approx f(x_t) + \nabla f(x_t)^T (x-x_t) + \frac{\alpha}{2} \lVert x-x_t \rVert^2_2\]
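
To make this concrete for the running example \(f(x) = x^2\), where \(\nabla f(x_t) = 2x_t\), the approximation becomes

\[f(x) \approx x_t^2 + 2x_t(x-x_t) + \frac{\alpha}{2}(x-x_t)^2,\]

with equality when \(\alpha = 2\).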

To minimize the general approximation, take its derivative with respect to \(x\) and set it to zero:
\(0 + \nabla f(x_t) + \frac{\alpha}{2} \cdot 2\,(x-x_t) = 0\)
\(\alpha(x-x_t) = -\nabla f(x_t)\)
\(x-x_t = -\frac{1}{\alpha}\nabla f(x_t)\)
\(x = x_t - \frac{1}{\alpha}\nabla f(x_t)\)
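
This is exactly the gradient descent update with step size \(\frac{1}{\alpha}\). Below is a minimal Python sketch of the update applied to the running example \(f(x) = x^2\), whose gradient is \(2x\) (the function name, starting point, and choice of \(\alpha\) are my own):

```python
def gradient_descent(grad, x0, alpha=4.0, steps=25):
    """Repeatedly apply x_{t+1} = x_t - (1/alpha) * grad(x_t)."""
    x = x0
    for _ in range(steps):
        x = x - (1.0 / alpha) * grad(x)
    return x

# Minimize f(x) = x^2, whose gradient is 2x.
x_star = gradient_descent(grad=lambda x: 2.0 * x, x0=5.0)
print(x_star)  # close to the true minimizer x = 0
```

For this quadratic the update simplifies to \(x_{t+1} = \left(1 - \frac{2}{\alpha}\right)x_t\), so any \(\alpha > 1\) drives \(x_t\) toward the minimizer \(x = 0\).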



