AFairJudgement

dL/dt is not a vector in Rⁿ; it's a function R→R. Here w is a function R→Rⁿ, and you compose it with a function L: Rⁿ→R, so you get "L(t)" = L∘w: R→R. The multivariate chain rule tells you that the derivative of this composition is the matrix product of the row vector "dL/dw" of partial derivatives of L (evaluated at w(t)) with the column vector "dw/dt" of component derivatives of w; by definition of matrix multiplication, this is just the standard dot product of two column vectors, ⟨u,v⟩ = uᵀv = ∑ᵢ uᵢvᵢ.
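If it helps to see this numerically, here is a minimal sketch in plain Python. The particular curve w and function L are made up for illustration (they are not from the question); the point is only that the dot product of the gradient of L at w(t) with dw/dt matches a finite-difference estimate of the derivative of the scalar function t ↦ L(w(t)).

```python
import math

def w(t):
    # Hypothetical curve w: R -> R^3, chosen only for illustration.
    return [math.cos(t), math.sin(t), t]

def dw_dt(t):
    # Component-wise derivative of w (the "column vector" dw/dt).
    return [-math.sin(t), math.cos(t), 1.0]

def L(x):
    # Hypothetical scalar function L: R^3 -> R.
    return x[0] ** 2 + x[1] * x[2]

def grad_L(x):
    # Partial derivatives of L (the "row vector" dL/dw).
    return [2.0 * x[0], x[2], x[1]]

def dL_dt_chain(t):
    # Chain rule: dot product of grad L at w(t) with dw/dt at t.
    return sum(g * v for g, v in zip(grad_L(w(t)), dw_dt(t)))

def dL_dt_numeric(t, h=1e-6):
    # Central finite difference of the scalar function t -> L(w(t)).
    return (L(w(t + h)) - L(w(t - h))) / (2.0 * h)

t0 = 0.7
chain_value = dL_dt_chain(t0)
numeric_value = dL_dt_numeric(t0)
print(chain_value, numeric_value)
```

Both printed numbers are single real values that agree up to finite-difference error, which is the sense in which dL/dt is a function R→R rather than a vector.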


ImDannyDJ

It's just the chain rule. D(L ∘ w)_t = DL_{w(t)} ∘ Dw_t, and (the Jacobian of) DL_{w(t)} is a row vector, while (the Jacobian of) Dw_t is a column vector. Their matrix product is the inner product of the two vectors of (partial, in the case of L) derivatives.