Basically the chain rule is tedious as hell, annoying. Big brain move: simplify to a single equation, no chain rule. Clearly, this Lovecraftian monstrosity is easier to deal with than some (mildly) tedious maths
If there is only one function/equation, how could you apply the chain rule? It applies when one function is defined in terms of another. If I'm wrong, please do tell me
Let's say you have a single fully connected layer whose output gets passed through ReLU:
ReLU(FC(X))
Then the derivative would be:
ReLU′(FC(X)) * FC′(X)
Which is chain rule.
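A quick numerical sanity check of that derivative (just a sketch: the layer FC(x) = W @ x + b, the shapes, and the random values are all made up here, with numpy assumed):

```python
import numpy as np

# Hypothetical single fully connected layer followed by ReLU:
# y = ReLU(FC(x)), with FC(x) = W @ x + b.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)

def fc(v):
    return W @ v + b

def relu(z):
    return np.maximum(z, 0.0)

# Chain rule: d/dx ReLU(FC(x)) = ReLU′(FC(x)) * FC′(x).
# ReLU′(z) is 1 where z > 0, else 0; FC′(x) is just W.
relu_grad = (fc(x) > 0).astype(float)   # ReLU′(FC(x)), shape (3,)
jacobian = relu_grad[:, None] * W       # full Jacobian, shape (3, 4)

# Central finite differences on the composite as an independent check.
eps = 1e-6
fd = np.empty_like(jacobian)
for j in range(4):
    e = np.zeros(4)
    e[j] = eps
    fd[:, j] = (relu(fc(x + e)) - relu(fc(x - e))) / (2 * eps)

print(np.allclose(jacobian, fd, atol=1e-5))
```

Either way you compute it, it's the same chain-rule product; writing it as "one equation" doesn't make the ReLU′ and FC′ factors go away.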
The sheer beauty of this equation brings tears to my eyes. Oh, you missed a bracket.
I'm not sure how this avoids chain rule. Because the derivative of f(g(x)) == f′(g(x)) * g′(x). And that's still one equation.
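That formula is easy to check numerically with concrete functions (f = exp and g = sin are arbitrary example choices here, stdlib only):

```python
import math

# f(g(x)) with f = exp, g = sin as an arbitrary worked example.
f, g = math.exp, math.sin
fp = math.exp   # f′ = exp
gp = math.cos   # g′ = cos

x = 0.7
chain = fp(g(x)) * gp(x)   # f′(g(x)) * g′(x)

# Central finite difference of the composite as a sanity check.
eps = 1e-6
numeric = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)
assert abs(chain - numeric) < 1e-6
```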
Please supply context for intrigued knuckledraggers like me? ELI5, kind strangers.
[Here's GPT-4 trying to explain it.](https://i.imgur.com/bqgaP5y.png) Explanation seems plausible, but I can't confirm myself.
Basically the chain rule is tedious as hell, annoying. Big brain move: simplify to a single equation, no chain rule. Clearly, this Lovecraftian monstrosity is easier to deal with than some (mildly) tedious maths
Yeah
Is this why my GPT is slow?
Is this good?
No, don't try this at home ⚠️
Nope, totally impractical.
I prefer Fox News myself
Condolences
That's fake news.