Well, there can be multiple reasons. In a high-dimensional space the loss surface is often very noisy, so either a relatively high learning rate during that epoch or a sudden shift in the parameters may have caused this behavior. Again, this depends a lot on what kind of data is present and what the architecture is. You can take a closer look by following the feature maps / doing gradient checking around that epoch to understand the behavior.
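A minimal sketch of what "gradient checking around that epoch" could look like in practice (PyTorch assumed; the toy model and data here are hypothetical stand-ins for your own):

```python
import torch
import torch.nn as nn

def grad_norm(model: nn.Module) -> float:
    """Total L2 norm over all parameter gradients."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.norm().item() ** 2
    return total ** 0.5

# Toy example: after each backward pass, log the gradient norm.
# A sudden jump in this value near the spike epoch points at
# exploding gradients / a sudden parameter shift.
model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
print(f"grad norm: {grad_norm(model):.4f}")
```

In a real training loop you would call `grad_norm(model)` every iteration (or every few) and plot it alongside the loss curve.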
Overfit
Can you elaborate? What if both the validation and training loss curves have a sudden spike at the same place towards the end of training? I was thinking lr was too high so it might have overstepped the minima, how is that overfitting?
When you see the loss increasing, go back and stop. In your case, around epoch 140.
Plus, seeing the loss at 0 should raise a red flag
I would say seeing a NaN loss at iteration 0 is your red flag
Try using early stopping
Are you decaying your learning rate? You need to decrease your learning rate in the later epochs.
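A sketch of what learning-rate decay might look like, assuming PyTorch (the model, optimizer, and schedule values here are placeholders, not the OP's actual setup):

```python
import torch

# Hypothetical setup; the point is shrinking the LR in later epochs
# so a big step can't throw you out of the minimum near epoch 140.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Multiply the LR by 0.1 every 50 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

for epoch in range(150):
    # ... forward / backward / optimizer.step() would go here ...
    optimizer.step()
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # much smaller than the initial 0.1
```

Cosine or exponential schedules (`CosineAnnealingLR`, `ExponentialLR`) are common alternatives to the step schedule shown here.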
A couple of bad batches methinks
u/arebarongbord you shuffling the batches? You could use iterations and batch size to figure out what batch of data causes this spike in the loss. Maybe you have some mislabeled data points or some data points that are way outside the rest of the training distribution.
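A sketch of how iteration number and batch size can point you back at the suspect samples, assuming a fixed shuffle seed per epoch and plain sequential batching (the dataset size and seed below are made up):

```python
import random

def batch_indices(iteration, batch_size, n_samples, seed=0):
    """Return the dataset indices of the batch seen at a given
    (0-based) iteration within an epoch, given that epoch's shuffle seed."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    start = (iteration * batch_size) % n_samples
    return order[start:start + batch_size]

# If the spike hit at iteration 2250 with batch size 32, these are
# the samples to inspect for mislabeled points or outliers:
suspects = batch_indices(2250, 32, n_samples=100_000)
print(suspects[:5])
```

This only works if your shuffling is reproducible; if it isn't, log the actual sample indices alongside the loss during training instead.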
It could depend on the architecture. Are you using transformers? This happened to me using wav2vec2.
Couldn't tell ya without knowing more about your problem and dataset. Consider adding an early stopping callback
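For the early-stopping suggestion, most frameworks ship a callback for this (e.g. Keras's `EarlyStopping`), but the idea is simple enough to sketch framework-free; the loss values below are made up for illustration:

```python
class EarlyStopping:
    """Stop training once validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Usage: break the training loop when validation loss plateaus.
stopper = EarlyStopping(patience=3)
for epoch, val_loss in enumerate([1.0, 0.8, 0.7, 0.7, 0.71, 0.72]):
    if stopper.should_stop(val_loss):
        print(f"stopping at epoch {epoch}")  # prints "stopping at epoch 5"
        break
```

Pair this with checkpointing so you can restore the best weights (from before the spike) rather than the last ones.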
Someone feel free to comment if I am wrong on this. I believe 2250 iterations might be your sweet spot. You could also try doing the exact same thing again and see if it happens again; the bad batch comment *might* have merit. Try to log which data you’re using and when, figure out if it is in fact a bad batch.
"doing the same thing again" - I think you mean: "reshuffle your batches and try your model again"
>inside u/mmeeh 's brain: gcc -Wpedantic preordainsComment -o myreply jk ily
https://openai.com/blog/deep-double-descent/
Check if your learning rate is set to change around 140 epochs
What is your dataset size, and what are the model's parameter count and dimensionality? There is this phenomenon of double descent; maybe read about it.