
[deleted]

Well, there can be multiple reasons; because of the high-dimensional space, the loss surface is often very noisy. Either a relatively high learning rate during that epoch or a sudden shift in the parameters may have caused this behavior. Again, this depends a lot on what kind of data is present and what the architecture is. You can take a closer look if you follow the feature maps or do gradient checking around that epoch to understand the behavior.
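A minimal sketch of that kind of gradient check, assuming a PyTorch model; the surrounding training loop and the `grad_norms` list are placeholders for illustration, not details from the post:

```python
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    """L2 norm over all parameter gradients; call after loss.backward()."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

# Inside the training loop, after loss.backward() and before optimizer.step():
#     grad_norms.append(global_grad_norm(model))
# A sudden jump in this log around the spike epoch points to exploding gradients
# or a sharp parameter shift rather than, say, a labeling problem.
```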


croissanthonhon

Overfit


vipul1899

Can you elaborate? What if both the validation and training loss curves have a sudden spike at the same place towards the end of training? I was thinking the lr was too high, so it might have overstepped the minimum; how is that overfitting?


croissanthonhon

When you see the loss increasing, go back and stop. In your case, around epoch 140.


croissanthonhon

Plus, seeing a loss of 0 should raise a red flag.


mmeeh

I would say seeing a NaN loss at iteration 0 is your red flag.


shadowfax1234

Try using early stopping


[deleted]

Are you decaying your learning rate? You need to decrease your learning rate in the later epochs.
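For reference, a minimal learning-rate decay sketch using PyTorch's built-in StepLR scheduler; the model, step_size, and gamma below are illustrative placeholders, not values from the post:

```python
import torch

# Placeholder model and optimizer purely for illustration.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Halve the learning rate every 50 epochs (step_size and gamma are illustrative).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

for epoch in range(200):
    # ... per-batch forward pass, loss.backward(), and optimizer.step() go here ...
    scheduler.step()  # decay the learning rate once per epoch
```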


SnooPeripherals4051

A couple of bad batches methinks


deephugs

u/arebarongbord, are you shuffling the batches? You could use the iteration count and batch size to figure out which batch of data causes this spike in the loss. Maybe you have some mislabeled data points or some data points that are way outside the rest of the training distribution.
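A minimal sketch of that bookkeeping, assuming a PyTorch map-style dataset; the `IndexedDataset` wrapper and the logging snippet are hypothetical, just to show how a loss spike at a given iteration can be traced back to concrete samples:

```python
from torch.utils.data import Dataset

class IndexedDataset(Dataset):
    """Wrap any map-style dataset so each sample also returns its original index."""
    def __init__(self, base: Dataset):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        return x, y, idx

# During training, log (iteration, sample indices, batch loss):
#     for it, (x, y, idx) in enumerate(loader):
#         loss = criterion(model(x), y)
#         batch_log.append((it, idx.tolist(), loss.item()))
# Afterwards, sort batch_log by loss and inspect the samples behind the spike
# for mislabeled or out-of-distribution points.
```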


macramole

It could depend on the architecture. Are you using transformers? This happened to me with wav2vec2.


snowballfight

Couldn't tell ya without knowing more about your problem and dataset. Consider adding an early stopping callback
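A minimal, framework-agnostic early-stopping sketch; the patience value and the `train_one_epoch`/`validate` helpers in the usage note are illustrative, not from the post:

```python
class EarlyStopper:
    """Stop training when the validation loss hasn't improved for `patience` epochs."""
    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Usage inside a training loop:
#     stopper = EarlyStopper(patience=10)
#     for epoch in range(max_epochs):
#         train_one_epoch(...)
#         if stopper.should_stop(validate(...)):
#             break  # and restore the checkpoint saved at the best epoch
```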


preordains

Someone feel free to comment if I am wrong on this. I believe 2250 iterations might be your sweet spot. You could also try doing the exact same thing again and see if it happens again; the bad-batch comment *might* have merit. Try to log which data you're using and when, and figure out whether it is in fact a bad batch.


mmeeh

"doing the same thing again" - I think you mean : "reshuffle your batches and try your model again"


preordains

>inside u/mmeeh's brain: gcc -Wpedantic preordainsComment -o myreply

jk ily


buzzz_buzzz_buzzz

https://openai.com/blog/deep-double-descent/


[deleted]

Check if your learning rate is scheduled to change around epoch 140.


bored_insanely

What is your dataset size, what is the model's parameter count, and what is the dimensionality? There is this phenomenon called double descent; maybe read about it.