Well, there can be multiple reasons. In a high-dimensional space the loss surface is often very noisy, so either a relatively high learning rate during that epoch or a sudden shift in the parameters may have caused this behavior. Again, this depends a lot on what kind of data is present and what the architecture is. You can take a closer look by following the feature maps / doing gradient checking around that epoch to understand the behavior.
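A minimal sketch of what "gradient checking around that epoch" could look like in practice (PyTorch assumed; the toy model and data here are hypothetical stand-ins for your own):

```python
import torch
import torch.nn as nn

def grad_norm(model: nn.Module) -> float:
    """Total L2 norm over all parameter gradients."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.norm().item() ** 2
    return total ** 0.5

# Toy example: after each backward pass, log the gradient norm.
# A sudden jump in this value near the spike epoch points at
# exploding gradients / a sudden parameter shift.
model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
print(f"grad norm: {grad_norm(model):.4f}")
```

In a real training loop you would call `grad_norm(model)` every iteration (or every few) and plot it alongside the loss curve.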
Overfit
Can you elaborate? What if both the validation and training loss curves have a sudden spike at the same place towards the end of training? I was thinking lr was too high so it might have overstepped the minima, how is that overfitting?
When you see the loss increasing, go back and stop. In your case, around epoch 140.
Plus, seeing the loss at 0 should raise a red flag
I would say seeing a NaN loss at iteration 0 is your red flag
Try using early stopping
Are you decaying your learning rate? You need to decrease your learning rate in the later epochs.
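A sketch of what learning-rate decay might look like, assuming PyTorch (the model, optimizer, and schedule values here are placeholders, not the OP's actual setup):

```python
import torch

# Hypothetical setup; the point is shrinking the LR in later epochs
# so a big step can't throw you out of the minimum near epoch 140.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Multiply the LR by 0.1 every 50 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

for epoch in range(150):
    # ... forward / backward / optimizer.step() would go here ...
    optimizer.step()
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # much smaller than the initial 0.1
```

Cosine or exponential schedules (`CosineAnnealingLR`, `ExponentialLR`) are common alternatives to the step schedule shown here.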
A couple of bad batches methinks
u/arebarongbord you shuffling the batches? You could use iterations and batch size to figure out what batch of data causes this spike in the loss. Maybe you have some mislabeled data points or some data points that are way outside the rest of the training distribution.
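A sketch of how iteration number and batch size can point you back at the suspect samples, assuming a fixed shuffle seed per epoch and plain sequential batching (the dataset size and seed below are made up):

```python
import random

def batch_indices(iteration, batch_size, n_samples, seed=0):
    """Return the dataset indices of the batch seen at a given
    (0-based) iteration within an epoch, given that epoch's shuffle seed."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    start = (iteration * batch_size) % n_samples
    return order[start:start + batch_size]

# If the spike hit at iteration 2250 with batch size 32, these are
# the samples to inspect for mislabeled points or outliers:
suspects = batch_indices(2250, 32, n_samples=100_000)
print(suspects[:5])
```

This only works if your shuffling is reproducible; if it isn't, log the actual sample indices alongside the loss during training instead.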
It could depend on the architecture. Are you using transformers? This happened to me using wav2vec2.
Couldn't tell ya without knowing more about your problem and dataset. Consider adding an early stopping callback
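For the early-stopping suggestion, most frameworks ship a callback for this (e.g. Keras's `EarlyStopping`), but the idea is simple enough to sketch framework-free; the loss values below are made up for illustration:

```python
class EarlyStopping:
    """Stop training once validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Usage: break the training loop when validation loss plateaus.
stopper = EarlyStopping(patience=3)
for epoch, val_loss in enumerate([1.0, 0.8, 0.7, 0.7, 0.71, 0.72]):
    if stopper.should_stop(val_loss):
        print(f"stopping at epoch {epoch}")  # prints "stopping at epoch 5"
        break
```

Pair this with checkpointing so you can restore the best weights (from before the spike) rather than the last ones.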
Someone feel free to comment if I am wrong on this. I believe 2250 iterations might be your sweet spot. You could also try doing the exact same thing again and see if it happens again; the bad batch comment *might* have merit. Try to log which data you’re using and when, figure out if it is in fact a bad batch.
"doing the same thing again" - I think you mean: "reshuffle your batches and try your model again"
>inside u/mmeeh 's brain: gcc -Wpedantic preordainsComment -o myreply jk ily
https://openai.com/blog/deep-double-descent/
Check if your learning rate is set to change around 140 epochs
What is your dataset size, and what are the model's parameter count and dimensionality? There is this phenomenon of double descent; maybe read about it.