Erratic performance of Adam optimizer on object segmentation task
I've taken a pre-trained model (FCN8s) and fine-tuned it on my data for a very challenging instance segmentation task. I've tried many optimizers from the Caffe library, but only Adam seems to be able to avoid bad saddle points (I understand that "local minimum" is not a good term in deep learning).
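For reference, here is roughly the solver setup I'm using for Adam (a minimal sketch; the file names and hyperparameters below are placeholders, not my exact values):

```python
# Sketch of the Adam solver configuration (placeholder values).
# Assumes a Caffe version where the solver type is given as a string.
import caffe
from caffe.proto import caffe_pb2

s = caffe_pb2.SolverParameter()
s.train_net = 'train.prototxt'        # hypothetical net definition
s.type = 'Adam'
s.base_lr = 1e-4                      # placeholder learning rate
s.momentum = 0.9                      # Adam's beta1
s.momentum2 = 0.999                   # Adam's beta2
s.delta = 1e-8                        # Adam's epsilon
s.lr_policy = 'fixed'
s.max_iter = 20000
s.snapshot = 1000                     # save a snapshot every 1K iterations
s.snapshot_prefix = 'snapshots/fcn8s_adam'

with open('solver_adam.prototxt', 'w') as f:
    f.write(str(s))                   # protobuf text format

solver = caffe.get_solver('solver_adam.prototxt')
solver.net.copy_from('fcn8s.caffemodel')  # start from the pre-trained weights
solver.step(s.max_iter)
```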
The problem is that its behavior is hard to understand. What I mean is: when I take, for example, SGD or Adagrad and look at their performance after 10K, 15K, 20K, etc. iterations, they seem to keep moving in the same direction (not always a good one, of course), but you can more or less see the convergence. So when I run the model on the test data, a 20K snapshot usually outperforms a 10K one, and so on.
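This is how I compare the snapshots, roughly (a sketch, not my exact evaluation code; it assumes the input/output blob names `data` and `score` of the public FCN8s nets, and `images`/`labels` stand in for my preprocessed test arrays):

```python
# Sketch: score each saved snapshot on held-out data.
import caffe
import numpy as np

def mean_iou(net, images, labels, n_classes):
    """Mean IoU over (image, label) pairs; each image is a C x H x W array."""
    inter = np.zeros(n_classes)
    union = np.zeros(n_classes)
    for im, lb in zip(images, labels):
        net.blobs['data'].reshape(1, *im.shape)
        net.blobs['data'].data[...] = im
        pred = net.forward()['score'][0].argmax(axis=0)  # per-pixel class
        for c in range(n_classes):
            inter[c] += np.sum((pred == c) & (lb == c))
            union[c] += np.sum((pred == c) | (lb == c))
    present = union > 0                # ignore classes absent from the data
    return (inter[present] / union[present]).mean()

caffe.set_mode_gpu()
for it in (5000, 10000, 15000, 20000):
    weights = 'snapshots/fcn8s_adam_iter_%d.caffemodel' % it  # Caffe's snapshot naming
    net = caffe.Net('deploy.prototxt', weights, caffe.TEST)
    print(it, mean_iou(net, images, labels, n_classes=21))
```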
I don't have the same clarity with Adam. Although the training error goes down overall, when I compare results after (say) 5K and 15K iterations of training, they are truly baffling: after 15K the model can do much worse than after, say, 12K, and then all of a sudden improve after 3K more iterations. There does not seem to be any clear convergence pattern.
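For reference, this is my mental model of the update Adam performs, a plain NumPy sketch of the rule from Kingma & Ba (2015), not Caffe's actual implementation. The effective per-parameter step size `lr * m_hat / (sqrt(v_hat) + eps)` adapts to the gradient history, which is my current guess for why the test scores jump around more between snapshots than with SGD:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based iteration count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for the
    v_hat = v / (1 - beta2 ** t)                # zero-initialized moments
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```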