I am experimenting with different optimizers and learning rate schedulers to improve the performance of a MaskRCNN model (40K images input, instance segmentation, 1 class). The baseline I am comparing with is the:
PURPLE LINE which uses
SGD
StepLR(step_size=3, gamma=0.1)
My two attempts at improving is are the
YELLOW LINE
AdamW(lr=5e-5)
ReduceLROnPlateau(patience=3, factor=0.75
BLUE LINE
AdamW(lr=5e-5)
StepLR(step_size=3, gamma=0.1)
2x number of input images
While I'm excited to see the metrics go above the purple line, I'm very interested to show what is happening with the AP@ IoU 0.5 which start to wiggle and drop, where as AP@ IoU 0.75 seems more stable.
The losses for each Mask-RCNN metric are:
Could someone help me understand why this is happening and what it means?