When using batch normalization in my neural network, does tuning the momentum make a really big difference? From what I understand, a smaller batch_size should use a higher momentum value and a larger batch_size a lower one, since higher momentum makes the running statistics 'lag' more and update more slowly. How does this actually affect performance, though?
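To make the 'lag' concrete, here is a minimal sketch of the moving-average update that BatchNorm's momentum controls. This assumes the Keras-style convention (`momentum` weights the old running statistic; note PyTorch uses the opposite convention, where `momentum` weights the new batch statistic), and the toy numbers here are just for illustration:

```python
import numpy as np

def update_running_mean(running_mean, batch_mean, momentum):
    # Keras-style convention: higher momentum -> the running statistic
    # changes more slowly, i.e. it 'lags' behind recent batches more.
    return momentum * running_mean + (1.0 - momentum) * batch_mean

# Simulate noisy per-batch means scattered around a true mean of 5.0,
# as you'd get with a small batch_size.
rng = np.random.default_rng(0)
batch_means = 5.0 + rng.normal(scale=1.0, size=200)

for momentum in (0.9, 0.99):
    running = 0.0  # most frameworks initialize the running mean at zero
    for m in batch_means:
        running = update_running_mean(running, m, momentum)
    print(f"momentum={momentum}: running mean after 200 steps -> {running:.2f}")
```

With higher momentum the estimate is smoother (less batch noise) but takes longer to converge toward the true statistic, which is the trade-off behind the small-batch advice: noisy batch statistics benefit from more smoothing.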