machine learning - How should "BatchNorm" layer be used in caffe?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am a little confused about how I should use/insert the "BatchNorm" layer in my models.
I have seen several different approaches, for instance:

ResNets: `"BatchNorm"`+`"Scale"` (no parameter sharing)

The "BatchNorm" layer is followed immediately by a "Scale" layer:

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "bn2a_branch1"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
}

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "scale2a_branch1"
    type: "Scale"
    scale_param {
        bias_term: true
    }
}

cifar10 example: only `"BatchNorm"`

In the cifar10 example provided with caffe, "BatchNorm" is used without any "Scale" following it:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}

cifar10: different `batch_norm_param` for `TRAIN` and `TEST`

In batch_norm_param, use_global_stats is changed between the TRAIN and TEST phases:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}

So what should it be?

How should one use the "BatchNorm" layer in Caffe?



1 Answer

深蓝 · Answer 1 · 2021-10-23T19:27:14+0000

If you follow the original paper, batch normalization should be followed by Scale and Bias layers (the bias can be included in the "Scale" layer via bias_term: true, although this makes the bias parameters inaccessible). use_global_stats should also change from training (false) to testing/deployment (true), which is the default behavior. Note that the first example you give is a prototxt for deployment, so it is correct for it to be set to true there.
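
As a rough sketch of that pattern (not taken from any particular model), the two layers could be written in place, like the ResNet example in the question. The layer names "bn1" and "scale1" are illustrative, and the bottom blob "pool1" is borrowed from the cifar10 snippet. The param { lr_mult: 0 } blocks are left out on the assumption that your Caffe version no longer needs them; an older build may still require them as in the cifar10 example.

```prototxt
# Sketch only: layer names are illustrative, adapt them to your network.
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "pool1"
  # use_global_stats is deliberately left unset, so it follows the phase:
  # false in TRAIN (statistics of the current batch),
  # true in TEST (stored running averages).
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "pool1"
  top: "pool1"
  scale_param {
    bias_term: true  # learn a per-channel shift along with the per-channel scale
  }
}
```

With use_global_stats unset, the same definition can serve both phases, so the duplicated TRAIN and TEST layers from the third example should not be needed. A deployment-only prototxt can still set use_global_stats: true explicitly, as the ResNet files do.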

I'm not sure about the shared parameters.

I opened a pull request to improve the documentation on batch normalization, but then closed it because I wanted to modify it, and I never got back to it.

Note that I think lr_mult: 0 for "BatchNorm" is no longer required (and perhaps no longer allowed), although I can't find the corresponding PR right now.
