machine learning - Understanding convolutional layers shapes

Question

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I've been reading about convolutional nets and I've programmed a few models myself. When I look at diagrams of other models, each layer is shown as smaller and deeper than the previous one. Layers have three dimensions, like 256x256x32. What is this third number? I assume the first two numbers are the number of nodes, but I don't know what the depth is.

564 views

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:29:51+0000

TLDR; `256x256x32` refers to the layer's output shape rather than the layer itself.

There are many articles and posts out there explaining how convolution layers work. I'll try to answer your question without going into too many details, just focusing on shapes.

Assuming you are working with 2D convolution layers, the input and output will both be three-dimensional (ignoring the batch, which would add a 4th axis). The shape of a convolution layer's input is therefore (c, h, w), or (h, w, c) depending on the framework, where c is the number of channels, h is the height of the input and w its width. You can see it as a c-channel h x w image. The most intuitive example is the input to the first convolution layer of your convolutional neural network: most likely an image of size h x w with c channels, for example c=1 for greyscale or c=3 for RGB.

What's important is that, for every pixel of that input, the values on each channel give additional information about that pixel. Having three channels gives each pixel ('pixel' as in a position in the 2D input space) richer content than having a single one, since each pixel is encoded with three values (three channels) instead of one (one channel). This intuition about what channels represent extends to a higher number of channels: as we said, an input can have c channels.
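To make the two layouts concrete, here is a minimal pure-Python sketch. The sizes (c=3, h=4, w=5), the `shape` helper and the fill values are made up for illustration. Real frameworks store this data in tensors, not nested lists.

```python
def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


c, h, w = 3, 4, 5  # hypothetical sizes: 3 channels, height 4, width 5

# Channel-first layout (c, h, w): one h x w grid per channel.
# Each value encodes its own (channel, row, column) so it is easy to trace.
chw = [[[k * 100 + i * 10 + j for j in range(w)] for i in range(h)]
       for k in range(c)]

# Channel-last layout (h, w, c): one list of c values per pixel position.
hwc = [[[chw[k][i][j] for k in range(c)] for j in range(w)]
       for i in range(h)]

chw_shape = shape(chw)
hwc_shape = shape(hwc)

# The pixel at row 1, column 2 carries one value per channel in both layouts.
pixel_from_chw = [chw[k][1][2] for k in range(c)]
pixel_from_hwc = hwc[1][2]

print(chw_shape, hwc_shape)
print(pixel_from_chw, pixel_from_hwc)
```

Both layouts hold the same data. Only the position of the channel axis changes, which is why the same input can be reported as (c, h, w) in one framework and (h, w, c) in another.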

Now, going back to convolution layers, here is a good way to visualize it. Imagine a 5x5 1-channel input and a convolution layer consisting of a single 3x3 filter (i.e. kernel_size=3):

| | input | filter | convolution | output |
|---|---|---|---|---|
| shape | `(1, 5, 5)` | `(3, 3)` | | `(3, 3)` |
| representation | (figure) | (figure) | (figure) | (figure) |
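The shapes above can be checked with a small pure-Python sketch. It assumes stride 1 and no padding, which is what the `(3, 3)` output implies for a 5x5 input and a 3x3 filter. It also assumes the sliding-window sum (cross-correlation) that deep learning frameworks usually call convolution. The function name `conv2d_single`, the input values and the all-ones kernel are made up for illustration.

```python
def conv2d_single(image, kernel):
    """Slide one square 2D kernel over one 2D channel (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    out_h, out_w = h - k + 1, w - k + 1  # 5 - 3 + 1 = 3 in this example
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 5x5 single-channel input holding the values 0..24.
image = [[i * 5 + j for j in range(5)] for i in range(5)]

# Hypothetical 3x3 filter of ones: each output value is the sum of one window.
kernel = [[1] * 3 for _ in range(3)]

output = conv2d_single(image, kernel)
output_shape = (len(output), len(output[0]))

print(output_shape)
for row in output:
    print(row)
```

Each filter yields one such 2D map, so a layer with several filters stacks their maps along the channel axis. Under that usual convention, the third number in a shape like `256x256x32` is the number of output channels.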
