Background
ResNet-50 Side Output Shape
Assuming an input of 352x352, the side outputs are:
output2 = 256x88x88
output3 = 512x44x44
output4 = 1024x22x22
output5 = 2048x11x11
VGG-16 Side Output Shape
Assuming an input of 352x352, the side outputs are:
output1 = 64x352x352
output2 = 128x176x176
output3 = 256x88x88
output4 = 512x44x44
output5 = 512x22x22
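To sanity-check these numbers, here is a minimal sketch (assuming TensorFlow 2.x; the stage-output layer names follow the tf.keras.applications convention and are an assumption if you use a different ResNet implementation) that prints the ResNet-50 side-output shapes for a 352x352 input:

```python
import tensorflow as tf

# Build an untrained ResNet-50 backbone for a 352x352 input (no download needed).
backbone = tf.keras.applications.ResNet50(include_top=False,
                                          weights=None,
                                          input_shape=(352, 352, 3))

# Layer names below mark the end of each stage in tf.keras; treat them as an assumption.
side_names = ['conv2_block3_out', 'conv3_block4_out',
              'conv4_block6_out', 'conv5_block3_out']
for name in side_names:
    print(name, backbone.get_layer(name).output.shape)
# Keras prints channels-last: (None, 88, 88, 256), (None, 44, 44, 512),
# (None, 22, 22, 1024), (None, 11, 11, 2048) -- the same shapes listed above in CxHxW.
```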
Look at the 50-layer column. Below is the layer structure presented in the paper:
resnet-50
There are four groups of big blocks, with 3, 4, 6, and 3 small blocks in each group respectively.
Each small block contains three convolutions.
Additionally there is a separate convolutional layer at the very beginning of this network.
Thus it is: (3+4+6+3)*3+1=49
Finally there is a fully connected layer, giving 50 layers in total.
As shown below, inside each big block:
The first small block handles the IN != OUT case; its shortcut branch contains a convolution, and it is called a Conv Block.
All the remaining small blocks handle the IN == OUT case; their shortcut branch is a straight identity connection, and they are called ID Blocks.
3 = Conv Block + ID Block + ID Block
4 = Conv Block + ID Block + ID Block + ID Block
6 = Conv Block + ID Block + ID Block + ID Block + ID Block + ID Block
3 = Conv Block + ID Block + ID Block
0 Calculation of feature map dimensions
1 Convolutional layer calculation: N = (W - F + 2P)/S + 1
F: convolution kernel size
S: stride
P: padding
2 Pooling layer calculation: N = (W - F)/S + 1
F: pooling window size
S: stride
P: padding
3 When dimensions are not divisible,
Convolution rounds down and pooling rounds up.
In this example, (200-5+2*1)/2+1 = 99.5, which rounds down to 99.
(99-3)/1+1 = 97.
(97-3+2*1)/1+1 = 97.
Size-invariant case before and after convolution: when the stride is 1, a kernel of 3 with padding 1, or a kernel of 5 with padding 2, leaves the size unchanged.
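A small helper (a sketch of the formulas above, not tied to any framework) that reproduces the worked example:

```python
import math

def conv_out(w, f, p, s):
    """Convolution output size: floor((W - F + 2P)/S) + 1 (rounds down)."""
    return math.floor((w - f + 2 * p) / s) + 1

def pool_out(w, f, s):
    """Pooling output size: ceil((W - F)/S) + 1 (rounds up)."""
    return math.ceil((w - f) / s) + 1

# Worked example from above (the 200-pixel input is just an illustration):
print(conv_out(200, 5, 1, 2))  # (200 - 5 + 2*1)/2 + 1 = 99.5 -> 99
print(pool_out(99, 3, 1))      # (99 - 3)/1 + 1 = 97
print(conv_out(97, 3, 1, 1))   # (97 - 3 + 2*1)/1 + 1 = 97
```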
1 ResNet circumvents the gradient vanishing problem with skip connections
Gradient vanishing: as gradients are propagated back to the shallower layers they become so small that they approach 0,
which leads to inefficient learning and slower and slower parameter updates.
Stacking multiple ResNet Blocks addresses the gradient vanishing problem.
Resnet Block = main path + skip connection
2 ResNet has two kinds of basic blocks.
Identity Block: the input and output dimensions are the same, so multiple Identity Blocks can be connected in series;
the input and the output can be summed directly;
dimensionality is unchanged (input shape == output shape).
Conv Block: the input and output dimensions are different, so Conv Blocks cannot be chained one after another;
it is used to change the dimensions of the feature maps;
a Conv2D layer is added to the skip connection to make the dimensions equal before the two paths are added;
dimensionality changes (input shape != output shape).
Because a CNN gradually converts the input image into a very small but deep feature map,
the usual routine is to use a uniformly small kernel (e.g., VGG uses 3x3 throughout).
But as the depth of the network increases, the number of output channels increases (and what is learned becomes more complex).
That's why a Conv Block is needed first to change the dimensions, and then a succession of Identity Blocks can follow.
Identity Block:
Conv Block:
The difference in the Conv Block is:
it simply adds a Conv2D layer (1x1 kernel size) to the shortcut path,
so that after the main path changes the dimensions, the shortcut path matches it.
3 How to Build a Conv Block Across Three Layers
1 main path
First: Conv-BatchNorm-ReLU block
conv2d: filter=F1, kernel_size=1, stride=s, padding=valid
The output shape is reduced.
Name the layers, random seed=0, BatchNorm axis=3 (this follows the tf.keras convention)
Second: Conv-BatchNorm-ReLU block
conv2d: filter=F2, kernel_size=f, stride=1, padding=same
The output shape is unchanged.
Name the layers, random seed=0, BatchNorm axis=3 (this follows the tf.keras convention)
Third: Conv-BatchNorm block (no ReLU here; the ReLU is applied after the addition)
conv2d: filter=F3, kernel_size=1, stride=1, padding=valid
The output shape is unchanged.
Get the final X_output
2 skip-connection
Conv-BatchNorm block
conv2d: filter=F3, kernel_size=1, stride=s, padding=valid
The shape is consistent with X_output.
BatchNorm axis=3
Returns X_skip
3 Add X_skip and X_output, then apply the ReLU function
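Putting the recipe above together, here is a minimal tf.keras sketch of such a Conv Block. The F1/F2/F3, f and s notation follows the text; the layer-naming scheme is an illustrative assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers, initializers

def conv_block(X, f, filters, stage, block, s=2):
    F1, F2, F3 = filters
    init = initializers.GlorotUniform(seed=0)
    base = f'res{stage}{block}_branch'

    # Main path: 1x1 (stride s) -> fxf (same) -> 1x1; ReLU only after the addition.
    X_out = layers.Conv2D(F1, 1, strides=s, padding='valid',
                          kernel_initializer=init, name=base + '2a')(X)
    X_out = layers.BatchNormalization(axis=3)(X_out)
    X_out = layers.Activation('relu')(X_out)

    X_out = layers.Conv2D(F2, f, strides=1, padding='same',
                          kernel_initializer=init, name=base + '2b')(X_out)
    X_out = layers.BatchNormalization(axis=3)(X_out)
    X_out = layers.Activation('relu')(X_out)

    X_out = layers.Conv2D(F3, 1, strides=1, padding='valid',
                          kernel_initializer=init, name=base + '2c')(X_out)
    X_out = layers.BatchNormalization(axis=3)(X_out)

    # Shortcut path: 1x1 conv (stride s) + BN so its shape matches X_out.
    X_skip = layers.Conv2D(F3, 1, strides=s, padding='valid',
                           kernel_initializer=init, name=base + '1')(X)
    X_skip = layers.BatchNormalization(axis=3)(X_skip)

    # Add the two paths, then apply ReLU.
    return layers.Activation('relu')(layers.Add()([X_out, X_skip]))
```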
4 How to Build an Identity Block Across Three Layers
1 main path
First: Conv-BatchNorm-ReLU block
conv2d: kernel_size=1, stride=1, padding=valid
The output shape is unchanged.
Name the layers, random seed=0, BatchNorm axis=3 (this follows the tf.keras convention)
Second: Conv-BatchNorm-ReLU block
conv2d: kernel_size=f, stride=1, padding=same
The output shape remains unchanged.
Name the layers, random seed=0, BatchNorm axis=3 (this follows the tf.keras convention)
Third: Conv-BatchNorm block
conv2d: kernel_size=1, stride=1, padding=valid (same pattern as the third conv of the Conv Block)
The output shape remains unchanged.
Get the final X_output
2 skip-connection: no operation; the shortcut is simply the input X
3 Add X and X_output, then apply the ReLU function to get X_identity
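And the matching Identity Block sketch; the only difference from the Conv Block above is that the shortcut carries X through untouched (naming is again an assumption):

```python
import tensorflow as tf
from tensorflow.keras import layers, initializers

def identity_block(X, f, filters, stage, block):
    F1, F2, F3 = filters
    init = initializers.GlorotUniform(seed=0)
    base = f'res{stage}{block}_branch'

    # Main path: 1x1 -> fxf (same) -> 1x1, all stride 1 so the shape is preserved.
    X_out = layers.Conv2D(F1, 1, strides=1, padding='valid',
                          kernel_initializer=init, name=base + '2a')(X)
    X_out = layers.BatchNormalization(axis=3)(X_out)
    X_out = layers.Activation('relu')(X_out)

    X_out = layers.Conv2D(F2, f, strides=1, padding='same',
                          kernel_initializer=init, name=base + '2b')(X_out)
    X_out = layers.BatchNormalization(axis=3)(X_out)
    X_out = layers.Activation('relu')(X_out)

    X_out = layers.Conv2D(F3, 1, strides=1, padding='valid',
                          kernel_initializer=init, name=base + '2c')(X_out)
    X_out = layers.BatchNormalization(axis=3)(X_out)

    # X_identity = ReLU(X + X_out); the shortcut is the unchanged input X.
    return layers.Activation('relu')(layers.Add()([X, X_out]))
```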
4 Overall structure:
zero-padding:
(3x3) Add 3 pixels to the top, bottom, left and right sides
stage1:
Conv: filters=64, kernel_size=7x7, stride=2x2
BatchNorm:
RELU:
MaxPooling: windows=3x3, stride=2x2
stage2:
1xConv Block: named a
3set: [64, 64, 256], k_s=3x3, stride=1x1
2xID Block: named b,c
3set: [64, 64, 256], k_s=3x3,
stage3:
1xConv Block: named a
3set: [128, 128, 512], k_s=3x3, stride=2x2
3xID Block: named b,c,d
3set: [128, 128, 512], k_s=3x3
stage4:
1xConv Block: named a
3set: [256, 256, 1024], k_s=3x3, stride=2x2
5xID Block: named b,c,d,e,f
3set: [256, 256, 1024], k_s=3x3
stage5:
1xConv Block: named a
3set: [512, 512, 2048], k_s=3x3, stride=2x2
2xID Block: named b,c
3set: [512, 512, 2048], k_s=3x3
Average Pooling: named avg_pool
windows=(2x2)
Flatten:
Fully Connected(Dense) layer: named 'fc'
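Assembled end to end, the stage layout above looks roughly like the following sketch. It reuses the conv_block() and identity_block() sketches from the previous sections, and the 64x64x3 input and 6 output classes are arbitrary assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def resnet50(input_shape=(64, 64, 3), classes=6):
    X_input = layers.Input(input_shape)
    X = layers.ZeroPadding2D((3, 3))(X_input)

    # stage1: conv7x7/2 - BN - ReLU - maxpool3x3/2
    X = layers.Conv2D(64, 7, strides=2, name='conv1')(X)
    X = layers.BatchNormalization(axis=3)(X)
    X = layers.Activation('relu')(X)
    X = layers.MaxPooling2D(3, strides=2)(X)

    # stage2: 1 Conv Block (stride 1) + 2 ID Blocks
    X = conv_block(X, 3, [64, 64, 256], stage=2, block='a', s=1)
    for b in 'bc':
        X = identity_block(X, 3, [64, 64, 256], stage=2, block=b)

    # stage3: 1 Conv Block (stride 2) + 3 ID Blocks
    X = conv_block(X, 3, [128, 128, 512], stage=3, block='a', s=2)
    for b in 'bcd':
        X = identity_block(X, 3, [128, 128, 512], stage=3, block=b)

    # stage4: 1 Conv Block (stride 2) + 5 ID Blocks
    X = conv_block(X, 3, [256, 256, 1024], stage=4, block='a', s=2)
    for b in 'bcdef':
        X = identity_block(X, 3, [256, 256, 1024], stage=4, block=b)

    # stage5: 1 Conv Block (stride 2) + 2 ID Blocks
    X = conv_block(X, 3, [512, 512, 2048], stage=5, block='a', s=2)
    for b in 'bc':
        X = identity_block(X, 3, [512, 512, 2048], stage=5, block=b)

    # Average pool - flatten - fully connected head.
    X = layers.AveragePooling2D((2, 2), name='avg_pool')(X)
    X = layers.Flatten()(X)
    X = layers.Dense(classes, activation='softmax', name='fc')(X)
    return tf.keras.Model(inputs=X_input, outputs=X, name='ResNet50')
```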
5 resnet50 text detail
block_sizes=[3, 4, 6, 3] refers to the block sizes of the 4 layers after stage1(first pool), corresponding to res2,res3,res4,res5, respectively.
The first block of each layer does conv+BN on shortcut, i.e. Conv Block
inputs: (1, 720, 1280, 3)
initial_conv.
conv2d_fixed_padding()
1. kernel_size=7, first do padding: (1, 720, 1280, 3) -> (1, 726, 1286, 3)
2. conv2d kernels=[7, 7, 3, 64], stride=2, VALID convolution. For the 7x7 kernel the padding is 3, to ensure that the center of the convolution kernel is correctly aligned with the top-left corner of the input.
(1, 726, 1286, 3) -> (1, 360, 640, 64)
3. BN, Relu (only resnetv1 does BN and Relu after first conv)
initial_max_pool.
k=3, s=2, padding='SAME', (1, 360, 640, 64) -> (1, 180, 320, 64)
The following are all building_blocks without bottlenecks
block_layer1.
(3 blocks, inter-layer stride=1 (the previous layer is a pool), 64 filters, no bottleneck (with the bottleneck block the number of output channels would be multiplied by 4))
1. first block.
Conv Block has projection_shortcut, and strides can be equal to 1 or 2.
Identity Block has no projection_shortcut, and strides can only be equal to 1.
`inputs = block_fn(inputs, filters, training, projection_shortcut, strides, data_format)`
shortcut does [1, 1, 64, 64], stride=1 for conv and BN, shape is unchanged
Then add the shortcut to the result of the two convolutions of the input on the main branch, and apply ReLU to the sum; note that the last convolution in the block is followed only by BN, not ReLU.
input: conv-bn-relu-conv-bn, added to the shortcut, then relu
shortcut: conv-bn
shortcut: [1, 1, 64, 64], s=1, (1, 180, 320, 64) -> (1, 180, 320, 64)
inputs do two convolutions of [3, 3, 64, 64], s=1, shape unchanged (1, 180, 320, 64) -> (1, 180, 320, 64) -> (1, 180, 320, 64)
inputs += shortcut, then relu
2. For the remaining 2 blocks, do the same for each.
`inputs = block_fn(inputs, filters, training, None, 1, data_format)`.
The shortcut is added directly to the convolved inputs, with no conv-bn on the shortcut.
inputs are convolved twice with [3, 3, 64, 64], s=1, shape unchanged (1, 180, 320, 64) -> (1, 180, 320, 64) -> (1, 180, 320, 64)
inputs += shortcut, relu
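As a sketch of the non-bottleneck building block just walked through (two 3x3 convolutions on the main path, an optional 1x1 conv+BN projection on the shortcut, then add and ReLU), assuming tf.keras and using padding='same' in place of the fixed-padding + VALID trick of the original code:

```python
import tensorflow as tf
from tensorflow.keras import layers

def building_block(inputs, filters, strides=1, projection_shortcut=False):
    shortcut = inputs
    if projection_shortcut:
        # Conv Block case: 1x1 conv + BN so the shortcut matches the new shape.
        shortcut = layers.Conv2D(filters, 1, strides=strides, padding='same')(inputs)
        shortcut = layers.BatchNormalization()(shortcut)

    # Main path: conv-bn-relu, then conv-bn (no ReLU before the addition).
    x = layers.Conv2D(filters, 3, strides=strides, padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(filters, 3, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)

    # Add the shortcut, then ReLU the sum.
    return layers.Activation('relu')(layers.Add()([x, shortcut]))
```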
block_layer2/3/4 are the same as block_layer1, except that each layer has a different number of identity blocks, different number of convolution kernels and different stride between layers, but still only the shortcut of the first conv block is conv-bn.
block_layer2: 4 blocks, 128 filters, interlayer stride=2 (because there is no pool after the previous layer)
1. first block.
Do kernel=[1, 1, 64, 128], s=2 conv and BN for shortcut, (1, 180, 320, 64) -> (1, 90, 160, 128)
For the main branch first do kernel=[3, 3, 64, 128], s=2 conv, padding='VALID', (1, 180, 320, 64) -> (1, 90, 160, 128)
Then do convolution with kernel=[3, 3, 128, 128], s=1, padding='SAME', (1, 90, 160, 128) -> (1, 90, 160, 128)
2. The remaining 3 blocks, each with the same operation.
The shortcut is not transformed; it is added directly to the result and then ReLU is applied.
Convolution of main branch twice [3, 3, 128, 128], s=1, padding='SAME', (1, 90, 160, 128) -> (1, 90, 160, 128) -> (1, 90, 160, 128)
block_layer3: 6 blocks, 256 filters, interlayer stride=2
1. first_block.
Do kernel=[1, 1, 128, 256], s=2 conv and BN for shortcut, (1, 90, 160, 128) -> (1, 45, 80, 256)
For the main branch first do kernel=[3, 3, 128, 256], s=2 conv, padding='VALID', (1, 90, 160, 128) -> (1, 45, 80, 256)
Then do convolution with kernel=[3, 3, 256, 256], s=1, padding='SAME', (1, 45, 80, 256) -> (1, 45, 80, 256)
2. The remaining 5 blocks, each with the same operation.
The shortcut is not transformed; it is added directly to the result and then ReLU is applied.
Convolution of main branch twice [3, 3, 256, 256], s=1, padding='SAME', (1, 45, 80, 256) -> (1, 45, 80, 256) -> (1, 45, 80, 256)
block_layer4: 3 blocks, 512 filters, interlayer stride=2
1. first_block.
Do kernel=[1, 1, 256, 512], s=2 for conv and BN for shortcut, (1, 45, 80, 256) -> (1, 23, 40, 512)
For the main branch first do kernel=[3, 3, 256, 512], s=2 conv, padding='VALID', (1, 45, 80, 256) -> (1, 23, 40, 512)
Then do convolution with kernel=[3, 3, 512, 512], s=1, padding='SAME', (1, 23, 40, 512) -> (1, 23, 40, 512)
2. The remaining 2 blocks, each with the same operation.
The shortcut is not transformed; it is added directly to the result and then ReLU is applied.
Convolution of main branch twice [3, 3, 512, 512], s=1, padding='SAME', (1, 23, 40, 512) -> (1, 23, 40, 512) -> (1, 23, 40, 512)
avg_pool, 7*7
FC, output1000
softmax
Output prediction
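Finally, a sketch of how those blocks are stacked into block_layers (assuming the building_block() sketch above): only the first block in each layer uses the projection shortcut and the inter-layer stride, and the remaining blocks use stride 1 with an identity shortcut.

```python
def block_layer(inputs, filters, num_blocks, strides):
    # First block: projection shortcut (conv + BN) and the inter-layer stride.
    x = building_block(inputs, filters, strides=strides, projection_shortcut=True)
    # Remaining blocks: identity shortcut, stride 1.
    for _ in range(num_blocks - 1):
        x = building_block(x, filters, strides=1, projection_shortcut=False)
    return x

# block_sizes=[3, 4, 6, 3] -> res2..res5, e.g.:
# x = block_layer(x, 64, 3, strides=1)    # right after the initial max pool
# x = block_layer(x, 128, 4, strides=2)
# x = block_layer(x, 256, 6, strides=2)
# x = block_layer(x, 512, 3, strides=2)
```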
6 resnet50 illustration