CNN Configuration - Yousef's Notes

#Overview

Configuring a CNN involves designing the architecture (layers, filters, connections) and setting training hyperparameters.

#CNN Architecture Components

#1. Convolutional Layer

Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

Key decisions:

Kernel size: 3×3 (standard), 5×5, 7×7 (early layers)
Stride: 1 (preserve size), 2 (downsample)
Padding: 0 (valid), or kernel_size // 2 at stride 1 for "same" output size (1 for 3×3, 2 for 5×5, 3 for 7×7)
Output channels: Powers of 2 (32, 64, 128, 256, 512)
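
To make the kernel-size and padding choices above concrete, here is a minimal sketch in plain Python. The helper names `same_padding` and `conv_params` are illustrative and not part of any library, and the 64-channel layer is just an example size. It checks the "same" padding rule for odd kernels and compares the parameter count of two stacked 3×3 convolutions (which together see a 5×5 region) against a single 5×5 convolution, one common argument for treating 3×3 as the standard.

```python
def same_padding(kernel_size: int) -> int:
    """Padding that preserves spatial size at stride 1 (odd kernels only)."""
    if kernel_size % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd kernel size")
    return kernel_size // 2


def conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Learnable parameters of a 2D convolution: weights plus one bias per filter."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)


paddings = {k: same_padding(k) for k in (3, 5, 7)}

# Two stacked 3x3 layers cover a 5x5 receptive field with fewer parameters
# than one 5x5 layer (64 channels in and out, chosen only for illustration).
two_3x3 = 2 * conv_params(64, 64, 3)
one_5x5 = conv_params(64, 64, 5)

print(paddings)          # padding per kernel size
print(two_3x3, one_5x5)  # parameter counts
```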

#2. Pooling Layer

MaxPool2d(kernel_size=2, stride=2)    // Most common
AvgPool2d(kernel_size=2, stride=2)   // For final layers
AdaptiveAvgPool2d(output_size=1)     // Global pooling
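
The effect of the three pooling layers above on tensor shape can be checked with a short PyTorch sketch. The input size (1, 8, 32, 32) is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)  # (batch, channels, height, width)

max_out = nn.MaxPool2d(kernel_size=2, stride=2)(x)   # halves H and W
avg_out = nn.AvgPool2d(kernel_size=2, stride=2)(x)   # halves H and W
gap_out = nn.AdaptiveAvgPool2d(output_size=1)(x)     # one value per channel

print(max_out.shape, avg_out.shape, gap_out.shape)
```

Global pooling with `output_size=1` is the same as taking the mean over height and width, which is why the classifier input size after it depends only on the channel count and not on the image size.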

#3. Activation Function

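ReLU: Default choice (cheap to compute; does not saturate for positive inputs, which mitigates vanishing gradients, though units can "die" and output 0 for every input)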
LeakyReLU: For GANs/deep networks
GELU: Modern alternative (Vision Transformers)
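
For reference, the three activations can be written as scalar functions in plain Python. This is only a sketch of the definitions: the 0.01 negative slope matches PyTorch's `LeakyReLU` default, and GELU is given in its exact erf form (some libraries also offer a tanh approximation).

```python
import math


def relu(x: float) -> float:
    """max(0, x): zero output and zero gradient for negative inputs."""
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Like ReLU, but keeps a small slope for negative inputs."""
    return x if x >= 0 else negative_slope * x


def gelu(x: float) -> float:
    """x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{x:+.1f}  relu={relu(x):.4f}  leaky={leaky_relu(x):.4f}  gelu={gelu(x):.4f}")
```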

#4. Normalization

BatchNorm: Standard for CNNs
LayerNorm: Used in Transformers
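
The difference between the two lies in which axes the statistics are computed over. The sketch below (tensor sizes chosen arbitrarily) feeds the same shifted, scaled input through both layers: BatchNorm2d normalizes each channel across the batch and spatial positions, while LayerNorm normalizes each sample across its own channels and positions.

```python
import torch
import torch.nn as nn

x = torch.randn(16, 4, 8, 8) * 3.0 + 5.0  # (batch, channels, H, W), mean ~5, std ~3

bn = nn.BatchNorm2d(4)        # statistics per channel, over batch and spatial dims
ln = nn.LayerNorm([4, 8, 8])  # statistics per sample, over channels and spatial dims

bn_out = bn(x)  # module is in training mode, so batch statistics are used
ln_out = ln(x)

per_channel_mean = bn_out.mean(dim=(0, 2, 3))  # 4 values, each close to 0
per_sample_mean = ln_out.mean(dim=(1, 2, 3))   # 16 values, each close to 0
print(per_channel_mean, per_sample_mean)
```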

#Standard CNN Configuration Template

Algorithm ConfigureCNN(input_shape, num_classes, architecture_type="standard")
Input:
    - input_shape: (C, H, W) - channels, height, width
    - num_classes: number of output classes
    - architecture_type: "shallow", "standard", "deep", "resnet"

Output:
    - model: configured CNN architecture

// ============================================
// CONFIGURATION PARAMETERS
// ============================================
C, H, W ← input_shape

// Choose architecture parameters
// conv_layers is a list of (out_channels, kernel_size) pairs
SWITCH architecture_type:
    CASE "shallow":           // For small datasets
        conv_layers ← [(32, 3), (64, 3)]
        fc_units ← [128]
        dropout ← 0.3

    CASE "standard":          // Default
        conv_layers ← [(32, 3), (64, 3), (128, 3), (256, 3)]
        fc_units ← [512, 256]
        dropout ← 0.5

    CASE "deep":              // For large datasets
        conv_layers ← [(64, 3), (64, 3), (128, 3), (128, 3),
                       (256, 3), (256, 3), (512, 3)]
        fc_units ← [1024, 512, 256]
        dropout ← 0.5

    CASE "resnet":            // ResNet-style
        RETURN ConfigureResNet(input_shape, num_classes)

// ============================================
// BUILD CONVOLUTIONAL FEATURE EXTRACTOR
// ============================================
layers ← []
in_channels ← C
// Each pool halves H and W; pooling after all 7 "deep" layers would
// shrink small inputs below 1×1, so "deep" pools every 2nd layer
pool_every ← 2 IF architecture_type == "deep" ELSE 1

FOR i, (out_channels, kernel_size) IN ENUMERATE(conv_layers, start=1) DO:

    // Convolution ("same" padding for odd kernel sizes)
    APPEND(layers, Conv2d(in_channels, out_channels,
                          kernel_size, padding=kernel_size//2))

    // Normalization
    APPEND(layers, BatchNorm2d(out_channels))

    // Activation
    APPEND(layers, ReLU())

    // Pooling every 1-2 conv layers
    IF i MOD pool_every == 0 THEN:
        APPEND(layers, MaxPool2d(kernel_size=2, stride=2))
    END IF

    in_channels ← out_channels
END FOR

// ============================================
// GLOBAL POOLING / FLATTENING
// ============================================
// Option 1: Global Average Pooling (preferred)
APPEND(layers, AdaptiveAvgPool2d(output_size=1))
flatten_size ← in_channels    // Channels of the last conv layer; spatial size is 1×1

// Option 2: Flatten (alternative)
// flatten_size ← in_channels × (H // 2^num_pooling) × (W // 2^num_pooling)

// ============================================
// BUILD CLASSIFIER (Fully Connected)
// ============================================
classifier ← [Flatten()]      // (N, C, 1, 1) → (N, C) so Linear can consume it
in_features ← flatten_size

FOR units IN fc_units DO:
    APPEND(classifier, Linear(in_features, units))
    APPEND(classifier, ReLU())
    APPEND(classifier, Dropout(p=dropout))
    in_features ← units
END FOR

// Output layer (raw logits; no softmax here)
APPEND(classifier, Linear(in_features, num_classes))

RETURN {features: layers, classifier: classifier}
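
The template can be sketched in PyTorch roughly as follows. This is one possible reading, not a definitive implementation: the function name `build_cnn` and the `CONFIGS` table are made up for this example, the "resnet" branch is left out, a `Flatten` is placed before the first `Linear` because PyTorch's global pooling returns a (N, C, 1, 1) tensor, and the "deep" configuration pools only after every second conv layer (an assumption, so that small inputs such as 32×32 are not pooled away).

```python
import torch
import torch.nn as nn

# Per architecture: conv layers as (out_channels, kernel_size), FC widths,
# dropout probability, and how many conv layers sit between pooling layers.
CONFIGS = {
    "shallow": ([(32, 3), (64, 3)], [128], 0.3, 1),
    "standard": ([(32, 3), (64, 3), (128, 3), (256, 3)], [512, 256], 0.5, 1),
    "deep": ([(64, 3), (64, 3), (128, 3), (128, 3), (256, 3), (256, 3), (512, 3)],
             [1024, 512, 256], 0.5, 2),
}


def build_cnn(input_shape, num_classes, architecture_type="standard"):
    conv_layers, fc_units, dropout, pool_every = CONFIGS[architecture_type]
    in_channels = input_shape[0]

    # Feature extractor: Conv -> BatchNorm -> ReLU (-> MaxPool)
    features = []
    for i, (out_channels, k) in enumerate(conv_layers, start=1):
        features += [
            nn.Conv2d(in_channels, out_channels, k, padding=k // 2),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        ]
        if i % pool_every == 0:
            features.append(nn.MaxPool2d(kernel_size=2, stride=2))
        in_channels = out_channels
    features.append(nn.AdaptiveAvgPool2d(output_size=1))  # global average pooling

    # Classifier: Flatten -> (Linear -> ReLU -> Dropout)* -> Linear
    classifier = [nn.Flatten()]
    in_features = in_channels
    for units in fc_units:
        classifier += [nn.Linear(in_features, units), nn.ReLU(inplace=True),
                       nn.Dropout(p=dropout)]
        in_features = units
    classifier.append(nn.Linear(in_features, num_classes))

    return nn.Sequential(nn.Sequential(*features), nn.Sequential(*classifier))


model = build_cnn((3, 32, 32), num_classes=10, architecture_type="standard")
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)  # one row of class logits per input image
```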

#Training Configuration

Algorithm ConfigureTraining(model, dataset_size, num_classes, task="classification")
Input:
    - model: CNN model
    - dataset_size: number of training samples
    - num_classes: number of output classes (used to pick the loss)
    - task: "classification", "segmentation", "detection"

Output:
    - training_config: hyperparameters and settings

// ============================================
// OPTIMIZER CONFIGURATION
// ============================================
IF dataset_size < 10000 THEN:
    optimizer ← "SGD"
    learning_rate ← 0.01
    momentum ← 0.9
    weight_decay ← 5e-4
ELSE:
    optimizer ← "Adam"
    learning_rate ← 0.001
    betas ← (0.9, 0.999)
    weight_decay ← 1e-4
END IF

// ============================================
// LEARNING RATE SCHEDULE
// ============================================
scheduler ← "StepLR"
step_size ← 30    // epochs
gamma ← 0.1       // multiply LR by this every step_size epochs

// Alternative: CosineAnnealing
// scheduler ← "CosineAnnealingLR"
// T_max ← num_epochs

// ============================================
// BATCH SIZE
// ============================================
IF dataset_size < 1000 THEN:
    batch_size ← 16
ELSE IF dataset_size < 10000 THEN:
    batch_size ← 32
ELSE IF dataset_size < 100000 THEN:
    batch_size ← 64
ELSE:
    batch_size ← 128
END IF

// ============================================
// LOSS FUNCTION
// ============================================
SWITCH task:
    CASE "classification":
        IF num_classes == 2 THEN:
            // Needs a single output logit and float targets; with a
            // 2-unit output layer, use CrossEntropyLoss instead
            loss_fn ← "BCEWithLogitsLoss"
        ELSE:
            loss_fn ← "CrossEntropyLoss"
        END IF

    CASE "segmentation":
        loss_fn ← "CrossEntropyLoss"  // or "DiceLoss"

    CASE "detection":
        loss_fn ← "MultiTaskLoss"    // Classification + Regression

// ============================================
// DATA AUGMENTATION
// ============================================
// Random augmentations apply to training data only.
// Normalize works on tensors, so ToTensor comes first.
train_transforms ← [
    RandomHorizontalFlip(p=0.5),
    RandomRotation(degrees=15),
    ColorJitter(brightness=0.2, contrast=0.2),
    ToTensor(),
    Normalize(mean=DATASET_MEAN, std=DATASET_STD)
]

val_transforms ← [
    ToTensor(),
    Normalize(mean=DATASET_MEAN, std=DATASET_STD)
]

// ============================================
// TRAINING LOOP SETTINGS
// ============================================
num_epochs ← 100
early_stopping_patience ← 10    // epochs without validation improvement
gradient_clip_value ← 1.0       // For preventing exploding gradients

RETURN {
    optimizer: optimizer,
    learning_rate: learning_rate,
    weight_decay: weight_decay,
    scheduler: scheduler,
    step_size: step_size,
    gamma: gamma,
    batch_size: batch_size,
    loss_fn: loss_fn,
    num_epochs: num_epochs,
    train_transforms: train_transforms,
    val_transforms: val_transforms,
    early_stopping: early_stopping_patience,
    gradient_clip: gradient_clip_value
}
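
As a rough illustration of how the optimizer, scheduler, and clipping settings fit together in PyTorch, here is a small sketch. The helper `make_optimizer` is a made-up name, the `nn.Linear` model and random data are stand-ins for a real CNN and dataset, and the clip threshold is treated as a maximum gradient norm, which is only one reading of `gradient_clip_value`. Early stopping and data loading are left out.

```python
import torch
import torch.nn as nn


def make_optimizer(model: nn.Module, dataset_size: int) -> torch.optim.Optimizer:
    """Pick SGD for small datasets and Adam otherwise, as in the rule above."""
    if dataset_size < 10_000:
        return torch.optim.SGD(model.parameters(), lr=0.01,
                               momentum=0.9, weight_decay=5e-4)
    return torch.optim.Adam(model.parameters(), lr=0.001,
                            betas=(0.9, 0.999), weight_decay=1e-4)


model = nn.Linear(8, 3)  # stand-in for a CNN with 3 output classes
optimizer = make_optimizer(model, dataset_size=5_000)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(16, 8)
targets = torch.randint(0, 3, (16,))

for epoch in range(31):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Rescale gradients so their total norm is at most 1.0
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # once per epoch, after the optimizer step

final_lr = optimizer.param_groups[0]["lr"]
print(final_lr)  # LR after the first StepLR decay
```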

#Common Architecture Patterns

#VGG-Style (Sequential)

Conv(3, 64) → Conv(64, 64) → Pool
→ Conv(64, 128) → Conv(128, 128) → Pool
→ Conv(128, 256) → Conv(256, 256) → Pool
→ FC(512) → FC(num_classes)

#ResNet-Style (Skip Connections)

Input → Conv → BN → ReLU → MaxPool
→ ResBlock × 2 (64 filters)
→ ResBlock × 2 (128 filters, stride=2)
→ ResBlock × 2 (256 filters, stride=2)
→ ResBlock × 2 (512 filters, stride=2)
→ GlobalAvgPool → FC(num_classes)

ResBlock:

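Input → Conv → BN → ReLU → Conv → BN → (+ Input) → ReLU

When stride=2 or the channel count changes, Input passes through a 1×1 Conv + BN before the addition so that both shapes match.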
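
A basic residual block along these lines might look as follows in PyTorch. This is a sketch of the two-convolution (non-bottleneck) variant. The 1×1 projection on the shortcut is used whenever the stride or channel count changes, since otherwise the two tensors being added would not have the same shape.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # bias=False because each conv is followed by BatchNorm
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

        # Identity shortcut when shapes match, otherwise a 1x1 projection
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # add the skip, then activate


x = torch.randn(2, 64, 56, 56)
same = ResBlock(64, 64)(x)             # shape preserved
down = ResBlock(64, 128, stride=2)(x)  # channels doubled, H and W halved
print(same.shape, down.shape)
```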

#Output Size Calculation

Formula for conv output:
    output_size = floor((input_size + 2×padding - kernel_size) / stride) + 1

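Formula for pooling output (no padding):
    output_size = floor((input_size - kernel_size) / stride) + 1
    (equals floor(input_size / stride) when kernel_size = stride, e.g. 2×2 pool with stride 2)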

Example: 224×224 input with 3 Conv+Pool blocks
    Block 1: 224 → 112 (after 3×3 conv, stride 1, pad 1 + 2×2 pool, stride 2)
    Block 2: 112 → 56
    Block 3: 56 → 28
    Final feature map: 28×28
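
The example can be reproduced with two small helper functions. The names `conv_out` and `pool_out` are illustrative. Both implement the floor-division formulas for a single spatial dimension, with the pooling version assuming no padding.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Output size of an unpadded pooling layer along one spatial dimension."""
    return (size - kernel) // stride + 1


size = 224
trace = [size]
for _ in range(3):
    size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
    size = pool_out(size, kernel=2, stride=2)             # 2x2 pool halves it
    trace.append(size)

print(trace)  # [224, 112, 56, 28]
```

Odd sizes round down: `pool_out(7)` gives 3, so the last row and column of a 7×7 map are dropped by a 2×2 pool with stride 2.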

#Transfer Learning Configuration

Algorithm ConfigureTransferLearning(base_model, num_classes,
                                     freeze_strategy="partial")
Input:
    - base_model: pretrained model (e.g., ResNet50)
    - num_classes: new number of classes
    - freeze_strategy: "none", "partial", "all"

Output:
    - model: configured model for fine-tuning
    - optimizer: optimizer with per-layer learning rates

// Replace final layer
num_features ← base_model.fc.in_features
base_model.fc ← Linear(num_features, num_classes)

// Freeze strategy
SWITCH freeze_strategy:
    CASE "all":
        // Freeze the whole backbone; only the new fc layer trains
        FOR param IN base_model.parameters() EXCEPT fc:
            param.requires_grad ← FALSE

    CASE "partial":
        // Freeze early layers, train later layers
        // layers: ordered list of the model's blocks, input side first
        FOR layer IN base_model.layers[:4]:
            FOR param IN layer.parameters():
                param.requires_grad ← FALSE

    CASE "none":
        // Train all layers
        pass

// Different learning rates: largest for the new head, smaller for
// pretrained layers. Frozen parameters receive no gradient, so any
// group that contains them is simply not updated.
optimizer ← Adam([
    {'params': base_model.fc.parameters(), 'lr': 0.001},
    {'params': base_model.layer4.parameters(), 'lr': 0.0001},
    {'params': parameters of all layers before layer4, 'lr': 0.00001}
])

RETURN base_model, optimizer
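
A runnable sketch of the "partial" strategy is shown below. To stay self-contained it does not load a real pretrained network: `TinyBackbone` is a hypothetical stand-in that only mimics the `layer4` and `fc` attribute names of a torchvision-style ResNet, and `configure_transfer` is a made-up helper name. With a real model, the same function body would be applied to the loaded network instead.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained ResNet; only attribute names matter."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.layer3 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.layer4 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.avgpool(self.layer4(self.layer3(x)))
        return self.fc(torch.flatten(x, 1))


def configure_transfer(base_model: nn.Module, num_classes: int):
    # Replace the classification head with a freshly initialized layer
    base_model.fc = nn.Linear(base_model.fc.in_features, num_classes)

    # "partial" strategy: freeze everything except layer4 and the new head
    for name, param in base_model.named_parameters():
        if not name.startswith(("layer4.", "fc.")):
            param.requires_grad = False

    # Only trainable parameters go to the optimizer, each group with its own LR
    optimizer = torch.optim.Adam([
        {"params": base_model.fc.parameters(), "lr": 1e-3},
        {"params": base_model.layer4.parameters(), "lr": 1e-4},
    ])
    return base_model, optimizer


model, optimizer = configure_transfer(TinyBackbone(), num_classes=5)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
print([g["lr"] for g in optimizer.param_groups])
```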

#Quick Reference: Layer Choices

Layer Type      When to Use                       Typical Values
Conv 3×3        Default choice                    stride=1, padding=1
Conv 1×1        Dimension reduction               Bottleneck layers
Conv 5×5        Early layers (receptive field)    First layer only
MaxPool 2×2     Downsampling                      stride=2
GlobalAvgPool   Final layer before FC             output=1
BatchNorm       After every conv                  Default settings
Dropout         Before FC layers                  p=0.3-0.5

#Python Implementation (PyTorch)

import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()

        # Feature extractor
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(128, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1)
        )

        # Classifier
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x