관리 메뉴

studio.heelab

[DL for CV] Image Classification with Linear Classifiers 본문

MMAILab

[DL for CV] Image Classification with Linear Classifiers

heez 2026. 3. 4. 03:46
반응형

Lecture: https://www.youtube.com/watch?v=pdqofxJeBN8&list=PLoROMvodv4rOmsNzYBMe0gJY2XS8AQg16&index=2

 

1. Image Classification

The problem: Semantic Gap

An image is a tensor of integers between [0,255] 

 

Challenges

-Viewpoint variation

 

-Illumination

 

-Background Clutter

 

-Occlusion

 

-Deformation

so cute

-Intraclass variation

 

An image classifier

def classify_image(image):
	#Some magic here:
    return class_label

 

Find edges -> Find corners

 

2. Data-driven Approach

3 Step Process

  1. Collect a dataset of images and labels
  2. Use ML algorithms to train a classifier
  3. Evaluate the classifier on new images

 

3. KNN (K-Nearest Neighbor) Algorithm

 

First classifier: Nearest Neighbor

-memorize all data and labels

def train(images, labels):
	#ML!
    retrun model

 

-Predict the label of the most similar training image

def predict(model, test_images):
	#Use model to predict labels
    return test_labels

 

Distance Functions

L1 (Manhatten) distance

-pixel-wise absolute value difference

 

L2 (Euclidean) distance

 

Hyperparameters

choices about the algorithms themselves

e.g. distance metric, K

 

Choose hyperparameters using the validation set

Only run on the test set once at the very end.

 

Setting Hyperparamters

Idea1. Choose hyperparameters tha work best on the training data

Idea2. Choose hyperparameters tha work best on test data

Idea3. Spilt data into train. val; choose hyperparameters on val and evaluate on test <- better idea

 

Idea4. Cross-validation: Split data into folds, try each fold as validation and average the results

 

Training is very fast O(1), but because it must compare with all data during prediction, the test speed is very slow O(N). Furthermore, distance calculation at the pixel level does not well reflect the semantic similarity of images.

 

4. Linear Classifier

A parameter-based approach that maps the input image ($x$) to output class scores.

f(x,W)=Wx+b

W (Weights): weight parameters that must be found through the learning process.

b (Bias): bias values that adjust the preference for specific classes.

 

 

3 Perspective

1. Algebraic Perspective

Class scores are calculated through matirx multiplication

Input image -> vector

 

2. Visual Perspective

Each row of the weight marix W acts as a template for its corresponding class

 

3. Geometric Perspective

It is equivalent to drawing decision boundaries (hyperplanes) that separate each class in a high-dimensional space

 

5. Loss Function and Softmax

Loss Function

A metric that quantifies how poorly the current classifier is performing (often referred to as "Unhappiness")

 

Softmax Classfier

It applies an exponential function to the output values (Logits) of a linear classifier and normalizes them into probability values between 0 and 1.

 

Cross-Entropy Loss

he goal is to minimize the negative log-likelihood of the probability corresponding to the correct class -log(P). This is equivalent to maximizing the probability of the correct class.

반응형