
studio.heelab


MMAILab

[DL for CV] Introduction

heez 2026. 3. 4. 00:35

Lecture: Stanford CS231N Deep Learning for Computer Vision I 2025
https://www.youtube.com/playlist?list=PLoROMvodv4rOmsNzYBMe0gJY2XS8AQg16

Lecture 1: Introduction

Agenda

- A brief history of CV and DL

 

1. The Essence and Historical Origins of Computer Vision

1959 Hubel & Wiesel

 

1963 Roberts

 

1970s David Marr

 

1979 Generalized Cylinders - Recognition via Parts

 

1986 Canny - Recognition via Edge Detection

 

1990s Recognition via Grouping 

 

2000s Recognition via Matching, Face Detection, PASCAL Visual Object Classes (VOC) Challenge

 

2006 Deep Learning

 

Visual recognition is a fundamental task for visual intelligence.

 

2. The Deep Learning Revolution and the Importance of Data

ImageNet dataset

A major reason early neural networks struggled to recognize complex real-world images was a lack of data. Professor Fei-Fei Li's team demonstrated the decisive role of data in machine learning by building the ImageNet dataset, which contains about 15 million labeled images spanning roughly 22,000 categories.

 

AlexNet (2012): Deep Learning Breakthrough

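The modern deep learning revolution began in earnest when AlexNet won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a wide margin, with a top-5 error of about 15.3% versus about 26.2% for the runner-up. This success came from combining effective algorithms (convolutional neural networks trained with backpropagation), powerful computing resources (GPUs), and large-scale data (ImageNet).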

 

2012 to Present: DL Explosion

Applications expanded rapidly to images, video, and human movement.

 

3. Tasks and Applications of Modern Computer Vision

The lecture introduces various visual tasks that go beyond simple image classification:

  • Expansion of Visual Understanding: Includes technologies for precise object identification such as Object Detection, Semantic Segmentation, and Instance Segmentation.
  • Generative AI and Multimodal: Covers generative models like DALL-E (text-to-image), Style Transfer, and models combining vision with language.
  • Future Technologies: 3D Reconstruction, video understanding, and Embodied AI integrated with robotics are mentioned as next-generation core technologies.

4. Human-Centered AI and Responsibility

  • Social Impact: Since AI models learn from data created by human activity, they risk reflecting and amplifying human biases.
  • Positive Applications: It is crucial to utilize computer vision to improve human life, such as in medical imaging analysis and elderly care.

 

5. Course Overview

Deep Learning Basics (Lecture 2-4)

Image Classification: A core task in CV

- Linear classification, optimization, regularization, and basic principles of neural networks.
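As a rough illustration of the topics listed above, the sketch below trains a linear classifier with a softmax (cross-entropy) loss, an L2 regularization penalty, and plain gradient descent. It is a minimal sketch, not course code: the toy 2-D data, the helper names (`scores`, `softmax`, `loss_and_grad`, `train`, `predict`) and the hyperparameters (`reg=1e-3`, `lr=0.5`, `steps=200`) are assumptions chosen for illustration, and it uses plain Python lists so it runs without NumPy.

```python
import math
import random

def scores(W, b, x):
    # Linear score function: s_c = sum_d W[c][d] * x[d] + b[c]
    return [sum(w * xd for w, xd in zip(W_c, x)) + b_c for W_c, b_c in zip(W, b)]

def softmax(s):
    # Subtract the max score for numerical stability before exponentiating
    m = max(s)
    exps = [math.exp(v - m) for v in s]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(W, b, X, y, reg):
    """Average softmax cross-entropy loss plus L2 penalty, and its gradients."""
    C, D, N = len(W), len(W[0]), len(X)
    dW = [[0.0] * D for _ in range(C)]
    db = [0.0] * C
    loss = 0.0
    for x, label in zip(X, y):
        p = softmax(scores(W, b, x))
        loss -= math.log(p[label])
        for c in range(C):
            # dL/ds_c = p_c - 1[c == label]
            g = p[c] - (1.0 if c == label else 0.0)
            db[c] += g / N
            for d in range(D):
                dW[c][d] += g * x[d] / N
    loss /= N
    # L2 regularization on the weights only (not the bias)
    loss += reg * sum(w * w for row in W for w in row)
    for c in range(C):
        for d in range(D):
            dW[c][d] += 2 * reg * W[c][d]
    return loss, dW, db

def train(X, y, num_classes, reg=1e-3, lr=0.5, steps=200):
    # Hyperparameters are illustrative assumptions, not tuned values
    D = len(X[0])
    W = [[0.0] * D for _ in range(num_classes)]
    b = [0.0] * num_classes
    loss = float("nan")
    for _ in range(steps):
        loss, dW, db = loss_and_grad(W, b, X, y, reg)
        # Gradient descent step: move every parameter against its gradient
        for c in range(num_classes):
            b[c] -= lr * db[c]
            for d in range(D):
                W[c][d] -= lr * dW[c][d]
    return W, b, loss

def predict(W, b, x):
    s = scores(W, b, x)
    return s.index(max(s))

# Toy data (illustrative assumption): class 0 near (0, 0), class 1 near (2, 2)
random.seed(0)
X = ([[random.gauss(0, 0.3), random.gauss(0, 0.3)] for _ in range(20)]
     + [[random.gauss(2, 0.3), random.gauss(2, 0.3)] for _ in range(20)])
y = [0] * 20 + [1] * 20
W, b, final_loss = train(X, y, num_classes=2)
accuracy = sum(predict(W, b, x) == t for x, t in zip(X, y)) / len(X)
print(f"loss={final_loss:.3f} accuracy={accuracy:.2f}")
```

In practice the explicit loops would be replaced by vectorized NumPy code; they are kept here so that each term of the loss and its gradient stays visible.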

 

 

Perceiving and Understanding the Visual World (Lecture 5-12)

Tasks beyond Image Classification

classification -> semantic segmentation -> object detection -> instance segmentation

 

Models Beyond Multi-layer Perceptron (Lecture 13-17)

CNN (Convolutional Neural Network)
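The core operation of a CNN layer is sliding a small filter over the image and taking a dot product at each position. Below is a minimal pure-Python sketch assuming a single-channel image, no padding, and a hand-picked 2x2 vertical-edge kernel; the function names `conv2d` and `relu` are illustrative, and in a real CNN the kernel weights are learned rather than set by hand.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D convolution on a single-channel image.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is slid over the image without being flipped.
    """
    H, W = len(image), len(image[0])
    kH, kW = len(kernel), len(kernel[0])
    out_h = (H - kH) // stride + 1
    out_w = (W - kW) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it
            total = 0.0
            for u in range(kH):
                for v in range(kW):
                    total += image[i * stride + u][j * stride + v] * kernel[u][v]
            out[i][j] = total
    return out

def relu(feature_map):
    # Element-wise non-linearity applied after the convolution
    return [[max(0.0, v) for v in row] for row in feature_map]

# A 4x5 image with a vertical edge: dark (0) on the left, bright (1) on the right
image = [[0, 0, 1, 1, 1] for _ in range(4)]
# Hand-made vertical-edge detector (hypothetical; a trained CNN learns its kernels)
kernel = [[-1, 1],
          [-1, 1]]
features = relu(conv2d(image, kernel))
for row in features:
    print(row)
```

Here the feature map is 3x4 and every row is [0.0, 2.0, 0.0, 0.0]: the filter responds only where dark pixels meet bright ones, echoing the edge-based recognition ideas from the history section.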

 

RNN (Recurrent Neural Network)
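An RNN processes a sequence one step at a time, reusing the same weights at every step and carrying a hidden state forward. The sketch below assumes the common vanilla RNN update h_t = tanh(W_hh h_{t-1} + W_xh x_t + b); the function names, the tiny weight matrices, and the input sequence are made-up values for illustration only.

```python
import math

def matvec(M, v):
    # Plain matrix-vector product on nested lists
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(h_prev, x, W_hh, W_xh, b):
    # Vanilla RNN update: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b)
    a = [u + w + c for u, w, c in zip(matvec(W_hh, h_prev), matvec(W_xh, x), b)]
    return [math.tanh(v) for v in a]

def rnn_forward(xs, h0, W_hh, W_xh, b):
    # The same weights are reused at every time step; the hidden state
    # carries information from earlier inputs through the sequence
    h = h0
    hs = []
    for x in xs:
        h = rnn_step(h, x, W_hh, W_xh, b)
        hs.append(h)
    return hs

# Hypothetical tiny example: 2-D hidden state, 1-D inputs, hand-picked weights
W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_xh = [[1.0], [-1.0]]
b = [0.0, 0.0]
hs = rnn_forward([[1.0], [0.0], [0.0]], [0.0, 0.0], W_hh, W_xh, b)
for t, h in enumerate(hs, start=1):
    print(t, [round(v, 3) for v in h])
```

With an input only at the first step, the hidden state is about [0.762, -0.762] at t=1 and then decays toward zero (roughly 0.363 and then 0.180 in the first coordinate), showing how information from earlier inputs fades unless the recurrent weights preserve it.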

 

Attention mechanism / Transformers
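Attention lets every output position take a weighted average over all input positions, with weights computed from query-key similarity. The sketch assumes the scaled dot-product form softmax(Q K^T / sqrt(d_k)) V used in Transformers. For brevity it feeds the same toy vectors in as queries, keys and values; a real Transformer first applies learned projections (called W_Q, W_K, W_V here, hypothetical names) and uses multiple heads, neither of which is shown.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: n query vectors; K: m key vectors (same dimension d_k as queries);
    V: m value vectors. Returns n output vectors.
    """
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # attention weights sum to 1
        # Output is the attention-weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(out)
    return outputs

# Self-attention over 3 toy tokens (illustrative values, no learned projections)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = attention(X, X, X)
for row in Y:
    print([round(v, 3) for v in row])
```

Because each output is a convex combination of the value vectors, a query that matches all keys equally (for example an all-zero query) simply returns the mean of the values.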

 

 

Generative and Interactive Visual Intelligence

Self-supervised Learning

 

Generative Modeling

using diffusion models
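Diffusion models learn to reverse a gradual noising process. The sketch below shows only the forward (noising) half, assuming the DDPM-style formulation x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε with a linear noise schedule of T = 1000 steps from beta = 1e-4 to 0.02 (the values used in the original DDPM paper). The function names and the 4-pixel toy "image" are illustrative assumptions, and the denoising network itself is not shown.

```python
import math
import random

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed DDPM-style schedule: per-step noise variance grows linearly
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    # alpha_bar_t = prod_{s<=t} (1 - beta_s): fraction of signal variance kept
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form (t is a 0-based step index):
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    # A denoising network (not shown) would be trained to predict eps from (x_t, t)
    return x_t, eps

betas = linear_beta_schedule()
abar = alpha_bars(betas)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # toy 4-pixel "image" (illustrative)
for t in (0, 250, 999):
    x_t, _ = q_sample(x0, t, abar, rng)
    print(t, round(abar[t], 4), [round(v, 2) for v in x_t])
```

Because ᾱ_t shrinks toward zero (about 4e-5 at the last step with this schedule), the final x_t is almost pure Gaussian noise; generation runs the learned denoiser in the opposite direction, starting from noise.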

 

Vision Language Models

 

3D Vision

 

Human-Centered Applications and Implications (Lecture 18)
