lecture 8
walking tour of AI developments in computer vision
(meet the big names)
+
revision
Seats App! Seats App! Seats App!
welcome back in the new year 🧤
lecture plan
Part 1 🧠
- MNIST and LeNet
- ImageNet and AlexNet
- VGG, Inception, ResNet
- demo
- Style transfer
Part 2 🧑‍🎤
our deal! + walking tour of this unit's summary sheet and mock questions
when we think about AI:
on Apple's machine learning models page:
what is the connection between these?
pattern recognition -> pattern generation
and more... let's take a look at the milestones!
big picture 01:
every big-name model is usually characterised by one or a few brilliant architectural designs (aka a new layer type).
big picture 02:
we often talk about the brilliant designs of AI models and sometimes overlook the importance of the dataset
Back to the year 1998
"hello world" dataset of machine learning - MNIST, handwritten digits images with labels
Lenet: one of the earlies Convolutional Neural Network, with conv and pooling layers, the prototype of everything follows
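a minimal PyTorch sketch of a LeNet-style network for 28×28 MNIST inputs (layer sizes follow the classic LeNet-5 recipe; the original differs in activation and subsampling details):

```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                            # pooling: 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),            # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                            # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, 10),  # 10 digit classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = LeNet()(torch.zeros(1, 1, 28, 28))  # one fake MNIST image
print(logits.shape)  # torch.Size([1, 10])
```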
But what next? Where should computer vision go? In the early 2000s, computer vision tasks included matching satellite images, image stitching, 3D scene reconstruction ...
finding the focal point, the right level of abstraction: object recognition
2005 PASCAL Visual Object Classes Challenge
it has an annotated dataset
number of training images: ~1,500
image dimensions: RGB, roughly 450×280
four classes: motorbikes, bicycles, people, and cars
types of tasks: let's have a look!
types of tasks:
classification: outputs label
object bounding box detection: outputs a bounding box
object segmentation: outputs pixel mask
these are the milestone tasks for computer vision
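just to make the outputs concrete, a toy sketch (shapes and values are illustrative, not from any actual VOC toolkit):

```python
import numpy as np

# the three task outputs for one ~450x280 RGB image
image = np.zeros((280, 450, 3), dtype=np.uint8)

label = "bicycle"                             # classification: one label per image
bbox = (120, 40, 310, 220)                    # detection: (x_min, y_min, x_max, y_max) per object
mask = np.zeros(image.shape[:2], dtype=bool)  # segmentation: one yes/no per pixel
```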
while "gathering data" is an everyday phrase now, back in 2005 there was not much of a mindset about "data"
the visionary ImageNet was introduced in 2010
let's look at its visionary scale:
number of training images: 1,281,167
image dimensions: RGB, 469×387 on average
1000 classes: based on WordNet
tasks: object labels and bounding boxes
and introducing the godmother of recent computer vision developments: Fei-Fei Li
another visionary design of ImageNet: 1000 classes from WordNet
WordNet consists of many English-language terms organized into an ontological structure
WordNet example: a lexical database that is ontologically structured
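a tiny demo of that ontological (is-a) structure, assuming nltk with the wordnet corpus downloaded:

```python
# (assumes: pip install nltk, then a one-time nltk.download('wordnet'))
from nltk.corpus import wordnet as wn

synset = wn.synset('bicycle.n.01')
while True:
    print(synset.name().split('.')[0])  # bicycle, wheeled_vehicle, ..., entity
    if not synset.hypernyms():          # stop at the root of the ontology
        break
    synset = synset.hypernyms()[0]      # step up one "is-a" level
```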
it bridges computer vision with cognitive science, and a 1000-class classification task was far beyond the capability of contemporary models
ImageNet challenge top scorers from 2010 to 2017
Explanation of "top-5" error on whiteboard
- it was really bad in 2010 and 2011
- everything changed from 2012, from AlexNet onwards
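for reference, a minimal sketch of how top-5 error can be computed (with random toy scores in place of real model outputs):

```python
import torch

# "top-5 error" = fraction of images whose true label is NOT among the
# model's five highest-scoring guesses
scores = torch.randn(8, 1000)                  # 8 images, 1000 ImageNet classes
labels = torch.randint(0, 1000, (8,))          # toy ground-truth labels

top5 = scores.topk(5, dim=1).indices           # the 5 best guesses per image
hit = (top5 == labels.unsqueeze(1)).any(dim=1) # is the true label among them?
print(1.0 - hit.float().mean().item())         # the top-5 error rate
```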

- From AlexNet in 2012, we saw an explosion of AI models in CV.
- Let's take a look at some of the big names (they are all image classifiers initially trained on the ImageNet dataset).
just for reference
AlexNet: the first CNN that goes "deep", and uses GPUs for training
VGG: deeper, with smaller filter sizes in its conv layers
ResNet: residual modules ("new layer type"), connections jumping over layers (see the sketch after this list)
Inception, or GoogLeNet: inception modules ("new layer type"), it goes "wide"
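to make "connections jumping over layers" concrete, a minimal residual block sketch (simplified: real ResNet blocks also include batch norm and handle stride/channel changes on the skip path):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # the skip connection: add the input back

x = torch.zeros(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # same shape in, same shape out
```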
all these trained-on-ImageNet models can be used as a good starting point for "any" vision task
think of these models as "vision task bootcamp" graduates.
they are good visual feature extractors
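for example, a small sketch of reusing a pretrained model as a feature extractor (assumes a recent torchvision; older versions pass pretrained=True instead of weights):

```python
import torch
from torchvision import models

# take an ImageNet-pretrained VGG16 and keep only the conv stack,
# dropping the 1000-class classifier head
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
extractor = vgg.features.eval()

with torch.no_grad():
    feats = extractor(torch.zeros(1, 3, 224, 224))  # any 224x224 RGB image
print(feats.shape)  # torch.Size([1, 512, 7, 7]) - generic visual features
```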
AI models nowadays
let's look at some of the Apple models and their corresponding tasks; now this page should be much more familiar
play around with pose detection
style transfer:
input: one content image, one style image
output: the content image "dipped" in the style (some online examples)
wait, how does this relate to all the CNN/VGG jargon we just got loaded with?
the algorithmic design of neural style transfer:
it is an optimisation formalisation comprising two constraints -
-- the generated image should be similar in content to the content image
-- the generated image should be similar in style to the style image
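in equation form (the standard formulation, with α and β weighting the two constraints):

```latex
\mathcal{L}_{\mathrm{total}} =
  \alpha \, \mathcal{L}_{\mathrm{content}}(\text{content image},\, \text{generated image})
+ \beta  \, \mathcal{L}_{\mathrm{style}}(\text{style image},\, \text{generated image})
```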
how do we know if two images are similar in content or style?
in other words, how to numberify the content and style???
original quotes from this paper:
-- "Two images are similar in content if their high-level features as extracted by a trained classifier are close."
-- "Two images are similar in style if their low-level features as extracted by a trained classifier share the same statistic."
guess what? they use VGG as the feature extractor.
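putting the two quotes into a minimal sketch, with random tensors standing in for VGG feature maps (the Gram matrix below is the "same statistic" that the style loss compares):

```python
import torch

def gram_matrix(features):
    # features: (channels, height, width) activations from one VGG layer
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return (flat @ flat.T) / (c * h * w)  # channel-to-channel correlations

content_feats = torch.randn(512, 28, 28)  # high-level features -> content
style_feats = torch.randn(64, 112, 112)   # low-level features -> style
generated_hi = torch.randn(512, 28, 28)   # the generated image's features
generated_lo = torch.randn(64, 112, 112)  # at the same two levels

# content: compare the high-level features directly
content_loss = ((generated_hi - content_feats) ** 2).mean()
# style: compare the low-level feature statistics (Gram matrices)
style_loss = ((gram_matrix(generated_lo) - gram_matrix(style_feats)) ** 2).sum()
```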
This technical idea also invites discussion: can styles really be represented by a bunch of numbers output from VGG? Judging from the hyped style transfer results...
a nice project that includes style transfer
stay tuned..
in the next unit, ML Two, we'll look into hands-on practice (preparing datasets, training, implementation, etc.)
the deal...
presentation brief
revision time!
don't forget to send initial presentation ideas to me next Thursday (sketchy ones are encouraged!) and if you feel stuck, let me know!
references
- ImageNet challenge results
- Types of CNNs
- LeNet-5
- VGG16