lecture 07
how does AI learn?
Gradient Descent (intuitions only)
+
style transfer with CreateML
SeAts APp SEAtS ApP SEaTS APP
🧘
as usual, an AI-related fun project to wake us up
a new series for the following lectures --
what i have learned from learning AI 🤓
all the intuitions, none of the hard maths 🤭
AI intuitions (help me with a cooler name)
after today's lecture:
-- intuition about how to numberify the AI learning process 🤑
-- intuition about the technique behind learning aka gradient descent 🏂
-- style transfer and how to train it using CreateML 🧑‍🎤
object detection training revisited (gentle noodling)
your turn:
--0. download a dataset of your interest from roboflow
--1. open CreateML and add train/val/test data folder into the data sources
--2. select transfer learning
--3. ❗️enter a smaller number of iterations (e.g. 1k)
--4. fire off the training!!!
--5. sketch a simple idea about an app using object detection models (1-3 sentences)
a very recent text-to-audio generation model with playable demo: AudioLDM
AI intuitions 01
on the whiteboard:
let's start with a game! πŸ•ΉοΈ
GAME SETTINGS 🎰
--1. environment: a little 2D creature living on a curved terrain 🗻
--2. objective: find the valley πŸ”»
--3. player control: moving along the X axis, left or right (direction)? how far (step size)? 🕹️
--4. world mist: the terrain is mostly unknown, except for some very local information
all we know is that we can feel the slope under our feet 🌫️
on the whiteboard:
start here,
question 1: shall i go left or right? 😈
on the whiteboard:
answer 1: go with the downward slope direction πŸ˜‰
on the whiteboard:
question 2: what happens if the slope is flat? 🧐
on the whiteboard:
answer 2: jackpot! πŸ₯°
no slope means that we have reached the valley!!!
a gentle reminder: avoid being omniscient in this game; we know it is the valley because the slope is flat, not because we can see it being the lowest point from outside the game world
on the whiteboard:
question 3: back to the start, now we know how to decide the direction but how about our step size?
the dangerous situation with a big step size: overshooting the valley 🥾
on the whiteboard:
answer 3: a good strategy is to decrease our step size as the slope gets flatter
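a tiny illustration of that strategy in Python (all numbers are made up): if the step is always a fixed rate times the local slope, flatter ground automatically means smaller steps

```python
# made-up numbers: scaling a fixed rate by the local slope
# automatically gives smaller steps on flatter ground
step_scale = 0.1
for slope in [8.0, 4.0, 1.0, 0.1]:            # terrain getting flatter
    print(f"slope={slope} -> step={step_scale * slope}")
```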
on the whiteboard:
question 4: game level up! new terrain unlocked... πŸ”
how can we know if we are at THE valley (if a flat slope is all we are looking for)??? 🥲
on the whiteboard:
answer 4: NO WE CAN'T πŸ₯Ή
these are the global minimum and the local minima
on the whiteboard:
BONUS πŸ’° question 1: start here and is there any chance we end up there (that is not a local minima) ?
hint: run simulations in your 🧠, follow the
"feel the slope -> decide the direction
-> pick a step size -> jump to the point
-> repeat"
process (a runnable sketch of this loop follows below)
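if you'd rather run the simulation outside your 🧠, here is the same loop as a minimal Python sketch (the terrain and all the numbers are made up for illustration):

```python
# the game loop: feel the slope -> decide direction -> pick step size
# -> jump -> repeat, on a made-up bumpy terrain
def height(x):
    return x**4 - 2 * x**2            # two valleys (x = -1, x = +1), a bump at x = 0

def slope(x, h=1e-5):
    # all we can "feel" locally: a numerical derivative
    return (height(x + h) - height(x - h)) / (2 * h)

x, step_scale = 1.8, 0.05             # made-up start point and step scale
while abs(slope(x)) > 1e-6:           # flat slope = we call it a valley
    x -= step_scale * slope(x)        # downhill direction; step shrinks as slope flattens
print(f"settled at x = {x:.3f}")      # which valley we reach depends on where we start
```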
on the whiteboard:
BONUS πŸ’° answer 1: barely possible
don't worry too much about the local maxima
on the whiteboard:
BONUS πŸ’° question 2: start here and is there any chance we end up there (that is not a local minima) ?
hint: run simulations in your 🧠, follow the
"feel the slope -> decide the direction
-> pick a step size -> jump to the point
-> repeat"
process
on the whiteboard:
BONUS πŸ’° answer 2: likely!!!
we could get trapped at the saddle point πŸͺ€
on the whiteboard:
but what can we do?
larger step size helps us get carried over
MISSION ACCOMPLISHED ❀️‍πŸ”₯
wait how about AI?
that's exactly how AI learns
SAME SETTINGS 🎰
--1. curved terrain 🗻: a loss function measuring the distance between prediction and ground truth
SAME SETTINGS 🎰
--2. objective πŸ”»: find the lowest loss point
SAME SETTINGS 🎰
--3. player control πŸ•ΉοΈ: adjust values of parameters in AI models by deciding how much to increase/decrease each param
SAME SETTINGS 🎰
--4. world mist 🌫️: we don't know which parameter values would give the perfect solution
but we can compute the gradient (the slope) given current parameter values (under our feet)
SAME SETTINGS 🎰
--1. curved terrain 🗻: a loss function measuring the distance between prediction and ground truth (a tiny sketch follows below)
--2. objective 🔻: find the lowest loss point
--3. player control 🕹️: adjust values of parameters in AI models by deciding how much to increase/decrease each param
--4. world mist 🌫️: we don't know which parameter values would give the perfect solution
but we can compute the gradient (the slope) given current parameter values (under our feet)
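a tiny sketch of what "terrain = loss function" can mean, with one made-up parameter w, one made-up training pair, and squared distance as the loss:

```python
# the "terrain": loss = squared distance between prediction and ground truth,
# as a function of a single model parameter w (everything here is made up)
def loss(w):
    x, ground_truth = 2.0, 10.0       # one made-up training example
    prediction = w * x                # a one-parameter "model"
    return (prediction - ground_truth) ** 2

# walk along the parameter axis and print the terrain height:
for w in [3.0, 4.0, 5.0, 6.0, 7.0]:
    print(f"w={w} -> loss={loss(w)}") # the valley is at w = 5.0
```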
SAME TECHNIQUES 🍜
--1. use the slope (gradient) direction to infer which direction to adjust each parameter in
gradient: derivative, aka the slope
"direction" here really just refers to the plain binary choice of "increase or decrease / + or -"
SAME TECHNIQUES 🍜
--2. also use the slope (gradient) value as an indicator of how close we are to a potential valley
SAME TECHNIQUES 🍜
--3. also use the slope (gradient) value to determine the step size of adjustment
step size: learning rate
SAME TECHNIQUES 🍜
--1. use the slope (gradient) direction to infer which direction to adjust each parameter in
"direction" here really just refers to the plain binary choice of "increase or decrease / + or -"
--2. also use the slope (gradient) value as an indicator of how close we are to a potential valley
--3. also use the slope (gradient) value to determine the step size of adjustment
step size: learning rate (a tiny sketch follows below)
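putting the three techniques together on the made-up loss from before: the gradient's sign gives the direction, its size hints how close we are, and rate × gradient gives the step (a sketch of vanilla gradient descent, nothing library-specific):

```python
# vanilla gradient descent on the made-up one-parameter loss
def loss(w):
    x, ground_truth = 2.0, 10.0
    return (w * x - ground_truth) ** 2

def grad(w, h=1e-5):
    # "feel the slope" numerically at the current parameter value
    return (loss(w + h) - loss(w - h)) / (2 * h)

w, learning_rate = 0.0, 0.05          # made-up start value and learning rate
for _ in range(20):
    w -= learning_rate * grad(w)      # sign -> direction, size -> step
print(f"w={w:.4f}")                   # approaches 5.0, the lowest-loss point
```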
SAME FINDINGS IMPLIED 🍜
-- we mostly land in local minima
-- reaching the global minimum is practically impossible
-- no need to worry about local maxima
-- extra caution for saddle points (a larger step size helps)
RECAP 1️⃣
numberify and rephrase the AI learning process:
minimizing the loss (cost) function by adjusting parameter values
RECAP 2️⃣
after numberifying the problem setting, we can then apply a math trick called gradient descent
-- find the steepest decreasing direction
-- one gradient for one parameter
RECAP 3️⃣
we multiply the (negative) gradients by a learning rate to decide the adjustment values
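written out as the textbook update rule (θ is one parameter, η the learning rate, L the loss; this is the standard form, not anything specific to this lecture):

```latex
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}
```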
DONEπŸŽ‰
let's watch some of this video together
to verify our intuitions
and to connect them with the practical process
maybe this as well
well done everyone πŸŽ‰
we have gone through MSc-level content
two more jargon terms unlocked:
backpropagation: a scheme for calculating the gradients
optimizer: conventionally in python DL libraries, all this backprop/GD stuff is handled by an object called an "optimizer"
In the PokemonGAN example,
the "optim" class from torch takes care of everything we talked about today (gradient descent, backprop, etc.)
try finding "optim" in the code!
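for orientation, here is a hedged sketch of the standard torch pattern (the model, data, and learning rate are made up; the zero_grad / backward / step choreography is the point):

```python
import torch

# a made-up one-layer model and one made-up training pair
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)  # step size = learning rate

x = torch.tensor([[2.0]])
ground_truth = torch.tensor([[10.0]])

for _ in range(100):
    prediction = model(x)
    loss = torch.nn.functional.mse_loss(prediction, ground_truth)  # the terrain
    optimizer.zero_grad()  # clear the previous slopes
    loss.backward()        # backprop: compute one gradient per parameter
    optimizer.step()       # gradient descent: adjust every parameter
```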
very into jon rafman's work recently
next: style transfer with CreateML
the training process is easy,
curating the dataset is fun!
style transfer paper here, some parts are super interesting
play around with:
-- what style/content images to use
-- number of iterations
-- style intensity
-- and every tunable parameter (hyperparameters) there!
today we talked about:

-- intuition about how to numberify the AI learning process πŸ€‘
--- which is to minimize the __ function by adjusting __ values
-- intuition about the technique behind learning aka Gradient Descent πŸ‚
--- which is to find the steepest decrease direction and multiply with a learning rate
--- backprop is a scheme to calculate gradients
--- "optimizer" for handling the backprop
-- style transfer and how to train it using CreateML 🧑‍🎤
today we talked about:

-- intuition about how to numberify the AI learning process 🤑
--- which is to minimize the loss function by adjusting parameter values
-- intuition about the technique behind learning aka Gradient Descent πŸ‚
--- which is to find the steepest decrease direction and multiply with a learning rate
--- backprop is a scheme to calculate gradients
--- "optimizer" for handling the backprop
-- style transfer and how to train it using CreateML 🧑‍🎤