lecture 07
how does AI learn?
Gradient Descent (intuitions only)
+
style transfer with CreateML
SeAts APp SEAtS ApP SEaTS APP
🧘
as usual, an AI-related fun project to wake us up
a new series for the following lectures --
what i have learned from learning AI 🤓
all the intuitions, none of the hard maths 🤭
AI intuitions (help me with a cooler name)
after today's lecture:
-- intuition about how to numberify the AI learning process 🤑
-- intuition about the technique behind learning aka gradient descent 🏂
-- style transfer and how to train it using CreateML 🧑‍🎤
object detection training revisited (gentle noodling)
your turn:
--0. download a dataset of your interest from roboflow
--1. open CreateML and add train/val/test data folder into the data sources
--2. select transfer learning
--3. ❗️enter a smaller number of iterations (e.g. 1k)
--4. fire off the training!!!
--5. sketch a simple idea about an app using object detection models (1-3 sentences)
a very recent text-to-audio generation model with playable demo: AudioLDM
AI intuitions 01
on the whiteboard:
let's start with a game! πŸ•ΉοΈ
GAME SETTINGS 🎰
--1. environment: a little 2D creature living on a curved terrain 🗻
--2. objective: find the valley πŸ”»
--3. player control: moving along the X axis, left or right (direction)? how far (step size)? 🕹️
--4. world mist: the terrain is mostly unknown, except for some very local information
all we know is that we can feel the slope under our feet 🌫️
on the whiteboard:
start here,
question 1: shall i go left or right? 😈
on the whiteboard:
answer 1: go with the downward slope direction πŸ˜‰
on the whiteboard:
question 2: what happens if the slope is flat? 🧐
on the whiteboard:
answer 2: jackpot! πŸ₯°
no slope means that we have reached the valley!!!
a gentle reminder: avoid being omniscient in this game; we know it is the valley because the slope is flat, not because we can see it being the lowest point from outside the game world
on the whiteboard:
question 3: back to the start, now we know how to decide the direction but how about our step size?
the dangerous situation with a big step size: overshooting the valley 🥾
on the whiteboard:
answer 3: a good strategy is to decrease our step size as the slope gets flatter
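a tiny illustration of that strategy in Python (all numbers are made up): if the step is always a fixed rate times the local slope, flatter ground automatically means smaller steps

```python
# made-up numbers: scaling a fixed rate by the local slope
# automatically gives smaller steps on flatter ground
step_scale = 0.1
for slope in [8.0, 4.0, 1.0, 0.1]:            # terrain getting flatter
    print(f"slope={slope} -> step={step_scale * slope}")
```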
on the whiteboard:
question 4: game level up! new terrain unlocked... πŸ”
how can we know if we are at THE valley (if a flat slope is all we are looking for)??? 🥲
on the whiteboard:
answer 4: NO WE CAN'T πŸ₯Ή
these are the global minimum and the local minima
on the whiteboard:
BONUS πŸ’° question 1: start here and is there any chance we end up there (that is not a local minima) ?
hint: run simulations in your 🧠, follow the
"feel the slope -> decide the direction
-> pick a step size -> jump to the point
-> repeat"
process (a runnable sketch of this loop follows below)
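if you'd rather run the simulation outside your 🧠, here is the same loop as a minimal Python sketch (the terrain and all the numbers are made up for illustration):

```python
# the game loop: feel the slope -> decide direction -> pick step size
# -> jump -> repeat, on a made-up bumpy terrain
def height(x):
    return x**4 - 2 * x**2            # two valleys (x = -1, x = +1), a bump at x = 0

def slope(x, h=1e-5):
    # all we can "feel" locally: a numerical derivative
    return (height(x + h) - height(x - h)) / (2 * h)

x, step_scale = 1.8, 0.05             # made-up start point and step scale
while abs(slope(x)) > 1e-6:           # flat slope = we call it a valley
    x -= step_scale * slope(x)        # downhill direction; step shrinks as slope flattens
print(f"settled at x = {x:.3f}")      # which valley we reach depends on where we start
```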
on the whiteboard:
BONUS πŸ’° answer 1: barely possible
don't worry too much about the local maxima
on the whiteboard:
BONUS πŸ’° question 2: start here and is there any chance we end up there (that is not a local minima) ?
hint: run simulations in your 🧠, follow the
"feel the slope -> decide the direction
-> pick a step size -> jump to the point
-> repeat"
process
on the whiteboard:
BONUS πŸ’° answer 2: likely!!!
we could get trapped at the saddle point πŸͺ€
on the whiteboard:
but what can we do?
larger step size helps us get carried over
MISSION ACCOMPLISHED ❀️‍πŸ”₯
wait how about AI?
that's exactly how AI learns
SAME SETTINGS 🎰
--1. curved terrain 🗻: a loss function measuring the distance between prediction and ground truth
SAME SETTINGS 🎰
--2. objective πŸ”»: find the lowest loss point
SAME SETTINGS 🎰
--3. player control πŸ•ΉοΈ: adjust values of parameters in AI models by deciding how much to increase/decrease each param
SAME SETTINGS 🎰
--4. world mist 🌫️: we don't know which parameter values would give the perfect solution
but we can compute the gradient (the slope) given current parameter values (under our feet)
SAME SETTINGS 🎰
--1. curved terrain 🗻: a loss function measuring the distance between prediction and ground truth (a tiny sketch follows below)
--2. objective 🔻: find the lowest loss point
--3. player control 🕹️: adjust values of parameters in AI models by deciding how much to increase/decrease each param
--4. world mist 🌫️: we don't know which parameter values would give the perfect solution
but we can compute the gradient (the slope) given current parameter values (under our feet)
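a tiny sketch of what "terrain = loss function" can mean, with one made-up parameter w, one made-up training pair, and squared distance as the loss:

```python
# the "terrain": loss = squared distance between prediction and ground truth,
# as a function of a single model parameter w (everything here is made up)
def loss(w):
    x, ground_truth = 2.0, 10.0       # one made-up training example
    prediction = w * x                # a one-parameter "model"
    return (prediction - ground_truth) ** 2

# walk along the parameter axis and print the terrain height:
for w in [3.0, 4.0, 5.0, 6.0, 7.0]:
    print(f"w={w} -> loss={loss(w)}") # the valley is at w = 5.0
```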
SAME TECHNIQUES 🍜
--1. use the slope (gradient) direction to infer which direction to adjust each parameter in
gradient: derivative, aka the slope
"direction" here really just refers to the plain binary choice of "increase or decrease / + or -"
SAME TECHNIQUES 🍜
--2. also use the slope (gradient) value as an indicator of how close we are to a potential valley
SAME TECHNIQUES 🍜
--3. also use the slope (gradient) value to determine the step size of adjustment
step size: learning rate
SAME TECHNIQUES 🍜
--1. use the slope (gradient) direction to infer which direction to adjust each parameter in
"direction" here really just refers to the plain binary choice of "increase or decrease / + or -"
--2. also use the slope (gradient) value as an indicator of how close we are to a potential valley
--3. also use the slope (gradient) value to determine the step size of adjustment
step size: learning rate (a tiny sketch follows below)
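putting the three techniques together on the made-up loss from before: the gradient's sign gives the direction, its size hints how close we are, and rate × gradient gives the step (a sketch of vanilla gradient descent, nothing library-specific):

```python
# vanilla gradient descent on the made-up one-parameter loss
def loss(w):
    x, ground_truth = 2.0, 10.0
    return (w * x - ground_truth) ** 2

def grad(w, h=1e-5):
    # "feel the slope" numerically at the current parameter value
    return (loss(w + h) - loss(w - h)) / (2 * h)

w, learning_rate = 0.0, 0.05          # made-up start value and learning rate
for _ in range(20):
    w -= learning_rate * grad(w)      # sign -> direction, size -> step
print(f"w={w:.4f}")                   # approaches 5.0, the lowest-loss point
```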
SAME FINDINGS IMPLIED 🍜
-- we mostly land in local minima
-- reaching the global minimum is practically impossible
-- no need to worry about local maxima
-- extra caution for saddle points (a larger step size helps)
RECAP 1️⃣
numberify and rephrase the AI learning process:
minimizing the loss (cost) function by adjusting parameter values
RECAP 2️⃣
after numberifying the problem setting, we can then apply a math trick called gradient descent
-- find the steepest decreasing direction
-- one gradient for one parameter
RECAP 3️⃣
we multiply the (negative) gradients by a learning rate to decide the adjustment values
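written out as the textbook update rule (θ is one parameter, η the learning rate, L the loss; this is the standard form, not anything specific to this lecture):

```latex
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}
```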
DONEπŸŽ‰
let's watch some of this video together
to verify our intuitions
and to connect them with the practical process
maybe this as well
well done everyone πŸŽ‰
we have gone through MSc-level content
two more jargon terms unlocked:
backpropagation: a scheme for calculating the gradients
optimizer: conventionally in python DL libraries, all this backprop/GD stuff is handled by an object called an "optimizer"
In the PokemonGAN example,
the "optim" class from torch takes care of everything we talked about today (gradient descent, backprop, etc.)
try finding "optim" in the code!
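for orientation, here is a hedged sketch of the standard torch pattern (the model, data, and learning rate are made up; the zero_grad / backward / step choreography is the point):

```python
import torch

# a made-up one-layer model and one made-up training pair
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)  # step size = learning rate

x = torch.tensor([[2.0]])
ground_truth = torch.tensor([[10.0]])

for _ in range(100):
    prediction = model(x)
    loss = torch.nn.functional.mse_loss(prediction, ground_truth)  # the terrain
    optimizer.zero_grad()  # clear the previous slopes
    loss.backward()        # backprop: compute one gradient per parameter
    optimizer.step()       # gradient descent: adjust every parameter
```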
very into jon rafman's work recently
next: style transfer with CreateML
the training process is easy,
curating the dataset is fun!
style transfer paper here, some parts are super interesting
play around with:
-- what style/content images to use
-- number of iterations
-- style intensity
-- and every tunable parameter (hyperparameters) there!
today we talked about:

-- intuition about how to numberify the AI learning process πŸ€‘
--- which is to minimize the __ function by adjusting __ values
-- intuition about the technique behind learning aka Gradient Descent πŸ‚
--- which is to find the steepest decrease direction and multiply with a learning rate
--- backprop is a scheme to calculate gradients
--- "optimizer" for handling the backprop
-- style transfer and how to train it using CreateML 🧑‍🎤
today we talked about:

-- intuition about how to numberify the AI learning process 🤑
--- which is to minimize the loss function by adjusting parameter values
-- intuition about the technique behind learning aka Gradient Descent πŸ‚
--- which is to find the steepest decrease direction and multiply with a learning rate
--- backprop is a scheme to calculate gradients
--- "optimizer" for handling the backprop
-- style transfer and how to train it using CreateML 🧑‍🎤