Mathematics of Machine Learning

Hello world, it's Siraj. The mathematics of machine learning: is it necessary to know math for machine learning? Absolutely. Machine learning is math, it's all math. In this video, I'm going to help you understand why we use math in machine learning, by example.
Machine learning is all about creating an algorithm that can learn from data to make a prediction. That prediction could be what an object in a picture looks like, what the next price for gasoline might be in a certain country, or what the best combination of drugs to cure a certain disease might be. Machine learning is built on mathematical prerequisites, and sometimes it feels like learning them might be a bit overwhelming. But it isn't. Or is it? No, it's not, as long as you understand why they're used, and it'll make machine learning a lot more fun.
You can have a full-time job doing machine learning and not know a single thing about the math behind the functions you're using. But that's no fun, is it? You want to know why something works, and why one model is better than another. Machine learning is powered by the diamond of statistics, calculus, linear algebra, and probability. Statistics is at the core of everything. Calculus tells us how to learn and optimize our model. Linear algebra makes running these algorithms feasible on massive data sets. And probability helps predict the likelihood of an event occurring.
So let's start from scratch with an interesting problem: predicting the price of an apartment in an up-and-coming neighborhood in New York City. Let's say Harlem. Shout-out to Harlem, yo, Westside represent. Okay, let's say that all we'll know when we eventually make a prediction is the price per square foot of a given apartment. That's the only marker we'll use to predict the price of the apartment as a whole. And luckily for us,
we've got a data set of apartments with two columns. In the first column, we've got the price per square foot of an apartment; in the second column, we've got the price of the apartment as a whole. There's got to be some kind of correlation here, and if we build a predictive model, we can learn what that correlation is, so that in the future, if all we're given is the price per square foot of a house, we can predict its price. If we were to graph this data out, with the x-axis measuring the price per square foot and the y-axis measuring the price of a house, it would be a scatter plot. Ideally, we could find a line that fits as many data points as closely as possible, and then we could just plug some input data into our line and out comes the prediction. Poof. In
mathematics, the field of statistics acts as a collection of techniques that extract useful information from data. It's a tool for creating understanding from a set of numbers. Statistical inference is the process of making a prediction about a larger population of data based on a smaller sample, as in: what can we infer about a population's parameters based on a sample statistic? Sounds pretty similar to what we're trying to do right now, right?
Since we're trying to create a line, we'll use a statistical inference technique called linear regression. This allows us to summarize and study the relationship between two variables: one variable, x, is regarded as the independent variable, and the other variable, y, is regarded as the dependent variable. The way we can represent linear regression is with the equation y = mx + b, where y is the prediction, x is the input, b is the point where the line intersects the y-axis, and m is the slope of the line. We already know what the x value would be, and y is our prediction. If we had m and b, we would have a full equation: plug and play, easy prediction. But the question is, how do we get these variables?
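To make the plug-and-play idea concrete, here's a tiny sketch. The slope and intercept values are made-up placeholders for illustration, not values learned from any data:

```python
# Linear regression prediction: y = m*x + b.
# m (slope) and b (intercept) are guesses here, not learned values.
def predict_price(price_per_sqft, m, b):
    """Predict an apartment's total price from its price per square foot."""
    return m * price_per_sqft + b

# Example with placeholder values m = 900 and b = 50000:
print(predict_price(10.0, 900.0, 50000.0))  # 900*10 + 50000 = 59000.0
```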
A naive way would be for us to just try out a bunch of different values over and over again and plot the line each time. Using our eyes, we could try to estimate just how well fit each line we draw is. But that doesn't seem efficient, does it? We do know there exist some ideal values for m and b such that the line, when drawn using those values, would be the best fit for our data set. Let's say we did have a bunch of time on our hands, and we decided to try out a bunch of predicted values for m and b.
We need some way of measuring how good our predicted values are, so we'll use what's called an error function. An error function tells us how far off the actual y value is from our predicted value. There are lots of different types of statistical error functions out there, but let's just try a simple one called least squares. This is what it looks like: we'll make an apartment price prediction for each of our data points based on our own intuition, and we can use this function to check it against the actual apartment price value. It subtracts each predicted value from the actual value and then squares each of those differences. The sigma, that little E-looking symbol, denotes that we are doing this not just for one data point, but for every single data point; we have m data points, to be specific.
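The least squares sum described above can be sketched in code. The data points here are invented toy values, not the apartment data set:

```python
# Sum of squared errors for a candidate line y = m*x + b,
# summed over all (x, y) data points, just like the sigma notation.
def squared_error(m, b, points):
    total = 0.0
    for x, y in points:
        total += (y - (m * x + b)) ** 2
    return total

data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # toy (price/sqft, price) pairs
print(squared_error(2.0, 1.0, data))  # these points lie on y = 2x + 1, so 0.0
print(squared_error(2.0, 0.0, data))  # off by 1 at each point: 1+1+1 = 3.0
```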
That sum is our total error value. We can now create a three-dimensional graph. We know the x-axis and the y-axis: they will be all the potential m and b values, respectively. But let's add another axis, the z-axis, and on the z-axis will be all the potential error values for every single combination of m and b. If we were to actually graph this out, it would look just like this, a kind of bowl-like shape. Cup it firmly in your hand like a nice bowl. If we find the data point at the bottom of the bowl, the smallest error value, that would give us our ideal m and b values, the ones that produce the line of best fit.
But how do we actually do that? Now we need to borrow from the math discipline known as calculus, the study of change. It's got an optimization technique called gradient descent that will help us discover the minimum value iteratively. It will use the error for a given data point to compute what's called the gradient for each of our unknown variables, and we can use those gradients to update our two variables. Then we'll move on to the next data point and repeat the process over and over and over again. Slowly, like a ball rolling down a bowl, we'll find what our minimum value is. See, calculus helps us find the direction of change: in what direction should we change the unknown variables m and b in our function such that its prediction is more optimal, a.k.a. the error is smallest? But apartment prices don't just depend on the price per square foot, right?
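The gradient descent loop just described can be sketched like this, using the least squares error from before. The learning rate, step count, and toy data are arbitrary choices for illustration:

```python
# One gradient descent step for fitting y = m*x + b by least squares.
# The gradients below are the derivatives of sum((y - (m*x + b))**2)
# with respect to m and b.
def gradient_step(m, b, points, learning_rate):
    grad_m = 0.0
    grad_b = 0.0
    for x, y in points:
        error = y - (m * x + b)
        grad_m += -2.0 * x * error   # d(error^2)/dm
        grad_b += -2.0 * error       # d(error^2)/db
    # Move a small step downhill, against the gradient.
    return m - learning_rate * grad_m, b - learning_rate * grad_b

points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # toy data lying on y = 2x + 1
m, b = 0.0, 0.0
for _ in range(5000):
    m, b = gradient_step(m, b, points, 0.01)
print(round(m, 2), round(b, 2))  # approaches the true slope 2 and intercept 1
```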
An apartment's price also depends on features like the number of bedrooms and the number of bathrooms, as well as the average price of homes within a mile. If we factored in those features as well, our regression line would look more like this. There are now multiple variables to consider, so we can call it a multivariate regression problem. The branch of math concerned with the study of multivariate spaces and the linear transformations between them is called linear algebra. It gives us a set of operations that we can perform on groups of numbers known as matrices. Our training set now becomes an m-by-i matrix of m samples that each have i features, and instead of a single variable with one weight, each of the features gets its own weight.
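A sketch of that shape of computation, assuming NumPy is available. The feature values and weights below are made up; the point is that one matrix product predicts a price for every apartment at once:

```python
import numpy as np

# With multiple features, the training set becomes a matrix X:
# one row per apartment, one column per feature. The model becomes
# a weight vector w (one weight per feature) plus a bias b.
X = np.array([
    [10.0, 2.0, 1.0],   # price/sqft, bedrooms, bathrooms (made-up values)
    [12.0, 3.0, 2.0],
    [ 8.0, 1.0, 1.0],
])
w = np.array([900.0, 10000.0, 5000.0])  # placeholder weights, not learned
b = 50000.0

# A single matrix-vector product yields one predicted price per row.
predictions = X @ w + b
print(predictions)
```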
So that's an example of how three of the four main branches of math used in machine learning come into play. But what about the fourth: probability? All right.
So let's just scratch this example. What if, instead of predicting the price of an apartment, we want to predict whether or not it's in prime condition? We want to be able to classify a house with the probability of it being prime or not prime. Probability is the measure of the likelihood of something. We can use a probabilistic technique called logistic regression to help us do this, since this time our data is categorical, as in it has different categories or classes. Instead of predicting a value, we're predicting the probability of an occurrence. Since a probability goes between 0 and 100 percent, we can't use an infinitely stretching line. We're left with some threshold: past some point x, we are more likely than not looking at a prime house. We'll use an S-shaped curve given by the sigmoid function to do this. Once we optimize our function, we'll plug in input data and get out a probabilistic class value, just like that.

So, to summarize: machine learning consists mainly of statistics, calculus, linear algebra, and probability theory. Calculus tells us how to optimize, linear algebra makes executing algorithms feasible on massive data sets, probability helps predict the likelihood of a certain outcome, and statistics tells us what our goal is.
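As a starting point, the sigmoid idea from the classification example can be sketched like this. The weight and bias are hand-picked placeholders, not trained values; they're chosen so the 0.5 threshold lands at x = 5:

```python
import math

# Logistic regression squashes a linear score through the sigmoid
# function to get a probability between 0 and 1.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def prime_probability(x, w, b):
    """Probability that a house with feature value x is 'prime' (placeholder model)."""
    return sigmoid(w * x + b)

# With w = 1 and b = -5, the probability crosses 0.5 at x = 5:
# below that, the model leans "not prime"; above it, "prime".
for x in (2.0, 5.0, 8.0):
    p = prime_probability(x, 1.0, -5.0)
    label = "prime" if p >= 0.5 else "not prime"
    print(x, round(p, 3), label)
```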
This week's coding challenge is to create a logistic regression model from scratch in Python on an interesting data set. GitHub links go in the comment section, and winners will be announced in a week. Please subscribe for more programming videos, and for now I've got to build. Thanks for watching!