### Mathematics of Machine Learning


Hello world, it's Siraj, and today we're talking about the mathematics of machine learning. Is it necessary to know math for machine learning? Absolutely: machine learning is math. It's all math. In this video, I'm going to help you understand why we use math in machine learning, by example.

Machine learning is all about creating an algorithm that can learn from data to make a prediction. That prediction could be what an object in a picture looks like, what the next price of gasoline might be in a certain country, or what the best combination of drugs to cure a certain disease might be. Machine learning is built on mathematical prerequisites, and sometimes it feels like learning them might be a bit overwhelming. But it isn't. Or is it? No, it's not: as long as you understand why they're used, it'll make machine learning a lot more fun.

You can have a full-time job doing machine learning and not know a single thing about the math behind the functions you're using. But that's no fun, is it? You want to know why something works, and why one model is better than another. Machine learning is powered by the diamond of statistics, calculus, linear algebra, and probability. Statistics is at the core of everything. Calculus tells us how to learn and optimize our model. Linear algebra makes running these algorithms feasible on massive data sets, and probability helps predict the likelihood of an event occurring.

So let's start from scratch with an interesting problem: predicting the price of an apartment in an up-and-coming neighborhood in New York City. Let's say Harlem. Shout-out to Harlem, yo, West Side represent. Okay, let's say that all we'll know when we eventually make a prediction is the price per square foot of a given apartment. That's the only marker we'll use to predict the price of the apartment as a whole. And luckily for us, we've got a data set of apartments with two columns: in the first column, we've got the price per square foot of an apartment, and in the second column, we've got the price of the apartment as a whole. There's got to be some kind of correlation here.

And if we build a predictive model, we can learn what that correlation is, so that in the future, if all we're given is the price per square foot of a house, we can predict its price. Let's graph this data out, with the x-axis measuring the price per square foot and the y-axis measuring the price of a house: it would be a scatter plot. Ideally, we could find a line that intersects as many data points as possible, and then we could just plug some input data into our line and out comes the prediction. Poof. In

mathematics, the field of statistics acts as a collection of techniques that extract useful information from data. It's a tool for creating understanding from a set of numbers. Statistical inference is the process of making a prediction about a larger population of data based on a smaller sample, as in: what can we infer about a population's parameters based on a sample statistic? Sounds pretty similar to what we're trying to do right now, right?

Since we're trying to create a line, we'll use a statistical inference technique called linear regression. It allows us to summarize and study the relationship between two variables. Here, one variable, x, is regarded as the independent variable, and the other variable, y, is regarded as the dependent variable. We can represent linear regression with the equation y = mx + b, where y is the prediction, x is the input, b is the point where the line intersects the y-axis, and m is the slope of the line. We already know what the x value is, and y is our prediction, so if we had m and b, we would have a full equation: plug and play, easy prediction. But the question is, how do we get these variables?
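As a quick sketch, that equation maps directly into code. The slope and intercept values below are made up purely for illustration:

```python
def predict(x, m, b):
    """Predict an apartment price (y) from price per square foot (x) via y = m*x + b."""
    return m * x + b

# Hypothetical values for the slope and intercept, just to show the mechanics
m, b = 900.0, 50000.0
print(predict(700.0, m, b))  # prints 680000.0
```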

A naive way would be to just try out a bunch of different values over and over again and plot the line each time. Using our eyes, we could try to estimate just how well-fit the line we draw is. But that doesn't seem efficient, does it? We do know there exist some ideal values for m and b such that the line drawn using those values would be the best fit for our data set. Let's say we did have a bunch of time on our hands, and we decided to try out a bunch of predicted values for m and b.

We need some way of measuring how good our predicted values are, so we'll use what's called an error function. An error function tells us how far off the actual y value is from our predicted value. There are lots of different types of statistical error functions out there, but let's just try a simple one called least squares. This is what it looks like: we make an apartment price prediction for each of our data points, and we can use this function to check each prediction against the actual apartment price. It subtracts each predicted value from the actual value and then squares each of those differences. The sigma, that little E-looking symbol, denotes that we are doing this not just for one data point but for every single data point: we have n data points, to be specific.
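In code, the least squares error described above might look like this (the data points are toy values invented for illustration):

```python
def total_error(m, b, points):
    """Sum of squared differences between actual and predicted y values."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)

# Toy data set: (price per square foot, apartment price) pairs lying on y = 2x + 1
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
print(total_error(2.0, 1.0, data))  # prints 0.0 because the fit is perfect
print(total_error(2.0, 0.0, data))  # prints 3.0: each prediction is off by 1
```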

The sum is our total error value. We can create a three-dimensional graph now. We know the x-axis and the y-axis: they will be all the potential m and b values, respectively. But let's add another axis, the z-axis, and on the z-axis put all the potential error values for every single combination of m and b. If we were to actually graph this out, it would look like a bowl-like shape. Cup it firmly in your hand like a nice bowl. If we find the data point at the bottom of the bowl, the smallest error value, that would give us our ideal m and b values, the ones that produce the line of best fit.

But how do we actually do that? Now we need to borrow from the math discipline known as calculus, the study of change. Calculus gives us an optimization technique called gradient descent that helps us discover the minimum value iteratively. It uses the error for a given data point to compute what's called the gradient with respect to our unknown variables, and we use that gradient to update our two variables. Then we move on to the next data point and repeat the process over and over again.
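That loop can be sketched as stochastic gradient descent on the squared error. The learning rate, epoch count, and toy data below are all assumptions for illustration, not tuned values:

```python
def gradient_descent(points, learning_rate=0.01, epochs=1000):
    """Fit y = m*x + b by repeatedly stepping m and b against the gradient."""
    m, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in points:
            error = y - (m * x + b)      # how far off this prediction is
            grad_m = -2 * error * x      # d/dm of (y - (m*x + b))**2
            grad_b = -2 * error          # d/db of (y - (m*x + b))**2
            m -= learning_rate * grad_m  # step downhill, opposite the gradient
            b -= learning_rate * grad_b
    return m, b

# Toy data generated from y = 2x + 1; gradient descent should recover m and b
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
m, b = gradient_descent(data)
print(round(m, 2), round(b, 2))  # should be close to 2.0 and 1.0
```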

Slowly, like a ball rolling down a bowl, we'll find our minimum value. See, calculus helps us find the direction of change: in what direction should we change the unknown variables m and b in our function such that its prediction is more optimal, i.e. the error is smallest? But apartment prices don't just depend on the price per square foot, right?

Also included are different features like the number of bedrooms and the number of bathrooms, as well as the average price of homes within a mile. If we factored in those features as well, our regression line would look more like this. There are now multiple variables to consider, so we can call it a multivariate regression problem. The branch of math concerned with the study of multivariate spaces and the linear transformations between them is called linear algebra. It gives us a set of operations that we can perform on groups of numbers known as matrices. Our training set now becomes an m-by-i matrix: m samples that each have i features. And instead of a single variable with a single weight, each of the features has its own weight.

So that's an example of how three of the four main branches of math behind machine learning are used. But what about the fourth, probability? All right, so let's scratch this example. What if, instead of predicting the price of an apartment, we want to predict whether or not it's in prime condition? We want to be able to classify a house with the probability of it being prime or not prime. Probability is the measure of the likelihood of something. We can use a probabilistic technique called logistic regression to help us do this, since this time our data is categorical, as in it has different categories, or classes.

Instead of predicting a value, we're predicting the probability of an occurrence. Since a probability goes between 0 and 1, we can't use an infinitely stretching line. We're left with some threshold: past some point x, we are more likely than not looking at a prime house. We'll use an S-shaped curve, given by the sigmoid function, to do this. Once we optimize our function, we plug in input data and get a probabilistic class value, just like that.

So, to summarize: machine learning consists mainly of statistics, calculus, linear algebra, and probability theory. Calculus tells us how to optimize, linear algebra makes executing algorithms feasible on massive data sets, probability helps predict the likelihood of a certain outcome, and statistics tells us what our goal is.
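To make the logistic regression step concrete, here is a sketch of the sigmoid function and how a prime/not-prime call might be made. The parameter values are hypothetical; a real model would learn them, which is exactly what this week's challenge asks for:

```python
import math

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def prime_probability(x, m, b):
    """Probability that an apartment is 'prime', given feature x and parameters m, b."""
    return sigmoid(m * x + b)

# Hypothetical parameters, just to show the mechanics
m, b = 1.5, -3.0
p = prime_probability(4.0, m, b)
print(p)                                    # a value between 0 and 1
print("prime" if p > 0.5 else "not prime")  # prints "prime"
```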

This week's coding challenge is to create a logistic regression model from scratch in Python on an interesting data set. GitHub links go in the comments section, and winners will be announced in a week. Please subscribe for more programming videos, and for now, I've got to build. Thanks for watching!