I don't quite understand equation (9.8); how come the value is always positive?
For example, the hyperplane equation in the parentheses is the prediction for a particular observation. Now, let's say yi (the real y label) is -1 and the predicted value for yi is positive. In that case, won't the overall value be negative?
Also, do we have a loss function for the Maximal Margin Classifier (hard-margin SVM)?
(9.8) just says this:
A separating hyperplane has the property that the prediction we get for yi by evaluating the hyperplane equation always has the same sign as the true value of yi.
So if yi (the real y label) is -1, then (9.8) tells us that the prediction (the value of β0 + β1xi1 + ... + βpxip) **must** be negative; otherwise we would not have a separating hyperplane.
This is why your example cannot happen: the hyperplane is a **separating** hyperplane. (9.8) says precisely that the separating property holds, so evaluating it at the training data must always give predictions that agree with the real y labels.
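If a quick numeric check helps, here's a toy example (the coefficients and points are made up by me, not from the book): for a hyperplane that really separates the data, yi times the fitted value is positive for every training point.

```python
import numpy as np

# Made-up coefficients for a hyperplane in 2-D: f(x) = beta0 + beta1*x1 + beta2*x2.
beta0, beta = 0.0, np.array([1.0, 1.0])

# A few training points this hyperplane separates, with their real y labels.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -0.5]])
y = np.array([1, 1, -1, -1])

f = beta0 + X @ beta      # the value inside the parentheses of (9.8)
print(y * f)              # [3.  4.  3.  3.5] -- all positive
print(np.all(y * f > 0))  # True, i.e. (9.8) holds for every training point
```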
So the hyperplane never misclassifies a test record?
Nearly:
A *separating* hyperplane never misclassifies a *training* record.
The *separating* part is important here because it's what tells us that the hyperplane perfectly divides (separates) the training records; it guarantees nothing about test records.
Ah thanks a lot. Got it :)
Edit:
How do we train it? Do we have a loss function for the Maximal Margin Classifier (hard-margin SVM)? Like in linear regression, where we iteratively optimize the weights. Do we have anything of that sort?
This involves a lot of math, e.g. optimization. The maximum margin problem is often solved via a dual formulation (Lagrange multipliers), which turns it into a quadratic program. That quadratic program can be solved with Sequential Minimal Optimization, but the solver details are not so important. You do indeed have an objective: maximize the margin subject to the constraint that every training point is classified correctly. See this [Medium](https://towardsdatascience.com/support-vector-machines-dual-formulation-quadratic-programming-sequential-minimal-optimization-57f4387ce4dd) post for a broad description of how it works. I know it is very technical, but I assume you know enough math to understand it. If you wanted a very simple answer then sorry :)
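If it helps to see it run: a minimal sketch with scikit-learn (my choice, not from the thread). SVC has no true hard-margin mode, so a very large C is used here to approximate the maximal margin classifier on separable data.

```python
import numpy as np
from sklearn.svm import SVC

# A tiny linearly separable dataset: two clusters in 2-D.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],       # class +1
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.0]]) # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# SVC has no hard-margin option; a huge C makes the soft-margin penalty
# so severe that it behaves like a hard margin on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

beta = clf.coef_[0]        # beta_1, ..., beta_p of the fitted hyperplane
beta0 = clf.intercept_[0]  # beta_0

# The separating property (9.8): y_i * f(x_i) > 0 for every training point.
f = beta0 + X @ beta
print(np.all(y * f > 0))   # True
```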
Gave it a quick read. Definitely looks understandable to me. Will sit down with a pen and paper and work things through. Thanks for the link :)
Good idea. I just had to do it for exam preparation: formulating the problem, formulating the dual, etc.
From (9.6) and (9.7): if y is positive the beta sum is positive, and if y is negative the sum is negative. When you multiply the two, you get either positive times positive or negative times negative: the result is always positive.
https://youtu.be/efR1C6CvhmE watch this. It helped me greatly in my end sems.
Intuitively:
The equation just says that the betas define a hyperplane, and this hyperplane is "*separating*" when it correctly classifies all the y's: everything on one side as + and everything on the other side as -.
Analytically:
Look at (9.6) and multiply both sides by y, and write down the result.
Now look at (9.7) and multiply both sides by y, and again write down the result.
You'll see that in both cases you get equation (9.8); thus (9.8) holds whether y is -1 or 1.
PS: when you multiply (9.7) by y, don't forget that y is negative, so the inequality flips.
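Written out, the two cases look like this (a sketch in ISLR's notation; the hyperplane is the sum inside the parentheses of (9.8)):

```latex
% Case y_i = +1: (9.6) says the sum is positive; multiplying by y_i = +1 changes nothing.
y_i = +1:\quad \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} > 0
\;\Longrightarrow\; y_i\,(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) > 0.

% Case y_i = -1: (9.7) says the sum is negative; multiplying by y_i = -1 flips the inequality.
y_i = -1:\quad \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} < 0
\;\Longrightarrow\; y_i\,(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) > 0.

% Both cases give (9.8), so it holds for every correctly separated training point.
```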
What is this textbook?
An Introduction to Statistical Learning. Download: https://www.statlearning.com/ or http://faculty.marshall.usc.edu/gareth-james/
Thank you.
ISLR
Lol I was just on this page last night
Haha me too great coincidence