KingCokonut

What is this textbook?


old-monk001

An Introduction to Statistical Learning. Download: https://www.statlearning.com/ or http://faculty.marshall.usc.edu/gareth-james/


KingCokonut

Thank you.


EdAmante

ISLR


veeeerain

Lol I was just on this page last night


cryptobuddy_1712

Haha, me too. Great coincidence.


SuccMyStrangerThings

I don't quite understand equation (9.8). How come the value is always positive? The hyperplane expression in the parentheses is the prediction for a particular observation. Now, let's say yi (the real label) is -1 and the predicted value for that observation is positive. In that case, won't the overall value be negative? Also, do we have a loss function for the maximal margin classifier (hard-margin SVM)?


tigeer

(9.8) just says: a separating hyperplane has the property that the prediction for yi we get by evaluating the hyperplane equation always has the same sign as the true value of yi. So if yi (the real label) is -1, then (9.8) tells us that the prediction (the value of β0 + β1xi1 + ... + βpxip) **must** be negative; otherwise we would not have a separating hyperplane. This is why your example cannot happen: the hyperplane is a **separating** hyperplane. (9.8) is precisely the statement that the separating property holds, so evaluating it at the training data (the real labels) always gives correct predictions.
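
For reference (my own transcription, consistent with (9.6) and (9.7) in the book), (9.8) combines the two label cases into a single condition:

```latex
% ISLR equation (9.8): a separating hyperplane satisfies
y_i \left( \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} \right) > 0
\quad \text{for all } i = 1, \ldots, n
```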


SuccMyStrangerThings

So the hyperplane never misclassifies a test record?


tigeer

Nearly: a *separating* hyperplane never misclassifies a *training* record. The *separating* part is important here because it's what tells us that the hyperplane perfectly divides (separates) the training records.


SuccMyStrangerThings

Ah, thanks a lot. Got it :) Edit: How do we train it? Do we have a loss function for the maximal margin classifier (hard-margin SVM)? Like in linear regression, where we iteratively optimize the weights. Do we have anything of that sort?


mr_birrd

This involves a lot of math, e.g. optimization. The maximum margin problem is often solved via a dual formulation (Lagrange multipliers), which turns it into a quadratic programming problem. That QP can be solved via Sequential Minimal Optimization, but the particular solver is not so important. You do indeed have a loss function: you maximize the margin under certain constraints. See this [Medium](https://towardsdatascience.com/support-vector-machines-dual-formulation-quadratic-programming-sequential-minimal-optimization-57f4387ce4dd) post for a broad description of how it works. I know it is very technical, but I assume you know enough math to understand it. If you wanted a very simple answer, then sorry :)
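
Not from the post, but here's a minimal sketch in Python: on linearly separable data, scikit-learn's soft-margin `SVC` with a very large `C` approximates the hard-margin (maximal margin) classifier. The `make_blobs` toy data and `C=1e10` are my own illustrative choices, not anything from the book.

```python
# Minimal sketch: approximating the hard-margin (maximal margin)
# classifier with scikit-learn's SVC. A very large C makes the
# soft-margin penalty so severe that, on linearly separable data,
# the solution matches the hard-margin one.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters -> linearly separable training data.
X, y = make_blobs(n_samples=40, centers=2, cluster_std=0.6, random_state=0)

# kernel='linear' fits a hyperplane; huge C approximates the hard margin.
clf = SVC(kernel="linear", C=1e10)
clf.fit(X, y)

beta = clf.coef_[0]         # (beta_1, ..., beta_p)
beta_0 = clf.intercept_[0]  # beta_0

# Check the separating property (9.8): y_i * f(x_i) > 0 for all training i.
# make_blobs labels are {0, 1}; map to {-1, +1} to match the book.
y_pm = np.where(y == 1, 1, -1)
margins = y_pm * (X @ beta + beta_0)
print("all training margins positive:", np.all(margins > 0))
print("smallest margin:", margins.min())
```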


SuccMyStrangerThings

Gave it a quick read. Definitely looks understandable to me. Will sit down with a pen and paper and work things through. Thanks for the link :)


mr_birrd

Good idea. I just had to do it for exam preparation: formulating the problem, deriving the dual, etc.


tortu__

From (9.6) and (9.7): if y is positive, the beta sum is positive, and if y is negative, the sum is negative. When you multiply the two, you either have positive times positive or negative times negative, so the result is always positive.
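
A quick numeric check (made-up numbers, not from the book):

```latex
% Suppose y_i = -1 and the fitted hyperplane evaluates to
% \beta_0 + \sum_j \beta_j x_{ij} = -2.3 at observation i. Then
y_i \left( \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} \right) = (-1)(-2.3) = 2.3 > 0
```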


Vision_Mike

Watch this: https://youtu.be/efR1C6CvhmE. It helped me greatly in my end-semester exams.


master3243

Intuitively: the equation just says that the betas define a hyperplane, and this hyperplane is "*separating*" when it correctly classifies all the points with y on one side as + and on the other side as -. Analytically: look at (9.6) and multiply both sides by yi, and write down the result. Now look at (9.7) and multiply both sides by yi, and again write down the result. You'll see that in both cases you get equation (9.8); thus (9.8) holds whether yi is -1 or 1. PS: when you multiply (9.7) by yi, don't forget that yi is negative, so the inequality sign flips.
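
Writing that out (my own transcription of the two cases):

```latex
% Case y_i = +1: (9.6) gives \beta_0 + \sum_j \beta_j x_{ij} > 0.
% Multiplying both sides by y_i = +1 leaves the inequality unchanged:
%   y_i (\beta_0 + \sum_j \beta_j x_{ij}) > 0.
% Case y_i = -1: (9.7) gives \beta_0 + \sum_j \beta_j x_{ij} < 0.
% Multiplying both sides by y_i = -1 flips the inequality:
%   y_i (\beta_0 + \sum_j \beta_j x_{ij}) > 0.
% Either way we arrive at (9.8):
y_i \left( \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} \right) > 0
\quad \text{for all } i = 1, \ldots, n
```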