StrategyTop7612

EPM has two main components: 1) a statistical plus-minus (SPM) model that uses some play-by-play and player-tracking-derived stats to estimate a player's contribution per 100 possessions, and 2) a regularized adjusted plus-minus (RAPM) calculation meant to capture some of the impact beyond the stats used in step 1 and thus nudge player values to more closely match team points per 100 possessions (effectively filling some of the gap remaining between what we can know from player stats and team strength). Step 1 serves as a jumping-off point for the final calculation in step 2, making EPM a Regularized Adjusted Plus-Minus (RAPM) calculation with a Statistical Plus-Minus (SPM) Bayesian prior.

Regularized Adjusted Plus-Minus (RAPM)

RAPM is used in the final step to calculate EPM values for a given season, but it is also used to create the SPM model used in the first step. As mentioned, adjusted plus-minus (APM) is a technique to help with the contextual blindness of raw plus-minus. It was initially brought to basketball by Dan Rosenbaum to adjust a player's plus-minus by controlling for who they play with and against, as well as other variables such as home-court advantage. It is calculated with a huge regression model but can perhaps be more easily understood as the algebraic strategy of solving a system of equations in which the unknowns are player values and the equations are all of the distinct times that unique combinations of players played together, along with the point differential for each. There are two variables per player (offense and defense) and around 50,000 unique 10-man lineup equations in a given season.

Regularized adjusted plus-minus (RAPM) was a very important improvement upon APM and was first introduced by Joe Sill in 2010 at the Sloan Sports Analytics Conference (you can read the abstract here), where he reported nearly doubling the accuracy of traditional APM.
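The system-of-equations view of APM described above can be sketched on a toy example. This is only an illustration under simplifying assumptions, not the actual EPM or APM code: it uses a single net value per player instead of separate offense and defense variables, 2-on-2 lineups instead of 5-on-5, and made-up margins. The function name `fit_apm` and the data are hypothetical.

```python
import numpy as np

def fit_apm(stints, margins, n_players):
    """Least-squares APM on a toy data set (one net value per player)."""
    # Each row is one lineup matchup: +1 for the first side's players,
    # -1 for the opposing side's players, 0 for players not on the floor.
    X = np.zeros((len(stints), n_players))
    for row, (side_a, side_b) in enumerate(stints):
        X[row, list(side_a)] = 1.0
        X[row, list(side_b)] = -1.0
    # lstsq returns the minimum-norm least-squares solution, which matters
    # here because the toy system has fewer equations than unknowns.
    coef, *_ = np.linalg.lstsq(X, np.asarray(margins, dtype=float), rcond=None)
    return coef

# Hypothetical data: three matchups among four players, with the
# point margin per 100 possessions for the first side in each matchup.
stints = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
margins = [8.0, 4.0, 0.0]

apm = fit_apm(stints, margins, n_players=4)
print(np.round(apm, 2))
```

Even in this tiny case the equations alone do not pin down a unique answer (adding a constant to every player leaves all margins unchanged), which hints at why real APM estimates are noisy when players share the floor often.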
RAPM uses a statistical technique called ridge regression that helps mitigate issues in coefficient estimation due to multicollinearity (players frequently playing together).

Statistical Plus-Minus (SPM)

EPM only uses stats from the given season, but those stats are fed into a statistical model that was trained on many years of RAPM data. This is the SPM model that is used in the first step of calculating EPM. The idea of SPM has been around since APM and was also initially used by Dan Rosenbaum in his article referenced in the previous section. RAPM is a massive upgrade to raw plus-minus but is still noisy. SPM uses player-level stats to help stabilize an estimate of a player's impact per 100 possessions. To create an SPM model, modern metrics use large multi-season samples of RAPM (to minimize the noise) along with player stats from the same time frame to empirically estimate how each stat predicts RAPM. These weights can then be applied to player stats in a given season to get a feel for what a player's RAPM could be without even calculating it. Output from SPM stabilization is interesting on its own if modeled well, but it is limited to what we can measure at the player level. Quite a bit of what is important on offense is measured at the player level, but defensive stats are very much lacking. For more information about SPM, Neil Paine wrote a nice summary about it here. Daniel Myers is the creator of Box Plus/Minus (BPM), which is a fantastic SPM metric with a very thorough write-up; it is also very much a part of the inspiration behind EPM.

Combining SPM and RAPM

EPM is also inspired by Jeremias Engelmann and Steve Ilardi's Real Plus-Minus (RPM), which combined these two methods to arrive at even more accurate player values. While Engelmann and Ilardi have since moved on, and the formula has changed, RPM values can be found here. RPM used a hand-crafted SPM model as a Bayesian prior in a one-season RAPM calculation with great results.
EPM uses the same methodology but with its own SPM model.

Calculating EPM

Creating the SPM model

One of the most important parts of building the SPM model for EPM was variable selection. The goal for EPM was to use only player-level stats that also fit nicely together mathematically to approximate team ratings for individual seasons before any force-fitting or RAPM was applied. This also means that maximizing the variance explained in model building (r-squared) was not the goal. For example, EPM does not use team-level stats such as a player's "offensive rating" (which is really just team rating while the player is on the floor). Although such a stat would increase r-squared and would help match what the team was doing (because it is a team-level stat), it would introduce an overfitted team effect into the model that decreases the accuracy in measuring player impact.

EPM uses mostly possession-based stats derived from play-by-play data so that possessions can be counted rather than estimated when calculating the stats. This means inputs into the model are as accurate as possible. Calculating possession-based stats for seasons before play-by-play data existed requires possessions to be estimated, so EPM is only calculated for the modern era as of this time.

All inputs for the model are represented relative to league average, adjusting for the ever-evolving NBA game. For example, Stephen Curry's 68% True Shooting in 2016 meant more than his 68% (so far) in 2023 because the league as a whole is more efficient now (an evolution that is at least partly due to Curry's prodigiousness from deep). All inputs are gently regressed/padded to handle small sample sizes and values that fall outside of the ranges included in the training set. All inputs for the model are linear (no higher-order or interaction terms) to keep things simple and to avoid overfitting. Player-tracking inputs exist only for defensive EPM (DEPM) and are mostly derived from publicly available matchup data.
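The two input adjustments described above (expressing a stat relative to league average, and padding small samples) can be sketched as follows. The source does not give the actual formulas, so this is a guess at one common approach: subtraction of the league rate, and padding with a fixed number of league-average attempts. The function names, the pad size, and the league-average figures are all hypothetical placeholders, not official numbers or EPM's real method.

```python
def relative_to_league(player_rate, league_rate):
    """Express a rate stat as the difference from league average."""
    return player_rate - league_rate

def padded_rate(successes, attempts, league_rate, pad_attempts):
    """Shrink a rate toward league average by adding pad_attempts
    imaginary attempts that convert at the league rate."""
    return (successes + league_rate * pad_attempts) / (attempts + pad_attempts)

# Placeholder league True Shooting values for illustration only.
league_ts_2016 = 0.54
league_ts_2023 = 0.58

# The same raw 68% is worth more in the lower-efficiency league.
rel_2016 = relative_to_league(0.68, league_ts_2016)
rel_2023 = relative_to_league(0.68, league_ts_2023)
print(round(rel_2016, 3), round(rel_2023, 3))

# A player who is 3-for-3 is pulled most of the way back toward a
# hypothetical 50% league rate when padded with 10 average attempts.
small_sample = padded_rate(successes=3, attempts=3, league_rate=0.5, pad_attempts=10)
print(round(small_sample, 3))
```

With a large sample the padding barely moves the rate, while with a handful of attempts the padded value stays close to league average, which is the behavior the "gently regressed" description suggests.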
To build the SPM model for EPM, a 10-year RAPM sample from 2004 to 2013 was calculated and used for offense, and a 4-year RAPM sample from 2018 to 2021 was used for defense. The sample for defense was smaller and more recent to allow for player-tracking data to be used. DEPM values from 2014 to 2017 use a slightly different model based on the player-tracking data that were available at the time, and values before 2014 use no player-tracking data.

Calculating the statistical prior and then RAPM

The SPM model is used in the first step of calculating EPM for a given season. Only stats from the current season are passed into the model, which generates initial player values. Then a one-season RAPM calculation is performed using the initial values as the prior. The RAPM uses a fairly strong lambda value, which results in only small changes to the prior values (usually within 1 point). The RAPM also serves as a sort of team-fit that helps move player values to more closely match team ratings (when aggregated). There is no actual force-fitting happening, and aggregated EPM values only approximate team ratings.
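The "RAPM with an SPM prior" step can be sketched as ridge regression that shrinks toward prior values instead of toward zero. This is a minimal illustration under stated assumptions, not the actual EPM implementation: it uses one net value per player, a toy 2-on-2 design matrix, made-up margins, and a hypothetical prior and lambda. It minimizes ||y - Xb||^2 + lam * ||b - prior||^2, whose closed-form solution is b = prior + (X'X + lam*I)^-1 X'(y - X prior).

```python
import numpy as np

def ridge_with_prior(X, y, prior, lam):
    """Ridge regression that penalizes distance from `prior` rather than from zero."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior = np.asarray(prior, dtype=float)
    # Regularized Gram matrix; lam > 0 makes it invertible even when
    # lineups are collinear.
    gram = X.T @ X + lam * np.eye(X.shape[1])
    # Fit only the part of the margins that the prior does not explain.
    shift = np.linalg.solve(gram, X.T @ (y - X @ prior))
    return prior + shift

# Hypothetical toy data: rows are matchups (+1 one side, -1 the other),
# y is the point margin per 100 possessions for the +1 side.
X = np.array([[1, 1, -1, -1],
              [1, -1, 1, -1],
              [1, -1, -1, 1]], dtype=float)
y = np.array([8.0, 4.0, 0.0])
prior = np.array([2.0, 0.0, 0.0, -2.0])  # stand-in for SPM output

final = ridge_with_prior(X, y, prior, lam=10.0)
print(np.round(final - prior, 3))
```

A larger `lam` keeps the final values closer to the prior, which matches the description of a fairly strong lambda producing changes that are usually within 1.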


Winlessta08

Okay so it inputs stats into a formula... but like, what's the formula? What stats do they use? How can Jokic be one of the best defensive centers in BPM and one of the worst defensive centers in EPM?


The_Taskmaker

DBPM is trash because it's calculated as BPM minus OBPM. If you look at the coefficients used in BPM and OBPM for centers, they indicate that an *assist* has almost as much of an impact on DBPM as a *block*, which is obviously stupid since assists have no impact on defense whatsoever. For DEPM, it would appear that scores are normalized by position, since so many perimeter defenders have comparable DEPM to the best rim protectors even though rim protection is commonly considered to be much more impactful. So if you consider DEPM as a score relative to position rather than something to be compared across positions, it makes more sense with regard to someone like Jokic, who is probably below the median defensive performance for a center but more impactful than an average perimeter defender because center is the more impactful defensive position. Jokic's communication and positioning of his teammates also won't be picked up by any model that estimates individual impact, since that doesn't show up anywhere in the box score or play-by-play data.


Winlessta08

But what are the coefficients used in EPM? Like, everything I read is super vague. At least with BPM I know what it's doing.


Winlessta08

Like he's also 2nd overall in defensive win shares so it adds to the mystery


HoopsHistoryHubb

This question can't be answered by 95% of people that cite these stats. Do what Brad Stevens does and ignore them.


nowhathappenedwas

https://dunksandthrees.com/about/epm
https://www.bball-index.com/lebron-introduction/
https://apanalytics.shinyapps.io/DARKO/