No, this isn’t about politics. You can stop reading now if that’s where you thought it was going. If, on the other hand, you thought this was going to be about basketball, I congratulate you on your perspicacity and invite you to accompany me on a magical journey.
One of the things that strikes me as I look at advanced basketball analytics is how relatively model-free the whole enterprise seems. There are a couple of good methods that you see show up over and over again: regression analysis, adjusting plus/minus stats to remove bias, and estimating overall player contribution, usually in terms of points or wins produced or some similar derived statistic. There’s nothing wrong with any of this, of course, and it yields a certain amount of valuable insight. But coming from the science world as I do, and particularly from a recent background in cognitive science, what I find interesting about these analytics is that they seem mostly unconstrained by the game itself.
What do I mean by that? I don’t want this to sound like I’m saying “this is a terrible thing” or “everyone is an idiot,” which is not the case. All I’m saying is that the way these models are constructed relies very heavily on box scores and on backing out the weights of various elements of player performance via regression. What they don’t do in any explicit way, at least that I’ve seen (modulo a few exceptions, like 3-point shooting and per-minute analysis), is incorporate top-down constraints from the actual game itself into the analysis.
What would such constraints look like? One very obvious thing that comes to mind is the shot clock: you have 24 seconds to put the ball in the hoop, and if you fail to do so your opponents get the ball. Another constraint is the ball itself: there’s only one, and only one player (and one team) can possess it at any given time. The court boundaries are obviously also constraints, as is the fact that you can’t camp in the lane (on either offense or defense). And so on.
At first glance these seem like trivial statements, as anyone who pays the slightest amount of attention to basketball will understand them to be obviously true. We can debate the value of LeBron James (really high or really, really high?) all day if we want, but no one is arguing that the shot clock is anything other than 24 seconds. But I think that because the information is so obvious, it may have escaped incorporation into interesting analyses. I’m really just throwing thoughts out there, but if the end goal of each possession is to get the ball in the hoop, and you’re looking for a method that accomplishes this with maximum efficiency, you are really optimizing within the bounds set by the shot clock. In a completely unsurprising turn of events, each possession becomes a constrained optimization, although not one that can be expressed in terms of any simple objective function (indeed, the landscape here is certain to be very complex).
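To make the idea a little more concrete, here is a deliberately crude sketch of a possession as an optimization bounded by the shot clock. Everything in it except the 24 seconds is an assumption made up for illustration: the list of shot values, the idea that one look becomes available per second, and the function name `possession_values` are all hypothetical, and the real landscape is far messier. The sketch works backward from a shot-clock violation (worth zero points) to find how good a look has to be before it beats waiting for a better one.

```python
# Toy model: all numbers except the shot clock are illustrative assumptions,
# not fitted to any real data.
SHOT_CLOCK = 24  # seconds; the one value here that comes from the rules

# Hypothetical expected points of the look available in a given second,
# each assumed equally likely and independent from second to second.
SHOT_VALUES = [0.6, 0.8, 1.0, 1.2]


def possession_values(clock=SHOT_CLOCK, shot_values=SHOT_VALUES):
    """Return values[t] = expected points with t seconds left, playing optimally."""
    values = [0.0]  # t = 0: shot-clock violation, the possession is worth nothing
    for _ in range(clock):
        keep_going = values[-1]  # expected value of passing up the current look
        # Take the shot only if it beats what waiting is expected to yield.
        best = [max(q, keep_going) for q in shot_values]
        values.append(sum(best) / len(best))
    return values


values = possession_values()
for t in (1, 2, 5, 24):
    # With t seconds left, shoot any look worth at least values[t - 1].
    print(f"{t:2d}s left: shoot if expected points >= {values[t - 1]:.3f}")
```

Even in this toy version the constraint does real work: the bar for an acceptable shot starts high and drops as the clock runs down, which is a statement about decision-making that a box score on its own cannot express.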
The reason such constraints are valuable in other fields is that they set hard limits on what you can and can’t do. In physics, for example, you know that whatever happens, you can’t get heat to flow from a cold body to a warm body without putting work in; that’s the second law, and if you find that your theory has violated it, you know you’ve done something wrong. In cognitive modeling, life is a bit more complex because the constraints are often empirical; for example, ACT-R, a cognitive architecture I work with, commits to a certain (experimentally validated) model of working memory decay. That’s an architectural constraint on the kinds of dynamics that an ACT-R model can exhibit. Other architectures make other assumptions. The point of the constraints is that, although by embracing them you sacrifice your model’s freedom to generate any output whatsoever, you gain the security of a smaller outcome space and the freedom to focus on the things that you think are relevant. The nice thing about basketball is that the constraints are baked in via the rules, so we don’t have to guess at what they are; we “just” have to take them into account.
The reason I think this is important for basketball is that I think the box score analytics game is largely played out. By this I don’t mean that the analytics are useless, but my strong suspicion is that virtually everything that can be extracted from such information has been. Obviously people are doing more complicated things now with line-ups and the like, but since “keep your best line-up on the floor for 48 minutes” is not a viable strategy (another constraint!), a coach is faced with a complicated process of decision-making when it comes to rotations. The question for analysts is this: can a systematic breakdown of basketball dynamics, one that takes seriously the various constraints imposed by the rules of the game, help coaches (and players) make decisions on the floor? I think, optimist that I am, that the answer is “yes,” but the work of getting there will require a lot more than just box score information.