The Racermetrics Manifesto

by Sean Wrona

Welcome to Racermetrics. This site is designed to apply sabermetric-style thinking to auto racing analysis.

Auto racing fans have historically had many misconceptions when evaluating racing talent. Within the context of an individual race, we tend to celebrate the drivers who are hard chargers, who dominate, and who make lots of passes, particularly those who can do so without wrecking other drivers. However, auto racing sanctioning bodies have adopted championship points systems that may or may not reward the drivers who were actually the most relevant in the races. In recent years NASCAR Sprint Cup's Chase for the Championship, NHRA's Countdown, IndyCar's double points races and doubleheaders, and F1's double points finale at Abu Dhabi have simply diminished the opportunity for the best drivers to be rewarded while increasing the luck factor.

In mainstream sports playoffs are generally accepted, but honestly I'm not a fan of them there either. While I do believe there is something to clutch performance, it is rather overstated. In most sports, dominance now is the best indicator of long-run success, and ignoring data from outside the playoffs simply leaves too small a sample to measure the overall performance of an athlete. The reason for playoffs in the first place is to artificially increase the excitement and the luck factor (and therefore the ratings) in exchange for making it more likely to reward a team that is not the best. However, since every team usually does not play every other team in a league, there is some justification for playoffs in mainstream sports, not to mention the historical precedent.

In racing it can be much worse. Large fields of competitors increase the luck factor in and of themselves far more than is seen in traditional sports, every driver already competes against the same drivers each race (so the scheduling justification for playoffs does not apply), and no racing series except possibly the World of Outlaws or USAC sprint cars has nearly enough races to entirely mitigate the luck factor and ensure that the champion is the most deserving driver against a large field. Weighting certain races more than others (which seems to be spreading worldwide, although thankfully F1 dropped its double points finale for 2015) further increases the luck factor, indicating the need for a different kind of assessment to compare drivers.

Giving specific races greater weight is not the only reason why championships may not be the best proxy for talent, nor perhaps even the most important one. While auto racing is ostensibly an individual sport, it is obviously just as much of a team sport as any other (particularly in any series that requires pit stops). To determine the best drivers one must remove all team factors, and that is the most challenging aspect of driver evaluation. One goal of Racermetrics is to distinguish the driver's accomplishments from the team's. While I haven't yet come up with an optimal formula to measure equipment strength, I do have several ideas for distinguishing between the driver's and the pit crew's effects. Our goal is to focus only on the driver and ignore team factors as much as possible (except in columns where we are evaluating the teams rather than the drivers, which we may do). Hopefully I can eventually develop an equivalent of the sabermetric win shares concept to distinguish between the driver, pit crew, strategist, and engineers and give each a proper share of the win.

For starters, drivers will be primarily rewarded for speed under full racing or green flag conditions and for on-track passing ability, while they will receive no credit for passes in the pits or for inheriting a position from a driver who encounters trouble on the racetrack. Additionally, we are focusing primarily on race pace and ignoring attrition-related factors entirely. Yes, "to finish first, one must first finish" is a very famous auto racing quote, but since the late '90s most racing series have had very good reliability. Retirements or DNFs (depending on which side of the pond you are on) are much less frequent, yet in general most points systems have moved in the direction of rewarding consistency over top finishes, even Formula One (it took Lewis Hamilton quite a while to finally pass Nico Rosberg in the championship even though he was more dominant in the races). NASCAR's points systems have always been horrific in this regard, and the current base for the points system is even worse than the 1975-2003 system, which rewarded the best driver with the championship a little more often than the Chase, but not that much more often.

This push for consistency has led many fans to argue that the finish is the only thing that matters. In something like endurance sports car racing that is more likely true, but in most series attrition really isn't much of a factor now. Particularly when you want to predict what will happen in the next race or evaluate an overall career, looking at which drivers dominated the most and which drivers took the lead on track the most is the most accurate way to assess talent. A race is a series of laps and therefore a series of data points. The last one is certainly the most important and deserves the highest weight, but the laps before the final lap matter more in aggregate than the finish does, because it is way too easy (and has gotten easier) to, for instance, luck into a win on pit strategy as reliability, and therefore the number of cars on the lead lap, has increased. Should the team get credit for that in a theoretical team win shares measure? Yes, almost exclusively. Should the driver? Not so much. I'm not dismissing the idea that a driver can be a brilliant fuel saver (like Scott Dixon) and help the team fulfill its strategy, but I think in the long run pace is more important (which Dixon also has in spades).

When an unexpected driver gets a good finish, many fans praise the driver for having a "good run", but this is inaccurate. Having a good run requires running well for the entire race, not just lucking into a good finish thanks to pit strategy or a wild restrictor plate finish. The finish matters, but in evaluating talent it doesn't matter as much as people think it does. Consistent performance in the races is more reflective of talent than consistent finishing positions, where luck plays too big a role, especially now that many series are giving certain races more weight than others.

Driver rankings

Another goal I eventually intend to fulfill with this website is a global driver ranking. Other sites have attempted this but have struggled to remain credible because of unequal series weights. For instance, Autosport's world driver rankings have consistently listed 6-time V8 Supercars champion Jamie Whincup, one of the most consistently dominant drivers in the world, behind mid-tier Formula One and NASCAR drivers solely because the site rates F1 and NASCAR higher. The top F1 and NASCAR drivers deserve respect for sure. However, one position I hold very firmly is that I am going to try not to be a snob. While Formula One, IndyCar, NASCAR, sports car, and touring car fans don't always like all the other categories and too often tend to bash top drivers in other disciplines, I am going to start out by rating all national/international major leagues as equal. This is a controversial opinion since most people would say the best drivers are exclusively in F1, but once you view things through a single-series lens, you inevitably open yourself up to more bias.

F1 and NASCAR may have the majority of the most highly-paid drivers, and many people argue that money attracts talent by itself. However, I would argue that there really isn't a great way of proving that the best driver in one discipline is better than the best driver in another, and in an era of specialization there are fewer successful crossovers than there were decades ago. Switching from any premier series to any other premier series will usually lead to failure. Really, one could similarly argue that comparing drivers in different racing series is futile, but I still want to do it, although I may not start for a few months. In the long run money always comes down to how well something is marketed, not the skill of the participants (you don't suddenly become better at something when the endeavor you are participating in becomes more popular). So essentially this argument means that the drivers in better-marketed series are considered better than the drivers in worse-marketed series, but considering that the most popular driver in each series is rarely the best, this argument collapses under its own weight to me.

You can only compare drivers relative to the competition they are racing against, however. The drivers who separate themselves the most from the midpoint in a national or international top-level series should be considered the best in the world, and that would include drivers in F1, IndyCar, NASCAR, sports car racing, DTM, V8 Supercars, BTCC, WTCC, Formula E, etc. I prefer this to a system that rates NASCAR drivers over those in other touring series like V8 Supercars or DTM solely because NASCAR is more popular (or has more races), or that likewise automatically rates European sports car drivers over American ones. Since F1 is a non-spec series compared to most of the other series I listed (as is European sports car racing), drivers can theoretically dominate much more there, so the top drivers on any prospective list will likely be European drivers anyway, as the US has tended to go in a spec direction, allowing a much greater diversity of winners in exchange for less overall dominance (and, in the eyes of many, less interesting racing technology).

We will probably only cover racing sanctioning bodies where drivers compete in a series of races on an oval or road racing circuit on the same track at the same time, as opposed to series like WRC and NHRA which have different formats and can't be evaluated using the same formulas. I will start out by rating top-level championships only and will not analyze feeders initially. I will not be the only contributor as I hope to have contributors for each type of racing. The formula I will be using to evaluate drivers is as follows:

For each category, drivers in a race will be ranked on a percentile scale from 0 to 100, which is then scaled by that category's weight: the top driver in that category in that race earns the full weight (30 for average running position, 20 for finish, etc.) and the bottom driver in that category scores 0. This does not take into account the strength of equipment, but it should at least measure performance over the entire race, not merely the finish. We may eventually adjust rankings for each category based on equipment strength.

Average running position - 30%

This will be defined as a driver's average position while running on the lead lap. This statistic appears in NASCAR's loop data, and it is probably the best predictor of how a driver will do in future races, better than the actual finish.
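As a rough illustration of the definition above, here is a minimal Python sketch. The input layout (one `(position, on_lead_lap)` pair per completed lap) and the function name are my own assumptions for the example, not the format of NASCAR's loop data or any timing feed.

```python
from statistics import mean


def average_running_position(laps):
    """Average position over the laps a driver spent on the lead lap.

    `laps` is a hypothetical list of (position, on_lead_lap) tuples,
    one per completed lap. Laps run while a lap or more down are ignored.
    Returns None if the driver never completed a lap on the lead lap.
    """
    lead_lap_positions = [pos for pos, on_lead_lap in laps if on_lead_lap]
    if not lead_lap_positions:
        return None
    return mean(lead_lap_positions)


# Made-up example: three laps on the lead lap, then one lap a lap down.
example_laps = [(3, True), (2, True), (5, True), (20, False)]
print(average_running_position(example_laps))
```

Only the first three laps count here, so the lap spent in 20th while off the lead lap does not drag the average down.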

Finish - 20%

The finish DOES matter and I DO believe there are some drivers who are clutch and others who are not, so it certainly deserves to be a significant factor, but I don't believe it is as important as the other laps combined.

Speed - 10%

Average speed under green-flag conditions. I will only include laps where the driver had a lap time within 10% of the fastest lap of the race, so pit stops, crashes, spinouts, and running out of fuel will not be considered in this, allowing it to be a direct measure of raw pace. However, speed by itself is very equipment-centric so I believe it should be a smaller factor than the others.
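The lap filter described above can be sketched in a few lines of Python. I am assuming here that "within 10% of the fastest lap" means a lap time no more than 1.10 times the fastest lap time of the race, and that lap times are in seconds and track length in miles; the function names are hypothetical.

```python
def representative_laps(lap_times, fastest_race_lap, tolerance=0.10):
    """Keep only lap times within `tolerance` of the race's fastest lap.

    Assumption: 'within 10%' means lap_time <= 1.10 * fastest_race_lap.
    Slow laps (pit stops, cautions, spins, running dry) fall outside it.
    """
    cutoff = fastest_race_lap * (1 + tolerance)
    return [t for t in lap_times if t <= cutoff]


def average_speed_mph(lap_times, fastest_race_lap, track_length_miles):
    """Average speed over the representative laps only.

    Computed as total distance over total time, with lap times in seconds.
    Returns None if no lap survives the filter.
    """
    kept = representative_laps(lap_times, fastest_race_lap)
    if not kept:
        return None
    return track_length_miles * len(kept) / sum(kept) * 3600


# Made-up lap times on a 2.5-mile oval; the 60-second lap is filtered out.
laps = [40.0, 41.0, 60.0, 43.9]
print(representative_laps(laps, fastest_race_lap=40.0))
print(average_speed_mph(laps, fastest_race_lap=40.0, track_length_miles=2.5))
```

Averaging distance over time rather than averaging per-lap speeds is a design choice on my part; either would serve as a measure of raw pace.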

Passing - 20%

Drivers are ranked from best to worst passer using a points system I invented last year, in which passing the fastest driver nets you a point total equal to the number of drivers who competed in the race. For instance, in the 2014 Indianapolis 500, Ryan Hunter-Reay, in addition to winning the race, was also the fastest driver. Any driver who passed him earned 33 points, while any driver who passed the slowest driver (Pippa Mann) earned 1 point. Points are subtracted for being passed on the reverse scale, so a driver who was passed by Hunter-Reay would lose 1 point while a driver passed by Mann would lose 33. Drivers are thus ranked based on WHO they passed for position, not just on how many passes they made. While some people have measured passing before, I don't believe anyone has attempted a sliding scale to adjust for the speed of the drivers passed.
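To make the sliding scale concrete, here is a small Python sketch. The text only fixes the two ends of the scale (the fastest and slowest drivers); the one-point-per-rank steps in between are my assumption, as are the function names and the input format (a list of `(passer, passed)` pairs plus each driver's speed rank, where 1 is fastest).

```python
def pass_value(speed_rank, field_size):
    """Points earned for passing the driver with this speed rank (1 = fastest).

    Assumed linear scale: passing the fastest driver is worth `field_size`
    points and passing the slowest is worth 1.
    """
    return field_size - speed_rank + 1


def passed_penalty(speed_rank):
    """Points lost for being passed by the driver with this speed rank."""
    return speed_rank


def passing_scores(passes, speed_ranks):
    """Net passing points per driver.

    `passes` is a list of (passer, passed) on-track passes for position;
    `speed_ranks` maps every driver in the race to a speed rank.
    """
    field_size = len(speed_ranks)
    scores = {driver: 0 for driver in speed_ranks}
    for passer, passed in passes:
        scores[passer] += pass_value(speed_ranks[passed], field_size)
        scores[passed] -= passed_penalty(speed_ranks[passer])
    return scores


# Made-up three-car race: C (slowest) passes A (fastest), then A passes B.
ranks = {"A": 1, "B": 2, "C": 3}
print(passing_scores([("C", "A"), ("A", "B")], ranks))
```

With a 33-car field this reproduces the endpoints from the Indianapolis example: `pass_value(1, 33)` is 33, `pass_value(33, 33)` is 1, `passed_penalty(1)` is 1, and `passed_penalty(33)` is 33.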

Natural peak - 20%

This is my measure of dominance. The natural peak is the driver's lap 1 position or the best position obtained via an on-track pass, whichever is better. Essentially the idea is that this is the best position that drivers obtain themselves, through qualifying or race conditions, while ignoring positions gained in the pits, positions inherited on the track, etc. Carlos Huertas's win at Houston and Scott Dixon's win at Mid-Ohio last year were completely due to staying out of the pits and winning on pit strategy; they did not pass cars left and right to come from the back of the field and win. Both had a natural peak outside the top fifteen, which is very low for a race winner. The term 'natural peak' comes from my term 'natural lead', which means leading the race via a green-flag pass as opposed to inheriting the lead (what drivers do THEMSELVES versus what their team does for them).
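In code, the natural peak reduces to taking a minimum. The sketch below assumes someone has already tagged which positions a driver reached by completing an on-track pass (as opposed to pit cycles or inherited positions); that tagging is the hard part and is not shown. The function name and inputs are hypothetical.

```python
def natural_peak(lap1_position, on_track_pass_positions):
    """Best position a driver earned without help from the pits or attrition.

    `lap1_position` is the driver's position at the end of lap 1.
    `on_track_pass_positions` lists every position the driver held
    immediately after completing an on-track pass for position.
    Lower numbers are better, so the natural peak is the minimum.
    """
    return min([lap1_position, *on_track_pass_positions])


# Made-up example: a driver runs 18th on lap 1, passes one car on track
# to reach 16th, then cycles to the lead on pit strategy and wins.
# The win itself does not count toward the natural peak.
print(natural_peak(18, [16]))
```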

I will consider calculating this for any major league top-level series for which I am able to obtain complete lap data (unfortunately NASCAR is not one of them) and then make a ranking of those drivers. Since I intend to use this same formula for all seasons, this can allow for interesting direct comparisons across widely differing series. Drivers' overall scores would be based on their average using this metric. To start with, I may just calculate this for each individual series. For sports car series, I will likely only consider the position on the final lap that each particular driver was in the car, which means multiple drivers on the same team could have very different positions on such a list. That reflects reality: most people would argue that most sports car teams have superior and inferior drivers, and the superior driver usually finishes the race. An inferior driver who leaves the car in a weaker position should not receive credit for winning (at least in terms of evaluating talent) if this driver was not leading the race before leaving the car.
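Putting the five categories together, a per-race score and a season average might be computed along these lines. This is only a sketch under my own assumptions: a driver's percentile in a category is taken as the share of other drivers beaten in that category (so ties share a score), and the data layout and names are hypothetical. The weights are the ones listed above.

```python
# Category weights from the formula above (they sum to 100).
WEIGHTS = {
    "avg_running_position": 30,
    "finish": 20,
    "speed": 10,
    "passing": 20,
    "natural_peak": 20,
}
# Categories where a smaller raw number is the better result.
LOWER_IS_BETTER = {"avg_running_position", "finish", "natural_peak"}


def category_percentiles(values, lower_is_better):
    """Map each driver to 0-100: the best driver gets 100, the worst 0.

    Assumption: percentile = share of the other drivers beaten.
    """
    n = len(values)
    if n < 2:
        return {driver: 100.0 for driver in values}
    result = {}
    for driver, v in values.items():
        if lower_is_better:
            beaten = sum(1 for other in values.values() if other > v)
        else:
            beaten = sum(1 for other in values.values() if other < v)
        result[driver] = 100.0 * beaten / (n - 1)
    return result


def race_scores(race):
    """`race` maps each category name to {driver: raw value} for one race."""
    totals = {}
    for category, weight in WEIGHTS.items():
        pct = category_percentiles(race[category], category in LOWER_IS_BETTER)
        for driver, p in pct.items():
            totals[driver] = totals.get(driver, 0.0) + weight * p / 100
    return totals


def season_average(per_race_scores, driver):
    """Average of a driver's race scores over the races they started."""
    scores = [race[driver] for race in per_race_scores if driver in race]
    return sum(scores) / len(scores) if scores else None


# Made-up three-car race in which A is best and C worst in every category.
race = {
    "avg_running_position": {"A": 2.0, "B": 5.0, "C": 9.0},
    "finish": {"A": 1, "B": 2, "C": 3},
    "speed": {"A": 220.0, "B": 218.0, "C": 215.0},
    "passing": {"A": 10, "B": 0, "C": -10},
    "natural_peak": {"A": 1, "B": 2, "C": 3},
}
print(race_scores(race))  # {'A': 100.0, 'B': 50.0, 'C': 0.0}
```

Because every race is scored on the same 0 to 100 scale regardless of field size, averages from different series can at least be placed side by side, which is the point of using one formula everywhere.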

Now that I've finished boring you with walls of dialogue, after this point I will largely focus on compiling walls of data instead. Next week I intend to anticipate the 24 Hours of Daytona by recapping last year's race using this formula. I also intend to launch a series called "How the Races Were Won" where I will attempt to go through every F1, IndyCar (CART/Champ Car or IRL/IndyCar), and NASCAR Winston/Nextel/Sprint Cup race starting in 1990 to determine whether the race was won on-track (via a green-flag pass), off-track (via a pit stop exchange, etc...), or incidentally (due to the leader encountering trouble on the racetrack). I have a lot of this already done and I think I can produce those columns rather quickly so they will be suitable filler until NASCAR, F1, and IndyCar start their seasons in February and March.

Sean Wrona is the Managing Editor of racermetrics.com, the Webmaster of race-database.com, the winner of the 2010 Ultimate Typing Championship at the SXSW Interactive Conference in Austin, and the ratings compiler and statistician for the Mensa Scrabble-by-Mail SIG. He earned a master's in applied statistics from Cornell University in 2008 and previously digitized several seasons of NBA box scores on basketball-reference.com. You may contact him at sean@racermetrics.com.