# Fantasy Football Philosopher: How Much of a Sample Size Do We Need?

Projections are central to fantasy football. Analysts and managers alike are constantly analyzing trends and trying to ascertain whether they can be extrapolated *forward*.

We’ve had plenty of surprising trends to start the 2021 season. On the upside, Mike Williams and Cooper Kupp are examples of wide receivers with surprisingly high involvement in their respective offensive schemes; as a result, their production has landed them atop the position. On the downside, perennial WR1 Allen Robinson has had a troubling start to the season, floundering in a struggling Bears offense where he has long been one of the lone bright spots. Naturally, for these players and others, the salient question is whether these trends will *persist*. Is it time to sell high on Williams and Kupp and trade them for a haul, or should we expect them to keep on dominating? Similarly, is Allen Robinson a buy-low opportunity, or should you trade him away at the first chance you get?

A crucial metric – probably *the* crucial metric – used to determine whether trends will continue is, naturally, *sample size*. Simply put, you would rather have a WR coming off of 7 strong games than a WR who just picked things up over the last two weeks.

This leads us to our philosophical question for today’s series: a larger sample size is better, but *how large does it need to be?* How does our confidence in predicting trends change as we see more games put together? All data, unless otherwise specified, comes from nflfastR; we’ll be using data since 2010 with Half-PPR scoring.

##### Methodology

Our approach will be pretty simple. Let’s imagine a single player – say, quarterback Tom Brady – across the 2016 and 2017 seasons.

1. Determine a list of statistics that are important for the position. For QBs, for instance, this might be YPG, fantasy points per game, INTs, etc.
2. Look at the final 10-game window from Brady’s 2016 season and calculate his YPG over that window. Then, calculate the YPG that he threw for in the entire 2017 season.
3. Repeat step (2), but for game window sizes 1–9 and for the other statistics (fantasy points per game, INTs, etc.).
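The steps above can be sketched in a few lines of Python. The game logs below are made up for illustration – they are not Brady’s actual stats – but the windowing logic mirrors the methodology:

```python
# Sketch of the windowing step: pair each trailing-window average from the
# previous season with the full next-season average. The data is illustrative.

def window_averages(prev_season, next_season, max_window=10):
    """For each window size w, average the final w games of the previous
    season and pair it with the next season's full-season average."""
    next_avg = sum(next_season) / len(next_season)
    rows = []
    for w in range(1, max_window + 1):
        window = prev_season[-w:]  # final w games of the previous season
        rows.append({
            "window": w,
            "prev_window_avg": sum(window) / len(window),
            "next_season_avg": next_avg,
        })
    return rows

# Hypothetical INT-per-game logs for two consecutive seasons
ints_2016 = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
ints_2017 = [1, 0, 1, 0, 1, 0, 1, 0]
rows = window_averages(ints_2016, ints_2017)
print(rows[7])  # the 8-game window row
```

Each row is one (window size, statistic) observation; stacking these rows across all players and year pairs produces the dataset used below.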

Here is a snippet of what our data might look like:

| Player | Window size | Statistic | Window value (2016) | Next-season value (2017) |
| --- | --- | --- | --- | --- |
| T. Brady | 8 | INT/game | 0.25 | 0.50 |

For example, the first row says that, in the final eight games of 2016, Brady threw 0.25 INTs per game and, across the entire 2017 season, he threw 0.5 INTs per game. Naturally, we are going to use this data to tell us how good previous game data is at *predicting* future game data.

Anyway, we can complete the same steps above for all players from 2010–2020. I do throw out players that change teams in between years, since this means that they are in entirely different situations and it doesn’t make as much sense to project old trends forward. For example, even before the injury, we didn’t expect Will Fuller V’s excellent season with Deshaun Watson last year to translate into the *same* level of success with Tua and the crowded WR room in Miami.

You might notice that I’m looking at data *across* seasons, while we are concerned with *in-season* trends; that is, how do the first 6 weeks of the year affect things going forward? Frankly, it’s far easier to use projections across seasons because of the clean break between years, and I don’t expect the result to change much either way. While *predictions* will naturally be better if we use games in the *same* season, the relative amount that *sample sizes* improve projections should be similar whether we are looking across years or within a year. That is to say, you can likely apply these conclusions *right now* to a player you are projecting for the rest of the year; or, at least, I can’t think of a reason why you couldn’t. Without further ado, let’s look at the results.

##### Quarterbacks

We’ll consider these results positionally, starting with quarterbacks. The relevant statistics here are INTs, fantasy points (‘PTS’), touchdowns, and yards, all on a per-game basis.

The crux of our analysis here will be a regression built using data from the previous year to project a year forward. This regression can tell us *how much we expect our prediction to be off from the truth*.\* Here are some charts:

Before we get into any discussion, let’s work out how to interpret these charts. Consider the right-most dot in the ‘PTS’ (fantasy points per game) chart. This dot says that, if you use a 10-game window at the end of a season to predict the fantasy PPG of a QB throughout all of the *next* season, you can expect your estimate to be off by just over 4.0 PPG. That’s 4.0 PPG *higher* or *lower*; if your estimate is 18 PPG, a reasonable range to expect is 14 PPG to 22 PPG (or 18 +/- 4).

In this case, you technically expect the player to be in this range with 95% confidence. This range is pretty large: the difference between a 14 PPG QB and a 22 PPG QB is significant! This *wide range* of outcomes is because we have such a *high confidence level*: this implies that, 95% of the time, the player will end up in this range. If we used a *lower* level of confidence, the range would get smaller; even for a level of confidence of 80%, which is reasonable for fantasy football, the range would be *much* smaller. Therefore, don’t worry too much about the *size* of the range in *absolute* terms, but how the range *changes relatively* as we vary sample size.
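To see how much the range shrinks at a lower confidence level, here’s a quick sketch using normal quantiles. The 4.0 PPG half-width comes from the chart discussion above; the assumption that prediction errors are roughly normal is mine, for illustration only:

```python
from statistics import NormalDist

def interval_halfwidth(sigma, confidence):
    """Half-width of a normal prediction interval at a given confidence."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma

# Back out sigma from the article's ~4.0 PPG half-width at 95% confidence
sigma = 4.0 / NormalDist().inv_cdf(0.975)
print(round(interval_halfwidth(sigma, 0.95), 1))  # 4.0, by construction
print(round(interval_halfwidth(sigma, 0.80), 1))  # 2.6 – much narrower
```

So the same prediction that carries a +/- 4 PPG range at 95% confidence carries only about a +/- 2.6 PPG range at 80% confidence, which is why the *relative* error change matters more than the absolute width.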

- As expected, larger windows – a higher sample size (x-axis) – mean better predictions, and your ‘error’ (the y-axis) goes down. This is intuitive: the more you see of a player, the better your future projections will be!
- I notice a ‘kink’ in the lines at a game window of 6: every extra game before a 6-game window decreases the error by a lot, but every game added *after* 6 games doesn’t move the error as much. This makes sense: 6 games is a decent chunk of time, and feels like about the right number to really get a sense of a player; more games don’t help as much, since you’ve already formed an opinion!
- YPG benefits the most from a larger sample size. With a 1-game window (using just the last game of the season), your estimate will be off by nearly 80 yards, but with a 10-game window, your estimate will only be off by about 46 yards. Fantasy points per game, probably the most important statistic, improves significantly from nearly a 6.0 error to just above a 4.0 error. By contrast, INTs are very hard to predict, and larger sample sizes don’t *really* help that much; this makes sense, since INTs happen the least often and are probably considered the ‘most random’.
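For the curious, the error measure behind these charts – the residual standard error of a simple year-over-year regression – can be sketched as follows. The (window average, next-season average) pairs below are made up for illustration:

```python
import math

def residual_standard_error(x, y):
    """Fit y = a + b*x by least squares and return the residual standard
    error: the typical size of the prediction miss, as plotted on the y-axis."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    b = num / den
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom, since two parameters were estimated
    return math.sqrt(sum(r * r for r in residuals) / (n - 2))

# Hypothetical (window PPG, next-season PPG) pairs for one window size
x = [14.0, 18.5, 20.0, 16.0, 22.0, 12.5]
y = [15.0, 17.0, 21.5, 14.5, 20.0, 14.0]
print(round(residual_standard_error(x, y), 2))
```

Repeating this fit once per window size and per statistic produces one dot per (window, statistic) pair – exactly the points in the charts above.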

##### Running Backs

The stats we consider here are fantasy points (‘PTS’), rushing yards, receiving yards (‘YPG_REC’), and TDs, all on a per game basis.

- Again, similar to QBs, the lines start to flatten out around a 6-game window. This implies that a 6-game stretch is reasonable to use when projecting future performance; using more games won’t improve predictions *that much*.
- The most important metric here, fantasy PPG, goes from an error of about 4.8 PPG with a one-game window to 4.0 PPG with a ten-game window. Rushing yards go from an error of over 23 YPG to under 19 YPG. These sample-size improvements are not as significant as the improvements we saw with QBs; more on this later.
- TDs are difficult to predict; added sample size doesn’t improve the predictions that much. This is due to both rarity – TDs don’t happen as much – and simple randomness (goal-line personnel packages, etc.).

##### Wide Receivers

The statistics we consider here are fumbles, fantasy points (‘PTS’), TDs, and yards, all on a per game basis.

- Like we saw with QBs and RBs, the line ‘kinks’ – starts to flatten out a bit – at 6 games. This implies, again, that a 6-game window for WRs is a decent size for projecting performance.
- Touchdowns per game are very hard to predict for WRs: a 10-game window barely reduces the error of the prediction compared to a 1-game window. Fumble predictions are nearly impossible to improve (the error barely moves; check out how little the y-axis changes from top to bottom), which makes sense. You can see both of these lines jump around – even increasing at points – due to this randomness.
- YPG predictions are the most improved by sample size: the error falls from 22 to just over 18, which is similar to the improvement for RBs. Fantasy PPG predictions do improve by (relatively) more: the error falls from about 4.0 with a 1-game window to about 3.2 with a 10-game window.

##### In Conclusion

That was a lot of charts and numbers, as well as some confusing statistical topics, so let’s try to boil down the main takeaways:

- Across the positions, **six games is a reasonable sample size to use to project forward**. There’s a reason that this article is coming out after Week 6; we now have a *pretty good idea* of what to expect from fantasy players, and it is more likely that trends persist. Adding more games will certainly help improve projections, *but not by much*. If you are considering the last six weeks of a player’s performance, you can be relatively confident – or as confident as you can be when making projections – in estimating player performance going forward. Remember, try to ignore the *wide ranges* of these projections: we are using a very high level of confidence, so the ranges will be quite large. More important is the *relative error improvement* with higher sample sizes.
- Statistics like TDs, INTs and fumbles are much more random and thus don’t see as much estimate improvement from larger sample sizes compared to yards and fantasy points. This is important when considering a player on a TD streak, like James Conner earlier this year; conventional wisdom tells us that it’s not reasonable to continue projecting a high TD rate, and the data bears this out.
- In terms of fantasy points – likely the most important statistic – QB projections are most improved by larger sample sizes, then WR projections, then RB projections. This is interesting, and it’s not exactly clear why! One possible explanation is *within-game* sample size: QBs get more touches than the other positions, so we observe more data for a QB in one game than for an RB in the same game (usually). Another is the relative longevity and injury rate of QBs and WRs compared to RBs, who ‘age out’ quickly and get hurt more often.

Above all, remember that *context is important*. These numbers are aggregates, and you, the fantasy manager, must decide if trends for a specific player should continue. Plenty of variables like injury, age, personnel, coaching, etc. factor in. Use this to your advantage, but always be sure to understand the specific situation before making a decision!

______

Did I miss something? Want to philosophize about something else? Message me on Twitter.

\*This is using the *residual standard error*.

## Comments

In my opinion, the y-scale for the error plots should start at 0 for every plot. There is a relatively significant trend for some, but for others the scaling grossly misrepresents the results. Take the RB TDs error, for example: at a glance it seems that sample size matters, but we’re really only looking at a range of roughly 0.03 TDs. So the way it’s currently displayed is a bit disingenuous.

If there’s concern over the high range of some statistics looking awkward or masking any actual trend due to the scaling (yards would probably be the worst), perhaps you could set the 1-game window error as a baseline and plot the “% Error Improvement” for the subsequent windows, or something like that. This probably has limited statistical justification but would translate easily for the audience.

Alternatively, you could use a normalized error metric like MASE or NRMSE, but you may have to sidestep an explanation.

Yes, this is certainly a good point; I do try to note in the article when the y-axis is incredibly ‘zoomed in’ and thus the trend line we see is incredibly insignificant, usually with more ‘random’ stats like TDs and INTs, as you mentioned.

I definitely hear you, though: these charts could be a bit clearer. I was considering working with different error metrics but settled on the default in the interest of being as interpretable as possible…but you make a great point, and the % measurement idea is a very good one. Thanks for reading and commenting; I’ll work on including these ideas in future articles!