Drafting Batters in Fantasy Baseball, part 3

Where were we?

If you’re reading about my fantasy baseball experience for the first time, welcome! You may be better oriented by reading this first.

In the last two posts, I wrote about my draft strategy using projected WAR, and explored fantasy talent by defensive position. I discovered that WAR (Wins Above Replacement) was not a great way to select players given my league’s scoring categories. In the absence of a summary statistic to guide my decisions, I looked at all of the scoring categories. I was pretty frazzled by the end of the draft.

A better strategy

In doing research for this post, I found this article on setting up my own rankings based on my league scoring categories, using z-scores. It’s a simple statistic that identifies which players are above the mean, and by how much. This is so simple, I was kicking myself for not using it on draft day. Here’s how it works.

Create Z-Scores

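# z_score() is a helper assumed to be defined earlier; it standardizes a numeric vector
bat_z <- batters %>%
  filter(PA >= 300) %>%   # keep batters projected for regular playing time
  select(playerid, position, Name, Team, R, HR, RBI, SO, SB, OPS, WAR) %>%
  mutate(R_z = z_score(R),
         HR_z = z_score(HR),
         RBI_z = z_score(RBI),
         SO_z = -z_score(SO),   # negated: fewer strikeouts is better
         SB_z = z_score(SB),
         OPS_z = z_score(OPS),
         tot_z = round((R_z + HR_z + RBI_z + SO_z + SB_z + OPS_z), 3))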

I took the same batters dataset and filtered it to include only players projected for 300 or more plate appearances. I wanted to exclude players without sufficient playing time, whose projected runs, home runs, RBIs, strikeouts, or stolen bases may be really low simply because of small samples. These players may also have extreme projected OPS values (really high or really low) for the same reason. I didn’t want to draft players who weren’t projected to play for most of the season, and 300 plate appearances works out to roughly two per game over a full season. This filter cut the number of batters in my dataset roughly in half.
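As a rough check on the playing-time cutoff, the arithmetic can be sketched as follows. The 162-game season length is my assumption here (it is the standard MLB regular season, but the post does not state it), and the variable names are only illustrative.

```r
# Assumption: a full MLB regular season is 162 games per team
games_per_season <- 162
pa_cutoff <- 300          # the threshold used in filter(PA >= 300)

# Plate appearances per game implied by the cutoff
pa_per_game <- pa_cutoff / games_per_season
round(pa_per_game, 2)     # 1.85
```

So "roughly two per game" is a slight round-up; the cutoff is a little under two plate appearances per scheduled game.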

Once I filtered on that criterion, I calculated the mean and standard deviation of each scoring category across the remaining players, and used them to calculate a z-score for each player: the number of standard deviations that player’s projection sits above (positive) or below (negative) the mean. Generally, the higher the z-score, the better. A z-score of 4 for home runs belongs to someone I’d love to have on my team. I calculated z-scores for all of the scoring categories, multiplied the strikeout z-scores by -1 so that higher is better in every category (we want fewer strikeouts), and then summed them to get an overall z-score (I named this variable “tot_z”). This overall z-score combines all the scoring categories and tells me how a player compares to the rest of the pool.

(This is similar to the calculation I used to compare the scoring categories with WAR in the first post. Same calculation, different intention.)
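The code above calls z_score() without showing its definition. A minimal sketch of what such a helper might look like, assuming it is the usual "subtract the mean, divide by the standard deviation" calculation described in the text (the actual helper used for this post may differ, and the toy numbers are made up):

```r
# Hypothetical helper: standardize a numeric vector.
# Assumes the sample standard deviation (R's sd()) and ignores missing values.
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Toy home run projections (illustrative only, not from the batters dataset)
hr <- c(10, 20, 30)
z_score(hr)    # -1 0 1

# Strikeouts are flipped so that a higher score is always better
so <- c(70, 100, 130)
-z_score(so)   # 1 0 -1
```

Because every category ends up on the same unitless scale, the six z-scores can simply be added to form tot_z.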

So now let’s look at our top players.

Who’s on top?

bat_z %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()

| position | Name | R | HR | RBI | SO | SB | OPS | WAR | tot_z |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|
| outfield | Mike Trout | 114 | 39 | 105 | 131 | 22 | 1.027 | 8.2 | 11.889 |
| outfield | Giancarlo Stanton | 109 | 58 | 140 | 171 | 2 | 1.029 | 6.4 | 11.869 |
| third_base | Nolan Arenado | 97 | 39 | 118 | 101 | 3 | 0.937 | 5.0 | 8.766 |
| outfield | Bryce Harper | 100 | 35 | 102 | 122 | 10 | 0.984 | 5.6 | 8.646 |
| first_base | Anthony Rizzo | 97 | 34 | 107 | 98 | 9 | 0.927 | 4.7 | 8.343 |
| outfield | Mookie Betts | 100 | 24 | 90 | 73 | 23 | 0.871 | 5.6 | 8.184 |
| first_base | Paul Goldschmidt | 101 | 31 | 103 | 147 | 17 | 0.927 | 4.3 | 7.618 |
| second_base | Jose Altuve | 94 | 20 | 82 | 73 | 28 | 0.859 | 4.8 | 7.434 |
| short | Carlos Correa | 96 | 30 | 113 | 121 | 8 | 0.894 | 6.1 | 6.873 |
| first_base | Cody Bellinger | 91 | 39 | 110 | 159 | 13 | 0.882 | 3.6 | 6.782 |
| outfield | Cody Bellinger | 91 | 39 | 110 | 159 | 13 | 0.882 | 3.6 | 6.782 |

Trout’s on top, not surprisingly.

There is considerable overlap between this list of names and the players with high projected WAR, but this list also accounts for projected stolen bases and projected strikeouts. This is much easier to keep track of in the moment.
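The top-10 table above actually has eleven rows: Cody Bellinger is listed at both first base and outfield, and top_n() keeps ties, so both of his rows make the cut. If one row per player is preferred, the positions could be collapsed before ranking. This is only a sketch on a toy data frame standing in for bat_z (the player names and numbers are made up), not something done in the original analysis:

```r
library(dplyr)

# Toy stand-in for bat_z: "Player A" is eligible at two positions
toy <- data.frame(
  position = c("first_base", "outfield", "outfield", "catcher"),
  Name     = c("Player A", "Player A", "Player B", "Player C"),
  tot_z    = c(6.8, 6.8, 9.1, 1.2),
  stringsAsFactors = FALSE
)

# Collapse each player's positions into one row, then take the top 2 by tot_z
top_unique <- toy %>%
  group_by(Name, tot_z) %>%
  summarise(position = paste(position, collapse = "/"), .groups = "drop") %>%
  arrange(desc(tot_z)) %>%
  slice_head(n = 2)

top_unique
```

Here Player A appears once, with position "first_base/outfield", instead of taking up two slots in the ranking.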

Forgive the digression, but I’m still beating myself up about Buster Posey.

Going back to the question I explored in the last post about positional talent, would I have had a different pool of talent to choose from if I’d looked at z-scores instead of WAR?

bat_z %>% 
  filter(position == 'catcher') %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()

| position | Name | R | HR | RBI | SO | SB | OPS | WAR | tot_z |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|
| catcher | Gary Sanchez | 72 | 31 | 90 | 115 | 3 | 0.842 | 3.5 | 3.259 |
| catcher | Evan Gattis | 71 | 30 | 94 | 118 | 1 | 0.790 | 1.6 | 2.198 |
| catcher | Buster Posey | 64 | 14 | 69 | 62 | 4 | 0.821 | 4.5 | 1.206 |
| catcher | Willson Contreras | 66 | 20 | 77 | 116 | 6 | 0.800 | 3.0 | 0.773 |
| catcher | Salvador Perez | 59 | 23 | 74 | 99 | 1 | 0.752 | 2.8 | -0.272 |
| catcher | Brian McCann | 54 | 20 | 66 | 81 | 1 | 0.752 | 2.3 | -0.773 |
| catcher | Jonathan Lucroy | 53 | 11 | 56 | 63 | 2 | 0.794 | 2.9 | -1.074 |
| catcher | Yadier Molina | 54 | 12 | 67 | 73 | 6 | 0.724 | 2.5 | -1.179 |
| catcher | Wilson Ramos | 49 | 20 | 67 | 83 | 1 | 0.739 | 2.1 | -1.261 |
| catcher | J.T. Realmuto | 57 | 13 | 55 | 91 | 8 | 0.742 | 2.5 | -1.495 |

In addition to Gary Sanchez, who I identified earlier, Gattis would also have been a good pick based on his projected home runs and runs batted in. His WAR is quite low, which is why he didn’t end up on my radar before. Let’s look at shortstops too, for completeness.

bat_z %>% 
  filter(position == 'short') %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()
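
| position | Name | R | HR | RBI | SO | SB | OPS | WAR | tot_z |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|
| short | Carlos Correa | 96 | 30 | 113 | 121 | 8 | 0.894 | 6.1 | 6.873 |
| short | Trea Turner | 89 | 16 | 66 | 119 | 49 | 0.793 | 3.7 | 6.200 |
| short | Francisco Lindor | 92 | 24 | 83 | 84 | 15 | 0.842 | 5.8 | 5.656 |
| short | Corey Seager | 89 | 24 | 87 | 120 | 4 | 0.853 | 5.2 | 3.458 |
| short | Elvis Andrus | 80 | 12 | 69 | 88 | 23 | 0.745 | 2.1 | 2.386 |
| short | Xander Bogaerts | 86 | 15 | 75 | 110 | 11 | 0.789 | 3.4 | 1.904 |
| short | Trevor Story | 82 | 30 | 93 | 203 | 11 | 0.791 | 1.9 | 1.586 |
| short | Ian Desmond | 69 | 20 | 74 | 128 | 16 | 0.781 | 0.5 | 1.402 |
| short | Didi Gregorius | 72 | 21 | 80 | 83 | 5 | 0.743 | 2.6 | 1.375 |
| short | Jean Segura | 77 | 13 | 56 | 93 | 23 | 0.720 | 2.0 | 1.175 |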

Simmons isn’t even on the list! Ouch. Given who was available by the fourth round, I maintain that Bogaerts (identified based on projected WAR in the last post) might have been a fine pick, but it looks like Andrus might also have been helpful for his projected stolen bases and low projected strikeouts.

Let’s look at my pick for second base, since prior analysis determined that it was also a position with scarce offensive talent.

bat_z %>% 
  filter(position == 'second_base') %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>%
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()

| position | Name | R | HR | RBI | SO | SB | OPS | WAR | tot_z |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|
| second_base | Jose Altuve | 94 | 20 | 82 | 73 | 28 | 0.859 | 4.8 | 7.434 |
| second_base | Jose Ramirez | 92 | 21 | 84 | 67 | 20 | 0.849 | 4.8 | 6.574 |
| second_base | Brian Dozier | 96 | 30 | 84 | 132 | 14 | 0.825 | 3.7 | 4.856 |
| second_base | Daniel Murphy | 80 | 19 | 87 | 70 | 4 | 0.859 | 2.7 | 3.903 |
| second_base | Rougned Odor | 85 | 31 | 92 | 139 | 14 | 0.776 | 1.7 | 3.809 |
| second_base | Jonathan Schoop | 82 | 31 | 98 | 137 | 2 | 0.793 | 3.0 | 2.766 |
| second_base | Robinson Cano | 78 | 23 | 88 | 91 | 2 | 0.795 | 2.9 | 2.501 |
| second_base | Ian Happ | 75 | 27 | 82 | 157 | 11 | 0.798 | 2.1 | 1.726 |
| second_base | Dee Gordon | 78 | 4 | 39 | 91 | 46 | 0.674 | 1.9 | 1.598 |
| second_base | Whit Merrifield | 74 | 12 | 61 | 97 | 25 | 0.732 | 2.1 | 1.441 |

Given that no one who ranked above him was available in my league by the eighth round, Odor seems to be a reasonable pick.
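The catcher, shortstop, and second base lookups above repeat the same filter, rank, and select steps with only the position changed. One way this could be tidied up is a small wrapper function. The name top_at_position() is hypothetical, and this sketch uses slice_max() (available in dplyr 1.0 and later) rather than the top_n() and arrange() pair from the post; the toy data frame is made up.

```r
library(dplyr)

# Hypothetical helper wrapping the repeated filter / rank / select steps.
# Like top_n(), slice_max() keeps ties by default, and it returns rows
# already sorted from highest to lowest tot_z.
top_at_position <- function(df, pos, n = 10) {
  df %>%
    filter(position == pos) %>%
    slice_max(tot_z, n = n) %>%
    select(position, Name, tot_z)
}

# Toy stand-in for bat_z (illustrative players and values)
toy <- data.frame(
  position = c("catcher", "catcher", "short", "catcher"),
  Name     = c("Player A", "Player B", "Player C", "Player D"),
  tot_z    = c(1.2, 3.3, 6.9, -0.3),
  stringsAsFactors = FALSE
)

top_catchers <- top_at_position(toy, "catcher", n = 2)
top_catchers
```

With the real data, a call along the lines of top_at_position(bat_z, "short") would stand in for each of the repeated pipelines, with more columns added to select() as needed.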

Now that I know using z-scores would have changed my picks for catcher and shortstop (but not second base), I’m going to look at z-scores for the rest of my draft picks.

My draft picks

Below, I filtered the full dataset to only include players I drafted. For reference, my team name is “Dropped Third Strike”, after the obscure baseball rule (shortened here to DTS for object-naming). I added in information on draft order as well.

# Build the lookup directly as a data frame so Name stays character and draft_order stays numeric
DTS_bat <- data.frame(Name = c("Mookie Betts", "Buster Posey", "Andrelton Simmons", "Edwin Encarnacion", "Rougned Odor", "Mike Moustakas", "Adam Jones", "Manuel Margot", "Brandon Crawford", "Max Kepler", "Brandon Belt", "Stephen Piscotty", "Maikel Franco", "Jose Peraza"),
                      draft_order = c(1, 2, 4, 5, 8, 11, 12, 13, 18, 20, 21, 23, 24, 25),
                      stringsAsFactors = FALSE)

drafted <- inner_join(DTS_bat, bat_z, by = "Name") 
drafted %>%
  select(Name, draft_order, position, tot_z) %>%
  knitr::kable()

| Name | draft_order | position | tot_z |
|:--|--:|:--|--:|
| Mookie Betts | 1 | outfield | 8.184 |
| Buster Posey | 2 | catcher | 1.206 |
| Andrelton Simmons | 4 | short | 0.550 |
| Edwin Encarnacion | 5 | first_base | 5.724 |
| Rougned Odor | 8 | second_base | 3.809 |
| Mike Moustakas | 11 | third_base | 2.995 |
| Adam Jones | 12 | outfield | 2.180 |
| Manuel Margot | 13 | outfield | -0.563 |
| Brandon Crawford | 18 | short | -0.540 |
| Max Kepler | 20 | outfield | 0.394 |
| Brandon Belt | 21 | first_base | 1.091 |
| Stephen Piscotty | 23 | outfield | -0.617 |
| Maikel Franco | 24 | third_base | 1.819 |
| Jose Peraza | 25 | second_base | -0.552 |
| Jose Peraza | 25 | short | -0.552 |
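Two properties of inner_join() are worth keeping in mind with a table like this one. A drafted player matches every row that shares his name, which is why Jose Peraza shows up twice (he is listed at both second base and shortstop). A drafted player with no match, because of a misspelled name or because he fell below the 300 PA cutoff, is silently dropped. A sketch of how both cases could be checked, using small made-up data frames rather than the real DTS_bat and bat_z:

```r
library(dplyr)

# Toy roster: the third name has no match in the player pool
roster <- data.frame(
  Name        = c("Mookie Betts", "Jose Peraza", "Misspelled Name"),
  draft_order = c(1, 25, 99),
  stringsAsFactors = FALSE
)

# Toy player pool: one player is listed at two positions
pool <- data.frame(
  Name     = c("Mookie Betts", "Jose Peraza", "Jose Peraza"),
  position = c("outfield", "second_base", "short"),
  tot_z    = c(8.184, -0.552, -0.552),
  stringsAsFactors = FALSE
)

# Rows kept by the join: one per matching (player, position) pair
matched <- inner_join(roster, pool, by = "Name")

# Roster entries the join would drop without any warning
unmatched <- anti_join(roster, pool, by = "Name")

nrow(matched)    # 3
unmatched$Name   # "Misspelled Name"
```

Running the anti_join() check before building the table is a cheap way to confirm that every draft pick actually made it through the join.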

I ended up drafting four batters with negative total z-scores (meaning they are projected to perform below average across the scoring categories): Margot, Crawford, Piscotty, and Peraza. And fandom bias strikes again, because I drafted three Giants and no more than one player from any other team. I already mentioned that Buster Posey was not the best pick at catcher, but I also picked up Crawford, who had mediocre-to-bad projected z-scores in many scoring categories. This was a bad pick, and I ended up dropping him early in the season.
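The below-average picks can also be pulled out programmatically rather than by eye. The sketch below uses a handful of rows copied from the drafted table above (not the full roster), and the object names picks and below_avg are only illustrative:

```r
library(dplyr)

# A few rows copied from the drafted table above
picks <- data.frame(
  Name        = c("Mookie Betts", "Buster Posey", "Manuel Margot",
                  "Brandon Crawford", "Stephen Piscotty", "Jose Peraza"),
  draft_order = c(1, 2, 13, 18, 23, 25),
  tot_z       = c(8.184, 1.206, -0.563, -0.540, -0.617, -0.552),
  stringsAsFactors = FALSE
)

# Keep only players projected below the pool average, worst first
below_avg <- picks %>%
  filter(tot_z < 0) %>%
  arrange(tot_z)

below_avg$Name
```

On the full drafted data frame, the same filter would need a distinct(Name, .keep_all = TRUE) step first so that Peraza, who appears at two positions, is only counted once.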

I’m a little surprised by Franco, who has a rather high z-score for being available until the 24th round, and Belt for the same reason (21st round).

Up Next

In the next post, I’ll wrap this up and look at how my players actually did, comparing the projection to the final 2018 data. Which players were truly bad picks? Who outperformed their projection? Stay tuned!
