Drafting Batters in Fantasy Baseball, Part 4

Where were we?

If you’re reading about my fantasy baseball experience for the first time, welcome! You may be better oriented by reading this first.

In the last three posts, I wrote about my draft strategy using projected WAR, and explored fantasy talent by defensive position. I discovered that WAR (Wins Above Replacement) was not a great way to select players given my league’s scoring categories. I calculated z-scores for my scoring categories instead, and looked at how that would have changed my draft picks. I did some deep reflection on the danger of fandom bias.

Projections vs. 2018 season statistics

Up to this point, we’ve been looking at projections for 2018. Now that the season is over, we can see how those projections played out. I took the final 2018 stats from Fangraphs and compared them to the projected stats. For clarity, I renamed all final stats to include the f_ prefix, kept only batters with at least 300 plate appearances, calculated a z-score for each stat, and summed those into a total. I then merged this dataset with the full batters dataset to facilitate comparison, and also with the subset of players I drafted, to look specifically at my team, Dropped Third Strike.

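# Final 2018 batting stats from Fangraphs; the f_ prefix marks end-of-season values.
# z_score() is not defined in this post; it is assumed to carry over from earlier in the series.
end_bat_z <- read.csv("../data/post1/batters_final.csv") %>%
  rename(f_R = R,
         f_HR = HR,
         f_RBI = RBI,
         f_SO = SO,
         f_SB = SB,
         f_OPS = OPS,
         f_WAR = WAR) %>%
  filter(PA >= 300) %>%  # keep batters with at least 300 plate appearances
  mutate(f_R_z = z_score(f_R),
         f_HR_z = z_score(f_HR),
         f_RBI_z = z_score(f_RBI),
         f_SO_z = -z_score(f_SO),  # sign flipped: fewer strikeouts is better
         f_SB_z = z_score(f_SB),
         f_OPS_z = z_score(f_OPS),
         f_tot_z = round((f_R_z + f_HR_z + f_RBI_z + f_SO_z + f_SB_z + f_OPS_z), 3),
         playerid = as.character(playerid)) %>%
  select(-Team)

# Join final stats to the projections for all batters, and to my drafted players
all_final <- inner_join(end_bat_z, bat_z, by = c("playerid", "Name"))
drafted_final <- inner_join(end_bat_z, drafted, by = c("Name"))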
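
Note that z_score() isn’t defined anywhere in this post. For anyone following along without the earlier code, here is a minimal sketch of such a helper, assuming the usual definition (subtract the mean, divide by the sample standard deviation). The na.rm handling and the toy numbers are illustrative assumptions, not the original implementation.

```r
# Assumed definition: (value - mean) / sample standard deviation.
# na.rm = TRUE is an illustrative choice, not confirmed by the post.
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Toy example: home run totals for five made-up players
hr <- c(10, 20, 30, 40, 50)
hr_z <- z_score(hr)
print(round(hr_z, 3))

# Strikeouts count against a batter, so the sign is flipped, as in f_SO_z above
so <- c(60, 90, 120, 150, 180)
so_z <- -z_score(so)
print(round(so_z, 3))
```

With the sign flipped, the player with the fewest strikeouts gets the highest strikeout z-score, so all six categories can be summed in the same direction.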

I’m going to start by looking at who I drafted. How did the final z-scores differ from the projections?

drafted_final <- drafted_final %>% 
  mutate(diff = f_tot_z - tot_z,
         change = case_when(
           diff < -1 ~ "underperform",
           diff > 1 ~ "outperform",
           TRUE ~ "as expected"
         ))
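
To make the cutoff of 1 concrete, here is a small self-contained sketch that applies the same case_when() rule to three players, using tot_z and f_tot_z values copied from the tables in this post. The data frame names are just for illustration.

```r
library(dplyr)

# Projected (tot_z) and final (f_tot_z) totals copied from the tables in this post
example <- data.frame(
  Name    = c("Mookie Betts", "Buster Posey", "Mike Moustakas"),
  tot_z   = c(8.184, 1.206, 2.995),
  f_tot_z = c(12.597, -1.957, 2.926)
)

# Same rule as above: a change of more than 1 in either direction is flagged
classified <- example %>%
  mutate(diff = f_tot_z - tot_z,
         change = case_when(
           diff < -1 ~ "underperform",
           diff > 1  ~ "outperform",
           TRUE      ~ "as expected"
         ))

print(classified)
```

Betts gains about 4.4 points of total z-score, Posey loses about 3.2, and Moustakas moves by less than 0.1, so they land in three different buckets.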

Let’s start by looking at those who outperformed their projections. Warning for those on mobile: these tables are wide, and you may not see all the relevant columns.

drafted_final %>% 
  filter(change == "outperform") %>% 
  select(Name, draft_order, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()

| Name | draft_order | R | f_R | HR | f_HR | RBI | f_RBI | SO | f_SO | SB | f_SB | OPS | f_OPS | tot_z | f_tot_z |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Mookie Betts | 1 | 100 | 129 | 24 | 32 | 90 | 80 | 73 | 91 | 23 | 30 | 0.871 | 1.078 | 8.184 | 12.597 |
| Stephen Piscotty | 23 | 69 | 78 | 17 | 27 | 70 | 88 | 125 | 114 | 6 | 2 | 0.757 | 0.821 | -0.617 | 3.025 |
| Andrelton Simmons | 4 | 70 | 68 | 11 | 11 | 67 | 75 | 64 | 44 | 13 | 10 | 0.710 | 0.754 | 0.550 | 2.506 |
| Jose Peraza | 25 | 57 | 85 | 7 | 14 | 49 | 58 | 74 | 75 | 26 | 23 | 0.688 | 0.742 | -0.552 | 3.382 |

Betts blew his projections out of the water, hitting more home runs and increasing his OPS by quite a bit. Simmons showed much better plate discipline, striking out much less often, but his other categories didn’t dramatically improve. Piscotty did dramatically better than his projections in several different categories. Even Peraza, who had a negative z-score in his projections, ended up finishing the season on a high note, scoring more runs and hitting twice as many home runs as projected.

Now let’s look at the players who underperformed their projections.

drafted_final %>% 
  filter(change == "underperform") %>% 
  select(Name, draft_order, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()
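
| Name | draft_order | R | f_R | HR | f_HR | RBI | f_RBI | SO | f_SO | SB | f_SB | OPS | f_OPS | tot_z | f_tot_z |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Edwin Encarnacion | 5 | 92 | 74 | 36 | 32 | 109 | 107 | 131 | 132 | 2 | 3 | 0.869 | 0.810 | 5.724 | 3.710 |
| Brandon Belt | 21 | 78 | 50 | 21 | 14 | 77 | 46 | 145 | 107 | 5 | 4 | 0.832 | 0.756 | 1.091 | -1.919 |
| Buster Posey | 2 | 64 | 47 | 14 | 5 | 69 | 41 | 62 | 53 | 4 | 3 | 0.821 | 0.741 | 1.206 | -1.957 |
| Rougned Odor | 8 | 85 | 76 | 31 | 18 | 92 | 63 | 139 | 127 | 14 | 12 | 0.776 | 0.751 | 3.809 | 0.841 |
| Adam Jones | 12 | 79 | 54 | 28 | 15 | 85 | 63 | 112 | 93 | 3 | 7 | 0.774 | 0.732 | 2.180 | -0.336 |
| Brandon Crawford | 18 | 67 | 63 | 17 | 14 | 82 | 54 | 127 | 122 | 5 | 4 | 0.742 | 0.719 | -0.540 | -1.804 |
| Manuel Margot | 13 | 64 | 50 | 12 | 8 | 50 | 51 | 96 | 88 | 19 | 11 | 0.720 | 0.675 | -0.563 | -1.848 |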

Posey, who, as we’ve already determined, was drafted too early and was a poor choice based on his projection, added salt to the wound by underperforming. I knew this even without looking at the stats, given his abysmal offensive season, but this confirms it.

Encarnacion scored fewer runs than his projections, but otherwise still provided good offensive numbers. Odor showed better plate discipline (fewer strikeouts), but his offensive output decreased dramatically. Jones also had a poor offensive year. Margot was a bad draft pick, who started out with poor projections and got even worse, as did Crawford. Belt improved his plate discipline, but his offensive numbers also tanked.

Now let’s look at those who performed as expected, whose z-scores changed by 1 or less.

drafted_final %>% 
  filter(change == "as expected") %>% 
  select(Name, draft_order, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()
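
| Name | draft_order | R | f_R | HR | f_HR | RBI | f_RBI | SO | f_SO | SB | f_SB | OPS | f_OPS | tot_z | f_tot_z |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Mike Moustakas | 11 | 73 | 66 | 30 | 28 | 85 | 95 | 90 | 103 | 1 | 4 | 0.812 | 0.774 | 2.995 | 2.926 |
| Maikel Franco | 24 | 68 | 48 | 25 | 22 | 86 | 68 | 91 | 62 | 1 | 1 | 0.788 | 0.780 | 1.819 | 1.113 |
| Max Kepler | 20 | 67 | 80 | 19 | 20 | 73 | 58 | 109 | 96 | 7 | 4 | 0.769 | 0.727 | 0.394 | 0.706 |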

Moustakas improved his RBIs and got a little better at base stealing, but was otherwise pretty close to his projections. Kepler scored a few more runs, and was a bad pick to start with, but was at least consistent. Franco struck out far less often than projected, which offset his drop in runs and RBIs and kept his total z-score within the “as expected” range.

What-if land

Not content to sit with the bad decisions I made, I engaged in some ill-advised counterfactual exploration, and looked to see what might have happened had I drafted the players with higher z-scores, rather than drafting based on WAR.

First Base

I looked at which players had higher projected z-scores than Encarnacion, and whether they were available at the time I drafted Encarnacion in the fifth round.

all_final %>% 
  filter(position == 'first_base') %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()
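
| Name | R | f_R | HR | f_HR | RBI | f_RBI | SO | f_SO | SB | f_SB | OPS | f_OPS | tot_z | f_tot_z |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Anthony Rizzo | 97 | 74 | 34 | 25 | 107 | 101 | 98 | 80 | 9 | 6 | 0.927 | 0.846 | 8.343 | 4.988 |
| Paul Goldschmidt | 101 | 95 | 31 | 33 | 103 | 83 | 147 | 173 | 17 | 7 | 0.927 | 0.922 | 7.618 | 4.246 |
| Cody Bellinger | 91 | 84 | 39 | 25 | 110 | 76 | 159 | 151 | 13 | 14 | 0.882 | 0.814 | 6.782 | 2.816 |
| Joey Votto | 95 | 67 | 28 | 12 | 92 | 67 | 105 | 101 | 5 | 2 | 0.952 | 0.837 | 6.493 | 0.471 |
| Rhys Hoskins | 92 | 89 | 36 | 34 | 111 | 96 | 140 | 150 | 5 | 5 | 0.877 | 0.850 | 6.048 | 4.295 |
| Freddie Freeman | 92 | 94 | 31 | 23 | 93 | 98 | 134 | 132 | 8 | 10 | 0.935 | 0.892 | 5.989 | 5.032 |
| Edwin Encarnacion | 92 | 74 | 36 | 32 | 109 | 107 | 131 | 132 | 2 | 3 | 0.869 | 0.810 | 5.724 | 3.710 |
| Carlos Santana | 78 | 82 | 27 | 24 | 80 | 86 | 95 | 93 | 5 | 2 | 0.859 | 0.766 | 3.736 | 2.800 |
| Jose Abreu | 77 | 68 | 29 | 22 | 95 | 78 | 115 | 109 | 2 | 2 | 0.860 | 0.798 | 3.713 | 1.438 |
| Joey Gallo | 92 | 82 | 42 | 40 | 100 | 92 | 236 | 207 | 7 | 3 | 0.839 | 0.810 | 3.108 | 2.049 |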

This makes me feel a little better – the first basemen with higher projected total z-scores were drafted before my turn in the fifth round. Most of them underperformed their projections but still did well. The exception is Joey Votto, whose runs scored, home runs, and runs batted in came in much lower than projected, leading to his abysmal final z-score this year. The first basemen who were projected to perform worse than Encarnacion also underperformed their projections. All things considered, Edwin wasn’t a bad draft pick.
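
Each of the position queries in this section follows the same pattern: filter to a position, optionally cap the projected z-score to skip players who were already gone, take the top few, and sort. A hypothetical helper could wrap that pattern. To be clear, top_at_position() is an invented name and not part of the original analysis; the sketch uses slice_max() rather than top_n(), and a tiny hand-copied data frame stands in for all_final.

```r
library(dplyr)

# Hypothetical helper: top-n players at a position by projected z-score.
# max_z caps the projected score, to skip players who were already drafted.
top_at_position <- function(data, pos_name, n = 10, max_z = Inf) {
  data %>%
    filter(position == pos_name, tot_z <= max_z) %>%
    slice_max(tot_z, n = n) %>%   # keeps ties by default, like top_n()
    arrange(desc(tot_z)) %>%
    select(Name, tot_z, f_tot_z)
}

# A few rows copied from the tables in this post (stand-in for all_final)
toy <- data.frame(
  Name     = c("Anthony Rizzo", "Edwin Encarnacion", "Adrian Beltre",
               "Joey Gallo", "Mike Moustakas", "Travis Shaw"),
  position = c("first_base", "first_base", "third_base",
               "third_base", "third_base", "third_base"),
  tot_z    = c(8.343, 5.724, 3.160, 3.108, 2.995, 2.007),
  f_tot_z  = c(4.988, 3.710, -0.949, 2.049, 2.926, 3.833)
)

# Top two third basemen with a projected total z-score of at most 3.2
third <- top_at_position(toy, "third_base", n = 2, max_z = 3.2)
print(third)
```

One simplification in the toy data: Gallo is listed only at third base, whereas in the real data he shows up in both the first base and third base tables.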

Outfield

I was really happy that I drafted Betts in the first round, given that he had high z-scores to begin with and then subsequently outperformed his projection. I looked at the next outfielder I drafted, who was Jones in the 12th round. Since all outfielders (except Brantley) with higher z-scores were drafted prior to my pick in the 12th round, I’ll exclude them from the table for simplicity’s sake.

all_final %>% 
  filter(position == 'outfield' & tot_z <= 2.550) %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()
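
| Name | R | f_R | HR | f_HR | RBI | f_RBI | SO | f_SO | SB | f_SB | OPS | f_OPS | tot_z | f_tot_z |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Michael Brantley | 69 | 89 | 14 | 17 | 74 | 76 | 66 | 60 | 12 | 12 | 0.817 | 0.832 | 2.550 | 4.847 |
| Adam Jones | 79 | 54 | 28 | 15 | 85 | 63 | 112 | 93 | 3 | 7 | 0.774 | 0.732 | 2.180 | -0.336 |
| Nomar Mazara | 75 | 61 | 25 | 20 | 93 | 77 | 120 | 116 | 3 | 1 | 0.801 | 0.753 | 2.173 | 0.005 |
| Lorenzo Cain | 79 | 90 | 15 | 10 | 63 | 38 | 103 | 94 | 20 | 30 | 0.773 | 0.813 | 1.976 | 3.351 |
| Eddie Rosario | 79 | 87 | 23 | 24 | 83 | 77 | 126 | 104 | 10 | 8 | 0.772 | 0.803 | 1.952 | 3.443 |
| Ian Happ | 75 | 56 | 27 | 15 | 82 | 44 | 157 | 167 | 11 | 8 | 0.798 | 0.761 | 1.726 | -2.863 |
| Josh Reddick | 71 | 63 | 18 | 17 | 75 | 47 | 80 | 77 | 7 | 7 | 0.786 | 0.718 | 1.694 | -0.100 |
| Gregory Polanco | 72 | 75 | 18 | 23 | 71 | 81 | 103 | 117 | 15 | 12 | 0.770 | 0.839 | 1.641 | 3.451 |
| Jay Bruce | 70 | 31 | 27 | 9 | 88 | 37 | 129 | 75 | 3 | 2 | 0.788 | 0.680 | 1.410 | -3.922 |
| Michael Conforto | 69 | 78 | 25 | 28 | 71 | 82 | 117 | 159 | 2 | 3 | 0.849 | 0.797 | 1.402 | 1.372 |
| Ian Desmond | 69 | 82 | 20 | 22 | 74 | 88 | 128 | 146 | 16 | 20 | 0.781 | 0.729 | 1.402 | 2.858 |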

Looking at both Brantley and Jones, Brantley was projected to do a bit better than Jones, largely due to his low strikeout and high stolen base projections. Jones was projected to handily beat Brantley in home runs and runs batted in. I probably selected Jones to boost my home run numbers. But Jones had a pretty bad season, and Brantley outperformed his projections, so now I feel the pangs of regret. Both Mazara and Cain had already been drafted by the time I picked in the 12th round. Rosario had pretty similar projections to Jones, just with more strikeouts, and he, like Brantley, outperformed his projections.

The projections for these players aren’t that different from each other, so I’m kicking myself for not predicting the future, which is not constructive. I think the main message here is that I shouldn’t have waited this long to pick my other two outfielders. My third outfielder, Kepler, didn’t even break the top 30. Given what I know now about outfielders contributing to runs, home runs, and RBIs, this is a huge shortcoming in my strategy.
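
One detail in the outfield table above: it has 11 rows even though the code asks for the top 10. The most likely explanation is that top_n() keeps ties, and Conforto and Desmond share a projected total of 1.402. A minimal sketch of that behavior, using the last four rows of the table (the data frame name is just for illustration):

```r
library(dplyr)

# Projected totals for the bottom of the outfield table above
of <- data.frame(
  Name  = c("Gregory Polanco", "Jay Bruce", "Michael Conforto", "Ian Desmond"),
  tot_z = c(1.641, 1.410, 1.402, 1.402)
)

# Asking for the top 3 returns 4 rows, because the last two are tied
with_ties <- of %>% top_n(3, tot_z)
print(nrow(with_ties))

# slice_max() with with_ties = FALSE returns exactly 3 rows
exactly_three <- of %>% slice_max(tot_z, n = 3, with_ties = FALSE)
print(nrow(exactly_three))
```

Keeping ties seems like the right call for this kind of comparison, since dropping one of two equally projected players would be arbitrary.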

Third Base

I drafted Moustakas late, in round 11. In the table, I filtered out players with higher z-scores who had been selected in earlier rounds of the draft, and took the top five since there are fewer third basemen.

all_final %>% 
  filter(position == 'third_base' & tot_z < 3.2) %>% 
  top_n(., 5, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()
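
| Name | R | f_R | HR | f_HR | RBI | f_RBI | SO | f_SO | SB | f_SB | OPS | f_OPS | tot_z | f_tot_z |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Adrian Beltre | 75 | 49 | 23 | 15 | 92 | 65 | 79 | 96 | 2 | 1 | 0.815 | 0.763 | 3.160 | -0.949 |
| Joey Gallo | 92 | 82 | 42 | 40 | 100 | 92 | 236 | 207 | 7 | 3 | 0.839 | 0.810 | 3.108 | 2.049 |
| Mike Moustakas | 73 | 66 | 30 | 28 | 85 | 95 | 90 | 103 | 1 | 4 | 0.812 | 0.774 | 2.995 | 2.926 |
| Travis Shaw | 77 | 73 | 28 | 32 | 89 | 86 | 144 | 108 | 7 | 5 | 0.786 | 0.825 | 2.007 | 3.833 |
| Maikel Franco | 68 | 48 | 25 | 22 | 86 | 68 | 91 | 62 | 1 | 1 | 0.788 | 0.780 | 1.819 | 1.113 |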

When I drafted Moustakas in the 11th round, Beltre and Gallo were still available. Given how much Beltre underperformed his projection, it looks like I dodged a bullet. Gallo would have been a better option for runs and home runs, but his strikeouts are really quite high (even after a bit of discipline this year, he still had twice as many strikeouts as Moustakas).

Shaw would have been another reasonable option – his z-score was quite low due to his high projected number of strikeouts, but his projected runs, home runs, and RBIs were comparable to Moustakas’s, and he was also projected to steal more bases.

Ultimately, Moustakas did just fine relative to his projection, and I could have done well with either Gallo or Shaw.

I also ended up drafting Franco in a later round, which seems reasonable given his projected z-score. He fell a bit short of his projection, however, and didn’t contribute much to my offense.

Second Base

All the players with higher projected total z-scores than Odor were already gone by the time I picked up Odor in the 8th round.

all_final %>% 
  filter(position == 'second_base' & tot_z < 3.9) %>% 
  top_n(., 5, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()
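
| Name | R | f_R | HR | f_HR | RBI | f_RBI | SO | f_SO | SB | f_SB | OPS | f_OPS | tot_z | f_tot_z |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Rougned Odor | 85 | 76 | 31 | 18 | 92 | 63 | 139 | 127 | 14 | 12 | 0.776 | 0.751 | 3.809 | 0.841 |
| Jonathan Schoop | 82 | 61 | 31 | 21 | 98 | 61 | 137 | 115 | 2 | 1 | 0.793 | 0.682 | 2.766 | -1.384 |
| Robinson Cano | 78 | 44 | 23 | 10 | 88 | 50 | 91 | 47 | 2 | 0 | 0.795 | 0.845 | 2.501 | -0.154 |
| Ian Happ | 75 | 56 | 27 | 15 | 82 | 44 | 157 | 167 | 11 | 8 | 0.798 | 0.761 | 1.726 | -2.863 |
| Dee Gordon | 78 | 62 | 4 | 4 | 39 | 36 | 91 | 80 | 46 | 30 | 0.674 | 0.637 | 1.598 | -0.288 |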

It’s worth noting here that none of the other second basemen would have been substantially better than Odor, based on the projections. Dee Gordon was projected to steal a lot more bases, but he was also projected to hit considerably fewer home runs, and bat in fewer runs. Additionally, none of these candidates outperformed their projections. Given that, I think Odor was the right choice here.

Catcher

Posey was the first catcher to be drafted, so every catcher was available to me at the time.

all_final %>% 
  filter(position == 'catcher') %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()
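
| Name | R | f_R | HR | f_HR | RBI | f_RBI | SO | f_SO | SB | f_SB | OPS | f_OPS | tot_z | f_tot_z |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Gary Sanchez | 72 | 51 | 31 | 18 | 90 | 53 | 115 | 94 | 3 | 1 | 0.842 | 0.697 | 3.259 | -1.756 |
| Evan Gattis | 71 | 49 | 30 | 25 | 94 | 78 | 118 | 101 | 1 | 1 | 0.790 | 0.736 | 2.198 | 0.287 |
| Buster Posey | 64 | 47 | 14 | 5 | 69 | 41 | 62 | 53 | 4 | 3 | 0.821 | 0.741 | 1.206 | -1.957 |
| Willson Contreras | 66 | 50 | 20 | 10 | 77 | 54 | 116 | 121 | 6 | 4 | 0.800 | 0.730 | 0.773 | -2.704 |
| Salvador Perez | 59 | 52 | 23 | 27 | 74 | 80 | 99 | 108 | 1 | 1 | 0.752 | 0.713 | -0.272 | 0.271 |
| Jonathan Lucroy | 53 | 41 | 11 | 4 | 56 | 51 | 63 | 65 | 2 | 0 | 0.794 | 0.617 | -1.074 | -4.014 |
| Yadier Molina | 54 | 55 | 12 | 20 | 67 | 74 | 73 | 66 | 6 | 4 | 0.724 | 0.750 | -1.179 | 1.404 |
| Wilson Ramos | 49 | 39 | 20 | 15 | 67 | 70 | 83 | 80 | 1 | 0 | 0.739 | 0.845 | -1.261 | 0.085 |
| J.T. Realmuto | 57 | 74 | 13 | 21 | 55 | 74 | 91 | 104 | 8 | 3 | 0.742 | 0.825 | -1.495 | 2.004 |
| Robinson Chirinos | 61 | 48 | 22 | 18 | 63 | 65 | 131 | 140 | 3 | 2 | 0.749 | 0.757 | -1.527 | -1.925 |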

I’ve spent the past few posts kicking myself for drafting Posey, but I’m not sure the numbers merit drafting any other catcher in round two. The two catchers with higher projected z-scores (Sanchez and Gattis) were projected to hit more home runs and bat in more runs, but their projected strikeouts were also nearly twice Posey’s. The clincher is that both also ended up underperforming their projections.

Looking at the top 10 catchers, only a few substantially outperformed their projections. Given that there isn’t much variation in this group (namely, most of them are pretty bad), this is probably a good reason not to draft catchers in the second round.
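
To put a number on “only a few”, here is a quick sketch that applies the same cutoff of 1 used for my drafted players to the ten catchers in the table above. The values are copied from that table; the data frame name is just for illustration.

```r
library(dplyr)

# Projected and final total z-scores for the top 10 catchers (from the table above)
catchers <- data.frame(
  Name = c("Gary Sanchez", "Evan Gattis", "Buster Posey", "Willson Contreras",
           "Salvador Perez", "Jonathan Lucroy", "Yadier Molina", "Wilson Ramos",
           "J.T. Realmuto", "Robinson Chirinos"),
  tot_z   = c(3.259, 2.198, 1.206, 0.773, -0.272,
              -1.074, -1.179, -1.261, -1.495, -1.527),
  f_tot_z = c(-1.756, 0.287, -1.957, -2.704, 0.271,
              -4.014, 1.404, 0.085, 2.004, -1.925)
)

# Classify each catcher with the same rule used for the drafted players
catchers <- catchers %>%
  mutate(diff = f_tot_z - tot_z,
         change = case_when(
           diff < -1 ~ "underperform",
           diff > 1  ~ "outperform",
           TRUE      ~ "as expected"
         ))

counts <- table(catchers$change)
print(counts)
```

By this rule, three catchers (Molina, Ramos, and Realmuto) outperformed, five underperformed, and two finished about where they were projected.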

As a bonus, I did a little bit of exploration in my own league, and I found that most people don’t draft catchers in the first 10 rounds, because catchers don’t seem to make much of a difference. More reasons not to draft catchers so early.

Shortstop

I drafted Simmons in the fourth round, early enough that most shortstops were still available. I excluded the three that had already been picked (Correa, Turner, and Lindor).

all_final %>% 
  filter(position == 'short' & tot_z < 3) %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()
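
| Name | R | f_R | HR | f_HR | RBI | f_RBI | SO | f_SO | SB | f_SB | OPS | f_OPS | tot_z | f_tot_z |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Elvis Andrus | 80 | 53 | 12 | 6 | 69 | 33 | 88 | 66 | 23 | 5 | 0.745 | 0.675 | 2.386 | -2.817 |
| Xander Bogaerts | 86 | 72 | 15 | 23 | 75 | 103 | 110 | 102 | 11 | 8 | 0.789 | 0.883 | 1.904 | 4.769 |
| Trevor Story | 82 | 88 | 30 | 37 | 93 | 108 | 203 | 168 | 11 | 27 | 0.791 | 0.914 | 1.586 | 7.975 |
| Ian Desmond | 69 | 82 | 20 | 22 | 74 | 88 | 128 | 146 | 16 | 20 | 0.781 | 0.729 | 1.402 | 2.858 |
| Didi Gregorius | 72 | 89 | 21 | 27 | 80 | 86 | 83 | 69 | 5 | 10 | 0.743 | 0.829 | 1.375 | 5.855 |
| Jean Segura | 77 | 91 | 13 | 10 | 56 | 63 | 93 | 69 | 23 | 20 | 0.720 | 0.755 | 1.175 | 3.416 |
| Javier Baez | 72 | 101 | 25 | 34 | 84 | 111 | 156 | 167 | 13 | 21 | 0.755 | 0.881 | 1.076 | 7.328 |
| Marcus Semien | 76 | 89 | 21 | 15 | 68 | 70 | 129 | 131 | 12 | 14 | 0.756 | 0.706 | 0.741 | 1.062 |
| Jorge Polanco | 69 | 38 | 14 | 6 | 71 | 42 | 90 | 62 | 13 | 7 | 0.738 | 0.773 | 0.685 | -1.651 |
| Andrelton Simmons | 70 | 68 | 11 | 11 | 67 | 75 | 64 | 44 | 13 | 10 | 0.710 | 0.754 | 0.550 | 2.506 |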

I discussed Bogaerts and Andrus in the previous post, so I’ll start with them. Bogaerts was projected to score more runs and hit more home runs than Simmons, and also strike out many more times. In the end, Bogaerts outperformed his projections, batting in about 37% more runs than projected (103 vs. 75) and hitting more home runs, compensating for his high strikeouts. Andrus had similar projections, except he was also projected to steal more bases. He underperformed, so in retrospect, I’m glad I dodged that bullet.

Given Simmons’s low projected z-score, there are plenty of shortstops I could have done better with. Story’s projections were great, and he did even better by the end, cutting his strikeouts and stealing plenty more bases. Baez was also projected to do better than Simmons in home runs and RBIs, and he outperformed his projections too, stealing more bases than expected.
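
For comparisons like these, the relative change in a counting stat is (final - projected) / projected. A quick sketch with figures from the shortstop table above; pct_change() is a made-up helper name, not something from the original analysis.

```r
# Percent change from the projected value to the final value
pct_change <- function(projected, final) {
  100 * (final - projected) / projected
}

# Bogaerts: 75 projected RBIs, 103 actual
bogaerts_rbi <- pct_change(75, 103)
print(round(bogaerts_rbi, 1))  # 37.3

# Story: 11 projected stolen bases, 27 actual
story_sb <- pct_change(11, 27)
print(round(story_sb, 1))  # 145.5
```

Percent changes are easy to read, but they exaggerate swings in low-count categories like stolen bases, which is one reason the z-score totals are a better basis for comparing players.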

Even though Simmons ended up outperforming his projection, it was as a result of fewer strikeouts, rather than increased runs. I would have done better with any of the other shortstops I mentioned, underscoring how big of a mistake it was to draft Simmons as early as I did. Down with WAR.

Lessons learned

Phew! That was a lot of analysis, and I applaud you if you stuck with me through this exploration. For those of you who skipped to the end, here are my top three takeaways from all this:

    1. Don’t rely solely on WAR to draft players. If you’re going to pick one summary statistic to guide your decisions, use a combined z-score instead.
    2. Pay attention to positional talent – draft outfielders earlier and catchers later.
    3. Rely on data, not fandom (namely, stop drafting Giants players without the data to back it up).

Next steps

Next, I’ll be looking at my strategy for drafting pitchers, and breaking it apart in a similar fashion to see what can be improved for next year. I’ll also write about putting this all together to create a cohesive drafting strategy, since we draft pitchers and batters at the same time.

Hopefully this post has provided some food for thought, and sparked some strategies for building your fantasy team. If you have questions or comments, find me on Twitter!