Drafting Batters in Fantasy Baseball, Part 4

Where were we?
Projections vs. 2018 season statistics
What-if land
Lessons learned
Next steps

Where were we?

If you’re reading about my fantasy baseball experience for the first time, welcome! You may be better oriented by reading this first.

In the last three posts, I wrote about my draft strategy using projected WAR, and explored fantasy talent by defensive position. I discovered that WAR (Wins Above Replacement) was not a great way to select players given my league’s scoring categories. I calculated z-scores for my scoring categories instead, and looked at how that would have changed my draft picks. I did some deep reflection on the danger of fandom bias.

Projections vs. 2018 season statistics

Up to this point, we’ve been looking at projections for 2018. Now that the season is over, we can see just how these projections played out. I took the final 2018 stats from Fangraphs and compared them to the projected stats. For clarity, I renamed all final stats to include the f_ prefix, kept batters with at least 300 plate appearances, then calculated a z-score for each scoring category and summed them into a total. I merged this dataset with the full batters dataset to facilitate comparison, and also with the subset of players I drafted, to look specifically at my team, Dropped Third Strike.

# Final 2018 stats: prefix with f_ so they don't collide with the projections
end_bat_z <- read.csv("../data/post1/batters_final.csv") %>%
  rename(f_R = R,
         f_HR = HR,
         f_RBI = RBI,
         f_SO = SO,
         f_SB = SB,
         f_OPS = OPS,
         f_WAR = WAR) %>%
  # keep batters with at least 300 plate appearances
  filter(PA >= 300) %>%
  mutate(f_R_z = z_score(f_R),
         f_HR_z = z_score(f_HR),
         f_RBI_z = z_score(f_RBI),
         # strikeouts hurt, so flip the sign
         f_SO_z = -z_score(f_SO),
         f_SB_z = z_score(f_SB),
         f_OPS_z = z_score(f_OPS),
         f_tot_z = round((f_R_z + f_HR_z + f_RBI_z + f_SO_z + f_SB_z + f_OPS_z), 3),
         playerid = as.character(playerid)) %>%
  select(-Team)

# Final stats joined to all projected batters, and to my drafted players
all_final <- inner_join(end_bat_z, bat_z, by = c("playerid", "Name"))
drafted_final <- inner_join(end_bat_z, drafted, by = c("Name"))
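
The z_score() helper used above was defined in an earlier post and isn’t repeated here. A minimal sketch of what such a helper might look like, assuming it is the standard "subtract the mean, divide by the standard deviation" definition (the function body and the toy strikeout totals below are assumptions for illustration, not the original code):

```r
# Assumed definition: standardize a numeric vector to mean 0 and sd 1.
# The original helper may differ, for example in how it handles NAs.
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Toy strikeout totals (illustrative values, not real players)
so <- c(60, 90, 120, 150)

# Negated, as in the pipeline above, because fewer strikeouts is better
so_z <- -z_score(so)
```

With the sign flipped, the batter with the fewest strikeouts gets the highest strikeout z-score, so all six categories can be summed in the same direction.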

I’m going to start by looking at who I drafted. How did the final z-scores differ from the projections?

drafted_final <- drafted_final %>% 
  mutate(diff = f_tot_z - tot_z,
         change = case_when(
           diff < -1 ~ "underperform",
           diff > 1 ~ "outperform",
           TRUE ~ "as expected"
         ))

Let’s start by looking at those who outperformed their projections. Warning for those on mobile: these tables are wide, and you may not see all the relevant columns.

drafted_final %>% 
  filter(change =="outperform") %>% 
  select(Name, draft_order, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()

Name	draft_order	R	f_R	HR	f_HR	RBI	f_RBI	SO	f_SO	SB	f_SB	OPS	f_OPS	tot_z	f_tot_z
Mookie Betts	1	100	129	24	32	90	80	73	91	23	30	0.871	1.078	8.184	12.597
Stephen Piscotty	23	69	78	17	27	70	88	125	114	6	2	0.757	0.821	-0.617	3.025
Andrelton Simmons	4	70	68	11	11	67	75	64	44	13	10	0.710	0.754	0.550	2.506
Jose Peraza	25	57	85	7	14	49	58	74	75	26	23	0.688	0.742	-0.552	3.382

Betts blew his projections out of the water, hitting more home runs and increasing his OPS by quite a bit. Simmons showed much better plate discipline, striking out much less often, but his other categories didn’t dramatically improve. Piscotty did dramatically better than his projections in several different categories. Even Peraza, who had a negative z-score in his projections, ended up finishing the season on a high note, scoring more runs and hitting twice as many home runs as projected.

Now let’s look at the underperformers, whose total z-scores fell more than 1 below their projections.

drafted_final %>% 
  filter(change =="underperform") %>% 
  select(Name, draft_order, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()

Name	draft_order	R	f_R	HR	f_HR	RBI	f_RBI	SO	f_SO	SB	f_SB	OPS	f_OPS	tot_z	f_tot_z
Edwin Encarnacion	5	92	74	36	32	109	107	131	132	2	3	0.869	0.810	5.724	3.710
Brandon Belt	21	78	50	21	14	77	46	145	107	5	4	0.832	0.756	1.091	-1.919
Buster Posey	2	64	47	14	5	69	41	62	53	4	3	0.821	0.741	1.206	-1.957
Rougned Odor	8	85	76	31	18	92	63	139	127	14	12	0.776	0.751	3.809	0.841
Adam Jones	12	79	54	28	15	85	63	112	93	3	7	0.774	0.732	2.180	-0.336
Brandon Crawford	18	67	63	17	14	82	54	127	122	5	4	0.742	0.719	-0.540	-1.804
Manuel Margot	13	64	50	12	8	50	51	96	88	19	11	0.720	0.675	-0.563	-1.848

Posey, who we’ve already determined was drafted too early and was a poor choice based on his projection, rubbed salt in the wound by underperforming. I knew this even without looking at the stats, given his abysmal offensive season, but this confirms it.

Encarnacion scored fewer runs than his projections, but otherwise still provided good offensive numbers. Odor showed better plate discipline (fewer strikeouts), but his offensive output decreased dramatically. Jones also had a poor offensive year. Margot was a bad draft pick, who started out with poor projections and got even worse, as did Crawford. Belt improved his plate discipline, but his offensive numbers also tanked.

Now let’s look at those who performed as expected, whose z-scores changed by 1 or less.

drafted_final %>% 
  filter(change =="as expected") %>% 
  select(Name, draft_order, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()

Name	draft_order	R	f_R	HR	f_HR	RBI	f_RBI	SO	f_SO	SB	f_SB	OPS	f_OPS	tot_z	f_tot_z
Mike Moustakas	11	73	66	30	28	85	95	90	103	1	4	0.812	0.774	2.995	2.926
Maikel Franco	24	68	48	25	22	86	68	91	62	1	1	0.788	0.780	1.819	1.113
Max Kepler	20	67	80	19	20	73	58	109	96	7	4	0.769	0.727	0.394	0.706

Moustakas improved his RBIs, and got a little better at base stealing, but was otherwise pretty close to his projections. Kepler scored a few more runs, and was a bad pick to start with, but was at least consistent. Franco showed better plate discipline (fewer strikeouts), but scored and drove in fewer runs than projected, so his total z-score ended up only slightly lower.
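
As a quick sanity check, the three buckets can be recomputed from the tot_z and f_tot_z columns printed in the tables above. This is a standalone sketch in base R: the data frame name drafted_check is made up for illustration, and nested ifelse() calls stand in for the case_when() used earlier.

```r
# Projected (tot_z) and final (f_tot_z) totals copied from the tables above
drafted_check <- data.frame(
  Name = c("Mookie Betts", "Stephen Piscotty", "Andrelton Simmons", "Jose Peraza",
           "Edwin Encarnacion", "Brandon Belt", "Buster Posey", "Rougned Odor",
           "Adam Jones", "Brandon Crawford", "Manuel Margot",
           "Mike Moustakas", "Maikel Franco", "Max Kepler"),
  tot_z = c(8.184, -0.617, 0.550, -0.552,
            5.724, 1.091, 1.206, 3.809, 2.180, -0.540, -0.563,
            2.995, 1.819, 0.394),
  f_tot_z = c(12.597, 3.025, 2.506, 3.382,
              3.710, -1.919, -1.957, 0.841, -0.336, -1.804, -1.848,
              2.926, 1.113, 0.706),
  stringsAsFactors = FALSE
)

# Same rule as before: more than 1 point either way counts as a real change
drafted_check$diff <- drafted_check$f_tot_z - drafted_check$tot_z
drafted_check$change <- ifelse(drafted_check$diff > 1, "outperform",
                        ifelse(drafted_check$diff < -1, "underperform", "as expected"))

bucket_counts <- table(drafted_check$change)
print(bucket_counts)
```

Of the 14 batters listed, 4 outperformed, 7 underperformed, and 3 landed within 1 point of their projected total.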

What-if land

Not content to sit with the bad decisions I made, I engaged in some ill-advised counterfactual exploration, and looked to see what might have happened had I drafted the players with higher z-scores, rather than drafting based on WAR.

First Base

I looked at which players had higher projected z-scores than Encarnacion, and whether they were available at the time I drafted Encarnacion in the fifth round.

all_final %>% 
  filter(position == 'first_base') %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()

Name	R	f_R	HR	f_HR	RBI	f_RBI	SO	f_SO	SB	f_SB	OPS	f_OPS	tot_z	f_tot_z
Anthony Rizzo	97	74	34	25	107	101	98	80	9	6	0.927	0.846	8.343	4.988
Paul Goldschmidt	101	95	31	33	103	83	147	173	17	7	0.927	0.922	7.618	4.246
Cody Bellinger	91	84	39	25	110	76	159	151	13	14	0.882	0.814	6.782	2.816
Joey Votto	95	67	28	12	92	67	105	101	5	2	0.952	0.837	6.493	0.471
Rhys Hoskins	92	89	36	34	111	96	140	150	5	5	0.877	0.850	6.048	4.295
Freddie Freeman	92	94	31	23	93	98	134	132	8	10	0.935	0.892	5.989	5.032
Edwin Encarnacion	92	74	36	32	109	107	131	132	2	3	0.869	0.810	5.724	3.710
Carlos Santana	78	82	27	24	80	86	95	93	5	2	0.859	0.766	3.736	2.800
Jose Abreu	77	68	29	22	95	78	115	109	2	2	0.860	0.798	3.713	1.438
Joey Gallo	92	82	42	40	100	92	236	207	7	3	0.839	0.810	3.108	2.049

This makes me feel a little better: the first basemen with higher projected tot_z scores were drafted prior to my turn in the fifth round. Most of them underperformed their projections but still did well, except for Joey Votto, whose runs scored, home runs, and runs batted in were much lower than projected, leading to his abysmal final z-score this year. The first basemen who were projected to perform worse than Encarnacion also underperformed their projections. All things considered, Edwin wasn’t a bad draft pick.

Outfield

I was really happy that I drafted Betts in the first round, given that he had high z-scores to begin with and then subsequently outperformed his projection. I looked at the next outfielder I drafted, who was Jones in the 12th round. Since all outfielders (except Brantley) with higher z-scores were drafted prior to my pick in the 12th round, I’ll exclude them from the table for simplicity’s sake.

all_final %>% 
  filter(position == 'outfield' & tot_z <= 2.550) %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()

Name	R	f_R	HR	f_HR	RBI	f_RBI	SO	f_SO	SB	f_SB	OPS	f_OPS	tot_z	f_tot_z
Michael Brantley	69	89	14	17	74	76	66	60	12	12	0.817	0.832	2.550	4.847
Adam Jones	79	54	28	15	85	63	112	93	3	7	0.774	0.732	2.180	-0.336
Nomar Mazara	75	61	25	20	93	77	120	116	3	1	0.801	0.753	2.173	0.005
Lorenzo Cain	79	90	15	10	63	38	103	94	20	30	0.773	0.813	1.976	3.351
Eddie Rosario	79	87	23	24	83	77	126	104	10	8	0.772	0.803	1.952	3.443
Ian Happ	75	56	27	15	82	44	157	167	11	8	0.798	0.761	1.726	-2.863
Josh Reddick	71	63	18	17	75	47	80	77	7	7	0.786	0.718	1.694	-0.100
Gregory Polanco	72	75	18	23	71	81	103	117	15	12	0.770	0.839	1.641	3.451
Jay Bruce	70	31	27	9	88	37	129	75	3	2	0.788	0.680	1.410	-3.922
Michael Conforto	69	78	25	28	71	82	117	159	2	3	0.849	0.797	1.402	1.372
Ian Desmond	69	82	20	22	74	88	128	146	16	20	0.781	0.729	1.402	2.858

Brantley was projected to do a bit better than Jones, largely due to his low strikeout and high stolen base projections. Jones was projected to handily beat Brantley in home runs and runs batted in, and I probably selected Jones to boost my home run numbers. But Jones had a pretty bad season, and Brantley outperformed his projections, so now I feel the pangs of regret. Both Mazara and Cain had already been drafted by the time I picked in the 12th round. Rosario had pretty similar projections to Jones, just with more strikeouts, and he also outperformed his projections.

The projections for these players aren’t that different from each other, so I’m kicking myself for not predicting the future, which is not constructive. I think the main message here is that I shouldn’t have waited this long to pick my other two outfielders. My third outfielder, Kepler, didn’t even break the top 30. Given what I know now about outfielders contributing to runs, home runs, and RBIs, this is a huge shortcoming in my strategy.

Third Base

I drafted Moustakas late, in round 11. In the table, I filtered out players with higher z-scores who had been selected in earlier rounds of the draft, and took the top five since there are fewer third basemen.

all_final %>% 
  filter(position == 'third_base' & tot_z < 3.2) %>% 
  top_n(., 5, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()

Name	R	f_R	HR	f_HR	RBI	f_RBI	SO	f_SO	SB	f_SB	OPS	f_OPS	tot_z	f_tot_z
Adrian Beltre	75	49	23	15	92	65	79	96	2	1	0.815	0.763	3.160	-0.949
Joey Gallo	92	82	42	40	100	92	236	207	7	3	0.839	0.810	3.108	2.049
Mike Moustakas	73	66	30	28	85	95	90	103	1	4	0.812	0.774	2.995	2.926
Travis Shaw	77	73	28	32	89	86	144	108	7	5	0.786	0.825	2.007	3.833
Maikel Franco	68	48	25	22	86	68	91	62	1	1	0.788	0.780	1.819	1.113

When I had a chance to draft Moustakas in the 11th round, Beltre and Gallo were still available. Given how much Beltre underperformed his projection, it looks like I dodged a bullet. Gallo would have been a better option for runs and home runs, but his strikeouts are really quite high (even after a bit of discipline this year, he still had twice as many strikeouts as Moustakas).

Shaw would have been another reasonable option. His z-score was quite low due to his high projected number of strikeouts, but his projected runs, home runs, and RBIs were comparable to Moustakas’s, and he was also projected to steal more bases.

Ultimately, Moustakas did just fine relative to his projection, and I could have done well with either Gallo or Shaw.

I did also end up drafting Franco in a later round, which seems reasonable given his projected z-score. He underperformed his projection, however, and didn’t contribute much to my offense.

Second Base

All the players with higher projected total z-scores than Odor were already gone by the time I picked up Odor in the 8th round.

all_final %>% 
  filter(position == 'second_base' & tot_z < 3.9) %>% 
  top_n(., 5, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()

Name	R	f_R	HR	f_HR	RBI	f_RBI	SO	f_SO	SB	f_SB	OPS	f_OPS	tot_z	f_tot_z
Rougned Odor	85	76	31	18	92	63	139	127	14	12	0.776	0.751	3.809	0.841
Jonathan Schoop	82	61	31	21	98	61	137	115	2	1	0.793	0.682	2.766	-1.384
Robinson Cano	78	44	23	10	88	50	91	47	2	0	0.795	0.845	2.501	-0.154
Ian Happ	75	56	27	15	82	44	157	167	11	8	0.798	0.761	1.726	-2.863
Dee Gordon	78	62	4	4	39	36	91	80	46	30	0.674	0.637	1.598	-0.288

It’s worth noting here that none of the other second basemen would have been substantially better than Odor, based on the projections. Dee Gordon was projected to steal a lot more bases, but he was also projected to hit considerably fewer home runs, and bat in fewer runs. Additionally, none of these candidates outperformed their projections. Given that, I think Odor was the right choice here.

Catcher

Posey was the first catcher to be drafted, so every catcher was available to me at the time.

all_final %>% 
  filter(position == 'catcher') %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()

Name	R	f_R	HR	f_HR	RBI	f_RBI	SO	f_SO	SB	f_SB	OPS	f_OPS	tot_z	f_tot_z
Gary Sanchez	72	51	31	18	90	53	115	94	3	1	0.842	0.697	3.259	-1.756
Evan Gattis	71	49	30	25	94	78	118	101	1	1	0.790	0.736	2.198	0.287
Buster Posey	64	47	14	5	69	41	62	53	4	3	0.821	0.741	1.206	-1.957
Willson Contreras	66	50	20	10	77	54	116	121	6	4	0.800	0.730	0.773	-2.704
Salvador Perez	59	52	23	27	74	80	99	108	1	1	0.752	0.713	-0.272	0.271
Jonathan Lucroy	53	41	11	4	56	51	63	65	2	0	0.794	0.617	-1.074	-4.014
Yadier Molina	54	55	12	20	67	74	73	66	6	4	0.724	0.750	-1.179	1.404
Wilson Ramos	49	39	20	15	67	70	83	80	1	0	0.739	0.845	-1.261	0.085
J.T. Realmuto	57	74	13	21	55	74	91	104	8	3	0.742	0.825	-1.495	2.004
Robinson Chirinos	61	48	22	18	63	65	131	140	3	2	0.749	0.757	-1.527	-1.925

I’ve spent the past few posts kicking myself for drafting Posey, but I’m not sure the numbers merit drafting any other catcher in round two. The two catchers with higher projected z-scores (Sanchez and Gattis) were projected to hit more home runs and bat in more runs, but their projected strikeouts were also nearly twice Posey’s. The clincher is that both also ended up underperforming their projections.

Looking at the top 10 catchers, only a few substantially outperformed their projections. Given that there isn’t much variation in this group (namely, most of them are pretty bad), this is probably a good reason not to draft catchers in the second round.
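
To put a number on "only a few", here is a small base R sketch that applies the same "more than 1 point" threshold to the catcher table above. The values are copied from that table; the names catchers and beat_projection are made up for this example.

```r
# Projected and final totals for the top 10 catchers, from the table above
catchers <- data.frame(
  Name = c("Gary Sanchez", "Evan Gattis", "Buster Posey", "Willson Contreras",
           "Salvador Perez", "Jonathan Lucroy", "Yadier Molina", "Wilson Ramos",
           "J.T. Realmuto", "Robinson Chirinos"),
  tot_z = c(3.259, 2.198, 1.206, 0.773, -0.272,
            -1.074, -1.179, -1.261, -1.495, -1.527),
  f_tot_z = c(-1.756, 0.287, -1.957, -2.704, 0.271,
              -4.014, 1.404, 0.085, 2.004, -1.925),
  stringsAsFactors = FALSE
)

catchers$diff <- catchers$f_tot_z - catchers$tot_z

# Catchers who beat their projected total by more than 1
beat_projection <- catchers$Name[catchers$diff > 1]
print(beat_projection)
```

Only Molina, Ramos, and Realmuto clear that bar, and all three started with negative projected totals.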

As a bonus, I did a little bit of exploration in my own league, and I found that most people don’t draft catchers in the first 10 rounds, because catchers don’t seem to make much of a difference. More reasons not to draft catchers so early.

Shortstop

I drafted Simmons in the fourth round, early enough that most shortstops were still available. I excluded the three that had already been picked (Correa, Turner, and Lindor).

all_final %>% 
  filter(position == 'short' & tot_z < 3) %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(Name, R, f_R, HR, f_HR, RBI, f_RBI, SO, f_SO, SB, f_SB, OPS, f_OPS, tot_z, f_tot_z) %>%
  knitr::kable()

Name	R	f_R	HR	f_HR	RBI	f_RBI	SO	f_SO	SB	f_SB	OPS	f_OPS	tot_z	f_tot_z
Elvis Andrus	80	53	12	6	69	33	88	66	23	5	0.745	0.675	2.386	-2.817
Xander Bogaerts	86	72	15	23	75	103	110	102	11	8	0.789	0.883	1.904	4.769
Trevor Story	82	88	30	37	93	108	203	168	11	27	0.791	0.914	1.586	7.975
Ian Desmond	69	82	20	22	74	88	128	146	16	20	0.781	0.729	1.402	2.858
Didi Gregorius	72	89	21	27	80	86	83	69	5	10	0.743	0.829	1.375	5.855
Jean Segura	77	91	13	10	56	63	93	69	23	20	0.720	0.755	1.175	3.416
Javier Baez	72	101	25	34	84	111	156	167	13	21	0.755	0.881	1.076	7.328
Marcus Semien	76	89	21	15	68	70	129	131	12	14	0.756	0.706	0.741	1.062
Jorge Polanco	69	38	14	6	71	42	90	62	13	7	0.738	0.773	0.685	-1.651
Andrelton Simmons	70	68	11	11	67	75	64	44	13	10	0.710	0.754	0.550	2.506

I discussed Bogaerts and Andrus in the previous post, so I’ll start with them. Bogaerts was projected to score more runs and hit more home runs than Simmons, and also strike out many more times. In the end, Bogaerts outperformed his projections, batting in about 37% more runs than projected (103 vs. 75) and hitting more home runs, compensating for his high strikeouts. Andrus had similar projections, except he was also projected to steal more bases. He underperformed, so in retrospect, I’m glad I dodged that bullet.

Given Simmons’s low projected z-score, there are plenty of shortstops I could have done better with. Story’s projections were great, and he did even better by the end, cutting his strikeouts and stealing plenty more bases. Baez was also projected to do better than Simmons in home runs and RBIs, and he outperformed his projections too, stealing more bases than expected.

Even though Simmons ended up outperforming his projection, it was as a result of fewer strikeouts, rather than increased runs. I would have done better with any of the other shortstops I mentioned, underscoring how big of a mistake it was to draft Simmons as early as I did. Down with WAR.
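
The six position tables above all repeat the same filter, rank, and select steps, so they could be wrapped in one function. Here is a rough sketch with dplyr: the helper name top_by_position and its arguments are hypothetical, it assumes dplyr 1.0.0 or later for slice_max(), and it uses <= for the z-score cap, whereas some of the filters above use a strict <. The toy data frame reuses four rows from the tables above.

```r
library(dplyr)

# Hypothetical helper: the top-n players at one position by projected z-score,
# optionally capped at max_z to leave out players who were already drafted.
top_by_position <- function(df, pos, n = 10, max_z = Inf) {
  df %>%
    filter(position == pos, tot_z <= max_z) %>%
    slice_max(tot_z, n = n, with_ties = TRUE) %>%  # keeps ties, like top_n()
    arrange(desc(tot_z)) %>%
    select(Name, tot_z, f_tot_z)
}

# Small stand-in for all_final, using values from the tables above
toy <- data.frame(
  Name = c("Gary Sanchez", "Evan Gattis", "Buster Posey", "Andrelton Simmons"),
  position = c("catcher", "catcher", "catcher", "short"),
  tot_z = c(3.259, 2.198, 1.206, 0.550),
  f_tot_z = c(-1.756, 0.287, -1.957, 2.506),
  stringsAsFactors = FALSE
)

top2 <- top_by_position(toy, "catcher", n = 2)
capped <- top_by_position(toy, "catcher", max_z = 3)
print(top2)
```

Against the real data, a call like top_by_position(all_final, "third_base", n = 5, max_z = 3.2) would stand in for the third base query, minus the individual stat columns.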

Lessons learned

Phew! That was a lot of analysis, and I applaud you if you stuck with me through this exploration. For those of you who skipped to the end, here are my top three takeaways from all this:

Don’t rely solely on WAR to draft players. If you’re going to pick one summary statistic to guide your decisions, use a combined z-score instead.
Pay attention to positional talent – draft outfielders earlier and catchers later.
Rely on data, not fandom (namely, stop drafting Giants players without the data to back it up).

Next steps

Next, I’ll be looking at my strategy for drafting pitchers, and breaking it apart in a similar fashion to see what can be improved for next year. I’ll also write about putting this all together to create a cohesive drafting strategy, since we draft pitchers and batters at the same time.

Hopefully this post has provided some food for thought, and sparked some strategies for building your fantasy team. If you have questions or comments, find me on Twitter!

Angeline Protacio