Drafting Batters in Fantasy Baseball, Part 3

Where were we?
A better strategy
Up Next

Where were we?

If you’re reading about my fantasy baseball experience for the first time, welcome! You may be better oriented by reading this first.

In the last two posts, I wrote about my draft strategy using projected WAR (Wins Above Replacement) and explored fantasy talent by defensive position. I discovered that WAR was not a great way to select players given my league’s scoring categories. Without a summary statistic to guide my decisions, I tried to weigh all of the scoring categories at once, and I was pretty frazzled by the end of the draft.

A better strategy

In doing research for this post, I found this article on building my own rankings from my league’s scoring categories using z-scores. A z-score is a simple statistic that tells you whether a player is above or below the mean, and by how much (in standard deviations). It’s so simple that I was kicking myself for not using it on draft day. Here’s how it works.
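In symbols (my notation, not the article’s): let $x_{ij}$ be player $i$’s projection in category $j$, and let $\bar{x}_j$ and $s_j$ be the mean and standard deviation of that category across the player pool. The per-category z-score and the overall score computed in the code below are then:

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\mathrm{tot\_z}_i = z_{i,\mathrm{R}} + z_{i,\mathrm{HR}} + z_{i,\mathrm{RBI}} - z_{i,\mathrm{SO}} + z_{i,\mathrm{SB}} + z_{i,\mathrm{OPS}}
```

The minus sign on the strikeout term is what makes fewer strikeouts count in a player’s favor.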

Create Z-Scores

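library(dplyr)

# Assumed z-score helper (presumably the one from the first post):
# distance from the pool mean, in standard deviations
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

bat_z <- batters %>%
  filter(PA >= 300) %>%                     # regulars only: 300+ projected PA
  select(playerid, position, Name, Team, R, HR, RBI, SO, SB, OPS, WAR) %>%
  mutate(R_z = z_score(R),
         HR_z = z_score(HR),
         RBI_z = z_score(RBI),
         SO_z = -z_score(SO),               # flipped: fewer strikeouts is better
         SB_z = z_score(SB),
         OPS_z = z_score(OPS),
         tot_z = round(R_z + HR_z + RBI_z + SO_z + SB_z + OPS_z, 3))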

I took the same batters dataset and filtered it to include only players with 300 or more projected plate appearances. I wanted to exclude players without enough playing time, whose projected runs, home runs, RBIs, strikeouts, or stolen bases would be low simply because they won’t play much, and whose projected OPS could be extreme (really high or really low) because it rests on a small sample. I didn’t want to draft players who weren’t projected to play most of the season, and 300 plate appearances works out to roughly two per game over a 162-game season. This filter cut the number of batters in my dataset roughly in half.
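A toy sketch of the filter and the arithmetic behind it, written in base R so it runs without dplyr. The four players and their numbers are made up for illustration; the point is that part-timers drag the pool’s mean down, which would make every regular look better than he really is compared with the players I’d actually be choosing between.

```r
# Made-up projections, for illustration only
proj <- data.frame(
  Name = c("Regular A", "Regular B", "Part-timer C", "Call-up D"),
  PA   = c(620, 540, 180, 95),
  HR   = c(30, 22, 6, 4),
  stringsAsFactors = FALSE
)

# Keep only players projected for 300+ plate appearances
regulars <- proj[proj$PA >= 300, ]

# Part-timers pull the pool mean down, inflating the regulars' z-scores
mean(proj$HR)      # 15.5 with everyone
mean(regulars$HR)  # 26 with regulars only

# 300 PA spread over a 162-game season
pa_per_game <- 300 / 162
round(pa_per_game, 2)  # 1.85
```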

Once I filtered on that criterion, I calculated the mean and standard deviation of each category for the remaining players and used them to calculate z-scores, which indicate how far above or below the mean (in standard deviations) a player’s projection falls. Generally, the higher the z-score, the better: a player with a z-score of 4 for home runs is definitely someone I’d love to have on my team. I calculated z-scores for all of the scoring categories, multiplied the strikeout z-score by -1 so that higher is better in every category (we want fewer strikeouts), and then summed them to get an overall z-score (I named this variable “tot_z”). This overall z-score combines all the scoring categories and tells me how each player compares to the others.
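Here is the same recipe scaled down to three made-up players and two categories, written out in base R so each step is visible. It standardizes each category against the pool’s own mean and standard deviation (R’s `sd()` uses the n - 1 denominator), flips the sign for strikeouts, and adds the results.

```r
# Three made-up players' projections
pool <- data.frame(HR = c(10, 20, 30), SO = c(150, 100, 50))

# z-score: distance from the pool mean in units of the pool's standard deviation
pool$HR_z <- (pool$HR - mean(pool$HR)) / sd(pool$HR)    # -1, 0, 1
pool$SO_z <- -(pool$SO - mean(pool$SO)) / sd(pool$SO)   # sign flipped: fewer strikeouts is better
pool$tot_z <- pool$HR_z + pool$SO_z                     # -2, 0, 2
pool
```

The third player hits the most home runs and strikes out the least, so both categories push his total up.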

(This is the same calculation I used to compare the scoring categories with WAR in the first post. Same calculation, different intention.)

So now let’s look at our top players.

Who’s on top?

bat_z %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()

position	Name	R	HR	RBI	SO	SB	OPS	WAR	tot_z
outfield	Mike Trout	114	39	105	131	22	1.027	8.2	11.889
outfield	Giancarlo Stanton	109	58	140	171	2	1.029	6.4	11.869
third_base	Nolan Arenado	97	39	118	101	3	0.937	5.0	8.766
outfield	Bryce Harper	100	35	102	122	10	0.984	5.6	8.646
first_base	Anthony Rizzo	97	34	107	98	9	0.927	4.7	8.343
outfield	Mookie Betts	100	24	90	73	23	0.871	5.6	8.184
first_base	Paul Goldschmidt	101	31	103	147	17	0.927	4.3	7.618
second_base	Jose Altuve	94	20	82	73	28	0.859	4.8	7.434
short	Carlos Correa	96	30	113	121	8	0.894	6.1	6.873
first_base	Cody Bellinger	91	39	110	159	13	0.882	3.6	6.782
outfield	Cody Bellinger	91	39	110	159	13	0.882	3.6	6.782

Trout’s on top, not surprisingly.

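There is considerable overlap between this list and the players with high projected WAR, but this list also accounts for projected stolen bases and strikeouts, and a single number is much easier to keep track of in the moment. (The top 10 has 11 rows because Cody Bellinger is listed at both first base and outfield, so he appears twice with the same tot_z.)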
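One way to count each multi-position player once, sketched in base R with the Correa row and the two Bellinger rows from the table above. It deduplicates on Name only because that is what the table shows; with the full data, playerid (which bat_z keeps) would be the safer key.

```r
# Rows copied from the top-10 table
top <- data.frame(
  position = c("short", "first_base", "outfield"),
  Name = c("Carlos Correa", "Cody Bellinger", "Cody Bellinger"),
  tot_z = c(6.873, 6.782, 6.782),
  stringsAsFactors = FALSE
)

# One row per player: keep the first row seen for each name
unique_players <- top[!duplicated(top$Name), ]
nrow(unique_players)  # 2

# Collapse each player's eligible positions into a single label
positions <- vapply(split(top$position, top$Name), paste, character(1), collapse = "/")
positions[["Cody Bellinger"]]  # "first_base/outfield"
```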

Forgive the digression, but I’m still beating myself up about Buster Posey.

Going back to the question I explored in the last post about positional talent, would I have had a different pool of talent to choose from if I’d looked at z-scores instead of WAR?
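The next three tables all come from the same filter, rank, and sort pipeline, with only the position changed. A hypothetical helper could wrap that pipeline (`top_at_position` is my name for it, not something from the post); this sketch uses base R and, like `top_n()`, keeps ties, so it can return more than n rows.

```r
# Hypothetical helper: top n players at one position by total z-score, ties kept
top_at_position <- function(df, pos, n = 10) {
  at_pos <- df[which(df$position == pos), , drop = FALSE]
  if (nrow(at_pos) == 0) return(at_pos)
  # n-th largest total z-score here (or the smallest, if there are fewer than n players)
  cutoff <- sort(at_pos$tot_z, decreasing = TRUE)[min(n, nrow(at_pos))]
  top <- at_pos[which(at_pos$tot_z >= cutoff), , drop = FALSE]
  top[order(top$tot_z, decreasing = TRUE), , drop = FALSE]
}

# Usage with a few rows from the tables in this post
demo <- data.frame(
  position = c("catcher", "catcher", "catcher", "outfield"),
  Name = c("Gary Sanchez", "Evan Gattis", "Buster Posey", "Mike Trout"),
  tot_z = c(3.259, 2.198, 1.206, 11.889),
  stringsAsFactors = FALSE
)
top_at_position(demo, "catcher", n = 2)  # Sanchez, then Gattis
```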

bat_z %>% 
  filter(position == 'catcher') %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()

position	Name	R	HR	RBI	SO	SB	OPS	WAR	tot_z
catcher	Gary Sanchez	72	31	90	115	3	0.842	3.5	3.259
catcher	Evan Gattis	71	30	94	118	1	0.790	1.6	2.198
catcher	Buster Posey	64	14	69	62	4	0.821	4.5	1.206
catcher	Willson Contreras	66	20	77	116	6	0.800	3.0	0.773
catcher	Salvador Perez	59	23	74	99	1	0.752	2.8	-0.272
catcher	Brian McCann	54	20	66	81	1	0.752	2.3	-0.773
catcher	Jonathan Lucroy	53	11	56	63	2	0.794	2.9	-1.074
catcher	Yadier Molina	54	12	67	73	6	0.724	2.5	-1.179
catcher	Wilson Ramos	49	20	67	83	1	0.739	2.1	-1.261
catcher	J.T. Realmuto	57	13	55	91	8	0.742	2.5	-1.495

In addition to Gary Sanchez, whom I identified earlier, Gattis would also have been a good pick based on his projected home runs and runs batted in. His projected WAR (1.6) is quite low, which is why he didn’t end up on my radar before. Let’s look at shortstops too, for completeness.

bat_z %>% 
  filter(position == 'short') %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>% 
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()

position	Name	R	HR	RBI	SO	SB	OPS	WAR	tot_z
short	Carlos Correa	96	30	113	121	8	0.894	6.1	6.873
short	Trea Turner	89	16	66	119	49	0.793	3.7	6.200
short	Francisco Lindor	92	24	83	84	15	0.842	5.8	5.656
short	Corey Seager	89	24	87	120	4	0.853	5.2	3.458
short	Elvis Andrus	80	12	69	88	23	0.745	2.1	2.386
short	Xander Bogaerts	86	15	75	110	11	0.789	3.4	1.904
short	Trevor Story	82	30	93	203	11	0.791	1.9	1.586
short	Ian Desmond	69	20	74	128	16	0.781	0.5	1.402
short	Didi Gregorius	72	21	80	83	5	0.743	2.6	1.375
short	Jean Segura	77	13	56	93	23	0.720	2.0	1.175

Simmons isn’t even on the list! Ouch. Given who was still available by the fourth round, I maintain that Bogaerts (whom I identified by projected WAR in the last post) might have been a fine pick, but Andrus might also have helped with his projected stolen bases and relatively low projected strikeouts.

Let’s also check my pick at second base, since my earlier analysis found that it, too, was a position with scarce offensive talent.

bat_z %>% 
  filter(position == 'second_base') %>% 
  top_n(., 10, tot_z) %>% 
  arrange(desc(tot_z)) %>%
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()

position	Name	R	HR	RBI	SO	SB	OPS	WAR	tot_z
second_base	Jose Altuve	94	20	82	73	28	0.859	4.8	7.434
second_base	Jose Ramirez	92	21	84	67	20	0.849	4.8	6.574
second_base	Brian Dozier	96	30	84	132	14	0.825	3.7	4.856
second_base	Daniel Murphy	80	19	87	70	4	0.859	2.7	3.903
second_base	Rougned Odor	85	31	92	139	14	0.776	1.7	3.809
second_base	Jonathan Schoop	82	31	98	137	2	0.793	3.0	2.766
second_base	Robinson Cano	78	23	88	91	2	0.795	2.9	2.501
second_base	Ian Happ	75	27	82	157	11	0.798	2.1	1.726
second_base	Dee Gordon	78	4	39	91	46	0.674	1.9	1.598
second_base	Whit Merrifield	74	12	61	97	25	0.732	2.1	1.441

Given that no one who ranked above him was available in my league by the eighth round, Odor seems to be a reasonable pick.

Now that I know using z-scores would have changed my picks for catcher and shortstop (but not second base), I’m going to look at z-scores for the rest of my draft picks.

My draft picks

Below, I joined the list of batters I drafted to the z-score table, so only my picks remain. For reference, my team name is “Dropped Third Strike”, after the obscure baseball rule (shortened here to DTS for object names). I also added the round in which I drafted each player (draft_order).

# My drafted batters and the round in which I took each one
# (data.frame() keeps draft_order numeric; cbind() would have turned it into text)
DTS_bat <- data.frame(
  Name = c("Mookie Betts", "Buster Posey", "Andrelton Simmons", "Edwin Encarnacion", "Rougned Odor", "Mike Moustakas", "Adam Jones", "Manuel Margot", "Brandon Crawford", "Max Kepler", "Brandon Belt", "Stephen Piscotty", "Maikel Franco", "Jose Peraza"),
  draft_order = c(1, 2, 4, 5, 8, 11, 12, 13, 18, 20, 21, 23, 24, 25),
  stringsAsFactors = FALSE)

# Inner join keeps only picks found in bat_z (300+ PA), one row per listed position
drafted <- inner_join(DTS_bat, bat_z, by = "Name")
drafted %>%
  select(Name, draft_order, position, tot_z) %>%
  knitr::kable()

Name	draft_order	position	tot_z
Mookie Betts	1	outfield	8.184
Buster Posey	2	catcher	1.206
Andrelton Simmons	4	short	0.550
Edwin Encarnacion	5	first_base	5.724
Rougned Odor	8	second_base	3.809
Mike Moustakas	11	third_base	2.995
Adam Jones	12	outfield	2.180
Manuel Margot	13	outfield	-0.563
Brandon Crawford	18	short	-0.540
Max Kepler	20	outfield	0.394
Brandon Belt	21	first_base	1.091
Stephen Piscotty	23	outfield	-0.617
Maikel Franco	24	third_base	1.819
Jose Peraza	25	second_base	-0.552
Jose Peraza	25	short	-0.552
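The table has 15 rows for 14 picks because Jose Peraza is listed at second base and shortstop, and the join matches him once per row in bat_z. This base R sketch reproduces that with two picks from the table (`merge()` with its default `all = FALSE` is an inner join, like `inner_join()`), and uses `setdiff()` to check whether the join silently dropped any pick, for example one projected for fewer than 300 PA.

```r
# Two picks from my draft list
picks <- data.frame(
  Name = c("Rougned Odor", "Jose Peraza"),
  draft_order = c(8, 25),
  stringsAsFactors = FALSE
)

# Matching rows from the z-score table: Peraza is listed at two positions
z_tab <- data.frame(
  Name = c("Rougned Odor", "Jose Peraza", "Jose Peraza"),
  position = c("second_base", "second_base", "short"),
  tot_z = c(3.809, -0.552, -0.552),
  stringsAsFactors = FALSE
)

# Inner join on Name
joined <- merge(picks, z_tab, by = "Name")
nrow(joined)                 # 3 rows
length(unique(joined$Name))  # 2 players

# Picks the inner join would drop: none here
setdiff(picks$Name, z_tab$Name)  # character(0)
```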

I ended up drafting four batters with negative total z-scores (meaning they are projected to perform below average relative to the other regulars): Margot, Crawford, Piscotty, and Peraza. Fandom bias struck again, too: I drafted three Giants (Posey, Crawford, and Belt) and no more than one player from any other team. I already mentioned that Buster Posey was not the best pick at catcher, but I also drafted Crawford, whose projected z-scores were mediocre to bad in many scoring categories. This was a bad pick, and I ended up dropping him early in the season.

I’m a little surprised that Franco was still available in the 24th round, given his fairly high z-score (1.819), and the same goes for Belt (1.091) in the 21st round.
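To put numbers on the last two paragraphs, this base R sketch copies my 14 picks (Peraza once) and their total z-scores from the table above. It lists the picks with negative totals, then compares the order in which I drafted players with the order their z-scores would suggest. This is a crude check under my own assumption: it only ranks my picks against each other and ignores who the other teams had already taken.

```r
# My picks, one row each, with draft round and total z-score from the table
drafted <- data.frame(
  Name = c("Mookie Betts", "Buster Posey", "Andrelton Simmons", "Edwin Encarnacion",
           "Rougned Odor", "Mike Moustakas", "Adam Jones", "Manuel Margot",
           "Brandon Crawford", "Max Kepler", "Brandon Belt", "Stephen Piscotty",
           "Maikel Franco", "Jose Peraza"),
  draft_order = c(1, 2, 4, 5, 8, 11, 12, 13, 18, 20, 21, 23, 24, 25),
  tot_z = c(8.184, 1.206, 0.550, 5.724, 3.809, 2.995, 2.180, -0.563,
            -0.540, 0.394, 1.091, -0.617, 1.819, -0.552),
  stringsAsFactors = FALSE
)

# Picks projected below average overall
below_avg <- drafted$Name[drafted$tot_z < 0]

# Rank among my own picks: by when I took them, and by total z-score
drafted$pick_rank <- rank(drafted$draft_order)
drafted$z_rank <- rank(-drafted$tot_z)
# Positive gap: the z-score ranks the player higher than where I took him
drafted$gap <- drafted$pick_rank - drafted$z_rank
drafted[order(-drafted$gap), c("Name", "draft_order", "tot_z", "gap")]
```

On these numbers, Franco shows the largest gap (my 13th batter taken, sixth by z-score), followed by Belt (11th taken, eighth by z-score).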

Up Next

In the next post, I’ll wrap this up by looking at how my players actually did, comparing their projections with the final 2018 data. Which players were truly bad picks? Who outperformed their projection? Stay tuned!


Angeline Protacio