If you’re reading about my fantasy baseball experience for the first time, welcome! You may be better oriented by reading this first.
In the last two posts, I wrote about my draft strategy using projected WAR, and explored fantasy talent by defensive position. I discovered that WAR (Wins Above Replacement) was not a great way to select players given my league’s scoring categories. In the absence of a summary statistic to guide my decisions, I looked at all of the scoring categories. I was pretty frazzled by the end of the draft.
In doing research for this post, I found this article on setting up my own rankings based on my league scoring categories, using z-scores. It’s a simple statistic that identifies which players are above the mean, and by how much. This is so simple, I was kicking myself for not using it on draft day. Here’s how it works.
bat_z <- batters %>%
  filter(PA >= 300) %>%
  select(playerid, position, Name, Team, R, HR, RBI, SO, SB, OPS, WAR) %>%
  mutate(R_z = z_score(R),
         HR_z = z_score(HR),
         RBI_z = z_score(RBI),
         SO_z = -z_score(SO),
         SB_z = z_score(SB),
         OPS_z = z_score(OPS),
         tot_z = round((R_z + HR_z + RBI_z + SO_z + SB_z + OPS_z), 3))
I took the same batters dataset and filtered it to include only players with 300 or more plate appearances. I wanted to exclude players without sufficient playing time, who may have really low predicted runs, home runs, RBIs, strikeouts, or stolen bases just because of small samples. These players may also have extreme predicted OPS statistics (really high, or really low) because of small samples. I didn’t want to draft players who weren’t projected to play for most of the season, and 300 plate appearances is roughly two per game over a 162-game season. This effectively halved the batters in my dataset.
Once I filtered based on that criterion, I calculated the mean and standard deviation of each scoring category for the remaining players, and used those to calculate a z-score, indicating how extreme (either positive or negative) a player’s numbers are relative to the mean. Generally, the higher the z-score, the better. A z-score of 4 for home runs is definitely someone I’d love to have on my team. I calculated z-scores for all of the scoring categories, multiplied strikeouts by -1 so they were all on the same scale (we want fewer strikeouts), and then I summed them all to get an overall z-score (I named this variable “tot_z”). This overall z-score looks at all the scoring categories and lets me know how that player compares to others.
(This is similar to the calculation I used to compare the scoring categories with WAR in the first post. Same calculation, different intention.)
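The `z_score()` helper is not defined in this post, so here is a minimal sketch of what it presumably does: subtract the mean and divide by the standard deviation. The definition and the toy numbers below are assumptions for illustration, not the actual helper or projections.

```r
# Assumed definition of z_score(); the post calls it but does not show it.
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Made-up projections for three hypothetical players.
hr <- c(10, 20, 30)    # home runs: more is better
so <- c(150, 100, 50)  # strikeouts: fewer is better

hr_z <- z_score(hr)    # -1, 0, 1
so_z <- -z_score(so)   # sign flipped so that fewer strikeouts scores higher: -1, 0, 1
tot_z <- hr_z + so_z   # -2, 0, 2
```

The third player hits the most home runs and strikes out the least, so both categories push the total in the same direction once the strikeout sign is flipped.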
So now let’s look at our top players.
bat_z %>%
  top_n(., 10, tot_z) %>%
  arrange(desc(tot_z)) %>%
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()
position | Name | R | HR | RBI | SO | SB | OPS | WAR | tot_z |
---|---|---|---|---|---|---|---|---|---|
outfield | Mike Trout | 114 | 39 | 105 | 131 | 22 | 1.027 | 8.2 | 11.889 |
outfield | Giancarlo Stanton | 109 | 58 | 140 | 171 | 2 | 1.029 | 6.4 | 11.869 |
third_base | Nolan Arenado | 97 | 39 | 118 | 101 | 3 | 0.937 | 5.0 | 8.766 |
outfield | Bryce Harper | 100 | 35 | 102 | 122 | 10 | 0.984 | 5.6 | 8.646 |
first_base | Anthony Rizzo | 97 | 34 | 107 | 98 | 9 | 0.927 | 4.7 | 8.343 |
outfield | Mookie Betts | 100 | 24 | 90 | 73 | 23 | 0.871 | 5.6 | 8.184 |
first_base | Paul Goldschmidt | 101 | 31 | 103 | 147 | 17 | 0.927 | 4.3 | 7.618 |
second_base | Jose Altuve | 94 | 20 | 82 | 73 | 28 | 0.859 | 4.8 | 7.434 |
short | Carlos Correa | 96 | 30 | 113 | 121 | 8 | 0.894 | 6.1 | 6.873 |
first_base | Cody Bellinger | 91 | 39 | 110 | 159 | 13 | 0.882 | 3.6 | 6.782 |
outfield | Cody Bellinger | 91 | 39 | 110 | 159 | 13 | 0.882 | 3.6 | 6.782 |
Trout’s on top, not surprisingly.
There is considerable overlap between this list and the players with high projected WAR, but this list also accounts for projected stolen bases and projected strikeouts. A single total is much easier to keep track of in the moment.
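Cody Bellinger shows up twice in the table above because he is listed at both first base and outfield, and `top_n()` keeps tied rows. If one row per player is preferred, a sketch like the following could collapse the positions before ranking. It uses a toy data frame with made-up values, and `slice_max()` requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy stand-in for bat_z (made-up values); Player A is listed at two positions.
toy <- data.frame(
  Name = c("Player A", "Player A", "Player B", "Player C"),
  position = c("first_base", "outfield", "short", "catcher"),
  tot_z = c(6.8, 6.8, 5.0, 3.2)
)

top_unique <- toy %>%
  group_by(Name, tot_z) %>%
  # Combine a player's positions into one label, e.g. "first_base/outfield"
  summarise(position = paste(position, collapse = "/"), .groups = "drop") %>%
  # Keep the top 2 players by total z-score, one row each
  slice_max(tot_z, n = 2, with_ties = FALSE)
```

Grouping by name assumes that no two different players share a name; with the real dataset, `playerid` would be the safer key.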
Forgive the digression, but I’m still beating myself up about Buster Posey.
Going back to the question I explored in the last post about positional talent, would I have had a different pool of talent to choose from if I’d looked at z-scores instead of WAR?
bat_z %>%
  filter(position == 'catcher') %>%
  top_n(., 10, tot_z) %>%
  arrange(desc(tot_z)) %>%
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()
position | Name | R | HR | RBI | SO | SB | OPS | WAR | tot_z |
---|---|---|---|---|---|---|---|---|---|
catcher | Gary Sanchez | 72 | 31 | 90 | 115 | 3 | 0.842 | 3.5 | 3.259 |
catcher | Evan Gattis | 71 | 30 | 94 | 118 | 1 | 0.790 | 1.6 | 2.198 |
catcher | Buster Posey | 64 | 14 | 69 | 62 | 4 | 0.821 | 4.5 | 1.206 |
catcher | Willson Contreras | 66 | 20 | 77 | 116 | 6 | 0.800 | 3.0 | 0.773 |
catcher | Salvador Perez | 59 | 23 | 74 | 99 | 1 | 0.752 | 2.8 | -0.272 |
catcher | Brian McCann | 54 | 20 | 66 | 81 | 1 | 0.752 | 2.3 | -0.773 |
catcher | Jonathan Lucroy | 53 | 11 | 56 | 63 | 2 | 0.794 | 2.9 | -1.074 |
catcher | Yadier Molina | 54 | 12 | 67 | 73 | 6 | 0.724 | 2.5 | -1.179 |
catcher | Wilson Ramos | 49 | 20 | 67 | 83 | 1 | 0.739 | 2.1 | -1.261 |
catcher | J.T. Realmuto | 57 | 13 | 55 | 91 | 8 | 0.742 | 2.5 | -1.495 |
In addition to Gary Sanchez, whom I identified earlier, Gattis would also have been a good pick based on his projected home runs and runs batted in. His WAR is quite low, which is why he didn’t end up on my radar before. Let’s look at shortstops too, for completeness.
bat_z %>%
  filter(position == 'short') %>%
  top_n(., 10, tot_z) %>%
  arrange(desc(tot_z)) %>%
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()
position | Name | R | HR | RBI | SO | SB | OPS | WAR | tot_z |
---|---|---|---|---|---|---|---|---|---|
short | Carlos Correa | 96 | 30 | 113 | 121 | 8 | 0.894 | 6.1 | 6.873 |
short | Trea Turner | 89 | 16 | 66 | 119 | 49 | 0.793 | 3.7 | 6.200 |
short | Francisco Lindor | 92 | 24 | 83 | 84 | 15 | 0.842 | 5.8 | 5.656 |
short | Corey Seager | 89 | 24 | 87 | 120 | 4 | 0.853 | 5.2 | 3.458 |
short | Elvis Andrus | 80 | 12 | 69 | 88 | 23 | 0.745 | 2.1 | 2.386 |
short | Xander Bogaerts | 86 | 15 | 75 | 110 | 11 | 0.789 | 3.4 | 1.904 |
short | Trevor Story | 82 | 30 | 93 | 203 | 11 | 0.791 | 1.9 | 1.586 |
short | Ian Desmond | 69 | 20 | 74 | 128 | 16 | 0.781 | 0.5 | 1.402 |
short | Didi Gregorius | 72 | 21 | 80 | 83 | 5 | 0.743 | 2.6 | 1.375 |
short | Jean Segura | 77 | 13 | 56 | 93 | 23 | 0.720 | 2.0 | 1.175 |
Simmons isn’t even on the list! Ouch. Given who was available by the fourth round, I maintain that Bogaerts (identified based on projected WAR in the last post) might have been a fine pick, but it looks like Andrus might also have been helpful for his projected stolen bases and low projected strikeouts.
Let’s look at my pick for second base, since prior analysis determined that it was also a position with scarce offensive talent.
bat_z %>%
  filter(position == 'second_base') %>%
  top_n(., 10, tot_z) %>%
  arrange(desc(tot_z)) %>%
  select(position, Name, R, HR, RBI, SO, SB, OPS, WAR, tot_z) %>%
  knitr::kable()
position | Name | R | HR | RBI | SO | SB | OPS | WAR | tot_z |
---|---|---|---|---|---|---|---|---|---|
second_base | Jose Altuve | 94 | 20 | 82 | 73 | 28 | 0.859 | 4.8 | 7.434 |
second_base | Jose Ramirez | 92 | 21 | 84 | 67 | 20 | 0.849 | 4.8 | 6.574 |
second_base | Brian Dozier | 96 | 30 | 84 | 132 | 14 | 0.825 | 3.7 | 4.856 |
second_base | Daniel Murphy | 80 | 19 | 87 | 70 | 4 | 0.859 | 2.7 | 3.903 |
second_base | Rougned Odor | 85 | 31 | 92 | 139 | 14 | 0.776 | 1.7 | 3.809 |
second_base | Jonathan Schoop | 82 | 31 | 98 | 137 | 2 | 0.793 | 3.0 | 2.766 |
second_base | Robinson Cano | 78 | 23 | 88 | 91 | 2 | 0.795 | 2.9 | 2.501 |
second_base | Ian Happ | 75 | 27 | 82 | 157 | 11 | 0.798 | 2.1 | 1.726 |
second_base | Dee Gordon | 78 | 4 | 39 | 91 | 46 | 0.674 | 1.9 | 1.598 |
second_base | Whit Merrifield | 74 | 12 | 61 | 97 | 25 | 0.732 | 2.1 | 1.441 |
Given that no one who ranked above him was available in my league by the eighth round, Odor seems to be a reasonable pick.
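The catcher, shortstop, and second base tables all come from the same pipeline with a different position filter. As a sketch, that pattern could be wrapped in a small function. `top_by_position()` is a hypothetical helper, not part of the original analysis, and the toy data frame reuses a few `tot_z` values from the tables above.

```r
library(dplyr)

# Hypothetical helper: top n players at one position, ranked by total z-score.
top_by_position <- function(data, pos, n = 10) {
  data %>%
    filter(position == pos) %>%
    slice_max(tot_z, n = n) %>%   # returns rows sorted from highest to lowest
    select(position, Name, tot_z)
}

# Toy stand-in for bat_z with a handful of rows.
toy_bat_z <- data.frame(
  position = c("catcher", "catcher", "catcher", "short"),
  Name = c("Gary Sanchez", "Evan Gattis", "Buster Posey", "Carlos Correa"),
  tot_z = c(3.259, 2.198, 1.206, 6.873)
)

top_catchers <- top_by_position(toy_bat_z, "catcher", n = 2)
```

With the full dataset, `top_by_position(bat_z, "short") %>% knitr::kable()` should give a table like the shortstop one, minus the stat columns dropped by `select()`.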
Now that I know using z-scores would have changed my picks for catcher and shortstop (but not second base), I’m going to look at z-scores for the rest of my draft picks.
Below, I filtered the full dataset to only include players I drafted. For reference, my team name is “Dropped Third Strike”, after the obscure baseball rule (shortened here to DTS for object-naming). I added in information on draft order as well.
DTS_bat <- data.frame(
  Name = c("Mookie Betts", "Buster Posey", "Andrelton Simmons", "Edwin Encarnacion", "Rougned Odor", "Mike Moustakas", "Adam Jones", "Manuel Margot", "Brandon Crawford", "Max Kepler", "Brandon Belt", "Stephen Piscotty", "Maikel Franco", "Jose Peraza"),
  draft_order = c(1, 2, 4, 5, 8, 11, 12, 13, 18, 20, 21, 23, 24, 25),
  stringsAsFactors = FALSE
)
drafted <- inner_join(DTS_bat, bat_z, by = "Name")
drafted %>%
  select(Name, draft_order, position, tot_z) %>%
  knitr::kable()
Name | draft_order | position | tot_z |
---|---|---|---|
Mookie Betts | 1 | outfield | 8.184 |
Buster Posey | 2 | catcher | 1.206 |
Andrelton Simmons | 4 | short | 0.550 |
Edwin Encarnacion | 5 | first_base | 5.724 |
Rougned Odor | 8 | second_base | 3.809 |
Mike Moustakas | 11 | third_base | 2.995 |
Adam Jones | 12 | outfield | 2.180 |
Manuel Margot | 13 | outfield | -0.563 |
Brandon Crawford | 18 | short | -0.540 |
Max Kepler | 20 | outfield | 0.394 |
Brandon Belt | 21 | first_base | 1.091 |
Stephen Piscotty | 23 | outfield | -0.617 |
Maikel Franco | 24 | third_base | 1.819 |
Jose Peraza | 25 | second_base | -0.552 |
Jose Peraza | 25 | short | -0.552 |
I ended up drafting four batters with negative projected z-scores (meaning they are projected to perform below average): Margot, Crawford, Piscotty, and Peraza. And fandom bias strikes again, because I drafted three Giants, and no more than one player from any other team. I already mentioned that Buster Posey was not the best pick at catcher, but I also ended up picking up Crawford, who had mediocre to bad projected z-scores for many scoring categories. This was a bad pick, and I ended up dropping him early in the season.
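For anyone who wants to pull out the below-average picks programmatically rather than by eye, here is a minimal sketch. It uses a toy data frame with a subset of the `tot_z` values from the table above; `distinct()` is there because Peraza appears once per eligible position.

```r
library(dplyr)

# Subset of the drafted-players table, with tot_z values copied from above.
toy_drafted <- data.frame(
  Name = c("Mookie Betts", "Manuel Margot", "Brandon Crawford", "Max Kepler",
           "Stephen Piscotty", "Jose Peraza", "Jose Peraza"),
  position = c("outfield", "outfield", "short", "outfield",
               "outfield", "second_base", "short"),
  tot_z = c(8.184, -0.563, -0.540, 0.394, -0.617, -0.552, -0.552)
)

below_average <- toy_drafted %>%
  filter(tot_z < 0) %>%      # keep projected below-average batters
  distinct(Name, tot_z) %>%  # one row per player, regardless of position
  arrange(tot_z)             # lowest total z-score first
```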
I’m a little surprised by Franco, who has a rather high z-score for being available until the 24th round, and Belt for the same reason (21st round).
In the next post, I’ll wrap this up and look at how my players actually did, comparing the projection to the final 2018 data. Which players were truly bad picks? Who outperformed their projection? Stay tuned!