FIP vs. SIERA, the value of WAR, and how much batting average matters: The Athletic’s advanced sta

Some days when I am looking at baseball statistics, I hear the voice of David Byrne in my head, singing “Once In A Lifetime” by the Talking Heads: “You may ask yourself, ‘Well, how did I get here?'”

Such is the experience of being a contemporary baseball writer who nearly failed Algebra I in high school.

Now, when writing about the Yankees for The Athletic, I’ve come to rely heavily on advanced statistics to help me find new trends in player performance or correct the impression I’ve gotten from employing the eye test only. Unfortunately, the conversation around traditional statistics versus advanced statistics is often full of emotion, with the new-school field of stats seeming vast and inaccessible to many baseball fans.

Baseball is no longer measured in RBI and ERA alone, however. The information that teams are using to make roster and in-game decisions goes so far beyond a traditional slash line that it can be difficult to understand the thought process behind some of the game’s biggest moments.

The question is often, where to start? There are glossaries on FanGraphs, MLB and SABR, and our colleague at The Athletic, Keith Law, wrote a book called “Smart Baseball” that addresses the evolution of statistics in baseball.

In this mailbag, Lindsey Adler (me) and The Athletic’s Eno Sarris will attempt to answer some of our reader questions about analytics and hopefully provide some clarity on some key concepts.

Note: Submitted questions have been edited for clarity and length.

I’m so confused by defensive advanced stats in general. By the eye test, Yankees third baseman Gio Urshela is a tremendous defender, but by my understanding, the advanced stats see him more like average.

Urshela’s Outs Above Average (Statcast’s defensive metric) at third base in 2021 was minus-5, putting him 38th of 43 qualified MLB third basemen last year. I watch Urshela every day — and more importantly, I hear former third baseman Aaron Boone talk about Urshela all the time — and those of us who watch him closely are pretty damn impressed by his work at third.

Here’s a rough gist of how OAA evaluates a fielder: How far he has to run to catch the ball; how quickly he gets to the ball; how far he is from the base he’s throwing to; and how fast the runner is. Basically: How well does a fielder time up the play while handling a moving baseball and a moving baserunner?

Urshela rates as better to his right than to his left (I can confirm the eye test bears this out), but one way to think about him specifically versus what OAA is measuring is this: When Urshela gets to the ball, he always makes a great (and often successful) attempt at the play. He may not always get to the ball in time, which I think is why his attempts sometimes look so exciting/ridiculous/unbelievable.

He rates well on the eye test for a few reasons, in my opinion: 1) Bias of proximity — he took the third base job from Miguel Andújar and has for two years played next to Gleyber Torres, who struggled at short; 2) he’s fun as heck to watch and makes some gutsy plays; 3) his skill set (dealing with the ball once he gets to it) isn’t exactly what OAA and other range-based metrics are measuring.

— Lindsey

I get a little confused with FIP and SIERA. From what I understand, they’re opposites but I’m not sure which is the better metric to use to judge a pitcher.

A lot of the advanced metrics when it comes to pitching simply do not compute, so then I go by the eye test, which is apparently a very boomer-y thing to do (and that is the last thing I want to be).

Pitching metrics are confusing! At their heart is a disagreement about to what extent a pitcher can control outcomes on batted balls. A pitcher can elicit ground balls and fly balls, mostly by pitching to different locations in the zone, but also because certain types of movement matter. But once the ball is in play, the league-wide batting average on balls in play (BABIP) doesn’t stray much from about .290 in any given season, and individual pitchers tend to oscillate around that number.

That truth is the thinking behind FIP (Fielding Independent Pitching), which uses only three things as its inputs: strikeouts, walks, home runs. It’s a powerful stat, and it’s at least more predictive of future ERA than ERA itself. But it’s stuck between being descriptive (explaining what has happened) and being predictive (explaining what will happen). Including home runs allowed ends up adding a lot of noise, because we don’t know a pitcher’s true ability to limit home runs until we have nearly three seasons of data from that pitcher.
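FIP’s arithmetic is simple enough to sketch. The version below follows the commonly published FanGraphs form, which also folds hit-by-pitches in with walks. The league constant that rescales FIP to the ERA scale changes every season, so the 3.10 default here is a placeholder assumption, not any real season’s value, and the sample line is hypothetical.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching from home runs, walks/HBP and strikeouts.

    `ip` must be true innings (outs / 3), not box-score notation such as
    180.1 for 180 1/3 innings. `constant` is a placeholder; the real one is
    recalculated each season so that league FIP matches league ERA.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Home runs cost the most, walks and HBP less; strikeouts lower the number.
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season: 20 HR, 50 BB, 5 HBP, 200 K in 180 innings.
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=180), 2))  # 3.24
```

The heavy 13x weight on home runs is one reason a fluky home-run season can swing FIP so much.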

If FIP is the hammer, then SIERA (Skill-Interactive Earned Run Average) tries to be more precise by adding batted-ball inputs and adjusting for them. One of the findings from SIERA research was that pitchers with elite ground-ball rates had lower batting averages on balls in play. So ground-ball rate is a positive in SIERA, but it’s not really accounted for in FIP. The full formula is complicated.

SIERA is more predictive than FIP. That probably means it’s describing the true talent of the pitcher better than FIP, too. There have been advancements and other pitching estimators (like Pitching+ at The Athletic) that are even more predictive, and most of them attempt to tackle exactly how much control a pitcher has over balls in play. But in terms of easy-to-find estimators on a leaderboard, SIERA is pretty good, and better than FIP.

— Eno

What do the pluses and minuses signify and when/why are they useful?

The pluses/minuses signify that a statistic puts a player’s performance on a scale relative to the league average, adjusted for factors like the ballpark he plays in. We know a hitter’s OPS in Colorado might be inflated versus, say, in San Francisco, where the marine layer creates an obstacle for aerodynamics and the dimensions are pretty big (and they were previously bigger).

Baseball is a funny sport. It’s typically played outside with no requirements for standard field dimensions or climate control. Imagine if NBA players played the same game on a variety of court dimensions.

Something like OPS+ takes a player’s raw OPS (Shohei Ohtani had a .965 OPS in 639 PA in 2021) and tells you whether a player was above average or below average, and by how much, using factors like the way a ballpark affects hitters or pitchers to somewhat neutralize the variety across the sport. “Average” for OPS+ is set at 100. Ohtani’s .965 OPS in 2021 was equivalent to a 158 OPS+, meaning his park-adjusted production at the plate was 58 percent better than the average MLB hitter’s by this measure.
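OPS+ is not simply a player’s OPS divided by the league’s. Baseball-Reference’s published definition compares on-base percentage and slugging to the league separately, roughly as sketched below. This version skips the park adjustment Baseball-Reference applies to the league figures, and the inputs in the example are hypothetical.

```python
def ops_plus(obp: float, slg: float, lg_obp: float, lg_slg: float) -> float:
    """OPS+ as 100 * (OBP / lgOBP + SLG / lgSLG - 1).

    Baseball-Reference park-adjusts lg_obp and lg_slg; this sketch assumes
    the caller supplies whatever league baseline they want to compare against.
    """
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


# Hypothetical hitter (.360 OBP, .500 SLG) in a hypothetical league (.320/.400).
print(round(ops_plus(0.360, 0.500, 0.320, 0.400), 1))  # 137.5
```

By this formula, a hitter whose OBP and SLG both exactly match the league baseline lands at 100.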

Personally, I find things like OPS+, wRC+ and ERA+ to be pretty useful for writing purposes. It’s easy to include OPS+ with OPS and I don’t expect most readers to perceive the scale between a .900 OPS and a .965 OPS. (Frankly, I don’t, either.)

— Lindsey

Is batting average a useless statistic?

Batting average is a good statistic. It is not a statistic that tells the full story of a player’s production. Those things can co-exist.

The issue with batting average is that some people demonize it and some valorize it. Don’t trust anyone who reduces a ballplayer’s performance to his batting average; don’t trust anyone who disregards a player’s batting average as irrelevant.

For a player like Michael Brantley, who hit .311/.362/.437 with eight home runs in 2021, batting average is absolutely relevant to his performance. That’s his most reliable means of generating value. For Joey Gallo, who generates his value through walks and hitting for power, batting average still isn’t irrelevant, in part because he understands that his low average means he has to generate value by other means.

To me, the key is understanding that no statistic is perfect, and that’s especially true of some of the standard “old-school” statistics. This is why we have a variety of statistics and why the original champions of sabermetrics were basically advocating for more creativity in assessing player performance. Batting average shouldn’t carry that burden alone.

Let’s do more to understand on-base percentage and hitting for extra bases so that average can be there as part of a little group instead of just out on its lonesome. That’s right: A slash line is canonically a group of little baseball stat friends. Let’s not banish one of the friends just because his other friends are a little bit hotter.

— Lindsey

I understand what BABIP means, but I was curious if it is always the case where a pitcher or batter will regress back to the mean average. How is it used in sorting out “this player is actually just good” versus “this player just got lucky for a year”?

It’s true that league-wide batting average on balls in play tends to stick close to .290 every year, but hitters have more control over their BABIP than pitchers. Simple evidence for this is that the spread in BABIPs, even among large-sample pitchers, is smaller than the spread for hitters. For example, among pitchers with at least 1,000 innings over the last decade, BABIPs range from .313 (Martin Perez) to .261 (Marco Estrada). Among hitters with at least 3,000 plate appearances, they range from .351 (Mike Trout) to .244 (Brian McCann).
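BABIP itself is a simple ratio: hits that weren’t home runs, divided by balls put in play. The sketch below uses the standard formula; the sample numbers are hypothetical.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Home runs and strikeouts are removed because fielders never get a chance
    at them; sacrifice flies are added back because they are balls in play.
    """
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical hitter: 150 H, 20 HR, 550 AB, 120 K, 5 SF.
print(f"{babip(150, 20, 550, 120, 5):.3f}")  # 0.313
```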

A few skills come into play when it comes to BABIP. How hard does the batter hit the ball? How fast is the batter? Batters who have high exit velocities and run fast have higher BABIPs. With the advent of the shift, left-handers who pull the ball on the ground a lot have low BABIPs. Also, fly balls have lower BABIPs than ground balls, so extreme fly-ball hitters will tend to have lower BABIPs.

All of that is to say that a high BABIP on Mike Trout is not the same as a high BABIP on Anthony Rizzo. Looking at current BABIP versus projected BABIP is a good way to get the most out of this stat. Rizzo, a lefty who hits fly balls and isn’t very fast, is projected for a .267 BABIP on FanGraphs, while Trout is projected for one over .300.

Lastly, no batter consistently puts up BABIPs close to .400. So Tim Anderson (.372 BABIP last year) and Austin Riley (.368 BABIP) are likely to see some batting average regression, regardless of their ability to hit the ball hard and run fast.

— Eno

Is there a risk with WAR discussions being treated as absolute numbers as opposed to approximations, especially with potential for use in award bonuses (or if MLB has their way, arbitration)?

I think so, yes. In November 2020, Bill James posted a piece on his website about some of the flaws with Wins Above Replacement, a concept that (and I’m speaking a little on his behalf here) has sort of become a runaway train in his opinion. Here’s part of his blog post:

“Look, I am not saying that WAR has no value, or that no system of WAR could ever (be) developed that is somewhat reliable. What I am saying is:

1) That the systems of WAR that we have now, while of course they are generally accurate in many cases, are not at all reliable,

2) That the primary reason that they are not reliable is not because of errors in any particular component, but rather, because in the calculation of a comparison derivative, there is the potential that the sum of the errors could be greatly magnified,

3) It is insane to rely on the outcome of a comparison derivative based on estimates, unless those estimates are fantastically accurate, and

4) It will be decades before sabermetrics has accurate estimates of all of the components of performance evaluation, if we ever get there. We certainly will never get there in my lifetime.”
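One way to see James’s second point is with arithmetic. WAR-style stats subtract a replacement-level baseline from an estimate of a player’s value, and when both numbers are uncertain, the errors do not shrink along with the subtraction. The numbers below are hypothetical and are not from James’s post; the sketch only shows how a small relative error on each input can become a large relative error on the difference.

```python
import math


def difference_with_error(a: float, a_err: float, b: float, b_err: float):
    """Return (a - b, worst-case error, independent-error estimate)."""
    diff = a - b
    worst = a_err + b_err               # both errors push in the same direction
    typical = math.hypot(a_err, b_err)  # independent errors, added in quadrature
    return diff, worst, typical


# Hypothetical: 100 runs of estimated value vs. an 80-run replacement
# baseline, each estimate uncertain by 5 percent.
diff, worst, typical = difference_with_error(100.0, 5.0, 80.0, 4.0)
print(f"{diff:.0f} runs, +/- {worst:.0f} worst case ({worst / diff:.0%})")
print(f"+/- {typical:.1f} if errors are independent ({typical / diff:.0%})")
```

A 5 percent uncertainty on each input becomes roughly a 30 to 45 percent uncertainty on the 20-run gap, which is the kind of magnification that makes a 0.1-win difference hard to take seriously.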

To me, WAR is a tool to get you in the ballpark of player evaluation, not the final word on it. Think about the guys who make up a baseball team. You have guys who look like Giancarlo Stanton and guys who look like Jose Altuve. Or you can have a Sergio Romo and a Max Scherzer. Individual skill sets still matter in baseball, and the things we value in players change fairly often. I don’t expect to ever have a statistic that perfectly measures a player’s value when there are so many variables.

The way I think about WAR is that it’s an imperfect guide to determine where a player’s season falls on the spectrum from bad to good. I’m not necessarily going to get wrapped up in parsing the difference between Vladimir Guerrero Jr. accumulating 6.7 fWAR in 2021 versus Bryce Harper accumulating 6.6 fWAR. That 0.1 win above replacement, in my opinion, shouldn’t be a determining factor in things like award voting and especially in player arbitration.

— Lindsey

I would love to understand wRC+ versus OPS+. Both are a baseline of 100 and seem to be closely correlated, but are also different.

What’s the simplest way to understand wOBA?

The good news is that they are both pretty good, and they are tightly correlated because they measure the same things. Both look at how often the batter does good things at the plate (get on base, hit for power, etc.) and then park- and league-adjust them so that 100 is an average position player.

OPS has a slight flaw, though: It doesn’t weight each event at the plate quite correctly. You can tell it’s got some flaws just by noticing that the denominator for slugging percentage (at-bats) is different from the denominator for on-base percentage (plate appearances), and those are the two ingredients for OPS.
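The denominator mismatch is easy to see when OBP and SLG are written out. This is a sketch of the standard formulas with hypothetical counting stats. Strictly speaking, OBP’s denominator is close to, but not exactly, plate appearances: sacrifice bunts and catcher’s interference are left out.

```python
def on_base_pct(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP: times on base divided by (roughly) plate appearances."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_pct(h: int, doubles: int, triples: int, hr: int, ab: int) -> float:
    """SLG: total bases divided by at-bats only (walks never enter)."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def ops(h, doubles, triples, hr, bb, hbp, ab, sf) -> float:
    # Two ratios with different denominators, simply added together.
    return on_base_pct(h, bb, hbp, ab, sf) + slugging_pct(h, doubles, triples, hr, ab)


# Hypothetical line: 150 H (30 2B, 2 3B, 25 HR), 60 BB, 5 HBP, 500 AB, 5 SF.
print(f"{ops(150, 30, 2, 25, 60, 5, 500, 5):.3f}")  # 0.895
```

Note also that a walk adds to OBP but contributes nothing to SLG, while a home run counts four times as much as a single in SLG; neither weighting comes from how many runs those events actually produce.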

wRC+ is based on wOBA, or weighted on-base average. wOBA attempts to weight each event based on how much it improves a team’s chances of scoring runs, and then adds all those events together. For example, with nobody out and nobody on, a team in today’s game can expect to score 0.53 runs in an inning. With someone on second and no outs, a team can expect to score 1.17 runs. That pegs the true value of a double to a team at 0.64 runs. Using this run expectancy setup, you can assign a value in runs to all of the different things a batter can do at the plate, and then sum it up in wOBA.
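The run-expectancy arithmetic in that paragraph, and the wOBA sum it feeds, can be sketched as follows. The two run-expectancy values come from the text; the event weights are illustrative assumptions, in the neighborhood of recent FanGraphs weights but not any season’s actual values. FanGraphs derives its real weights from run values across all base-out states and rescales them to the OBP scale, a step this sketch skips.

```python
# Run expectancy for two base-out states, taken from the paragraph above.
RUN_EXPECTANCY = {
    ("bases empty", 0): 0.53,
    ("runner on second", 0): 1.17,
}

# A leadoff double moves the inning from one state to the other.
double_run_value = RUN_EXPECTANCY[("runner on second", 0)] - RUN_EXPECTANCY[("bases empty", 0)]
print(round(double_run_value, 2))  # 0.64

# Illustrative linear weights (an assumption, not a specific season's values).
ILLUSTRATIVE_WEIGHTS = {
    "ubb": 0.69, "hbp": 0.72, "1b": 0.88, "2b": 1.25, "3b": 1.58, "hr": 2.03,
}


def woba(ab, bb, ibb, hbp, sf, singles, doubles, triples, hr,
         weights=ILLUSTRATIVE_WEIGHTS) -> float:
    """Weighted on-base average: run-weighted events over FanGraphs' denominator."""
    numerator = (
        weights["ubb"] * (bb - ibb)  # intentional walks are excluded
        + weights["hbp"] * hbp
        + weights["1b"] * singles
        + weights["2b"] * doubles
        + weights["3b"] * triples
        + weights["hr"] * hr
    )
    return numerator / (ab + bb - ibb + sf + hbp)


# Hypothetical hitter: 500 AB, 60 BB (5 intentional), 5 HBP, 5 SF,
# 93 singles, 30 doubles, 2 triples, 25 home runs.
print(f"{woba(500, 60, 5, 5, 5, 93, 30, 2, 25):.3f}")
```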

Long story short: wOBA and wRC+ are more precise, but OPS and OPS+ are very similar, and if you’re used to them, the added precision may not be worth much to you.

— Eno

What does “expected” mean in the context of things like xwOBA?

“Expected” stats are basically a Statcast creation, built on different information than what most non-MLB-employed statisticians use in their research. A statistic like wOBA is a way of packaging data about what happened on the field: How often did a player get on base, and in what way?

Statcast can use things like exit velocity and launch angle to basically build a database of outcomes: A ball hit at X mph at an X-degree angle has wound up as a hit/double/home run X percent of the time. This is an attempt at measuring what a hitter did to create a good outcome for himself, regardless of defense. A well-struck line drive from a right-handed hitter might have one outcome with Nolan Arenado at third base and another with Rafael Devers there.
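The “database of outcomes” idea can be sketched as a lookup table: bucket past batted balls by exit velocity and launch angle, record how often each bucket became a hit, then score new batted balls against it. This is a simplified stand-in, not Statcast’s actual model, which is more sophisticated than fixed bins; the bucket sizes, the fallback value and the tiny history are all hypothetical.

```python
from collections import defaultdict

# Hypothetical bucket sizes: 3 mph of exit velocity by 4 degrees of launch angle.
EV_WIDTH_MPH = 3.0
LA_WIDTH_DEG = 4.0


def bucket(exit_velo: float, launch_angle: float) -> tuple[int, int]:
    return int(exit_velo // EV_WIDTH_MPH), int(launch_angle // LA_WIDTH_DEG)


def build_hit_rates(history):
    """history: iterable of (exit_velo_mph, launch_angle_deg, was_hit)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ev, la, was_hit in history:
        key = bucket(ev, la)
        totals[key] += 1
        hits[key] += int(was_hit)
    return {key: hits[key] / totals[key] for key in totals}


def expected_hit_rate(batted_balls, hit_rates, unseen=0.0):
    """Average hit probability over a player's batted balls.

    Real xBA also counts strikeouts as hitless at-bats; this covers contact only.
    Buckets with no history fall back to `unseen` (an arbitrary assumption).
    """
    probs = [hit_rates.get(bucket(ev, la), unseen) for ev, la in batted_balls]
    return sum(probs) / len(probs)


history = [
    (100.0, 20.0, True), (101.0, 21.0, True), (100.5, 22.0, False),  # hard liners
    (70.0, 50.0, False), (71.0, 49.0, False),                         # weak pop-ups
]
rates = build_hit_rates(history)
print(round(expected_hit_rate([(100.2, 20.5), (70.5, 51.0)], rates), 3))  # 0.333
```

If a player’s actual results run well below an estimate like this, that gap is the “unlucky” signal.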

Hitting a well-struck line drive down the line is still a pretty good outcome for a hitter, even if they get victimized by a fielder or field dimensions, or any other number of factors. An expected statistic helps measure the things that are within a hitter or pitcher’s control.

You’ll see this come up from beat writers throughout the season because many of us watch games with the Statcast live data page open on our laptops. If a hitter makes good contact and it winds up as an out, the page shows what the “expected” batting average was, which tells you whether the hitter was unlucky to hit it where he did or whether the ball seemed more spectacular off the bat than it really was.

Understandably, many people do not care about expected statistics because at the end of the day, results are what matter. However, it is at least a little bit different for a player to go 1-for-4 with a robbed hit that kept him from going 2-for-4 than to go 1-for-4 with three routine outs.

Another non-Statcast statistic that uses this theory is Pythagorean win-loss record, which you can find on a team’s Baseball-Reference page. The formula basically uses runs scored and runs allowed to tell you how many wins and losses a team would be expected to have given that ratio.

The 2021 Mariners had a 90-72 team record despite giving up 51 more runs than they scored throughout the season. Their Pythagorean win-loss record was 76-86, which is so drastic that it’s like the Mariners were doing a season-long bit when it comes to one-run games. (They even acknowledged their run differential by saying their “fun differential” was off the charts.) What we learn from this 14-game discrepancy is that the Mariners were really damn good in close games and that they basically operated at a non-sustainable pace all season. (By definition, though, it turned out to be sustainable for them.)
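The Pythagorean expectation is a one-line formula. The sketch below uses the 1.83 exponent that Baseball-Reference is generally described as using (Bill James’s original version squared the runs); the run totals are hypothetical, chosen only to match the Mariners’ minus-51 differential.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 1.83) -> float:
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_record(runs_scored, runs_allowed, games=162, exponent=1.83):
    wins = round(games * pythagorean_win_pct(runs_scored, runs_allowed, exponent))
    return wins, games - wins


# Hypothetical totals with a minus-51 run differential.
print(expected_record(700, 751))  # (76, 86)
```

With these inputs the sketch lands on the same 76-86 expected record, which is why a 90-win season with that differential stands out.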

This can help us answer the question: Were the 2021 Mariners a good team or just lucky? They were lucky, and using other data, we can also determine that they were very good in very difficult scenarios.

— Lindsey

Why is the position penalty so severe for first basemen as far as WAR is concerned? Seems insane that Keith Hernandez, one of the best defensive first basemen of all time, is barely above neutral w/r/t defensive fWAR.

To make the defensive adjustments, you have to look at how players perform both offensively and defensively at multiple positions. With first base being at the bottom of the defensive spectrum, meaning the “easiest” position to play defensively, teams end up putting older and slower players at the position. Our ability to measure defense is always improving, but FanGraphs still uses the spectrum outlined by current MLBAM data architect Tom Tango in 2006 on his blog.

Defensive spectrum WAR adjustments

Position        Adjustment
Catcher         +12.5 runs
Shortstop       +7.5 runs
Second Base     +2.5 runs
Third Base      +2.5 runs
Center Field    +2.5 runs
Left Field      -7.5 runs
Right Field     -7.5 runs
First Base      -12.5 runs
DH              -17.5 runs

*All per 162 defensive games

These factors may have added noise to the older position penalties used in WAR. Recent research seems to suggest the first base penalty is too large: Jeff Zimmerman looked in 2015 and thought the old range was too wide and that the penalty for first basemen should be smaller. A source confirmed that FanGraphs has considered changing the penalties and may do so in the future, but has not changed them since they were first implemented.

Though Zimmerman’s new number for first base (minus-9.5 runs) doesn’t seem that far off from the old one, the difference adds up over a long career like the one Hernandez had. He could have racked up as many as three or four more wins over his career and ended up closer to Ernie Banks than Tony Perez on the all-time list. Then again, all the other first basemen would also get more benefit, so his place among them would remain about the same.
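The Hernandez arithmetic can be roughed out by prorating the per-162-game adjustments from the table above. The 2,000-game career and the 10-runs-per-win conversion are assumptions for illustration (FanGraphs’ actual runs-per-win figure varies by season), not Hernandez’s actual totals.

```python
# Positional adjustments in runs per 162 defensive games (from the table above).
POSITION_ADJUSTMENT = {
    "C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5,
}


def positional_runs(position: str, games: int, table=POSITION_ADJUSTMENT) -> float:
    """Prorate a per-162-game adjustment to the games actually played."""
    return table[position] * games / 162


def career_wins_gained(games: int, old: float = -12.5, new: float = -9.5,
                       runs_per_win: float = 10.0) -> float:
    """Extra WAR from a smaller first-base penalty (runs_per_win is an assumption)."""
    return (new - old) * games / 162 / runs_per_win


print(round(positional_runs("1B", 2000), 1))  # -154.3
print(round(career_wins_gained(2000), 1))     # 3.7
```

About 37 extra runs over 2,000 games works out to a bit under four wins, in line with the three-or-four estimate.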

— Eno

Less hardcore analytics, more pitch development-y: Why does pitch shape matter, how can you change it, and how can a layperson use analytics to track change/efforts to improve? 

When we developed Stuff+ to try to capture the relative importance of all of the physical properties of a pitch, it did turn out that velocity was the heaviest lifter — but movement also predicted the quality of a pitch well. Here are the features in the model, ranked by their relative importance. 

Vertical movement difference and horizontal movement difference — movement as defined off of the primary fastball — are very important to the outcome of a pitch. Things like ride on the fastball (how much it “jumps” at the hitter) and drop on the breaking ball can be as influential on the outcome of a pitch as velocity. 
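“Movement off the primary fastball” is just a subtraction. The sketch below is hypothetical: the pitch values are made up, and it assumes one consistent sign convention for horizontal break, which real data sources don’t always share across left- and right-handers.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    name: str
    vertical_in: float    # induced vertical break in inches (positive = "ride")
    horizontal_in: float  # horizontal break in inches (sign convention assumed)


def movement_off_fastball(pitch: Pitch, fastball: Pitch) -> tuple[float, float]:
    """Vertical and horizontal movement difference relative to the primary fastball."""
    return (pitch.vertical_in - fastball.vertical_in,
            pitch.horizontal_in - fastball.horizontal_in)


four_seam = Pitch("four-seam", vertical_in=16.0, horizontal_in=8.0)
slider = Pitch("slider", vertical_in=1.0, horizontal_in=-4.0)
print(movement_off_fastball(slider, four_seam))  # (-15.0, -12.0)
```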

The raw materials for pitch movement are spin rate, spin axis and seam orientation. Spin rate is tough to change without sticky stuff, but the other two are where coaches shine. By changing the axis, you can help a pitcher harness more of their spin rate (be more efficient with their spin) or just change the raw movement. By changing the seam orientation, you can take advantage of seam effects like seam-shifted wake.

All you’re trying to do, as you can see with this heat map that shows Jesús Luzardo’s fastballs (in black, with league average in gray, and league outcomes in color, with red as good), is move them from the blue closer to the red. Here, Luzardo would benefit greatly from more ride (positive vertical movement) on the four-seam or more glove-side fade (negative horizontal movement) on the sinker. 

Actually making those improvements happen can be a big ask, as a pitcher’s arm slot matters, and that is really difficult to change. But most often it’s about trying different grips and cues (like: come around the ball, stay through the ball, keep the wrist stiff, imagine it coming off your middle finger) while throwing in front of the high-speed cameras and tracking machines. The coach checks the data against the desired outcome and when it matches what they are trying to do, they give the pitcher an atta-boy. Enough atta-boys and you’ve got a new pitch. 

For those of us trying to track this on the outside, there are changes in Stuff+ to notice, and the color-coded movement profiles at Baseball Savant are also helpful. Vertical movement is generally more important than horizontal movement, but when a pitcher is doing better, and their movements here have changed, that’s always interesting. 

— Eno

Why should I, a casual fan, care about anything Statcast/advanced baseball statistics related?

The short answer is, unfortunately, that a casual fan should care about advanced statistics (to an extent) because front offices care about advanced statistics. Now, front offices have access to massive volumes of data and are in many cases using proprietary algorithms that may be slightly to drastically different from what we use in the public sphere, so even reading FanGraphs regularly won’t give you the same degree of knowledge as a front office.

However, I worry a lot about the gap between how the public views baseball and baseball players and how teams view baseball and baseball players. I imagine if you are a fan who primarily values batting average and runs batted in, the contemporary game is very confusing. I imagine it is very difficult to watch someone like Joey Gallo, who creates value through a high on-base percentage and slugging, and understand why he’s worth a spot in the center of the lineup. (His second half with the Yankees in 2021 was significantly worse than his first half with Texas, but even Rangers fans who have watched his entire career find the low batting average to be frustrating.)

I don’t think most fans need to know many of the things we’re addressing in this mailbag, such as parsing the difference between defensive statistics. But I do imagine it’s pretty difficult to understand and really enjoy baseball right now if you don’t understand the importance of things like on-base percentage or the way a high spin rate on a fastball can make a pitcher with a 4.50 ERA seem appealing to another team at the trade deadline.

My opinion is that it’s kind of a crisis for baseball that team-side player evaluation is so drastically different from how the public evaluates players and that gap is only going to continue to grow. But there are some ways fans can come to understand team decisions a little bit better, which might help with some confusion and frustration.

— Lindsey

If you ran a baseball broadcast, what stat lines would you put on the chyron when a batter comes up/pitcher takes the hill?

We’ll both give this one a go.

For hitters, I’d go with average, on-base percentage, slugging percentage, OPS+ and WAR. That might surprise you, after I said that wOBA and wRC+ were more precise than their OPS brethren, but sometimes more precision isn’t worth all the explanation required to get people up and running. A lot of fans know what OPS is, so let’s show them that because it helps them understand how good the batter is at making contact, getting on base and hitting for power. Then add the park-adjusted one to boot. I would include WAR because I do believe it’s important to get a holistic sense of a player’s capabilities in all facets of the game.

For pitchers, it’s a more difficult question. Everyone knows ERA, so that’s going on the board, as is innings pitched. I think I would focus on making it fun and easy to follow along in each plate appearance by listing out their pitches, either with percentages thrown or velocities or both. Is that possible? “Four-Seamer, 97 mph, 54% usage.” How good is this guy, and what does he throw? Those are the two questions I have when a pitcher steps to the mound, provided of course everyone agrees it’s crazy to put Stuff+/Location+/Pitching+ up there.

– Eno

Great question. My only real qualm with ERA is that one eight-run outing for a starter can really inflate it, so I’d just like more context: something like average innings pitched per outing and average runs allowed per appearance (for a starter, at least). This is where I’d like to put ERA+ up there, because who can actually keep track of what an average ERA is these days? I agree with Eno that we could have things like batter-handedness splits, or even two-strike figures, maybe in place of small-sample pitcher-versus-batter stats. Basically, I’d like to add just a bit of context that tells you more about how a pitcher has typically performed in an outing. Telling me that a starter (and especially a reliever) has a 4.00 ERA doesn’t tell me much about what his typical outing is like.
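The point about one blowup start is easy to check with arithmetic. The starts below are hypothetical, as is the 4.00 league ERA; the ERA+ here is the simple form (100 times league ERA divided by the pitcher’s ERA), without the park adjustment Baseball-Reference applies.

```python
from statistics import median


def era(earned_runs: float, innings: float) -> float:
    """Earned runs per nine innings; `innings` must be true innings (outs / 3)."""
    return 9 * earned_runs / innings


def era_plus(pitcher_era: float, league_era: float) -> float:
    """Simplified ERA+ (no park adjustment): above 100 is better than league average."""
    return 100 * league_era / pitcher_era


# Hypothetical starter: nine starts of 6 IP / 2 ER, then one 3 IP / 8 ER blowup.
starts = [(6.0, 2)] * 9 + [(3.0, 8)]

before = era(sum(er for _, er in starts[:-1]), sum(ip for ip, _ in starts[:-1]))
after = era(sum(er for _, er in starts), sum(ip for ip, _ in starts))
typical_er = median(er for _, er in starts)

print(f"ERA before: {before:.2f}, after: {after:.2f}")  # 3.00, 4.11
print(f"median earned runs per start: {typical_er}")    # 2.0
print(f"ERA+ vs. a 4.00 league: {era_plus(after, 4.00):.0f}")
```

The median start is still two earned runs, which is the kind of “typical outing” context a single ERA figure can hide.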

For hitters, I think I’d go with a standard slash line, but probably with the addition of wRC+ or OPS+ to help contextualize overall offensive contributions. Maybe I’d even add league-average figures for batting average/on-base percentage/slugging, or something else that helps demonstrate where a player makes his offensive mark.

— Lindsey

(Photo: Hunter Martin / Getty Images)
