Topic: Advanced Analytics

varoadking

  • Posts: 29499
  • King of Goodness
Advanced Analytics
« Topic Start: June 13, 2019, 12:51:41 AM »
Love it...

Scott Van Pelt on ESPN just referred to them as "Made up stuff."

But then, we already knew that...


Mathguy

  • Posts: 9162
  • Floyd - Truely Man's best Friend
    • Outer Banks Beach House
Re: Advanced Analytics
« Reply #1: June 15, 2019, 11:45:23 AM »
That's tough to identify at times. There are data known as spurious statistics: the numbers exist, but they carry no meaning for understanding the cause and effect of a given circumstance.

Quote from: varoadking
Love it...

Scott Van Pelt on ESPN just referred to them as "Made up stuff."

But then, we already knew that...



mitlen

  • Posts: 66171
  • We had 'em all the way.
Re: Advanced Analytics
« Reply #2: June 15, 2019, 12:08:38 PM »
I'm with Samuel L. Clemens, who popularized the line "There are three kinds of lies: lies, damned lies, and statistics." In the 21st century, we can add advanced analytics.

bluestreak

  • Global Moderator
  • Posts: 11259
Re: Advanced Analytics
« Reply #3: June 15, 2019, 12:26:24 PM »
Quote from: Mathguy
That's tough to identify at times. There are data known as spurious statistics: the numbers exist, but they carry no meaning for understanding the cause and effect of a given circumstance.

Wut?

Elvir Ovcina

  • Posts: 5542
Re: Advanced Analytics
« Reply #4: June 17, 2019, 03:10:35 PM »
Quote from: bluestreak
Wut?

The causation/correlation problem. 

HalfSmokes

  • Posts: 21606
Re: Advanced Analytics
« Reply #5: June 17, 2019, 03:13:07 PM »
Quote from: Elvir Ovcina
The causation/correlation problem.

For the purposes of baseball scouting, if a trait is so heavily correlated with success that it is hard to tell if causation is present, why would a talent evaluator care that there is no causation?

Elvir Ovcina

  • Posts: 5542
Re: Advanced Analytics
« Reply #6: June 17, 2019, 03:32:47 PM »
Quote from: HalfSmokes
For the purposes of baseball scouting, if a trait is so heavily correlated with success that it is hard to tell if causation is present, why would a talent evaluator care that there is no causation?

Two reasons.  One, because there is a high chance that there is a different indicator that captures the same causal relationship better (intercorrelation).  For example, team batting average has a high correlation with runs scored.  That would lead you to focus on batting average.  It turns out that OPS has a higher correlation than batting average alone.   

Two (and this is rarer): there are traits that predict success much more reliably at lower levels than higher ones.  The classic one is birth dates.  Because most youth baseball leagues use 7/31 as the age cutoff for each year, kids get picked for youth elite travel teams at a much higher rate if they happen to be one of the oldest players in their cohort than the youngest.  My travel teams growing up were a lot of August/September/October kids, because a few extra months of physical maturity matters a lot when you're talking about 13-year-olds.  That trait is fairly highly correlated to performance even through high school, but it does not persist much at all beyond that.   In other words, there is causation in one data set that will not persist as a causal factor in a different one. 
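The birth-date effect described above (often called the relative age effect) reduces to date arithmetic: how many days after the cutoff a player was born determines how much older he is than the youngest kids in his cohort. This is a minimal sketch that takes the 7/31 cutoff from the post as its default; the function name is hypothetical, and the cutoff is a parameter because leagues do not all use the same date.

```python
from datetime import date


def days_after_cutoff(birth: date, cutoff_month: int = 7, cutoff_day: int = 31) -> int:
    """Days between the start of a player's age cohort and his birth date.

    A cohort is assumed to run from the day after the cutoff through the cutoff
    date of the following year, so 0 means the oldest possible member and the
    largest value (364 or 365) means the youngest.
    """
    cutoff_this_year = date(birth.year, cutoff_month, cutoff_day)
    if birth > cutoff_this_year:
        cohort_cutoff = cutoff_this_year
    else:
        cohort_cutoff = date(birth.year - 1, cutoff_month, cutoff_day)
    # Subtract 1 so a birthday on the day after the cutoff counts as day 0.
    return (birth - cohort_cutoff).days - 1
```

With the default cutoff, an August 1 birthday returns 0 (oldest in the cohort), an October 15 birthday returns 75, and a July 31 birthday returns 364, or 365 when the cohort spans a February 29.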

HalfSmokes

  • Posts: 21606
Re: Advanced Analytics
« Reply #7: June 17, 2019, 03:39:30 PM »
Quote from: Elvir Ovcina
Two reasons.  One, because there is a high chance that there is a different indicator that captures the same causal relationship better (intercorrelation).  For example, team batting average has a high correlation with runs scored.  That would lead you to focus on batting average.  It turns out that OPS has a higher correlation than batting average alone.

Two (and this is rarer): there are traits that predict success much more reliably at lower levels than higher ones.  The classic one is birth dates.  Because most youth baseball leagues use 7/31 as the age cutoff for each year, kids get picked for youth elite travel teams at a much higher rate if they happen to be one of the oldest players in their cohort than the youngest.  My travel teams growing up were a lot of August/September/October kids, because a few extra months of physical maturity matters a lot when you're talking about 13-year-olds.  That trait is fairly highly correlated to performance even through high school, but it does not persist much at all beyond that.   In other words, there is causation in one data set that will not persist as a causal factor in a different one. 

In both of your examples, you can easily see that correlation isn’t causation.

Elvir Ovcina

  • Posts: 5542
Re: Advanced Analytics
« Reply #8: June 17, 2019, 03:52:29 PM »
Quote from: HalfSmokes
In both of your examples, you can easily see that correlation isn’t causation.

Except that it took decades of baseball scouting and then nerds moving into the front office to figure either one out, so apparently it's not so intuitive to baseball scouts. 

HalfSmokes

  • Posts: 21606
Re: Advanced Analytics
« Reply #9: June 17, 2019, 04:02:46 PM »
Quote from: Elvir Ovcina
Except that it took decades of baseball scouting and then nerds moving into the front office to figure either one out, so apparently it's not so intuitive to baseball scouts.

The nerds have a say in front offices now (hopefully). If you’re talking about math guy’s example, it would be more along the lines of whether or not a high school coach should care that birthday and skill aren’t causally related; that may be the case, but if birthday and high school performance are so strongly correlated, why should he care?

UMDNats

  • Posts: 18063
Re: Advanced Analytics
« Reply #10: June 17, 2019, 04:10:59 PM »
Quote from: varoadking
Love it...

Scott Van Pelt on ESPN just referred to them as "Made up stuff."

But then, we already knew that...



Baseball itself is just "made-up stuff." Everything is made up. The only things that aren't are death and the Nationals choking in the NLDS. Happy Monday!

Elvir Ovcina

  • Posts: 5542
Re: Advanced Analytics
« Reply #11: June 17, 2019, 04:12:27 PM »
Quote from: HalfSmokes
The nerds have a say in front offices now (hopefully). If you’re talking about math guy’s example, it would be more along the lines of whether or not a high school coach should care that birthday and skill aren’t causally related; that may be the case, but if birthday and high school performance are so strongly correlated, why should he care?

Well, in high school they are causally related, so it's not the greatest example.  The problem overall is picking up on the wrong signal.   It could be that there's a better signal, or it could be that there's a problem in extrapolating going forward.  But the issue either way is that statistics in scouting is an optimization problem.  There's no perfect player, so if you zoom in on the wrong set of numbers, you'll get suboptimal results regardless of whether your problem is true spurious correlation (pretty rare in large data sets like these) or intercorrelation or some other issue.    I do worry that some of the newest generation of stats are getting into thorny statistical territory. 

HalfSmokes

  • Posts: 21606
Re: Advanced Analytics
« Reply #12: June 17, 2019, 04:15:36 PM »
My feeling is that even if you’re picking up on noise, if that noise is correlated strongly enough with a signal that it’s hard to distinguish the two, there may not be much reason to exclude the noise, especially considering how deep a data set baseball provides for testing theories.

bluestreak

  • Global Moderator
  • Posts: 11259
Re: Advanced Analytics
« Reply #13: June 17, 2019, 04:32:43 PM »
Quote from: Elvir Ovcina
Two reasons.  One, because there is a high chance that there is a different indicator that captures the same causal relationship better (intercorrelation).  For example, team batting average has a high correlation with runs scored.  That would lead you to focus on batting average.  It turns out that OPS has a higher correlation than batting average alone.

Two (and this is rarer): there are traits that predict success much more reliably at lower levels than higher ones.  The classic one is birth dates.  Because most youth baseball leagues use 7/31 as the age cutoff for each year, kids get picked for youth elite travel teams at a much higher rate if they happen to be one of the oldest players in their cohort than the youngest.  My travel teams growing up were a lot of August/September/October kids, because a few extra months of physical maturity matters a lot when you're talking about 13-year-olds.  That trait is fairly highly correlated to performance even through high school, but it does not persist much at all beyond that.   In other words, there is causation in one data set that will not persist as a causal factor in a different one.

But batting average doesn't have a correlation without causation. It's highly correlated because a higher batting average is correlated with a higher OPS, since it's a major component of both OBP and slugging.
This isn't some random occurrence that happens to correlate. It's pretty clear that more base hits will cause you to score more runs. Just because another stat is better doesn't mean the other one isn't causative.
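The point that hits feed both halves of OPS follows from the standard rate-stat definitions: BA = H/AB, OBP = (H + BB + HBP)/(AB + BB + HBP + SF), SLG = TB/AB, and OPS = OBP + SLG. This is a minimal sketch; the class and field names are hypothetical, and the batting line in the usage note is made up.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int       # at-bats
    h: int        # hits (all types)
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies

    @property
    def avg(self) -> float:
        return self.h / self.ab

    @property
    def obp(self) -> float:
        # Every hit is in the numerator, so hits raise OBP directly.
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    @property
    def slg(self) -> float:
        # Every hit is worth at least one total base, so hits raise SLG directly.
        singles = self.h - self.doubles - self.triples - self.hr
        total_bases = singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        return total_bases / self.ab

    @property
    def ops(self) -> float:
        return self.obp + self.slg


line = BattingLine(ab=500, h=150, doubles=30, triples=2, hr=20, bb=50, hbp=5, sf=5)
print(f"AVG {line.avg:.3f}  OBP {line.obp:.3f}  SLG {line.slg:.3f}  OPS {line.ops:.3f}")
```

Turning one out into a single (same at-bats, one more hit) raises AVG, OBP and SLG together, which is why batting average and OPS move in step rather than by coincidence.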

Elvir Ovcina

  • Posts: 5542
Re: Advanced Analytics
« Reply #14: June 17, 2019, 05:05:09 PM »
Quote from: bluestreak
But batting average doesn't have a correlation without causation. It's highly correlated because a higher batting average is correlated with a higher OPS, since it's a major component of both OBP and slugging.
This isn't some random occurrence that happens to correlate. It's pretty clear that more base hits will cause you to score more runs. Just because another stat is better doesn't mean the other one isn't causative.

That's true (it's not a true spurious correlation), but it's still suboptimal. I'm having trouble coming up with true spurious correlations in baseball stats, which makes sense, as the causal mechanisms are usually so intuitive.

JCA-CrystalCity

  • Global Moderator
  • Posts: 39410
  • Platoon - not just a movie, a baseball obsession
Re: Advanced Analytics
« Reply #15: June 19, 2019, 02:44:35 PM »
VaRK is forbidden from reading this because it has numbers that back its main point, which is that Anthony Rendon is better than ever.

https://blogs.fangraphs.com/anthony-rendon-keeps-getting-better/

You can sneak a peek at the conclusion.
Quote
There are no two ways about it: Anthony Rendon is hitting the ball harder than he ever has, better than he ever has. Maybe it’s a byproduct of his improved selectivity and aggression on the first pitch, and maybe not, but it certainly looks to my eyes like the two are connected. Some things with Rendon never change. He still has the same stance, the same keen eye and patient disposition at the plate. He still doesn’t get the recognition he deserves as one of the best players in baseball. This year, however, things might be different. If he keeps hitting like he has and maintains his aggression on first-pitch strikes while getting ahead in the count when pitchers don’t challenge him, he’ll be impossible to ignore. That’s the theory, at least. In his career so far, Rendon has been both excellent and underrated, and that’s one trend that hasn’t changed in 2019.