Author Topic: Advanced Analytics  (Read 424 times)

Offline bluestreak

  • Global Moderator
  • Posts: 11259
Re: Advanced Analytics
« Topic Start: June 17, 2019, 04:32:43 PM »
Two reasons.  One: there is a high chance that a different indicator captures the same causal relationship better (intercorrelation).  For example, team batting average is highly correlated with runs scored, which would lead you to focus on batting average.  It turns out that OPS correlates with runs scored more strongly than batting average alone.
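
For anyone who wants to check that kind of comparison themselves, here is a rough sketch of how you might compute the two correlations. The `pearson` helper and the variable names are just for illustration, and the numbers are made-up placeholders, not real team stats, so the printed values mean nothing on their own. Swap in actual team-season data to see how batting average and OPS each line up with runs scored.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up placeholder numbers, NOT real team stats (assumption for illustration).
team_avg = [0.245, 0.252, 0.260, 0.268, 0.271]
team_ops = [0.700, 0.735, 0.728, 0.770, 0.790]
team_runs = [650, 700, 705, 760, 800]

print("AVG vs runs:", round(pearson(team_avg, team_runs), 3))
print("OPS vs runs:", round(pearson(team_ops, team_runs), 3))
```

Keep in mind that a higher number for one stat only says it tracks runs more closely in that sample. It doesn't say anything by itself about which stat is doing the causing.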

Two (and this is rarer): there are traits that predict success much more reliably at lower levels than at higher ones.  The classic one is birth dates.  Because most youth baseball leagues use 7/31 as the age cutoff for each year, kids get picked for elite youth travel teams at a much higher rate if they happen to be among the oldest players in their cohort rather than the youngest.  My travel teams growing up had a lot of August/September/October kids, because a few extra months of physical maturity matter a lot when you're talking about 13-year-olds.  That trait is fairly highly correlated with performance even through high school, but it does not persist much at all beyond that.  In other words, a factor can be causal in one data set and not persist as a causal factor in a different one.

But batting average doesn't have a correlation without causation. It's highly correlated with runs scored because a higher batting average goes along with a higher OPS: hits are a major component of both OBP and slugging.
This isn't some random occurrence that happens to correlate. It's pretty clear that more base hits will cause you to score more runs. Just because another stat is better doesn't mean the other one isn't causative.
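
To make the "major component" point concrete, here is a quick sketch using the standard definitions (OBP = (H + BB + HBP) / (AB + BB + HBP + SF), SLG = TB / AB, OPS = OBP + SLG). The season line is hypothetical, picked only for illustration, and the function names are my own. It turns one out into a single and shows that batting average, OBP, and slugging all go up together.

```python
def batting_average(h, ab):
    """Hits per at-bat."""
    return h / ab

def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times on base over qualifying plate appearances."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(tb, ab):
    """Slugging percentage: total bases per at-bat."""
    return tb / ab

def ops(h, bb, hbp, ab, sf, tb):
    """On-base plus slugging."""
    return obp(h, bb, hbp, ab, sf) + slg(tb, ab)

# Hypothetical season line (assumption for illustration, not a real player).
base = dict(h=130, bb=50, hbp=5, ab=500, sf=5, tb=210)

# Same line with one out turned into a single: one more hit, one more total base,
# same number of at-bats.
plus_one = dict(base, h=base["h"] + 1, tb=base["tb"] + 1)

for label, line in (("base", base), ("one extra single", plus_one)):
    avg = batting_average(line["h"], line["ab"])
    print(label,
          "AVG:", round(avg, 3),
          "OBP:", round(obp(line["h"], line["bb"], line["hbp"], line["ab"], line["sf"]), 3),
          "SLG:", round(slg(line["tb"], line["ab"]), 3),
          "OPS:", round(ops(**line), 3))
```

Since every hit shows up in the numerator of both OBP and slugging, batting average and OPS can't help but move together to a large degree.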