Author Topic: Stats. Giggity! (Read 40048 times)

blue911 · « **Reply #350:** March 23, 2015, 05:11:02 PM »

Quote from: houston-nat on March 23, 2015, 04:44:36 PM

They're using wGIDP? Which I guess is a new weighted double play thingamajig?

Is there an xGIDP? Where they could see how many ground balls were hit in double play situations and come up with an expected number? A weighted stat would still use GIDP, which isn't context neutral.

houston-nat · « **Reply #351:** March 23, 2015, 05:12:55 PM »

Quote from: blue911 on March 23, 2015, 05:11:02 PM

Is there an xGIDP? Where they could see how many ground balls were hit in double play situations and come up with an expected number? A weighted stat would still use GIDP, which isn't context neutral.

Maybe this link and comments section will help?

blue911 · « **Reply #352:** March 23, 2015, 05:14:42 PM »

No, that wouldn't work either. Maybe something along the lines of how fly balls are used in xFIP.
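Here's a rough sketch of what I mean, in Python. xFIP swaps a pitcher's actual home runs for his fly balls times the league HR/FB rate; the same trick for hitters would swap actual GIDP for ground balls hit in double play situations times a league rate. To be clear, "xGIDP" is not a published stat as far as I know, the function names are made up, and the numbers in the usage lines are invented for illustration only.

```python
def league_gidp_rate(league_gidp: int, league_gb_in_dp_situations: int) -> float:
    """League-wide share of DP-situation ground balls that became double plays."""
    if league_gb_in_dp_situations <= 0:
        raise ValueError("need at least one ground ball in a DP situation")
    return league_gidp / league_gb_in_dp_situations


def expected_gidp(gb_in_dp_situations: int, league_rate: float) -> float:
    """Hypothetical xGIDP: a hitter's DP-situation ground balls times the league rate.

    This mirrors how xFIP replaces actual HR with FB * league HR/FB.
    """
    if gb_in_dp_situations < 0:
        raise ValueError("ground ball count cannot be negative")
    if not 0.0 <= league_rate <= 1.0:
        raise ValueError("league rate must be between 0 and 1")
    return gb_in_dp_situations * league_rate


# Invented numbers, purely to show the arithmetic.
rate = league_gidp_rate(league_gidp=500, league_gb_in_dp_situations=2000)
print(expected_gidp(gb_in_dp_situations=40, league_rate=rate))
```

Comparing that expected number to a hitter's actual GIDP would separate "hits few ground balls" from "beats out more than his share of them", which a weighting of raw GIDP can't do.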

blue911 · « **Reply #353:** March 23, 2015, 05:19:16 PM »

Quote from: houston-nat on March 23, 2015, 05:12:55 PM

Maybe this link and comments section will help?

Nothing. Nor can I dig it out of Google. They need to explain the guts of wGDP.

MarquisDeSade · « **Reply #354:** March 24, 2015, 08:28:45 AM »

Quote from: blue911 on March 23, 2015, 05:19:16 PM

Nothing. Nor can I dig it out of Google. They need to explain the guts of wGDP.

You and I both know that's not going to happen. This is why I can't take FanGraphs seriously. Well, that and half the commenters and writers failing to understand basic math.

blue911 · « **Reply #355:** March 24, 2015, 08:37:31 AM »

Quote from: MarquisDeSade on March 24, 2015, 08:28:45 AM

You and I both know that's not going to happen. This is why I can't take FanGraphs seriously. Well, that and half the commenters and writers failing to understand basic math.

I'm not sure why or how you go about the weighting process. Adam Dunn didn't hit ground balls for the same reason he struck out in bunches, hit a crap ton of home runs and had a low batting average. His swing plane. Period. It wasn't because he was some anti-GIDP monster. And no, if he struck out less, he wouldn't have hit into a significantly greater number of double plays.

PC · « **Reply #356:** April 03, 2015, 03:25:43 PM »

baseball-reference play index has a free trial through the 15th

http://www.sports-reference.com/blog/2015/03/get-a-free-baseball-reference-play-index-trial-through-april-15/

mitlen · « **Reply #357:** April 03, 2015, 06:35:10 PM »

Nice story on Beane and his former guys and stats ladies on NewsHour/PBS tonight. Should be on WETA around 7:30 PM.

blue911 · « **Reply #358:** April 17, 2015, 07:22:48 AM »

Introduction to Statcast

http://m.mlb.com/news/article/118508858/major-league-baseballs-statcast-glossary-of-terms-of-state-of-the-art-tracking-technology

PebbleBall · « **Reply #359:** April 17, 2015, 09:15:04 AM »

Quote from: blue911 on April 17, 2015, 07:22:48 AM

Introduction to Statcast

http://m.mlb.com/news/article/118508858/major-league-baseballs-statcast-glossary-of-terms-of-state-of-the-art-tracking-technology

I wonder how long it will take to provide useful context for some of this. For all the technical benefits, the lack of historical data is a shame.

HalfSmokes · « **Reply #360:** April 17, 2015, 09:23:51 AM »

I wonder which teams will end up making the best use of it - having a staff with the skill and equipment to build new models using the data for a payoff that is probably years down the line seems like a huge investment

blue911 · « **Reply #361:** April 17, 2015, 09:28:47 AM »

Quote from: PebbleBall on April 17, 2015, 09:15:04 AM

I wonder how long it will take to provide useful context for some of this. For all the technical benefits, the lack of historical data is a shame.

Never. The data is proprietary to MLB and they aren't going to share it with any outside source. Fast, Wyers, McCracken, Lichtman, Pavlidis et al. have either been hired by an MLB club or are working under contract with MLB and have signed nondisclosure agreements. PitchF/X wasn't supposed to be made public and the owners aren't happy that it's public knowledge.

What the public will get is a marketing campaign from MLB. The teams have been receiving the component level of the data for about 5 years.

MarquisDeSade · « **Reply #362:** April 17, 2015, 09:34:29 AM »

Quote from: HalfSmokes on April 17, 2015, 09:23:51 AM

I wonder which teams will end up making the best use of it - having a staff with the skill and equipment to build new models using the data for a payoff that is probably years down the line seems like a huge investment

I'm not going to answer this on the forum (for obvious reasons) but you have to keep in mind that the models are only good if you keep the same staff in place to see the return on your investments. Think about the changeover that happens when a GM is fired or when a team is sold (like the Padres) and the hiring and development process that each new "administration" undergoes to get their "system" in place. You could spend years getting this to give you the projections you want and end up fired before you get a chance to implement it.

Quote from: PebbleBall on April 17, 2015, 09:15:04 AM

For all the technical benefits, the lack of historical data is a shame.

Again, I'm not going to state why, but you have to keep in mind that most of the data being captured now was never captured before, or wasn't to the degree that it is now. What's going to be interesting is that as the data storage methods change (defensive plays can take up to 10-25G per play) what will they be able to track and what impact will that have moving forward. It's interesting, but I think for most "consumers" this crap is going to get abused to support dumbass narratives and theories.

Most of the pitching-related data are already captured with PitchF/X, and you can already get access to the rich, raw data using pitchRx in R (http://cpsievert.github.io/pitchRx/). Of course, one of the issues with PitchF/X (which is probably not the case with the rest of this data) is that the pitch types and breaks are based on self-reported data from the pitchers. A guy who says he's throwing a two-seam fastball but is actually throwing a failed cutter will have that pitch show up as a two-seamer if it doesn't break.

HalfSmokes · « **Reply #363:** April 17, 2015, 09:37:26 AM »

Quote from: MarquisDeSade on April 17, 2015, 09:34:29 AM

I'm not going to answer this on the forum (for obvious reasons) but you have to keep in mind that the models are only good if you keep the same staff in place to see the return on your investments. Think about the changeover that happens when a GM is fired or when a team is sold (like the Padres) and the hiring and development process that each new "administration" undergoes to get their "system" in place. You could spend years getting this to give you the projections you want and end up fired before you get a chance to implement it.

That's why I wonder, it's an investment for something that you'd expect to bear fruit a decade down the line.

PebbleBall · « **Reply #364:** April 17, 2015, 09:37:53 AM »

Quote from: blue911 on April 17, 2015, 09:28:47 AM

Never. The data is proprietary to MLB and they aren't going to share it with any outside source. Fast, Wyers, McCracken, Lichtman, Pavlidis et al. have either been hired by an MLB club or are working under contract with MLB and have signed nondisclosure agreements. PitchF/X wasn't supposed to be made public and the owners aren't happy that it's public knowledge.

What the public will get is a marketing campaign from MLB. The teams have been receiving the component level of the data for about 5 years.

I only mean for fan purposes. For the information that's public, at what point will we look at it and have some rough idea of how these numbers look against a meaningful standard?

MarquisDeSade · « **Reply #365:** April 17, 2015, 09:39:49 AM »

Quote from: blue911 on April 17, 2015, 09:28:47 AM

Never. The data is proprietary to MLB and they aren't going to share it with any outside source. Fast, Wyers, McCracken, Lichtman, Pavlidis et al. have either been hired by an MLB club or are working under contract with MLB and have signed nondisclosure agreements. PitchF/X wasn't supposed to be made public and the owners aren't happy that it's public knowledge.

What the public will get is a marketing campaign from MLB. The teams have been receiving the component level of the data for about 5 years.

That's true to a certain extent. Cleveland had their own GroundF/x system in place for years before the camera based tracking systems started getting tested. This "new system" is really just an expansion on the old systems that were in place with better cameras and computer processors to speed up the analysis. I do wonder if there's going to be a "hidden in plain sight" API or XML data stream for this data like there is for PitchF/X. Most people (read: 99%) wouldn't know what to do with it if they had it but for those of us that do it would be pretty cool to figure out what went wrong or right during a particular play or at bat. At the same time, "consumer" level analysis (read: FanGraphs) is so piss poor and based on bad data that releasing this to the general public would just add more fuel to the "bad analysis" fire.

PebbleBall · « **Reply #366:** April 17, 2015, 09:40:26 AM »

Quote from: MarquisDeSade on April 17, 2015, 09:34:29 AM

Again, I'm not going to state why, but you have to keep in mind that most of the data being captured now was never captured before, or wasn't to the degree that it is now. What's going to be interesting is that as the data storage methods change (defensive plays can take up to 10-25G per play) what will they be able to track and what impact will that have moving forward. It's interesting, but I think for most "consumers" this crap is going to get abused to support dumbass narratives and theories.

This is exactly what I'm talking about. Moving forward, how long would it take to build a foundation so that it can have some broad meaning when somebody cites it?

MarquisDeSade · « **Reply #367:** April 17, 2015, 09:45:36 AM »

Quote from: HalfSmokes on April 17, 2015, 09:37:26 AM

That's why I wonder, it's an investment for something that you'd expect to bear fruit a decade down the line.

It's a solution looking for a problem. Whoever developed and sold this did a great job on it. The problem is that most teams are going to have to decide how much time and, more importantly, FTEs they're going to put into using this versus the way they've done it before. For some teams I doubt they're even using the systems they've had in place for years, much less anything that takes actually spending time and money to understand and tweak.

Quote from: PebbleBall on April 17, 2015, 09:37:53 AM

I only mean for fan purposes. For the information that's public, at what point will we look at it and have some rough idea of how these numbers look against a meaningful standard?

You'll get some cool graphics and get to see Brian Kenny suffer a boner-induced coma, but otherwise the typical fan isn't going to get to use this to build arguments or narratives outside of what MLB allows to be broadcast or transmitted. Besides, do you really need to know the infinitesimal details of effort, time, and other captured data it took for Lorenzo Cain to cover 75' to rob Joe Mauer of a double when you can just watch the highlights on MLB.com or SportsCenter?

MarquisDeSade · « **Reply #368:** April 17, 2015, 09:47:15 AM »

Quote from: PebbleBall on April 17, 2015, 09:40:26 AM

This is exactly what I'm talking about. Moving forward, how long would it take to build a foundation so that it can have some broad meaning when somebody cites it?

Unless MLB releases it in a pseudo-public way like they did for PitchF/X (which is only available by stripping XML data), never. You can always subscribe to it but, based on what the quotes were from Stats when I did side work for an odds maker, I don't think you'd want to pay that premium.
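For anyone wondering what "stripping XML data" looks like in practice, here's a minimal Python sketch. The sample XML is invented, and the element and attribute names (`pitch`, `pitch_type`, `start_speed`, `pfx_x`, `pfx_z`) are my assumptions about how a PitchF/X-style feed is laid out, so treat this as an illustration rather than a description of the real stream.

```python
import xml.etree.ElementTree as ET

# Invented sample; the real feed's structure and attribute names may differ.
SAMPLE_XML = """
<inning num="1">
  <top>
    <atbat batter="111" pitcher="222">
      <pitch pitch_type="FF" start_speed="94.1" pfx_x="-4.2" pfx_z="9.8"/>
      <pitch pitch_type="FC" start_speed="89.5" pfx_x="1.1" pfx_z="5.0"/>
    </atbat>
  </top>
</inning>
"""


def extract_pitches(xml_text: str) -> list[dict]:
    """Pull one dict per <pitch> element, converting numeric attributes to float."""
    root = ET.fromstring(xml_text.strip())
    pitches = []
    for pitch in root.iter("pitch"):
        pitches.append(
            {
                "pitch_type": pitch.get("pitch_type"),
                "start_speed": float(pitch.get("start_speed")),
                "pfx_x": float(pitch.get("pfx_x")),
                "pfx_z": float(pitch.get("pfx_z")),
            }
        )
    return pitches


for row in extract_pitches(SAMPLE_XML):
    print(row)
```

Once the pitches are plain rows like this, they can go into whatever analysis tool you prefer.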

PebbleBall · « **Reply #369:** April 17, 2015, 09:47:24 AM »

Quote from: MarquisDeSade on April 17, 2015, 09:45:36 AM

Besides, do you really need to know the infinitesimal details of effort, time, and other captured data it took for Lorenzo Cain to cover 75' to rob Joe Mauer of a double when you can just watch the highlights on MLB.com or SportsCenter?

No, of course not. I'm only trying to build my case for dismissing the tidal wave of trivial information it's going to produce.

MarquisDeSade · « **Reply #370:** April 17, 2015, 09:53:41 AM »

Quote from: PebbleBall on April 17, 2015, 09:47:24 AM

No, of course not. I'm only trying to build my case for dismissing the tidal wave of trivial information it's going to produce.

My advice would be to just ignore it. I'm sure the "Sunday Night Baseball" donks and Brian Kenny and Tom Verducci will have all kinds of worthless data to throw out there but, and this is just me, I'd ignore it and pretend it's fantasy. I'm sure teams are going to get some worthwhile data and actionable analysis out of this but the average fan is going to get cherry picked and poorly explained metrics that have no impact on the game in question or any meaningful context for comparative analysis.

blue911 · « **Reply #371:** April 17, 2015, 10:10:19 AM »

Quote from: MarquisDeSade on April 17, 2015, 09:34:29 AM

I'm not going to answer this on the forum (for obvious reasons) but you have to keep in mind that the models are only good if you keep the same staff in place to see the return on your investments. Think about the changeover that happens when a GM is fired or when a team is sold (like the Padres) and the hiring and development process that each new "administration" undergoes to get their "system" in place. You could spend years getting this to give you the projections you want and end up fired before you get a chance to implement it.

Again, I'm not going to state why, but you have to keep in mind that most of the data being captured now was never captured before, or wasn't to the degree that it is now. What's going to be interesting is that as the data storage methods change (defensive plays can take up to 10-25G per play) what will they be able to track and what impact will that have moving forward. It's interesting, but I think for most "consumers" this crap is going to get abused to support dumbass narratives and theories.

Most of the pitching-related data are already captured with PitchF/X, and you can already get access to the rich, raw data using pitchRx in R (http://cpsievert.github.io/pitchRx/). Of course, one of the issues with PitchF/X (which is probably not the case with the rest of this data) is that the pitch types and breaks are based on self-reported data from the pitchers. A guy who says he's throwing a two-seam fastball but is actually throwing a failed cutter will have that pitch show up as a two-seamer if it doesn't break.

A story came out around a year ago (maybe two, hell, I'm old) about an MLB team buying a Cray, and people thought it was crazy to spend so much on a top-end machine. I'm not an expert, but it appears that a high-end machine would be a nice, but not necessary, purchase/toy.

blue911 · « **Reply #372:** April 17, 2015, 10:16:51 AM »

Quote from: PebbleBall on April 17, 2015, 09:47:24 AM

No, of course not. I'm only trying to build my case for dismissing the tidal wave of trivial information it's going to produce.

I bang on FanGraphs because they are the site that most people misuse, so adding some skepticism is meant to act as a brake on bad "analytics". Most people only cite stats that back whatever argument they happen to be involved with at the time and don't look at what these metrics are attempting to measure (thus the horrible misunderstanding about UZR).

MarquisDeSade · « **Reply #373:** April 17, 2015, 10:17:39 AM »

Quote from: blue911 on April 17, 2015, 10:10:19 AM

A story came out around a year ago (maybe two, hell, I'm old) about an MLB team buying a Cray, and people thought it was crazy to spend so much on a top-end machine. I'm not an expert, but it appears that a high-end machine would be a nice, but not necessary, purchase/toy.

If it costs you $1M for the hardware to build lineup projection models or to identify "injury markers", that's money well spent. I mean, this is just me talking, but I would have preferred Rizzo and the Nats spend $7M on data servers than on the less than 100 innings we got out of Chien-Ming Wang.

blue911 · « **Reply #374:** April 17, 2015, 10:20:15 AM »

Quote from: MarquisDeSade on April 17, 2015, 10:17:39 AM

If it costs you $1M for the hardware to build lineup projection models or to identify "injury markers", that's money well spent. I mean, this is just me talking, but I would have preferred Rizzo and the Nats spend $7M on data servers than on the less than 100 innings we got out of Chien-Ming Wang.

Yep and I don't know a damn thing about data warehousing when used in a production environment.