I wonder which teams will end up making the best use of it. Having a staff with the skill and equipment to build new models from the data, for a payoff that is probably years down the line, seems like a huge investment.
I'm not going to answer this on the forum (for obvious reasons), but you have to keep in mind that the models are only good if you keep the same staff in place long enough to see the return on your investment. Think about the changeover that happens when a GM is fired or when a team is sold (like the Padres), and the hiring and development process that each new "administration" goes through to get its "system" in place. You could spend years getting this to give you the projections you want and end up fired before you get a chance to implement it.
For all the technical benefits, the lack of historical data is a shame.
Again, I'm not going to state why, but you have to keep in mind that most of the data being captured now was never captured before, or wasn't captured to the degree that it is now. What's going to be interesting is what they will be able to track as the data storage methods change (defensive plays can take 10-25 GB per play), and what impact that will have moving forward. It's interesting, but I think for most "consumers" this crap is going to get abused to support dumbass narratives and theories.
Most of the pitching-related data is already captured with PitchF/X, and you can already get access to the rich, raw data using pitchRx in R (http://cpsievert.github.io/pitchRx/). Of course, one of the issues with PitchF/X (which is probably not the case with the rest of this data) is that the pitch types and breaks are based on self-reported data from the pitchers. A guy who says he's throwing a two-seam fastball but actually ends up throwing a cutter will have that pitch show up as a two-seamer if it doesn't break.