Friday, 20 February 2015

Thoughts on PDO

I'm going to presume if you're here that you know a bit about PDO

*cue sage nodding*

or at least what it stands for.

*silence*


It doesn't stand for anything, but we'll not get hung up on that.  That's the NHL's job, for now.  James Grayson popularised its use in football (indeed, his treasure trove of a blog houses 57 articles on the subject; if you're not up to speed, spend a weekend over there!) and it is a very useful tool that helps describe teams that appear to be over- or underachieving in relation to some of their underlying numbers.

For example, since losing to Man City, Man Utd have recorded only one defeat in 15 games and have had a giddily high PDO rating; a clear example of a team that is overperforming against reasonable expectation.  Arsenal have put up solid shot numbers, certainly better than last season, when high PDO kicked them nominally into title contention.  Later it cooled a little, they dropped off and found their way back to 4th, a more accurate reflection.  This year their PDO has been in the bin and they're two points behind Utd despite projecting to be much better, and are down in amongst the CL pack.

More examples? West Ham have spent the whole season with a high PDO, as have Swansea.  They look to be having good years.  More so than, say, Newcastle or Stoke, who are both generating a low figure.  I suspect the general skill level of these four teams to be broadly similar (in fact a model I've built ranks Stoke 9th, Newcastle 10th, West Ham 11th and Swansea 14th) and the divergence from baseline numbers can be at least partially explained by PDO variance.

So what we have is a good storytelling aid which I find interesting and good fun.

"But the table doesn't lie"

*grits teeth*

No, but it doesn't reveal all its truths either...

Despite generally liking PDO (and in amongst the stats community not everyone does) I do have issues with its current construction.  Because it has been ported straight over from hockey, parts of its derivation have been left unquestioned and I think there are a couple of subtle tweaks which may enhance general understanding.  You say tomato, I say tomato.  Hmm... doesn't work so well on paper.
Never mind.

As it stands PDO is:

(goals for divided by shots on target for) plus (saves divided by shots on target against)

AKA

(shooting%) plus (save%)

...then you get a grey area where some people multiply the derived figure by 1000, some by 100 or others, well, me at least, leave it as a decimal. 
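
As a rough sketch of that calculation in Python (the function name, the `scale` parameter and the example figures are my own inventions for illustration, not real team data, and I'm assuming every shot on target that isn't a goal counts as a save):

```python
def pdo(goals_for, sot_for, goals_against, sot_against, scale=1.0):
    """Traditional PDO: shooting% plus save%, optionally scaled to 100 or 1000."""
    shooting_pct = goals_for / sot_for
    # Assumption: a shot on target against that is not a goal is a save.
    saves = sot_against - goals_against
    save_pct = saves / sot_against
    return (shooting_pct + save_pct) * scale


# Hypothetical team: 30 goals from 100 shots on target for,
# 25 goals conceded from 100 shots on target against.
print(round(pdo(30, 100, 25, 100), 3))        # left as a decimal
print(round(pdo(30, 100, 25, 100, 100), 1))   # "rated to 100"
print(round(pdo(30, 100, 25, 100, 1000)))     # "rated to 1000"
```

Same team, same numbers, three different-looking answers depending on which scale you happen to prefer.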

Immediately in the definition and the creation of the number, we have a couple of issues.
  • Why are we adding entirely different aspects together? (shooting% and save%)
  • Why isn't there a standardised numerical format?
I asked Ben Pugsley, because he knows a ton more than I do: "Why is it rated to 100? Or 1000?" and he said that it was a "long held thing in analytics (...) from baseball (...) 100 is defined as an 'average'" and that the usage of 1000 was purely a method to add detail and see 4 digits.

So we have elements of "this is the way it's done because this is the way it's always been done."
That's not intrinsically bad, but to me we have these figures orbiting around seemingly meaningless totals:

"PDO is 102!"
"One hundred and two whats?"

"Man Utd are at 1120!"
"I'll fit my brunch in beforehand."

Can anything be done to simplify matters?

I am presuming there was a comfort found in adding your team's shooting percentage to its save percentage; you have built a single figure for your team and you are defining what your team is doing, but I feel there is more clarity in the entirely related but subtly different:

"What is my team doing and what is the opposition doing against us?"

I propose this, and I propose it with goodwill but little expectation:

(goals for divided by shots on target for) minus (goals against divided by shots on target against)

AKA:

(shooting % For) minus (shooting% Against)

This does two things:
  • We are no longer orbiting around an arbitrary number, we are centering around positive or negative.  A high PDO will be positive and a low PDO will be negative.  Understanding is relatable here, we have the universal law of goal difference: positive is good, negative is bad.  It makes sense.
  • We are combining the For and Against aspect of the same metric.  Just as we subtract Goals Against from Goals For to create Goal Difference, we do the same to create PDO or "Shooting% difference".
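And why is it entirely related? The eagle-eyed will have noticed that we've essentially derived the same number as PDO; we've just decluttered it a bit.  Save% is simply one minus the opposition's shooting%, so the new figure is the old PDO minus one.  The PDO of 107 or 1070 is now defined as 0.07.  A PDO of 982 is now -0.018 (-0.02 to two decimal places).  Average is zero.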
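
A minimal Python sketch of the proposal, for anyone who wants to check the sums (the function names and the example team are hypothetical, and the conversion assumes saves are shots on target against minus goals against):

```python
def shooting_pct_difference(goals_for, sot_for, goals_against, sot_against):
    """Proposed metric: shooting% for minus shooting% against."""
    return goals_for / sot_for - goals_against / sot_against


def from_pdo(pdo_value, scale):
    """Convert a traditional PDO on a given scale (1, 100 or 1000) to the difference form."""
    # Traditional PDO averages out at 1.0 (or 100, or 1000), so divide by
    # the scale and subtract one to centre the figure on zero.
    return pdo_value / scale - 1.0


# Hypothetical team: 30 goals from 100 shots on target for,
# 25 goals conceded from 100 shots on target against.
print(round(shooting_pct_difference(30, 100, 25, 100), 2))

# The same figure whichever old scale you start from.
print(round(from_pdo(107, 100), 2))
print(round(from_pdo(1070, 1000), 2))
```

The conversion is only there to translate old figures; calculated from scratch, the difference needs no scale at all.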

"We are neither positive nor negative! We perform exactly as you might suspect! We are a zero team!"

I'd recommend leaving it as a two-digit decimal.  In football, PDO variance rarely requires the extra digits of precision that hockey desired.  A decimal is also, and again we're basing things on simple logic, the number derived from the calculation.  No further work is required.

There is plenty of work to be done to cross over many of the metrics used in analytics and there is great complexity in a lot of the modelling that goes on.  Such work is invaluable and important but often lives in a realm beyond average understanding.  In order to attempt to bridge the gaps between scepticism and reticence and understanding and adoption, existing metrics need to offer as much clarity as possible and hopefully I've shown that there are methods that can increase this without the need for structural change.

Thanks for reading!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Nowadays, you will find my weekly Premier League Round-up over on STATSBOMB and assorted player and team articles as and when I get round to them.  This was initially supposed to be a quick 500 word idea, but I've doubled that and, it being a bit speculative, I thought I'd put it back here on the rickety old blog.  Plus I know (two or three) people miss reading articles on a pink background (mobile only!) so it's one for the veteran visitors.

I'd like to thank my family, the cat and Acer computers for providing support and er...  a computer in exchange for money.  Until next time.
 

4 comments:

  1. Good idea.

    For my part, writing articles for an Italian online magazine on Serie A, I got rid of PDO and instead started using what I called "Conversion Ratio", which is basically:

    (goals for divided by shots on target for) / ((goals for divided by shots on target for) + (goals against divided by shots on target against)).

    Its R^2 with PDO is 0.99 and it has the advantage that it is expressed the same way as TSR, or SoTR. I find it just a bit more understandable to the average reader than PDO.

  2. That's cool.
    Understanding is a key aspect of what/why I'm proposing.
    I derived the same myself a little while ago whilst trying to build a repeatable metric.
    Personally I'm more inclined towards using all shots; I have a hunch it's more repeatable but haven't done the hard yards (on the list!).  Since PDO came from hockey, I think the use of 'shots on target' came over with it.

  3. Yes. I used shots on target 'cause I wanted to stay as close as possible to PDO while increasing understanding. If I find some time I'll also try with all shots and see if that increases repeatability for Serie A (which, I found out, is a tricky league when it comes to shot-based metrics as compared to, say, the EPL). Keep up the good work!
