Wednesday, 1 March 2017

A few thoughts on moving beyond xG and stats in the media

I contributed to an article on Ultimo Uomo recently about how analytics can move beyond xG, and about the use of stats in the media.

That article was in Italian, and had plenty more contributors, so check it out, but here are my thoughts pre-translation, for anyone interested.

How can analytics evolve beyond xG?

Expected goals drew a strange line in the sand for football analytics, primarily because the data required to build a reputable expected goals model fell outside the statistics published on public sites, and the technical aptitude to build such a model was specific and far from trivial.
So for a long time, people built xG models, or looked on from outside wondering about them. This had the effect of focusing the evolution of football analytics somewhat around xG-related topics: people built ever more complex models, and eventually graduated towards models that valued the movement of the ball wherever it was on the pitch, but the basic concept was still an expected goal value.
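For anyone who has looked on from outside, a minimal sketch of the basic idea may help: an expected goals model maps each shot's characteristics to a probability of scoring, and a team's or player's xG is the sum of those probabilities. The logistic form, the two features (distance and angle to goal) and every coefficient below are hypothetical placeholders, not a fitted or published model.

```python
import math

# Hypothetical coefficients for illustration only; a real model would be
# fitted to many thousands of shots and would use far more features.
INTERCEPT = -1.0
COEF_DISTANCE = -0.1  # per metre from the centre of the goal
COEF_ANGLE = 1.0      # per radian of goal mouth visible to the shooter
GOAL_WIDTH = 7.32     # metres


def goal_angle(x: float, y: float) -> float:
    """Angle (radians) subtended by the goal mouth from a shot at (x, y).

    x is the distance out from the goal line (assumed > 0) and y the lateral
    offset from the centre of the goal, both in metres.
    """
    half = GOAL_WIDTH / 2
    return abs(math.atan2(y + half, x) - math.atan2(y - half, x))


def shot_xg(x: float, y: float) -> float:
    """Probability that a shot from (x, y) is scored, via a logistic curve."""
    distance = math.hypot(x, y)
    z = INTERCEPT + COEF_DISTANCE * distance + COEF_ANGLE * goal_angle(x, y)
    return 1 / (1 + math.exp(-z))


def expected_goals(shots: list[tuple[float, float]]) -> float:
    """Sum of per-shot probabilities: the expected goal value of a set of shots."""
    return sum(shot_xg(x, y) for x, y in shots)


# A central shot from 11 m versus one from roughly 30 yards (about 27 m).
print(round(shot_xg(11, 0), 2), round(shot_xg(27, 0), 2))  # 0.19 0.03
```

Summing probabilities is why xG reads as a "goals" figure even though no single shot ever produces a fraction of a goal.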

More recently, studies of passing have become more prevalent, with a desire to identify the players and teams that are most efficient or successful at moving the ball. Still, it is difficult to separate descriptions of style from actual beneficial results that tally with winning football matches, a long-term core issue with any metric development.

There is a lot of hope that tracking data might add extra factors that aid precision, but nothing is public there yet, and it's possible that the benefits will only be marginal, a charge that could also be laid at the addition of running and sprinting data.
Defensive analysis remains hard to work with at a player level and it must be hoped that advances in quality of data can shed light here.

Stats in the media
The level of stats use in the media has seen a sharp rise in recent years, and with fantasy football, Football Manager and data sites such as Squawka and Whoscored, the acceptance of numerical descriptions of players is firmly entrenched in younger fans' minds. More often we see talk of shot or shot creation numbers for teams and players which add a necessary second layer for analysis beyond goals and assists (xG may still be too esoteric for total mainstream usage) and this is positive.
Less positive is the use of other statistics that do not represent what they are being used for. Defensive stats like tackles and interceptions are often presented in a more-equals-better fashion, when they are little more than descriptive and do not necessarily reflect quality of play. Goalkeepers cannot be graded accurately by volume of saves, and simple lists based on one or two stats do not do a good job of grading players outside of attacking metrics.
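A quick illustration of why save volume misleads: a busy goalkeeper can make more saves than a quieter one while stopping a smaller share of what he faces. The two keepers and their figures below are hypothetical.

```python
def save_percentage(saves: int, shots_on_target_faced: int) -> float:
    """Share of on-target shots faced that were saved, as a percentage."""
    return 100 * saves / shots_on_target_faced


# Hypothetical season figures: (saves, shots on target faced).
keepers = {"Keeper A": (120, 170), "Keeper B": (60, 80)}

for name, (saves, faced) in keepers.items():
    print(f"{name}: {saves} saves, {save_percentage(saves, faced):.1f}% saved")
# Keeper A: 120 saves, 70.6% saved
# Keeper B: 60 saves, 75.0% saved
```

Keeper A tops a list ranked by saves, yet Keeper B is stopping a larger share of the shots he actually faces.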

So we have more presentation of stats, more description of stats, but a long way to go before actual nuanced and thoughtful analysis of stats is anything like normal. And there is certainly a knowledge gap here. It takes time and understanding to read genuine meaning into football statistics, yet media companies are required to incorporate information into their presentations, and perhaps only in certain cases is this backed up by genuine understanding. The same problem presents itself inside clubs, where performance analysts are bringing statistics into their work without sufficient grounding in what matters and what does not. Until both the media and football understand that they need knowledgeable people to direct their usage of stats, offerings will fall short of their potential, and we are in danger of finding stat use marginalised (in clubs) or used as trivia and nothing more (in media).

We see more visuals in the media now, but again I would caution against their usage without understanding. An average position map, average pass location map or heat map is rarely capable of giving the full truth, yet remains popular and often misused. However, shot location maps, with or without expected goal values, or specific pass or chance creation maps, can reveal significant truths about a game, a team or a player. To say “Look, this player always shoots from 30 yards and never scores” and visualise it is a simple method of showing and proving a point. The key points should always be that any number used or visualisation shown adds to the presentation, reveals truth otherwise concealed and is quick and accessible to understand. There is certainly work to do on all sides here, and only teamwork between visual experts/storytellers and those who understand the data can make this work best.
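The "always shoots from 30 yards and never scores" point can be backed by the numbers behind a shot map. A minimal sketch, assuming each shot is recorded as a (distance in metres, scored) pair; the band edges and the example shots are hypothetical.

```python
# Hypothetical distance bands in metres; 30 yards is roughly 27 m.
BANDS = [(0, 11, "0-11 m"), (11, 18, "11-18 m"), (18, 25, "18-25 m"),
         (25, float("inf"), "25 m+")]


def summarise_by_distance(shots):
    """Count shots and goals per distance band.

    shots: iterable of (distance_m, scored) pairs.
    Returns {band label: (shots, goals)}.
    """
    table = {label: [0, 0] for _, _, label in BANDS}
    for distance, scored in shots:
        for low, high, label in BANDS:
            if low <= distance < high:
                table[label][0] += 1
                table[label][1] += int(scored)
                break
    return {label: (n, goals) for label, (n, goals) in table.items()}


# A hypothetical player who mostly shoots from distance.
shots = [(27.0, False)] * 20 + [(12.0, True), (14.0, False)]
for band, (n, goals) in summarise_by_distance(shots).items():
    print(f"{band:>8}: {n:2d} shots, {goals} goals")
```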

Friday, 30 September 2016

Yannick Bolasie: Early Report Card

Oh hey.

Back where it all began, on the old blog!

Quick post here, just to tell a brief tale from the stats and to wonder why other people aren't doing the same type of thing. Here in 2016, you might think you need advanced data and snazzy xG, and sure, it's great if you've got all that and want to do advanced analysis, but let's not forget: you will be just fine armed with a theory, a website full of stats and an interest. And a cheapo blog page.

Get inspired, give it a whirl. Tons of stories to tell around a variety of leagues.

So what's today's story?

Early days, and Yannick Bolasie has now featured in six Everton games and played 460-odd minutes, a small sample for sure, but how is he getting on? Eye-catching assists against Sunderland and Middlesbrough gave the impression the answer to that question is "well", but is it?

A cursory look at the early numbers shows a small uptick in a couple of areas, mainly overall shot contribution. Could it be that he's playing for a better team now, one that creates more? Would that be enough to move the needle here? Probably. He isn't getting shots in the box, a mark that has clearly dropped, which could be tactical or just bad decision making, and his dribble rates look broadly on par with his time at Palace.
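Comparing a 460-odd minute sample with full Palace seasons only works on rates, so figures like overall shot contribution are usually expressed per 90 minutes. A minimal sketch; defining shot contribution as shots plus key passes is an assumption, and the counts are hypothetical placeholders rather than Bolasie's actual numbers.

```python
def per90(count: float, minutes: float) -> float:
    """Scale a raw event count to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90 / minutes


def shot_contribution_per90(shots: int, key_passes: int, minutes: float) -> float:
    """Shots taken plus key passes (shots created for team-mates), per 90."""
    return per90(shots + key_passes, minutes)


# Hypothetical counts over a 460-minute sample, not Bolasie's real figures.
print(round(shot_contribution_per90(shots=8, key_passes=10, minutes=460), 2))  # 3.52
```

Over a sample this small, a single extra shot or key pass is worth about 0.2 per 90 (90 / 460), which is why early upticks deserve caution.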

I guess it's kinda fine. But is it £30m fine?

At the risk of stating the bleeding obvious, the likelihood in purchasing Bolasie was that Everton would get exactly the same player Palace had, and so far, that looks to be the case.

Lotta money for moderate contributions...

Saturday, 2 April 2016

Aston Villa 0-4 Chelsea

The sun was out over lunchtime in Birmingham today. This being so, a chance to get a head start on the year's tanning was about the best excuse around to spend a couple of hours holed up in Villa Park watching the dying embers of Aston Villa's long Premier League tenure. Heavily discounted tickets played a part too, alongside curiosity about how the faithful fans were reacting to their dire predicament.

To that end they did not disappoint, but it took the cumulative effect of opposition goals before we reached peak tension, and by the end it had been replaced by stoic resignation. At least it had for Bill, who sat next to me in a seat he calls his own and has done for over 40 years. He has seen relegation before and seemed unfazed by the future, preferring to focus on the problems of the day; "we just can't score", he told me after a couple of early chances were easily dealt with by Thibaut Courtois. Later, when Chelsea opened the scoring, he knew the game was up: "we've had it now. All season as soon as we go behind they give up, their confidence has gone."

Having taken over first-team duties in the week, Eric Black took his role seriously and spent the entire game on the edge of his technical area; he looked the part, but sadly for him his team didn't. For twenty minutes the game meandered as a slightly weakened Chelsea side played neat touches and found little penetration. For Villa a clear tactic showed itself early: find Rudy Gestede's head. Find it they did, but no instruction seemed to have been given to follow up the play, and his isolation was an odd but recurrent theme. He did his job, and may well be a useful asset down in the Championship, but here he had little purpose. A different isolation could be seen around Jordan Ayew, who got the crowd interested by trying things, most of which did not include his teammates and few of which had the desired outcome.

It would be cruel to compare the four incongruously cheerful pre-match mascots to Aston Villa's defence, but fielding Aly Cissokho, Alan Hutton, Joleon Lescott and Micah Richards in a 2016 Premier League fixture seemed equally inappropriate. Lescott, an early crowd villain thanks to his social media woes, seemed keen to take responsibility for organising his team alongside Richards, but neither was helped by the ease with which Chelsea moved the ball around their defensive associates. Cissokho's side was exploited easily for the first Chelsea goal and he had a poor game, but Hutton secured crowd favour having been noted as repeatedly "trying", a method that concealed his otherwise ineffective defensive play.

Beyond that, the game was won in and around central midfield: Ruben Loftus-Cheek towered above the diminutive Villa men and looked capable, and with Cesc Fabregas attempting to dictate and John Obi Mikel doing a routine clean-up job behind, it was far too easy for Chelsea to dominate and succeed without ever having to raise their tempo. Idrissa Gueye was involved but erratic, and Carlos Sanchez and Ashley Westwood didn't have the game to overcome their stature.

Interest was piqued with the arrival of the mythical Alexandre Pato midway through the first half, and the penalty he won and scored will distract from the fact that he looked incapable of sprinting throughout and tired almost as soon as he arrived on the pitch. His recent absence is thus easily explained, and a quick comparison to a latter-year Michael Owen seems apt. For more wild comparisons, Kenedy has a touch of Neymar in his gait and style, but his sole notable contribution was to remove his mask in a fit of pique and play on without it.

So 1-0 flattered Chelsea a little, 2-0 on the stroke of half time killed the game and 3-0 after some neat interplay involving Oscar, Pato and Pedro as soon as the second half started meant some seats were never reoccupied.

Sub-plots external to the game soon took hold after the fourth went in. A large exodus took place and I narrowly avoided a soaking as one fan launched his mostly full water bottle at the wall behind me. Those that remained had a chance to vent their frustrations at Leandro Bacuna, who was booed relentlessly by his own fans in response to his claim that a return to Holland and the Champions League was on his agenda. It can be presumed his performances have not been of that standard, as "Champions League, you're having a laugh" rang out; one of the few all-in songs from the home crowd.

Then on 74 minutes the signs went up: "Proud History. What Future?" and those that had stayed had their chance to protest. The poisonous atmosphere dissipated rather quickly and the signs turned into a giant game of paper planes with each one that reached the pitch greeted with a loud cheer, a pastime that took precedence over the football as time wore on. By the final whistle, many littered the pitch edges, discarded, the point having been made, if not heard.

Lastly, Hutton's red card failed to dim the crowd's enthusiasm for him, even in failure he was still lauded for "trying" and applauded off.

The last time I had visited Villa Park was in 1992, and I stood in the Holte End. When Dwight Yorke scored, the place erupted and a crush of buoyant, joyous fans leapt all over each other; the ferocity surprised me. I was just a kid back then, and I suspect a younger Bill will have jumped out of his seat in the Trinity Road Stand. This time, people only stood up to leave early, and the noise from the crowd was critical of their own players and the owner. Little of what was seen today bodes well for their future, unless they find someone to run onto Gestede's knockdowns.

Thursday, 5 November 2015

More on Chelsea's woes

More quick thoughts on the Chelsea situation:

The question of how anyone could have predicted Chelsea's decline has been prominent recently. I recall having a eureka moment around March with regard to last year's Chelsea team, when I realised that, much like Man Utd in 12-13 or Liverpool and Arsenal in 13-14, on balance "things were going their way", and by that I mean they were collecting points at a rate above and beyond an expectation derived from versions of their shooting stats.

Following on from James Grayson's work in building a team rating (so going beyond simple shot numbers), I tracked something similar myself and found that in 2014-15 two teams overachieved at a far higher rate than anyone else: Chelsea and Swansea. You would find similar using xG.

So Chelsea's true talent level was a good deal lower than the 87 points they put together suggests. Their season existed in two clear halves: in the first 19 games they recorded solid top-two or top-three numbers, yet picked up 46 points, more than any other team in the last six seasons; in the second half they picked up 41 points whilst putting up second-tier numbers. In the first half, their defence was exceptional at preventing chances and also goals. In the second half, their defence was less effective in itself, but their save percentage rode high.

I mention this because there is a reasonably wide assumption that, despite this season's malaise, Chelsea will improve, and a bit of debate has centred on whether they will be able to qualify for the top four or, failing that, show top-four level form for the rest of the season.

I would contend that throughout early to mid 2015, Chelsea's underlying form pegged them as a fifth- to eighth-place team. Therefore, if they are to revert to the form that might be expected of them based on last season's most recent 19-game form, it's reasonable to assume that their true level is just that: a fifth- to eighth-place team. Shot ratios around 55-58% will most times find a team exactly there: in the Europa mix. That kind of level can expect a broad return of 40-50 points from here on in, which would lead to a full return of 51-61 points.
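The arithmetic behind that range is simple enough to sketch. The 11 points from 11 games is the figure quoted further down the post, and the 40-50 point rest-of-season range is the author's estimate; the code does not derive it from shot ratios.

```python
SEASON_LENGTH = 38


def project_season(points_so_far: int, remaining_low: int, remaining_high: int) -> tuple[int, int]:
    """Final points range: points already banked plus a range for the games left."""
    return points_so_far + remaining_low, points_so_far + remaining_high


# Figures from the post: 11 points from 11 games, 40-50 points expected from here.
games_played, points_so_far = 11, 11
games_left = SEASON_LENGTH - games_played

low, high = project_season(points_so_far, 40, 50)
print(f"Final total: {low}-{high} points")  # Final total: 51-61 points
print(f"Implied pace: {40 / games_left:.2f}-{50 / games_left:.2f} points per game")
# Implied pace: 1.48-1.85 points per game
```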

Surely too low?

Well, Omar Chaudhuri's been tracking Sporting Index's points predictions since the start of the season, and they now have Chelsea at 63 points, down from a starting point of 83.

So the bookies, who, let's remember, are professionally and financially geared towards being more right than I am about these things, feel that Chelsea can recover to a degree, but not to the extent that fourth place is viable. A projection of 52 points in 27 games is roughly fourth-place form, but achieving fourth place on top of 11 points in 11 games is likely beyond reach.
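Unpacking the Sporting Index figure: subtract the points already banked from the projected total and spread the rest over the remaining games. The 63, 11 and 11 come from the post and the 38-game season is the Premier League's; whether a pace of about 73 points is fourth-place form is the post's judgement, not something the code shows.

```python
def rest_of_season_pace(projected_total: float, points_so_far: float,
                        games_played: int, season_length: int = 38):
    """Split a season-total projection into what it implies for the games left."""
    remaining_points = projected_total - points_so_far
    games_left = season_length - games_played
    ppg = remaining_points / games_left
    return remaining_points, games_left, ppg, ppg * season_length


remaining, left, ppg, full_season = rest_of_season_pace(63, 11, 11)
print(f"{remaining} points from {left} games = {ppg:.2f} per game, "
      f"a {full_season:.0f}-point pace over a full season")
# 52 points from 27 games = 1.93 per game, a 73-point pace over a full season
```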

What this also shows us is how reluctant even the bookmakers were to downgrade Chelsea's 2014-15 season. Projecting them on 83 points (the same as City to start) wasn't unrealistic. While in my preview I felt City were best set for this year, I was certain that Chelsea would be tough to surpass; indeed, I suggested that any team that scored more points than them would win the title.

In the range of media pre-season predictions I saw, there were a few crackpot suggestions that Man City would struggle, but I don't recall anyone picking Chelsea to fail and you would have been certified if you had predicted them to be picking up points at relegation pace into November.

So while there was a case to be made that by now Chelsea could have been predicted to be pulling in points at under two per game and thus sitting 4th to 6th, nobody could have predicted that they would have cracked to this degree.  At best I can project that they could be on 14 points by now, only a win ahead of their real total.  This is how bad they have been.  On the pitch, this isn't a case of "luck" or unaccountable variables intervening.

The other thing that could not be predicted was the complete porousness of their defence. Though the team seemed to tire as 2014-15 wore on, its tactical shift was built on a sturdy defence: when I tried to analyse their back four, each of them appeared to be producing numbers akin to centre backs; Mourinho's block was in full effect. So if you were theorising that Chelsea were unlikely to perform at a high level this year (something very few did), it was entirely logical to presume that any suffering would be at the front end, following on from the narrow victories and uninspiring shot numbers of late season. And Mourinho has simply never had charge of a defensively suspect team.

To predict that this team would suddenly be conceding two goals a game and only Sunderland would be conceding more shots on target? Nah, that's just crazy talk...

So then you start to wonder about their transfers. Why is De Bruyne ripping up the league at a different team? Why are the first change strikers Falcao and Remy and not Lukaku? What happened to Cuadrado and Salah?  Beyond Matic, Fabregas and Costa, Chelsea's transfers are far more erratic than we have become used to.

And did this overachievement last season mask true decline and the end of a team? If Terry is in decline and none of his partners are up to his standard, whilst Ivanovic is regularly flayed out on his flank, Fabregas has high miles, Hazard has played relentlessly for a number of years... 

No new blood and then the chaos of potential distrust following on from the Carneiro incident.

It's multi-faceted. And without access to the day-to-day operations of the first team, it's impossible to do more than speculate as to what the human factors are. And those factors must be real: for Chelsea to have fallen so far outside predictable parameters, inputs beyond those that can be modelled effectively must have had an impact.

I tweeted earlier that it reminded me of post-Ferguson Man Utd, but in that case Ferguson was long gone and the problems he left were somebody else's issue. Here at Chelsea, and maybe beyond the cost of his dismissal, this is why Mourinho remains: there is a big task at hand. A lot needs to be achieved; the squad suddenly looks in dire need of new blood, and it will take more than one window to solve this.


After I posted this, @tommillard replied and pointed out the BBC predictions table. No slight on their skills, but only one of them picked Chelsea outside the top two.

And Tom also pointed out the one guy who was down on Chelsea pre-season: Mike Holden.

Have a look at his preview here

Wednesday, 7 October 2015


Once more the topic of PDO in football has risen to be debated among the analytics Twitterati; opinions have been kicking around for a few days, and the Analytics FC podcast discussed the topic too. I got a mention as someone who uses PDO in analysis, which amused me, as generally, though not exclusively, I consciously try to avoid using it in writing.

No matter, I'm well aware of the issues with the metric, but slightly mystified as to the strength of anti-opinion it still generates. Nobody is claiming PDO as a top-line metric; as James Grayson has shown regularly, it does not habitually repeat and regresses over time. What it does do is give a useful one-hit overview of metrics that a team has little to no functional control over. The very fact that it oscillates around a fixed point gives it a degree of simple understanding.

As I write West Ham have a PDO of 122 which is high and incredibly unlikely to sustain.  Similarly Southampton and Liverpool have PDOs at 78 and 81, which are equally low and equally unlikely to sustain.  What does this tell us? Via one number we have an instant method of cutting through the overlying narrative that a team is good or bad; we can see whether a team is running hot or cold.  For this reason, some of the strengths of the metric as an informer lie around its ability to define reasonably short periods of time. 
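PDO as used here is the sum of two percentages: the share of a team's shots on target that become goals, plus the share of shots on target faced that its goalkeeper keeps out. Across a whole league, goals scored equal goals conceded and shots on target taken equal shots on target faced, so the two percentages sum to 100; that is the fixed point PDO oscillates around. A minimal sketch, assuming saves are counted as shots on target faced minus goals conceded (sources differ on details such as own goals); the team's numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamShots:
    goals_for: int
    shots_on_target_for: int
    goals_against: int
    shots_on_target_against: int

    def scoring_pct(self) -> float:
        """Goals per shot on target, as a percentage."""
        return 100 * self.goals_for / self.shots_on_target_for

    def save_pct(self) -> float:
        """Shots on target faced that did not go in, as a percentage.

        Assumes saves = shots on target faced - goals conceded.
        """
        return 100 * (1 - self.goals_against / self.shots_on_target_against)

    def pdo(self) -> float:
        """PDO on the sum-of-percentages scale, where league average is 100."""
        return self.scoring_pct() + self.save_pct()


# Hypothetical team running hot at both ends.
team = TeamShots(goals_for=15, shots_on_target_for=30,
                 goals_against=8, shots_on_target_against=32)
print(team.scoring_pct(), team.save_pct(), team.pdo())  # 50.0 75.0 125.0
```

On this scale, West Ham's 49% conversion and 74% save rate land close to the 122 quoted, allowing for rounding.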

At its extremes we find teams that are interesting, and it offers us a route in for further analysis. Having highlighted West Ham via their extreme PDO, I can then quickly zoom in on the factors that are contributing. Their save percentage is slightly high but not too extreme (74%), whereas the rate at which their on-target shots are being converted is huge (49%) and simply will not continue at that level long term. Then we can look at shot levels and be less than impressed, or at their wider all-shot conversion, and ponder whether their finishing talent is akin to Barcelona's or it's a freak run.

Liverpool are dying at both ends, Southampton's problems are all in the save percentage.  We spot this with a one hit metric, then we look deeper for further issues.  Are these teams spending a lot of time in deficit or ahead? Are they crippled by injuries to key players? What is going on?
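The "zoom in" step can be sketched as comparing each PDO component to a league baseline and flagging whichever end is out of line. The baselines and the 5-point threshold are hypothetical round numbers, not measured league averages; the West Ham inputs (49% conversion, 74% saves) are the figures quoted above.

```python
# Hypothetical league baselines; they sum to 100, matching PDO's fixed point.
LEAGUE_SCORING_PCT = 30.0  # goals per shot on target
LEAGUE_SAVE_PCT = 70.0     # shots on target faced that are saved
THRESHOLD = 5.0            # percentage points; an arbitrary cut-off for "extreme"


def diagnose(scoring_pct: float, save_pct: float) -> list[str]:
    """Say which PDO component (if either) is far from the league baseline."""
    notes = []
    components = (
        ("finishing", scoring_pct, LEAGUE_SCORING_PCT),
        ("save percentage", save_pct, LEAGUE_SAVE_PCT),
    )
    for name, value, baseline in components:
        diff = value - baseline
        if diff > THRESHOLD:
            notes.append(f"{name} running hot ({diff:+.0f} pts)")
        elif diff < -THRESHOLD:
            notes.append(f"{name} running cold ({diff:+.0f} pts)")
    return notes or ["both components near league average"]


print(diagnose(49, 74))  # ['finishing running hot (+19 pts)']
print(diagnose(20, 60))  # a hypothetical team "dying at both ends"
```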

Nobody conducts in-depth analysis using PDO alone; like any metric, in isolation it has limited use. Its components are far more useful as part of a chain of metrics that inform on different aspects of a team's qualities. Chief among the chain will likely be a version of a shot metric, be it simple or an iteration of expected goals, and more and more we seem to be moving towards ideas around zones.

A by-product of all this is that, as complexity increases and standardisation remains tantalisingly out of reach, there is still value in metrics derived from actual events. With reticence still prominent in many parts of football (from clubs to media to fans), there is a small duty among analytics proponents to maintain a degree of accessibility and understanding. With many people focusing on a move towards club work, it is worth noting that recommendations for the forthcoming Optapro forum have placed an emphasis on relevance and applicability, and we've certainly not reached a point where rejecting simply derived metrics benefits the whole.

And how do you replace a simple concept?

A kind of plus-minus expected goal figure is hindered by the aforementioned issues of non-standardisation and complexity of derivation, and even in usage it largely highlights the very same teams as a basic PDO calculation. In any case, the factors involved here are not fixed measurements of a team's quality. Like a recipe, the relevant ingredients come in different quantities yet combined can create the right mix. For too long, too much focus has been put on issues with this metric, be it the mysterious name or the non-intuitive meshing of unrelated aspects, and its simple strength has been missed: it works as a temperature gauge for a team's performance.
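For comparison, one plausible reading of a "plus-minus expected goal figure" is goals above expectation at both ends. The exact definition is an assumption, as the post does not give one, and the inputs are hypothetical.

```python
def xg_plus_minus(goals_for: float, xg_for: float,
                  goals_against: float, xg_against: float) -> float:
    """Goals above expectation at both ends (positive = running hot).

    Attack: goals scored minus xG created.
    Defence: xG conceded minus goals conceded.
    """
    return (goals_for - xg_for) + (xg_against - goals_against)


# Hypothetical season-to-date totals; the xG values depend entirely on which
# model produced them, which is the non-standardisation problem.
print(xg_plus_minus(goals_for=20, xg_for=14.0, goals_against=8, xg_against=10.5))  # 8.5
```

Like PDO, it folds finishing and goalkeeping fortune into one number, but it cannot be computed at all without some model's xG output.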

That's all it really does, and all it needs to.

Time to move on.

Friday, 24 April 2015

Analytics for the People

Richard Whittall has recently written an excellent column for 21st Club in which he explores the gap between analytics and its functional implementation with football clubs.  He finishes up with this: 
The current field in football analytics is very good at many things, but not so good sometimes in identifying specific problems for which analysts may provide a partial or whole solution. Work on the latter will help further bridge the gap between analyst and club. Sometimes, it’s important for analysts to step out of R and Tableau and start to breakdown if and how clubs can actually move the needle on some of these predictive metrics. Otherwise, they are like doctors who are only able to offer a diagnosis, but not a cure. 

Wise words and certainly advice that should be heeded if you are one of the many people in the market offering analytic solutions for football clubs.   With such work being necessarily proprietary and the retention of a competitive edge encouraging secrecy from within clubs, it is not always obvious how analytically switched on the industry is.   Leading data providers such as Opta and Prozone occasionally offer a window into the products they offer to clubs, in particular recent Prozone videos from Hector Ruiz and Paul Power showed great skill and clarity alongside the benefits and economies of scale afforded by full data access and a dedicated and skilful workforce.  One presumes that by this point most clubs will have at least a small analytics department and probably a lot more.  Whether such a department is fully integrated with coaching or the first team will likely vary on a club to club basis, but the point remains: analytics does not exist in a bubble, it is in place, there are professional companies that can offer a full package and access to the market from the outside is difficult.  Plenty of people want to work in football and whilst smart in principle, co-opting a few models from other industries and creating a brand is unlikely to improve on what is already available.  But, and I feel this is important, a desire to work in football is not the only reason people are interested in or learning about analytics.

Indeed, none of this has stopped a vibrant online amateur community from sprouting up in recent years. The advent of data sites such as Whoscored and Squawka has offered easy access to data at a level that far exceeds what was previously available. Now, anyone with curiosity can collate data from numerous competitions at player and team level and play with it. It can be analysed and truths, whether whole or partial, can be uncovered. These truths have variable applications. For some, with good technical skills, predictive modelling can inform betting; for others, fantasy football. I choose to tell stories about what I've deduced, and I've sunk many hundreds of hours into it because I find it interesting and intellectually rewarding. As with any subject, there is a learning curve that never ends, not everything I do hits the mark and there are few short cuts to knowledge, but as others who've done this before me have noted, you do it because it's fun.

The current situation in analytics has created different viewpoints. Firstly, there is a great drive for predictability, repeatability and application. These are entirely logical and commendable aims, but the arms race to maximise these effects has led to a shroud being laid over the details involved. In particular, and with one notable exception from Michael Caley, the multiple black-box Expected Goals models and advanced derivatives regularly cited have obfuscated analysis due to their non-standardisation. This is not criticism of any specific model; many hours of hard work and theoretical analysis will have gone into each by people with far more advanced skills than myself, and those with such access have doubtless found multiple uses for the insight gained. My concerns lie around accessibility and interpretation, and this is where I feel some parts of the analytics community have missed the point.

Barriers to entry may have reduced over time, but barriers to understanding have not. There is no such thing as an "Expected Goal"; it is entirely theoretical. A layman interested in football statistics may not yet understand the value of a shot or a shot on target, yet he is quickly encountering hypothetical versions of the thing he does understand: goals. That is a tough sell. Shot counts are real and easily understandable; they aren't "outdated" metrics, they are the building blocks of all that comes after, and if the analytics community has any interest in popularising its way of thinking and transcending a niche corner of football, the stories told by our fundamental metrics are essential.

And there are many stories to be told. Variance over league seasons of 34 to 46 games is huge. Half and whole seasons can go by where the measurable statistical reality of a team is skewed vastly in either direction; Liverpool's huge overachievement of 2013-14, followed by an almost inevitable regression this season, is just one obvious case. It isn't just the board that needs to understand the wider implications of such matters; the fans can benefit too. Interpretations may differ, but we can pull apart possible reasons in the numbers and disseminate the knowledge. Oh for a day when the average pre-game conversation involves an understanding that a team's save percentage has been unsustainably high, or a striker gets cheers rather than murmurs or abuse because fans understand he's been unfortunate rather than inept.
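The size of that variance is easy to underestimate, so here is a small simulation: hold a team's underlying quality fixed and replay the same season many times. The win and draw probabilities are hypothetical, and treating every match as an independent draw is a simplifying assumption (real fixtures vary in difficulty).

```python
import random
import statistics


def simulate_season_points(p_win: float, p_draw: float, games: int,
                           rng: random.Random) -> int:
    """Points for one season, treating each match as an independent draw."""
    points = 0
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            points += 3
        elif r < p_win + p_draw:
            points += 1
    return points


def points_spread(p_win: float, p_draw: float, games: int = 38,
                  seasons: int = 10_000, seed: int = 1):
    """Mean and standard deviation of season points for a team of fixed quality."""
    rng = random.Random(seed)
    totals = [simulate_season_points(p_win, p_draw, games, rng) for _ in range(seasons)]
    return statistics.mean(totals), statistics.stdev(totals)


# Hypothetical team: wins half its games and draws a quarter, every season.
mean, sd = points_spread(0.5, 0.25)
print(f"mean {mean:.1f} points, standard deviation {sd:.1f}")
```

With these inputs the theoretical mean is 66.5 points and the standard deviation about 8, so two seasons from an identical team can easily differ by ten points or more purely by chance. Passing `games=34` or `games=46` covers the other league lengths mentioned.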

It's probably a long way off but as each year passes, we collect more data and we can test more outcomes.  Our knowledge can grow and with it our expertise and ability to inform.  It is important to encourage people new to the movement and support their effort.  We may strive for professionalism but we all start as amateurs.  Guide rather than chastise and realise that the more people that are interested in football statistics and analytics, the greater the likelihood of resulting success for everyone, whatever your desired end-game. 

This may seem somewhat utopian, but elitism will get us nowhere; accessibility and inclusiveness just might.

Wednesday, 4 March 2015

A few EPL shooting graphs

Messing about with the 'detailed' tab on Whoscored can be quite interesting, and the fact that you can split details into 15-minute segments of games allows us to look at rates for different parts of a game. What do teams do, and when do they do it? I'll present these generally as informative, with a little light analysis where I feel something is relevant, but having looked at it, I've not the time to make much more of it, so take what you can.

Shots by time period

  • Injury time an obvious factor
  • Desperation?

Shot type by 15-minute game period

  • Early shots more speculative?  We've all seen a player hit a 30 yard 'sighter' early on.
  • Blocked results not intuitive here? I might have expected them to rise later on.
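If you have shot-level data rather than the Whoscored tab, the same 15-minute view can be rebuilt by bucketing shots by minute. A minimal sketch; the (minute, half) input format and the bucket labels are assumptions, not Whoscored's export format, and folding added time into the last bucket of each half is one choice among several.

```python
from collections import Counter

PERIODS = ["0-15", "16-30", "31-45", "46-60", "61-75", "76-90"]


def period_of(minute: int, half: int) -> str:
    """Map a shot's match minute to a 15-minute bucket.

    Assumption: added time is folded into the last bucket of its half
    (31-45 or 76-90), which is one reason those buckets run high.
    """
    if half == 1:
        minute = min(max(minute, 1), 45)
    else:
        minute = min(max(minute, 46), 90)
    return PERIODS[(minute - 1) // 15]


def shots_by_period(shots: list[tuple[int, int]]) -> dict[str, int]:
    """Count shots per bucket from (minute, half) pairs."""
    counts = Counter(period_of(minute, half) for minute, half in shots)
    return {period: counts.get(period, 0) for period in PERIODS}


# Hypothetical shots: (minute, half). 45+2 is recorded here as minute 47 in half 1.
shots = [(3, 1), (47, 1), (46, 2), (92, 2)]
print(shots_by_period(shots))
```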

Percentage of shots in each half

  • Every team takes more shots in the second half than the first half.

Percentage of shots in first 30 minutes

Who tries to start fast?

Percentage of shots in last 30 minutes

Kitchen sink time at Loftus Road?
Anfield roar?
Fergie, sorry, Van Gaal time?

  • Villa tail off horribly as the game goes on.  Something Sherwood will need to target. Fitness?
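The half and 30-minute splits are just shares of the per-period totals. A minimal sketch with hypothetical counts; "first 30" and "last 30" are taken as the first two and last two 15-minute buckets, so second-half added time lands in the last 30.

```python
def shot_shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of a team's shots in each half and in the first/last 30 minutes.

    counts maps the buckets "0-15", "16-30", "31-45", "46-60", "61-75" and
    "76-90" to shot totals.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no shots to share out")

    def share(buckets):
        return 100 * sum(counts[b] for b in buckets) / total

    return {
        "first half": share(["0-15", "16-30", "31-45"]),
        "second half": share(["46-60", "61-75", "76-90"]),
        "first 30": share(["0-15", "16-30"]),
        "last 30": share(["61-75", "76-90"]),
    }


# Hypothetical season totals for one team.
counts = {"0-15": 40, "16-30": 45, "31-45": 55, "46-60": 50, "61-75": 60, "76-90": 70}
for label, pct in shot_shares(counts).items():
    print(f"{label}: {pct:.1f}%")
```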

On target related to blocked shots

I've got a loose working theory that a blocked shot can be attributed to some degree to poor choice from the player, at least over large samples. In contrast, 'on target' shots have been shown at team level to be of high value, repeatable and predictive. Somewhat closer to a blocked shot in terms of value would be an 'off target' shot, though I imagine there is a high degree of variance in 'value' depending on how far off target the shot is. Anyway, these are just thoughts, but from this I posit that a team that finds a low percentage of its shots blocked may be taking 'smart' shots: it may be refusing a poor-value, likely-blocked shot and continuing to keep possession, only shooting when sufficient space becomes available. So we have a simple measure: the difference in percentage points between a team's rate of 'on target' (smart) shots and 'blocked' (stupid) shots.

  • It is pleasing that the prescriptive football of Van Gaal, Koeman and Mourinho is represented at the top.
  • It is also pleasing that the 'run around a bit' football of Redknapp is represented at the bottom.
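The measure is easy to compute from the shot-type breakdown. A minimal sketch with hypothetical team totals; as the post says, the idea that blocked shots signal poor choices is a loose working theory rather than an established result.

```python
def smart_shot_index(on_target: int, blocked: int, total_shots: int) -> float:
    """Percentage of shots on target minus percentage of shots blocked."""
    if total_shots == 0:
        raise ValueError("no shots")
    return 100 * (on_target - blocked) / total_shots


# Hypothetical season totals: (shots on target, shots blocked, total shots).
teams = {"Team A": (140, 100, 400), "Team B": (120, 130, 420), "Team C": (110, 70, 300)}

ranking = sorted(teams, key=lambda name: smart_shot_index(*teams[name]), reverse=True)
for name in ranking:
    print(f"{name}: {smart_shot_index(*teams[name]):+.1f} percentage points")
```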

So just a quick post, bit of food for thought?


Thanks for reading!


I've got an article, "Arsenal, Score Effects and a Season Of Two Halves", up at Statsbomb; give it a whirl!
The weekly column resides over there now too.