Player Compatibility Scoring with Graph Neural Networks

28 January, 2026

Football has this beautiful chaos to it. Twenty-two players, one ball, ninety minutes of decisions that cascade into each other. I've always been fascinated by what makes certain player partnerships click - why Messi and Neymar seemed to read each other's minds, while other expensive signings floundered despite individual brilliance. This question led me down a rabbit hole that combined my love for the game with everything I'd been learning about graph neural networks.

What started as a weekend project turned into something I'm genuinely proud of: a model that learns player compatibility directly from match data, no hand-tuned weights, no arbitrary formulas. Just patterns learned from how players actually move the ball together.

The Core Idea: Expected Threat

Before we can talk about compatibility, we need a way to measure whether a pass actually mattered. Not all completed passes are created equal - a sideways pass between centre-backs is fundamentally different from a through ball that splits the defense. This is where Expected Threat (xT) comes in, and honestly, it's one of the most elegant ideas in football analytics.

The pitch gets divided into a 12×8 grid of 96 zones. Each zone has a "threat value" - essentially, how likely is possession in this zone to eventually result in a goal? Zones near the opponent's goal have high threat; zones in your own half have low threat.

Computing these values uses Bellman iteration (if you've done any reinforcement learning, this will feel familiar):

\[ \text{xT}(z) = P_{\text{shot}}(z) \cdot P_{\text{goal}}(z) + \sum_{z' \in Z} T(z \to z') \cdot \text{xT}(z') \]

Here, \( P_{\text{shot}}(z) \) is the probability of shooting from zone \( z \), \( P_{\text{goal}}(z) \) is the probability of scoring if you do shoot, and \( T(z \to z') \) captures the likelihood of moving to zone \( z' \) via a pass. The equation says: the threat of a zone equals the immediate goal probability plus the expected threat of where you might pass next. Run this iteratively until convergence, and you get a threat map.

The change in threat for any pass is simply:

\[ \Delta \text{xT} = \text{xT}(\text{zone}_{\text{end}}) - \text{xT}(\text{zone}_{\text{start}}) \]

A forward pass into the box? Positive \( \Delta \)xT. A backpass to the keeper? Negative. This gives us a continuous measure of how much each action contributed to goal threat.
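If you want to see the iteration concretely, here's a minimal NumPy sketch. It assumes you've already estimated the shot, goal, and transition probabilities per zone from event data; the variable names are mine, not the pipeline's.

```python
import numpy as np

def compute_xt(p_shot, p_goal, transition, n_iter=50):
    """Bellman iteration over the 96-zone grid.

    p_shot, p_goal: (96,) arrays of shoot / score-if-shooting probabilities per zone
    transition:     (96, 96) array, T[z, z'] = probability of moving from z to z' by passing
    """
    xt = np.zeros(96)
    for _ in range(n_iter):
        # immediate goal probability + expected threat of the zone you pass into
        xt = p_shot * p_goal + transition @ xt
    return xt

def delta_xt(xt, zone_start, zone_end):
    """Threat change credited to a single pass."""
    return xt[zone_end] - xt[zone_start]
```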

xT Zone Grid (12 columns × 8 rows = 96 zones; rows for zones 12-59 omitted):

y=1.0  |----|----|----|----|----|----|----|----|----|----|----|----|
       | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 |
y=0.875|----|----|----|----|----|----|----|----|----|----|----|----|
       | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 |
y=0.75 |----|----|----|----|----|----|----|----|----|----|----|----|
       | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 |
       |               ...  zones 12-59 omitted  ...               |
y=0.125|----|----|----|----|----|----|----|----|----|----|----|----|
       | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 |
y=0.0  |----|----|----|----|----|----|----|----|----|----|----|----|
       x=0  ...                                              x=1.0

High xT zones: Central attacking areas (zones 79-80, 91-92)
Medium xT zones: Midfield and flanks (zones 73-74, 81-82)
Low xT zones: Defensive areas (zones 0-3, 10-11)

Representing Players: The 23 Dimensions

I spent way too long thinking about what features actually capture a player's "style." Eventually, I settled on 23 dimensions that span four categories:

Dims    Category            What It Captures
1-9     Core Activity       Passes, accuracy, dribbles, tackles, interceptions, clearances, key passes, xT created, xT received
10-16   Position Encoding   One-hot: GK, DEF, MID, FWD, WB, etc.
17-21   Pass Signature      Avg distance, avg angle, short/long/aerial pass percentages
22-23   Pressure Profile    How often under pressure, accuracy when pressed

The intuition is straightforward: dimensions 1-9 capture what a player does, 10-16 encode where they play, 17-21 describe their passing style, and 22-23 measure composure. Together, these 23 values give us a pretty complete behavioral fingerprint.

Each player vector lives in \( \mathbb{R}^{23} \):

\[ \mathbf{x}_p = \begin{bmatrix} \mathbf{c} & \mathbf{p} & \mathbf{s} & \mathbf{pr} \end{bmatrix} \in \mathbb{R}^{23} \]

where \( \mathbf{c} \in \mathbb{R}^9 \) (core metrics), \( \mathbf{p} \in \mathbb{R}^7 \) (position), \( \mathbf{s} \in \mathbb{R}^5 \) (pass signature), \( \mathbf{pr} \in \mathbb{R}^2 \) (pressure). All normalized to [0,1] so no single feature dominates.
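As a sketch, assembling and scaling the feature matrix might look like this (the block layout mirrors the table above; the function and argument names are illustrative):

```python
import numpy as np

def build_feature_matrix(core, position, signature, pressure):
    """Stack the four blocks per player and min-max scale each column to [0, 1].

    core (N, 9), position (N, 7), signature (N, 5), pressure (N, 2)  ->  (N, 23)
    """
    X = np.hstack([core, position, signature, pressure])
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
```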

Passes as Graph Edges

Here's where it gets interesting. Every pass between two players isn't just a connection - it carries information. I encode each pass as a 10-dimensional edge:

\[ \mathbf{e} = \begin{bmatrix} x_{\text{start}} \\ y_{\text{start}} \\ x_{\text{end}} \\ y_{\text{end}} \\ \Delta xT \\ \text{pass length} \\ \text{pass angle} \\ \text{pass height} \\ \text{body part} \\ \text{outcome} \end{bmatrix} \in \mathbb{R}^{10} \]

Dimensions 1-4 tell us where on the pitch the pass happened. Dimension 5 is the threat change. Dimensions 6-7 describe the geometry. Dimensions 8-10 capture execution details - was it a header? Which foot? Did it complete?

Now, each possession becomes a graph. Nodes are players (with their 23-D features), edges are passes (with their 10-D attributes), and the whole structure captures who's connecting with whom and how.
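Concretely, a possession graph boils down to three arrays: node features, edge indices, and edge attributes. A rough sketch - the pass field names here are mine, not StatsBomb's:

```python
import numpy as np

def possession_graph(player_features, passes, player_index):
    """Build one possession graph.

    player_features: (22, 23) node feature matrix
    passes: list of dicts holding passer/receiver ids plus the ten edge fields
    player_index: maps a player id to their row in player_features
    """
    edge_index = np.array([[player_index[p["passer"]] for p in passes],
                           [player_index[p["receiver"]] for p in passes]])   # (2, |E|)
    edge_attr = np.array([[p["x_start"], p["y_start"], p["x_end"], p["y_end"],
                           p["delta_xt"], p["length"], p["angle"],
                           p["height"], p["body_part"], p["outcome"]]
                          for p in passes])                                   # (|E|, 10)
    return player_features, edge_index, edge_attr
```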

Graph Neural Networks: Learning from Structure

The magic of GNNs is that they can learn representations that encode both individual player attributes AND how those players fit into the team structure. Each layer does "message passing" - every node aggregates information from its neighbors.

\[ h_i^{(\ell+1)} = \sigma \left( \mathbf{W}^{(\ell)} \left( h_i^{(\ell)} + \sum_{j \in \mathcal{N}(i)} \frac{1}{\sqrt{d_i d_j}} h_j^{(\ell)} \right) \right) \]

In layer 1, each player learns about their direct passing partners. Layer 2 extends to partners' partners. By layer 3, the representation captures global team structure. The result: a 128-dimensional embedding for each player that encodes not just what they do, but how they fit into the collective.

GNN Architecture:

Input: Graph (22 nodes × 23-D, |E| edges × 10-D attributes)
  |
  v
Edge Encoder: 10-D → 32 → 16-D (preprocess edge features)
  |
  v
GCN Layer 1: 23-D node features → 128-D (message passing + ReLU)
  |
  v
GCN Layer 2: 128-D → 128-D (deeper node interactions)
  |
  v
GCN Layer 3: 128-D → 128-D (final node representations)
  |
  v
Global Pool: Average all 22 node embeddings → 128-D graph summary
  |
  v
Readout MLP:
  128 → [Dropout(0.2)] → 64 → [ReLU] → [Dropout(0.1)] → 32 → [ReLU] → 1
  |
  v
Output: Predicted ΔxT (scalar)
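In plain PyTorch, the node-side stack looks roughly like this. It's a simplified sketch, not the exact implementation: the layer follows the update rule above, and I've left the edge encoder out here since that rule only touches node features.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One round of message passing: h_i' = ReLU(W(h_i + sum_j h_j / sqrt(d_i * d_j)))."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        deg = adj.sum(dim=1).clamp(min=1.0)       # node degrees d_i (clamped to avoid /0)
        inv_sqrt = deg.rsqrt().unsqueeze(1)       # 1 / sqrt(d_i), shape (N, 1)
        agg = inv_sqrt * (adj @ (inv_sqrt * h))   # symmetric-normalized neighbor sum
        return torch.relu(self.lin(h + agg))

class PossessionGCN(nn.Module):
    """23-D nodes -> three GCN layers -> mean pool -> readout MLP -> predicted delta-xT."""
    def __init__(self, node_dim=23, hidden=128):
        super().__init__()
        self.layers = nn.ModuleList([GCNLayer(node_dim, hidden),
                                     GCNLayer(hidden, hidden),
                                     GCNLayer(hidden, hidden)])
        self.readout = nn.Sequential(
            nn.Dropout(0.2), nn.Linear(hidden, 64), nn.ReLU(),
            nn.Dropout(0.1), nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1))

    def forward(self, x, adj, return_hidden=False):
        hidden, h = [], x
        for layer in self.layers:
            h = layer(h, adj)
            hidden.append(h)                      # keep per-layer states for embedding extraction
        pooled = h.mean(dim=0)                    # average the 22 node embeddings
        pred = self.readout(pooled)
        return (pred, hidden) if return_hidden else pred
```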

I train this network to predict the \( \Delta \)xT of each possession. The supervision signal forces the embeddings to capture offensive chemistry, positioning, and threat awareness. After training on ~100K possession graphs, I extract the layer-2 hidden states as player embeddings.
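The training loop itself is unremarkable; the interesting bit is pulling the hidden states back out afterwards. Something like the sketch below, which reuses the class above - `possession_graphs` stands in for whatever iterable yields (node features, adjacency, ΔxT) triples, and MSE is my stand-in for the regression loss:

```python
model = PossessionGCN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for x, adj, target_dxt in possession_graphs:       # ~100K possession graphs
    pred = model(x, adj)
    loss = nn.functional.mse_loss(pred.squeeze(), target_dxt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, the layer-2 hidden states become the 128-D player embeddings.
with torch.no_grad():
    _, hidden = model(x, adj, return_hidden=True)
    player_embeddings = hidden[1]                  # (22, 128), one row per player in this graph
```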

The Compatibility Scorer

Now comes the fun part. Given two player embeddings, how do we score their compatibility?

My first attempt used hand-crafted signals: direct pass success rate (35% weight), co-presence frequency (30%), position complementarity (15%), and so on. It worked okay, but required constant tuning and didn't generalize well across different leagues.

The better approach: learn it end-to-end. I concatenate two player embeddings into a 256-D vector and feed it through a multi-task neural network:

Compatibility Scorer:

Input: Concatenated embeddings [h_A || h_B] ∈ ℝ^256
  |
  v
Shared Trunk MLP:
  256 → [ReLU, Dropout(0.2)] → 128 → [ReLU, Dropout(0.1)] → 64
  |
  ├──→ Main Head (Compatibility):     64 → [ReLU] → 32 → [Sigmoid] → 1
  ├──→ Aux Head 1 (Pass Quality):     64 → [ReLU] → 32 → [Sigmoid] → 1
  ├──→ Aux Head 2 (Threat Flow):      64 → [ReLU] → 32 → [Sigmoid] → 1
  └──→ Aux Head 3 (Position Synergy): 64 → [ReLU] → 32 → [Sigmoid] → 1

Output: 4 scores ∈ [0, 1]
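A PyTorch sketch of that scorer - shared trunk plus four heads - looks roughly like this (a sketch, not a line-for-line copy):

```python
import torch
import torch.nn as nn

class CompatibilityScorer(nn.Module):
    """Shared trunk over [h_A || h_B] with four sigmoid heads, each scoring in [0, 1]."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.1))
        def head():
            return nn.Sequential(nn.Linear(64, 32), nn.ReLU(),
                                 nn.Linear(32, 1), nn.Sigmoid())
        self.heads = nn.ModuleDict({"compatibility": head(), "pass_quality": head(),
                                    "threat_flow": head(), "position_synergy": head()})

    def forward(self, h_a, h_b):
        z = self.trunk(torch.cat([h_a, h_b], dim=-1))   # 256-D pair vector -> shared 64-D representation
        return {name: h(z) for name, h in self.heads.items()}
```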

The training labels come from co-occurrence: players who frequently touch the ball in the same possessions are more likely to be compatible. I use percentile bucketing to spread labels across [0,1]:

\[ \ell(c_{ij}) = \begin{cases} 0.7 + 0.3 \cdot \frac{c_{ij} - P_{75}}{P_{90} - P_{75}} & \text{if } c_{ij} \geq P_{75} \\ 0.4 + 0.3 \cdot \frac{c_{ij}}{P_{75}} & \text{if } 0 < c_{ij} < P_{75} \\ U(0.0, 0.3) & \text{if } c_{ij} = 0 \end{cases} \]
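A sketch of that bucketing in code; clipping the top bucket to 1.0 and taking the percentiles over nonzero counts are my assumptions:

```python
import numpy as np

def compatibility_labels(cooccurrence, seed=0):
    """Map raw co-occurrence counts to soft labels in [0, 1] via percentile buckets."""
    rng = np.random.default_rng(seed)
    c = np.asarray(cooccurrence, dtype=float)
    p75, p90 = np.percentile(c[c > 0], [75, 90])
    labels = np.empty_like(c)
    top, mid, zero = c >= p75, (c > 0) & (c < p75), c == 0
    labels[top] = 0.7 + 0.3 * (c[top] - p75) / (p90 - p75)    # frequent partners   -> [0.7, 1.0]
    labels[mid] = 0.4 + 0.3 * c[mid] / p75                    # occasional partners -> (0.4, 0.7)
    labels[zero] = rng.uniform(0.0, 0.3, size=zero.sum())     # never co-occurred   -> U(0, 0.3)
    return np.clip(labels, 0.0, 1.0)
```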

The multi-task setup is crucial. The three auxiliary heads - pass quality, threat flow, position synergy - regularize learning and provide interpretable diagnostics. When someone asks "why is this pair compatible?", I can point to specific factors.

The total loss combines everything:

\[ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{main}} + \lambda \left( \mathcal{L}_{\text{pass}} + \mathcal{L}_{\text{threat}} + \mathcal{L}_{\text{synergy}} \right) \]

with \( \lambda = 0.1 \) to downweight auxiliaries.
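Which, in code, is just a weighted sum of per-head losses (MSE here is a stand-in for whatever criterion each head uses):

```python
import torch.nn.functional as F

def total_loss(outputs, targets, lam=0.1):
    """L_total = L_main + lambda * (L_pass + L_threat + L_synergy)."""
    main = F.mse_loss(outputs["compatibility"], targets["compatibility"])
    aux = sum(F.mse_loss(outputs[k], targets[k])
              for k in ("pass_quality", "threat_flow", "position_synergy"))
    return main + lam * aux
```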

The Results

Let me cut to the chase. Messi → Neymar (Barcelona 2014-2017):

Output             Score
Compatibility      0.87
Pass Quality       0.82
Threat Flow        0.78
Position Synergy   0.69

That 0.87 puts them in the 95th percentile globally. For comparison, a random defender-striker pair scores around 0.12. The 7× difference is what convinced me this actually works.

More broadly: known same-squad partnerships score median 0.64; random cross-squad pairs score 0.18. That 3.6× separation demonstrates the model captures real signals, not noise.

Why It Works (And Why I Think It's Cool)

A few things came together that I didn't fully appreciate until the end:

1. Learned, not tuned. The neural network discovers what makes players compatible. No manual parameter tweaking, no "I think pass success should be 35%." The data decides.

2. GCNs capture context. A player's embedding isn't just about their individual stats - it encodes how they fit into team structure. A creative midfielder looks different when surrounded by pacey wingers versus target men.

3. xT as supervision. Training on threat prediction forces embeddings to capture offensive chemistry. The network can't cheat; it has to learn what actually moves the ball toward goal.

4. Generalizes to sparse pairs. Two players who've rarely passed to each other still get meaningful scores because their embeddings encode playing style. Hardcoded heuristics fall apart when a pair has no direct data; learned embeddings don't.

The Pipeline

For completeness, here's the full data flow:

Stage   Input                 Output                     Dimensions
1-2     StatsBomb CSV         Events + Possession IDs
3       Events                xT map (ΔxT per zone)      96 zones
4       Events                Player feature vectors     23-D
5       Features + events     Pass graphs                22 nodes, 10-D edges
6       Graphs                Trained GCN                ~400K params
7       GCN + graphs          Player embeddings          128-D
8       Embeddings            Labeled pairs              ~28K pairs
9       Pairs                 Trained scorer             ~200K params
10      Embeddings + scorer   Compatibility scores       1 main + 3 aux

Running the complete pipeline takes about 60 seconds. The final model is 2.3 MB. Single-pair inference is under 1 millisecond - fast enough to score all 55 player pairs in a starting XI in under 100 ms.

What's Next

I've been thinking about extending this to temporal dynamics - how does compatibility evolve over a season as players gel or drift apart? There's also the question of counterfactuals: given a squad, which signing would maximize overall compatibility?

But honestly, the most satisfying part was seeing Messi-Neymar light up the model. Numbers confirming what the eye test already knew. Sometimes that's exactly what good analytics should do.

The StatsBomb open data made this project possible. If you want to dig into football analytics, that's the place to start.