RisingBALLER — A Deep Dive into Treating Players as Tokens

4 November 2025

Every now and then, a paper comes along that doesn't just present a new model, but a new way of thinking. I recently stumbled upon one such paper from the StatsBomb Conference 2024 that genuinely made me sit up and reconsider how we can apply modern AI to sports analytics. It's called RisingBALLER, and its core premise is as elegant as it is powerful: what if we treat football matches like sentences and players like tokens?

This blog is my deep dive into that very idea. I'm going to walk you through the entire methodology, the math that underpins it, and the fascinating results, sharing my perspective on why I think this approach is so transformative. Everything here is sourced directly from the original paper—I'm just adding my own narrative as I explore their work.

The Core Idea: Football Through the Lens of NLP

The intuition behind RisingBALLER is what I found most captivating. The authors essentially asked: why can't we use the same foundation model concepts that revolutionized Natural Language Processing (NLP) for football? In NLP, a transformer model learns the meaning of words (tokens) by looking at their context within a sentence. RisingBALLER ports this idea directly to the pitch.

Each player in a match becomes a token, and the match itself becomes the sentence.

By feeding this sequence into a transformer, the model learns deeply contextualized player embeddings that are specific to that single match. This unlocks a whole host of downstream tasks, from predicting future performance and finding stylistically similar players to even estimating abstract concepts like team cohesion. It's a fundamental shift from static player attributes to dynamic, context-aware representations.

Building the Foundation: Data and Preprocessing

As any data scientist knows, an idea is only as good as the data it's built on. For this project, the authors used the incredible StatsBomb event dataset, focusing on the 2015–2016 season across the top 5 European leagues. The raw data for a single match consists of 3,500–4,000 event rows, which isn't directly usable by a transformer.

So, the first crucial step was a heavy dose of preprocessing. I was impressed by how they converted this event stream into a structured, per-player statistics table for each match. Every player in the matchday squad (both the starting XI and the bench) was given a feature vector. For their main downstream task, Next Match Statistics Prediction (NMSP), they didn't just use raw stats. They selected 39 base statistics (like progressive passes, successful dribbles, aerial duels won, interceptions, xG, etc.) and then engineered aggregates (sums, means, and standard deviations over rolling 3- and 5-match windows) to create a rich, 234-variable feature vector for each player. This captures not just what a player did in one match, but their recent form and consistency.
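To make the windowing concrete, here's a minimal sketch of that kind of rolling-aggregate feature engineering in pandas. The toy match log, the three sample statistics, and the column names are my own illustration, not the paper's actual pipeline:

```python
import pandas as pd

# Hypothetical per-player match log, one row per (player, match), in date order.
# Only 3 of the paper's 39 base statistics are shown for brevity.
log = pd.DataFrame({
    "player": ["Kante"] * 6,
    "progressive_passes": [4, 6, 5, 7, 3, 8],
    "interceptions":      [5, 4, 6, 2, 7, 5],
    "xg":                 [0.1, 0.0, 0.2, 0.1, 0.3, 0.0],
})

base_stats = ["progressive_passes", "interceptions", "xg"]
features = log[["player"]].copy()

# For each base stat, add sum/mean/std over rolling 3- and 5-match windows,
# mirroring the paper's aggregates (feature names here are illustrative).
for stat in base_stats:
    for window in (3, 5):
        roll = log.groupby("player")[stat].rolling(window, min_periods=1)
        features[f"{stat}_sum{window}"] = roll.sum().reset_index(level=0, drop=True)
        features[f"{stat}_mean{window}"] = roll.mean().reset_index(level=0, drop=True)
        features[f"{stat}_std{window}"] = roll.std().reset_index(level=0, drop=True)
```

With the full 39 base statistics and all aggregates, this kind of table expands into the 234-variable vector the paper describes.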

The Model Architecture: Deconstructing a "Player-Token"

Now, let's get into the technical core of RisingBALLER. The fundamental building block is how they represent each player within a match. For any given player, the model constructs four separate embeddings and then sums them element-wise. This creates a rich initial "token" embedding that captures multiple facets of the player's context.

If we denote the embedding dimension by \(D\) and the number of player-tokens in the match sequence by \(N\) (the paper fixes the sequence length at 80, padding shorter match squads where necessary), then for a player \(i\), I can write their initial embedding like this:

\[ \mathbf{x}^{(i)}_{init} = \mathbf{e}^{(i)}_{player} + \mathbf{e}^{(i)}_{pos} + \mathbf{e}^{(i)}_{team} + \mathbf{e}^{(i)}_{tpe}. \]
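As a sketch, this four-way sum can be implemented as simple lookup tables indexed by integer ids. All the table sizes and ids below are toy values I chose for illustration; in the real model these tables are learned parameters:

```python
import numpy as np

rng = np.random.default_rng(42)
D = 8  # embedding dimension (toy value; the real model uses a larger D)

# One lookup table per component of the sum; random stand-ins for learned weights.
E_player = rng.normal(size=(100, D))  # player-identity vocabulary
E_pos    = rng.normal(size=(30, D))   # position vocabulary
E_team   = rng.normal(size=(40, D))   # team vocabulary
E_tpe    = rng.normal(size=(80, D))   # table for the fourth term in the equation above

def initial_token(player_id, pos_id, team_id, tpe_id):
    """x_init = e_player + e_pos + e_team + e_tpe (element-wise sum)."""
    return E_player[player_id] + E_pos[pos_id] + E_team[team_id] + E_tpe[tpe_id]

x = initial_token(7, 3, 12, 0)
```

Summing (rather than concatenating) keeps every token at dimension \(D\), exactly as BERT does with its token, segment, and position embeddings.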

The Four Pillars of a Player Embedding

Let me break down what each of these components represents, as this is key to the whole model:

The Transformer's Role

Once these initial embeddings are created for all \(N\) players, they're stacked into a matrix \(X_{init} \in \mathbb{R}^{N\times D}\). This matrix is then fed through a standard transformer encoder. It's here that the real magic happens. Through its multi-head self-attention mechanism, the transformer allows every player-token to 'look' at every other token in the sequence. The model learns who to pay attention to, effectively asking questions like "Given this midfielder's performance, how should I update my understanding of the striker he was passing to?" The output is a matrix of context-aware embeddings \(X_{out}\in\mathbb{R}^{N\times D}\), where each player's vector is now enriched with information about everyone else in that match.
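To show what that attention step actually computes, here's a minimal single-head version in NumPy. A real encoder adds multiple heads, residual connections, layer norm, and feed-forward blocks; the weights below are random stand-ins:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over player tokens."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over the key axis: each player distributes attention across all players.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V, w

rng = np.random.default_rng(0)
N, D = 5, 8  # 5 player-tokens for readability; a real match sequence is padded to 80
X_init = rng.normal(size=(N, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
X_ctx, attn = self_attention(X_init, Wq, Wk, Wv)
```

Each row of `attn` is that player's attention distribution over everyone in the match, and each row of `X_ctx` is the corresponding context-enriched embedding.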

Pre-training: Developing a "Football IQ"

To get the transformer to learn these complex relationships, the authors used a self-supervised pre-training task they call Masked Player Prediction (MPP). If you're familiar with NLP models like BERT, this is directly analogous to Masked Language Modeling (MLM).

For each match, they randomly hide (or "mask") 25% of the players in the input sequence. The model's job is to predict the identities of these masked players based on the context provided by the unmasked players. This forces the model to develop a deep "football IQ." For instance, if it sees a sequence of Real Madrid defenders and midfielders from 2016, and one player is masked, it has to learn that the missing player is likely to be someone like Cristiano Ronaldo, based on the surrounding context.
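The masking procedure itself is simple to sketch. The player ids and the reserved `MASK_ID` below are placeholders of my own, not the paper's conventions:

```python
import random

random.seed(0)
MASK_ID = 0                    # reserved id for the mask token (my convention)
squad = list(range(101, 121))  # 20 hypothetical player ids in one match sequence

n_masked = round(0.25 * len(squad))  # the paper masks 25% of the players
masked_positions = random.sample(range(len(squad)), n_masked)

inputs, targets = squad.copy(), {}
for j in masked_positions:
    targets[j] = inputs[j]  # identity the model must recover at position j
    inputs[j] = MASK_ID     # the model only sees the mask token here
```

The model receives `inputs` and is scored only on how well it recovers the entries of `targets`.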

From a technical perspective, for each masked position \(j\), the model takes the final contextualized output vector \(\mathbf{x}^{(j)}_{out}\) and projects it into a probability distribution over the entire vocabulary of players \(V\). This is done using a standard softmax layer:

\[ P(\hat{y}_j = v \mid X_{out}) = \mathrm{softmax}\left( W^\top \mathbf{x}^{(j)}_{out} + \mathbf{b}\right)_v ,\quad v\in\{1,\dots,|V|\}, \]

where \(W\in\mathbb{R}^{D\times|V|}\) and \(\mathbf{b}\in\mathbb{R}^{|V|}\) are the weights and bias of the output projection, and the softmax is taken over the \(|V|\) logits.

The model is then trained to minimize the cross-entropy loss, which essentially penalizes it for making wrong predictions. For a single match, the MPP loss function is:

\[ \mathcal{L}_{MPP} = -\sum_{j\in M} \log P(y_j\mid X_{out}). \]
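Putting the two formulas together, here is a NumPy sketch of the MPP objective over a couple of masked positions. Random vectors stand in for the transformer outputs, and the vocabulary size and target ids are made up:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
D, V = 8, 50  # toy embedding dimension and player-vocabulary size

W = rng.normal(size=(D, V))      # output projection weights
b = np.zeros(V)                  # output projection bias
x_out = rng.normal(size=(2, D))  # stand-ins for contextual vectors at 2 masked slots
y_true = [7, 31]                 # true player ids at those positions (invented)

# L_MPP = -sum over masked positions j of log P(y_j | X_out)
loss = 0.0
for x_j, y_j in zip(x_out, y_true):
    probs = softmax(W.T @ x_j + b)  # distribution over the |V| players
    loss -= np.log(probs[y_j])
```

Minimizing this loss pushes probability mass onto the true player at each masked slot, which is exactly what forces the contextual embeddings to encode "who fits here".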

Fine-tuning: From General Knowledge to Specific Prediction

After the pre-training phase, the model has a deep, contextual understanding of players. The next step, which I think demonstrates the real utility of this approach, is fine-tuning it for a specific downstream task: Next Match Statistics Prediction (NMSP). They take the pre-trained transformer, remove the MPP head, and attach a new MLP head. This new head is trained to take the contextualized representations of a team's players and predict 18 team-level statistics for the *next* game.

For this task, the model's goal is to predict the next-match team stats vector \(\mathbf{y}\in\mathbb{R}^{2N_{stats}}\) (one set of stats for each of the two teams). The training objective is a straightforward mean squared error, averaged across all the predicted statistics:

\[ \mathcal{L}_{NMSP} = \frac{1}{2N_{stats}} \sum_{k=1}^{2N_{stats}} (\hat{y}_k - y_k)^2. \]
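As a tiny worked example of that objective, with 3 statistics per team instead of the paper's 18 and entirely invented numbers:

```python
# Toy NMSP target: 3 statistics per team (e.g. xG, xGA, shots), home then away.
y_true = [1.2, 0.8, 13.0,  0.9, 1.1, 9.0]
y_pred = [1.0, 1.0, 12.0,  1.0, 1.0, 10.0]

# L_NMSP = (1 / 2*N_stats) * sum_k (y_hat_k - y_k)^2
mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)
```

Here the squared errors are 0.04, 0.04, 1.0, 0.01, 0.01, and 1.0, so the loss averages to 0.35.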

During fine-tuning, all the model's weights are updated. This allows the model to adapt its general football knowledge specifically to the task of statistical prediction, a process known as transfer learning.

So, Did It Work? The Key Results

This is the moment of truth. After all this clever setup, does the model actually perform well? Based on the paper, the answer is a resounding yes. Here's my summary of their key findings:

Beyond Prediction: Unlocking the Player Embeddings

For me, this is the most exciting part of the paper. The predictive accuracy is great, but the true power of this approach lies in the learned embeddings themselves. They are rich, nuanced representations of players that can be used for all sorts of analysis. The authors showcase a few fantastic examples:

  1. Positional Clustering: When they visualized the learned positional embeddings, they found that the vectors naturally clustered into defenders, midfielders, and attackers. This shows the model learns the tactical structure of a football pitch organically.
  2. Similar Player Retrieval: This is a classic use case with a new twist. By finding the nearest neighbors to a player's embedding, you can find others who perform a similar role. When I saw their example of querying for players similar to the 2016 version of N'Golo Kanté, the results were stunning. It didn't just find other defensive midfielders; it found players with a similar "engine" and defensive work-rate, like Idrissa Gueye and Allan, even across different leagues.
  3. A Metric for Team Cohesion: The authors proposed a fascinating heuristic for team cohesion: calculate the average pairwise similarity (e.g., cosine similarity) of all players in a team's starting lineup. A higher score might indicate a more stylistically coherent unit that "sings from the same hymn sheet."
  4. Attention Analysis: While noted as future work, one could analyze the transformer's attention matrices to see which players "pay attention" to which other players, potentially revealing on-pitch synergies, like the connection between a playmaker and a striker.
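Points 2 and 3 are easy to sketch once you have an embedding matrix. The names below are just labels and the random vectors are stand-ins for learned embeddings; real results would come from the trained model:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
names = ["Kante", "Gueye", "Allan", "Ronaldo", "Messi", "Neuer"]
E = rng.normal(size=(len(names), 16))  # random stand-ins for learned embeddings

# Similar-player retrieval: rank everyone else by cosine similarity to the query.
q = names.index("Kante")
neighbours = sorted(((cosine(E[q], E[i]), names[i])
                     for i in range(len(names)) if i != q), reverse=True)

# Team-cohesion heuristic: mean pairwise cosine similarity of a starting lineup.
lineup = E[:4]
pairs = [cosine(lineup[i], lineup[j])
         for i in range(len(lineup)) for j in range(i + 1, len(lineup))]
cohesion = sum(pairs) / len(pairs)
```

With real RisingBALLER embeddings, the top of `neighbours` is where players like Gueye and Allan would surface for a Kanté query, and `cohesion` is the single number the authors propose per lineup.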

A Frank Look at Limitations and Future Directions

Of course, no model is perfect, and the authors are transparent about the limitations.

Its key strengths, in my view, are:

However, there are important caveats:

My Final Take

RisingBALLER isn't just another model; I see it as a blueprint for the future of sports analytics. It demonstrates that the principles of modern AI—self-supervised pre-training on large datasets followed by task-specific fine-tuning—are just as powerful on the football pitch as they are in natural language. It moves the field from static analysis to a dynamic, context-aware understanding of players and teams. I, for one, am incredibly excited to see where this line of research goes next.