Big Data Event Analytics in Football For Tactical
Decision Support
zur
submitted by
Pavlina Kröckel
from Bulgaria
Approved as a dissertation
Acknowledgements
When I was growing up in the small country of Macedonia, I used to watch football with my dad
and two sisters. Sometimes, I found it fascinating, other times I just watched it because my dad was
so much into it. He was something of a local star and everyone knew him as a very good player
showing quite the talent in the local football games. Who knew that twenty-something years later I
would be completing a doctoral program in Germany with a topic focused on football analytics?!
My doctoral studies were enjoyable and, at times, challenging. But I can see the end. Therefore, I
would like to thank the people who supported me in this process.
I would like to thank my professor and thesis advisor, Prof. Dr. Bodendorf, for his support during
this journey, but most of all, for offering me a place at his Chair. This not only influenced my
career, but, as it happens, I also met my husband right next door.
I am thankful to my husband, Johannes, for pushing me forward, and for his never-ending optimism
and tough love.
I am thankful to my parents for doing everything in their power to offer us the education that
finally brought all three of us here. To my sisters, I am happy to have you always by my side!
I am thankful to my parents-in-law for their support and availability with anything we need. Being
away from my own parents here in Germany, I could not have asked for better parents-in-law.
I would also like to thank my colleague and friend Alex Piazza for his friendship and humor. I
enjoyed our time at the university very much: our shared lunches, research discussions and the
occasional consolation sessions.
To my former and current colleagues at the Chair, thank you for always being friendly and helpful
whenever I needed it. I look forward to working with you in the upcoming months as well.
Last but not least, I am thankful to Prof. Kainz and Prof. Werner from the Institute of Football
Management in Ismaning for the cooperation and valuable support, especially concerning the data
used in the thesis. I would also like to thank my student, Isa Tümer, who collected the data used in
Chapter 4. Without him, this whole chapter would hardly have been possible.
Finally, working on this topic has been a real rollercoaster. I have probably experienced most of the
emotions throughout, but it was all worth it. Because, as Alex Ferguson once said, following a hard
win: “Football, bloody hell!”
Nuremberg, 03.05.2019
Pavlina Kröckel
Abstract
The thesis shows how tactical information in football can be obtained by applying analytics
techniques from the fields of network science, machine learning and process mining. It uses
professional event tracking data from the European Championship in 2016.
The main motivation behind this research is twofold. On the one hand, there is a lack of studies that
use professional tracking data in football. On the other, there is a lack of studies investigating
real-time decision support in football based on data analytics. Therefore, the thesis aims to
demonstrate how event tracking data can support football coaches and their staff in making decisions
before and after a match, as well as during live games.
Dynamic system theory serves as the theoretical basis for the analytics concept followed in this
thesis. According to this theory, football teams are dynamic systems composed of elements (the
players) who constantly interact with each other and with their environment. Through these
interactions, the players form behavioral patterns over time. These patterns arise from the players’
ability to self-organize, which allows them to reorganize themselves and regain a state of balance
after a perturbation (e.g., a counter attack). Following the principles of this theory, analytics
techniques such as social network analysis, self-organizing maps and process mining are applied to
football event data.
Social network analysis answers questions about the relevance of a player and the structure of a
team. It also detects sudden changes in the team with respect to different metrics of interest.
Self-organizing maps help to transform high-dimensional data about what happened in the game into
more understandable two-dimensional maps. Process mining analyzes the sequence data of a football
team. It can be used to gain a quick overview of the team’s behavior and to identify key
players.
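The following Python sketch illustrates how social network analysis can express the relevance of a player. It is only an illustration and not the procedure used in the thesis. It builds a directed, weighted passing network from a list of completed passes within one time window. It then ranks players by weighted degree, that is, the number of passes they played plus the number of passes they received. The pass list, the position labels and the function names `build_passing_network` and `player_relevance` are hypothetical and are not taken from the OPTA data. Weighted degree is also only one of several centrality metrics that could serve as a proxy for relevance.

```python
from collections import Counter, defaultdict


def build_passing_network(passes):
    """Aggregate (passer, receiver) pairs into weighted directed edges.

    Returns a Counter mapping each edge (passer, receiver) to its pass count.
    """
    return Counter(passes)


def player_relevance(passes):
    """Rank players by weighted degree: passes played (out) + passes received (in).

    Weighted degree is used here as a simple, illustrative proxy for relevance.
    """
    edges = build_passing_network(passes)
    degree = defaultdict(int)
    for (passer, receiver), weight in edges.items():
        degree[passer] += weight    # out-degree contribution of the passer
        degree[receiver] += weight  # in-degree contribution of the receiver
    # Sort by descending weighted degree, then by label for a stable order
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical completed passes in one time window: (passer, receiver)
    passes = [
        ("CB", "CM"), ("CM", "ST"), ("CM", "LW"), ("LW", "CM"),
        ("CB", "CM"), ("CM", "ST"), ("GK", "CB"),
    ]
    for player, weighted_degree in player_relevance(passes):
        print(player, weighted_degree)
```

With these hypothetical passes, the central midfielder ("CM") ranks first with a weighted degree of 6, because most passes are played to or from that position. Repeating the computation over consecutive time windows would be one simple way to track how a player's role in the network changes during a match.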
The final chapter demonstrates how some of the results discussed in the thesis can be used for
real-time decision support. A mockup shows example dashboards and the types of results that a coach
can use to decide which players should be substituted.
The main contributions of the thesis relate to the use of a real-world dataset and the methods used
for the analysis. A further contribution is the discussion of how the results can be used for
real-time decision support in football, a topic that has not previously been investigated
sufficiently in the literature.
Table of Contents
Acknowledgements ................................................................................................................... IV
Abstract ....................................................................................................................................... V
1 Introduction ..................................................................................................................... 1
1.1 Motivation and research problem ...................................................................................... 1
1.2 Research questions ............................................................................................................ 3
1.3 Research design ................................................................................................................. 4
Part III - Advanced Analytics Methods for Team and Player Insights ................................ 52
Appendix F: Network metrics used for dynamic topology comparison ............. CLXXXVIII
References........................................................................................................................... CXCII
List of Figures
Figure 1. Mixed methods research design ..................................................................................... 5
Figure 2. Proportion of journal articles on sports analytics ........................................................... 8
Figure 3. Interest in sports analytics on Google.com..................................................................... 8
Figure 4. The four stages of match analysis development ............................................................. 9
Figure 5. Definitions of tactics .................................................................................................... 16
Figure 6. Issues in PA research in football .................................................................................. 18
Figure 7. The eco-dynamics account of performance .................................................................. 21
Figure 8. Qualitative study steps.................................................................................................. 25
Figure 9. Flow Diagram of the methodology for article search and selection (based on
PRISMA) ........................................................................................................................ 28
Figure 10. Substitution-related publications per year .................................................................. 30
Figure 11. Interview questions based on the literature review findings ...................................... 41
Figure 12. Distribution of codes (No of cases per code) .............................................................. 45
Figure 13. Derived substitution factors........................................................................................ 50
Figure 14. Steps followed in Part III............................................................................................ 53
Figure 15. Example of a passing event as recorded by OPTA .................................................... 54
Figure 16. Network metrics for football performance analysis ................................................... 66
Figure 17. Timeline of crucial events during 2nd half of the ENG vs. Russia match ................... 72
Figure 18. Match statistics for England and Russia ..................................................................... 73
Figure 19. England’s team networks before and after Rooney’s substitution ............................. 75
Figure 20. Core network in England’s team 15 min before Rooney’s substitution ..................... 78
Figure 21. Top ranked players for England’s team throughout the match .................................. 79
Figure 22. Immediate impact of removing Alli, Lallana or Walker on the team network........... 81
Figure 23. Total passes from both games .................................................................................... 83
Figure 24. Ball possession and attacking indicators for Iceland vs. Portugal .............................. 83
Figure 25. Team networks of Portugal and Iceland ..................................................................... 85
Figure 26. Team networks of England and Iceland ..................................................................... 86
Figure 27. Transitivity and Reciprocity of Portugal’s network throughout the match ................ 88
Figure 28. Transitivity and Reciprocity of Iceland’s network throughout the match .................. 88
Figure 29. Network assortativity for England throughout the whole match ................................ 90
Figure 30. Network assortativity for Iceland throughout the whole match ................................. 90
Figure 31. CUSUM for Iceland’s team ........................................................................................ 93
Figure 32. CUSUM for England’s team ...................................................................................... 93
Figure 33. SOM architecture ..................................................................................................... 103
Figure 34. SOM training progress in Experiment 1 ................................................................... 111
Figure 35. SOM clusters for underdogs’ winning matches ....................................................... 113
Figure 36. SOM clusters for favorite teams’ winning matches ................................................. 115
Figure 37. An example of event log for process mining............................................................ 120
Figure 38. Types of process mining........................................................................................... 120
Figure 39. Trace clustering process ........................................................................................... 124
Figure 40. Schematic representation of a T-pattern ................................................................... 127
Figure 41. Representations of T-patterns on football event data ............................................... 128
Figure 42. Process mining steps for analysis of the OPTA event data ...................................... 130
Figure 43. Pre-processed OPTA event log suitable for process mining tasks ........................... 132
Figure 44. Process models of England’s team by using different algorithms ............................ 135
Figure 45. Process model of England (vs. Iceland) mined with the IVM.................................. 136
Figure 46. Process model of Iceland (vs. England) mined with the IVM.................................. 137
Figure 47. Instances ending in a “miss” event for England ....................................................... 138
Figure 48. Input and Output patterns for the event “dispossessed” for Iceland’s team ............. 139
Figure 49. Instances ending in a “dispossessed” event for Iceland ........................................... 139
Figure 50. Sequence details from the Log Visualizer in ProM .................................................. 140
Figure 51. England’s offensive sequences ending in a shooting attempt .................................. 141
Figure 52. Outcome of activities of England’s defenders in first-half....................................... 142
Figure 53. Outcome of activities of Iceland’s defenders in first-half ........................................ 143
Figure 54. Event types and outcomes per player from England’s team .................................... 143
Figure 55. Meter chart displaying event “Clearance” and player involvement from England’s
team............................................................................................................................... 144
Figure 56. SOM trace clustering – England .............................................................................. 145
Figure 57. SOM trace clustering – Iceland ................................................................................ 146
Figure 58. Markov chains for England’s team........................................................................... 148
Figure 59. Markov chain for Iceland’s team: cluster 3 with 34 instances ................................. 149
Figure 60. Markov chain for Iceland’s team: cluster 1 with 79 instances ................................. 150
Figure 61. England - Handover of Work ................................................................................... 151
Figure 62. Iceland - Handover of Work ..................................................................................... 152
Figure 63. Working Together comparison between England (left) and Iceland (right)............. 153
Figure 64. Analytics framework for real-time decision support ................................................ 165
Figure 65. Data available for football performance analysis ..................................................... 165
Figure 66. Player analysis dashboard ........................................................................................ 170
Figure 67. Own team analysis dashboard .................................................................................. 171
Figure 68. Opponent team analysis dashboard .......................................................................... 172
List of Tables
Table 1. Types of tracking systems currently available on the market ........................................ 10
Table 2. Tracking systems comparison........................................................................................ 11
Table 3. Overview of football performance indicators per category ........................................... 14
Table 4. Definitions of tactics and strategy ................................................................................. 15
Table 5. Types of complex systems and their characteristics ...................................................... 20
Table 6. Search terms used .......................................................................................................... 26
Table 7. Databases searched and number of papers collected ..................................................... 27
Table 8. Overview of papers included in the literature review .................................................... 29
Table 9. Number of publications per outlet ................................................................................. 31
Table 10. Methods, samples and tools used in previous research................................................ 32
Table 11. Substitution factors retrieved from reviewed literature ............................................... 35
Table 12. Proposed decision rule by Myers (2012) ...................................................................... 37
Table 13. Overview of participants and interview durations ....................................................... 43
Table 14. Coding categories and codes overview ........................................................................ 44
Table 15. Comparison of literature and empirical study findings................................................ 49
Table 16. Overview of matches included in the data analysis ..................................................... 55
Table 17. Dynamic network analysis terms ................................................................................. 61
Table 18. Overview of data and tools in SNA research in football ............................................. 63
Table 19. Social network metrics used in football performance analysis .................................... 65
Table 20. Data attributes used for (dynamic) social network analysis ........................................ 71
Table 21. Selected metrics for England’s players 15 minutes before Rooney is replaced .......... 76
Table 22. Immediate impact of removing Rooney on the team network ..................................... 76
Table 23. Immediate impact 15 min before Rooney - Alli .......................................................... 80
Table 24. Immediate impact 15 min before Rooney - Lallana .................................................... 80
Table 25. Immediate impact 15 min before Rooney - Walker..................................................... 80
Table 26. Brief overview of Iceland’s matches ........................................................................... 82
Table 27. Team network metrics for both matches...................................................................... 87
Table 28. Network assortativity for all teams in the analyzed matches ....................................... 90
Table 29. Dynamic network topologies and network level metrics – England 0‘ – 20‘ .............. 94
Table 30. Dynamic network topologies and network level metrics – Iceland 0‘ – 15‘ ............... 95
Table 31. Community detection in the teams of Iceland and England ........................................ 97
Table 32. Triad types and count for England and Iceland ........................................................... 98
Table 33. Popular triads in Iceland and England’s teams ............................................................ 99
Table 34. List of underdog and favorite teams and their market values (prior to Euro 2016) ... 109
Table 35. Extract from the dataset used for training the SOM .................................................. 110
Table 36. Matches considered in Experiment 1 ......................................................................... 112
Table 37. Influence of event type on match outcome for underdogs ......................................... 114
Table 38. Matches considered in Experiment 2 ......................................................................... 114
Table 39. Influence of event type on match outcome for favorite teams ................................... 116
Table 40. Comparison of mining algorithms ............................................................................. 121
Table 41. Types of social network metrics used for analyzing relationships from event logs .. 122
Table 42. Clustering algorithms in ProM .................................................................................. 125
Table 43. Match statistics of England vs. Iceland’s game ......................................................... 130
Table 44. Overview of players involved in the offensive sequences ......................................... 141
Table 45. Preprocessing parameters for Markov chain sequence clustering for England’s team .. 147
Table 46. Clusters and number of sequences for Iceland’s team ............................................... 149
Table 47. Process mining techniques and their insights in football ........................................... 154
Table 48. Methods and techniques used in Part III .................................................................... 158
Table 49. Data and information useful for real-time decision support ...................................... 168
List of Abbreviations
CD Community Discovery
DCD Dynamic Community Discovery
DNA Dynamic Network Analysis
DST Dynamic System Theory
DSS Decision Support System
Part I – Theoretical and Conceptual
Foundations
Introduction
1 Introduction
The overall motivation, research questions and research design of the thesis, described in this first
chapter, have been published in Kröckel (2017). Here, they are modified and extended.
to understand how coaches make substitutions and which factors they consider. On the other hand,
there are no studies that discuss opportunities to support the coach in real-time decision making by
means of advanced analytics on player tracking data.
Finally, research has shown that coaches can remember only part of what actually happened during
a match. In their study, Franks and Miller (1991) found that international football coaches
correctly recall 45 percent of key events. According to Carling et al. (2005), coaches remember
less than 50 percent of key match events. Laird and Waters (2008) found that coaches recalled
59 percent of critical events over only 45 minutes of play.
Considering the recent advancements in performance analysis and tracking technology, and the
under-researched but promising area of real-time decision support in football, a research gap is
identified: there is a lack of studies on decision support for coaches during live matches. Making
a decision during a live game is different from pre- and post-match decision making. This is
primarily due to the stressful situation: the coach needs to make an informed decision quickly, and
the outcome of the game depends on it. One of the main ways to influence the game is deciding which
player should be substituted. Player substitution has been researched in the literature, but not in
relation to data analysis, nor with regard to how analytics can support the coach in deciding which
player should leave the game. Thus, real-time tactical decision support in football based on data
analytics merits further exploration. More specifically, research is needed on how to support the
coach and their staff in the substitution decision during live matches. In addition, very few
studies use realistic datasets from tracking companies. Researchers typically generate datasets
themselves by processing video data. This requires considerable manual effort, is prone to mistakes,
and limits the number of attributes that can be collected. Thus, the resulting datasets are usually
limited in their ability to describe the behavior of the teams and their players.
How can the tactical decisions of coaches be supported by using event-based player tracking
data?
To answer the main research question, the following sub-questions need to be addressed:
RQ1: Which factors drive the tactical decision making of coaches, especially during live games?
RQ2: Which analytical methods can be applied to event data to give the required insights?
RQ3: How can the research results be applied to the tactical decision of player substitution?
1.3 Research design
The current study follows a mixed methods research design. Two works serve as guidelines for the
thesis: the in-depth book on mixed methods research by Creswell and Plano Clark (2011) and the
seminal paper by Venkatesh et al. (2013, p.8) on using mixed methods research in information systems.
Mixed methods as a research design is used when quantitative and qualitative methods are combined to
answer the same research question (Venkatesh et al., 2013). This is contrary to a multi-method
design, in which the researcher employs two or more methods but may or may not restrict the research
to the same worldview (either qualitative or quantitative) to answer the research question
(Venkatesh et al., 2013). In a mixed methods study, both qualitative and quantitative data are
collected, either sequentially, concurrently or by embedding one within the other (Creswell and Plano
Clark, 2011). For the most part, choosing this type of design depends on the research question, as
well as the skills and time of the researcher (Creswell and Plano Clark, 2011). Not all research
questions can or should be answered with mixed methods research. Some of the research problems
suitable for this type of design are “those in which one data source may be insufficient, results
need to be explained, exploratory findings need to be generalized, a second method is needed to
enhance a primary method, a theoretical stance needs to be employed, and an overall research
objective can be best addressed with multiple phases, or projects” (Creswell and Plano Clark,
2011, p. 8).
The reason for employing a mixed methods research design in the current thesis is that the main
research question cannot be answered by conducting either qualitative or quantitative studies alone.
First, the literature on tactical decision support during the game (substitution factors) was
insufficient and inconclusive. This made a qualitative study in the form of interviews necessary.
Therefore, the qualitative study was conducted before the quantitative studies. The findings from
the interviews and the literature were consolidated into a list of substitution factors relevant for
coaches. This list then informed the quantitative studies, which apply advanced analytics methods to
live player tracking data. This means that the first study informs the subsequent studies, in line
with the guidelines by Creswell and Plano Clark (2011). Second, the framework for decision support
cannot be developed without the quantitative studies, which provide the necessary information to
the decision makers in a suitable format.
The study follows an exploratory sequential design, based on the guidelines by Creswell and Plano
Clark (2011). This type of mixed methods design is employed, for instance, when a guiding framework
or theory is not available, or when the variables for the quantitative study are not known. For the
research objective of the thesis, no definitive list of substitution factors was readily available,
and the nature of the substitution decision has not been studied in sufficient depth. Furthermore,
this type of design is best suited for exploring a phenomenon, or for situations in which the
researcher needs to develop and test an instrument because none is available (Creswell and Plano Clark,
2011). In the last part of the thesis, a decision support framework is developed to support coaches
and their teams in their analytics strategy during live games. No existing framework, guideline or
prototype for real-time decision making in football has been identified in the literature. Creswell
and Plano Clark also provide instructions on how to create procedural
diagrams to represent the design properly. Uppercase and lowercase letters indicate which method
takes priority, and the arrow indicates the sequence. For instance, QUAL -> quan means that the
qualitative study is conducted first and has a higher priority. In the current design, the scheme
employed is QUAL -> QUAN. The qualitative study is conducted first, and both studies take equal
priority in reaching the research objective.
Figure 1 gives an overview of the research design, which comprises five major steps.
Each step is explained in more detail below to give a better overview of the thesis structure.
The current state of research is discussed in chapters 1 to 3. This part provides the conceptual and
theoretical background. It introduces the reader to the topic and to the main concepts in football
analytics needed to follow the rest of the thesis. These are mainly performance analysis and
performance indicators in football, tracking systems currently available on the market, tactical
decision making and dynamic system theory. The method employed at this stage is a literature review.
At the end of this part, the reader will be familiar with the latest developments in football
performance analysis research and with the theory followed in this study.
A systematic literature review and a qualitative study are conducted and presented in chapter 4.
The aim of this part is to derive factors relevant for tactical decision making from previous research
and directly from football experts. These factors, drawn from both literature and practice, are then
consolidated. The derived factors serve to choose appropriate advanced analytical methods, which
in turn should produce relevant team and player insights, some of which can be used for
substitution. The choice of methods also follows the requirements of dynamic system theory. For more
information on the theory, refer to chapter 3.
The methodology used for the advanced analytics methods is discussed in chapter 5. The quantitative
studies are presented in chapters 6, 7 and 8. This is the main part of the thesis. In these chapters,
three methods are applied to the acquired dataset (provided by OPTA Sports), the analytical models are
built, and the derived team and player insights are discussed. Furthermore, the applicability of
these insights for various purposes in football practice is discussed.
Having identified factors for real-time decision making (specifically substitution), and applied data
analysis to real-world tracking data, chapter 9 discusses a real-time decision support use case. In
this chapter, a decision support framework for player substitution in football is developed based on
the results from the previous chapters. The main goal is to discuss the requirements of a real-time
player substitution system. A further goal is to demonstrate how coaches can be supported in this
tactical decision by using some of the insights derived from the advanced analytics methods used in
the thesis.
Contribution
The contribution of the current research is twofold. From an academic perspective, the thesis
examines issues that have not been investigated sufficiently in previous literature. On the one hand,
it presents the first qualitative study with coaches and assistant coaches on how they make
substitution decisions and which technologies, if any, they use in live games. On the other hand, it
is the first study to investigate the opportunities for real-time analytics and decision support in
the most important in-game decision: player substitution. Substitution factors from both literature
and practice are consolidated. They inform the quantitative studies, as well as the framework
developed in chapter 9. The analytics methods in the current thesis have also not previously been
applied to event data for the proposed research objective. Moreover, process mining has not
previously been applied to performance analysis in any sport.
From a practical perspective, the study provides a starting point for discussing the feasibility of
using a decision support system during live games. The developed decision support framework aims to
give an overview of what a substitution system could look like and how it could be implemented in
practice. Since wearable technology is now starting to be allowed during live games as well, there
is a clear need for more research on real-time decision support in football.
Current State of Research
In practice, a quick search on Google Trends reveals the overall public interest in the topic since
2004. This is presented in Figure 3. Three terms are compared: “sports analytics”, “soccer
analytics” and “football analytics”. Although “soccer” is the American term, it is included here
for a general overview, since “football” can also refer to American football. The German term did
not yield enough search results to enable a comparison.
Figure 3 shows that interest increases from 2012 onwards, with most search queries related to sports
analytics in general. The timing is not unexpected, as this is when analytics became more
prominent in practice.
The development of football match analysis can be split into four stages, each characterized
by a different focus in the evaluation of the game. This is presented in Figure 4.
Match analysis in football has developed from (1) simple counting, i.e., how many times a certain
action occurred; (2) qualitative assessment of the game by experts, which tends to be rather
subjective; (3) slightly more sophisticated quantitative evaluation of the number of passes, movement
and running distances; to, finally, (4) dynamic tactical analysis, where advanced analytics are applied
to large datasets to reveal patterns and interactions and to calculate KPIs that are more complex than
frequencies of actions. Even though the figure might give the impression that standard performance
indicators such as passes and tackles have been around since the 1950s, it should be kept in mind that
observations at that time were mostly recorded with pen and paper. The real revolution in football match
analysis occurred in the late 2000s with rapid technological development. When Carling et al.
(2005) wrote their book on soccer analysis, it still included guidelines for paper-based notational
analysis. At that time, the discussion centered on video-based statistical and tracking systems and on
electronic tracking systems, particularly for supporting referees in their decisions (Carling et al., 2005).
Since then, however, hardware and software have developed rapidly, allowing for more portable systems
with a small environmental footprint, as well as fast and accurate real-time data processing.
2.2 Tracking systems
The revolution that is currently taking place in football and other sports would not be possible
without advanced tracking technologies. FIFA defines these as Electronic Performance and Tracking
Systems (EPTS) which are used to monitor and improve player performance by tracking the player
and ball positions, and can be used in combination with other devices like heart rate monitors, ac-
celerometers, gyroscopes or other devices for measuring physiological parameters (FIFA, 2017b).
FIFA also defines three major types of devices: (1) Optical-based camera systems; (2) Local posi-
tioning systems (LPS), and (3) GPS/GNSS satellite systems (FIFA, 2017b). Taking into account
the available systems on the market today which are also used by professional football clubs, the
categorization was slightly modified to portray a full picture of what is currently available and
used. This is presented in Table 1 together with an example of a company that offers the respective
type of tracking system.
Table 1. Types of tracking systems currently available on the market
Type | System | Company
Camera based | Performance Data Feed | OPTA Sports, UK
Camera based | TRACAB Optical tracking | ChyronHego, USA
Wearables, GPS based | SportVU | STATS, USA
Wearables, GPS based | OptimEye S5 | Catapult, Australia
Wearables, radio based | ZXY Wearable Tracking | ChyronHego, USA
Radio-based non-wearable | RedFIR | Fraunhofer IIS, Germany
Source: Self-compiled
Each of the systems in the table above has its own advantages and disadvantages. Most clubs,
therefore, rely on at least one of the above systems, often two or more. A comparison of their fea-
tures is included in Table 2.
The camera based systems usually rely on several cameras installed around the football field, which
provide information on the events that happen as well as timestamps and positional data. OPTA
Sports is a leading provider of event data in several sports, including football. The most detailed
data feed available via OPTA is the Performance Feed, which includes detailed information on all
events, the timestamps, and player positions (OPTA Sports, 2017c). The basic feed is the Core
Data Feed, which consists mainly of schedules, squads and line-ups, team and player profiles, and
scores, while the mid-tier Classic Feed gives information on the events without positions and
timestamps (OPTA Sports, 2017a, 2017b). OPTA uses its own proprietary software, with which
trained analysts record all events as they occur during live games. Two analysts work on a single
match, each of them responsible for one of the teams involved. The company's data collection
process has been compared by some to playing a video game: one hand is on the keyboard choosing the
action, while the other is on the mouse choosing the location, and the system adds the timestamp
(IETFaraday, 2008). As soon as an event is collected, it is sent to a
central database system, and it is pushed out to media broadcasters, clubs, and sports websites
(IETFaraday, 2008).
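The collection workflow described above (action chosen on the keyboard, location chosen with the mouse, timestamp added by the system, event stored centrally and pushed on to subscribers) can be sketched as follows. This is a minimal illustration only: the `Event` fields, the 0 to 100 coordinate scale and the `EventLog` class are hypothetical and do not reproduce OPTA's proprietary software or data format.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    """One manually coded match event (hypothetical schema, not OPTA's format)."""
    team: str
    player: str
    action: str   # chosen by the analyst on the keyboard, e.g. "pass"
    x: float      # location clicked with the mouse, assumed 0 to 100 scale
    y: float
    # timestamp filled in automatically by the system when the event is created
    timestamp: float = field(default_factory=time.time)


class EventLog:
    """Hypothetical stand-in for the central database that receives coded events."""

    def __init__(self):
        self.events = []
        self.subscribers = []  # e.g. broadcasters, clubs, sports websites

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def record(self, event):
        self.events.append(event)        # store centrally
        for push in self.subscribers:    # then push the event out immediately
            push(event)


log = EventLog()
received = []
log.subscribe(received.append)
log.record(Event("Home", "Player 7", "pass", 42.0, 61.5))
print(len(log.events), received[0].action)  # 1 pass
```

Pushing each event to subscribers at the moment it is recorded mirrors the live distribution described in the text; a production system would add persistence, validation and network transport.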
Table 2. Tracking systems comparison
System | OPTA Sports | TRACAB optical tracking | SportVU | OptimEye S5 | ZXY | RedFIR
Physiological data
Distance − x x x x x
Speed − x x x x x
Acceleration − x x x x x
Deceleration − x x x x x
Body Load − − x − −
Heart Rate − − x x −
Fatigue − − − − −
Tactical data
Ball Position − x − − x x
Player position x x x x x
Player movement − x x x x
Events x Selected − − Selected
Team formation x x − − − x
Timestamps x x − − x
Context x − − − −
Other
Live information x x x x x
Used by clubs Player Training Training Training Player perfor-
for: recruitment Pre- and post- Injury preven- mance evalua-
Opposition match analysis tion tion
analysis Opposition Rehabilitation Scouting
Performance analysis Simulate tac-
analysis Scouting tics
Other users Broadcasters Broadcasters − − Broadcasters
Betting Betting com-
companies panies
Brands & Brands &
Sponsors Sponsors
Source: Self-compiled based on publicly available information on the companies’ websites as of November, 2017
TRACAB is another camera based system, which captures the x, y and z coordinates of the players,
the referee and the ball up to 25 times per second (ChyronHego, 2017). TRACAB and OPTA have
similar customers (clubs, broadcasters, bookmakers, sponsors), and a club can also integrate data
from both systems, as the two companies have an established partnership (OPTA
Sports, 2017e).
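Positional data of this kind is what the distance and speed entries in Table 2 are derived from. The sketch below shows the basic computation under simplifying assumptions: 2D positions in metres sampled at a constant 25 Hz, with the z coordinate and any smoothing or filtering a real system would apply left out; the function name and the sample track are hypothetical.

```python
import math

FRAME_RATE = 25  # samples per second, the maximum rate quoted for TRACAB


def distance_and_speeds(positions, frame_rate=FRAME_RATE):
    """Total distance (m) and per-frame speeds (m/s) from (x, y) positions in metres."""
    dt = 1.0 / frame_rate
    total = 0.0
    speeds = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        step = math.hypot(x1 - x0, y1 - y0)  # straight-line distance between frames
        total += step
        speeds.append(step / dt)
    return total, speeds


# Hypothetical player moving 0.2 m along the x-axis per frame for one second.
track = [(0.2 * i, 0.0) for i in range(26)]
dist, speeds = distance_and_speeds(track)
print(round(dist, 2), round(max(speeds), 2))  # 5.0 5.0

# Number of position samples per object in a 90-minute match at 25 Hz.
frames_per_match = FRAME_RATE * 90 * 60
print(frames_per_match)  # 135000
```

The frame count illustrates why tracking data quickly becomes large: every tracked player, the referee and the ball each contribute that many samples per match.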
The second category comprises wearable systems, which can be either GPS- or radio-based, depending
on how the position data is captured. Examples of this category are the SportVU and the
OptimEye S5 devices, which obtain the position data via GPS/GNSS satellite positioning, and the ZXY
device, which uses radio technology for positioning. What they have in common is that they come in
the form of a small device that is usually attached to the players with the help of a bra-like vest. Furthermore,
aside from the position coordinates, they provide mostly physiological and fitness data such as distance
run, speed, acceleration and deceleration, heart rate, and intensity level (McGann, 2014;
Premier League, 2017). With these devices, the level of fatigue can also be inferred, and it is difficult
for players to hide how hard they work during training (McGann, 2014; Ogden, 2011). One of the
first elite football clubs to adopt GPS technology was Manchester United, back in 2010 (Ogden,
2011). Currently, wearables are mostly used in training sessions, with clubs being the primary
customers of these companies. As mentioned in chapter 1, however, it is highly likely that wearables
will soon be allowed in live games as well.
Finally, a third type of system relying on radio frequencies is the RedFIR system developed by the
Fraunhofer Institute in Nuremberg, Germany. Although the system works with sensors being at-
tached to the jerseys and shoes of the players, and the ball, it is not a wearable in the full sense of
the term’s meaning. It is a more complex system than those described above as it works with more
than one sensor/transmitter attached to the objects of interest whose signals are picked up by sever-
al antennas installed around the sports field or any other area of observation (Fraunhofer IIS,
2017b). It provides physiological data (sprints and high intensity running, speed, distances) as well
as some tactical and technical data like position, player movement, selected events (possession,
passes, and shots on goal) (Fraunhofer IIS, 2017a). RedFIR does not seem to be widely used by
clubs at the moment, but unlike the systems in the previous categories, which tend to focus on
either fitness or tactical information, it provides both types of data.
In summary, the camera based systems like OPTA and TRACAB provide detailed tactical
information, while the wearable systems have a stronger focus on fitness data. These solutions are
primarily used by clubs in pre- and post-match analysis to assess player performance, to identify
what went wrong and where players can improve, to prevent injuries, and to design better training
programs. The camera systems have a broader customer base, spanning clubs, broadcasters, betting
companies and other content providers, while the wearable companies have mostly clubs as customers.
Another difference is that camera systems cannot provide the position of the players and the ball at
every point in time throughout the game, as GPS and radio systems do. Because each system has its own
advantages and disadvantages, clubs often rely on two or more of them, as mentioned earlier. A
limitation for researchers is that the data from these systems can be prohibitively expensive,
especially when data for more than one season or competition are required.
In the current thesis, data provided by OPTA Sports is used; this is described in more detail in chapter 5.
2.3 Performance analysis in football
Performance analysis (PA) is concerned with investigating actual sports performance in competitions
or in training, rather than in a laboratory setting (O'Donoghue, 2010). Before the technological
advancements of recent years, coaches and their staff carried out some form of PA, but mainly through
observation, which is highly subjective, or by using manual notational systems (Carling et al., 2005;
O'Donoghue, 2010). The rationale behind PA is to increase the understanding of performance through
objective measures, but an established theoretical basis that can explain the results is still missing
(O'Donoghue, 2010). Recently, dynamic system theory has been suggested as a novel way to
explain team performance as a complex system beyond simple action frequencies (more on this in
chapter 3).
In general, performance analysis involves notational and motion analysis. Notational analysis re-
fers to the recording of all actions or events in a “what”, “where” and “when” manner (Carling et
al., 2005). It allows for key elements of the performance to be quantified in a valid and consistent
way (Nevill et al., 2008) to ensure an accurate and objective representation of the game (Carling et
al., 2005). There are five major purposes of notational analysis: (1) analysis of movement;
(2) tactical evaluation; (3) technical evaluation; (4) database development; and (5) evaluation and
immediate feedback by coaches (Hughes and Franks, 2004). Notational analysis has contributed
significantly to the coaching process over the years, in no small part due to computerized tracking
systems. However, in the last decade it has been criticized, mainly for focusing on what happens
rather than how and why it happens, i.e., the focus has been on frequencies of actions considered in
isolation (Travassos et al., 2013). It has been suggested that more advanced analytical methods such
as artificial neural networks (ANNs) may help overcome the
limitations of notational analysis (Lees, 2002). Motion analysis (biomechanics) is concerned with
the raw aspects of player movement and mainly examines the work-rate and fitness levels of play-
ers (Carling et al., 2005). As these factors are excluded from the thesis, the definition is kept short.
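The “what”, “where” and “when” structure of notational analysis can be illustrated with a small frequency table. The sketch below assumes hypothetical events coded on a 0 to 100 pitch, splits the pitch into thirds and the match into halves (naively at 45 minutes, ignoring stoppage time), and counts each combination. It also illustrates the criticism mentioned above: the table records how often actions occur, but nothing about how or why.

```python
from collections import Counter

# Hypothetical coded events: (what, where as (x, y) on a 0-100 pitch, when in seconds)
events = [
    ("pass", (20, 50), 65),
    ("pass", (55, 30), 130),
    ("tackle", (40, 70), 300),
    ("shot", (88, 45), 3100),
    ("pass", (70, 60), 2900),
]


def pitch_third(x):
    """Map the x coordinate (0 = own goal line, 100 = opponent's) to a pitch third."""
    if x < 100 / 3:
        return "defensive third"
    if x < 200 / 3:
        return "middle third"
    return "attacking third"


def match_half(seconds):
    """Naive split at 45 minutes; stoppage time is ignored."""
    return "1st half" if seconds < 45 * 60 else "2nd half"


counts = Counter((what, pitch_third(x), match_half(t)) for what, (x, _), t in events)
for key, n in sorted(counts.items()):
    print(key, n)
```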
An effective PA will depend on focusing on what information is important and why (Carling et al.,
2005). In general, there are a few aspects of team and player behavior that are well established in
PA research over the years. These are technical, biomechanical, tactical and physical aspects (Car-
ling et al., 2005; Hughes and Bartlett, 2002; Nevill et al., 2008). To measure skills in each category,
performance indicators are used. A performance indicator (PI) is not merely a variable; the term denotes
variables that have been established as valid measures of important aspects of performance and
can be interpreted (O'Donoghue, 2010). PIs differ between sports. An overview of the PIs in foot-
ball per category is presented in Table 3.
Table 3. Overview of football performance indicators per category
a) Match classification indicators, as the name suggests, are used to classify the performance of
each team in a match, to determine more objectively the team that showed better performance.
b) Biomechanical indicators are related to studying the fine details of movement. The biomechanics
discipline is well established in sports and is related to mechanics and anatomy (Bartlett,
2001).
c) Technical indicators refer to winners and errors, for instance passing accuracy or loss
of possession (Hughes and Bartlett, 2002). Actions such as passing, shooting and heading are
also part of technical PA (Carling et al., 2005).
d) Tactical indicators reflect the relative importance of teamwork, pace, fitness and movement,
and target the technical strengths and weaknesses of the players (Hughes and Bartlett, 2002).
They are interrelated with and influenced by the technical indicators (see 2.4).
e) Physical indicators are generally related to the overall fitness of the player and also refer to
variables such as heart rate, total distance or high-intensity running distance (Carling et al.,
2005).
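Technical indicators such as passing accuracy are typically simple ratios over coded events. The following sketch assumes a hypothetical `Pass` record holding only a player name and a success flag; real event feeds carry many more attributes (location, recipient, pass type), and the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Pass:
    """Hypothetical minimal pass record: who passed and whether it reached a teammate."""
    player: str
    successful: bool


def passing_accuracy(passes):
    """Percentage of successful passes, or None if there are no passes."""
    if not passes:
        return None
    return 100.0 * sum(p.successful for p in passes) / len(passes)


def accuracy_by_player(passes):
    """Passing accuracy computed separately for each player."""
    players = sorted({p.player for p in passes})
    return {name: passing_accuracy([p for p in passes if p.player == name]) for name in players}


sample = [Pass("A", True), Pass("A", True), Pass("A", False), Pass("B", True)]
print(passing_accuracy(sample))  # 75.0
print(accuracy_by_player(sample))
```

Returning `None` for players without passes avoids a division by zero and keeps "no data" distinct from "0% accuracy".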
An overview of the PIs per player position and category is included in Appendix A. The table above
should serve as a brief, general overview of the standard PIs in each category. It should be
noted, however, that the technical and tactical aspects are of particular relevance for answering the
research question of the current thesis. The physical PIs, especially distance, work-rate and
high-intensity running, are also important as indicators of fatigue, but they are not present in the
dataset used in this research project. Data for these PIs are typically collected via wearables, which at
the moment are allowed only in training. The full overview of factors that can be considered in
answering the research question of the current thesis is presented in section 4.5; it is based
on a consolidation of PA literature, a systematic literature review and an empirical study.
The above-mentioned PIs are used for analysis in three stages: (1) pre-match – to counteract the
opponent's strengths, exploit their weaknesses and decide on the team's strategy; (2) in-game – to
improve tactical decisions; and (3) post-game – detailed analysis of the past game performance by
qualitative and quantitative methods to reinforce good performance and identify areas for im-
provement (Carling et al., 2005; Travassos et al., 2013).
At first glance, the definitions appear quite similar. However, one can also recognize subtle
differences. For instance, Collins defines tactic as a method to achieve a goal in a particular
situation, and defines strategy in a similar way but adds a long-term aspect. Merriam-Webster,
on the other hand, defines both terms in a similar way. As the thesis is concerned mainly with
supporting the coach in real-time decision making, it is relevant to examine whether these terms
refer to the same aspects or not.
Some sports scientists consider the two terms distinct. For instance, Gréhaigne et al. (1999) write
that strategy and tactics have a different relationship with time. Strategy refers to the decisions made
before the game, when there is no time restriction, while tactics operate under strong time constraints
(Gréhaigne et al., 1999). Furthermore, strategy concerns aspects such as team composition and
assigned position (the position each player is instructed to cover during training), while tactics refer
to effective position (the position taken in relation to the opponent during the game) and flexibility
(adapting to the conditions of play) (Gréhaigne and Godbout, 1995). Carling et al. (2005) also
consider the two terms different. According to the authors, strategy is the overall plan to achieve a
specific goal, while tactics are applied to implement that strategy. Rein and Memmert (2016), however,
question the delineation between these terms, since the real-time interactions between the players
depend on the strategy that was determined before the game. According to the authors, tactics include
decisions taken both before and during the game. The thesis follows the argumentation by Rein and
Memmert (2016) and considers the terms tactics and strategy
in the case of football as synonyms. Throughout the rest of the thesis, the term tactics is preferred
over strategy; where strategy is used, it refers exclusively to decisions made before the game. A
more detailed overview of tactics in football is presented and discussed below.
Tactical behavior has not been investigated in the literature as often as technical behavior, perhaps
primarily due to a lack of appropriate data (Garganta, 2009; Rein and Memmert, 2016). An
overview of definitions of tactics from different points of view is presented in Figure 5.
According to the number of players whose behavior is analyzed, tactics can be a) individual tactics,
when the actions of single players are considered; b) group tactics, when sub-groups of players and
their actions are analyzed; c) team tactics, which mostly refer to the chosen formation for a specific
game and show whether the team plans to play more aggressively or to focus more on defense (e.g.,
the 4-4-2 is a more offensive formation, while the 5-2-2-1 is a more defensive one); and finally, d)
game tactics, which refer mostly to the overall team philosophy, e.g., whether the team prefers to take
advantage of counter attacks or tries to keep the ball in possession longer.
The tactical behavior of a football team is related to the state of ball possession, i.e., the defensive
phase (without ball possession) and the offensive phase (with ball possession) (Clemente, Couceiro,
Martins, Mendes et al., 2013). The thesis mainly analyzes the offensive phase, as the analyses
are based on event data, which consist mostly of ball-related actions. The tactical performance of a team
depends on the quality of the actions of players and teams in space and time (Memmert et al., 2017).
Space refers to the location on the pitch where an action takes place, or the area that the team wants
to occupy during attack or defense; time refers to the frequency of events or how quickly actions are
initiated, while individual actions are the various types of actions or events that happen on the field
(Rein and Memmert, 2016). How teams and players manage space and time, however, and thus the
quality of their tactics, is influenced by a wide variety of factors. Some of these are contextual
factors such as the location of the match (whether the team plays home or away), type of competition,
current standing in the league, referee decisions, the skills (technical and physiological) of the op-
ponent’s team, or even the weather (Rein and Memmert, 2016).
Essentially, all performance indicators mentioned in section 2.3, in addition to contextual factors
and historical data, influence the tactical performance of teams and players. The extent of this
influence, though, has not been researched sufficiently in the literature (Garganta, 2009; Rein and Memmert,
2016). Some authors have suggested a few metrics that could give coaches an objective view of
their team’s tactics. Such metrics are, for instance, the team’s centroid (the mean position of all
outfield players from one team), stretch index (dispersion of the team in relation to the team’s cen-
troid) and the team’s effective area of play (the number of triangles of each team over time and the
effective space to play) (Clemente, Couceiro, Martins, Mendes, 2013; Memmert et al., 2017). Indi-
cators suggested by Garganta (2009) are: related to space – effective play-space, player movement
paths, paths of ball circulation, and players’ action zone; related to time – frequency of events, in-
dividual and team pace, and time of ball possession; game task – types of actions like interceptions,
turnovers, passes, shots on target. However, there are no standard variables established for measur-
ing the tactical performance (Rein and Memmert, 2016). Since a team game is composed of several
related micro-events “the main subject of tactical analysis should not be the player’s actions, taken
disjointedly, but the game play sequences resulting from the actions that occur during the different
phases of the match” (Garganta, 2009, p. 85).
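The centroid and stretch index mentioned above can be computed directly from player coordinates. The sketch below assumes (x, y) positions in metres for one team at a single moment and uses the mean Euclidean distance to the centroid as the stretch index, which is one common formulation; the snapshot is hypothetical. Note that such metrics require positional (tracking) data for all outfield players, not event data alone.

```python
import math


def centroid(positions):
    """Mean (x, y) position of a team's outfield players at one moment."""
    n = len(positions)
    return (sum(x for x, _ in positions) / n, sum(y for _, y in positions) / n)


def stretch_index(positions):
    """Mean Euclidean distance of the players from the team centroid."""
    cx, cy = centroid(positions)
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / len(positions)


# Hypothetical snapshot: four players at the corners of a 20 m x 20 m square
# (a full team would have ten outfield players).
snapshot = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0), (20.0, 20.0)]
print(centroid(snapshot))                 # (10.0, 10.0)
print(round(stretch_index(snapshot), 2))  # 14.14
```

Computed frame by frame, both values become time series, so a team contracting in defense or spreading out in attack shows up as a falling or rising stretch index.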
A tactical change during a live game can be made by changing the formation or by substituting a player
(Hirotsu and Wright, 2002). Formations are different combinations of players in various positions.
The chosen formation is one way to tell whether the team plans to play defensively or
offensively (Hirotsu and Wright, 2006). During the game, this can change through either a defensive
or an offensive substitution, or by instructing a player to temporarily assume another position (see 4.2).
The primary goal of a coach during a match is to use limited resources to help the team adapt
to the current situation, either to maintain an advantage or to save a losing situation (Del Corral et
al., 2008). There are no timeouts and the only stoppage is half-time; substitutions can therefore
be a determining factor that can make or break the game. This is why substitutions are scarce
resources that coaches should use wisely (Myers, 2012). Coaches should be ready to face any
circumstances during the course of the game and change the tactics accordingly (Janković and
Leontijević, 2006). In this regard, the most effective measure is to replace a player with one whose
characteristics better fit what is happening at that moment in the game (Janković and Leontijević,
2006).
Theory and Concept for Analytics
The first issue concerning the applicability of results is that, despite suggestions, research in
performance analysis has continued to investigate aspects of the game in isolation, repeatedly using
similar methodologies. This is one of the most frequently mentioned concerns in the literature,
especially in the last couple of years. Several authors (Lames and McGarry, 2007; O'Donoghue, 2009;
Reed and Hughes, 2006) point out that considering isolated variables such as ball possession and
linking them to successful or unsuccessful game outcomes is meaningless. The main reason is that
football is an unpredictable sport in which “chance dominates the game”, as Reep and Benjamin
(1968) concluded. A widely proposed solution to this concern has been to include contextual factors
such as the type of opposition, match location and period of the season. Most importantly, the need
for a theoretical basis for performance analysis research has been repeatedly raised in the literature
since 2002. The proposed basis is ecological dynamics theory, explained below.
The second issue that Mackenzie and Cushion identified relates to two main methodological aspects:
the sample size used for generalizations, and the lack of transparent definitions of the variables from
which results have been derived in some studies. Concerning sample sizes, there is little to no consensus
in the literature on what constitutes a representative sample size (James, 2006). Mackenzie and Cushion
observe that out of 44 technical articles included in their review, only 10 used a sample size of 100 or
more games, which does not seem sufficient considering that a season can consist of 380 games.
Furthermore, Castellano et al. (2014) reviewed the quality of 38 studies that used semi-automated
tracking systems like Prozone to quantify the physical profiles of players in football. The authors
observed that 50% of the studies analyzed only one team, while the remainder studied more than one team
or did not specify the exact number analyzed. The problem with definitions and classifications, as
pointed out by Mackenzie and Cushion (2013), is that 79% of the technical papers in their review did
not fully define the variables used in the analysis. This is problematic because it makes it impossible
to directly compare and replicate the results of previous studies. If authors continue to use their own
variables without clearly defining them, this in turn also contributes to the third issue: the usability
of results.
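To make the sample-size comparison concrete, assume a league of 20 teams in which every team plays every other team home and away (a double round robin). This league structure is our assumption for illustration, but it yields exactly the 380 games mentioned above, and the shares below follow from the figures reported by Mackenzie and Cushion:

```latex
N_{\text{games}} = n(n-1) = 20 \times 19 = 380,
\qquad
\frac{100}{380} \approx 0.26,
\qquad
\frac{10}{44} \approx 0.23
```

In other words, even a study of 100 games covers only about a quarter of one such season, and fewer than a quarter of the reviewed technical articles reached that size.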
The usability of performance analysis research results relates to the fact that most studies end with the
same predictable findings, which are often common sense for coaches. Therefore, Mackenzie and
Cushion question whether and to which extent performance analysis research furthers the understand-
ing of performance.
To address these issues, ecological dynamics theory has been promoted as a theoretical basis for
performance analysis research in team sports such as football.
In total, nine papers on dynamic system theory have been reviewed in detail. These papers are
frequently cited and can therefore be considered a good representation of the literature advocating a
shift in performance analysis research towards this theory. They are the studies by
Dutt-Mazumder et al., 2011; Lames and McGarry, 2007; McGarry et al., 2002; McGarry, 2009;
O'Donoghue, 2009; Reed and Hughes, 2006; Travassos et al., 2010; Travassos et al., 2013; Vilar et al.,
2012. It is noticeable that very few authors appear regularly in the literature on ecological
dynamics, which coincides with the view of Coleman (2012) that the field of sports analytics is
still fragmented and that very few prolific authors comprise the entire sports analytics research group
at a given school.
The above publications demonstrate that the idea of using dynamic system theory for performance
analysis has been around for over a decade. However, little progress has been made on the subject
since then. One reason for this is that computer systems for notational analysis only started to
improve after 2005 and have been implemented by clubs only in the last couple of years. Another
reason is the confidentiality of research results on the part of the clubs, which prefer not to share
their results since any success in analytics gives them a competitive advantage. Nevertheless,
performance analysis research in football continues its efforts to establish a new theoretical basis
for interpretation that would help coaches better understand the “how” and the “why” of the events
that occur.
In a study of squash games, McGarry et al. (2002) found that, contrary to previous conclusions,
players exhibit different behavior against different opponents. They therefore proposed an alternative
approach to sports performance analysis, one that views the team as a complex system. A complex
system consists of many degrees of freedom in constant flux, but in which some regularity emerges
(McGarry et al., 2002). It consists of many components which interact among themselves and, as a
whole, interact with the environment (Hristovski et al., 2014). Complex systems can be homogeneous
or heterogeneous. The differences and common characteristics of the two categories are presented in
Table 5.
Table 5. Types of complex systems and their characteristics
Complex systems
Homogeneous | Heterogeneous
Example: piece of ice | Examples: living (biological) systems; social systems
One kind of interaction between its components | Different kinds of interactions (informational and/or mechanical)
A component of the system can be studied in isolation; the behavior of the system will stay the same | Isolating one part of the system is not possible
− | Adaptive and goal oriented
− | Evolve, develop and learn to negotiate with their environment by changing and fitting their behavior to emerging constraints
Football is a heterogeneous complex system in which two teams pursue the same objective simultane-
ously, a feature that is typical of game sports (Lames and McGarry, 2007). Both teams have aims that
are mutually exclusive but are pursued at the same time, and therefore, tight interactions between
them arise. Since these interactions change over time, they are dynamic (Lames and McGarry, 2007).
The resulting behavior is therefore not an expression of stable properties of individual players
(e.g., technical skills). Consequently, game sports like football should be considered as unique
action chains which are context and time dependent and not repeatable (Lames and McGarry,
2007).
Similarly, Travassos et al. (2010) write that team sports are characterized by the formation of sponta-
neous patterns through self-organization processes and that team behavior can be regarded as an
emergent process that results from the interaction between the individual (player), environment and
task constraints (e.g., distance to goal). This is presented in Figure 7.
Self-organization is a key aspect of complex systems which explains how order emerges among dif-
ferent components (Vilar et al., 2012). The spatio-temporal trajectories of each player can be thought
of as a product of self-organization (Duarte et al., 2012). Dynamical system theory can be considered
as a reliable approach for accurate description and measurement of the space-time patterns that
emerge in the game (Travassos et al., 2010). Such an approach can describe, explain and predict the
coordination patterns that emerge, adapt, persist and dissolve in complex systems (Button et al., 2014).
By using nonlinear equations to model behavior, it becomes clear how seemingly complex systems
follow simple, elegant principles (Button et al., 2014). In general, most scientists know about
nonlinearity and usually try to avoid it (Hughes, 2008). However, it is important to study a system not
only in its linear range of operation where a change is smooth, but also to exploit its qualitative
change to identify variables that are essential to the system – the idea being that because these varia-
bles change abruptly, they are key variables when the system operates in the linear range (Hughes,
2008). This is explored in the thesis by using the CUSUM metric calculation (see section 6.5).
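The CUSUM metric used in the thesis is defined in section 6.5. As a generic illustration of how a cumulative sum can flag an abrupt change, the sketch below implements the standard two-sided tabular CUSUM (Page's scheme). The target mean, slack `k`, threshold `h` and the pass-count series are hypothetical values chosen for illustration and need not match the thesis's setup.

```python
def cusum(values, target, k, h):
    """Two-sided tabular CUSUM (Page's scheme).

    target: reference mean of the series while nothing has changed
    k: slack, the deviation tolerated before evidence starts to accumulate
    h: decision threshold; an alarm is raised when either sum exceeds it
    Returns the upper sums, the lower sums and the index of the first alarm (or None).
    """
    s_hi, s_lo = 0.0, 0.0
    highs, lows = [], []
    first_alarm = None
    for i, x in enumerate(values):
        s_hi = max(0.0, s_hi + (x - target - k))  # accumulates upward shifts
        s_lo = max(0.0, s_lo + (target - k - x))  # accumulates downward shifts
        highs.append(s_hi)
        lows.append(s_lo)
        if first_alarm is None and (s_hi > h or s_lo > h):
            first_alarm = i
    return highs, lows, first_alarm


# Hypothetical pass counts of one player per 5-minute interval: stable around 10, then a drop.
passes = [10, 11, 9, 10, 10, 6, 5, 6, 5, 4]
highs, lows, alarm = cusum(passes, target=10, k=1, h=5)
print(alarm)  # 6
```

Small fluctuations within the slack are absorbed (the sums reset to zero), while a sustained drop accumulates quickly, which is the kind of abrupt, qualitative change the paragraph above describes.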
Simply put, a dynamic system is a system in which a change occurs. The challenge is to model the
behavior of the football team or players in a way that allows the analyst to detect when such a
change occurs and to find ways to avoid it or respond to it when it does happen. So far, performance
indicators have been used to describe the behavior of the players and teams. However, these are static measures
that poorly represent any change to the system as a whole and also its separate elements – the players.
This is why the dynamic system theory aims at finding new metrics, or indicators that would better
describe the state of the players and will better reflect changes that occur in the system (team). Aside
from the change that occurs in the system, the dynamic system theory suggests that dynamic complex
systems are also capable of adapting following a change and that stable relationships are then formed
at different levels of the system (e.g., different sub-groups of its components) (Dutt-Mazumder et al.,
2011).
Theory and Concept for Analytics
3.2 A concept for analysis based on the dynamic system theory
The literature does not offer a concrete list of dynamics that are relevant to measure in football. However, the theory serves as inspiration and a starting point for the analysis employed in the thesis, as discussed below.
The literature on nonlinear dynamics of team sports mentions two analytic methods that can be used to analyze the team and its players as a dynamic, complex system. The first method is a type of artificial neural network architecture known as Kohonen maps or self-organizing maps (SOMs). This method is suggested as the best nonlinear method for pattern analysis. As self-organization refers to the development of regularities in the behavior of a system without the control of an external agent, a method like SOM seems appropriate for studying the behavior patterns of sports teams as dynamic systems (Dutt-Mazumder et al., 2011). The SOM algorithm is capable of preserving the nonlinear topological relationships in the data and thus retains the relevant information while discarding the irrelevant information in high-dimensional data sets, which are typical of dynamic systems (Dutt-Mazumder et al., 2011).
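To make the SOM idea more concrete, the following is a minimal sketch in plain Python, not the implementation used in the thesis. It assumes each player is described by a numeric feature vector scaled to 0..1; the features, grid size, learning-rate and neighbourhood schedules, and the function names are all hypothetical. Each training step finds the best-matching unit (the grid node whose weight vector is closest to the sample) and pulls it and its grid neighbours towards the sample, using a Gaussian neighbourhood that shrinks over time. This shrinking neighbourhood is what lets similar inputs end up on nearby nodes.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid coordinate of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on equal-length numeric vectors."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialize every node with a copy of a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):        # visit samples in shuffled order
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3     # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights

# Hypothetical player profiles: [passes, shots, tackles], each scaled to 0..1.
players = {
    "A1": [0.90, 0.80, 0.10], "A2": [0.85, 0.75, 0.15], "A3": [0.95, 0.85, 0.05],
    "D1": [0.20, 0.10, 0.90], "D2": [0.25, 0.05, 0.85], "D3": [0.15, 0.10, 0.95],
}
som = train_som(list(players.values()), rows=2, cols=2)
for name, vec in players.items():
    print(name, best_matching_unit(som, vec))
```

Players whose profiles map to the same or adjacent nodes can be read as sharing a behavioral pattern.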
Social network analysis is the second method mentioned in the literature in relation to analysis based on dynamic system theory. This type of analysis helps to discover patterns of interaction between the players. It can reveal the collective behavior of team sports by quantifying the frequency of internodal pairing (Dutt-Mazumder et al., 2011). Team sports are composed of multiple, dynamic couplings among dyads of players, which function under similar principles of coordination dynamics (Dutt-Mazumder et al., 2011). By representing the interactions between the players as a network, it is possible to calculate network measures at the team, player and sub-group level, observe how these change over time, and link them to successful or unsuccessful outcomes.
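As a minimal sketch of how player interactions could be turned into network measures, the Python snippet below builds a directed, weighted passing network from a list of completed passes and computes weighted in- and out-degree per player as well as network density. The player labels, the pass list and the function names are hypothetical, and degree and density are only two of many possible measures. The same functions could be applied per time window, or to a sub-group of players, to observe how the measures change over time.

```python
from collections import Counter, defaultdict

def build_pass_network(passes):
    """Directed weighted network: (passer, receiver) -> number of completed passes."""
    return Counter(passes)

def weighted_degrees(edges):
    """Per-player weighted out-degree (passes made) and in-degree (passes received)."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for (passer, receiver), count in edges.items():
        out_deg[passer] += count
        in_deg[receiver] += count
    return dict(out_deg), dict(in_deg)

def density(edges, players):
    """Share of possible directed player pairs linked by at least one pass."""
    n = len(players)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0

# Hypothetical completed passes within one time window.
passes = [("GK", "CB"), ("CB", "CM"), ("CM", "ST"), ("CB", "CM"), ("CM", "CB"), ("CM", "ST")]
edges = build_pass_network(passes)
out_deg, in_deg = weighted_degrees(edges)
players = {p for pair in passes for p in pair}
print(out_deg)                  # {'GK': 1, 'CB': 2, 'CM': 3}
print(density(edges, players))  # 4 distinct links out of 12 possible
```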
Based on the discussion so far, the main goal of analysis based on dynamic system theory is to discover new metrics that can properly describe the behavior and state of players and teams, to find measures that detect changes in the system, and to use valid metrics to show how the system changes over time. The thesis works with event data, which is suitable for this type of analysis as it contains all events that happen during the game and the players involved, as well as their positions and timestamps. As the position data relates only to the event that occurs, i.e., there is no information on the position of all players and the ball at any point in time, spatio-temporal pattern detection is not part of the analysis in the thesis. For this type of analysis, detailed GPS data is necessary in order to measure the influence of the rest of the players and the ball on the event or player of interest. For example, when the player with the ball is in the 10-meter section, about to shoot at the goal, how does the position of the other players from his team and from the opposition influence the way he behaves? There are various options for such analysis, but, as mentioned, detailed positional data is necessary for them.
Therefore, the analyses in this thesis focus mainly on the events that happen during the game, the players involved and the timestamps. Both self-organizing maps and social network analysis can be applied to such data. In addition, considering the events and timestamps together makes it possible to analyze event sequences; as stated previously, football can be considered to consist of unique action chains. For this type of analysis, a new method based on process mining techniques and algorithms is used. This is a novel approach to analyzing sequence data in team sports, and no study has applied the same techniques to team sports data. However, considering the structure of the dataset, which is in fact event log data, process mining appears to be a suitable collection of methods for gaining deeper insights into player and team behavior based on event data. From the discussion in section 2.4, it is clear that individual actions as well as team actions are an important part of team and player tactics in football. Given that the static performance indicators derived from these actions/events have increasingly been criticized in the last few years, new methods that allow the analysis of events and event sequences can be useful in football performance analysis, especially because event data is one of the main data types available during live matches.
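One basic building block of many process mining discovery algorithms is the directly-follows graph, which counts how often one activity is immediately followed by another within the same case. The sketch below is an illustration only and not necessarily the technique used later in the thesis. It assumes each event is a (case_id, timestamp, activity) tuple, where a case is hypothetically taken to be one ball possession, and all events shown are invented.

```python
from collections import Counter, defaultdict

def directly_follows_graph(events):
    """Count how often activity a is directly followed by activity b within the same case."""
    traces = defaultdict(list)
    # Order events by case and time so that each case becomes an ordered trace.
    for case_id, _timestamp, activity in sorted(events, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

# Hypothetical event log: (possession id, seconds since kick-off, event type).
events = [
    (1, 12.0, "pass"), (1, 14.5, "pass"), (1, 17.1, "shot"),
    (2, 41.0, "pass"), (2, 40.2, "tackle"), (2, 44.3, "cross"), (2, 45.0, "shot"),
    (3, 60.0, "pass"), (3, 62.5, "pass"), (3, 63.0, "cross"),
]
dfg = directly_follows_graph(events)
for (a, b), n in dfg.most_common():
    print(f"{a} -> {b}: {n}")
```

Possession 2 is listed out of order on purpose: sorting by case and timestamp first is what turns a raw event log into ordered traces.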
Part II – Qualitative Study on Tactical Decision Making in Football
Tactical decision making in football
The first step in any research inquiry is to analyze the current state of research on the topic of interest and on related work. Therefore, the current study starts with a detailed literature review on (real-time) decision making in football to better understand how coaches make decisions. Following this step, a research gap is identified that calls for a qualitative analysis to extend the literature findings and to explore the phenomenon in more detail in order to generalize the findings. Semi-structured expert interviews were conducted and processed via content analysis. Findings from both the literature and the empirical study were consolidated into a list of relevant substitution factors.
4.2 Related work
In this section, the process of searching the literature on decision support during live games in football is explained, and the results are presented and discussed. Several types of reviews exist, depending on the purpose of the inquiry. The current thesis follows the guidelines for systematic literature reviews (SLR) provided by Kitchenham (2004) and, partly, the PRISMA guidelines, specifically for the structure and reporting of the review (Moher et al., 2009). This type of review is used for identifying, analyzing and interpreting all available research relevant to a topic of interest in an unbiased and reproducible way (Kitchenham and Charters, 2007). Reasons for conducting an SLR include a) identifying existing gaps in the literature and b) establishing a framework to appropriately position upcoming research activities (Kitchenham, 2004; Kitchenham and Charters, 2007). This is especially relevant to the current inquiry because, on the one hand, no existing review of substitution or real-time decision support in football could be identified, and on the other hand, the primary reason for conducting the review is to understand how far existing research on the topic has progressed, to identify gaps and to decide how to proceed with the quantitative analysis. Furthermore, an SLR follows a rigorous methodology for collecting and analyzing the literature and is therefore less prone to bias than traditional reviews (Mariano et al., 2017).
Following the clarification of the purpose of the literature review, the next step is to specify the search strategy, which primarily comprises the search terms and the databases used. Table 6 presents all search terms used to extract relevant papers on the topic.
Table 6. Search terms used
Search terms
substitution
The terms in the table above were chosen after a quick preliminary literature search to get a first impression of the amount of existing research, as well as of the keywords used by the retrieved papers. To ensure that all relevant papers were retrieved, Boolean queries were constructed by combining the terms with the different words for football used in the US, in Europe and in German. In total, nine databases were searched, and all search terms were applied in all nine databases, searching all fields without restriction. Table 7 lists the databases together with the number of potentially relevant articles collected in the first phase.
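The Boolean query construction can be sketched as follows: terms within a group are combined with OR, and the groups are combined with AND. This Python snippet is illustrative only. It assumes "soccer", "football" and "Fußball" as the US, European and German terms, uses "substitution" as the topic term, and the function name is hypothetical. The exact query syntax accepted differs between databases.

```python
def build_query(topic_terms, football_terms):
    """OR within each group of quoted terms, AND between the two groups."""
    def group(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return group(topic_terms) + " AND " + group(football_terms)

query = build_query(["substitution"], ["soccer", "football", "Fußball"])
print(query)  # → ("substitution") AND ("soccer" OR "football" OR "Fußball")
```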
Table 7. Databases searched and number of papers collected
Database Results
PubMed 2
Sport Discus 14
Scopus 2
Science Direct 0
Web of Science 0
Google Scholar 79
Sports Medicine 8
Springer 5
Total 110
An initial search was conducted in March 2015, with the final search conducted in June 2017. Since it was clear from the preliminary search that the existing literature was rather scarce, all papers that seemed potentially relevant based on the title (addressing any aspect of substitution or real-time decision support) were included for further examination, regardless of the publication date.
The next step in an SLR is to define the selection strategy, i.e., the criteria for filtering the retrieved papers. The selection is a three-stage process: the first stage is selection based on the title, the second based on the abstract, and the third after reading the full paper (Heckman and Williams, 2011). The selection process is presented in Figure 9.
Initially, 110 publications in total were identified as potentially relevant for the research goal. A snowball sampling strategy was used to identify additional studies, but no such studies were found. After all duplicates were removed, the remaining 103 publications were screened further by reading the abstracts. At this stage, most of the publications were excluded from a more detailed review. These were publications a) studying substitutions in children's, college or women's football teams, as the current thesis focuses specifically on professional football; b) analyzing various aspects of performance in football, such as movement, goals, fitness aspects and GPS tracking, among others; and c) not available as full text (10 publications in total). One of the latter is a PhD thesis that was later incorporated into two papers included in the final review stage (Hirotsu et al., 2009; Hirotsu and Wright, 2006); the rest could not be obtained, even via inter-library loan, and therefore had to be excluded. In total, 26 publications were included in the final eligibility assessment, which required full-text reading.
Figure 9. Flow Diagram of the methodology for article search and selection (based on PRISMA)
Following this stage, three publications were excluded: (1) a qualitative study by Woods and Thatcher (2009) on the psychological effects of being a substitute player, which did not fit the research objective of the literature review, namely the factors relevant for substituting players and how these can be used to support the coach in live games; (2) a corrigendum to the paper by Hirotsu and Wright (2002) concerning the formulas they used; and (3) an opinion piece by Orchard (2012) on substitution as a means of injury prevention in team sports, which, however, makes no mention of football. Finally, 23 papers were chosen for detailed analysis and included in the further review. These are presented in Table 8.
Table 8. Overview of papers included in the literature review¹
Author | Title | Year | Published in
Hirotsu & Wright | Using a Markov Process Model of an Association Football Match to Determine the Optimal Timing of Substitution and Tactical Decisions | 2002 | The Journal of the Operational Research Society
Lenahan & Solari | The right players, right system: choosing lineups, changing systems, making substitutions in Real Madrid's title run | 2002 | Soccer Journal
Hirotsu & Wright | Determining the Best Strategy for Changing the Configuration of a Football Team | 2003 | The Journal of the Operational Research Society
Hirotsu & Wright | Modeling Tactical Changes of Formation in Association Football as a Zero-Sum Game | 2006 | Journal of Quantitative Analysis in Sports
Janković & Leontijević | Substitution of players in function of efficiency increase of tactic play plan in football | 2006 | Fizička kultura
Ledru | A Formalisation of the Soccer Substitution Rules | 2006 | International Workshop on Regulations Modelling and their Validation and Verification
del Corral, Barros & Prieto-Rodríguez | The Determinants of Soccer Player Substitutions: A Survival Analysis of the Spanish Soccer League | 2008 | Journal of Sports Economics
Geyer | Auswechselverhalten im Fußball − eine empirische Analyse | 2009 | Sport und Gesellschaft – Sport and Society
Hirotsu, Ito, Miyaji, Hamano, & Taguchi | Modeling Tactical Changes of Formation in Association Football as a Non-Zero-Sum Game | 2009 | Journal of Quantitative Analysis in Sports
Mengel | Never change a winning team: The effect of substitutions on success in football tournaments | 2009 | Maastricht Research School of Economics of Technology and Organization (METEOR)
Carling et al. | Work-rate of substitutes in elite soccer: A preliminary study | 2010 | Journal of Science and Medicine in Sport
Coelho et al. | Effect of player substitutions on the intensity of second-half soccer match play | 2012 | Revista Brasileira de Cineantr. & Desem. Humano
Myers | A Proposed Decision Rule for the Timing of Soccer Substitutions | 2012 | Journal of Quantitative Analysis in Sports
Siegle & Lames | Game interruptions in elite soccer | 2012 | Journal of Sports Sciences
Bradley & Noakes | Match running performance fluctuations in elite soccer: Indicative of fatigue, pacing or situational influences? | 2013 | Journal of Sports Sciences
Bradley et al. | Evaluation of the Match Performances of Substitution Players in Elite Soccer | 2014 | International Journal of Sports Physiology and Performance
Bartling, Brandes & Schunk | Expectations as Reference Points: Field Evidence from Professional Soccer | 2015 | Gutenberg School of Management and Economics Discussion Paper Series
Purnomo et al. | Soccer game optimization with substitute players | 2015 | Journal of Computational and Applied Mathematics
Rey et al. | Timing and tactical analysis of player substitutions in the UEFA Champions League | 2015 | International Journal of Performance Analysis in Sport
Gomez et al. | The influence of substitutions on elite soccer teams' performance | 2016 | International Journal of Performance Analysis in Sport
Harper et al. | Practitioners' Perceptions of the Soccer Extra-Time Period: Implications for Future Research | 2016 | PLoS ONE
Silva & Swartz | Analysis of substitution times in soccer | 2016 | Journal of Quantitative Analysis in Sports
Varela-Quintana et al. | The effect of an additional substitution in association football. Evidence from the Italian Serie A | 2016 | Revista de Psicologia del Deporte
¹ Initial version of the table is published in Kröckel (2017).
Two things can be observed from the table above. On the one hand, the studies are written by different authors, with the exception of the papers by Hirotsu and Wright and the two papers by Bradley. On the other hand, there is a noticeable diversity in the publication outlets. This issue was previously reported by Coleman (2012), who noted that most papers in sports science research are written by authors who do not publish further papers and that, at times, scientists are not sure about the right outlet for their papers. Over the years, this has improved, with some sports journals establishing themselves as high-quality outlets in sports science research (e.g., SportsMed, especially for review articles, the International Journal of Performance Analysis in Sport and the Journal of Quantitative Analysis in Sports, to name a few). Table 8 also shows that in the last few years, these publication outlets have become a preferred choice among researchers. An overview of the distribution of publications on substitution over the years is presented in Figure 10. Interestingly, three papers each were published in 2006, 2009 and 2012, with a visible increase from 2014 onwards. This also suggests that the topic is becoming more relevant.
Table 9 presents an overview of the number of publications per outlet. The Journal of Quantitative Analysis in Sports is represented with the highest number of papers (4), while the International Journal of Performance Analysis in Sport, the Journal of Sports Sciences and The Journal of the Operational Research Society have two publications each. Twenty-one papers are in English, one is in Serbian and one in German.
Table 9. Number of publications per outlet
Published in Number of publications
Fizička kultura 1
Gutenberg School of Management and Economics Discussion Paper Series 1
International Journal of Performance Analysis in Sport 2
International Journal of Sports Physiology and Performance 1
International Workshop on Regulations Modelling and their Validation and Verification 1
Journal of Computational and Applied Mathematics 1
Journal of Quantitative Analysis in Sports 4
Journal of Science and Medicine in Sport 1
Journal of Sports Economics 1
Journal of Sports Sciences 2
Maastricht Research School of Economics of Technology and Organization (METEOR) 1
PLoS ONE 1
Revista Brasileira de Cineantr. & Desem. Humano 1
Revista de Psicologia del Deporte 1
Soccer Journal 1
Sport und Gesellschaft – Sport and Society 1
The Journal of the Operational Research Society 2
Even though previous research on player substitution, and on tactical decisions during live matches in general, has not been extensive, a review of the selected papers shows growing interest and development in the field. One reason is most likely the greater availability of tracking data than before, together with the realization that analytics, when done right, can contribute significantly to a team's success. A coach has limited opportunities to influence a football match because, with the exception of the half-time break, there are no timeouts as in other sports (e.g., basketball) (Geyer, 2009; Myers, 2012). Among the options available to a coach are changing the formation and replacing a player (Hirotsu and Wright, 2002). Formations are different combinations of players in various positions, which show whether the team plans to play defensively or offensively (Hirotsu and Wright, 2006). A formation change occurs, for instance, when the team replaces a defender with a forward. Hirotsu and Wright (2002, 2006) also mention so-called reversible tactics as a strategy for influencing the game in addition to substitution. For instance, a midfielder can be instructed to play as an attacker and, once the desired outcome is achieved, revert to his normal position (Hirotsu and Wright, 2006). According to the authors, the effects are similar to those of making a substitution. They investigate this only briefly and using hypothetical games, and no other mention of this strategy was found in the rest of the reviewed literature. It can be concluded that previous research has investigated substitution as the primary way for a coach to influence the game. Therefore, substitution is discussed in more detail below.
Substitution can make or break a team's performance (Myers, 2012). Historically, FIFA regulated the use of substitutions in football back in 1970, allowing two replacements at the World Cup that same year (Varela-Quintana et al., 2016). The rule was slightly modified in 1994, when an extra substitution was allowed for an injured goalkeeper (Janković and Leontijević, 2006). This restriction on the third substitution was later removed, and the flexible system as we know it today (with three substitutions allowed) has not been changed since (Varela-Quintana et al., 2016). Recently, however, FIFA has allowed an experimental fourth substitution in specific games, and it is likely that an extra substitution will be allowed at the 2018 World Cup. This would offer a wide range of possibilities for coaches to influence the match (Varela-Quintana et al., 2016). It is therefore even more interesting to analyze this decision in more detail and to suggest strategies for making it more effective.
Table 10 presents the methods, sample data and tools used in the reviewed publications.
Table 10. Methods, samples and tools used in previous research
Author | Method | Sample | Sample from | Tool
Hirotsu & Wright, 2002 | Dynamic programming; Markov process model | Single match data of a game in the EPL, season 1998/99; 5 hypothetical games | Not mentioned | Not mentioned
Lenahan & Solari, 2002 | Narrative/qualitative expert analysis | 5 games | Not applicable | Not applicable
Hirotsu & Wright, 2003 | Dynamic programming | Match statistics from the EPL, season 1999/2000; a few games from the same team were analyzed | OPTA Sports | GLIM statistical software
Hirotsu & Wright, 2006 | Poisson regression; Game theory; Quantitative analysis | Team data of two teams of the Japanese J. League from the 2002 season | Not mentioned | Not mentioned
Janković & Leontijević, 2006 | Counting | The first 4 teams at the World Cups 1998, 2002 and 2006 | Not mentioned | Not mentioned
Ledru, 2006 | Modelling language | One game | Not mentioned | VDM specification language; KIDS and VDM Tools
del Corral, Barros & Prieto-Rodríguez, 2008 | Inverse Gaussian hazard model | 380 matches from the Spanish Primera División, season 2004/05 | PC Futbol Game | Not mentioned
Geyer, 2009 | Empirical data analysis | German Bundesliga matches from seasons 1985/86–2004/05 | DFB; Fußballdaten; vmLOGIC | Not mentioned
Hirotsu et al., 2009 | Poisson regression; Game theory | Team data of two teams of the Japanese J. League from the 2002 season | Not mentioned | Not mentioned
Mengel, 2009 | Spearman test for correlation; Logistic regression | WC 1986–2006, U20 WC 1997–2008, Olympic football tournament 1980–2008 | FIFA.com | NA
Carling et al., 2010 | Quantitative analysis; Bonferroni t-tests | Elite soccer team from the French 1st division in the 07/08 season; 18 matches, 11 M and 14 CF 2nd-half substitutes | AMISCO Pro | AMISCO Pro; SPSS
Coelho et al., 2012 | ANOVA; Post hoc Tukey's test | 45 male professional soccer players from the Brazilian 1st division over 29 official games | Not mentioned | Not mentioned
Table 10. Methods, samples and tools used in previous research (Continued)
(3) Tool: a similar issue is observed when it comes to the analytics tools. Eleven papers do not report the tools used. Among the rest, SPSS is used most often, which is understandable as most of the methods employed were statistical. Hirotsu and Wright most likely used the statistical software GLIM, which is no longer developed, in all of their papers. They reported it only in their 2003 paper, but as the analyses are similar, its use in the other papers is inferred here. Professional tracking software was used in the papers by Carling et al. (2010), Bradley and Noakes (2013), and Bradley et al. (2014).
In the next step, all papers were analyzed qualitatively in order to retrieve the factors discussed as relevant for replacing a player. An overview of all factors that were analyzed, or additionally mentioned, by the authors is presented in Table 11.
The most frequently mentioned factor is tactical reasons, cited in 16 of the 23 reviewed papers. This is not surprising, considering that by replacing a player, the coach directly influences and changes the tactical plan for his team (Janković and Leontijević, 2006). Tactics are influenced by and related to everything from substitution type and timing to the current score, location and team formation, all of which are explained below.
The game score is the most important factor for the timing of substitutions: when the team is tied or losing, substitutions are made earlier, and vice versa (Del Corral et al., 2008; Geyer, 2009; Myers, 2012; Rey et al., 2015). This is also confirmed by Bartling et al. (2015), who additionally observe that, on average, there is no strategy adjustment before a goal is scored. According to the authors, the behavior of players and coaches depends in large part on whether or not the team is losing. When the team is winning, the coach should preferably encourage the same playing style that led to this winning position and should not regard a defensive strategy as optimal in this case (Silva and Swartz, 2016).
Research on timing attempts to find the right time for making the first, second and third substitution, depending on factors such as the current score, the opponent's strength and location (home/away), among others. The other research focus is mainly retrospective analysis to determine the time frames or minutes in which coaches substitute most. Studies agree that very few substitutions are made in the first half, mostly for reasons that cannot be influenced, such as an injury or a red card (Bartling et al., 2015; Del Corral et al., 2008; Myers, 2012; Rey et al., 2015). A large number of substitutions are made at the half-time break (Del Corral et al., 2008; Rey et al., 2015). Most substitutions are made between the 60th and 75th minute (Janković and Leontijević, 2006) or between the 46th and 70th minute (Del Corral et al., 2008); on average, the first substitution is made in the 57th minute, the second in the 71st and the third in the 81st or 82nd minute (Myers, 2012); most are made between the 57th and 78th minute (Rey et al., 2015); and the first substitution is made between the 46th and 75th minute, the second between the 61st and 90th, and the third between the 76th and 90th (Gomez et al., 2016). Finally, according to Del Corral et al. (2008), fewer substitutions are made in the last 20 minutes of the match, while Rey et al. (2015) place this period in the last 10 minutes.
Table 11. Substitution factors retrieved from reviewed literature
Substitution factor
Myers (2012) went as far as suggesting a concrete rule to guide coaches on the right substitution timing depending on the current game score. The rule is presented in Table 12.
Table 12. Proposed decision rule by Myers 2012
If down:
Make 1st substitution prior to 58th minute
Make 2nd substitution prior to 73rd minute
Make 3rd substitution prior to 79th minute
If tied or ahead:
Make substitution at will
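The rule in Table 12 can be encoded in a few lines, which illustrates why it is considered easy to implement. The following Python sketch is hypothetical. The function name, the string outputs and the reading of "prior to the 58th minute" as a match minute strictly below 58 are assumptions. The sketch also reflects the three-substitution limit that applied when the rule was proposed.

```python
# Latest match minute for each substitution when trailing, taken from Table 12.
MYERS_DEADLINES = {1: 58, 2: 73, 3: 79}

def myers_advice(goal_difference, minute, subs_made):
    """Advice under the rule in Table 12; goal_difference = own goals - opponent goals."""
    if goal_difference >= 0:                  # tied or ahead
        return "substitute at will"
    next_sub = subs_made + 1
    if next_sub not in MYERS_DEADLINES:       # all three substitutions used
        return "no substitutions left"
    deadline = MYERS_DEADLINES[next_sub]
    if minute < deadline:                     # "prior to" read as strictly earlier
        return f"make substitution {next_sub} before minute {deadline}"
    return f"substitution {next_sub} is overdue (deadline: minute {deadline})"

print(myers_advice(goal_difference=-1, minute=50, subs_made=0))
# → make substitution 1 before minute 58
```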
The suggested rule seems very attractive due to its simplicity, clarity and seemingly easy implementation in practice (Silva and Swartz, 2016). Following its publication, it attracted considerable attention from mainstream media and was even recognized by Anderson and Sally in their popular book “The Numbers Game” (Silva and Swartz, 2016). Silva and Swartz review the rule proposed by Myers and offer an alternative method of analysis. Their results do not identify times in the second half at which it is beneficial to substitute, but, as the authors themselves mention, they use a different method (Bayesian logistic regression) and different variables. Silva and Swartz go on to argue that coaches make good decisions and start the game with the best possible team, which in turn makes a substitution essentially the replacement of a good player with one of slightly lower or, at best, equal quality. However, the authors' conclusions are rather general and superficial, and a coach does not always start with the best team, for various reasons (e.g., to protect a player who has a recent injury or who is more important for another game, or because the opponent is not at the same strength level). Furthermore, they acknowledge that, given that their results contradict those of Myers, the question arises of what managers are supposed to do in practice. This question, however, remains unanswered. The author of this thesis accepts their recommendation that substitutions should be made, especially when a player's performance drops, but that these replacements should not be tied to specific timings, as done by Myers.
Substitution type refers to the positions of the players who leave and enter the game. An offensive substitution is made when, for instance, an attacker enters and a midfielder leaves; a defensive one when, e.g., a defender enters and an attacker leaves; and a neutral one when, e.g., a midfielder is exchanged for a midfielder (Rey et al., 2015). Defensive substitutions are made later in the match than offensive substitutions; especially when losing, more offensive substitutions are made, and vice versa (Bartling et al., 2015; Del Corral et al., 2008; Geyer, 2009). Overall, offensive substitutions are more frequent than defensive ones (Del Corral et al., 2008; Rey et al., 2015), and substitutions generally become more offensive as the second half progresses (Bartling et al., 2015; Bradley et al., 2014). If the team is winning, there is a tendency towards a more defensive strategy (Bartling et al., 2015). More specifically, considering the player positions separately, the positions most often chosen for leaving the game are midfielders, followed by forwards (strikers) and defenders, while for entering the game, strikers are most preferred, followed by midfielders and defenders (Janković and Leontijević, 2006). In contrast, Del Corral et al. (2008) and Rey et al. (2015) observe that most substitutes are midfielders and that the most frequently used combination is midfielder for midfielder. This, however, also contradicts another statement by Del Corral et al., namely that offensive substitutions were the most frequent ones. Another contradiction can be found in the results of Gomez et al. (2016), who found that central midfielders were used most often as substitutes, followed by forwards and wide midfielders, while neutral substitutions were the most common. Finally, when analyzing the effect of an additional substitution on coaches' decisions, Varela-Quintana et al. (2016) found that the rule change resulted in more neutral substitutions by coaches. Overall, the one finding these studies appear to share is that defenders are replaced and brought on as substitutes the least, compared to other positions.
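The offensive/defensive/neutral classification described by Rey et al. (2015) can be sketched by ranking the outfield lines from defensive to offensive and comparing the incoming player with the outgoing one. This Python sketch generalizes their examples under an assumed ordering (defender < midfielder < forward). Goalkeepers and finer positions such as wide midfielders are deliberately left out, and the function name and substitution list are hypothetical.

```python
from collections import Counter

# Assumed ordering of outfield lines from defensive to offensive (a simplification).
LINE_RANK = {"defender": 0, "midfielder": 1, "forward": 2}

def substitution_type(player_out, player_in):
    """Classify a substitution by comparing the lines of the incoming and outgoing player."""
    shift = LINE_RANK[player_in] - LINE_RANK[player_out]
    if shift > 0:
        return "offensive"   # e.g. a forward replaces a midfielder
    if shift < 0:
        return "defensive"   # e.g. a defender replaces a forward
    return "neutral"         # e.g. midfielder for midfielder

# Hypothetical substitutions as (player out, player in).
subs = [("midfielder", "forward"), ("forward", "defender"),
        ("midfielder", "midfielder"), ("midfielder", "forward")]
counts = Counter(substitution_type(out, inc) for out, inc in subs)
print(counts)
```

Tallies of this kind, split by score state or match minute, are what the reviewed studies compare when they report that offensive or neutral substitutions are most frequent.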
Fatigue as a substitution factor was investigated in more detail in four publications (Bradley et al., 2014; Bradley and Noakes, 2013; Carling et al., 2010; Coelho et al., 2012) and mentioned in seven others. These four papers analyzed the high-intensity activities, distance run, and work-rate performance of substitute players, comparing them either with the players they replaced or with their own performance when playing from the beginning. Bradley and Noakes (2013) and Bradley et al. (2014) found that substitute players covered a greater distance and did more high-intensity running than in the equivalent period when playing the full match. They also covered more than the rest of the players in the game and the players they replaced. The tendency of players to decrease their effort intensity during the second half suggests that fatigue develops during the game (Coelho et al., 2012). Thus, substitute players are likely to be able to perform higher-intensity actions than athletes who played the entire game and, as a result, to increase the chances of winning (Coelho et al., 2012). Furthermore, with regard to the positions of incoming and outgoing players, substitute midfielders were found to cover a greater overall distance, including in high-intensity activity, than midfielders who remained in the game (Bradley et al., 2014; Carling et al., 2010).
However, the findings regarding attackers are contradictory. Carling et al. (2010) found that attackers cover less distance as substitutes than when playing from the beginning, while Bradley et al. (2014) found the opposite: attackers covered more high-intensity running than their peers or than in their own full-match performances. Bradley and Noakes (2013), however, obtained a result similar to that of Carling et al. (2010). The reasons for these different results are hard to determine. One possible reason is the different datasets used. Another is that Carling et al. distinguished only between midfielders and attackers, while Bradley et al. considered five different playing positions. Overall, it can be concluded that substitution seems to be effective from a work-rate perspective, but it is still not clear whether it is also effective from a technical and outcome perspective (Bradley et al., 2014). Such information would help the coach optimize the players' performance during the game, as it would clarify the contribution of each playing position to the overall team performance (Bradley et al., 2014). This understanding is very important for gaining a tactical advantage (Coelho et al., 2012).
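Comparisons like these rest on normalizing running distances by playing time, so that a substitute's shorter spell can be set against a longer performance. The following is a minimal Python sketch of such a per-minute comparison. The record layout and all numbers are hypothetical and are not taken from the cited studies. The simple per-minute rate is also an assumption: it ignores the exact time-window matching ("equivalent time") those studies use.

```python
from dataclasses import dataclass


@dataclass
class Appearance:
    """One player's appearance in a match (hypothetical record layout)."""
    player: str
    minutes_played: float
    total_distance_m: float
    high_intensity_distance_m: float


def per_minute(app: Appearance) -> dict:
    """Work rate: distances normalized by minutes played."""
    return {
        "distance_m_per_min": app.total_distance_m / app.minutes_played,
        "hi_distance_m_per_min": app.high_intensity_distance_m / app.minutes_played,
    }


def relative_difference(sub: Appearance, reference: Appearance) -> dict:
    """Percentage difference of the substitute's work rate relative to a
    reference appearance (e.g., the player he replaced)."""
    s, r = per_minute(sub), per_minute(reference)
    return {key: 100.0 * (s[key] - r[key]) / r[key] for key in s}


# Illustrative numbers only: a substitute on for 30 minutes compared with
# the player he replaced after 60 minutes.
substitute = Appearance("sub", 30, 3600, 450)
replaced = Appearance("starter", 60, 6600, 600)
result = relative_difference(substitute, replaced)
print(result)  # about +9.1 % distance and +50 % high-intensity distance per minute
```

A positive value means the substitute worked at a higher rate than the reference. Whether that translates into better technical output or match outcome is exactly the open question noted above.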
A few factors from the reviewed literature seem to be less important, or at least have been discussed by only a few studies. Weather was mentioned only once, by Lenahan and Solari (2002), and does not seem to be an important factor, especially compared to the others. The opponent's strength and tactics (especially the opponent's substitutions) have been mentioned in a few papers as possibilities for future research (Geyer, 2009; Hirotsu and Wright, 2003; Rey et al., 2015). Yellow cards are mentioned, but not studied in detail. Replacing a player who has received a warning is described as a way to protect him and to avoid his potential dismissal, e.g., by a red card (Gomez et al., 2016; Rey et al., 2015). Location refers to whether the team plays at home or away. Additionally, existing research does not seem to have focused enough on the technical aspects of players' performance in relation to substitution, or on how these affect the game outcome and the team's overall performance. Myers (2012) and Rey et al. (2015) specifically call for the integration of more factors in future studies of substitution.
Research on player substitution in football has advanced, and in recent years papers have built upon previous results. Professional tracking data contributes most to the improved quality of the studies. Early research relies mostly on hypothetical data (e.g., the papers by Hirotsu) or on data freely available on the web, which contains very limited, and usually aggregated, information about the game. A major drawback of previous research is the lack of a rigorous dataset description, which makes reproduction impossible. Small sample sizes make generalization of the results meaningless. In addition, as seen above, previous studies have reported contradictory findings. Most importantly, though, almost none of the papers included in the review discuss implications for practice. There are no guidelines on how and why coaches should implement the results in practice, or on the potential outcome of following some of the rules and recommendations found. However, given the differences reported between studies, such guidelines would not make much sense anyway. Most of the authors are not sports scientists. So even though their statistical analyses are rigorous, the interpretation of the results is superficial or non-existent.
Another important outcome of the review is that existing research has analyzed aspects of the substitution decision, but no study has turned directly to the decision makers to investigate how they make their decisions and which factors they consider. If research outcomes are to have practical implications, it is essential first to understand the decision-making process of coaches and to find out which information would be important for them.
Building on this critical review of previous research, a qualitative study is conducted as a next step with coaches and football experts from the world of professional football in Germany.
4.3 Qualitative study
In this section, a qualitative study is conducted via semi-structured interviews with football experts from the professional football league in Germany. The study's main objective is to understand in more detail how experts decide which player should leave the game and when. To achieve this goal, two things are relevant. The first is (1) to confirm, reject or extend the substitution factors derived from the literature and thus make generalization possible. The second is (2) to investigate whether coaches and their teams rely on tracking systems when making this decision. The aim here is to understand how coaches perceive the use of analytical support systems in such situations.
Once the research objective is defined, the next step in a qualitative study is to choose the method that best serves this objective. Methods used in this type of study design include interviews, observations and focus groups (Anderson, 2010). In this case, interviews, specifically semi-structured interviews, were considered most appropriate. This type of interview has an advantage on two fronts. On the one hand, the researcher can ask all the predetermined questions. On the other, it leaves room for interviewees to offer new meanings to the studied phenomenon, and thus allows for considerable reciprocity between participant and researcher (Galletta, 2013). It therefore offers both structure and flexibility in the research inquiry.
Next, the interview questions need to be defined. They should relate to the research objective, which in turn is derived from the literature review. Accordingly, the interview questions were developed from the review findings, and each question relates to one or more of the derived substitution factors. An overview is given in Figure 11.
The interview questions focused mainly on factors that were often mentioned and analyzed in the literature. Experts were expected to give short and clear answers and to have limited time at their disposal, so the number of questions was kept as small as possible. Opportunities to gain additional information were taken spontaneously, depending on the interviewees' answers.
4.3.1 Participants
Potential participants were chosen using purposive sampling. That is, only participants who have characteristics relevant to the study and can be most informative were contacted (Anderson, 2010). Data was collected as part of a bachelor thesis (Tümer, 2016) supervised by the author of the current thesis. The student worked for a professional tracking system company and had direct access to football experts. Therefore, contact was established relatively easily, and what could have been a major obstacle was avoided. The major sampling criterion was the participants' level of expertise. An expert coach is someone who meets three conditions. (a) They have extensive knowledge obtained in a number of formal (e.g., certification programs) or non-formal (e.g., coaching workshops) learning environments. (b) Their knowledge and experience allow them to make intuitive decisions. (c) They achieve consistent and superior performance (O'Connor, 2013). In this study, a football expert is someone who works or has worked in the professional Bundesliga, either as a football player turned coach or as a coach who has demonstrated expertise and success over the years. Experts were divided into three categories according to their level of expertise. Category A is the highest level and consists of former national players turned professional coaches. These are well-known experts whom we have watched over the years in either capacity, and professionals we read about in magazines and see on TV shows. Category B comprises coaches and experts who currently work at the professional level but have not yet reached the stardom of category A. Finally, category C comprises former players or coaches with a professional coaching license who work as support staff or are at the beginning of their coaching career.
Table 13. Overview of participants and interview durations
# | Interviewee | Position | Expert level | Language | Via | Min.
Total | | | | | | 97
Average | | | | | | 12
4.3.3 Analysis
All interviews were recorded and transcribed using the smooth verbatim transcription system described by Mayring (2014). This system was chosen because transcription is done word by word, but utterances such as "uhm" or "ah" and filler words such as "right", "you know" or "yeah" are left out (Mayring, 2014, p. 45). The coding was carried out by two coders in two cycles. In the first cycle, codes were predefined based on the interview questions, and additional emerging themes were assigned new codes. When the first cycle was finalized, codes were reviewed and restructured or modified as needed. The final list contained four categories, each with several codes. These are presented and described in Table 14.
Table 14. Coding categories and codes overview
Category | Code | Description
General | Sub-Importance | Is substitution important for the interviewee?
General | Sub-Positive influence | How often the substitute player had a positive influence?
General | Sub-definite rule | A rule/situation which will definitely provoke substitution.
Factors | Tactics | Anything referring to tactical aspects.
Factors | Score | The role of current game score as a factor.
Factors | Timing-Score dependent | Timing of substitution depending on the current score.
Factors | Timing-General | Timing in general.
Factors | Performance | Technical performance
Factors | Weather and pitch conditions | Are these factors relevant for player substitution?
Factors | Injury | Injury as a sub factor.
Factors | Cards | Cards, including yellow and red cards.
Factors | Other | Other factors mentioned by participant.
Position | Forward | Factors for sub forward players.
Position | Midfielder | Factors for sub midfield players.
Position | Defender | Factors for sub defenders.
Position | Position-General | General comments on sub positions.
Tracking systems | Reasons | Reasons and experience for using tracking systems (TS)
Tracking systems | Timing | When are these TS used the most? Pre- post or during the game?
Tracking systems | Results presentation | How are results presented and communicated to the team?
QDA Miner² (Version 5.0.17, Provalis Research) was chosen as the tool for coding and content analysis.
4.4 Results
Applying the coding system presented in the previous section, this step of the content analysis resulted in 169 coded segments. An overview is presented in Figure 12. The codes with the highest number of coded segments are the following: relevance of the substitution decision, positive influence of substitute players, definite rule that will provoke substitution, timing in general, and factors for replacing strikers, midfielders and defenders. The list ends with reasons for using tracking systems. Below, the results are summarized for each of the four categories.
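Counting coded segments per code and per category is a simple aggregation that QDA Miner performs internally. The Python sketch below only illustrates the idea. It assumes each coded segment is represented as an (interviewee, code) pair. The code-to-category mapping is an abridged excerpt of Table 14, and the segments are made up rather than taken from the thesis data.

```python
from collections import Counter

# Abridged code -> category mapping, following Table 14.
CODE_CATEGORY = {
    "Sub-Importance": "General",
    "Sub-Positive influence": "General",
    "Sub-definite rule": "General",
    "Tactics": "Factors",
    "Score": "Factors",
    "Timing-General": "Factors",
    "Forward": "Position",
    "Midfielder": "Position",
    "Defender": "Position",
    "Reasons": "Tracking systems",
}


def tally(segments):
    """Count coded segments per code and per category.
    segments: iterable of (interviewee, code) pairs, one per coded segment."""
    segments = list(segments)
    by_code = Counter(code for _, code in segments)
    by_category = Counter(CODE_CATEGORY[code] for _, code in segments)
    return by_code, by_category


# Hypothetical coded segments (not the interview data).
segments = [
    ("Interviewee 1", "Sub-Importance"),
    ("Interviewee 1", "Tactics"),
    ("Interviewee 2", "Tactics"),
    ("Interviewee 2", "Reasons"),
    ("Interviewee 3", "Midfielder"),
]
by_code, by_category = tally(segments)
print(by_code.most_common(1))  # [('Tactics', 2)]
```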
2 https://provalisresearch.com/
in general, any percentage can be considered purely a rule-of-thumb estimate, as there is no scientific proof yet.
Three main situations would make coaches definitely replace a player: (1) tactical reasons, (2) unsatisfactory physical (i.e., fatigue) and technical performance, and (3) the current score or the way in which the game develops in general.
needs to skip the next game (Bobic), or red cards, including for goalkeepers (Gitschier, Yilmaz). These are unforeseeable, but a coach must always be prepared for them and be able to react accordingly.
At times, coaches gave longer answers on substitution factors. In these cases, segments were coded under "other", as it was difficult to extract full sentences on a single factor. Besides the factors mentioned previously, some coaches also raised the following aspects:
- replacing a player who performed very well (e.g., scored three goals) to give the audience a chance to applaud him (Daum, Götz, Koc);
- interpersonal skills (Daum);
- whether it is important to win the game, or whether a draw is enough (Effenberg);
- whether the team is playing at home or away (Effenberg);
- the opponent, namely to disturb their rhythm (Daum), to prepare in advance for a specific opponent considering their strengths and weaknesses (Effenberg), or to respond to the opponent's tactical moves (Gitschier & Konrad, Goetz).
for using are: showing the players where their errors were (Balakov, Effenberg), the coaches' own errors (Yilmaz), speed and heart rate (Pezzaiuoli), preparation for the next game (Goetz, Yilmaz), analysis of the opponent (Effenberg, Yilmaz, Gitschier & Konrad), and the training program. Most of the coaches have very positive experiences with analytics systems: "I really am a big fan" (Balakov), "…that it is very helpful for the players" (Yilmaz). Effenberg is more reserved: "but at the end of the day, what counts is the football match".
All respondents said they use tracking systems before and after the game. Bobic also uses them in real time, mostly during the half-time break, when analysts present the most important information in condensed form. Gitschier and Konrad even show short video snippets to their team during the half-time break. Goetz, by contrast, uses them reluctantly because, according to him, there is not enough time to act on the results.
Coaches discuss the videos and analysis results either with the whole team, or in separate groups,
especially groups comprised of the separate positions. Individual talks are not conducted very of-
ten, only when necessary (Balakov).
The empirical study confirms some of the literature findings and extends the knowledge base fur-
ther. An overview of which findings from the literature were confirmed by the QUAL study is pre-
sented in Table 15.
Table 15. Comparison of literature and empirical study findings
Literature finding | Confirmed in QUAL study? | Elaboration
Tactics is one of the most relevant factors | Yes |
Current game score is important factor and strongly related to timing of substitution | Yes |
Attempts to find specific time slots for substitution | No | Interviewees point out minutes in which most substitutes are made but mention that this depends on the current situation.
If the team is winning, there will be a tendency towards more defensive strategy | Yes |
Most often chosen positions for leaving the game are midfielders, followed by forward players or strikers, and defenders | Partly | Midfielders and strikers are mentioned as substituted most often, while defenders are confirmed to be replaced the least.
Fatigue as relevant factor | Yes | But mostly based on simple observation, and not analytics.
Weather does not seem to be relevant factor | Yes | At least not in Germany.
Opponent's strength and tactics as future research | Yes | Coaches consider this as a relevant factor which they consider for player substitution.
Yellow card as way to protect players from being completely dismissed and red cards should be anticipated | Yes |
Location | Partly | Only mentioned by one coach.
Additional substitution and its effect on the types of substitutions | No | This was omitted from QUAL study.
The major difference between research and practice was found in the timing of substitution. A large body of previous research has focused on finding the best time for replacing a player, mostly depending on the current score. In practice, this is not relevant. Timing depends on the interaction of a wider variety of factors in addition to the current score. Coaches are not interested in finding the perfect timing. It is also questionable whether such a timing can be determined or, more importantly, whether this makes sense. The effect or relevance of the additional substitution was not investigated in the empirical study, and therefore a comparison is not possible. However, as mentioned earlier, the latest developments suggest that such a substitution will most likely be permanently introduced and allowed in professional football. Future studies could compare the effects of the rule changes that introduced the third and the fourth (additional) substitution.
Based on the results of previous research and the empirical study, a more detailed overview of the factors that could be considered for substitution in football can be derived. These are presented in Figure 13.
The derived factors are grouped into three categories. The first category consists of contextual and situational variables: match location, opponent's strength, competition type (national, international, cup, league), current standing in the table, and weather (included for completeness). The second category comprises player performance factors: technical and physiological performance, which refer to the typical PIs described in section 2.3. The third category relates to tactics and strategy. It covers team formation (the team structure determined before the game), in-game team tactics of the own team and the opponent, and in-game player tactics (own and opponent's players). Additionally, Figure 13 shows whether data related to each factor can be captured, including which data are available via Opta Sports, the dataset used in the current thesis. The only factor that cannot be investigated in the current study is physiological performance, which is related to fatigue. These data are currently not collected in live games and are therefore not readily available for analysis.
erature that were identified as most relevant to answer the thesis’ research question, were chosen
when formulating the interview questions. Finally, focus groups with several coaches could be considered as a method of data collection, if feasible.
Part III - Advanced Analytics Methods for Team and Player Insights
5 Study Design
The first step in any data analytics project is understanding the domain, the goals that need to be reached, and the questions that should be answered by means of advanced analytics methods. Chapters 3 and 4 discussed dynamic system theory as an alternative way of analyzing football performance data. They also explained the tactical aspects of football, with a special focus on tactical decisions during live games, mainly the substitution decision. The literature review and the qualitative study provided a better understanding of the domain, in this case tactical decision making in football.
Building on this domain understanding, three analytics methods, or collections of methods, of different types (social network analysis, artificial neural networks and process mining) are applied to professional tracking data. Each method is described in a separate chapter that follows a similar structure:
1. The method itself is described and related work in the area of football performance analysis is
presented and discussed.
2. A brief explanation of the data preprocessing is given, describing which attributes from the dataset are relevant for the specific method and how the data was transformed so that the method can be applied. This step is the most time-consuming, but also one of the most crucial steps in any data mining process, as any mistakes at this point will affect the quality of the final results.
3. Data analysis by applying the analytics techniques of the respective method.
4. Discovered knowledge - discussing the derived player and team insights and their practical
implications. In chapter 9, it is further discussed which of these insights could be used by
coaches during real-time decision making.
3 http://www.optasports.com/
OPTA uses numbers to describe what happened, and each number has a specific meaning. For example, in the figure above, the event in question is a "pass", marked with the number 1 ("what"). OPTA distinguishes between approximately 67 different events, and each event has several qualifiers that contain additional information about the event. In total, there are 280 qualifiers. In this example, six qualifiers give additional information on the pass (e.g., 140 – Pass end x and 141 – Pass end y, which give the end coordinates of the pass). In short, the data provide a very detailed description of "what" happened. They also record "who" performed or was involved in the action/event, plus the timestamp. The minute and second (min, sec) are also available ("when"), along with an additional timestamp with date and time that denotes the time of the event during the game. Information is also available on the "where", via the x and y pitch coordinates. Furthermore, every action/event has a specific definition, and the actions are grouped into categories, e.g., attacking, defensive or goalkeeper events. As mentioned in section 2.2, data is retrieved from cameras installed at the football pitch, and two operators use proprietary software to record each action, its time and its position. Operators go through rigorous training to familiarize themselves with the system and OPTA's event definitions, and also with the characteristics of football in each country.
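To make the "what/who/when/where" structure concrete, the Python sketch below models a single event with its qualifiers. It then extracts the start and end coordinates of a pass. Only the numbers come from the description above: event type 1 (pass) and qualifiers 140/141 (pass end x/y). The `Event` class, its field names and the example values are hypothetical simplifications, not OPTA's actual feed schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PASS = 1           # event type 1: pass ("what")
PASS_END_X = 140   # qualifier 140: pass end x
PASS_END_Y = 141   # qualifier 141: pass end y


@dataclass
class Event:
    """Simplified event record; field names are hypothetical, not OPTA's schema."""
    type_id: int        # "what"
    player_id: str      # "who"
    minute: int         # "when"
    second: int
    x: float            # "where": start coordinates on the pitch
    y: float
    qualifiers: dict[int, str] = field(default_factory=dict)  # qualifier id -> value


def pass_vector(ev: Event) -> tuple[tuple[float, float], tuple[float, float]]:
    """Return the start and end coordinates of a pass event."""
    if ev.type_id != PASS:
        raise ValueError("not a pass event")
    end = (float(ev.qualifiers[PASS_END_X]), float(ev.qualifiers[PASS_END_Y]))
    return (ev.x, ev.y), end


ev = Event(type_id=PASS, player_id="p10", minute=12, second=34, x=45.2, y=30.1,
           qualifiers={PASS_END_X: "60.0", PASS_END_Y: "52.3"})
print(pass_vector(ev))  # ((45.2, 30.1), (60.0, 52.3))
```

Keeping qualifiers as a dictionary keyed by qualifier id mirrors the idea that each event type carries a variable set of qualifiers, rather than a fixed list of columns.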
The dataset contains all the games from the European Championship in 2016 (Euro 2016) which
was won by Portugal. There are 51 games in total available for the analysis. Not all games are used
for each method. A selection is made depending on the purpose of the analysis and the required amount of data. Table 16 gives an overview of the games used in each chapter of Part III. Appendix B lists all 51 games of the Championship and their final results.
Table 16. Overview of matches included in the data analysis
Method | Chapter | Matches included in the analysis | Matches total
Self-organizing maps (SOMs) are a type of neural network and are used in chapter 7. In total, 35 games are selected for this analysis: all matches won by the favorite (20) and all matches won by the underdog (15). Process mining techniques are used in chapter 8. This is an exploratory study, and therefore one game was sufficient to evaluate whether a certain algorithm or analytics technique from the area of process mining is suitable for football performance analysis. In addition, some of the algorithms (for instance, Markov chain clustering) take longer the more data is used.
The questions investigated in each chapter are chosen based on characteristics of the Euro 2016 championship, i.e., on what was considered interesting, controversial or unique about the competition. This makes the analysis more interesting, and it also gives a chance to (dis)prove popular opinions with concrete data analytics results.
Social Network Analysis
6.1 Method
Social network analysis (SNA) is used to analyze relationships between entities. There are networks
in many parts of life and the sciences. For example, transportation networks, biological networks,
corporate networks, to name a few. Entities are represented by vertices or nodes, while the relation-
ship between them is represented by edges in a visualization that is called a network graph. SNA
methods date back to the 1700s but became popular in the 1990s mainly due to the work done in
statistical physics and computer science (Kolaczyk and Csárdi, 2014). Statistical physics has en-
couraged the use of network science for analyzing complex systems, and especially the idea that by
understanding how the separate parts of one system interact with each other, one can better under-
stand what drives the collective behavior of the system as a whole (Kolaczyk and Csárdi, 2014).
This is rooted in the dynamic system theory, as seen in chapter 3. A complex system, as discussed,
is not static in nature but dynamic. It evolves and changes over time as it adapts to its environment.
The logical consequence is that a network, corresponding to a complex system, is also dynamic,
and should be represented as such (Kolaczyk and Csárdi, 2014). However, up to now, networks
have been mostly analyzed and presented as static, as the people in the networks were not consid-
ered as adaptive agents capable of taking action, learning, and altering their networks (Carley,
2003; Kolaczyk and Csárdi, 2014).
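In football, a common static network of this kind is the passing network: players are the nodes, and passes between them are the weighted, directed edges. The Python sketch below builds such a network and its weighted out-degree. It assumes the (passer, receiver) pairs have already been derived from successful pass events; that derivation step is not shown, and the player labels are hypothetical. Note that the whole match is aggregated into one static graph, which is exactly the kind of representation the paragraph above describes as the common approach so far.

```python
from collections import defaultdict


def passing_network(passes):
    """Build a weighted, directed passing network.
    passes: iterable of (passer, receiver) pairs, assumed to be derived
    already from successful pass events.
    Returns {passer: {receiver: number_of_passes}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for passer, receiver in passes:
        graph[passer][receiver] += 1
    return {p: dict(receivers) for p, receivers in graph.items()}


def out_degree(graph):
    """Weighted out-degree: number of passes each player made."""
    return {p: sum(receivers.values()) for p, receivers in graph.items()}


passes = [("A", "B"), ("B", "C"), ("A", "B"), ("C", "A")]
net = passing_network(passes)
print(net)              # {'A': {'B': 2}, 'B': {'C': 1}, 'C': {'A': 1}}
print(out_degree(net))  # {'A': 2, 'B': 1, 'C': 1}
```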
Dynamic networks are networks which undergo structural changes over time - nodes and edges
appear and disappear or their attributes change over a certain time period (Shi et al., 2015; Zaidi et
al., 2014). Thus, such networks can be used to display time-varying relationships (Shi et al., 2015).
In general, one can consider a) the dynamics of a network (edges and nodes change over time), b) the dynamics on a network (attributes of edges or nodes change over time), or c) both simultaneously, if applicable (Kolaczyk and Csárdi, 2014). In mathematical terms, a dynamic graph is represented in the following way:
Equation 1. Mathematical formulation for dynamic networks
Dynamic network analysis (DNA) is concerned more with the actors' activity and their relationships. It is concerned to a lesser extent with the structural properties of the network, which are the focus of traditional SNA (Trier, 2008). The efficiency and stability of the network are at the center of the analysis (Zaidi et al., 2014). DNA has gained popularity in research only recently, mainly because timestamped network data was lacking. Consequently, the field itself remained relatively immature, as it was challenging to test theories without available data (Kolaczyk and Csárdi, 2014; Rossetti and Cazabet, 2017; Zaidi et al., 2014). Another challenge is that the field is studied in various disciplines without much connection between them, which results in different terminologies being used for the same concepts (Holme and Saramäki, 2012). For instance, besides "dynamic networks", one can also use terms such as "temporal networks", "dynamic graphs" or "longitudinal networks" to find potentially valuable research discussing similar ideas. On the other hand, although real-world networks are rarely static, networks with time data need not always be treated as dynamic. Representing them as static graphs, or with some level of aggregation, can also be appropriate. According to Holme and Saramäki (2012, p. 99), a temporal network is suitable for modelling and analysis when "the system under study should consist of agents that interact pairwise, so that the interactions have both some degree of randomness and some regularity (i.e., there is some structure)". This statement relates to dynamic system theory, which considers a system dynamic when its separate elements interact in a state of constant change but with a certain degree of regularity.
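One middle ground between a fully static graph and a fully temporal one is aggregation into time windows. In this approach, every window yields its own snapshot of the network. Comparing consecutive snapshots then shows edges appearing and disappearing (dynamics of the network) and edge weights changing (dynamics on the network). The Python sketch below assumes timestamped (minute, passer, receiver) records and a 15-minute window. Both the record format and the window width are illustrative assumptions, not choices made in this thesis.

```python
from collections import Counter


def snapshots(timed_passes, window_min=15, match_length_min=90):
    """Aggregate timestamped passes into consecutive, non-overlapping time
    windows and return one Counter of (passer, receiver) edge weights per window."""
    n_windows = -(-match_length_min // window_min)  # ceiling division
    windows = [Counter() for _ in range(n_windows)]
    for minute, passer, receiver in timed_passes:
        # Passes after the last full window (e.g., stoppage time) go into the last window.
        idx = min(int(minute // window_min), n_windows - 1)
        windows[idx][(passer, receiver)] += 1
    return windows


def edge_changes(previous, current):
    """Edges that appear in, or disappear from, the current snapshot."""
    appeared = set(current) - set(previous)
    disappeared = set(previous) - set(current)
    return appeared, disappeared


timed = [(3, "A", "B"), (10, "B", "C"), (20, "A", "B"), (22, "A", "C"), (91, "C", "A")]
windows = snapshots(timed)
print(edge_changes(windows[0], windows[1]))  # ({('A', 'C')}, {('B', 'C')})
```

The window width is the key design choice. Wide windows smooth out noise but hide short-lived interaction patterns; narrow windows do the opposite.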
A detailed review of the literature on temporal or dynamic networks is beyond the scope of this thesis. However, a brief overview of the main research areas of the field is presented in the following. Research in the area of DNA can be roughly split into two (overlapping) areas. One area is analytics, which up to now has focused mainly on developing relevant metrics, detecting change and discovering communities in dynamic networks. However, previous studies are mostly isolated works that focus on specific problems, and therefore few have dealt with developing a general methodology for DNA (Sloot et al., 2013). These three research directions are explained briefly below.
Concerning dynamic metrics, traditional SNA metrics cannot be mapped directly onto dynamic networks because of the temporal factor, which changes the number, type, complexity and value of the metrics (Carley, 2003; Zaidi et al., 2014). Research in this area is ongoing, and so far there are no well-established dynamic metrics comparable to those of traditional network analysis (e.g., betweenness or closeness centrality, degree centralization). Furthermore, some metrics are more suitable for networks aggregated in several time windows, while others depend on the order of link activations (Holme and Saramäki, 2012). Metrics suggested in the context of DNA include the following:
- paths as known in SNA, or time-respecting paths in DNA;
- connectivity and transitive connectivity;
- latency and distance in temporal networks;
- diameter or network efficiency (the harmonic average of the latency);
- closeness and betweenness centrality adjusted to dynamic networks;
- motifs and entropies⁴.
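The difference between static paths and time-respecting paths, and the notion of latency, can be shown in a few lines. The Python sketch below computes the earliest time each node can be reached from a source through a sequence of contacts. Its simplifying assumptions: contacts are directed and instantaneous, and they have distinct timestamps. In the example, the static graph contains the path C → A → B. However, the pass A → B happens before C → A, so no time-respecting path from C to B exists and the latency is infinite.

```python
import math


def earliest_arrival(contacts, source, t_start=0.0):
    """Earliest time each node can be reached from `source` along a
    time-respecting path (contact times never decrease along the path).
    contacts: iterable of (time, from_node, to_node); contacts are treated as
    directed, instantaneous and with distinct timestamps (assumptions)."""
    arrival = {source: t_start}
    for t, u, v in sorted(contacts):
        # A contact can be used only if its sender has already been reached.
        if arrival.get(u, math.inf) <= t and t < arrival.get(v, math.inf):
            arrival[v] = t
    return arrival


def latency(contacts, source, target, t_start=0.0):
    """Temporal distance: time from t_start until target is first reached."""
    return earliest_arrival(contacts, source, t_start).get(target, math.inf) - t_start


contacts = [(1.0, "A", "B"), (2.0, "C", "A"), (3.0, "B", "D")]
print(earliest_arrival(contacts, "C"))  # {'C': 0.0, 'A': 2.0}
print(latency(contacts, "A", "D"))      # 3.0
print(latency(contacts, "C", "B"))      # inf
```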
The second focus of DNA is the detection of change in dynamic networks. Changes in interaction patterns could be linked to a team's effectiveness or to the emergence of informal leaders. Detecting such changes early, before they impact the whole group, can therefore be of great significance, as it would enable a faster response and prevent unwanted consequences (McCulloh and Carley, 2008a). According to McCulloh (2009), in dynamic or longitudinal networks, there are four
potential network states: stability – when the relationship between group members stays the same
over time; evolution – interactions between members lead to a change in the relationships over
time; shock – change that is exogenous to the social group, and mutation – when an exogenous
change initiates evolutionary behavior. Most of the existing research is focused on network evolu-
tion and there is not much research on the states of shock and mutation (McCulloh, 2009). These
are not the only possible changes, however, as some authors discuss changes in dynamic events
over networks (Li, S. et al., 2017). The change that occurs can be local, which can mean, for in-
stance, change in node attributes, or global, for example, change in network topology (Li, W. et al.,
2017). Although they are interdependent and change in the nodes can lead to a global change in the
network, research has mostly investigated them independently (Li, W. et al., 2017). In principle,
change detection can be applied to network measures of any level – node, graph, or edge level
(McCulloh and Carley, 2011). The challenge is to develop metrics that can detect meaningful
change in a state of normal variability (McCulloh and Carley, 2011). Existing research on methods
for change detection in dynamic networks is limited. McCulloh and Carley (2008) discuss three
such methods: the cumulative sum (CUSUM), the exponentially weighted moving average
(EWMA), and a scan statistic (SS). In their study, they conclude that the CUSUM works best for
detecting a sudden and unexpected network change. They demonstrate that a plot of the CUSUM statistic shows a network change clearly, whereas a graph that only shows how a metric changes over time does not. This is because the CUSUM statistic takes previous changes of the network into account. This method has also been implemented in ORA, a tool for DNA developed by the authors’ research group at Carnegie Mellon University. The tool is also used for the analysis in section 5.5. The results are confirmed in a later study by the authors, in which they stress that the CUSUM statistic not only indicates that a change has occurred (change detection), but also when this change happened (change point detection) (McCulloh and Carley, 2011). Other studies have used the SS method (Marchette, 2012; Neil et al., 2013). However, in their recent paper, Zou and Li (2017) point out that studies such as those mentioned above treat network data from different time-steps as independent observations instead of as time series, and thus exclude the natural evolution of dynamic networks from the change detection methods. The authors propose their own method, the Network State Space Model (NSSM), which addresses this problem of previous research and is based on the classical SSM approach for modelling multivariate time series data.
4 For a more detailed overview of the metrics, the reader is referred to Holme and Saramäki (2012) and Holme (2015), and to Lerman et al. (2010) for a centrality metric in dynamic networks.
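To make the CUSUM idea concrete, the following Python sketch applies a two-sided tabular CUSUM to one network metric observed at successive time-steps. It is a minimal illustration, not ORA’s implementation: the in-control mean and standard deviation are assumed to come from the first `baseline_len` observations, and the reference value `k = 0.5` and decision threshold `h = 4.0` are conventional textbook choices assumed here. The change point is estimated as the step after the statistic was last zero, which mirrors the distinction between change detection and change point detection made above.

```python
from statistics import mean, stdev


def cusum_change_point(series, baseline_len, k=0.5, h=4.0):
    """Two-sided tabular CUSUM on a metric series (e.g. density per time-step).

    Returns (signal_step, change_point_step), or None if no change is signalled.
    Assumption: the first `baseline_len` values represent normal variability.
    """
    mu0 = mean(series[:baseline_len])
    sigma0 = stdev(series[:baseline_len])
    if sigma0 == 0:
        raise ValueError("baseline has no variability; cannot standardize")

    c_hi = c_lo = 0.0       # cumulative upward / downward deviations
    zero_hi = zero_lo = -1  # last step at which each statistic was zero
    for t, x in enumerate(series):
        z = (x - mu0) / sigma0  # standardized observation
        c_hi = max(0.0, c_hi + z - k)
        c_lo = max(0.0, c_lo - z - k)
        if c_hi == 0.0:
            zero_hi = t
        if c_lo == 0.0:
            zero_lo = t
        if c_hi > h:                 # change detected (increase)
            return t, zero_hi + 1    # estimated change point
        if c_lo > h:                 # change detected (decrease)
            return t, zero_lo + 1
    return None


# Hypothetical metric values per time-step with a jump at step 7
series = [0.30, 0.32, 0.31, 0.29, 0.30, 0.31, 0.30, 0.45, 0.46, 0.47, 0.44, 0.46]
print(cusum_change_point(series, baseline_len=7))  # (7, 7)
```

Here the jump is both detected and dated at step 7; with a slower drift, the signal step would come later than the estimated change point.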
The third area of research focuses on community discovery (CD), or graph clustering, in dynamic networks, an area known as Dynamic Community Discovery (DCD). Community discovery has been widely researched for static social networks; however, there is still no formal definition of a community (Cazabet, 2017). Even though the notion of a community is intuitively clear, the problem lies in its formulation, as any given network can have different partitions, each capturing valuable network information (Rossetti and Cazabet, 2017). In the literature, a community has been defined as a set of nodes densely connected to each other and weakly connected to the rest of the network (Cazabet, 2017); simply as a dense network subgraph (Rossetti and Cazabet, 2017); or as entities that share a set of actions with the other community members (Coscia et al., 2011). As there is no widely accepted definition of community in the literature, there is no perfect algorithm for extracting a community. Each algorithm focuses on one or a few network properties and thus implies its own definition of community (Coscia et al., 2011). The goal of DCD is to detect time-varying clusters of densely connected nodes that evolve over time (Zaidi et al., 2014). An overview of the algorithms used in the literature for DCD is given by Coscia et al. (2011) and Bedi and Sharma (2016). It should be noted, however, that most of the code is not publicly available and not all of the suggested algorithms would be applicable to CD in football. Finally, the nature of DCD does not depend on the manner in which network evolution is represented, e.g., whether the chosen time-steps are sequential or overlapping (Cazabet and Amblard, 2014).
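One family of DCD approaches first detects communities in each time-step with a static CD algorithm and then matches them across consecutive time-steps. The Python sketch below illustrates only the matching step, assuming the communities of each snapshot have already been found; the Jaccard-similarity matching rule and the 0.3 threshold are illustrative assumptions, not a method taken from the cited literature, and the player labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of players."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(previous, current, threshold=0.3):
    """Map each community in `current` to its most similar predecessor.

    Returns {index_in_current: index_in_previous or None}; None marks a
    community that appears to be new in this time-step.
    """
    matches = {}
    for j, community in enumerate(current):
        best_i, best_score = None, 0.0
        for i, earlier in enumerate(previous):
            score = jaccard(earlier, community)
            if score > best_score:
                best_i, best_score = i, score
        matches[j] = best_i if best_score >= threshold else None
    return matches


# Hypothetical communities in two consecutive 15-minute time-steps
step_1 = [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}]
step_2 = [{"p2", "p3", "p7"}, {"p1", "p4", "p5", "p6"}, {"p8", "p9"}]
print(match_communities(step_1, step_2))  # {0: 0, 1: 1, 2: None}
```

Matching by overlap lets a community "survive" even when individual players move in or out, which is what distinguishes evolving communities from independent per-snapshot partitions.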
The second research area is the visualization of dynamic networks, in which the main question is how to appropriately incorporate the notion of time in the network graph (Kolaczyk and Csárdi, 2014). Ideally, one would like to retain as much of the available temporal information as possible (Kolaczyk and Csárdi, 2014). A few aspects should be considered when deciding on the visualization type. These are summarized in Table 17.
Table 17. Dynamic network analysis terms
Term | Explanation
Relational pace | Rate of change in relations, with focus on irregularities. It can be described by: a) levels: fast, slow
Continuous time | Interactions recorded with exact starting and ending times
When analyzing dynamic networks, there are some theoretical aspects for the researcher to consider: a) the pace of change in the relations between entities, whose exact meaning depends on the context (Moody et al., 2005). In football, there is not enough research on dynamic networks for match analysis (see 6.2). Therefore, the assumption is that the analysis can focus on all three types of irregularities mentioned in Table 17. One can think of change in seconds, minutes, weeks, months, etc. (Moody et al., 2005). In a single football match, change can be investigated over seconds and minutes; b) the sequence in which relations occur, which helps to explain the prevalence of a given network structure (Moody et al., 2005). Sequence is important in football as well: one can consider sequences of actions or, in the case of network analysis, sequences of interactions between the players; c) concurrency, i.e., relations that overlap in time and share at least one person (Moody et al., 2005); and d) transitivity, the proportion of connected node triples that close into triangles, i.e., in which all three node pairs are connected by edges (Kolaczyk and Csárdi, 2014). It is not yet clear whether, and to what extent, concurrency, transitivity or any other DNA measure is relevant for football PA, as this field of research has only recently taken off. This is investigated in more detail in 6.5.
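The transitivity definition above can be computed directly. The Python sketch below treats the pass network as undirected and ignores pass counts and self-passes; these are simplifying assumptions, and tools such as ORA may define transitivity for directed networks differently. For every player it counts the pairs of neighbours (connected triples centred on that player) and how many of those pairs are themselves linked (closed triples).

```python
from itertools import combinations


def transitivity(edges):
    """Global transitivity = closed connected triples / all connected triples.

    `edges` is an iterable of (player_a, player_b) pairs; direction and
    repetitions are ignored (undirected, unweighted simplification).
    """
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    triples = closed = 0
    for node, nbrs in neighbours.items():
        for u, v in combinations(nbrs, 2):  # every triple centred on `node`
            triples += 1
            if v in neighbours[u]:          # the triple closes into a triangle
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical passes: a triangle A-B-C plus a pass from C to D
print(transitivity([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]))  # 0.6
```

Each triangle is counted once from each of its three corners, so the ratio equals three times the number of triangles divided by the number of connected triples.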
Temporal aspects that are also relevant for visualization are: a) discrete time, where networks are represented as snapshots in discrete time windows, within which nodes and interactions appear or disappear (Zaidi et al., 2014); b) continuous streaming, where continuous time scales are used to represent nodes and edges, for instance, data packets moving over a LAN or the Internet (Zaidi et al., 2014). Continuous-time representations help researchers to identify how overall network changes emerge through ordered dyadic events (Moody et al., 2005); c) time-step, the term used to denote an individual subgraph of a network; and d) time window, the amount of time represented in each subgraph (Zaidi et al., 2014). For example, if one splits the first half of a football match into three snapshots or subgraphs, the time-steps can be labelled a, b, c, or 1, 2, 3, or 0-15 min, 15-30 min, etc., while the time window for each would be 15 minutes.
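Splitting pass data into discrete snapshots, as in the 15-minute example above, can be sketched as follows. The sketch assumes that each pass is a (source, target, minute) record with the time given in match minutes as a decimal number; this record layout is an assumption for illustration. Each snapshot is stored as edge weights, i.e., the number of passes between two players within that time window.

```python
from collections import Counter, defaultdict


def snapshot_networks(passes, window=15.0):
    """Group (source, target, minute) passes into discrete time windows.

    Returns {time_step: Counter({(source, target): pass_count})}, where
    time-step 0 covers [0, window), time-step 1 covers [window, 2*window), etc.
    """
    snapshots = defaultdict(Counter)
    for source, target, minute in passes:
        step = int(minute // window)
        snapshots[step][(source, target)] += 1
    return dict(snapshots)


def step_label(step, window=15.0):
    """Human-readable label such as '0-15 min' for a time-step."""
    return f"{step * window:g}-{(step + 1) * window:g} min"


# Hypothetical passes from the first half
passes = [("p1", "p2", 2.5), ("p2", "p3", 14.9), ("p1", "p2", 15.0), ("p3", "p1", 31.2)]
for step, edges in sorted(snapshot_networks(passes).items()):
    print(step_label(step), dict(edges))
```

Using half-open windows means a pass at exactly minute 15.0 belongs to the second time-step, so no pass is counted twice.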
To sum up, research on dynamic networks is still relatively young. The fact that the field is interdisciplinary, with contributions from various scientific fields, is a blessing and a curse at the same time. On the one hand, quite a few algorithms have been developed for similar problems, e.g., community discovery. On the other hand, they are often not publicly available or do not build on existing research. An algorithm developed for a specific problem in biology, for example, may also not be readily adoptable in another area of study. Differences in terminology and definitions also make it harder to find and consolidate relevant research. Therefore, as a next step, the application of network analysis, including DNA, to football specifically is discussed by reviewing existing publications in the literature. The findings serve as a starting point for the network analysis included in this thesis.
Duch et al., 2010 | Team & Player | EuroCup 2008 UEFA stats, all matches | Not mentioned
Clemente, José et al., 2016 | Player | 20 matches from La Liga and EPL | SocNetV
Grund, 2016 | Team & Player | Career histories of 800 players from soccerbase.com & 760 EPL matches from OPTA | Not mentioned
Clemente and Martins, 2017 | Team | 109 matches of best 16 teams in UEFA 15/16 | SocNetV & SPSS
McLean et al., 2017 | Team & Player | Quarter final teams from Euro 16 & COPA 16 | Agna & SPSS
Furthermore, most publications consider attacking performance and ignore defensive performance. Attacking performance is, in general, easier to analyze and quantify than defense. Additionally, when networks are created, they are based on attacking sequences: a sequence starts at the moment the team gains possession of the ball and ends when it loses the ball to an opposition player. Whenever a team has the ball, it is considered to be attacking. Therefore, most analyses have focused on this part of the players’ and teams’ performance.
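The sequence definition above can be sketched in Python. The sketch assumes an ordered list of on-ball events, each tagged with the team in possession and the player involved, and it treats two consecutive events of the same team within a sequence as a pass from the first player to the second. Both the event format and this pass rule are simplifying assumptions; real event data also records failed passes, shots, fouls and other event types.

```python
def attacking_sequences(events, team):
    """Split ordered (team, player) on-ball events into `team`'s possessions.

    A sequence starts when `team` gains the ball and ends when an
    opposition player gets it.
    """
    sequences, current = [], []
    for event_team, player in events:
        if event_team == team:
            current.append(player)
        elif current:            # possession lost to the opponent
            sequences.append(current)
            current = []
    if current:                  # possession still running at the end
        sequences.append(current)
    return sequences


def passes_in(sequence):
    """Consecutive players in one sequence, read as (passer, receiver) pairs."""
    return list(zip(sequence, sequence[1:]))


# Hypothetical events: team "H" loses the ball once to team "A"
events = [("H", "p1"), ("H", "p2"), ("A", "q1"), ("H", "p3"), ("H", "p4"), ("H", "p5")]
seqs = attacking_sequences(events, "H")
print(seqs)                # [['p1', 'p2'], ['p3', 'p4', 'p5']]
print(passes_in(seqs[1]))  # [('p3', 'p4'), ('p4', 'p5')]
```

The resulting (passer, receiver) pairs are exactly the edges from which an attacking pass network is built.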
From the overview in Table 18, several observations can be made. First, SNA in football has only become popular since 2010. Nine papers were published by the same set of authors, mainly Clemente, Martins, Couceiro and Mendes. Although all of these papers make a valuable contribution to a research field that is still quite young, the structure and methodology used by Clemente and colleagues are rather repetitive across most of their papers. The papers by Lusher et al. (2010), Wäsche et al. (2017) and Ribeiro et al. (2017) are review papers discussing the potential and recent advancements of network methods for sports team performance analysis, not exclusively focused on football.
Second, the retrieved papers use different data sets in their analysis, with only a few publications using data from professional tracking systems: Grund (2012) and Grund (2016) used OPTA data, while Gama et al. (2014) used the Amisco system. The remaining authors either extracted the data from UEFA or other online channels or retrieved the passes manually from recorded matches. The variety in the number and type of matches analyzed also makes it difficult to generalize findings. Additionally, authors have mostly analyzed networks at team level, while a combination of team and player network metrics has been employed in eight of the reviewed papers. Third, various tools, not necessarily network analysis tools, have been used so far. SPSS seems to be preferred when statistical analyses were employed, while a few of the papers relied on Matlab and Mathematica for their calculations. Interestingly, some well-established SNA tools have not been broadly adopted by researchers in this area. In a few papers, the tools or programming languages used are not mentioned at all (Cintia et al., 2015; Cotta et al., 2013; Duch et al., 2010; Grund, 2012, 2016; Mendes et al., 2015).
Network metrics
Previous research has used well-established network metrics when analyzing the interaction between football players. An overview of the metrics that have been used so far is given in Table 19.
A definition of the metrics and their meaning in football is given in Appendix C.
Table 19. Social network metrics used in football performance analysis
Metric | Used/Discussed in | Level
Betweenness centrality | Pena and Touchette, 2012; Trequattrini, Lombardi & Battista, 2015; Ribeiro et al., 2017 | Player
Centralization | Grund, 2012; Clemente, Martins, Couceiro, Mendes & Figueiredo, 2014; Clemente, Couceiro, Martins & Mendes, 2015; Grund, 2016; Ribeiro et al., 2017 | Team
Centroid | Clemente, Martins, Couceiro, Mendes & Figueiredo, 2014; Clemente, Couceiro & Mendes, 2014 | Player
Clique | Pena and Touchette, 2012; Trequattrini, Lombardi & Battista, 2015 | Team
Closeness centrality | Pena and Touchette, 2012; Clemente, Mendes & Martins, 2014; Ribeiro et al., 2017 | Player
Clustering | Pena and Touchette, 2012; Cotta et al., 2013; Clemente, Martins, Couceiro, Mendes & Figueiredo, 2014; Clemente, Couceiro & Mendes, 2014; Clemente, Martins, Kalamaras, Wong & Mendes, 2015; Clemente, Couceiro, Martins & Mendes, 2015; Clemente & Martins, 2017; Ribeiro et al., 2017 | Team
Cohesion | McLean et al., 2017; Wäsche et al., 2017 | Team
Degree centrality | Lusher & Robins, 2010; Cotta et al., 2013; Clemente, Mendes & Martins, 2014; Trequattrini, Lombardi & Battista, 2015; Clemente, Martins, Kalamaras, Oliveira, Oliveira & Mendes, 2015; Clemente et al., 2016; Ribeiro et al., 2017 | Player
Density | Clemente, Martins, Couceiro, Mendes & Figueiredo, 2014; Clemente, Martins, Kalamaras, Wong & Mendes, 2015; Trequattrini, Lombardi & Battista, 2015; Clemente, Martins, Kalamaras, Oliveira, Oliveira & Mendes, 2015; Clemente, Couceiro, Martins & Mendes, 2015; Clemente & Martins, 2017; McLean et al., 2017; Ribeiro et al., 2017 | Team
Diameter | Clemente, Martins, Kalamaras, Wong & Mendes, 2015 | Team
Distance | Pena and Touchette, 2012; Trequattrini, Lombardi & Battista, 2015 | Team
Eigenvector centrality | Cotta et al., 2013; Ribeiro et al., 2017 | Player
Flow centrality | Duch et al., 2010 | Player
Table 19. Social network metrics used in football performance analysis (continued)
Table 19 shows that researchers have used eight network metrics at player level and 12 metrics at team level to explain performance in football. Some of the more popular metrics are betweenness, closeness and degree centrality and prestige at player level, while centralization, clustering, density and heterogeneity are mostly used in team-level analysis. Additionally, Clemente, Martins et al. (2016) discuss several metrics at player (micro), team (macro) and sub-group (meso) level which, according to the authors, can be used in network performance analysis in football. An overview is presented in Figure 16.
As seen, there is no lack of metrics to use in football PA. The issue is rather choosing the right metrics to answer a specific question. Moreover, there are several mathematical definitions of the same metric, and it is not clear whether using different definitions affects the results. In theory, the equations should reflect the same idea behind a metric, but given the various SNA tools and equations available, it is worth pointing out that different definitions could lead to different values. Furthermore, when researchers use SNA tools such as Gephi, Pajek or ORA, not every metric is available in every tool, and the definitions are sometimes not documented. This also has to be considered when conducting SNA. Finally, previous research has not thoroughly discussed whether a metric should have a higher or lower value for the player or team to show a better performance. This is hard to establish, however, as such conclusions require quite a few studies conducted on the same type of data and following the same methodology. Otherwise, it would be hard to generalize the findings and say, e.g., that a team needs a higher degree of centralization in order to be successful, or that a striker needs higher betweenness centrality in order to have more shots on goal.
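To illustrate how one metric can have several definitions, the Python sketch below computes three common variants of closeness centrality on the same unweighted, undirected network: the raw inverse of the summed distances, the version normalized by n - 1, and harmonic closeness. The variants give different values (and diverge further in disconnected networks, where only reachable players are counted here), even though they express the same idea. Which variant a given tool uses must be checked in its documentation; the small network is hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path lengths (in passes) from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_variants(adj):
    """Three closeness definitions per node; distances only to reachable nodes."""
    n = len(adj)
    result = {}
    for node in adj:
        d = [x for other, x in bfs_distances(adj, node).items() if other != node]
        total = sum(d)
        result[node] = {
            "raw": 1 / total if total else 0.0,               # 1 / sum(d)
            "normalized": (n - 1) / total if total else 0.0,  # (n - 1) / sum(d)
            "harmonic": sum(1 / x for x in d) / (n - 1),      # mean of 1 / d
        }
    return result


# Hypothetical passing chain A - B - C - D (undirected)
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
for player, values in closeness_variants(adj).items():
    print(player, {k: round(v, 3) for k, v in values.items()})
```

All three variants rank B and C above A and D here, but the absolute values differ, so values reported with different definitions are not directly comparable across studies.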
The thesis uses most of the metrics mentioned in Figure 16; all of those available in the ORA tool are used in the analysis. At sub-group level, the network assortativity metric is used. At team level, total links, density, diameter, clique, transitivity, reciprocity and degree centralization are calculated. Finally, at player level, betweenness, closeness, eigenvector and PageRank centralities are used. The list in Figure 16 should not, however, be considered exhaustive; the metrics in the analysis part of the thesis are therefore based on the metrics used in the literature (as seen in Table 19), the suggestions in Figure 16, and the metrics available in the ORA tool. Also, some metrics provide similar information about the relevance of a player, so not all metrics need to be included in the analysis. Most importantly, metrics are chosen so that they best help to answer the question of interest (e.g., “was substituting Rooney a good decision?”).
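For orientation, three of the team-level metrics listed above can be sketched for a directed pass network in Python. The formulas are common textbook definitions assumed here: density as existing ties over the n(n-1) possible ties, reciprocity as the share of ties that are returned, and Freeman out-degree centralization. ORA’s or UCINET’s exact definitions and normalizations may differ, and pass counts are reduced to binary ties.

```python
def team_metrics(edges, players=None):
    """Density, reciprocity and out-degree centralization of a directed pass network.

    `edges`: iterable of (passer, receiver); repeated passes count once.
    `players`: optional full list of players, so players without passes are counted.
    """
    ties = {(s, t) for s, t in edges if s != t}
    nodes = set(players or []) | {p for tie in ties for p in tie}
    n = len(nodes)
    if n < 2:
        raise ValueError("need at least two players")

    density = len(ties) / (n * (n - 1))
    reciprocity = sum((t, s) in ties for s, t in ties) / len(ties) if ties else 0.0

    out_degree = {p: 0 for p in nodes}
    for s, _ in ties:
        out_degree[s] += 1
    max_deg = max(out_degree.values())
    # Freeman centralization: observed spread / maximum spread (a star), (n-1)^2
    centralization = sum(max_deg - d for d in out_degree.values()) / (n - 1) ** 2
    return {"density": density, "reciprocity": reciprocity,
            "out_degree_centralization": centralization}


# Hypothetical passes between four players (the repeated A -> B counts once)
print(team_metrics([("A", "B"), ("B", "A"), ("A", "C"), ("C", "D"), ("A", "B")]))
```

Passing the full squad through `players` matters in practice: a player who made or received no passes still counts in the denominator of density.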
Dynamic analysis
Previous SNA research has struggled to determine whether the network topology or structure drives the performance of a team, or whether different performance levels promote certain types of network structure, which is known as the issue of causality (Grund, 2012). The previous section mentioned that one can consider the dynamics of a network (i.e., global dynamics) and dynamics on a network (local dynamics). In adaptive networks, such as the network of football players, both types of dynamics interact, and this feedback gives rise to complex player interactions: the cooperation patterns of a team (the network topology) influence the local dynamics of the players, which in turn affect the global dynamics, or the topology itself (Yamamoto and Yokoyama, 2011).
Small-world vs. scale-free networks
Current SNA research in football discusses the possibility that a football team network displays the properties of small-world and scale-free networks. In a small-world network, any two nodes can be connected by a path of only a few links (Passos et al., 2011). As a football team network consists of only 11 players, one can reasonably assume that the small-world effect can be observed in such networks. Specifically, a point of interest would be to examine the interactions in attacking or defending subunits of more than two players, for instance, in 2 vs. 1 or 3 vs. 2 situations (Passos et al., 2011). So far, very few papers have focused on sub-group analysis in football. Grund (2016) proposes his own measure of what he calls “network experience”, based on the dyadic experience between teammates, while Yamamoto and Yokoyama (2011) looked into the connection between the number of triangles and the frequency of successful attacks. They concluded that game momentum may be represented by the number of triangles in each attacking sequence (Ribeiro et al., 2017). Triangles as a measure have been mentioned briefly in Ribeiro et al. (2017), Lusher et al. (2010), Peña and Touchette (2012) and Wäsche et al. (2017), but have not been the subject of analysis in any of the reviewed papers, with the exception of Yamamoto and Yokoyama (2011), who found that the more triangles existed in the five-minute intervals, the more attacks the team had. This metric is closely related to transitivity, a mechanism that leads to cohesion or clustering in a network and indicates how the network as a whole is held together (Lusher et al., 2010).
A scale-free network has two main properties: growth, as new nodes are added to the network over time, and preferential attachment, whereby new nodes tend to connect mostly to nodes that already have many connections themselves (Wang and Chen, 2003). Such nodes in a scale-free network are also known as hubs. In a football team, players who are preferred by their teammates, i.e., who have the ball passed to them more often, are the preferential-attachment nodes or hubs of that network (Passos et al., 2011). Identifying such players gives insight into who the key decision makers in a specific match are, and one could also investigate how the team re-organizes itself if a hub is removed from the network (Passos et al., 2011). This aspect is related to the Achilles’ heel of complex networks, their “robust, yet fragile” nature: these networks are robust against random attacks but vulnerable to targeted attacks (Wang and Chen, 2003). The only paper that has investigated this aspect in football is also one of the very few papers on DNA in football: the paper by Yamamoto and Yokoyama (2011). The authors assume that a football team network must simultaneously have low vulnerability to intentional attacks and follow a power law for the self-organization of the network. This in turn would mean that when a hub is attacked, the function of the hub switches to another node and the network topology changes so as to keep following the power law (Yamamoto and Yokoyama, 2011).
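The “robust, yet fragile” property can be explored with a simple experiment: identify the hub as the player who receives the most passes, then compare the size of the largest connected group of players after removing the hub with that after removing a peripheral player. The Python sketch below does this on an undirected simplification of a hypothetical pass network. Using the largest connected component as the robustness measure is an assumption made for illustration, not the measure used by Yamamoto and Yokoyama (2011).

```python
from collections import Counter, deque


def hub_by_passes_received(passes):
    """Player who receives the most passes (preferential-attachment hub)."""
    return Counter(target for _, target in passes).most_common(1)[0][0]


def largest_component(passes, removed=None):
    """Size of the largest connected group of players once `removed` is taken out."""
    adj = {}
    for s, t in passes:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set()).add(s)
    alive = set(adj) - {removed}
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


# Hypothetical passes: most play runs through player "H"
passes = [("A", "H"), ("B", "H"), ("C", "H"), ("D", "H"), ("H", "A"), ("A", "B")]
hub = hub_by_passes_received(passes)
print(hub, largest_component(passes, removed=hub), largest_component(passes, removed="C"))
# H 2 4
```

Removing the hub splits the team into small groups, whereas removing a peripheral player leaves the rest connected, which is the asymmetry between targeted and random attacks described above.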
Previous research has several limitations. Most papers have considered team or player metrics in isolation. It is not yet clear which metrics should be chosen for team and player performance evaluations, and metrics at the sub-group level are under-researched overall. With a few exceptions, specifically Grund (2012; 2016), most papers have a problematic methodology and at times a superficial or narrow interpretation and discussion of the results. There has been insufficient reflection on the connection between specific metric values and the different player positions, as well as on the correlation between a) standard performance indicators and network metrics, b) different team-level metrics and the course of the game, c) the network insights and the game outcome, and d) the network and PIs of the opponent. Finally, the publication outlets are highly diverse, which not only suggests the immaturity of the field, as pointed out by Wäsche et al. (2017), but also raises concerns regarding the peer review process, not just regarding the quality of the publication outlets but also the qualifications of the reviewers and their knowledge in this area. Not all outlets are sports- or network-related.
There have been a few contradictory findings in existing research. For instance, in contrast to Grund (2012), who found that high density was associated with better performance, Clemente, Martins, Kalamaras, Oliveira et al. (2015) found different density values in two matches, both of which Switzerland won. This perhaps shows that network metrics cannot be considered in isolation, as every match, and every opponent, is different. Many factors simultaneously influence the performance and self-organization capability of a team, and researchers should be careful when drawing general conclusions from any of the network metrics.
As seen, research on SNA, and especially DNA, in football is still an emerging research area. Thus, there are many possible avenues for future research. For instance, most papers have investigated similar and rather few social network metrics, mainly density, degree centrality, betweenness centrality, clustering and centralization. Moreover, the temporal aspect was not included, as the authors mostly used non-temporal, aggregated passing data from UEFA or collected the passes manually from video footage. One paper investigated the scale-free properties of a football network in five-minute intervals for two matches from different competitions, without mentioning how the data was collected (Yamamoto and Yokoyama, 2011), and a second paper conducted a spatio-temporal analysis of manually collected match data split into 15-minute intervals (Cotta et al., 2013). Recently, Kröckel et al. (2017) presented preliminary results from a DNA of the final of the 2016 European Championship. The paper demonstrates that tracking certain network metrics and their change over the course of the match, as well as the change in network topology, can be a valuable method to support coaches in their decision making during live matches. Another potential research direction is linking metric values to good or bad performances at player and team level, as well as linking certain network metrics to the different player positions. This has not been investigated enough, but it would be worthwhile to explore it in more detail as, for instance, not every player position should have a high betweenness centrality. Such findings would be highly relevant for assessing team and player performance based on network metrics. For these types of analyses, however, much larger datasets are necessary.
Finally, recent papers call for more dynamic network analysis in football (Ribeiro et al., 2017; Wäsche et al., 2017). However, apart from employing spatio-temporal aspects, they give no specific directions for future research in this regard. Thus, one could focus specifically on investigating how sub-group interactions affect performance, detecting sudden network changes, integrating opponent analysis via network metrics, or using simulation methods to assess the effect of removing a specific player from the team network.
Match data used in the analysis:
In this chapter, three games are used to demonstrate how network metrics can help in evaluating
player and team performances. These are the games between:
England vs. Russia (player level)
England vs. Iceland, and Iceland vs. Portugal (team and subgroup level).
The following tools are used for the SNA:
ORA Lite⁵ – a dynamic network analysis tool developed by Carnegie Mellon University. It was chosen over other tools with a graphical user interface because it offers a wide variety of metrics, is well documented and supports dynamic network analysis and change detection, which are important for the analysis.
UCINET⁶ – a software package for the analysis of social network data, developed by Lin Freeman, Martin Everett and Steve Borgatti. The tool was used to calculate team-level network metrics.
source: player who starts a pass | target: player who receives the pass | timestamp: time when the pass is made
The “source” attribute contains the initiator of the pass, while the “target” attribute refers to the player who receives the pass. The attribute “timestamp” contains the time when the source passed the ball towards the target. For the analysis conducted in this thesis, the location data of the passes is not considered, as it is not part of the network metric calculations, i.e., the metrics consider the interactions (passes) between the players, and for these analyses it does not matter where on the pitch a pass originates or towards which part of the pitch it is directed.
5 http://www.casos.cs.cmu.edu/projects/ora/
6 https://sites.google.com/site/ucinetsoftware/home
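A pass list in this source/target/timestamp format can be turned into a weighted network with a few lines of Python. The sketch assumes a CSV file whose columns are named exactly `source`, `target` and `timestamp`, with timestamps in match minutes; the file layout, the minute format and the sample records are assumptions. The edge weight is the number of passes from source to target, and location data, if present, is simply ignored, in line with the text above.

```python
import csv
import io
from collections import Counter


def load_pass_network(csv_file):
    """Read source/target/timestamp records; return the passes and edge weights."""
    reader = csv.DictReader(csv_file)
    passes = [(row["source"], row["target"], float(row["timestamp"]))
              for row in reader]
    # Edge weight = number of passes from source to target; location is ignored
    weights = Counter((source, target) for source, target, _ in passes)
    return passes, weights


# Hypothetical records (player IDs and minutes are made up)
sample = io.StringIO(
    "source,target,timestamp\n"
    "p1,p2,63.5\n"
    "p2,p3,63.7\n"
    "p1,p2,64.1\n"
)
passes, weights = load_pass_network(sample)
print(weights[("p1", "p2")])  # 2
```

For a real file, the same function can be called on `open("passes.csv", newline="")`; the file name is a placeholder.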
Player level
As the current thesis has a special focus on real-time decision support, this part of the social network analysis at player level investigates in more detail a controversial substitution decision during Euro 2016: the decision of England’s coach Roy Hodgson to replace Wayne Rooney (midfielder) with Jack Wilshere (midfielder) in the 78th minute of the match against Russia. Hodgson defended his decision by telling reporters: “I thought he [Rooney] had a good game, but was tiring” (Paul, 2016). He also made a second substitution in the 87th minute, replacing Raheem Sterling, “who had worked very hard”, with James Milner (Paul, 2016). The decision to substitute Rooney is perhaps even more controversial considering that Dier had scored England’s leading goal in the 73rd minute. It is thus indeed surprising that England’s coach decided, only five minutes later, to replace what was most likely one of his key players. Nine minutes later the second substitution followed, and five minutes after that, Russia managed to score and the game ended in a draw. A timeline of the events in this crucial part of the game is presented in Figure 17.
Figure 17. Timeline of crucial events during 2nd half of the ENG vs. Russia match
The above figure presents an interesting situation: one team manages to even the score following three substitutions made 13 minutes before the end of regular time, while the other loses its lead after making a perhaps fatal substitution only five minutes after taking the lead. The standard performance indicators do not give a clear picture of which team was better. A summary of the main PIs of this match for both teams is presented in Figure 18. From the figure, one can conclude that both teams were very close in their performance, with England showing a stronger offense, as it had more total attempts on goal and more shots on target. This, however, does not give a clear picture of the teams’ performances and is thus not helpful for evaluating a substitution decision.
In the next step, player-level network metrics are calculated to find key players in England’s team in order to investigate the substitution decision more closely. The related work section showed that metrics such as betweenness, degree, PageRank or eigenvector centrality are recommended at player level. All of these metrics are used in the current analysis, in addition to two metrics that have not been used in previous research but have the potential to be useful for identifying valuable players in a football team: Authority and Contribution centrality. Definitions of these metrics are given below.
Authority Centrality
A node is authority-central to the extent that its in-links are from nodes that have many out-
links. Individuals or organizations that act as authorities are receiving information from a wide
range of others each of whom sends information to a large number of others. (ORA, Documen-
tation File, 2018)
Contribution Centrality
This computes Eigenvector Centrality on a transformation of the input network. Link values are
transformed to be proportional to the dissimilarity of the nodes they connect. The intuition is
that a link between two nodes with the same neighbors is not an important link since neither
node gains new neighbors by the connection. Specifically, each link is weighted by the inverse
of the Jaccard similarity of its nodes.
In a given organization, this measure can tell us who is connected to the most powerful (e.g.,
other highly connected agents) people. (ORA, Documentation File, 2018)
Based on the definitions above, both metrics show which nodes in the network have incoming links
from other well connected nodes, but the contribution centrality additionally considers the
(dis)similarity between two nodes and gives more weight to links that open more connection op-
tions to the nodes in question.
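The Authority definition quoted above corresponds to the authority score of Kleinberg’s HITS algorithm, which also yields the hub scores discussed later in this section. The Python sketch below is a minimal power-iteration version for a weighted directed pass network. Whether ORA computes Authority and Hub centrality exactly this way, and how it normalizes them, is an assumption; here the scores are rescaled in each iteration so that the top player has a value of 1.0. Contribution centrality is not sketched, since its exact link transformation in ORA is only described informally.

```python
def hits(edges, iterations=100):
    """Hub and authority scores for weighted directed edges (passer, receiver, weight).

    authority(p): weighted sum of hub scores of players passing to p
    hub(p):       weighted sum of authority scores of players p passes to
    Scores are divided by their maximum each round, so the top player gets 1.0.
    """
    nodes = {p for s, t, _ in edges for p in (s, t)}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        auth = dict.fromkeys(nodes, 0.0)
        for s, t, w in edges:
            auth[t] += w * hub[s]
        top = max(auth.values()) or 1.0
        auth = {p: v / top for p, v in auth.items()}

        hub = dict.fromkeys(nodes, 0.0)
        for s, t, w in edges:
            hub[s] += w * auth[t]
        top = max(hub.values()) or 1.0
        hub = {p: v / top for p, v in hub.items()}
    return hub, auth


# Hypothetical 15-minute network: three players feed "R", who passes back to "A"
edges = [("A", "R", 3), ("B", "R", 2), ("C", "R", 2), ("R", "A", 1)]
hub, auth = hits(edges)
print(max(auth, key=auth.get))  # R
```

In this toy network, R is the authority because every active passer feeds it, while A is the top hub because it sends the most passes to that authority.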
To gain an initial impression of the team structure before and after Rooney’s substitution, the network of England’s team is visualized for two different periods: a) the 15 minutes before Rooney was replaced, and b) the 15 minutes after his replacement (i.e., until the end of the match). Both networks are included in Figure 19. A timeframe of 15 minutes was chosen because this is what has been used in the literature so far, namely in the study by Cotta et al. (2013) presented in the related work section. Moreover, a shorter timeframe would not be sufficient for meaningful analysis, as the number of passes made within a few minutes is rather small.
Figure 19 (a) shows that Rooney is a player who receives many passes from his teammates, which
suggests high popularity. Also, there is a visible triangle between Rooney, Rose and Dier which
leads to the assumption that these three players may be part of a core subgroup in England’s team,
the removal of which can bring significant changes to the network as a whole. This is investigated
further by calculating several centrality metrics for England’s players.
Based on the figures above and the centrality metrics presented in Table 21, Rooney is one of the
key players in the 15 minutes before he was replaced. He has the highest value for Authority (1.0)
in this timeframe, which means that he receives a lot of passes from players that themselves have
many outgoing passes (i.e., they pass to many other players). He also has the highest values for
contribution and eigenvector centralities. Thus, in the timeframe before his replacement, Rooney is the player to whom his teammates preferred to pass the ball, which in turn means that he was crucial for his team’s offensive actions (high authority); he is the player who has the most options to pass the ball to other important players who themselves can reach other players (high contribution); and finally, he is the player with a central regulatory role, who is crucial for the organization of his team’s offensive actions (high eigenvector centrality).
Table 21. Selected metrics for England’s players 15 minutes before Rooney is replaced
Compared with the rest of his teammates, Rooney is also often situated between other players (high betweenness centrality), and he is highly likely to have the ball after a reasonable number of passes has been made (high PageRank centrality). Rooney is not the player with the highest hub centrality. These are Rose and Dier, players who have more connections than the average player in England’s team and are thus the dominant players in their team. Rooney’s hub value is half that of these players but higher than that of eight of his teammates.
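The PageRank reading used above (how likely a player is to have the ball after a number of passes) comes from its random-walk interpretation. The Python sketch below is a standard power-iteration PageRank on a weighted directed pass network: the ball follows passes in proportion to their counts and, with probability 1 - d, jumps to a random player. The damping factor d = 0.85 is the conventional default and an assumption here, as the source does not state ORA’s settings; players who never pass (dangling nodes) spread their score evenly.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """PageRank for weighted directed edges (passer, receiver, n_passes)."""
    nodes = sorted({p for s, t, _ in edges for p in (s, t)})
    n = len(nodes)
    out_weight = dict.fromkeys(nodes, 0.0)
    for s, _, w in edges:
        out_weight[s] += w

    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Score held by players with no outgoing passes is shared by everyone
        dangling = sum(rank[p] for p in nodes if out_weight[p] == 0)
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for s, t, w in edges:
            new[t] += damping * rank[s] * w / out_weight[s]
        rank = new
    return rank


# Hypothetical pass counts among four players
edges = [("A", "R", 3), ("B", "R", 2), ("C", "R", 2), ("R", "A", 1), ("R", "B", 1)]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get), round(sum(ranks.values()), 6))  # R 1.0
```

Because the scores always sum to one, they can be read as the long-run share of time each player has the ball under this simplified model.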
Based on this initial analysis, coach Hodgson’s substitution decision seems unreasonable. To further confirm these results, additional analyses are performed. One option is to measure the immediate impact that removing a node has on a network. To this end, England’s network for the 15-minute timeframe before the substitution is considered again. Rooney is removed from the network, and the two networks are compared by calculating several team-level metrics. This shows the immediate impact of removing Rooney on the team network. Results are presented in Table 22.
Table 22. Immediate impact of removing Rooney on the team network
Network metric                 Before   After   Percent change
Overall Complexity             0.322    0.290   -10.03%
Diffusion                      0.919    0.884    -3.82%
Clustering Coefficient         0.266    0.210   -21.35%
Characteristic Path Length     2.227    2.467   +10.75%
Social Density                 0.322    0.290   -10.03%
Average Communication Speed    0.449    0.405    -9.71%
The table above shows that all team-level network metrics change for the worse when Rooney is removed. In football, network complexity gives an impression of the overall cohesion between teammates; it decreases by 10.03% when Rooney is removed. The clustering coefficient is the metric that decreases the most (21.35%). It measures the level of cooperation between teammates: greater values suggest that players are able to involve their teammates in the cooperation process (Clemente and Martins, 2017). A decrease of more than 20% therefore means that the ability of some players to involve their teammates in the offensive play would decline following Rooney's replacement. The changes in both diffusion and characteristic path length suggest that Rooney's removal makes passing the ball between teammates more difficult. The decrease in social density means that the overall interaction between teammates would decrease by 10.03%, while the lower average communication speed means that passing the ball between any two reachable players is slower than before. Thus, England's offensive actions would become less effective with Rooney's removal from the network.
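The immediate-impact comparison can be sketched as follows. This is a minimal illustration, not the computation performed in the thesis. It assumes an unweighted, directed pass network given as a set of (passer, receiver) pairs and uses only two stand-in metrics: density and characteristic path length (the mean shortest-path length over reachable pairs). All function names are hypothetical.

```python
from collections import deque


def density(nodes, edges):
    """Share of possible directed passing links that are present."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def characteristic_path_length(nodes, edges):
    """Mean shortest-path length (in passes) over all reachable ordered pairs."""
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
    total, pairs = 0, 0
    for source in nodes:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in successors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def immediate_impact(nodes, edges, removed):
    """Hypothetical helper: (before, after, percent change) per metric when `removed` leaves."""
    kept_nodes = [n for n in nodes if n != removed]
    kept_edges = {(u, v) for u, v in edges if removed not in (u, v)}
    impact = {}
    for name, metric in (("density", density),
                         ("path_length", characteristic_path_length)):
        before = metric(nodes, edges)
        after = metric(kept_nodes, kept_edges)
        impact[name] = (before, after, 100.0 * (after - before) / before)
    return impact
```

Running the same comparison for each candidate player gives a quick indication of whose absence would disrupt the passing structure most.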
As a next step, the core network, i.e. the largest subgraph of the network in the 15 minutes before Rooney's substitution, is calculated. This shows which group of players is most densely connected, both among themselves and to other, smaller and less significant groups of players. The resulting graph is presented in Figure 20.
The core network, i.e. the most connected and influential players in England's team during this 15-minute period, consists of Rooney, Rose, Smalling and Dier, and partly includes Cahill. Player nodes belonging to the same subgraph share the same color, and nodes are sized by their betweenness and degree centrality values. Removing players who belong to the core network will most likely cause significant changes to the team network. Such an analysis can therefore also help coaches decide which players should perhaps not be substituted. Once again, the additional analysis conducted so far does not support the decision to replace Rooney. A logical next question is therefore: if replacing Rooney was a poor substitution decision, was there a player in England's team who was a better candidate for replacement?
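As an illustration of how a densely connected core can be extracted, the sketch below computes the innermost k-core of the pass network: the largest group in which every player has at least k passing links to other members of the group, for the highest such k. This is an assumption. k-core decomposition is a standard way to find such a core, but the thesis does not state that its core-network procedure works this way. Passes are treated as undirected links, and the function name and input format are hypothetical.

```python
def innermost_core(edges):
    """Hypothetical helper: (k, players) for the innermost k-core of an undirected pass graph."""
    neighbours = {}
    for u, v in edges:
        if u != v:
            neighbours.setdefault(u, set()).add(v)
            neighbours.setdefault(v, set()).add(u)
    remaining = {n: set(nbrs) for n, nbrs in neighbours.items()}
    k, core = 0, set(remaining)
    while remaining:
        k += 1
        # Peel players with fewer than k links until every remaining player has >= k.
        peeled = True
        while peeled:
            weak = [n for n, nbrs in remaining.items() if len(nbrs) < k]
            peeled = bool(weak)
            for n in weak:
                for m in remaining.pop(n):
                    remaining[m].discard(n)
        if not remaining:
            # Nobody survives level k, so the previous level was the innermost core.
            return k - 1, core
        core = set(remaining)
    return k, core
```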
Figure 20. Core network in England’s team 15 min before Rooney’s substitution
In order to answer this question, the player network metrics used so far are calculated for all players from the start of the match. The goal is to see whether there is a player who consistently performed worse than the rest throughout the game, or alternatively a player whose values are low from the beginning of the 2nd half compared to his performance during the 1st half. Figure 21 shows the top-ranked players based on these metrics for the first and second half of the match.
All players were included in the analysis. Substitutes were not excluded, so that their overall performance could be compared with that of the rest of the players during the 2nd half. Substitutes normally have lower network metric values than their teammates and are, therefore, usually ranked lower when the network is calculated for all teammates together. This should be kept in mind when discussing the performance of the players in general, and whether substitutes should be excluded from the analysis has to be considered. In this case, however, it is interesting to see whether some players have lower values than the substitutes.
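The per-half comparison can be sketched with a simple ranking. The snippet below is a hypothetical illustration. It uses weighted degree (passes made plus passes received) as a stand-in for the full set of metrics behind Figure 21 and assumes that each pass is recorded as a (period, passer, receiver) tuple with period 1 or 2. Players without passes in a half, such as substitutes in the 1st half, get a count of zero there and are ranked last.

```python
from collections import Counter


def rank_by_half(passes):
    """Hypothetical helper: rank players (1 = most involved) by weighted degree per half.

    passes: iterable of (period, passer, receiver), with period 1 or 2.
    """
    passes = list(passes)
    involvement = {1: Counter(), 2: Counter()}
    for period, passer, receiver in passes:
        involvement[period][passer] += 1
        involvement[period][receiver] += 1
    # Alphabetical order first, so that the stable sort below breaks ties by name.
    players = sorted({p for _, a, b in passes for p in (a, b)})
    rankings = {}
    for period, counts in involvement.items():
        ordered = sorted(players, key=lambda p: counts[p], reverse=True)
        rankings[period] = {p: rank for rank, p in enumerate(ordered, start=1)}
    return rankings
```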
A few interesting observations can be made from Figure 21. Lallana, Walker and Alli have lower values in general for all measured metrics throughout the whole game; in the 2nd half, their metric values are even lower than those of both substitutes (Milner and Wilshere). Smalling, however, has a visibly improved ranking during the 2nd half, when he is ranked as the third key player, compared to rank 8 during the 1st half. Therefore, Lallana, Walker and Alli are investigated further.
The question is: should one of Alli, Lallana and Walker have been substituted instead of Rooney and Sterling? Unfortunately, a straightforward answer is not possible in this case, because there is no way to know what would have happened had one of these players left the game instead of Rooney. However, reasonable assumptions can be made based on analyses such as the “immediate impact” analysis conducted earlier, in which Rooney was removed from the network of the 15 minutes before his actual substitution. In addition, the dynamic change of a few key metrics over time can be observed for these players and, in combination with additional data, e.g., negative events they participated in (loss of ball possession, fouls, etc.), a more detailed player evaluation can be conducted during the live game. Tables 23, 24, and 25 present the immediate impact that the removal of Alli, Lallana and Walker, respectively, has on the team network of the 15 minutes before Rooney is replaced.
Table 23. Immediate impact 15 min before Rooney - Alli
Network metric                 Before   After   Percent change
Overall Complexity             0.322    0.340    +5.49%
Diffusion                      0.919    0.910    -1.00%
Clustering Coefficient         0.266    0.291    +9.35%
Characteristic Path Length     2.227    2.233    +0.27%
Social Density                 0.322    0.340    +5.49%
Average Communication Speed    0.449    0.448    -0.27%
The results from the tables above are summarized in Figure 22.
Figure 22. Immediate impact of removing Alli, Lallana or Walker on the team network
Note: The percentage change that occurs in the network metrics after removing each node is displayed. Network is the
team network of England, 15 minutes before Rooney is replaced.
The biggest change in the network occurs when Walker is removed. Specifically, the clustering coefficient increases by 22.61%, which means that the level of cooperation between his teammates improves when he is removed from the network. The path length also increases by 10.75%, while the communication speed decreases by 9.71%. Removing either of the other two players brings less significant changes to the team network. Walker played as a defender in this game. Previous research has found that defenders are the players with the highest centrality values (specifically central defenders) and the highest prestige values (external defenders), as their tactical position means that they are the ones who mostly initiate the offensive sequences. A defender would therefore be expected to rank high on these metrics. The fact that Walker has had the lowest values since the beginning of the game, combined with the immediate impact of his removal on the network, suggests that Walker would have been a better candidate for substitution than Rooney or even Sterling. Opinions from football analysts on Walker's performance in this game differ (which is not unusual in football). The Telegraph writes: “Final ball could be more consistent though and still not totally convincing defensively”, while the BBC correspondent remarks: “Vibrant attacking performance in the first half justified his selection. May be put under more pressure against Wales but did well here”. These are two rather contrasting opinions on the performance of the same player. The network analysis presented so far supports the first point of view. This illustrates that an opinion without supporting data is only a subjective view of a situation, whereas data analysis can give a clearer picture of a player's performance.
Team level analysis
In the second part of the social network analysis, the focus shifts to team-level network analysis. The goal is to demonstrate how metrics at the team level can help explain good or bad performances of a football team. This is motivated mostly by the fact that traditional team performance indicators cannot always demonstrate meaningful differences between teams.
Two of Iceland's games are chosen for the network-level analysis, because Iceland was one of the underdog teams that left their mark on Euro 2016 with an unexpectedly good performance. A few aspects make this team interesting to analyze. First, this was Iceland's first appearance in the 60-year history of the European Championship. Second, Iceland does not have professional football clubs, and its national team players do not play at the same professional level as the players of other, more prominent teams. Third, Iceland demonstrated that a formation considered outdated by most modern managers, the 4-4-2, should not be overlooked. Iceland was eliminated in the quarter-finals by France, a strong competitor. Nevertheless, the team managed a draw against Portugal and a win against England. A brief overview of both games is presented in Table 26.
Table 26. Brief overview of Iceland’s matches
Date Teams Outcome Substitutes Goals
14 June 2016 Portugal - Iceland 1-1 3/2 31’ Nani – 50’ Bjarnason
In both matches, Iceland used two of their three substitutions. In the match against England, all three goals were scored in the first 18 minutes; this timeframe is therefore the most interesting for the analysis. The game against Portugal ended in a 1-1 draw, with one goal in each half. Iceland scored in the 2nd half, in minute 50, through Bjarnason. The overall team statistics for both matches are presented in Figures 23 and 24.
Considering the attacking performance indicators in Figure 24, an initial conclusion would be that Iceland performed worse than Portugal and England. Both England and Portugal have more attempts on goal, more corners and higher ball possession. Additionally, Iceland has considerably fewer passes in both matches, although its number of long passes is similar to that of England and Portugal. All three teams preferred to avoid long passes, which means that they did not take advantage of quick attacking sequences and preferred to be more conservative during attack. Given these charts, the goal of this part of the analysis is to investigate, by means of social network analysis, whether there are patterns in Iceland's games that could explain Iceland's good performance against Portugal and England.
In a first step, the team networks over the course of the game, as well as the main network-level metrics, are calculated. The goal is to gain an initial impression of the differences between the teams in each match. The team networks are presented in Figures 25 and 26 for the matches against Portugal and England, respectively.
The team networks show that Iceland's players were well connected in spite of having fewer passes than their opponents. The networks of both Portugal and England show strong connections (thicker edges) between their defensive players. This is not visible in Iceland's case. In the match against Portugal, Iceland seems to have stronger connections between midfield players, with visibly more passes between the midfielder Gudmundsson and the striker Bjarnason. There is also a strong connection between the goalkeeper and the striker Sigthorsson, which suggests that there were a few counter-attack attempts against Portugal involving these two players. In the match against England, the results are similar: there is a strong midfield and a visible triad involving Sigurdsson, Bjarnason and the defender Skulason.
Table 27 shows the results for the teams’ network metrics.
Table 27. Team network metrics for both matches⁷
                                       Match 1              Match 2
Team network metric                    Portugal   Iceland   England   Iceland
Clique Count                           4          20        7         11
Density                                0.633      0.423     0.617     0.494
Diameter                               6          182       238       104
Diffusion                              0.972      0.87      0.907     0.885
Hierarchy                              0          0.154     0.143     0.154
Interdependence                        0.014      0.022     0.015     0.02
Network Centralization-Betweenness     0.155      0.121     0.137     0.111
Network Centralization-Closeness       0.285      0.392     0.303     0.022
Network Centralization-Eigenvector     0.292      0.383     0.356     0.312
Network Centralization-Total Degree    0.177      0.068     0.161     0.139
Reciprocity                            0.063      0.138     0.112     0.149
Transitivity                           0.778      0.531     0.804     0.618
Iceland has the highest clique count in both matches, and the difference is especially large in the match against Portugal. In both matches, Iceland's network density is lower than that of its opponents. This contradicts the findings of Clemente, Martins, Kalamaras, Wong et al. (2015), who found that successful teams tend to have higher density values than their opponents. Diffusion values are also lower for Iceland, but the gap is small. All diffusion values are close to 1, which means that nodes are close together rather than far apart; in this case, the ball can travel well between teammates in all teams.
What is rather surprising is that Iceland has lower betweenness and degree centralizations than their opponents in both matches. Iceland's closeness and eigenvector centralizations are higher than Portugal's, while in the match against England, which Iceland won, the values for these metrics are lower. Iceland is also visibly different from its opponents in the interdependence and hierarchy metrics, which are higher in both matches. Interdependence measures the extent to which passes go in both directions (from player A to B and vice versa); consistent with this, Iceland also has higher values for reciprocity. In general, a network with higher reciprocity tends to be more stable, with players forming dyadic relationships (Clemente, Martins et al., 2016). What is interesting in this case is that Iceland has lower values for transitivity. This means that its players form fewer closed triads (when there is a link between A and B and between B and C, there is often no link between A and C) and prefer to pass the ball to the same players (hence the higher reciprocity). The higher hierarchy value is also related to this: it means that in Iceland's team a few players are more popular than the others, i.e. there is a hierarchical structure of relations with more asymmetric connections.
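To make the team-level metrics discussed above concrete, the following sketch computes density, reciprocity and transitivity for a directed, unweighted pass network given as a set of (passer, receiver) pairs. The definitions are common textbook ones and are an assumption: reciprocity is reported both per arc and per dyad, and transitivity is the share of two-step passing paths i→j→k that are closed by a direct link i→k. The exact normalizations used by the software behind Table 27 may differ, and the function name is hypothetical.

```python
def team_metrics(edges):
    """Hypothetical helper: density, reciprocity and transitivity of a directed pass network."""
    arcs = {(u, v) for u, v in edges if u != v}
    if not arcs:
        raise ValueError("the pass network has no links")
    nodes = {n for arc in arcs for n in arc}
    n = len(nodes)
    mutual_arcs = sum(1 for u, v in arcs if (v, u) in arcs)
    mutual_dyads = mutual_arcs // 2
    asymmetric_dyads = len(arcs) - mutual_arcs
    # Transitivity: closed two-step paths i -> j -> k (i != k) over all such paths.
    successors = {}
    for u, v in arcs:
        successors.setdefault(u, set()).add(v)
    two_paths = closed = 0
    for i, j in arcs:
        for k in successors.get(j, ()):
            if k != i:
                two_paths += 1
                closed += (i, k) in arcs
    return {
        "density": len(arcs) / (n * (n - 1)),
        "arc_reciprocity": mutual_arcs / len(arcs),
        "dyad_reciprocity": mutual_dyads / (mutual_dyads + asymmetric_dyads),
        "transitivity": closed / two_paths if two_paths else 0.0,
    }
```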
⁷ The network metrics on team level are explained in Appendix D.
The team-level analysis reveals some differences between Iceland and their opponents. However, to gain a clearer picture of the differences that led to the goals scored in each match, a dynamic network-level analysis is conducted next. Specifically, the change in a few selected metrics over the course of the game is investigated, as well as the change in the team network topology.
Figure 27 and Figure 28 present the change in reciprocity and transitivity of Portugal and Iceland, respectively, throughout the match. The crucial time periods (the goals scored in minutes 30 and 50) are highlighted.
Figure 27. Transitivity and Reciprocity of Portugal’s network throughout the match
Figure 28. Transitivity and Reciprocity of Iceland’s network throughout the match
The figures above show that in the minutes before each team scored a goal, reciprocity increases and transitivity decreases for the scoring team, while the opposition shows a decrease in both reciprocity and transitivity. These results suggest that both teams took advantage of successful dyadic relationships between their players, while neither team relied on triads. The study could be replicated on more data to see whether this holds for all successful teams. Triads give information on how the network as a whole is held together and contribute to its cohesion (Lusher et al., 2010), which is considered a positive aspect in network-level analysis. In football, however, dyadic relationships may be more important than triadic ones. The ability to form triads is also part of subgroup-level network analysis, and it is therefore discussed later together with the network assortativity metric.
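The dynamic curves in Figures 27 and 28 rely on recomputing a metric over successive time windows. A minimal sketch of this idea is shown below for dyad reciprocity (mutual dyads divided by all connected dyads). It assumes pass events recorded as (minute, passer, receiver) tuples; the window length, step and function name are hypothetical. Transitivity can be tracked over the same windows in exactly the same way.

```python
def windowed_reciprocity(passes, window=5, step=1, end=90):
    """Hypothetical helper: dyad reciprocity in sliding windows [start, start + window).

    passes: list of (minute, passer, receiver). Returns a list of (start, value) pairs.
    """
    series = []
    for start in range(0, end - window + 1, step):
        arcs = {(a, b) for minute, a, b in passes
                if start <= minute < start + window and a != b}
        mutual = sum(1 for a, b in arcs if (b, a) in arcs) // 2
        asymmetric = len(arcs) - 2 * mutual
        connected = mutual + asymmetric
        series.append((start, mutual / connected if connected else 0.0))
    return series
```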
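Formula
Let $A$ be the unimodal input network with $N$ nodes, let $d_i$ be the out-degree of node $i$, let $m$ be the number of links, and let $\delta_{ij}$ (kron$_{ij}$) be 1 if $i = j$ and 0 otherwise. Then

$$\mathrm{CoVariance} = \sum_{i,j} \left( A_{ij} - \frac{d_i d_j}{2m} \right) d_i d_j, \qquad \mathrm{Variance} = \sum_{i,j} \left( d_i \delta_{ij} - \frac{d_i d_j}{2m} \right) d_i d_j,$$

$$\text{Network Assortativity} = \frac{\mathrm{CoVariance}}{\mathrm{Variance}}.$$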
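The formula above can be implemented directly. The sketch below is a minimal version that assumes an undirected, unweighted pass network, so that out-degree and in-degree coincide and 2m equals the sum of all degrees. The function name is hypothetical. It returns NaN when all players have the same degree, because the denominator is then zero.

```python
import math


def network_assortativity(edges):
    """Hypothetical helper: degree assortativity r = CoVariance / Variance (undirected)."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    A = [[0] * size for _ in range(size)]
    for u, v in edges:
        if u != v:
            A[index[u]][index[v]] = A[index[v]][index[u]] = 1
    d = [sum(row) for row in A]
    two_m = sum(d)  # each undirected link is counted at both of its ends
    if two_m == 0:
        return math.nan
    covariance = variance = 0.0
    for i in range(size):
        for j in range(size):
            expected = d[i] * d[j] / two_m
            kron = 1 if i == j else 0
            covariance += (A[i][j] - expected) * d[i] * d[j]
            variance += (d[i] * kron - expected) * d[i] * d[j]
    return covariance / variance if abs(variance) > 1e-12 else math.nan
```

A star, where one hub is linked only to low-degree players, gives r = -1, the most disassortative value possible.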
The results for the assortativity metric for all three teams in the two analyzed matches are presented in Table 28.
Table 28. Network assortativity for all teams in the analyzed matches
Match 1 Match 2
Subgroup metric Portugal Iceland England Iceland
Network Assortativity -0.053 -0.212 -0.004 0.164
The positive network assortativity value for Iceland in the match against England means that in this game, players who are similar to each other in terms of connectivity tend to be clustered together, i.e. they pass more to each other than to players with fewer connections. This metric has not been investigated in previous research on SNA in football PA, so there is, unfortunately, no basis for comparison. Iceland's positive value in a game they won suggests that positive values for this metric may be beneficial for a team. To examine this further, the assortativity metric for England and Iceland throughout the game is presented in Figure 29 and Figure 30, respectively.
Figure 29. Network assortativity for England throughout the whole match
Figure 30. Network assortativity for Iceland throughout the whole match
The dynamic analysis of this metric gives a clearer picture of its values during a game. Not only does England have mostly negative values throughout the whole match, but most of the values are close to -1. The only period of the match when their assortativity index was visibly positive, i.e. nearing +1, is during the 2nd half around minute 65' (21:20 hrs). Iceland also has negative values, but during the 1st half of the game, when both goals were scored, there are two periods with a high positive assortativity value. The first period is a few minutes before the 2nd goal was scored, and the second is between minutes 30' and 35'. Both Iceland and England have negative assortativity values when the opposition manages to score a goal. These results suggest that positive, higher assortativity values contribute to better team performance in football, i.e. players with high connectivity should connect to other well-connected players.
Change detection
Part of the dynamic analysis of a network is change detection: the process of monitoring networks to determine when significant changes to their organizational structure occur and what caused them (McCulloh and Carley, 2008b). Different network-level measures can be monitored over a period of time, and a control chart can be used to signal when significant changes occur in the network (McCulloh and Carley, 2008b). The use of a control chart is inspired by Statistical Process Control (SPC), a technique used in manufacturing for quality control and for monitoring process stability. A process is said to be “in statistical control” if the probability distribution representing a quality characteristic is constant over time; if this distribution changes over time, the process is said to be “out of control” (Woodall, 2000).
As mentioned in section 6.1, several studies have shown that the Cumulative Sum (CUSUM) control chart is a reliable technique for detecting changes in network metrics over time. The calculation of CUSUM is presented in Equation 3.
Equation 3. Calculation of CUSUM
The decision rule of the CUSUM chart runs off the cumulative statistic

$$C_t = \sum_{j=1}^{t} \left( Z_j - k \right),$$

where $Z_j$ is the standardized value of the network measure at time $j$, and the common choice for $k$ is 0.5, which corresponds to a standardized magnitude of change of 1. The CUSUM control chart sequentially compares the statistic $C_t$ against a control limit $A'$ until $C_t > A'$. Since we are not interested in concluding that the network is unchanged, the cumulative statistic is

$$C_t^{+} = \max\left\{0,\; Z_t - k + C_{t-1}^{+}\right\}.$$

The statistic $C_t^{+}$ is compared to the constant control limit $h^{+}$. If $C_t^{+} > h^{+}$, the control chart signals that an increase in a network measure has occurred. Since this rule only detects increases in the mean, a second cumulative statistic must be used to detect decreases in the mean,

$$C_t^{-} = \max\left\{0,\; -Z_t - k + C_{t-1}^{-}\right\},$$

which signals a decrease in a network measure's mean when $C_t^{-} > h^{-}$.
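A two-sided CUSUM chart of the kind described in Equation 3 can be sketched as follows. The sketch assumes that the in-control mean and standard deviation are estimated from a baseline period at the start of the series (here, the first ten observations), and it uses k = 0.5 with the same control limit h = 4 for both directions. The baseline length, h and the function name are illustrative choices, not values taken from this thesis.

```python
from statistics import mean, stdev


def cusum_alarm(values, baseline=10, k=0.5, h=4.0):
    """Hypothetical helper: two-sided CUSUM on a series of network-level measures.

    Returns (time index, "increase" or "decrease") for the first alarm, or None.
    """
    mu, sigma = mean(values[:baseline]), stdev(values[:baseline])
    if sigma == 0:
        raise ValueError("baseline period has no variation")
    c_plus = c_minus = 0.0
    for t, x in enumerate(values):
        z = (x - mu) / sigma  # standardized network measure
        c_plus = max(0.0, z - k + c_plus)     # accumulates evidence of an increase
        c_minus = max(0.0, -z - k + c_minus)  # accumulates evidence of a decrease
        if c_plus > h:
            return t, "increase"
        if c_minus > h:
            return t, "decrease"
    return None
```

Applied to, for example, the efficiency of a team network computed per time window, the returned index points to the window in which the chart first signals a change.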
One advantage of CUSUM is that it is able to detect small changes in the network. One could argue that changes sometimes develop quickly in football, which would make the detection of small changes irrelevant. However, when things develop quickly, there is little opportunity for a timely counter action anyway, whereas small changes, if detected, may signal more drastic changes ahead (McCulloh and Carley, 2008b). This can still be of relevance in football performance analysis. So far, the literature has not considered this. Change detection is therefore examined further using the example of Iceland's match against England.
Different metrics can be considered when detecting a change in the network. Network-level measures usually work better, because node-level metrics need to be translated into a picture of the entire graph (McCulloh and Carley, 2008b). Furthermore, the metric values are normalized so that they can be compared across different time periods (McCulloh and Carley, 2008b). In the current analysis, three network-level metrics were chosen because they give an idea of how the team works together as a whole, and of its structure and connectedness: density, assortativity and efficiency. Density is often used in football PA to analyze team performance, and the meaning of assortativity was explained earlier. Efficiency shows the degree to which each component in a network contains the minimum links necessary to keep it connected (ORA Documentation File, 2018). The resulting CUSUM charts for England and Iceland in their match against each other are presented in Figures 31 and 32.
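The efficiency metric can be illustrated with a Krackhardt-style formulation, which is an assumption about how the measure is normalized. Within each connected component of n players, n - 1 links are the minimum needed to keep it connected, and every additional link counts as redundant. Efficiency is then one minus the number of redundant links divided by the maximum possible number of redundant links. The sketch treats passes as undirected links; the function name and input format are hypothetical.

```python
def efficiency(nodes, edges):
    """Hypothetical helper: 1 - redundant links / maximum possible redundant links."""
    links = {frozenset((u, v)) for u, v in edges if u != v}
    neighbours = {n: set() for n in nodes}
    for link in links:
        u, v = tuple(link)
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, redundant, max_redundant = set(), 0, 0
    for start in nodes:
        if start in seen:
            continue
        # Collect the connected component containing `start` (depth-first search).
        component, stack = {start}, [start]
        while stack:
            for v in neighbours[stack.pop()]:
                if v not in component:
                    component.add(v)
                    stack.append(v)
        seen |= component
        size = len(component)
        component_links = sum(1 for link in links if link <= component)
        redundant += component_links - (size - 1)
        max_redundant += size * (size - 1) // 2 - (size - 1)
    return 1.0 - redundant / max_redundant if max_redundant else 1.0
```

A chain of passes is maximally efficient (1.0), while a team in which everybody is linked to everybody has efficiency 0.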
It is worth noting that change detection is not about predicting a change, but about quickly determining that a change has occurred and making some inference about the actual time of the change (ORA Documentation File, 2018). The figures above demonstrate that a change in the networks of both teams occurs between minutes 10 and 25 (England) and between minutes 5 and 20 (Iceland).
Network analysis allows a more in-depth study of the networks in the period of the suspected change. By analyzing these networks further, together with the opponent's network, it is possible to gain a more detailed understanding of the changes in players' interactions, especially when important events happen in the same time periods. For this reason, two networks are created for each team, with timeframes based on the CUSUM charts for both teams. The investigated periods of the match are between minutes 0' and 30' for England and between minutes 0' and 15' for Iceland. The network topologies in each timeframe and the corresponding network-level metrics are presented in Tables 29 and 30 for England and Iceland, respectively.
Table 29. Dynamic network topologies and network level metrics – England 0′–20′
Metric                  England, minutes 0–10     England, minutes 10–20
                        (10 nodes, 30 links)      (11 nodes, 44 links)
Deg Centralization      0.139                     0.489
Out-Central             0.123                     0.440
In-Central              0.370                     0.220
Density                 0.333                     0.400
Connectedness           1                         1
Fragmentation           0                         0
Closure                 0.244                     0.462
Avg Distance            1.822                     1.773
SD Distance             0.676                     0.734
Diameter                3                         4
Breadth                 0.359                     0.328
Compactness             0.641                     0.672
Small Worldness         0.947                     0.146
Mutuals                 0.178                     0.255
Asymmetrics             0.311                     0.291
Nulls                   0.511                     0.455
Arc Reciprocity         0.533                     0.636
Dyad Reciprocity        0.364                     0.467
Transitivity            0.244                     0.462
Reciprocity             0.3636                    0.4667
Note: A brief description of the metrics used in this table is included in Appendix F.
Table 30. Dynamic network topologies and network level metrics – Iceland 0′–15′
Metric                  Iceland, minutes 6–15
                        (10 nodes, 9 links)
Avg Degree              0.900
Deg Centralization      0.153
Out-Central             0.136
In-Central              0.136
Density                 0.100
Connectedness           0.344
Fragmentation           0.656
Closure                 0
Avg Distance            2.677
SD Distance             1.553
Diameter                6
Breadth                 0.816
Compactness             0.184
Small Worldness         0
Mutuals                 0
Asymmetrics             0.200
Nulls                   0.800
Arc Reciprocity         0
Dyad Reciprocity        0
Transitivity            0
Note: A brief description of the metrics used in this table is included in Appendix F.
By conducting the dynamic network analysis in addition to change detection, it is possible to see exactly how the networks of both teams changed in the analyzed timeframes. The CUSUM charts for both teams show an alert when there is a visible decrease in the efficiency of the network, i.e. the degree to which each component in a network contains the minimum links necessary to keep it connected. For Iceland's team, efficiency decreases drastically following their first goal, while for England it decreases before Iceland's 2nd goal. This suggests that after Iceland equalized the score to 1:1, some of England's players became significantly disconnected from the rest of their teammates. A closer look at the network-level metrics before and after the detected change for England reveals significant changes in a few metrics. Most notably, the small worldness of the team decreases drastically in the second timeframe, from minute 10 to 20 (from 0.947 to 0.146). This suggests that England's players are more dispersed and that there is a longer distance between them, i.e. if A is connected to B and C, there is not necessarily a link between B and C as well (Gama et al., 2014). It is therefore understandable that the efficiency of the network as a whole decreases too, since not all players are close to each other.
In Iceland's team, the main change is an increase in degree centralization, specifically in out-degree centralization. This signifies that some of its players have become more prominent by passing more to other players, which may signal that Iceland significantly intensified its attacking play. Information like this might help the coach react before another goal is scored (as happened in this case). This type of analysis also shows that network metrics should never be considered in isolation, but only in combination with additional analysis, to avoid misleading conclusions.
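Out-degree centralization can be illustrated with Freeman's formula: the sum of the differences between the highest out-degree and every player's out-degree, divided by the largest possible value of that sum, (n - 1)², which is reached when one player passes to all others and nobody else passes. This is a minimal sketch under that assumption, on an unweighted directed pass network; the function name is hypothetical, and the normalization used by the software behind Tables 29 and 30 may differ.

```python
def out_degree_centralization(nodes, edges):
    """Hypothetical helper: Freeman out-degree centralization of a directed pass network."""
    n = len(nodes)
    if n < 2:
        return 0.0
    out_degree = {p: 0 for p in nodes}
    for passer, receiver in set(edges):
        if passer != receiver:
            out_degree[passer] += 1
    highest = max(out_degree.values())
    # (n - 1) ** 2 is the value reached by a perfect "out-star".
    return sum(highest - d for d in out_degree.values()) / (n - 1) ** 2
```

Values near 1 indicate that passing is concentrated on a few distributors, which is the kind of shift described above for Iceland.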
Community detection
As mentioned in section 6.1, community detection in social networks is a problematic area of research because of the various definitions of community and the various algorithms that detect such communities. In football, a community can be defined as a small group of three to five players who are more closely connected with each other than with the rest of their teammates. The literature on football performance analysis and SNA has mostly relied on cliques and clustering metrics to determine the cohesion of the team as a whole, and there is little research on community detection in football teams. Therefore, several algorithms are used below to calculate such communities in the teams of Iceland and England.
Table 31. Community detection in the teams of Iceland and England
Six different algorithms are used to detect communities, i.e. smaller and cohesive groups, in England's and Iceland's teams. The results demonstrate that England's team does not tend to form clusters as much as Iceland's team. Most of the algorithms clustered all members of England's team together; the exceptions are the k-Means, CONCOR and Newman clustering algorithms. The Dense Subgraph Extraction and Girvan-Newman algorithms do not seem suitable for community extraction in football teams, as they tend to cluster the whole team together. Results were similar for the match between Iceland and Portugal. In general, with only 11 players, a football team tends to have a high clustering tendency, and in most cases connections exist between all players. Community detection algorithms therefore seem to be of little help for detecting closely connected subgroups of players. Techniques such as the core network presented earlier, clique counts or triads seem to be more useful.
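Clique counts like those in Table 27 can be obtained by enumerating maximal cliques. The sketch below uses the Bron-Kerbosch algorithm with pivoting on the undirected version of the pass network and, as an assumption, counts only maximal cliques of at least three players; the exact clique definition used for Table 27 may differ, and the function name is hypothetical.

```python
def maximal_cliques(edges, min_size=3):
    """Hypothetical helper: maximal cliques (sets of players) with >= min_size members."""
    neighbours = {}
    for u, v in edges:
        if u != v:
            neighbours.setdefault(u, set()).add(v)
            neighbours.setdefault(v, set()).add(u)
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            if len(clique) >= min_size:
                cliques.append(clique)
            return
        # Pivot on the player with most neighbours among the candidates.
        pivot = max(candidates | excluded, key=lambda p: len(neighbours[p] & candidates))
        for v in list(candidates - neighbours[pivot]):
            expand(clique | {v}, candidates & neighbours[v], excluded & neighbours[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(neighbours), set())
    return cliques
```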
In a next step, triads are calculated for the teams of England and Iceland. In simple terms, a triad consists of three actors and the relations between them (Lusher et al., 2010). Triads are an important approach to studying social structures, as they go beyond the dyad level and may indicate how the network as a whole is held together (Lusher et al., 2010). The formula for this calculation is presented in Equation 4.
Equation 4. Triad count formula
Formula
let A be the binary, unimodal input network
The results for both teams are displayed in Table 32. The types of triads and the triad count are calculated for both teams, as well as the percentage of each triad type.
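A simplified version of the triad count can be sketched as follows. Instead of the full 16-type census, the snippet classifies each triad only by its numbers of mutual (M), asymmetric (A) and null (N) dyads, the familiar MAN code. This collapses subtypes that differ only in the direction of their asymmetric links (for example, the three 021 types). The input format and function name are hypothetical.

```python
from itertools import combinations


def man_triad_census(nodes, edges):
    """Hypothetical helper: count triads by MAN code (mutual, asymmetric, null dyads)."""
    arcs = {(u, v) for u, v in edges if u != v}
    census = {}
    for a, b, c in combinations(sorted(nodes), 3):
        mutual = asymmetric = 0
        for x, y in ((a, b), (a, c), (b, c)):
            forward, backward = (x, y) in arcs, (y, x) in arcs
            if forward and backward:
                mutual += 1
            elif forward or backward:
                asymmetric += 1
        code = f"{mutual}{asymmetric}{3 - mutual - asymmetric}"
        census[code] = census.get(code, 0) + 1
    return census
```

Dividing each count by the total number of triads, n(n - 1)(n - 2)/6, gives percentages of the kind reported in Table 32.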
Table 32. Triad types and count for England and Iceland
Iceland (count, %)    England (count, %)
18 8.18% 21 5.77%
21 9.55% 56 15.38%
6 2.73% 4 1.10%
2 0.91% 1 0.27%
6 2.73% 4 1.10%
19 8.64% 19 5.22%
6 2.73% 1 0.27%
1 0.45% 1 0.27%
24 10.91% 29 7.97%
9 4.09% 3 0.82%
7 3.18% 8 2.20%
14 6.36% 14 3.85%
38 17.27% 61 16.76%
There are notable differences between the two teams. England has the highest number of type 16 triads, while Iceland's players participate mostly in type 15 triads. Both teams also have many triads of type 3, which is practically a dyad.
Table 33. Popular triads in Iceland and England’s teams
By studying the triadic relationships between players, it is possible to gain additional information on exactly how players interact with each other. Dyadic relationships are clearly crucial for the success of a team; however, players display interactions that are more complex than the dyadic level, which is the level mostly studied in the existing literature.
6.6 Discussion
The results demonstrate how network analytics techniques can be used for performance evaluation
in football as well as to make tactical adjustments during the game or prepare for an upcoming
match.
At player level, the analyses in the current thesis use the centrality metrics most often mentioned in the literature, but also extend them with two new metrics (authority and contribution). The results demonstrate how such player-level analysis can support decision makers in football during live matches in deciding which player should leave the game. According to the findings based on network metrics, taking Rooney off in the game against Russia was not a wise decision, as the impact on the network was significant. Furthermore, additional calculations based on network metrics, such as immediate impact and core network, as well as the dynamic analysis of metrics over a period of time, can show not only who the key players are but also which players should not be considered for substitution. These types of analyses have not been applied in previous research.
Team-level analysis can be used to explain good or bad performances of teams. The results show that by applying change detection, calculating network metrics in key timeframes of the game, and analyzing triadic relationships, it is possible to better understand how players connect with each other. This can support decisions both before and during the match. In this case, Iceland is the team with less ball possession and thus a lower number of passes overall. However, its performance is better than, or at least as good as, that of its opponents. Iceland takes advantage of successful dyadic relationships between its players and is thus the team with higher reciprocity. On the other hand, the team has lower transitivity, as its players participated in fewer triadic relationships than their opponents. If decision makers know that Iceland's players are successful due to these dyadic relationships, such analysis can support real-time tactical adjustments based on this information, including which players are involved in the dyads. At the same time, relying on dyads could increase Iceland's vulnerability if opponents make use of such information.
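To make the two team-level metrics concrete, the sketch below computes them for a directed pass network given as (passer, receiver) pairs. It uses the common textbook definitions, reciprocity as the share of directed links whose reverse link also exists and transitivity as the global clustering coefficient (3 × triangles / connected triples) of the undirected network; the thesis may have used a network library or a weighted variant, so this is an illustrative assumption rather than its exact computation.

```python
from itertools import combinations

def reciprocity(edges):
    """Share of directed pass links (u, v) whose reverse link (v, u) also exists."""
    edges = {(u, v) for u, v in edges if u != v}
    if not edges:
        return 0.0
    return sum((v, u) in edges for u, v in edges) / len(edges)

def transitivity(edges):
    """Global clustering coefficient of the undirected version of the network."""
    nbrs = {}
    for u, v in edges:
        if u == v:
            continue
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    # A triangle is a set of three players who are pairwise connected.
    triangles = sum(
        1 for a, b, c in combinations(sorted(nbrs), 3)
        if b in nbrs[a] and c in nbrs[a] and c in nbrs[b]
    )
    # Connected triples: pairs of neighbours centred on each player.
    triples = sum(len(n) * (len(n) - 1) // 2 for n in nbrs.values())
    return 3 * triangles / triples if triples else 0.0
```

A team built on mutual passing pairs, as described for Iceland, scores high on the first metric and can still score low on the second.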
Similarly, by knowing the interaction patterns between players of a team, coaches can prepare better against the opposition, as they gain more insight into the opposition's tactical strengths and weaknesses. They can make more informed decisions on the roster for the next match. By knowing who the key players are and with whom they interact most, coaches can devise specific strategies for intercepting those interactions during the match.
Finally, in a football scenario, change detection techniques can be used in a real-time system that signals that a change has occurred in the team. With various network metrics available, different aspects of interest regarding the team's interactions can be implemented in such a system. It can be used to alert the coach and their staff to sudden changes in their own team as well as in the opponent's team. The implementation of such techniques in real time would not be straightforward, however. There
is a general difficulty in using network metrics to explain team and player performance, owing to the wide range of available metrics and the lack of methodology and guidelines for their application in practice.
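As an illustration of such an alerting mechanism, the sketch below applies a two-sided tabular CUSUM, a standard change-detection technique, to a network metric computed per time window (for instance reciprocity per five minutes). CUSUM is not necessarily the change-detection method used in this thesis, and the function name, the target level, the slack and the threshold are hypothetical parameters that would have to be calibrated on past matches.

```python
def cusum_alarms(values, target, slack, threshold):
    """Two-sided tabular CUSUM: return the indices at which a shift is signalled.

    values:    a network metric computed per time window
    target:    expected level of the metric (e.g. mean of earlier windows)
    slack:     allowance k; deviations smaller than this are ignored
    threshold: decision limit h; an alarm is raised when a cumulative sum exceeds it
    """
    hi = lo = 0.0
    alarms = []
    for i, x in enumerate(values):
        hi = max(0.0, hi + (x - target - slack))  # accumulates upward deviations
        lo = max(0.0, lo + (target - x - slack))  # accumulates downward deviations
        if hi > threshold or lo > threshold:
            alarms.append(i)
            hi = lo = 0.0  # restart monitoring after each alarm
    return alarms
```

The slack parameter keeps small random fluctuations in the metric from triggering an alarm, while a sustained shift accumulates until it crosses the threshold.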
The dynamics in the interactions between the players are analyzed with the help of metrics and algorithms from the area of network science. Static network analysis focuses only on calculating network metrics, which give a summarized view of the player interactions in a match. The thesis shows how an in-depth analysis of changes in the metrics can answer important questions during the live match, such as “which player should be replaced?”, “who are the key players in the team?”, “what impact would a replacement of a player have on the team?”, “in which way do the players interact the most?” (the number and direction of passes between the players, analyzed through the use of triads), “what is the team network like before and after a player replacement, or before and after a goal or another key event?”, and “when does a change occur in the team?”. All of these questions can be answered by means of network analysis focused on the dynamic interactions between the players in different time periods throughout the game.
SOMs
7.1 Method
An unsupervised neural network model that is applicable to performance analysis of football as a complex dynamic system is the Self-Organizing Map, or SOM. SOMs were proposed in the 1970s by Willshaw and Von Der Malsburg, who used the method to model neurobiological phenomena in animals (Bonaccorso, 2018). However, SOMs were made popular by the Finnish scientist Teuvo Kohonen in the late 1980s, and his particular SOM model is also known as Kohonen feature maps (KFM) or simply Kohonen network(s)8. The method converts complex multi-dimensional data into a simpler, usually two-dimensional, map. It is a powerful technique for clustering records based on hidden patterns in the data.
8
Whenever a SOM is mentioned, a Kohonen SOM is meant.
A SOM has an input and an output layer with no hidden layers in between. It operates on the “winner-takes-all” principle: during the training phase, all units in the output layer are equally excited by an input signal, but only one unit will produce the highest response (Bonaccorso, 2018). That
“winner” unit then becomes a candidate to be the recipient of that specific pattern (Bonaccorso,
2018). Thus, a SOM structures the output nodes into clusters where similar nodes are in close prox-
imity, while nodes that are different (i.e. recognize different patterns) are farther apart from each
other on the two dimensional map (Larose and Larose, 2015). A typical SOM architecture is pre-
sented in Figure 33.
As seen in Figure 33, a SOM is a feedforward network in which every node in the output layer is connected to all input nodes, whereas the output nodes are not connected to each other. Input signals are passed from each input node to all of the output nodes, and the output nodes compete to be the “winner”. The winner is determined by a scoring function, such as the Euclidean distance (Larose and Larose, 2015). The values of the input nodes, together with the weights assigned to their connections, determine the value of the scoring function for a particular output node. The output node with the best value of the scoring function (e.g., the smallest Euclidean distance) becomes the winning node (Larose and Larose, 2015). This node then has its connection weights adjusted by a factor determined by the learning rate parameter (Bigus, 1996). In addition, the weights of the nodes in the neighborhood of the “winner” are also readjusted, so that the whole neighborhood moves closer to the input pattern (Bigus, 1996). At the beginning, the weights of the connections between the nodes in both layers are initialized randomly, as in the backpropagation algorithm. As training progresses, the size of the winning node's neighborhood decreases, so that fewer output nodes are updated, and
at the end of the training process, only the winning node is adjusted (Bigus, 1996). Once training is completed, a new instance (observation) activates only one neuron on the map, namely the neuron whose weight vector is closest to the input vector (Géron, 2017). This makes the method especially suitable for clustering or dimensionality reduction. When a SOM is used for pattern recognition, the types of patterns it recognizes depend on the data used for training (Perl et al., 2013). For example, if the training data consist of interactions, i.e., movement data of two teams and the ball, the patterns will show typical interactions (Perl et al., 2013).
Based on the above, a SOM is characterized by three processes:
1. Competition – the output nodes compete with each other to produce the best value for a particular scoring function.
2. Cooperation – the nodes adjacent to the winning node are also “rewarded” and share the “excitement” earned by the winning node.
3. Adaptation – the weights of the winning node and its neighboring nodes are adjusted. This is part of the learning process.
The Kohonen SOM algorithm consists of the steps described in Equation 5.
A Kohonen feature map has some limitations. First, it usually requires a lot of training data, and
second, continuous training is not possible – once the training process is finished, it cannot be re-
started (Perl and Dauscher, 2006). This is due to the learning process being controlled by an exter-
nal algorithm with parameters that have final values, which means that eventually the learning pro-
cess will end (Perl and Dauscher, 2006). Once the learning process is closed, the network can only
be used for testing and is not able to learn any new patterns (Perl, 2002). To deal with these limita-
tions, an alternative type of Kohonen Feature Map was developed by the Institute of Computer
Science at the University of Mainz in Germany. This type of KFM is known as Dynamically Con-
trolled Network or DyCoN. A DyCoN consists of a conventional KFM combined with a time-
independent neuron-driven control (Perl, 2001). Each neuron in a DyCoN contains an internal
memory and a self-controlling algorithm. Thus, a DyCoN has no final state and can always adapt its internal memory to new input: it can learn continuously over time, continue learning processes after interruptions, and learn in separate phases (Perl, 2002, 2004). A DyCoN needs only a few hundred data points to recognize a pattern, compared to the 10 000 to 20 000 needed by a conventional KFM (Perl, 2001). However, as the DyCoN model is used commercially, technical information is not publicly available, and it is only used in publications by members of the Computer Science department at the University of Mainz, mainly Prof. Juergen Perl. Other scholars, however, use the conventional KFM algorithm for studying sports behaviors. An overview of the studies on football behavior based on dynamic systems theory and SOM is given in the next section.
Equation 5. Kohonen feature map algorithm
where $k = 0, 1, \ldots$ denotes the discrete time steps, $\alpha(k)$ is the learning rate, and $\Lambda(i, c)$ is the neighborhood function of the winner.
Neighborhood function:
In the steps described above, the neighborhood function $\Lambda(i, c)$ equals 1 for $i = c$ and falls off with the distance $\lVert r_c - r_i \rVert$ between node $i$ and the winner $c$ in the output layer, where $r_c$ and $r_i$ denote the coordinate positions of the winner $c$ and node $i$ in the output layer, respectively. Thus, nodes close to the winner, as well as the winner $c$ itself, will have their weights changed appreciably, whereas those further away, where $\Lambda(i, c)$ is small, will experience little effect.
The original neighborhood function, as defined by Kohonen (1988), was the squared neighborhood function, defined as

$$\Lambda(i, c) = \begin{cases} 1, & \lVert r_c - r_i \rVert \le N_c(k) \\ 0, & \text{otherwise} \end{cases}$$

where $N_c(k)$ is a decreasing function of time. Its value is usually large at the beginning of the learning and shrinks as training progresses. The bell-shaped neighborhood function introduced by Kohonen (1990) has also been used frequently in practice:

$$\Lambda(i, c) = \exp\left(-\frac{\lVert r_c - r_i \rVert^2}{2\sigma^2(k)}\right)$$

where $\sigma(k)$ is the width parameter, which affects the topological order of the output map and decreases gradually during training.
Learning rate:
The learning rate is denoted by $\alpha(k)$ in the above description of the SOM algorithm. It is essential for convergence: it should be large enough for the network to adapt quickly to new training patterns, but small enough for the network not to forget the experience gained from past training patterns. Its value lies between 0 and 1. If $\alpha(k) = 0$, there is no update; if $\alpha(k) = 1$, $W_c$ becomes $X$.
Source: Si et al., 2003, p.58
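The steps of Equation 5, with the three processes of competition, cooperation and adaptation, can be illustrated with a short NumPy sketch. It is not the implementation used in this thesis; full implementations are provided by existing packages such as the kohonen R package. The function names, the random initialization in [0, 1), the linear decay of α(k) and of the neighborhood width, and the initial width of half the larger grid dimension are all assumptions, and input features are assumed to be scaled to [0, 1]. Both neighborhood functions are supported: a step function that is 1 within the radius and 0 outside, and the bell-shaped Gaussian.

```python
import numpy as np

def neighborhood(dist, width, kind="gaussian"):
    """Lambda(i, c) as a function of the grid distance ||r_c - r_i|| to the winner."""
    if kind == "step":  # 1 inside the radius N_c(k), 0 outside
        return (dist <= width).astype(float)
    return np.exp(-dist ** 2 / (2 * width ** 2))  # bell-shaped, width sigma(k)

def som_step(weights, grid, x, alpha, width, kind="gaussian"):
    """One Kohonen update for a single input vector x; returns (new weights, winner)."""
    # Competition: the winner c is the node whose weight vector is closest to x.
    c = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Cooperation: neighborhood strength of every node relative to c on the grid.
    lam = neighborhood(np.linalg.norm(grid - grid[c], axis=1), width, kind)
    # Adaptation: w_i(k+1) = w_i(k) + alpha(k) * Lambda(i, c) * (x - w_i(k)).
    return weights + alpha * lam[:, None] * (x - weights), c

def train_som(data, rows, cols, n_iter=100, alpha0=0.5, seed=0, kind="gaussian"):
    """Train a rows x cols map on data of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, q) for r in range(rows) for q in range(cols)], dtype=float)
    weights = rng.random((rows * cols, data.shape[1]))  # randomized initial weights
    width0 = max(rows, cols) / 2
    for k in range(n_iter):
        shrink = 1 - k / n_iter  # linear decay of alpha(k) and of the width
        for x in data[rng.permutation(len(data))]:
            weights, _ = som_step(weights, grid, x, alpha0 * shrink,
                                  width0 * shrink + 1e-3, kind)
    return weights, grid
```

Because the width shrinks towards zero, only the winning node is noticeably updated in the final iterations, and with α(k) = 1 the winner's weight vector becomes equal to the input, as stated in the description of the learning rate.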
7.2 Related work
In sports science, ANNs, and specifically, SOMs, are usually applied in the area of movement pat-
tern analysis. What is common in those studies is that they are characterized by time-dependent
behavioral processes, which can be classified by an ANN (Perl and Dauscher, 2006). Each neuron
in the Kohonen map will then represent a type of process, and each cluster will represent a class of
similar process types (Perl and Dauscher, 2006). A process analysis in sport is the analysis of time
series of positions, constellations, or tactical patterns in games or of positions, angles, or speed of
articulations and limbs in motions (Grunz et al., 2009). Such analyses are characterized by the
complexity and the dynamics of the data, which make it difficult to use conventional statistical
approaches to detect patterns (Grunz et al., 2009).
Beyond movement analysis, SOMs have also been applied for tactical PA in various sports. For instance, in rugby, Croft et al. (2015) applied a SOM neural network to OPTA data in order to find out which performance indicators discriminate best between successful and unsuccessful outcomes.
As input data, the authors used the frequencies of common performance indicators in rugby. In handball, Pfeiffer and Perl (2006) trained a DyCoN network with offensive attempts (processes) from all teams in order to identify typical offensive attempt patterns. The authors showed that a neural network can be used to identify typical tactics of handball teams. In a study of basketball, two different types of SOMs, a hierarchical model (DyCoN) and a recursive model in the form of a merge self-organising map (MSOM), were applied to player tracking data to test whether automatic recognition of tactical behaviors (fastbreak, high-pick and horns) is possible (Kempe et al., 2015). The authors concluded that both SOM architectures achieve high accuracy, although, not surprisingly, the DyCoN resulted in higher accuracy (97%).
In football, there has not been a wide use of SOM for performance analysis. One reason for this is
most likely the inability, until recently, to collect detailed datasets from a football game. As this is
no longer the case, one could reasonably expect that in the future, there will be more studies taking
advantage of the SOM method for football PA. Another reason could be that ANNs are known as black-box methods. They produce an output without an explanation, which makes interpretation harder. Thus, considerable effort and experience are required from the researcher to explain the “reasoning” of the network (Dutt-Mazumder et al., 2011). Very few performance analysts have the required skills to achieve this (Dutt-Mazumder et al., 2011). Therefore, it is not unusual to come
across studies in football PA done by scientists from various disciplines, mainly sports and com-
puter scientists. Studies that have applied SOM in football are briefly mentioned below.
Grunz et al. (2009) use constellations (collections of the player positions of offensive and defensive
groups) from the world championship final in 2006 as input data for training a neural network. This
results in a neural network with neurons representing the different constellations. Then, those con-
stellations or specific sequences of them can be matched to categories of situations or tactical units
from a soccer category system (Grunz et al., 2009). The authors discuss the difficulty of this type of
analysis in football, as compared to basketball, for example. Specifically, they briefly touch on the problem that a SOM has a fixed dimension and cannot work with vectors of different length, which happens when the data consist of constellations in which different numbers of players are involved, a very common situation in football. Therefore, they choose a solution in which they work with a fixed
number of players, representing offense or defense players. This type of analysis, although valid,
does not seem to be very practical. In a more recent publication, Grunz et al. (2012) use positional
player data to automatically classify long and short game initiations in football. The authors define
a long game initiation if the first pass after winning the ball is longer than 30 meters. Otherwise, it
is considered a short game initiation. A DyCoN network is trained and the results show that it is
possible to use a hierarchical SOM network for automatic detection of tactical patterns in football.
Their approach detected 84% of all game initiations. However, they use the sliding window technique, which makes the analysis rather cumbersome. Bartlett et al. (2012), although not using SOM in their study on coordination dynamics between opposing teams in open play attacks, suggest that SOMs can be used to analyze multidimensional coordination of groups of more than two players.
As input data, the authors suggest the use of player trajectories along and across the pitch or the
various types of attack; or alternatively, some inter-player measure, such as the stretch index (cal-
culating this measure with the OPTA data is not possible, however)9. Finally, Perl and Memmert
(2016) use the DyCoN network to analyze the formations of tactical groups where the positions of
the players are condensed to those of tactical groups, and the formations of the tactical groups are
mapped to a small number of characteristic patterns. The idea is to reduce the team’s activities to a
smaller number of tactical patterns, which in turn would make it easier to detect regular or strik-
ing/unusual tactical features.
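The stretch index suggested by Bartlett et al. (2012) shows why such measures need positional data for all players at the same instant. Following the definition given in footnote 9 (average radial distance of the players' positions to the team centroid), a minimal sketch for one snapshot of (x, y) positions could look as follows; the function name and input format are hypothetical, and with OPTA event data only the player on the ball has a recorded position, so the input cannot be built from events alone.

```python
import math

def stretch_index(positions):
    """Average radial distance of the players' (x, y) positions to the team centroid."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n  # centroid x coordinate
    cy = sum(y for _, y in positions) / n  # centroid y coordinate
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / n
```

Computed for every frame of tracking data, the resulting time series could serve as one of the inter-player measures fed into a SOM.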
Based on the above, the few studies that used SOM for tactical analysis in football are largely published by the same group of authors, who have access to DyCoN and other proprietary tools used in some of their papers (for instance, the SOCCER tool, also created by the group of Prof. Juergen Perl). The results of these studies are interesting; however, their contribution, especially in terms of practical application, is questionable. Nevertheless, SOMs have high potential and can be a useful tool for performance analysis in football. A SOM could potentially allow researchers to objectively quantify the skill level of teams or players in a game; it could also reduce the time and effort needed to convert recorded data into useful tactical information, so that such analyses could be done even during the half-time break of a football match (Dutt-Mazumder et al., 2011; Grunz et al., 2012). SOMs are a valuable technique for reducing high-dimensional datasets to relevant low-dimensional information (Dutt-Mazumder et al., 2011). The method can be very useful in an initial
9
Stretch index is defined as the average radial distance of the players’ positions to the team centroid (Benito Santos et al., 2018).
exploratory analysis of the data and for gaining some insights via its map visualizations. Therefore, it is interesting to explore this method further, especially given the opportunity to train a SOM network on real OPTA data.
added to this group as, based on their track record, no one expected these teams to achieve success in the competition. Portugal, although perhaps not a typical favorite, was classified as one because, according to the FIFA rankings before the championship took place, the national team was ranked 8th among the UEFA teams (FIFA, 2016a). The team also has talented players who compete in the biggest European clubs, as well as Ronaldo, the world’s best player at the time. Similarly, Croatia was ranked 23rd, right before Wales, in the FIFA ranking list; therefore, their team was also classified as an underdog. The final list is presented in Table 34.
Table 34. List of underdog and favorite teams and their market values (prior to Euro 2016)
Favorite teams Market value [mil. €] Underdog teams Market value [mil. €]
Switzerland 175 €
Wales 170 €
The table above also shows the market values10 of each team. There is a clear difference between the two groups: the average market value of the favorite teams is €456 million, while that of the underdogs is €123 million.
Kohonen R package – offers various functions for self-organizing maps with a focus on visualization. It provides functions for the standard SOM, the supervised SOM, and supersom, a SOM with multiple parallel maps (Wehrens and Buydens, 2007).
Gsom package – a growing self-organizing map (GrowingSOM, GSOM) is a growing variant of the popular self-organizing map. It was developed to address the problem of finding
10
Market values were extracted from Transfermarkt.de.
a suitable map size. It starts with a minimal number of nodes and grows new nodes on the boundary based on a heuristic (Hunziker, 2017).
The final dataset consists of all performance indicators listed under the variable type_id in OPTA, which include all events that happened during the game, including direct-play performance indicators such as pass, goal, tackle, corner, keeper punch, etc. The table below gives an overview of the final dataset.
Table 35. Extract from the dataset used for training the SOM
Match data: Match ID, Team ID, Result
Event types: Pass, Offside Pass, Take On, Foul, Out, Corner Awarded, …
Note: The variable “Result” shows whether the team lost (0) or won (2).
As shown, each row of the dataset represents the aggregated OPTA statistics per match and per team (thus, there are two rows for each match included). In total, 47 different event types are analyzed for each team.
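The aggregation from single events to one row per match and team can be sketched in plain Python. The dictionary keys "match_id", "team_id" and "type" are hypothetical stand-ins for the corresponding OPTA fields, and the outcome coding (0 = loss, 2 = win) follows the note under Table 35; this is an illustration, not the preprocessing code of the thesis.

```python
from collections import Counter, defaultdict

def aggregate_events(events, event_types, results=None):
    """One row per (match, team) with the count of each event type.

    events:      iterable of dicts with hypothetical keys "match_id", "team_id", "type"
    event_types: event-type names to use as columns (47 in the thesis)
    results:     optional dict (match_id, team_id) -> outcome code, e.g. 0 = loss, 2 = win
    """
    counts = defaultdict(Counter)
    for event in events:
        counts[(event["match_id"], event["team_id"])][event["type"]] += 1
    rows = []
    for (match_id, team_id), c in sorted(counts.items()):
        row = {"match_id": match_id, "team_id": team_id,
               "Result": (results or {}).get((match_id, team_id))}
        row.update({t: c[t] for t in event_types})  # missing types count as 0
        rows.append(row)
    return rows
```

Each resulting row, without the identifier columns, is one input vector for the SOM.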
growing SOM. The SOM grid should be sized so that each node (neuron) contains neither too many nor too few samples (rows of the dataset). Both experiments use a 2x3 SOM grid.
The training progress of the SOM clustering is presented in Figure 34.
The chart above represents the training progress of a SOM network. How well a SOM network has been trained is measured by the distance between the node weights and the data points assigned to each node; the aim of the training process is to minimize this distance. Once the distance reaches a plateau (i.e., it does not decrease further), the training process is complete. The visualization shows the training process for the Underdogs Win experiment. As the training iterations proceed, the distance decreases, meaning that the model is learning. After 100 iterations, the mean distance has fallen to about 0.065 and is not decreasing further.
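To make the training-progress criterion concrete, the sketch below computes the mean distance between each data point and the weight vector of its closest node, the number of samples mapped to each node (relevant when choosing the grid size), and a simple plateau check. The function names, the window of 10 iterations and the tolerance are hypothetical choices, not values taken from the thesis or from a specific package.

```python
import numpy as np

def _distances(data, weights):
    """Euclidean distances between every sample (rows) and every node (columns)."""
    return np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)

def mean_distance_to_bmu(data, weights):
    """Average distance between each sample and its closest (best-matching) node."""
    return float(_distances(data, weights).min(axis=1).mean())

def node_counts(data, weights):
    """Number of samples mapped to each node, to spot over- or under-populated nodes."""
    bmu = _distances(data, weights).argmin(axis=1)
    return np.bincount(bmu, minlength=len(weights))

def has_plateaued(history, window=10, tol=1e-4):
    """True if the mean distance improved by less than tol over the last window iterations."""
    return len(history) > window and history[-window - 1] - history[-1] < tol
```

Recording `mean_distance_to_bmu` after every training iteration yields a curve of the kind shown in Figure 34, and `has_plateaued` turns the visual "no further decrease" into a stopping rule.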
Table 36. Matches considered in Experiment 1
The visualizations below represent regular (i.e., not normalized) results from the SOM analysis of matches where the underdog team wins. For example, the visualization for the variable “Result” represents the ordering/clustering of win, loss, or draw per team per match. Red nodes represent teams with more favorable results, while blue nodes represent teams that tended to accumulate more losses.
This type of visualization allows for a quick and intuitive inspection of the SOM analysis. For example, simply by comparing the Result and Pass grids, one can see that the underdog teams that had fewer passes also tended to end the game with a favorable result: the nodes colored red in the “Result” grid are colored blue in the “Pass” grid, where red denotes higher and blue lower values of the given attribute. This relationship between passes and the final outcome is not surprising, as underdogs usually have a lower ball possession rate than favorite teams.
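The visual comparison of the Result and Pass grids can also be expressed numerically. The sketch below is a hypothetical helper, not the procedure used in the thesis: it extracts the component plane of a variable (its weight in each node, laid out on the grid assuming row-major node order) and computes the Pearson correlation between two planes across the nodes. A negative value corresponds to nodes that are red in one grid appearing blue in the other.

```python
import numpy as np

def component_plane(weights, feature_names, feature, rows, cols):
    """Node weights of one variable laid out on the rows x cols grid (row-major order)."""
    j = feature_names.index(feature)
    return weights[:, j].reshape(rows, cols)

def plane_correlation(weights, feature_names, a, b):
    """Pearson correlation across nodes between the component planes of variables a and b."""
    ia, ib = feature_names.index(a), feature_names.index(b)
    return float(np.corrcoef(weights[:, ia], weights[:, ib])[0, 1])
```

With a 2x3 grid the correlation rests on only six nodes, so it is best read as a compact summary of the visual impression rather than as a statistical test.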
The table below contains a summary of the clustering results and shows which events exactly have
a positive or negative influence on the final outcome of the underdog games in which they win.
Table 37. Influence of event type on match outcome for underdogs
Positive: Take On, Foul, Out, Tackle, Goal, Card, Aerial, Keeper Sweep, Contentious Referee Decision, Ball Touch, Smother
Negative: Pass, Offside Pass, Interception, Save, Corner Awarded, Player Off, Claim, Clearance, Challenge, Ball Recovery, Player On, Formation Change, Offside Provoked, Keeper Pickup
Table 38. Matches considered in Experiment 2 (continued)
The results from the SOM clustering analysis involving the winning matches of the favorite teams
are presented in Figure 36.
The table below summarizes the events that have a positive or negative influence on the winning matches of the favorite teams.
Table 39. Influence of event type on match outcome for favorite teams
Positive: Pass, Take On, Foul, Miss, Blocked Pass, Formation Change, Chance Missed, Ball Touch, Contentious Referee Decision, Keeper Sweep, Offside Provoked, Attempt Saved, Goal
Negative: Interception, Save, Claim, Tackle, Keeper Pickup, Card
7.6 Discussion
From the analyses presented in the previous section, it is possible to determine which types of events have a positive or negative influence on the outcome of the matches for favorite and underdog teams. Events that are positive for the match outcome only for underdog teams are Out, Aerial, and Smother. This means that winning aerial duels is crucial for underdog team performance. Smother is the event when a keeper comes out and gains possession of the ball in the box. This usually happens during corners or high balls into the box meant to be headed. Therefore, an active keeper who can win these aerial balls benefits an underdog.
Events with a positive influence on the match outcome of favorite teams are: miss, blocked pass,
chance missed, attempt saved. It is interesting to note that both favorites and underdogs tend to
commit more fouls when winning, but only the underdogs get more yellow or red cards. The event
miss may be positive only for favorite teams because they create more chances than the underdogs.
The events blocked pass and attempt saved suggest that an active defense, i.e. a defense style that
challenges the attackers, is required for victory.
The following events are positive for both types of teams: take on, foul, goal, keeper sweep, contentious referee decision, and ball touch. Keeper sweep is an event where the goalkeeper leaves the penalty area, which suggests that in this case the winning team is also the one attacking. Other variables also suggest that the team that takes the initiative and plays more aggressively (take on/dribble, foul, ball touch, i.e., a touch with an arm) secures a favorable outcome.
Events that have a negative influence on the outcome of the matches for both favorite and underdog teams are interception, save, claim, and keeper pickup. Keeper pickup is an event where the goalkeeper picks up the ball with the hands. This event happens frequently when a team plays defensive football and defense players interact with the keeper in order to prolong ball possession. Keeper events such as sweep and pickup clearly suggest that a defensive tactic is a bad decision for underdog teams. This is further confirmed by the fact that more saves are associated with a loss (more saves again imply a defensive tactic).
More tackles and cards are positive for underdogs and negative for favorites. It appears that a game in which individual underdog players challenge the opponent's players more through tackling, together with more physical play that results in more cards, pays off for underdog teams. On the other hand, registering more passes and offsides provoked, as well as trying out formation changes, is positive for favorites and negative for underdogs. The different influence of the number of passes on the success of favorites and underdogs simply relates to the fact that underdogs have lower possession rates in their winning matches. Similarly, offsides provoked have a positive influence for favorites and a negative one for underdogs because of the greater number of chances created (and missed) by the favorite teams, while the influence of formation changes shows that underdog teams tend to stay with one tactic and not experiment too much during a game. This has brought them success, especially if one considers Iceland’s decision to stick to the traditional and, as some consider, outdated 4-4-2 formation.
Based on the results presented so far, it can be concluded that an analytics method like SOM can be used to gain a quick understanding of the tactical aspects of a team and of the dependencies between the attributes in a dataset. The frequencies of events are important, but there are many other factors that are difficult to analyze quickly, which matters especially in a real-time scenario. SOM could be used for tactical analysis of the opponent team and for readjusting the strategy and tactics in the break before the second half. It can also be a valuable tool to study an opponent before the game itself. By analyzing the match data of the opponent, it is possible to find patterns in their behavior that are otherwise not visible to the naked eye. Furthermore, the results from these analyses show that SOM as a method has advantages over linear methods, because various different types of behavior, expressed through the occurring events and their frequencies, can lead to different outcomes. The assignment of similar patterns is not an either-or decision, and thus linear methods are not suitable for this purpose.
Future research could focus on analyzing specific games against specific opponents; with a larger dataset, it would also be possible to analyze successful versus unsuccessful teams in more detail or, alternatively, to analyze player behavior. However, a larger dataset is required for this purpose, in order to reveal meaningful patterns and make generalizations. Furthermore, pattern classification by SOM can cluster similar game processes in a match that are hidden in the large number of complex variables (Dutt-Mazumder et al., 2011).
A limitation of SOM as an analytics tool is the level of intuitive interpretation, i.e., the ease of use of the self-organizing maps. As neural network methods are famously known as black-box methods, because the reasoning behind their results cannot be readily understood, one could assume that they are not the most intuitive way for coaches and their teams to conduct performance analysis. However, considering that almost all teams in the top leagues, and even in the second leagues, have in-house analysts, the interpretation should not be a problem.
Process Mining
8 Process Mining
Analyzing the tactical behavior in team sports is of paramount importance in sports performance analysis. As discussed in section 2.4, the individual actions performed are of interest when analyzing a team’s tactics. For quite some time, the action frequencies of teams and players have been the only way to gain insight into this performance aspect. However, frequencies alone do not give a complete picture of the performance, and especially of the tactical behavior. Therefore, action/event sequences have been suggested for deeper insight into the game. One reason mentioned by Carling et al. (2008) is that “on-the-ball” activity, physical contact and the sequence in which these actions occur contribute to physiological energy expenditure. This means that, in addition to tactics, sequential analysis could give insight into player fatigue. Action sequences are chains of sequential single actions during a game (Schrapf et al., 2017). As the OPTA data is based entirely on event or action data, with timestamps and positional coordinates available, it is especially suitable for this type of analysis.
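As an illustration of how OPTA-like events could be turned into action sequences, the sketch below orders events by timestamp and starts a new sequence whenever the team performing the event changes. The tuple layout and the function name are hypothetical, and real OPTA feeds also record events for the defending team (e.g., duels), so a practical segmentation would need additional rules for such events.

```python
from itertools import groupby

def possession_sequences(events):
    """Split a time-ordered event stream into runs of consecutive events by one team.

    events: list of (timestamp, team_id, event_type) tuples (hypothetical layout)
    Returns a list of (team_id, [event_type, ...]) in chronological order.
    """
    ordered = sorted(events, key=lambda e: e[0])  # stable: ties keep input order
    return [
        (team, [etype for _, _, etype in group])
        for team, group in groupby(ordered, key=lambda e: e[1])
    ]
```

Each resulting sequence is a chain of single actions in the sense of Schrapf et al. (2017) and can later serve as one case in an event log.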
In this chapter, a novel technique for sequence analysis of event data is suggested and its ad-
vantages and disadvantages for decision making in football are presented and discussed.
8.1 Method
Process mining aims at discovering, monitoring and improving real processes by extracting
knowledge from event logs (van der Aalst, 2011). As a discipline, process mining sits between, on
the one hand, machine learning and data mining, and on the other hand, process modelling and
analysis (van der Aalst, 2011). Among the questions that process mining can answer are (a) what really happened, (b) why it happened, (c) what is likely to happen in the future, and (d) when and why organizations and people deviate (van der Aalst, 2011). The pioneer of this technique is Prof. van der Aalst from TU Eindhoven in the Netherlands. When he started working on process mining in the early 2000s, most people said that there was no data for automated process discovery (Rozinat, 2011). This has changed over the course of a decade, and in the current Big Data era, companies are at times overwhelmed by the amount of data they collect from all of their business operations.
Different algorithms are used in process mining, e.g., heuristic and genetic ones, depending on the available data and the questions that need to be answered. Irrespective of this, process mining requires
structured data, and specifically, event logs of business (or other) processes. The goal is to analyze
event data from a process oriented perspective (van der Aalst, 2011). An example of a typical event
log necessary for process mining is presented in Figure 37.
Process Mining
The event log in the figure above contains information from a call center. Each row represents an
event, and each event corresponds to an activity executed in the process (names for different pro-
cess steps or status changes that were performed in the process); multiple events are linked together
in a process instance or case (necessary to distinguish different executions of the same process);
each case forms a sequence of events ordered by their timestamps (Rozinat, 2012). Usually, an event log contains additional information about the activities in the process, such as the columns “service line” and “urgency” in the example above. These columns are called attributes. They are not mandatory, but if available, they give more detailed information on the processes. As a process is a sequence of steps, the event log must fulfill three minimum requirements for process mining to be applied successfully: it must have case IDs (e.g., customer number, order number, patient ID), timestamps, and activity columns (Rozinat and Gunther, 2015), as highlighted in Figure 37.
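The three minimum requirements can be illustrated with a short Python sketch. The column names case_id, activity and timestamp, as well as the football events, are assumptions made for illustration and do not come from the thesis data. The function checks that the mandatory columns are present. It then groups the events of each case into a trace ordered by timestamp. The optional "player" field plays the role of an attribute.

```python
from collections import defaultdict

# The three mandatory columns; the names themselves are assumptions for this sketch.
REQUIRED_COLUMNS = ("case_id", "activity", "timestamp")


def build_traces(rows):
    """Group event rows into traces: one activity list per case, ordered by timestamp."""
    for row in rows:
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"event is missing required columns: {missing}")
    cases = defaultdict(list)
    for row in rows:
        cases[row["case_id"]].append(row)
    return {
        case_id: [e["activity"] for e in sorted(events, key=lambda e: e["timestamp"])]
        for case_id, events in cases.items()
    }


# Hypothetical events; "player" is an optional attribute, not a requirement.
log = [
    {"case_id": 1, "activity": "pass", "timestamp": 12.0, "player": "A"},
    {"case_id": 1, "activity": "ball recovery", "timestamp": 10.5, "player": "B"},
    {"case_id": 1, "activity": "miss", "timestamp": 15.2, "player": "C"},
    {"case_id": 2, "activity": "pass", "timestamp": 40.0, "player": "A"},
    {"case_id": 2, "activity": "dispossessed", "timestamp": 43.1, "player": "D"},
]
traces = build_traces(log)
print(traces)  # {1: ['ball recovery', 'pass', 'miss'], 2: ['pass', 'dispossessed']}
```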
Event logs can be used to conduct three types of process mining: discovery, conformance checking
and enhancement (van der Aalst et al., 2012). These are presented in Figure 38.
The most frequently used type of process mining is discovery (van der Aalst et al., 2012). As the name suggests, this technique converts an event log into a process model without using any a priori information (van der Aalst, 2011). The discovered model can be in the form of a Petri net, BPMN, EPC, or UML activity diagram, but it can also be a social network model, depending on the perspective needed (van der Aalst et al., 2012). Conformance checking uses an event log and a model as inputs. It is used to check whether reality, as recorded in the event log, conforms to the model and vice versa, i.e., to find discrepancies between the two (van der Aalst, 2011; van der Aalst et al., 2012). The third type, enhancement, also uses an event log and a model as inputs, but the information from the event log is used to improve the existing process model (van der Aalst et al., 2012).
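The idea behind conformance checking can be shown in a deliberately simplified form. In the sketch below, the "model" is only a set of allowed directly-follows pairs plus allowed start and end activities. This is an assumption made for brevity. Real conformance checkers replay traces on a Petri net or compute alignments. A trace conforms if every step it takes is allowed. The function reports the deviating steps, which is the kind of discrepancy that conformance checking reveals. The call-center activities are hypothetical.

```python
def deviations(trace, allowed_pairs, starts, ends):
    """List the steps of a trace that a simplified directly-follows model does not allow."""
    if not trace:
        return [("empty trace", None)]
    problems = []
    if trace[0] not in starts:
        problems.append(("<start>", trace[0]))
    for a, b in zip(trace, trace[1:]):
        if (a, b) not in allowed_pairs:
            problems.append((a, b))
    if trace[-1] not in ends:
        problems.append((trace[-1], "<end>"))
    return problems


def fitting_share(log, allowed_pairs, starts, ends):
    """Share of traces without any deviation (a crude stand-in for fitness)."""
    fitting = sum(1 for t in log if not deviations(t, allowed_pairs, starts, ends))
    return fitting / len(log)


# Hypothetical call-center model and log.
model = {("register", "check"), ("check", "decide"), ("decide", "notify")}
log = [
    ["register", "check", "decide", "notify"],
    ["register", "decide", "notify"],  # skips "check"
]
print(deviations(log[1], model, {"register"}, {"notify"}))  # [('register', 'decide')]
print(fitting_share(log, model, {"register"}, {"notify"}))  # 0.5
```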
Finally, process mining may refer to different perspectives of the analyzed processes. These are
explained below:
Control-flow perspective – ordering of activities. Its goal is to find a good characterization of all
possible paths by deriving a process model that provides the best summary of the flow followed by
most or all of the cases in the event log (ProM, 2017a). It can answer questions such as:
Which tasks precede which other ones?
Are there concurrent tasks?
Are there loops?
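Many discovery algorithms use the directly-follows relation as a starting point for these control-flow questions. The Python sketch below uses hypothetical traces. It counts how often each activity is directly followed by another. It also flags self-loops (a followed by a) and two-step loops (a, b, a). The sketch answers "which tasks precede which other ones" only at the level of direct succession. It is not a full discovery algorithm.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def find_loops(traces):
    """Collect self-loops (a, a) and length-two loops (a, b, a) observed in the traces."""
    self_loops, short_loops = set(), set()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            if a == b:
                self_loops.add(a)
        for a, b, c in zip(trace, trace[1:], trace[2:]):
            if a == c and a != b:
                short_loops.add((a, b))
    return self_loops, short_loops


traces = [
    ["ball recovery", "pass", "pass", "miss"],
    ["pass", "take on", "pass", "dispossessed"],
]
dfg = directly_follows(traces)
print(dfg[("pass", "pass")])  # 1
print(find_loops(traces))     # ({'pass'}, {('pass', 'take on')})
```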
There are several options for analyzing the control flow. Algorithms that can be used include the Alpha algorithm, the Heuristic Miner, the Fuzzy Miner and the Inductive Visual Miner, among others. A short comparison of these algorithms is presented in Table 40.
Table 40. Comparison of mining algorithms
Algorithm | Input | Output | When to use
Alpha Miner | Event log | Petri net | Not recommended for real-life data.
Heuristic Miner | Event log | Heuristic net | For real-life data with not too many different events.
Fuzzy Miner | Event log | Fuzzy model | For complex and unstructured log data or for simplification of the model.
Inductive Visual Miner | Event log | Petri net or process tree | For discovering process delays and deviations, and for animation of the model.
Source: Self-compiled based on Rozinat, 2010; Leemans et al., 2014
There is no straightforward answer to the question of which algorithm should be used in a specific case. The above table provides a starting guideline, but there are other options in ProM, and it is best to test various algorithms and inspect the results. As seen from Table 40, the Alpha algorithm, which was the first process mining algorithm developed, is not recommended for the analysis of real-world event log data. The Heuristic Miner was developed after the Alpha Miner to address its deficiencies. It can simplify the process model by abstracting from exceptional behavior and noise, i.e., by leaving out edges (Rozinat, 2010). This algorithm is able to detect short loops and skipped activities. However, it still produces rather complex process models (Buijs, 2017a). The Fuzzy Miner interactively simplifies the process model by hiding some activities and paths, if desired (Rozinat, 2010). The Disco tool relies on this algorithm for deriving process models.
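In the Heuristic Miner, edges are left out on the basis of a dependency measure computed from directly-follows counts. In its commonly cited form, the dependency of b on a is (|a>b| - |b>a|) / (|a>b| + |b>a| + 1). Here |a>b| is the number of times a is directly followed by b. For a self-loop, the measure is |a>a| / (|a>a| + 1). Edges below a threshold are dropped. The Python sketch below computes this measure. The threshold of 0.5 and the traces are illustrative assumptions, not settings used in the thesis.

```python
from collections import Counter


def dependency_edges(traces, threshold=0.5):
    """Keep edges a -> b whose Heuristic-Miner-style dependency value reaches the threshold."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    edges = {}
    for (a, b), ab in counts.items():
        if a == b:
            value = ab / (ab + 1)  # length-one loop: |a>a| / (|a>a| + 1)
        else:
            ba = counts[(b, a)]  # Counter returns 0 for missing pairs without inserting them
            value = (ab - ba) / (ab + ba + 1)
        if value >= threshold:
            edges[(a, b)] = round(value, 3)
    return edges


traces = [["pass", "shot"]] * 4 + [["shot", "pass"], ["pass", "pass", "shot"]]
print(dependency_edges(traces))  # {('pass', 'shot'): 0.571, ('pass', 'pass'): 0.5}
```

The rare reverse edge shot -> pass receives a negative value and is dropped. This is how infrequent, noisy behavior disappears from the model.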
Organizational perspective – focuses on information about the resources, which can be people, departments, roles, etc., and how they are related to each other. These relationships can also be represented as a social network based on the activities of the resources, which can be used to find interaction patterns or to evaluate the role of individuals (RapidProM, 2017). The ProM tool offers three options for answering such questions: the Social Network Miner, which creates a social network model of the event log using various metrics; the Dotted Chart, which gives an overview of working patterns, or time and date patterns; and the Inductive Miner, which shows how cases are handed over between activities (Buijs, 2017b).
Social network mining is the most useful technique for the organizational perspective, since network science is an area that studies interactions and relations between individuals. Several categories of metrics have been developed to discover sociograms from event logs (see Table 41).
Table 41. Types of social network metrics used for analyzing relationships from event logs
Metric category | Definition | Examples of metrics
Metrics based on (possible) causality | Analyze how work moves among performers. | Handover of work (HoW); Subcontracting
Metrics based on joint cases | Count how frequently two individuals are performing activities for the same case. | Working Together
Metrics based on joint activities | Focus on the activities performed by individuals: people are more similar if they perform the same activities. | Similar task metric
Metrics based on special event types | Consider the type of event. | Reassignment
Source: Self-compiled based on van der Aalst et al., 2005
The metrics based on (possible) causality consider how work moves among performers (van der Aalst et al., 2005). In a football game, they capture the flow of events between the players. For instance, there is a Handover of Work, or HoW, between two players if there are two subsequent activities/events where the first is completed by player A and the second by player B. In addition to direct succession, it is also possible to analyze “indirect succession using a ‘causality fall factor’ β, i.e., if there are 3 activities in-between an activity completed by i and an activity completed by j, the causality fall factor is β³” (van der Aalst et al., 2005, p. 9). The subcontracting metric counts the number of times player B executed an activity in between two activities done by player A, for instance, Player A -> Player B -> Player A. This could indicate that work was subcontracted from Player A to Player B.
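Both causality-based metrics can be sketched in Python on possession sequences. Here each sequence is written as the list of players who performed its events. The sequences are hypothetical. The sketch also makes several assumptions that it does not take from ProM:

- Self-handovers (the same player on two consecutive events) are skipped.
- Counts are left unnormalized.
- With a fall factor beta, a handover across n - 1 intermediate events is weighted beta ** (n - 1). A direct handover therefore still weighs 1, and three events in between give beta³, matching the quoted definition.

```python
from collections import Counter


def handover_of_work(sequences, beta=None):
    """Handover-of-work counts between players within possession sequences.

    beta=None: only direct succession is counted (weight 1).
    0 < beta < 1: indirect succession is counted as well, weighted by
    beta ** (number of events in between), so a direct handover still weighs 1.
    Self-handovers (same player) are skipped, an assumption of this sketch.
    """
    how = Counter()
    for players in sequences:
        for k, giver in enumerate(players):
            remaining = len(players) - k - 1
            max_gap = min(1, remaining) if beta is None else remaining
            for n in range(1, max_gap + 1):
                receiver = players[k + n]
                if receiver != giver:
                    how[(giver, receiver)] += 1.0 if beta is None else beta ** (n - 1)
    return how


def subcontracting(sequences):
    """Count patterns i -> j -> i, i.e. player j acts between two actions of player i."""
    sub = Counter()
    for players in sequences:
        for a, b, c in zip(players, players[1:], players[2:]):
            if a == c and a != b:
                sub[(a, b)] += 1
    return sub


# Hypothetical possession sequences given as the player performing each event.
sequences = [["A", "B", "A", "C"], ["B", "C"]]
print(handover_of_work(sequences))            # direct handovers only
print(handover_of_work(sequences, beta=0.5))  # ('A', 'C') -> 1.25, ('B', 'C') -> 1.5
print(subcontracting(sequences))              # Counter({('A', 'B'): 1})
```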
Metrics based on joint cases, as the name suggests, ignore causality and simply count how often individuals perform activities within the same case, i.e., sequence of activities (van der Aalst and Song, 2004). Thus, the Working Together metric shows which players most often participate or “work together” in the same ball possession sequence. If two individuals often work together on cases, they are considered to have a stronger relation than individuals who rarely work together (van der Aalst et al., 2005; van der Aalst and Song, 2004).
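A minimal Python sketch of the Working Together metric follows, using hypothetical possession sequences of player IDs. For each pair of players, it counts the number of sequences in which both appear, regardless of order or distance. This is the simplest, unnormalized form. Other weightings are possible.

```python
from collections import Counter
from itertools import combinations


def working_together(sequences):
    """Count in how many possession sequences each pair of players both appear."""
    together = Counter()
    for players in sequences:
        for pair in combinations(sorted(set(players)), 2):
            together[pair] += 1
    return together


sequences = [["A", "B", "A", "C"], ["B", "C"], ["A", "D"]]
print(working_together(sequences).most_common(1))  # [(('B', 'C'), 2)]
```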
Metrics based on joint activities consider the activities performed by the individuals in general,
regardless of whether they work on the same case or not (van der Aalst et al., 2005). If two players
are engaged in the same activities, they are considered to be more similar to each other than if they
would execute different activities. One such metric is the Similar Task metric. It is calculated in two steps. First, a matrix of the activities performed by each individual is created, and second, the distances between the individuals are calculated. Various distance metrics are available in ProM – Euclidean or Hamming distance, correlation or similarity coefficient.
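The two steps of the Similar Task metric can be sketched in Python with hypothetical (player, activity) pairs. The first step builds a player × activity count matrix. The second computes the Euclidean distance between player rows, where a smaller distance means more similar players. Raw counts are used here as an assumption. Normalizing each row first would stop players who are heavily involved in play from appearing dissimilar only because they have more events.

```python
import math
from collections import Counter


def activity_matrix(events):
    """Build a player x activity count matrix from (player, activity) pairs."""
    counts = Counter(events)
    players = sorted({p for p, _ in events})
    activities = sorted({a for _, a in events})
    return activities, {p: [counts[(p, a)] for a in activities] for p in players}


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical (player, activity) pairs from one match.
events = [("A", "pass"), ("A", "pass"), ("A", "tackle"),
          ("B", "pass"), ("B", "pass"), ("B", "tackle"),
          ("C", "shot"), ("C", "shot")]
activities, rows = activity_matrix(events)
print(activities)                       # ['pass', 'shot', 'tackle']
print(euclidean(rows["A"], rows["B"]))  # 0.0 -> same activity profile
print(euclidean(rows["A"], rows["C"]))  # 3.0
```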
Metrics based on special event types consider the type of event, as sometimes, there can be events
such as reassigning an activity to someone else (van der Aalst and Song, 2004). A reassignment
occurs when i frequently delegates work to j but not vice versa, and thus, it is likely that i is in a
hierarchical relation with j (van der Aalst et al., 2005; van der Aalst and Song, 2004). However, the
information needed for this type of metric is rarely available in an event log (Buijs, 2017b). The OPTA logs are also not suitable for calculating reassignment between players. However, it is possible to calculate the first three categories of metrics. In the current thesis, two metrics are used in the analysis: the Handover of Work and the Working Together metrics.
Case perspective – focuses on the properties of the cases. It can answer questions such as (ProM, 2017a):
What are the most frequent paths in the process?
Are there any loop patterns in the process?
What is the distribution of all cases over the different paths through the process?
Can you select a subset of traces where particular paths were executed?
Can you simplify the log by abstracting the most frequent paths?
Some options to answer the above questions in the ProM tool are the Pattern Abstractions, Trace
Variants, Dotted Chart visualizations, and the trace and sequence clustering plugins among others.
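The idea behind trace variants is to group cases that follow exactly the same path and count them. This answers the questions about the most frequent paths and the distribution of cases over paths. The Python sketch below does this for hypothetical possession sequences. It is an illustration of the concept, not the ProM plugin.

```python
from collections import Counter


def variant_distribution(traces):
    """Return (variant, number of cases, share of cases), most frequent variant first."""
    counts = Counter(tuple(trace) for trace in traces)
    total = len(traces)
    return [(variant, n, n / total) for variant, n in counts.most_common()]


traces = [
    ["pass", "pass", "shot"],
    ["pass", "dispossessed"],
    ["pass", "pass", "shot"],
    ["clearance"],
]
for variant, n, share in variant_distribution(traces):
    print(" -> ".join(variant), n, f"{share:.0%}")
```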
To gain further detail in the case perspective, it is possible to use the trace/sequence clustering algorithms in ProM. This solution was inspired by real-world scenarios and the issues associated with them. Specifically, in some domains, the behavior of the agents (i.e., participants in a process) can be very flexible. This means that the sequences in which they are involved can be quite diverse. When the log information is analyzed, the result is usually a very uninformative model known as a spaghetti-like process model. These models are so complex that their visualizations are overwhelming and hard to comprehend (Bose and van der Aalst, 2009). Trace and sequence clustering offer a solution to this problem.
The basic idea of trace clustering is to split the event log into homogeneous subsets and to create a process model for each subset (Song et al., 2009). In essence, the technique identifies and clusters similar sequences. A high-level overview of the process is depicted in Figure 39. The similarity is calculated based on a distance metric, usually the Euclidean or Hamming distance, while the clustering can be performed using different algorithms, such as k-Means or SOM (Veiga, 2009). A list of the algorithms available for clustering in ProM is presented in Table 42.
As seen from Figure 39, trace clustering works by creating a set of profiles, each measuring a number of features for each case from a specific perspective (Song et al., 2009). In a second step, the distance between each pair of cases is measured by a distance metric. In this case, the Euclidean distance is used, as it is found to be the most reliable. For two cases c_j and c_k, each described by a profile of n features, this measure is defined as
$$ d(c_j, c_k) = \sqrt{\sum_{l=1}^{n} \left( i_{jl} - i_{kl} \right)^2} $$
where i_{jl} denotes the value of the l-th feature in the profile of case c_j. Finally, in a third step, similar cases are grouped together using a clustering algorithm. Clusters can be analyzed independently of one another, which improves the quality of the results for flexible environments (Song et al., 2009). A football team consists of 11 players who do not act according to a specific predefined process. Instead, they respond to quite a few distinct factors in their environment. It is therefore reasonable to assume that football can be considered a flexible environment in the process mining sense. It would thus be interesting to see whether and how trace/sequence clustering could help football performance analysis.
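The three steps of trace clustering can be sketched in Python on hypothetical possession sequences. Step one gives each case an activity profile, i.e., how often each activity occurs in it. This is only one of several possible profiles. Step two uses the Euclidean distance defined above. Step three groups the cases. Plain k-means stands in for SOM here, purely to keep the sketch short. The first k profiles are used as initial centroids so that the result is deterministic. None of this reproduces the ProM implementation.

```python
import math
from collections import Counter


def activity_profile(trace, activities):
    """Feature vector of a case: how often each activity occurs in it."""
    counts = Counter(trace)
    return [counts[a] for a in activities]


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors serve as initial centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: euclidean(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


traces = [
    ["pass", "pass", "pass", "shot"],
    ["clearance"],
    ["pass", "pass", "pass", "pass", "shot"],
    ["clearance", "out"],
    ["pass", "pass", "shot"],
]
activities = sorted({a for trace in traces for a in trace})
profiles = [activity_profile(t, activities) for t in traces]
print(kmeans(profiles, k=2))  # [0, 1, 0, 1, 0]
```

The passing sequences ending in a shot form one cluster and the clearances form the other. A separate process model could then be mined for each cluster.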
Sequence clustering is based on a similar idea to trace clustering. However, this type of clustering is performed directly on the input data, i.e., no features are extracted from the sequences (Veiga, 2009). The plugin in ProM 5.7 was implemented by Veiga (2009). His algorithm is based on first-order Markov chains, in which the current state depends only on the previous state (Ferreira et al., 2007). Each cluster is associated with its own Markov chain. The probability that an observed sequence belongs to a given cluster is the probability that the sequence was produced by that cluster's Markov chain (Ferreira et al., 2007; Veiga, 2009). Thus, a given sequence is assigned to the cluster that is able to produce it with the highest probability (Veiga, 2009). In his implementation of the algorithm in the ProM plugin, Veiga also adds two dummy states to the Markov chain, an input state and an output state. These are necessary to represent the probability of a given event being the first or the last event in a sequence, which can help distinguish between some types of sequences (Veiga, 2009). The algorithm as implemented by Veiga is included in Appendix G.
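The assignment rule described above can be sketched in Python. Each cluster is represented by a first-order Markov chain with dummy start and end states. A sequence is assigned to the cluster whose chain produces it with the highest probability, computed as a sum of log probabilities. The transition probabilities below are hypothetical. In the full algorithm they are estimated from the data, a step this sketch omits.

```python
import math

START, END = "<start>", "<end>"  # dummy input and output states


def log_likelihood(sequence, transitions):
    """Log-probability that a first-order Markov chain produces the sequence.

    transitions[a][b] is the probability of moving from state a to state b.
    """
    states = [START, *sequence, END]
    total = 0.0
    for a, b in zip(states, states[1:]):
        p = transitions.get(a, {}).get(b, 0.0)
        if p == 0.0:
            return float("-inf")  # the chain cannot produce this sequence
        total += math.log(p)
    return total


def assign(sequence, clusters):
    """Assign the sequence to the cluster whose chain produces it with the highest probability."""
    return max(clusters, key=lambda name: log_likelihood(sequence, clusters[name]))


# Hypothetical transition probabilities of two clusters.
clusters = {
    "build-up": {START: {"pass": 1.0},
                 "pass": {"pass": 0.7, "shot": 0.2, END: 0.1},
                 "shot": {END: 1.0}},
    "direct": {START: {"clearance": 0.8, "pass": 0.2},
               "clearance": {"pass": 0.4, END: 0.6},
               "pass": {"shot": 0.5, END: 0.5},
               "shot": {END: 1.0}},
}
print(assign(["pass", "pass", "shot"], clusters))  # build-up
print(assign(["clearance"], clusters))             # direct
```

The dummy states matter: a sequence starting with "clearance" has zero probability under the "build-up" chain, because that chain never starts with a clearance.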
In the analysis section, the SOM clustering algorithm and Markov chain clustering are used. SOM is used because it is one of the main analytics methods in the current thesis, and it is interesting to apply the same method to football game sequences. Additionally, SOM is very efficient with respect to computation time and quite robust concerning the results, especially in situations where the characteristics of the process underlying an event log are largely unknown (Günther, 2009). Markov chain clustering is preferred because it discovers clusters without the analyst having to predefine the number of clusters.
Table 42. Clustering algorithms in ProM
Algorithm | Description
K-Means | The most commonly used partitioning method in practice; it constructs k clusters by dividing the data into k groups.
Quality Threshold Clustering | It is predictable (i.e., guaranteed to return the same set of clusters over multiple runs).
Agglomerative Hierarchical Clustering | Gradually generates clusters by merging the nearest traces, i.e., smaller clusters are merged into larger ones.
Self-Organizing Map | The aim of SOM is to group similar cases close together in certain areas of the value range.
Markov clustering | Allows for identification of normal or exceptional behavior based on the input sequences. It discovers the clusters rather than requiring them to be set beforehand.
Source: Self-compiled based on Song et al., 2009; Hompes et al., 2015
Time perspective – timing and frequency of events. If timestamps are available, it is possible to
detect bottlenecks, monitor the utilization of resources, or predict the remaining processing time of
running cases. On its own, this perspective will most likely not be too interesting in the football
scenario. However, combined with the other perspectives, it can give interesting insights.
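A minimal Python sketch of the time perspective follows. It uses hypothetical cases given as (activity, seconds) pairs that are already sorted by time. The sketch computes the average time between each directly-following pair of activities, which is the basic quantity behind bottleneck detection. It also computes the duration of each case, for example the length of a possession sequence.

```python
from collections import defaultdict
from statistics import mean


def mean_transition_times(cases):
    """Average time between each directly-following pair of activities.

    cases: lists of (activity, seconds) tuples, each already sorted by time.
    """
    gaps = defaultdict(list)
    for events in cases:
        for (a, t1), (b, t2) in zip(events, events[1:]):
            gaps[(a, b)].append(t2 - t1)
    return {pair: mean(values) for pair, values in gaps.items()}


def case_durations(cases):
    """Elapsed time from the first to the last event of each case."""
    return [events[-1][1] - events[0][1] for events in cases]


cases = [
    [("pass", 10.0), ("pass", 12.5), ("shot", 14.0)],
    [("pass", 100.0), ("pass", 103.5), ("dispossessed", 105.0)],
]
print(mean_transition_times(cases)[("pass", "pass")])  # 3.0
print(case_durations(cases))                           # [4.0, 5.0]
```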
Each of these perspectives gives a different view of the process analyzed. The control-flow per-
spective relates to the “How” question, the organizational perspective to the “Who” question, while
the case perspective answers the “What” question (ProM, 2017b). For a proper business understanding, users typically have to extract several models that describe different perspectives in the process analyses (Ingvaldsen and Gulla, 2008).
As seen, process mining is not a reporting tool but an analysis tool, which is able to model and analyze complex processes (Rozinat and Gunther, 2015). Even though it works with historical data, it is not limited to offline analysis, as the results can be applied to running cases (van der Aalst et al., 2012). Not all of the process mining types and perspectives can be applied in a football game scenario. Of the three types of process mining mentioned, only discovery is applicable in this case. Conformance checking and enhancement require an existing model against which the model discovered from the OPTA log can be compared. In football, there is no “perfect” or predefined model of the game process. Therefore, process mining can help with modelling the real-world process of what actually happened during the game. As for the perspectives, it is possible to view the event logs of the matches from all four perspectives discussed above.
Assuming that the letters above line 1 depict events (e.g., pass, tackle or shot) which appear in pro-
portion to their occurrence during the game, then line 1 is a visual representation of the temporal
structure of football performance (Borrie et al., 2002; Jonsson et al., 2010). Four events, a, b, c and
d, represent a temporal pattern that appears regularly. However, in the first representation this is not
visible to the eye, as other, more randomly occurring events, w and k, prevent the observation of the pattern (Jonsson et al., 2010). This is one of the strengths of T-pattern detection algorithms: they allow the separation of random events from temporal patterns (Jonsson et al., 2010). Figure 40 also shows that there is a larger pattern ((a, b) (c, d)) consisting of two simpler patterns, (a, b) and (c, d) (Borrie et al., 2002). Consequently, even in smaller datasets, the number of patterns detected can be quite high. If there are 100 event types and all time windows are considered, the number of potential event patterns is greater than 100¹⁰ (Borrie et al., 2002). Furthermore, T-patterns can be cyclical or acyclical, which means that they can, but do not necessarily, occur at regular time intervals. A pattern of play may occur a few times in the first half of the match and not reoccur until the last 5 minutes of the same match (Jonsson et al., 2010). Another issue is causality: just because a pattern exists, it should not be assumed that the elements in the pattern are causally related (Borrie et al., 2002; Jonsson et al., 2010). Finally, detected patterns could be used for prediction by estimating the probability that when A occurs at time t, B will occur within a critical interval defined as [t + d1, t + d2] (Jonsson et al., 2010).
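The critical-interval idea can be sketched in Python under strong simplifying assumptions:

- Time is discrete (here, seconds).
- A and B are given as lists of hypothetical occurrence times.
- The null hypothesis is that B occurs independently of A at a constant rate.

The sketch counts how many occurrences of A are followed by at least one B within [t + d1, t + d2]. A binomial tail probability then gives the chance of at least that many hits under the null hypothesis. This mirrors the spirit of the critical-interval test in T-pattern detection. It is not the full detection algorithm, which also builds patterns hierarchically.

```python
import math


def followed_within(a_times, b_times, d1, d2):
    """Number of A occurrences followed by at least one B in [t + d1, t + d2]."""
    return sum(any(t + d1 <= u <= t + d2 for u in b_times) for t in a_times)


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_interval_test(a_times, b_times, d1, d2, observation_length):
    """Return the number of hits and the chance of at least that many hits
    if B occurred independently of A at a constant rate per time unit."""
    hits = followed_within(a_times, b_times, d1, d2)
    rate = len(b_times) / observation_length
    p_window = 1 - (1 - rate) ** (d2 - d1 + 1)  # P(at least one B in a window)
    return hits, binomial_tail(hits, len(a_times), p_window)


# Hypothetical event times in seconds over a 90-minute match.
a_times = [10, 50, 120, 300, 410]
b_times = [13, 53, 124, 700]
hits, p_value = critical_interval_test(a_times, b_times, d1=2, d2=5, observation_length=5400)
print(hits)             # 3
print(p_value < 0.001)  # True
```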
The idea of studying temporal patterns and recurrence of events did not originate in sports perfor-
mance analysis. The inspiration came from studying animal and human behavior in various other
disciplines, based on the premise that repeated patterns of events exist in the behaviors of humans
and animals (Casarrubea et al., 2015). Studies on temporal patterns observed in human behavior
have been done in the area of healthcare and medicine (autism, schizophrenia, dementia), in com-
munication and conversation, stress factors and routine tasks, and in language practice, to name a
few (Casarrubea et al., 2015).
Certainly, this relates to the question of whether football, and thus the behavior of football players, is predictable or not. This is a popular debate in sports science, and some practitioners hold extreme points of view. Offering a solution to this debate is not a goal of the thesis. However, given the success of analytics in various sports nowadays, it is becoming clear that the game is not as unpredictable as previously thought. In an interview, the Head of Opposition Analysis of Liverpool FC, Chris Davies, is asked whether football players are more predictable than they would like to think. In his opinion, players are predictable, especially under pressure, when they revert to habits that are ingrained in them, possibly since childhood, for instance, “what foot you take the ball with, and what movements you make, and what actions you make”. He also states that the best players tend to be less predictable and offer more variability in their game. Overall, however, if one watches 20, 30 or more games of a player, it is possible to detect some of their habits. The full interview is available in Pearce and Vladimirov (2014).
T-pattern analysis has been applied in a few studies in football. Examples of what a T-pattern on football data looks like can be seen in Figure 41. Back in 2002, Borrie et al. published one of the first studies on the topic specifically focusing on football. By coding 13 matches, they demonstrated that many temporal patterns exist in football. The behavior of a football team is thus more synchronized than the human eye can detect. Their study showed that new kinds of team and player profiles can be discovered based on the detected patterns of behavior. Almost a decade later, Jonsson et al. (2010) presented preliminary results on how T-patterns can be used in soccer, basketball, boxing and swimming to detect temporal event patterns, with a focus on motor skill performance. Recently, Castañer et al. (2017) analyzed the playing styles of Ronaldo and Messi, especially the way they use their motor skills in attacking sequences that result in a goal. They detected patterns characterizing both players in terms of the body parts they used most, foot contact zone, body orientation and action. Diana et al. (2017) used T-patterns to detect the effect of game location (home/away) on the structure of play. Their focus is on attacking actions and on comparing T-patterns between home and away matches.
Figure 41. Representations of T-patterns on football event data
(a) Temporal and hierarchical representation of a T-pattern (Jonsson et al., 2010); (b) Complex
T-pattern with a length of 10 events and 5 levels
Source: Diana et al., 2017
Overall, in football, there is a lack of studies on the temporal structure of, and interrelationships between, different events. This is especially the case concerning professional tracking data. The reviewed papers on T-pattern analysis in football use an observation method and different event coding techniques. Their analyses are based on semi-manual extraction from video-recorded football matches. The results, although valuable, cannot be compared with event data provided by a company like OPTA, mainly because of the level of detail with which events are recorded. Previous studies employing the T-pattern method in football have another limitation: their structure, results and discussion sections are rather repetitive. This is because they tend to be published by the same group of authors, as with the SNA studies in section 6.2. In summary, there does not seem to be significant development between the paper published by Borrie et al. (2002) and the newer papers by Diana et al. (2017) and Castañer et al. (2017). A major limitation of previous studies is that they do not go further to investigate whether and how this method can be used by practitioners. Considering the large number of potential patterns that can be detected, it is questionable whether the method is helpful for practitioners. Some of the visualizations tend to be overwhelming and not very intuitive, at least for real-time decision support. Finally, all previous studies use a selection of matches, and it is not clear whether and how the findings can be generalized and applied in practice. Further investigation to establish their relevance is imperative.
Beyond T-patterns, Wongsuphasawat and Gotz (2012) studied another interesting technique for visualizing temporal event sequences, called Outflow. They show how Outflow can give a summarized overview of the way in which specific event pathways are linked to positive and negative outcomes, in football, for example, a win or a loss. A study with 12 participants also demonstrated that users can learn how to use the technique after 15 minutes of training. However, no other papers have been found that discuss this type of visualization and its relevance for decision making in football.
Taking into consideration the findings so far, it is of interest to explore alternative methods for pat-
tern detection which will give deeper insights into the tactics of players and teams in football. A
reason for using T-patterns, as mentioned by Camerino et al. (2012), is that tactical patterns cannot
be detected directly or through visual inference. However, process mining can offer a visual and
quick approach to this. Finally, more studies are needed to explore the technical and tactical strate-
gies during different times of the match (Cavalera et al., 2015).
Figure 42. Process mining steps for analysis of the OPTA event data
In a first step, the original OPTA data needs to be preprocessed and converted into the specific format required by the process mining algorithms, i.e., the original data need to be converted into event log data (see section 8.4). This is followed by the analysis of the resulting event logs, the discovery of the process models and their interpretation (see section 8.5). The potential of the applied analytics techniques to gain a tactical understanding in football is presented in the discussion section (8.6).
A single exemplary game is analyzed, as the goal is to demonstrate the techniques. The game between England and Iceland is chosen because Iceland won, while the England team showed one of its worst performances in a tournament. Therefore, it is interesting to explore what process mining reveals about the player and team behaviors. A summary of the game is presented in Table 43.
Table 43. Match statistics of the England vs. Iceland game
Date: June 27, 2016
England | Performance indicator | Iceland
1 | Goals scored | 2
63 | Possession (%) | 37
18 | Total attempts | 8
5 | Attempts on target | 5
10 | Attempts off target | 3
3 | Attempts blocked | 0
0 | Attempts against woodwork | 0
7 | Corners | 2
2 | Offsides | 1
1 | Yellow cards | 2
0 | Red cards | 0
6 | Fouls committed | 15
14 | Fouls suffered | 6
525 | Passes | 243
451 | Passes completed | 173
Source: UEFA, 2016b
8.4 Data preparation
The first step of process mining is to pre-process the OPTA event log data for the analyzed teams. Depending on the amount of additional information needed on each event, i.e., the attributes, as described previously, the task can vary in complexity. The main issue in converting the log data into the format required by the process mining algorithms is that each event in the OPTA log is described over several rows. These rows carry different types and numbers of qualifiers, which describe the event further. For instance, if a pass is analyzed, it can have qualifiers referring to the length of the pass, the angle, the x and y coordinates, etc. In total, 36 qualifiers can be used to describe a pass in more detail, and not all of them are used for each pass. The situation is similar for the remaining 73 event types. It is therefore a challenge to extract the relevant information so that all attributes of each event are placed on a single row. A Python script tackles this challenge. The key steps executed by the code are:
1. Eliminate unnecessary event types (e.g., formation changes and deleted events, i.e., all events not related to ball possession or its loss).
2. Re-sort the data according to the scheme provided by OPTA, so that an accurate sequence
of events can be obtained.
3. Pivot the qualifiers (each tuple of qualifier ID and value is transposed to one column per qualifier).
4. Summarize data by event IDs (one row per event including all values for qualifiers).
5. Assign case IDs.
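The five steps can be sketched in Python. This is not the script used in the thesis. The sketch makes the following assumptions for illustration:

- The input is in a long format with one row per event and qualifier. The columns are event_id, period, minute, second, team, type, outcome, qualifier_id and value.
- The event types in DROP_TYPES are the ones to eliminate.
- Sorting by period, minute, second and event ID stands in for OPTA's ordering scheme.
- A new case starts whenever the acting team changes.

The qualifier IDs in the example are placeholders.

```python
# Hypothetical event types removed in step 1; the thesis' actual list is not reproduced here.
DROP_TYPES = {"formation change", "deleted event"}
EVENT_FIELDS = ("event_id", "period", "minute", "second", "team", "type", "outcome")


def preprocess(rows):
    """rows: one dict per (event, qualifier) pair in an assumed long format."""
    # Step 1: eliminate event types that are not needed.
    rows = [r for r in rows if r["type"] not in DROP_TYPES]
    # Step 2: re-sort; period, minute, second and event ID stand in for OPTA's ordering scheme.
    rows.sort(key=lambda r: (r["period"], r["minute"], r["second"], r["event_id"]))
    # Steps 3 and 4: pivot qualifiers into columns, keeping one row per event.
    events = {}
    for r in rows:
        event = events.setdefault(r["event_id"], {f: r[f] for f in EVENT_FIELDS})
        if r["qualifier_id"] is not None:
            event[f"q{r['qualifier_id']}"] = r["value"]
    # Step 5: start a new case whenever the team in possession changes (a simplification).
    case_id, previous_team = 0, None
    for event in events.values():
        if event["team"] != previous_team:
            case_id, previous_team = case_id + 1, event["team"]
        event["case_id"] = case_id
    return list(events.values())


columns = ("event_id", "period", "minute", "second", "team", "type", "outcome",
           "qualifier_id", "value")
raw = [
    (2, 1, 0, 3, "ENG", "pass", 1, 1, "55.1"),
    (1, 1, 0, 1, "ENG", "pass", 1, None, None),
    (2, 1, 0, 3, "ENG", "pass", 1, 2, "40.2"),
    (3, 1, 0, 5, "ISL", "ball recovery", 1, None, None),
    (4, 1, 0, 6, "ISL", "formation change", 1, None, None),
]
log = preprocess([dict(zip(columns, r)) for r in raw])
for event in log:
    print(event["case_id"], event["type"], {k: v for k, v in event.items() if k.startswith("q")})
```

In real OPTA data, possession does not change with every opponent event, for example a foul. A production script would therefore need a more careful rule for step 5.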
The output of the pre-processing step is a sequence of all events in which a team plays with the ball. This means that a sequence for team A starts when the team gains ball possession and ends when the team loses the ball. This is similar to the procedure followed in the SNA part. The difference is that here all events are included and not just the passes. An overview of the final data format is presented in Figure 43.
The minimum requirements for process mining are available via the columns “Seq. Num.”, “Event
type”, and “Timestamp”. Additionally, the Period ID column (1 - first half of the game), the x and
y coordinates of the event in question, and its outcome (1 – successful, 0 – not successful) are
available as attributes.
Figure 43. Pre-processed OPTA event log suitable for process mining tasks
Tools used for the analyses are:
ProM – this is an open-source process mining software (http://www.promtools.org/doku.php) which offers a wide range of algorithms and techniques to process and analyze event logs. Various plugins are also available to extend the analytics options further. In the current thesis, two versions of the software were used: ProM 5.2 and 6.7. Some useful techniques, for instance SOM trace clustering, are missing in the later version, which is why both versions were used in the analysis.
Disco – this is a proprietary process mining software developed by Fluxicon (http://fluxicon.com/disco/). It is far more user-friendly than ProM, and it is easier to read in the event logs and get quick results. Although it is easier to learn and its results are easy to interpret, it offers far fewer analytical options than ProM. However, some of the techniques are easier to use for analysis, and therefore it is used in combination with ProM.
actions that characterize both the team and its players. As discussed in section 8.1, there are various algorithms that can be used to derive a process model from event log data. The algorithms are initially run with default settings. These work well in most cases, at least for giving an initial idea of how useful each algorithm is in the individual case. Figure 44 presents the results from the four algorithms, Alpha, Heuristic, Fuzzy and Inductive Visual Miner, for the team of England.
Figure 44 panels: a) Alpha Miner, b) Heuristic Miner, c) Fuzzy Miner
As Figure 44 shows, the Alpha Miner derives the least helpful model: in Figure 44 – a), a lot of events are connected directly to the start point as well as to the end point. While the Heuristic Miner derives a more process-like model, it fails to integrate three of the events (see Figure 44 – b). The model derived by the Fuzzy Miner shows a more detailed view of the process by adjusting the thickness of the connectors to the likelihood that a sub-sequence of two events occurs. However, like the Heuristic Miner, the Fuzzy Miner is not able to create one holistic process out of all the events. Therefore, these algorithms are not considered further in the analysis. The Inductive Visual Miner (IVM) is able to find a sound process model (Figure 44 – d). Furthermore, it gives a detailed overview of the most likely paths in the process model. Therefore, it is chosen as the best algorithm to generate the process models of both teams.
Figure 45 and Figure 46 present the process models for England and Iceland respectively.
Figure 45. Process model of England (vs. Iceland) mined with the IVM
Figure 46. Process model of Iceland (vs. England) mined with the IVM
Both models display all activities and paths of each team during the game. A darker blue indicates that an activity (event) occurs more often during the game. Not surprisingly, the event “pass” is usually highlighted in this way. An initial inspection of the models already gives a first impression of event frequencies. For England, for example, it is immediately visible that the team made 580 passes and was awarded 9 corners. More interestingly, the models visualize the dependencies between events, i.e., how often one event was followed by another. For instance, in England’s model, one of the 20 “foul” events was followed by a “card” event. Unfortunately, the model does not distinguish whether a foul was committed or suffered by England. This is why the process model for Iceland also shows 20 fouls in the match.
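The question “how often was one event followed by another” can be answered directly from the event log by counting directly-follows pairs within each sequence. The following minimal Python sketch rests on two assumptions: each sequence is a list of event names already ordered by timestamp, and the function name and toy sequences are hypothetical illustrations. It does not use the England vs. Iceland data and does not reproduce the IVM’s internal computation.

```python
from collections import Counter
from typing import Iterable, Sequence


def directly_follows(traces: Iterable[Sequence[str]]) -> Counter:
    """Count how often event a is immediately followed by event b in the same sequence."""
    counts: Counter = Counter()
    for trace in traces:
        # pair every event with its direct successor; pairs never cross sequences
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy sequences (not the match data)
traces = [
    ["pass", "pass", "foul", "card"],
    ["ball recovery", "pass", "dispossessed"],
    ["pass", "foul"],
]
dfg = directly_follows(traces)
fouls = sum(t.count("foul") for t in traces)
print(dfg[("foul", "card")], "of", fouls, "fouls followed by a card")  # 1 of 2 fouls followed by a card
```

The sequences carry no team qualifier, so this count has the limitation described above: a foul is counted without knowing which side committed it.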
In the next step, therefore, various techniques and visualizations from the case perspective are applied. These are explained below.
One option is to closely examine the sequences that are of interest to the coach or his team. For instance, it is possible to inspect the sequences that contain the event “miss” (any shot on goal that goes wide or over the goal). A single click on this event highlights all paths that contain it at least once. In this way, each sequence can be inspected separately with the highlighting filters in the IVM window in ProM. The sequences containing the event “miss” for England are presented in Figure 47.
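Outside ProM, the same highlighting filter amounts to keeping every sequence in which the event of interest occurs at least once. The minimal sketch below assumes the log is a dictionary that maps a hypothetical sequence ID to its time-ordered event names. The helper `passes_before` is also hypothetical; it shows how the length of the passing sequence before a miss could be read off.

```python
from typing import Dict, List

Log = Dict[int, List[str]]  # sequence ID -> event names ordered by timestamp


def sequences_containing(log: Log, activity: str) -> Log:
    """Keep only the sequences in which `activity` occurs at least once."""
    return {seq_id: events for seq_id, events in log.items() if activity in events}


def passes_before(events: List[str], activity: str) -> int:
    """Number of passes before the first occurrence of `activity`."""
    return events[: events.index(activity)].count("pass")


# Hypothetical toy log
log = {
    101: ["pass", "pass", "pass", "pass", "miss"],
    102: ["ball recovery", "pass", "dispossessed"],
    103: ["corner", "aerial", "miss"],
}
misses = sequences_containing(log, "miss")
for seq_id, events in sorted(misses.items()):
    print(seq_id, passes_before(events, "miss"))
# 101 4
# 103 0
```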
On the left, the sequence number is visible. On the right, each block is an activity, ordered by timestamp. Each sequence has a start and an end point. Clicking on a sequence expands the view to show the timestamps and whether each event was completed or not (see the last sequence). Clearly, in 3 out of 11 instances, England missed a scoring chance after a longer passing sequence. Besides inspecting a specific sequence separately, one can also see directly which activities precede and follow an event of interest (its input and output activities). For instance, the coach can see which events occur before his team is dispossessed, or which events follow a “dispossessed” event. This is done with the sequence pattern options in process mining. One option is presented in Figure 48.
Figure 48. Input and output patterns for the event “dispossessed” for Iceland’s team
Figure 48 shows the input patterns for the event “dispossessed” (a player is successfully tackled and loses possession of the ball) for Iceland’s team. In 62.5 percent of the sequences ending in dispossession, a pass occurred, while in 37.5 percent a “ball recovery” was also part of the sequence. As with England’s team, the IVM can be used to check the exact sequences that end in dispossession. This is presented in Figure 49.
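One plausible way to compute such input-pattern percentages works in two steps. First, take all sequences that end in the target event. Then report, for every other event type, the share of those sequences in which it occurs. The sketch below assumes this reading; ProM’s sequence-pattern view may compute its figures differently, and the toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def input_pattern_shares(traces: List[List[str]], target: str) -> Dict[str, float]:
    """Percentage of sequences ending in `target` in which each preceding event type occurs."""
    ending = [t for t in traces if t and t[-1] == target]
    if not ending:
        return {}
    counts: Counter = Counter()
    for t in ending:
        # set(): an event type counts once per sequence, however often it repeats
        counts.update(set(t[:-1]))
    return {event: 100.0 * n / len(ending) for event, n in counts.items()}


# Hypothetical toy sequences
traces = [
    ["pass", "pass", "dispossessed"],
    ["ball recovery", "pass", "dispossessed"],
    ["ball recovery", "dispossessed"],
    ["pass", "dispossessed"],
    ["pass", "miss"],  # does not end in dispossession, so it is ignored
]
shares = input_pattern_shares(traces, "dispossessed")
print(shares["pass"], shares["ball recovery"])  # 75.0 50.0
```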
On the left side of the figure above, the sequence number is visible. It allows the analyst to refer directly to a sequence and obtain additional information about it. For instance, if the coach or analyst would like more information on sequence 453, the Log Visualizer provides quite a few details about it (see Figure 50). The sequence occurred in the second half of the game (62’10”). It consists of three events in total, involving two players: the midfielder Gudmundsson recovered the ball and passed it to the striker Sigthorsson, who in turn lost it. This happened at coordinates x: 28.3, y: 15, which falls in the center zone, i.e., Iceland’s defensive zone. Clearly, this represents a dangerous situation, as England regains possession in its offensive zone.
The sequences also provide insights into the value of a player. Standard statistics do not give enough credit to players who keep the ball in possession by successfully getting out of tight situations (Gregory, 2017). One should not only look at the players who shot at goal or made the key assist, as it can sometimes be much more difficult to enable that assist in the first place (Gregory, 2017). Process mining can provide additional insights into a player’s involvement in such situations. One option is to select all sequences that end in one of the following events: miss, post, attempt saved and goal. These are the four events that OPTA records when a shot on goal has been made.
All sequences of England’s team that end in one of these events are selected. The resulting process model is presented in Figure 51. It shows that 11 sequences ended in “miss” and 6 in “attempt saved”, while one sequence ended in “goal”. The question is which players most often started and ended these sequences. This information is also available from the filtered sequences and is presented in Table 44. Table 44 – b and c makes clear that Kane and Alli are both frequent initiators and finishers of offensive sequences in England’s team. This makes these two players very valuable.
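The selection and the initiator/finisher counts behind a table like Table 44 can be sketched as follows. The sketch assumes that each sequence is a time-ordered list of (event type, player) pairs. The players A, B and C and their sequences are hypothetical; they are not England’s data.

```python
from collections import Counter
from typing import List, Tuple

# The four OPTA event types that denote a shot on goal, as listed above
SHOT_EVENTS = {"miss", "post", "attempt saved", "goal"}

Event = Tuple[str, str]  # (event type, player)


def shot_sequence_players(sequences: List[List[Event]]):
    """Select sequences ending in a shot; count who started and who ended them."""
    shots = [s for s in sequences if s and s[-1][0] in SHOT_EVENTS]
    initiators = Counter(s[0][1] for s in shots)
    finishers = Counter(s[-1][1] for s in shots)
    return shots, initiators, finishers


# Hypothetical players A, B, C (not England's squad)
sequences = [
    [("ball recovery", "A"), ("pass", "B"), ("miss", "B")],
    [("pass", "A"), ("attempt saved", "C")],
    [("pass", "B"), ("pass", "A"), ("goal", "A")],
    [("pass", "C"), ("dispossessed", "C")],  # no shot, so it is excluded
]
shots, initiators, finishers = shot_sequence_players(sequences)
print(len(shots), initiators.most_common(1))  # 3 [('A', 2)]
```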
Table 44. Overview of players involved in the offensive sequences
The Dotted Chart is another visual analytics technique available in process mining. It is simple, yet extremely useful for a quick look at various aspects of the game and the players. It can be configured to present different dependencies between time, events and players. Some options are presented in the figures below, but more are possible.
Figure 52 visualizes the outcome of the events in which England’s defenders were involved in the first half of the game. Red marks an unsuccessful outcome (for instance, a lost ball), while green marks a successful outcome (for instance, a completed pass). The chart also clearly shows how often and when the defenders are engaged in the game. In the timeframe between the first two goals, England’s defenders show little action, and the two visible actions in the highlighted part have a negative outcome. One can also see immediately which defenders were involved in an unsuccessful activity, in this case Cahill and Rose.
Figure 53 similarly presents the outcome of the activities in which Iceland’s players were involved during the first half. It shows that Iceland’s defender Skulason (the last one in the figure above) makes more mistakes on average than the other defenders.
Figure 54. Event types and outcomes per player from England’s team
Finally, it is also possible to visualize the exact types of events and their outcomes per player. This is presented in Figure 54. This type of visualization makes it easy to see immediately which player made a mistake in which type of event.
Another useful visualization is the Meter Chart under the Basic Performance Analysis option in ProM 5.2. It displays the events and the players involved in them using a frequency measure. Other measures, such as average, minimum and maximum, are also possible. This is presented in Figure 55.
Figure 55. Meter chart displaying the event “Clearance” and player involvement from England’s team
The figure above shows that defenders are mostly involved in the event “clearance”¹³, which is understandable. However, one can also quickly see that the defender Cahill made the most clearances, followed by Smalling and Walker.
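With the frequency measure, the meter chart essentially counts how often each player was involved in one event type. The minimal sketch below computes that count and each player’s share. It assumes that events are given as hypothetical (event type, player) pairs and does not reproduce the ProM 5.2 chart itself.

```python
from collections import Counter
from typing import Dict, List, Tuple


def involvement(events: List[Tuple[str, str]], event_type: str) -> Dict[str, Tuple[int, float]]:
    """For one event type, return each player's count and share of all such events."""
    counts = Counter(player for etype, player in events if etype == event_type)
    total = sum(counts.values())
    # most_common() orders players by count; ties keep their first-seen order
    return {player: (n, n / total) for player, n in counts.most_common()}


# Hypothetical toy events (not the match data)
events = [
    ("clearance", "D1"), ("clearance", "D1"), ("clearance", "D2"),
    ("pass", "D1"), ("clearance", "M1"),
]
for player, (n, share) in involvement(events, "clearance").items():
    print(player, n, f"{share:.0%}")
# D1 2 50%
# D2 1 25%
# M1 1 25%
```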
As a final step in the case perspective analysis, trace and sequence clustering is performed with two clustering algorithms: SOM and Markov chain clustering.
To use a SOM for trace clustering, as described in the previous chapter, profiles of the traces are first built from a set of features. ProM offers several options, and choosing them well requires asking what makes two football sequences similar to each other. The answer is the number and type of events in each sequence, the sequence duration, and the participants in each sequence, i.e., the players. All of these were selected in ProM as the features from which the sequence profiles are built before the SOM clustering algorithm is applied. Several parameters of the SOM network can be fine-tuned in the training process. They are briefly explained below:
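Width and Height: the number of cells used for the resulting rectangular grid. Each cell corresponds to one neuron.
Radius: the size of the neighborhood around the winning neuron whose weights are also updated; usually set to 2.
Random seed: the seed of the random number generator used, e.g., to initialize the neuron weights.
Training epochs: the number of passes over the training data.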
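The profile step can be illustrated as follows. Every sequence is turned into a numeric vector built from the features named above:

- its number of events,
- the count of each event type,
- its duration,
- a 0/1 flag per participating player.

This is a minimal sketch; it does not reproduce the exact encoding ProM uses for its trace profiles, and the function name and toy data are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, event type, player)


def build_profiles(traces: List[List[Event]]) -> Tuple[List[str], List[List[float]]]:
    """Turn each non-empty sequence into one vector: size, event-type counts, duration, players."""
    event_types = sorted({etype for t in traces for _, etype, _ in t})
    players = sorted({player for t in traces for _, _, player in t})
    header = ["n_events"] + event_types + ["duration"] + players
    vectors = []
    for t in traces:
        types_in_trace = [etype for _, etype, _ in t]
        players_in_trace = {player for _, _, player in t}
        timestamps = [ts for ts, _, _ in t]
        vectors.append(
            [float(len(t))]
            + [float(types_in_trace.count(e)) for e in event_types]
            + [max(timestamps) - min(timestamps)]
            + [1.0 if p in players_in_trace else 0.0 for p in players]
        )
    return header, vectors


# Hypothetical toy sequences
traces = [
    [(0.0, "pass", "P1"), (3.0, "pass", "P2"), (5.5, "miss", "P2")],
    [(10.0, "ball recovery", "P3"), (12.0, "pass", "P3")],
]
header, vectors = build_profiles(traces)
print(header)
print(vectors[0])  # [3.0, 0.0, 1.0, 2.0, 5.5, 1.0, 1.0, 0.0]
```

Duration (in seconds) and the counts live on different scales. Normalizing the columns before computing Euclidean distances is therefore usually advisable; whether and how ProM does this is not covered by the sketch.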
13
OPTA defines “clearance” as follows: a player under pressure hits the ball clear of the defensive zone and/or out of play.
The few publications that have used SOM for trace clustering (Buddhika, 2016; Günther, 2009; Song et al., 2009) do not discuss parameter tuning in detail. The Euclidean distance is usually combined with SOM, and this thesis applies the same combination. As for the width and height, there should not be more cells than there are traces (Günther, 2009). Beyond that, the grid size is usually chosen intuitively by trial and error. The radius is used in step 5 of the SOM algorithm (as described in Section 7.1). The radius and the random seed are usually kept at their default values of 2 and 999, respectively. The analysis below makes the same choice.
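For readers unfamiliar with the training loop, the following is a generic online SOM in plain Python. It uses the settings discussed above: Euclidean distance, a width x height grid with no more cells than traces, radius 2 and random seed 999. It is a minimal sketch, not ProM’s implementation. Several choices are assumptions made for illustration:

- the linearly decaying learning rate and radius,
- the Gaussian neighborhood,
- the initialization from randomly chosen profiles,
- the default of 100 epochs.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Node = Tuple[int, int]


def best_matching_unit(weights: Dict[Node, Vector], v: Vector) -> Node:
    """Neuron whose weight vector is closest to v (Euclidean distance)."""
    return min(weights, key=lambda node: math.dist(weights[node], v))


def train_som(data: List[Vector], width: int, height: int, radius: float = 2.0,
              seed: int = 999, epochs: int = 100, learning_rate: float = 0.5) -> Dict[Node, Vector]:
    """Train a width x height SOM with online updates and a Gaussian neighbourhood."""
    if width * height > len(data):
        raise ValueError("the grid should not have more cells than there are traces")
    rng = random.Random(seed)
    # initialise each neuron with a copy of a distinct, randomly chosen profile
    nodes = [(x, y) for x in range(width) for y in range(height)]
    weights = {n: list(v) for n, v in zip(nodes, rng.sample(data, len(nodes)))}
    for epoch in range(epochs):
        progress = epoch / epochs
        lr = learning_rate * (1.0 - progress)    # linearly decaying learning rate
        r = max(radius * (1.0 - progress), 0.5)  # shrinking neighbourhood radius
        for v in rng.sample(data, len(data)):    # visit all profiles in random order
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                grid_dist = math.dist(node, bmu)
                if grid_dist <= r:
                    h = math.exp(-grid_dist ** 2 / (2 * r ** 2))
                    for i in range(len(w)):
                        w[i] += lr * h * (v[i] - w[i])
    return weights


# Hypothetical one-dimensional profiles forming two obvious groups
data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
som = train_som(data, width=2, height=1, epochs=50)
print(best_matching_unit(som, [0.0]) != best_matching_unit(som, [5.0]))  # True
```

After training, each trace profile is assigned to its best-matching neuron. Neighboring neurons with similar weight vectors then form the clusters visualized in the map.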
Additionally, the colors in the resulting map indicate the relationships between the neurons: neurons with similar weight vectors are painted in similar colors (Günther, 2009). Clusters with many similarities, which exhibit normal behavior, are located in the “highlands” colored green. Clusters of exceptional cases are located in the “sea” colored blue (Buddhika, 2016). The final results of the SOM clustering are presented in Figures 56 and 57 for England and Iceland, respectively.
The SOM results are confirmed by the sequence clustering and the Markov chains generated for the recognized clusters. The number of clusters must be defined in advance, so trial and error is used. For England’s team, it reveals that a small number of clusters (e.g., 2 to 4) yields clusters of similar size whose Markov chains look relatively similar to each other. This again confirms the impression from the SOM clustering: England’s team plays in a rather predictable manner, and little about its behavior is exceptional or unique. The Markov chains, however, give a more precise summary of the team’s main behavior. They also show the probability that one event is followed by another. Finally, a few preprocessing steps can be applied to obtain better clustering results. This matters all the more because, without such preprocessing, the analysis can take more than 24 hours. The options that ProM offers are:
Table 45. Preprocessing parameters for Markov chain sequence clustering for England’s team
Parameter Value
The minimum event occurrence is set to 30 percent. The minimum number of events in a sequence is set to 2, to exclude rare and uninteresting sequences consisting of only one event. A sequence must also occur at least 3 times, while the maximum parameters are left at their defaults. With these preprocessing steps applied, the number of clusters is set to 4. The resulting Markov chains can be seen in Figure 58 (a, b, c, and d).
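How these thresholds act on the log can be sketched as follows. The semantics are an assumption, and ProM’s exact definitions may differ. Under this reading, three filters are applied in order:

1. An event type is kept only if it appears in at least 30 percent of the sequences.
2. Sequences shorter than 2 events are dropped.
3. Only sequence variants occurring at least 3 times are retained.

The function name and toy sequences are hypothetical.

```python
from collections import Counter
from typing import List


def preprocess(traces: List[List[str]], min_event_share: float = 0.30,
               min_length: int = 2, min_seq_occurrence: int = 3) -> List[List[str]]:
    """Apply (assumed) minimum-threshold preprocessing before sequence clustering."""
    if not traces:
        return []
    n = len(traces)
    # ASSUMPTION: "min event occurrence" = share of sequences containing the event type
    seq_freq = Counter(e for t in traces for e in set(t))
    kept_types = {e for e, c in seq_freq.items() if c / n >= min_event_share}
    # remove rare event types, then sequences that became too short
    reduced = [[e for e in t if e in kept_types] for t in traces]
    reduced = [t for t in reduced if len(t) >= min_length]
    # keep only sequence variants that occur often enough
    variants = Counter(tuple(t) for t in reduced)
    return [t for t in reduced if variants[tuple(t)] >= min_seq_occurrence]


# Hypothetical toy sequences
traces = (
    [["out", "pass", "dispossessed"]] * 3
    + [["ball recovery", "pass", "dispossessed"], ["aerial", "clearance"], ["pass"]]
)
result = preprocess(traces)
print(len(result), result[0])  # 3 ['out', 'pass', 'dispossessed']
```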
a) Cluster 0: 30 instances
b) Cluster 1: 3 instances
c) Cluster 2: 33 instances
d) Cluster 3: 35 instances
Figure 58. Markov chains for England’s team
This method makes it easy to drill down and get a quick overview. It shows in which events the players are mostly involved and how often. It also shows the exact sequences that occur most often. One can also choose a higher percentage to find out whether some sequences occur 50 or even 80 percent of the time. In England’s case, the minimum sequence occurrence can be increased to 10 and the minimum event occurrence to 40 percent. With these settings, two clusters are generated, whose Markov chains are those in Figure 58 a) and d).
From England’s Markov chains, it can be concluded that in roughly 30 percent of the clustered sequences (30 of 101), the ball is lost after just one pass following a ball out of play. In other words, the team regains the ball and then loses it again after just one pass (cluster 0). The other clusters add further weaknesses:

- In 3 instances, England’s team makes an unsuccessful attempt to dribble past an opponent (cluster 1).
- There is a probability of 0.825 that the team loses the ball following a pass after a ball recovery (cluster 2).
- Following an aerial duel, the probability of a clearance is 0.583 (cluster 3).

All of this speaks against England’s team and reveals at least some of the reasons behind their loss.
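The probabilities quoted here are transition probabilities of a first-order Markov chain. They are estimated by counting how often each event is directly followed by each other event and normalizing per source event. The minimal sketch below covers a single cluster and assumes artificial start and end states and hypothetical toy sequences. It does not reproduce the step that assigns sequences to several clusters.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(traces: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood estimate of first-order transition probabilities P(next | current)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for t in traces:
        states = ["start"] + list(t) + ["end"]  # artificial start/end states
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(successors.values()) for b, n in successors.items()}
        for a, successors in counts.items()
    }


# Hypothetical toy cluster
traces = [
    ["aerial", "clearance"],
    ["aerial", "clearance", "out"],
    ["aerial", "pass"],
    ["aerial", "clearance"],
]
p = transition_probabilities(traces)
print(p["aerial"])  # {'clearance': 0.75, 'pass': 0.25}
```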
For Iceland’s team, it is more difficult to generate Markov chains that summarize the behavior well. One reason is that Iceland’s play is more resourceful than England’s. Understandably, it is therefore less likely that it can be clustered in a meaningful way. As with England, the minimum parameters are modified while the maximum parameters are kept at their defaults. With the same parameter settings as for England, only 6 instances are left after preprocessing. The situation is similar when the parameter “min event occurrence” is increased. Therefore, after some trial and error, the preprocessing parameters are not used in Iceland’s case, and the clustering proceeds directly. The clustering results for different pre-defined numbers of clusters are presented in Table 46.
Table 46. Clusters and number of sequences for Iceland’s team
Number of clusters   Cluster   Number of sequences
4                    0         120
4                    1         58
4                    2         102
4                    3         49
5                    0         120
5                    1         79
5                    2         21
5                    3         34
5                    4         75
6                    0         127
6                    1         45
6                    2         26
6                    3         60
6                    4         58
6                    5         13
The Markov chains and the instances of all clusters in Table 46 are inspected. The solution with 5 clusters is judged to summarize the behavior best. As an example, cluster 3 of this solution is presented in Figure 59.
Figure 59. Markov chain for Iceland’s team: cluster 3 with 34 instances
The figure above shows that Iceland’s team is often engaged in passing events. This is not very informative, as passes are the most frequent events for every team. The team is also engaged in events such as “ball recovery”, “interception”, “clearance” and “out”. Furthermore, an interception is most likely followed by a clearance, which in turn is followed by “out” with a probability of 0.75. This means that Iceland’s team is quite successful in defending their half and intercepting the ball from the opposing team.
Figure 60. Markov chain for Iceland’s team: cluster 1 with 79 instances
Figure 60 shows that a “tackle” is most likely followed by a “ball touch”. This in turn is followed by “out” (with a probability of 0.4) or “challenge” (with a probability of 0.6). Following a tackle, one of two things therefore happens. Either the ball goes out of play for a throw-in or goal kick (out), or an Iceland player fails to win the ball as an opponent successfully dribbles past him (challenge). Further process mining analyses, for instance the dotted chart, can be used to check which players are involved in these unsuccessful events.
In Figures 61 and 62, the colors indicate clusters, which gives a better visual perspective. The shape of the oval nodes is also meaningful. More vertically shaped nodes have a higher proportion of incoming arcs, while more horizontally shaped nodes have more outgoing arcs (ProM, 2017a). In this case, the clusters do not change significantly when more edges are removed. This means that the players of both teams participate, on average, well and in a balanced way over the course of the game. A few players stand out, however. In Iceland’s team, Bjarnason, a midfield substitute, has a distinctly vertical shape. This means that more work is delegated to him by the other players. Arnason, a defender, and Bodvarsson, a striker, also have more work delegated to them than they delegate to others. In general, strikers might be expected to have more incoming than outgoing arcs, given their position and the tasks required of them. Defenders, on the other hand, would ideally have more outgoing than incoming arcs. In England’s team, Sterling and Rooney display slightly more vertical shapes. Overall, however, England’s players show a more balanced handover than Iceland’s.
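A simple reading of the Handover of Work metric counts a handover from player a to player b whenever an event by a is directly followed by an event by b in the same sequence. The incoming share per player then corresponds to how “vertical” a node is drawn. The minimal sketch below implements only this variant. The original metric also has other variants, e.g., ones that consider indirect succession, which are not reproduced here. The player labels and sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def handover_of_work(sequences: List[List[str]]):
    """Count direct handovers between different players within each sequence."""
    arcs: Counter = Counter()
    for players in sequences:
        for a, b in zip(players, players[1:]):
            if a != b:  # consecutive events by the same player are not a handover
                arcs[(a, b)] += 1
    incoming: Counter = Counter()
    outgoing: Counter = Counter()
    for (a, b), n in arcs.items():
        outgoing[a] += n
        incoming[b] += n
    return arcs, incoming, outgoing


def incoming_share(incoming: Counter, outgoing: Counter) -> Dict[str, float]:
    """Share of a player's arcs that are incoming (closer to 1 = more 'vertical' node)."""
    players = set(incoming) | set(outgoing)
    return {p: incoming[p] / (incoming[p] + outgoing[p]) for p in sorted(players)}


# Hypothetical sequences of the players performing consecutive events
sequences = [["D1", "M1", "S1"], ["D1", "M1", "M1", "S1"], ["M1", "S1", "S1"]]
arcs, incoming, outgoing = handover_of_work(sequences)
print(incoming_share(incoming, outgoing))  # {'D1': 0.0, 'M1': 0.4, 'S1': 1.0}
```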
The second metric investigated is the Working Together metric. It shows which pairs of players often participate in the same attacking sequence, for example by passing the ball to each other.
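In its basic form, the Working Together metric counts, for each pair of players, the number of sequences in which both appear. The network graphs are then sized by degree centrality. The minimal sketch below covers both steps and assumes that sequences are given as hypothetical lists of player labels. Degree centrality is normalized here by n - 1, the usual convention, which may differ from the tool’s scaling.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def working_together(sequences: List[List[str]]) -> Counter:
    """Count the sequences in which each pair of players participates together."""
    pairs: Counter = Counter()
    for players in sequences:
        # each pair is counted once per sequence, however often the two players appear
        for a, b in combinations(sorted(set(players)), 2):
            pairs[(a, b)] += 1
    return pairs


def degree_centrality(pairs: Counter) -> Dict[str, float]:
    """Number of distinct partners per player, divided by n - 1."""
    nodes = sorted({p for pair in pairs for p in pair})
    degree: Counter = Counter()
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    return {v: degree[v] / (n - 1) for v in nodes} if n > 1 else {}


# Hypothetical sequences; "D" never shares a sequence with anyone, so it gets no edge
sequences = [["A", "B", "A"], ["A", "C"], ["B", "A"], ["D"]]
pairs = working_together(sequences)
print(pairs[("A", "B")], degree_centrality(pairs))  # 2 {'A': 1.0, 'B': 0.5, 'C': 0.5}
```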
Figure 63. Working Together comparison between England (left) and Iceland (right)
The network graphs are generated with the ISOM layout and degree centrality. As with the Handover of Work (HoW) metric, the graphs do not differ much if other network metrics are used. The layout shows two distinct clusters of players in England’s team that work together during attacks and that include most of the team. Cluster E-1 consists of five players (S-Sterling, S-Kane, S-Sturridge, D-Cahill, M-Wilshere-Sub). Cluster E-2 consists of six players (M-Rooney, M-Alli, D-Walker, D-Smalling, D-Rose, G-Hart). Three players of this team are isolated from the clusters: F-Vardy-Sub, F-Rashford-Sub and M-Dier. The two substitutes came on in minutes 60 and 86, respectively, so it is not surprising that they are outside of a cluster. Dier, on the other hand, plays as a central midfielder and therefore has connections to both the E-1 and the E-2 cluster. However, he is substituted at half-time by Wilshere, who had not performed well in an earlier match against Slovakia (Glendenning, 2016). According to this SNA metric, Wilshere does appear to have a stronger relationship with the players of the E-2 cluster as well. In Iceland’s team, players are closely clustered together. Traustason is connected to the other substitute, Bjarnason, as well as to the goalkeeper and Bodvarsson. The midfielder Bjarnasson appears to have the closest connection to Skulason and Sigurdsson. The defender Saevarsson occasionally works together with the rest of his teammates but does not seem to have a particularly strong relationship with any one player. For a defender, this behavior could also mean that, by the nature of his task, he more often interrupts a sequence of the opposing team. Skulason works together with Bodvarsson quite often.
8.6 Discussion
This chapter presents an exploratory study that aims to evaluate the potential and suitability of process mining for football performance analysis. As seen, process mining is a collection of algorithms and analytics techniques that are widely used in other domains to analyze all kinds of business processes. It had not been applied to sports before, however. This thesis does not demonstrate all algorithms and visualization techniques, and not all types of process mining can be used for performance analysis in football. The discovery algorithms make the most sense, as they can reveal the actual behavior of teams and players. Conformance checking is not useful in this scenario, because there is no ideal process model that players are supposed to follow during a game. Enhancement of the process model does not seem to be useful in this case either. The discovery algorithms and techniques, however, prove to be very useful for analyzing a football game from a process perspective. Table 47 summarizes the techniques and algorithms used and indicates whether they provide information about the team or the players.
The case-flow perspective, with its various algorithms for discovering the process model, does not seem to be very useful in a football scenario. On the one hand, one can immediately see which event types occur most often and how events are connected. On the other hand, the perspective is not useful for some event types. For instance, the event “foul” appears for both teams, and the mined process model does not reveal which team committed or suffered how many fouls. This is due to the way in which the sequences are extracted from the original dataset. Each event has different qualifiers, and considering all of them when creating the process model would result in too many variants. The model is meant to give a quick overview of what actually happened and of some dependencies between the activities. Therefore, the thesis omits this level of detail. Unfortunately, the issue cannot be avoided, but it affects only a limited number of events. Visualizations such as the Inductive Visual Miner can normally also be used to analyze the process from a time perspective, i.e., to check which activities last too long and to discover bottlenecks. Due to the nature of football, however, analyzing this process model from the time perspective would not make sense. The other two perspectives (case and social) are more useful for performance analysis.
The case perspective offers various useful visual analytics techniques and clustering algorithms that can give valuable insights into team and player behavior. For instance, once the Inductive Visual Miner has generated the process model, one can drill down and select specific sequences that interest the analyst. Here, the offensive sequences are selected, which answers questions such as:
How many times did a team’s actions end in a shot on goal?
Which events are those exactly? E.g., miss, post, attempt saved or goal.
Which players were involved?
When did these events occur and how long did the sequences last?
This option makes it possible to visualize England’s sequences leading to a shot on goal and to find out which players most often started or ended them. Such analyses can be very useful in assessing the value of a player in a game.
Furthermore, clustering algorithms such as SOM and first-order Markov chains give quick insights into the behavior of a team. Such analyses can be useful, for example, during the half-time break to make tactical readjustments for the second half. Finally, social network analysis is used here to gain more insights into individual players. All event types occurring between the players are used to analyze their cooperation patterns. The two metrics applied, Handover of Work and Working Together, prove valuable in revealing important information about individual players. For instance, the Working Together metric can reveal which two players often cooperate in a sequence of ball possession. This in turn helps to plan tactical adjustments, especially for one’s own defense. The Handover of Work metric can show which player is overwhelmed because the other players delegate more work to him. This could indicate fatigue. For the opponent, it could mean that this player should be the focus of their defense.
To sum up, process mining does seem to offer valuable techniques and algorithms that give quick insights into the behavior of players and teams. The results are usually obtained quickly and are not too challenging to understand, with the possible exception of the SOM trace clustering. This type of analysis can be used to:

- analyze successful and unsuccessful sequence outcomes,
- establish defensive strategies against specific players,
- gain insights into team and player behavior overall.

It is more user-friendly than T-pattern analysis, and the sequences of events are clearer. There are also various options for additional analyses of the sequences. One can also select and focus on specific types of sequences, e.g., offensive or defensive sequences, sequences ending in a specific event, sequences in which a particular player is involved, or the longest sequences.
Process mining offers even more possibilities for analyzing action sequences. Future research could therefore explore whether conformance checking would be helpful in a football scenario. For instance, it may be possible to use conformance checking techniques to simulate and test the outcomes of sequences by enhancing the event log with other events.
Interim conclusion of Part III
Chapters 6, 7 and 8 demonstrate how network analytics methods, unsupervised artificial neural
networks, and process mining techniques can reveal valuable information about the players’ and
teams’ behaviors. The results in these chapters show concretely how such methods can be applied
to answer specific questions about performance in football. In the literature, these methods have
always been discussed as tools for pre- and post-match analysis. Here, however, many examples of
applying these methods in real-time scenarios are given.
The social network analysis chapter demonstrates how network metrics at the team, player and sub-group level can support the coach and their team in:

a) deciding which player should be substituted;
b) identifying the core team members, i.e., the players who most likely should not be considered for replacement;
c) assessing the effect that removing a player has on team behavior;
d) detecting changes and understanding their implications in practice; and
e) studying team behavior via various sub-group network metrics, such as community detection and triad formations.
The chapter on self-organizing maps demonstrates how a neural network can convert a high-dimensional dataset into an understandable two-dimensional cluster grid. The dataset consists of 47 different event types in 35 matches. The resulting grid reveals the dependencies between teams, match outcomes and the events they are involved in.
Finally, process mining techniques are applied to sports event data for the first time. Not every part of process mining is useful or suitable for performance analysis in football, but quite a few of the available algorithms and visual analytics options prove to be useful. Process mining allows for a quick inspection of specific sequences that interest the coach and their team. It is possible to analyze specific plays that end in an event the coach is interested in, e.g., miss, shot on goal or dispossessed, to name a few. One can also select and inspect the sequences in which a particular player takes part. Furthermore, SNA metrics can be used to analyze the cooperation among teammates. The difference is that process mining includes all events in the analysis, not just passes. Sequences of play can be clustered, which summarizes the behavior of a team by revealing its most likely way of playing. Such information can be used to prepare for an opponent. Table 48 summarizes the techniques and metrics applied in each chapter, together with their purpose.
Table 48. Methods and techniques used in Part III
Method Metric/Technique Useful for
Table 48. Summary of methods and techniques used in Part III (continued)
Method: Process Mining
Metric/Technique: Sequence of play clustering with Self-organizing Maps and Markov chains
Useful for: Summary of the team behavior; good for revealing typical (normal) and unusual behavior of a team; suitable mostly for pre- and post-match analysis
Part IV further discusses the application of the results in a real-time decision support use case for player substitution. It establishes a link between the findings of Part II and Part III.
Part IV – Decision Support Application
Use Case
9 Use Case
The objectives of this chapter are twofold. First, it creates an analytics framework for real-time decision support. The framework aims to give a concise overview of the available data and methods and of the factors relevant to coaches, and to establish a link between them. Such a framework should help coaches and their teams in practice when deciding on an analytics strategy. Second, it creates a mock-up that demonstrates what a decision support solution based on the data analyses could look like in practice.
Parts of the results outlined in this chapter have been published in Davcheva et al. (2016). This
chapter extends and discusses the research.
9.1.1 Real-time decision making in football and the necessity of data-driven decision support
As discussed previously, most analytics results in football performance analysis have been used for various pre- and post-match decisions. The use of analytics for decision making in real-time scenarios, i.e., during a live game, has not yet been thoroughly discussed. However, various technologies have recently been developed and are increasingly used during live matches. This makes real-time analytics an increasingly relevant topic that the research community needs to address. Additionally, studies in the performance analysis literature discuss the design and usefulness of notational analysis systems and the usefulness of feedback from computerized systems in football. The findings of these studies are used as a basis for the conclusions and recommendations in this chapter. Moreover, the previous chapters present several analytical techniques applied to event tracking data and discuss their results and usefulness for tactical decision support. A few of these methods and techniques can be useful for real-time decision making during live matches. Two main factors drive the development of real-time decision support in football based on data analytics.
The first factor is related to the decision-making process of football coaches, and of humans in general. During live matches, coaches need to make decisions quickly and under pressure. Coaches are normally experts with long experience, yet making a decision in such circumstances is not straightforward. Furthermore, several studies in the literature show that coaches’ observations and their recollection of events are subjective and often erroneous (Franks and Miller, 1986; McDonald, 1984). In two studies, coaches could recall only around 30 percent of the factors that lead to successful performance in football. They were also less than 45 percent correct in their post-game assessment of the events that happened during the game (Franks and Miller, 1986; Franks and Miller, 1991b). Another study found that experienced coaches were more likely to report a difference in performance when none existed. They were also very confident in their decisions, even when incorrect (Franks, 1993). This memory issue has been compared to the testimony of crime witnesses. Such testimony can often be inaccurate or incomplete, depending on the degree of violence, the number of perpetrators or the focus of attention during the crime (Franks, 2004). Similarly, in sports, a coach’s level of excitement can vary throughout the game. What is considered important differs from game to game. Coaches also tend to direct their attention to the more central features of performance, ignoring non-critical events or their sequence (Franks, 2004). Lastly, personal biases also affect the ability of coaches to give an accurate and objective account of the events that happened (McDonald, 1984). Thus, it is important that coaches base their decisions on objective information. Errors in the observation and evaluation of match performance can have “knock-on” effects on the match outcome and the entire coaching process (Carling et al., 2005).
To tackle the issues associated with the accurate recollection of events, Franks and Goodman suggested back in 1986 that one solution is to record the occurrence of behavioral events in some coded form. This is what is now known as notational analysis. At that time, however, the problem was to complete a comprehensive, sequential analysis in real time (Franks and Goodman, 1986). This is no longer the case, as various tracking companies nowadays collect event and fitness data.
The second driver behind real-time, data-driven decision support in football is the rapid development and adoption of technology in the sport, especially before and after matches. Technology is now also used in live matches to support the referees (although this remains controversial), and its use for medical purposes and injury prevention is being developed. There have been a few instances in which data-driven decision support during live matches has been discussed, e.g., at the MIT Sports Analytics Conference. Recently, the tracking company ChyronHego published an article on its blog stating that “to have the capability to access player performance data from the team bench during the game is the next crucial step…” (Gederman, 2018). This is one of the rare cases in which a professional tracking company has discussed the topic publicly; the company notes that this has not been done before and that it is testing the concept with several European clubs.
to choosing the right performance indicators and other measures that will provide coaches with the most relevant information for making crucial decisions during live matches. The challenge is that there are many performance indicators, which vary by player position and which over the past decade have proven not to be entirely useful for determining success in football. The most notable example is ball possession: a team with higher possession is not necessarily the more successful one.
Thus, an important first requirement is that the solution displays only the most relevant information. A wealth of data means that coaches and their staff have to dedicate more time to evaluating the information and discussing possible actions (Maslovat and Franks, 2008). Therefore, one of the first steps in developing a solution is to determine what exactly is relevant for the decision makers, in this case the coaches. This is a challenging task, as coaches can have different perceptions of what is relevant and what is not. Chapter 4, however, consolidates findings from the literature and from a qualitative study with football experts, and as a result several categories of factors that are important for coaches in real-time decision making can be identified. A related issue is that individual coaches consider different aspects of player performance to be relevant. This is common knowledge and not surprising, but it is confirmed in the qualitative study (see chapter 4). Therefore, another important requirement is that the solution is flexible and adaptable to the coaches’ wishes: it should be possible to add and remove functionalities, metrics and performance indicators depending on what the coach would like to work with. A similar conclusion is reached by Davcheva et al. (2016) following a survey of football analysts and coaches who work with various technology solutions to analyze team and player performance.
The decision support solution would most likely be delivered via a graphical user interface, for instance in the form of dashboards. Below are a few requirements for the UI:
- Ease of use – it should be straightforward and clear to coaches how to navigate through the dashboard to get the information they want without too much effort.
- Each chart and graph should convey a single message and be accompanied by a descriptive title, annotations and labels (Power, 2013).
- Historical data from previous matches should be available so that users can compare the metrics of interest (Malik, 2005).
- There should be no delay in retrieving information from the dashboard (Malik, 2005).
- The solution needs to be scalable and allow more users to be logged in simultaneously without crashing or slowing down (Malik, 2005).
A data-driven solution should also include information on the behavior of players when they are not in possession of the ball, as players have no ball contact for a significant portion of the game (Maslovat and Franks, 2008). This cannot be achieved with the Opta data, as these focus on event tracking, which records all events during the game and thus mostly reflects ball-related actions. These are by all means an important aspect of performance, but certainly not the only one.
In this thesis, techniques from three analytics methods are used to transform event data into useful information for coaches. The methods require different preprocessing efforts:
- Social network analysis – Compared to the other analytics methods, the SNA metrics require relatively little data processing. Specifically, the successful passes between players need to be extracted for most calculations to be possible. Once that is done, however, the calculations are very fast and reliable. In return, SNA is a powerful tool that gives information on the communication patterns and on the sources and sinks of influence in the team, allowing identification of the players who can most effectively influence the team (Bennet and Bennet, 2008).
- Self-organizing maps – This method requires the most data preprocessing effort and is thus recommended only for decision support during the half-time break. It is also the most sophisticated method used in the thesis, at least considering the preprocessing effort together with the parameter tuning required. A specialist in this area is therefore needed to build and test the model before it is implemented in the solution. Some of the questions that need to be answered in advance are: a) what is the right size of the lattice; b) which are the input variables and how can they be normalized; c) what should be the setting for each parameter, especially learning rate and momentum; d) what is the desired error level; e) how long to train (Delen and Sharda, 2008). Given this complexity, it is important to hide the inner workings of the model from the end user (Delen and Sharda, 2008). This applies equally to any other complex method used in a decision support solution during a live match.
- Process mining – The preprocessing effort is moderate, neither very simple nor very complex. Once the data is processed, however, most of the calculations run fast. Of the three methods, it is the least useful for real-time decision support, with a few exceptions.
Finally, a data-based solution will require a trial-and-error period, even more so than other decision support systems and solutions. Moreover, the technology will keep developing, and more sophisticated solutions will make the whole process of data collection, integration and processing easier.
The original framework was developed by Russell Lincoln Ackoff in 1989 and represented as a pyramid, with data at the bottom and wisdom at the top, to show that there is a lot of data and little wisdom (Bernstein, 2009).
Data refers to symbolic representations of some observable properties of items or entities (Frické,
2009). Data are the unprocessed raw representations of reality (Anand and Singh, 2011). An over-
view of the data available in football nowadays, together with the most important applications of
these data, is given in Figure 65.
While the thesis works with event data and timestamps, positional data is also being collected in real time via GPS devices. It provides detailed information on the position of each player and of the ball at any moment of the game. Additionally, wearable devices designed specifically for football provide important biometric information such as heart rate, distance covered, velocity or oxygen level. Unfortunately, biometric data is currently available only during training. However, these data could be a valuable source for calculating the fatigue rates of players during the game. They are also relevant for injury prevention: for instance, one might notice that over a period of time a player becomes increasingly tired in the second half of games, or more tired than usual. A coach might then decide to rest him to prevent an injury before a more important match. The contextual data are not the most relevant category but are included for completeness. Location refers to whether the team plays at home or away, but in a real-time decision scenario this does not necessarily affect the analytics processes. Weather is information the coaches would require before a match, while during the match it probably has minimal influence on decision making. Finally, social media is an interesting data source that is normally used for fan entertainment rather than decision making, let alone in real time. Arsene Wenger, the former manager of Arsenal FC, nevertheless predicted in 2018 that fans’ opinions on social media, especially on Twitter, would end up deciding which player should be substituted (O'Brien, 2018). Such a prediction is probably far-fetched at the moment, but there is a strong tendency towards using data when deciding who should leave the game.
Information is data that has been processed in a meaningful way and makes decision making possible (Anand and Singh, 2011). It is data related through a context so that it tells a story, for instance by linking who, what, when and where data to describe a specific person at a specific time (Jennex, 2009). In this case, the information results from the techniques of the three methods: network science, SOM neural network clustering and process mining.
The factors relevant for substitution identified in chapter 4 serve as a guide for the information that a decision support solution should output. The most relevant factors are player performance, tactics and fatigue. Analyses of the opposing team are also considered important. Table 49 outlines how these factors are derived from event tracking data, together with the data analysis methods used.
Knowledge further refines information by transforming it into instructions, which makes control of a system possible and enables it to work efficiently (Bernstein, 2009). Knowledge is usually construed as know-how or skill, not as the know-that of propositional knowledge (Frické, 2009). In simpler terms, knowledge explains the why and how of something, or provides insight into and understanding of it (Jennex, 2009). In this case, suppose the CUSUM metric informs the coach that a change in betweenness centrality has occurred, and continuous analysis and observation establish that every time this metric changes by a certain degree, performance decreases: this would be the knowledge obtained.
Wisdom allows knowledge to be applied in different and not necessarily intuitive situations (Jennex, 2009). It is achieved by putting knowledge into practice. It is the highest level of abstraction, with vision, foresight and the ability to see beyond the horizon (Anand and Singh, 2011). Using the same CUSUM example, wisdom is achieved when the coach and his staff know which actions to take to counter the decrease in performance associated with the change in betweenness centrality. Wisdom is not discussed here in detail because it is reached only after the information and knowledge obtained from a decision support solution are implemented in practice by coaches and the consequences are observed. This category can therefore only be discussed after a data-driven solution has been successfully implemented and used in practice over a prolonged period of time.
In the thesis, the major focus is on the data and information levels of the framework, as elaborated in Table 49.
Not all analyses conducted in the thesis are useful for real-time decision support, and not all are helpful for player substitution. Table 49 shows that various player-level network metrics are useful for assessing the importance of a player in the team network. Additionally, the core network and immediate impact are very useful calculations that give coaches straightforward information about the relevance of a player. Using these metrics, the coach can obtain quick and reliable information during the game that can help him decide whether a player should be substituted. Change detection is another useful metric: it shows when something in one’s own team has changed and can help in making timely adjustments to the strategy. An advantage of this metric is that it can be based on any network metric of interest (e.g., betweenness centrality or efficiency). The triads give information on the interaction patterns of the team, including the direction of communication. This can be used especially for opponent analysis during the game. For instance, when the coach knows the strongest communication triad in the opponent’s team, he can instruct his players to pay more attention to the players in that triad and perhaps interrupt the ball flow between them.
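Because change detection can be run on any network metric, the idea can be sketched independently of the metric itself. The Python block below is a minimal two-sided tabular CUSUM under stated assumptions: the in-control mean and standard deviation are estimated from the first few observations, and the slack `k`, threshold `h` and baseline length are illustrative defaults, not the configuration used in section 6.5. The betweenness values per 5-minute window are invented toy numbers.

```python
import statistics


def cusum_alerts(values, baseline=5, k=0.5, h=4.0):
    """Two-sided tabular CUSUM on a metric time series.

    The in-control mean and standard deviation are estimated from the first
    `baseline` values; `k` (slack) and `h` (decision threshold) are given in
    units of that standard deviation. Returns the indices at which an upward
    or downward shift is signalled.
    """
    ref = values[:baseline]
    mu = statistics.mean(ref)
    sigma = statistics.stdev(ref) or 1.0  # guard against a flat baseline
    s_hi = s_lo = 0.0
    alerts = []
    for i, x in enumerate(values[baseline:], start=baseline):
        z = (x - mu) / sigma
        s_hi = max(0.0, s_hi + z - k)  # accumulates upward deviations
        s_lo = max(0.0, s_lo - z - k)  # accumulates downward deviations
        if s_hi > h or s_lo > h:
            alerts.append(i)
            s_hi = s_lo = 0.0          # restart the sums after an alert
    return alerts


# Hypothetical betweenness centrality of one player, one value per 5-minute window.
betweenness = [0.20, 0.22, 0.19, 0.21, 0.20, 0.21, 0.20, 0.12, 0.11, 0.10, 0.12]
print(cusum_alerts(betweenness))  # [7, 8, 9, 10]
```

Resetting the sums after each alert means a sustained drop keeps raising alerts in every window, which suits a dashboard that should keep warning until the coach reacts; a single alert per shift would require keeping the sums instead.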
Self-organizing maps are a powerful analytic method that can convert high-dimensional data into a visualization that is easier to understand. Although it is neither the most intuitive nor the quickest method, it can be used in the half-time break to gain a quick understanding of the opposing team’s tactics.
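The core of a self-organizing map can be sketched in a few lines of Python. This is a minimal sketch of Kohonen’s online training rule, not the configuration used in chapter 7: the 3×3 lattice, the number of epochs, the linearly decaying learning rate, the Gaussian neighbourhood and the toy two-feature vectors are all assumptions, and no momentum term is used. Inputs are assumed to be normalized already.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda n: sum((wi - xi) ** 2 for wi, xi in zip(weights[n], x)))


def train_som(data, rows=3, cols=3, epochs=100, lr0=0.5, seed=0):
    """Train a small rectangular self-organizing map (online Kohonen rule).

    `data` is a list of equal-length, already normalized feature vectors.
    Returns a dict mapping grid coordinates (r, c) to weight vectors.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):          # shuffled copy each epoch
            frac = step / total_steps
            lr = lr0 * (1 - frac)                      # linearly decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)    # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
            step += 1
    return weights


# Hypothetical normalized team features (e.g. two tactical indicators per match).
data = [[0.1, 0.1], [0.15, 0.05], [0.05, 0.12], [0.9, 0.85], [0.95, 0.9], [0.88, 0.93]]
som = train_som(data)
print(best_matching_unit(som, [0.1, 0.1]), best_matching_unit(som, [0.9, 0.9]))
```

The two groups of vectors end up on different map nodes, which is what a coach would read from the map as two distinct tactical profiles. Lattice size, learning-rate schedule and training length correspond to the tuning questions listed earlier and would need to be chosen by a specialist.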
The process mining techniques are less useful for real-time decision support. Two visualization techniques, the dotted chart and the meter chart, can be used for a quick look at the activity profiles of players. Social network analysis on the process data is the most useful technique in this case, because it gives additional insight into the way players interact with each other. It can reveal valuable information about players from the opposing team and help the coach make tactical adjustments during the game.
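Social network analysis on process data is typically based on metrics such as handover of work, where a handover from player i to player j is counted whenever an event performed by i is directly followed by an event performed by j within the same case. The Python block below is a minimal sketch of that idea, assuming that a case is one possession sequence represented simply as the list of players performing consecutive events; the exact settings used in section 8.5 (e.g., whether indirect successions are counted) are not reproduced here, and the player labels are hypothetical.

```python
from collections import Counter


def handover_of_work(cases):
    """Count direct handovers (performer i -> performer j) within each case.

    `cases` is a list of event sequences; each sequence lists the players who
    performed consecutive events in one possession (assumed case notion).
    Consecutive events by the same player are not counted as handovers.
    """
    handovers = Counter()
    for sequence in cases:
        for giver, taker in zip(sequence, sequence[1:]):
            if giver != taker:
                handovers[(giver, taker)] += 1
    return handovers


def work_received(handovers):
    """Total handovers received per player, most loaded first."""
    received = Counter()
    for (_, taker), n in handovers.items():
        received[taker] += n
    return received.most_common()


possessions = [
    ["Q1", "Q2", "Q3"],
    ["Q4", "Q2", "Q2", "Q3"],
    ["Q1", "Q3"],
]
hw = handover_of_work(possessions)
print(work_received(hw))  # [('Q3', 3), ('Q2', 2)]
```

Ranking players by the work they receive is one simple way to highlight the opponent who is most involved as a receiver, which is the kind of information the opponent dashboard is meant to surface.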
Table 49. Data and information useful for real-time decision support
9.3 Decision support solution mockup
A mockup of a decision support solution is developed to demonstrate how the techniques for real-time decision support mentioned above can be implemented in practice and how coaches can benefit from the resulting information. Not all possible analyses for real-time decision support are integrated in this mockup, as the focus is primarily on decisions regarding player substitution. However, the mockup gives an idea of what a solution for real-time decision support could look like in practice. It is presented in Figure 66 (player analysis dashboard), Figure 67 (own team analysis dashboard), and Figure 68 (opponent analysis dashboard).
The mockup dashboards contain information about the performances of players (own team) as well
as performance and tactical information at the team level (own and opposition teams).
Figure 66 presents the player view dashboard. This view shows a list of all players currently in the game (left side of the dashboard). By clicking on a player’s name in the list, the coach can view a) the performance development of that player during the current game (in the mockup dashboard, the performance development is shown for the first 45 minutes of the game); b) information on an event that is relevant for that player’s position, in this case clearance, as Rooney is a midfielder (this event can be changed based on what is important for a specific player position); and c) an overview of the events in which the player has been involved in the current game, together with the outcome of each event (red for a negative and green for a positive outcome). The meter chart and the dotted chart from process mining are used in this dashboard, and player-level network metrics are used for the performance development chart, which shows the overall performance of the player in the first half of the match.
The own team analysis dashboard presented in Figure 67 shows a ranking of the players as well as the tendency towards performance improvement or performance drop in the last 15 minutes of play (left side of the dashboard). The ranking is based on a calculation of several network metrics, as discussed in section 6.5 (see Table 21). When the performance of a player is quite low compared to the rest of his teammates, the coach receives a warning message suggesting that this player should perhaps be considered for substitution. Additional information is displayed on the right side of the screen. First, a change detection chart based on the CUSUM metric (see section 6.5) monitors a few team-level network metrics and displays alert messages when a significant change in one of these metrics occurs. The choice of monitored metrics depends on the coaches and can easily be modified. The second chart focuses on substitution analysis using the “immediate impact” calculation discussed in section 6.5. It shows the impact of removing each of the three worst-performing players currently in the game. This gives coaches an idea of what could happen when one of these players is taken off and is meant to help them decide, especially when they are in doubt or considering more than one player for replacement.
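The ranking and warning logic of this dashboard can be sketched as follows. This Python block is a hypothetical illustration, not the calculation from section 6.5: the choice of metrics, their equal weighting, min-max normalization and the z-score threshold are assumptions. `passes_lost_share` is likewise a deliberately simple, hypothetical stand-in for the immediate impact calculation, whose exact definition is not reproduced here; it only measures the share of passing volume that involves a given player. All metric values are invented.

```python
import statistics


def composite_scores(metrics):
    """Equal-weight average of min-max normalized metrics per player.

    `metrics` maps a metric name to {player: value}; higher values are
    assumed to indicate a more important role in the passing network.
    """
    players = list(next(iter(metrics.values())))
    scores = {p: 0.0 for p in players}
    for values in metrics.values():
        lo, hi = min(values.values()), max(values.values())
        span = (hi - lo) or 1.0  # avoid division by zero if all values are equal
        for p in players:
            scores[p] += (values[p] - lo) / span / len(metrics)
    return scores


def substitution_warnings(scores, z_threshold=-1.0):
    """Players whose score is more than |z_threshold| SDs below the team mean."""
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values) or 1.0
    return sorted(p for p, s in scores.items() if (s - mean) / sd < z_threshold)


def passes_lost_share(edges, player):
    """Hypothetical proxy for immediate impact: share of all passes that
    involve `player` and would disappear if that node were removed."""
    total = sum(edges.values())
    lost = sum(w for (a, b), w in edges.items() if player in (a, b))
    return lost / total if total else 0.0


metrics = {
    "betweenness": {"P1": 0.30, "P2": 0.25, "P3": 0.28, "P4": 0.02, "P5": 0.27},
    "weighted_degree": {"P1": 40, "P2": 35, "P3": 38, "P4": 8, "P5": 36},
}
scores = composite_scores(metrics)
print(sorted(scores, key=scores.get, reverse=True))  # ['P1', 'P3', 'P5', 'P2', 'P4']
print(substitution_warnings(scores))                 # ['P4']
```

A relative threshold (distance from the team mean) rather than an absolute one keeps the warning meaningful regardless of how open or closed the current match is.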
The opponent team analysis dashboard in Figure 68 gives information on the dynamic interactions of the players from the opponent’s team. The left side of the screen gives an overview of the opponents’ interactions using the Handover-of-Work metric, as discussed in section 8.5. It shows which players from the opposing team are most burdened, for instance by receiving many passes from their teammates. Such information can be very valuable during a live match because it gives a clear picture of how the players of that team cooperate with each other. When the coach knows which player receives the most work delegated by his teammates, he can instruct his team to pay more attention to that player and stay closer to him. To make this easier for the coach to interpret, the right side of the screen additionally displays the most relevant variants of interactions between the players, calculated as triads (see section 6.5). These analyses give an even more concise idea of how the players from the opposing team interact with each other. For instance, triad 1 in Figure 68 shows that Gunnarsson and Sigthorsson interact often with each other, and that this communication link is directed primarily from Gunnarsson towards Sigthorsson. Knowing this, the coach can instruct his players to pay attention to these two players and try to intercept the passes between them.
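The triad view can be approximated with a short Python sketch. This is a simplification under stated assumptions, not the triad analysis of section 6.5: instead of a full directed triad census, it ranks every three-player combination by the total number of passes exchanged among its members and then reports the dominant direction for a chosen pair. The edge weights and player labels are hypothetical.

```python
from itertools import combinations, permutations


def strongest_triads(edges, top=1):
    """Rank player triples by the total passes exchanged among them.

    `edges` maps (passer, receiver) to a pass count.
    """
    players = sorted({p for pair in edges for p in pair})
    scored = []
    for trio in combinations(players, 3):
        total = sum(edges.get(pair, 0) for pair in permutations(trio, 2))
        scored.append((total, trio))
    scored.sort(key=lambda t: (-t[0], t[1]))  # busiest first, ties by name
    return scored[:top]


def dominant_direction(edges, a, b):
    """Return (from, to, share) for the busier direction between a and b."""
    ab, ba = edges.get((a, b), 0), edges.get((b, a), 0)
    if ab + ba == 0:
        return None
    return (a, b, ab / (ab + ba)) if ab >= ba else (b, a, ba / (ab + ba))


edges = {("Q1", "Q2"): 9, ("Q2", "Q1"): 3, ("Q2", "Q3"): 4, ("Q3", "Q1"): 2, ("Q4", "Q1"): 1}
print(strongest_triads(edges))               # [(18, ('Q1', 'Q2', 'Q3'))]
print(dominant_direction(edges, "Q1", "Q2"))  # ('Q1', 'Q2', 0.75)
```

The number of triples grows cubically with the number of players, but with at most eleven players on the pitch (165 triples) this is negligible even when recomputed every few minutes.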
The dashboards presented here are all based on part of the results presented in chapters 6, 7 and 8. Those analyses were selected based on the factors relevant for decision making in live matches identified in chapter 4. When a solution is developed for a club, one of the first requirements is to decide exactly which information needs to be displayed, i.e., what the coach needs to know to make tactical adjustments during the game. As soon as this is clear, analytics options can be discussed in order to derive the needed information. As mentioned in section 9.1.2, it is important to keep in mind the number of preprocessing steps the chosen methods would require, the type and amount of data needed, and finally the time needed for the actual analysis. Most importantly, every data solution needs to be adjusted to the individual wishes of the coach who is supposed to use the results, and not vice versa (Carling et al., 2005).
In any case, as Carling et al. (2005) mention, errors are most likely to occur due to:
- Disagreements on match criteria or definitions of game actions. For instance, if the coach wants to see successful attacks, it has to be clearly defined what exactly this means, as it can mean different things to the performance analyst and to the coach.
- Difficulties or misinterpretations in understanding the results. This happens mostly when statistics are used that poorly represent the actual performance of the players.
- Inadequate, too many, irrelevant, inaccurate or poorly presented results. When in doubt, it is best to keep the results simple and concise and to avoid overwhelming dashboards.
Finally, all results in the dashboards need to present the state of team and player performance in specific time intervals, which can be determined by the coaches themselves. For instance, the player ranking in Figure 67 is based on a continuous calculation of several player-level network metrics in 5-minute intervals. If the coach would like to have the latest status, however, he can click the refresh button in the upper right corner. In this way, he can always see the latest calculations for the information he is interested in.
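Interval-based calculation can be sketched by assigning each pass to a fixed time window and computing the metric separately per window. The Python block below is a minimal sketch under assumptions: passes are given as hypothetical `(minute, passer, receiver)` tuples for one team, and the per-window metric is simple pass involvement (passes made plus received) rather than the full set of network metrics used for the ranking.

```python
from collections import Counter, defaultdict


def windowed_involvement(passes, window=5):
    """Passes made plus received per player in consecutive time windows.

    `passes` is a list of (minute, passer, receiver) tuples for the successful
    passes of one team. Returns {window_start_minute: Counter(player -> count)}.
    """
    per_window = defaultdict(Counter)
    for minute, passer, receiver in passes:
        start = (minute // window) * window  # e.g. minutes 5-9 -> window 5
        per_window[start][passer] += 1
        per_window[start][receiver] += 1
    return dict(sorted(per_window.items()))


passes = [(1, "P1", "P2"), (3, "P2", "P3"), (6, "P1", "P3"), (9, "P3", "P1"), (12, "P2", "P1")]
for start, counts in windowed_involvement(passes).items():
    print(start, dict(counts))
```

The most recent window is still open during a match, so its values keep changing as new events arrive; recomputing on demand, as with the refresh button, simply reruns the calculation on the events received so far.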
Conclusion
10 Conclusion
mining. Each of these analytics methods has advantages and disadvantages and takes a different perspective on team and player performance.
Social network analysis focuses on the interactions between the players during the game, specifically the passes between players as the main form of interaction in a football match. Network science is a large field of research, and there are many concepts and metrics that can be tested in a football scenario. The literature on football performance analysis and network science has so far focused on a limited number of network metrics and their usefulness in football (see section 6.2). The thesis concretely demonstrates the usefulness of new metrics such as authority and network assortativity, the CUSUM change detection metric, and the concepts of core network and immediate impact, especially for the decision of player substitution. Additionally, triadic relationships are discussed, as well as the potential of community detection algorithms and their (non)usefulness in a football scenario (see section 6.5). All of these network metrics and concepts are used for the first time in football performance analysis, and the thesis shows concretely how they can be applied to decision making by coaches.
Self-organizing maps (SOMs) are a special type of unsupervised neural network architecture that has been recommended by several authors in the football performance analysis literature but not widely applied to actual data. One of the reasons is the lack of data availability, as the method usually requires a lot of data. In this thesis, the method is applied to real event tracking data to demonstrate how tactical differences between favorite and underdog teams can be quickly revealed by SOM cluster analysis. It is a useful method for converting high-dimensional data into understandable two-dimensional maps. It is not the most convenient method for real-time decision support, but it could be useful, for instance, for quick tactical insights into the opposing team during the half-time break (see chapter 7).
Various techniques and algorithms from the area of process mining are applied to event data in chapter 8. These techniques consider action/event sequences in order to describe the behavior of players and teams. Process mining offers useful visual analytics techniques such as the dotted chart and the meter chart, as well as the filtering of specific sequences that are of interest to coaches. The thesis demonstrates how, by filtering all offensive sequences, it is possible to find out immediately which players most often initiate or end an attack. This is important information that reveals the value of a player (see section 8.5). Additionally, self-organizing maps are used to cluster event sequences to reveal the behavior of teams. This gives an impression of whether a team exhibits more creative or more uniform behavior (i.e., sequences that are more similar to each other). Process mining has not previously been used for the analysis of event tracking data in any sport. Thus, this chapter also makes a methodological contribution to the field: it demonstrates a new type of method for analyzing player and team behavior in football.
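Finding who initiates or ends attacks reduces to filtering sequences and counting their first and last performers. The Python block below is a minimal sketch under assumptions: each possession sequence is a list of hypothetical `(player, event_kind)` pairs, and an "offensive" sequence is assumed, for illustration only, to be one that ends in a shot; the definition actually used in section 8.5 may differ and, as noted in chapter 9, should be agreed with the coach.

```python
from collections import Counter


def attack_initiators_and_finishers(sequences, is_offensive):
    """Count who starts and who ends the sequences that pass a filter."""
    starters, finishers = Counter(), Counter()
    for seq in sequences:
        if seq and is_offensive(seq):
            starters[seq[0][0]] += 1
            finishers[seq[-1][0]] += 1
    return starters, finishers


def ends_in_shot(seq):
    # Assumed definition: an attack is a possession whose last event is a shot.
    return seq[-1][1] == "shot"


sequences = [
    [("P2", "pass"), ("P5", "pass"), ("P9", "shot")],
    [("P2", "pass"), ("P7", "shot")],
    [("P4", "pass"), ("P5", "dispossessed")],
    [("P3", "pass"), ("P9", "shot")],
]
starters, finishers = attack_initiators_and_finishers(sequences, ends_in_shot)
print(starters.most_common(1), finishers.most_common(1))  # [('P2', 2)] [('P9', 2)]
```

Passing the filter as a function keeps the definition of an attack separate from the counting, so it can be changed without touching the rest of the analysis.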
Finally, chapter 9 puts the results from the qualitative and quantitative studies into perspective and discusses their usefulness for real-time decision support. Not all of the results are useful for real-time tactical decision support, and especially for player substitution. The methods and techniques that are useful are integrated into a mockup to illustrate a real-time decision support solution. It demonstrates how results from the analytics methods used in the thesis can be integrated into a decision support system. Such solutions have not been thoroughly discussed in the literature, and there are quite a few options for future research in this regard.
10.2 Limitations
One limitation is that only event data is used in the analysis. Even though event data give a detailed account of the game, they do not give a full picture of the performance and behavior of players and teams. Combined with biometric data, they could offer a very detailed overview of all facets of player and team performance. As biometric data is currently not available during live games, there is unfortunately no possibility to demonstrate how all these data can be used for decision support during the game. Furthermore, the thesis uses methods from network science, neural networks and process mining to gain insights from event data. There are certainly more analytics methods and techniques that can be applied to event data in football to gain further insights (see section 10.3). Finally, the mockup in chapter 9 focuses on player substitution and integrates only some of the techniques used in the thesis. Most importantly, it has not been tested or evaluated in practice. However, the mockup itself is not the main focus of the research project and is thus not discussed in more detail. It needs to be addressed in a separate study focusing entirely on the design and implementation of a decision support solution for real-time analysis.
the team that act as hubs, a football team can be considered to have the properties of a scale-free network. However, there is not yet enough evidence to support this view.
Self-organizing maps – this method has been applied to movement analysis in football, while the thesis demonstrates how it can be used to gain a quick overview of the tactical patterns of favorite versus underdog teams. In addition, SOMs were used to cluster action sequences in the process mining chapter. Future studies could cluster teams and players based on variables from social network analysis. One option is to cluster players based on their network metric values in a game and to investigate whether there is a connection between the values of some network metrics and winning or losing the game.
Process mining – the thesis shows that the discovery part of process mining offers useful techniques and algorithms for finding behavior patterns of players and teams based on action sequences. The second type of process mining is conformance checking. This type of process mining is not feasible for football performance analysis, as it requires a comparison between an event log and a prescribed process model. In football, however, processes differ from those in a business organization: they are more stochastic and cannot be predetermined. The third type of process mining, called enhancement, could be investigated in more detail in future studies. Enhancement requires an event log as well as a model and produces a new, “enhanced” model. In football, an event log could be combined with a process model that, for instance, contains more passes. This can be used as a simulation analysis to test how different types of models would affect the existing one.
Nonlinear time series analysis – this type of analytics method is suggested in the literature as potentially valuable for assessing and explaining performance in football. Dynamic systems theory considers a football team as a nonlinear system in which the whole can differ from the sum of its parts. Based on this theory, nonlinear time series methods are potentially suitable for football performance analysis. Up to now, only a limited number of studies have applied these methods. Kuznetsov et al. (2014) discuss Sample Entropy as a potentially useful method, but the authors only give a simple example of the calculation and discuss the potential benefits of the method in a restricted manner. Silva et al. (2016) describe how Shannon entropy, approximate entropy and sample entropy can be useful in sports performance analysis. For the most part, these entropy measures are used to estimate the variability of players’ movements over space and time, which can provide tactical information about the team (Silva et al., 2016). The limitation of the existing studies is that they mostly use approximate and sample entropy, while other nonlinear time series methods remain to be explored. Recently, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network architecture, have been gaining popularity in the analysis of sequence data.
Design and evaluation of a decision support solution in practice – the thesis demonstrates a first attempt at showing what a decision support solution could look like in practice. A few requirements for such a solution are also discussed. However, future studies can provide more detailed requirements by conducting further qualitative studies with football experts. It should be kept in mind, however, that a one-size-fits-all solution is not feasible, as every coach needs specific types of information from the analytics solution. Professional tracking systems do not offer a detailed view of their dashboards due to confidentiality. Some insights for designing an analytics solution for a decision support dashboard can be found in the studies by Perin et al. (2013), Beetz et al. (2005), Rodrigues et al. (2013), and Janetzko et al. (2014). One important observation from these studies is that, when developing a decision support solution, it is important to define exactly what is meant by each performance indicator of interest and to limit the number of variables that are measured and, especially, displayed on the screen. Complexity will likely lead to the failure of the solution, as coaches would not be interested in using it or would eventually feel overwhelmed by complex visualizations.
Furthermore, the solution should be developed and tested to work with different data, for example biometric data in addition to event data. Finally, the developed solution or system needs to be tested continuously in practice over a longer period of time, for instance throughout a season. The evaluation should focus not only on whether the solution works, but also on the usefulness of the displayed results and suggestions.
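Sample Entropy, mentioned above among the nonlinear time series methods, can be computed in a few lines, which makes it easy to test on event or positional series. The Python block below is a minimal sketch of the standard definition, SampEn = -ln(A/B), where B counts pairs of length-m templates that match within tolerance r (Chebyshev distance, self-matches excluded) and A counts the same for length m+1. The defaults m = 2 and r = 0.2 times the standard deviation are common conventions, not values taken from the cited studies.

```python
import math
import statistics


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy of a 1-D series (lower = more regular).

    Tolerance r = r_factor * population standard deviation of the series.
    The first len(series) - m templates are used for both lengths so that
    the counts A and B are comparable.
    """
    n = len(series)
    r = r_factor * statistics.pstdev(series)

    def count_matches(length):
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i excludes self-matches
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined: no matching templates
    return -math.log(a / b)


print(sample_entropy([1, 2] * 5))          # perfectly regular pattern -> 0
print(sample_entropy(list(range(1, 11))))  # no matches within r -> inf
```

The quadratic pairwise comparison is fine for match-length series of a few hundred points; longer positional series would call for a vectorized or tree-based implementation.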
Appendix A
Appendix B
Date | Home | Away | Home Score | Away Score | Phase
6/25/2016 | Switzerland | Poland | 1 | 1 | Round of 16
6/25/2016 | Wales | N. Ireland | 1 | 0 | Round of 16
6/25/2016 | Croatia | Portugal | 0 | 1 | Round of 16
6/26/2016 | France | R. of Ireland | 2 | 1 | Round of 16
6/26/2016 | Germany | Slovakia | 3 | 0 | Round of 16
6/26/2016 | Hungary | Belgium | 0 | 4 | Round of 16
6/27/2016 | Italy | Spain | 2 | 0 | Round of 16
6/27/2016 | England | Iceland | 1 | 2 | Round of 16
6/30/2016 | Poland | Portugal | 1 | 1 | Quarter Finals
7/1/2016 | Wales | Belgium | 3 | 1 | Quarter Finals
7/2/2016 | Germany | Italy | 1 | 1 | Quarter Finals
7/3/2016 | France | Iceland | 5 | 2 | Quarter Finals
7/6/2016 | Portugal | Wales | 2 | 0 | Semi Finals
7/7/2016 | Germany | France | 0 | 2 | Semi Finals
7/10/2016 | Portugal | France | 1 | 0 | Final
Appendix C
Betweenness centrality
Meaning: Measures the extent to which a node lies on paths between other nodes (Peña and Touchette, 2012, p.3).
Interpretation: Betweenness does not measure how well-connected a player is, but rather how the ball flow between other players depends on that particular player i. It thus provides a measure of the impact of removing that player from the game, either by getting a red card or by being isolated by the rival’s defense. A betweenness score of 0 means, in particular, that a player is not getting involved in the game, and so can be removed without much effect (Peña and Touchette, 2012, p.3).

Centralization
Meaning: A network is considered highly centralized when one actor is clearly more central than all other actors in the network. A network is decentralized when all actors have the same node centrality (Grund, 2016, p.1266).
Interpretation: Network centralization refers to how unequally passes are distributed over dyads of players and single individuals (Grund, 2016, p.1265). The closer the centralization is to 1, the more likely the network is to have a star-like topology, thus a tendency to play for the same player. The closer it is to 0, the more likely it is that the nodes of the network have on average the same connectivity, thus representing a more homogeneous type of interaction (Clemente, Martins and Mendes, 2016, p.82).

Centroid
Meaning: The centroid can be defined as one of the most highly connected nodes in the network (Clemente, Couceiro & Mendes, 2014, p.266).
Interpretation: A player with a high centroid value compared to the average centroid value of the network will possibly be involved in coordinating the activity of other highly connected players, altogether devoted to the regulation of team play (Clemente, Martins and Mendes, 2016, p.62).

Clique
Meaning: A clique is a sub-network in which all the nodes are linked by an arrow. The analysis of cliques is the basis for finding communities within networks (Peña and Touchette, 2012, p.4).
Interpretation: A clique in a team represents a subset of players that are all pairwise-connected by direct passes. A well-connected team will present a very large maximal clique, meaning that almost everybody gets to pass the ball to everybody else, whereas the size will be smaller for more fragmented teams (Peña and Touchette, 2012, p.4).

Closeness centrality
Meaning: Closeness centrality of a vertex is defined as the sum of distances from all other vertices present in a graph, with this distance defined as the length of the shortest paths from one vertex to another (Ribeiro et al., 2017, p.6).
Interpretation: It shows how close, in terms of passes, a player has been to all other teammates during the development of the team's attack (Clemente, Mendes & Martins, 2014, p.584). This network metric provides information on the adjacency of one player to others, where players with low closeness scores are adjacent to others, providing conditions for receiving flows (e.g. receiving a pass or rotating with the nearest player) more rapidly (Ribeiro et al., 2017, p.6).

Clustering
Meaning: Measures the degree of clustering in a network by averaging the clustering coefficient of each node, which is defined as the density of the node's ego network (ORA Documentation File, 2018).
Interpretation: Clustering coefficients provide coaches and performance analysts with knowledge about subgroups of players who coordinate their actions more frequently. Globally, high values of a clustering coefficient might indicate a team disposition to form functional clusters, with players tending to create tightly knit groups comprising high-density ties (Ribeiro et al., 2017, p.6).

Cohesion
Meaning: Cohesion is defined as the number of reciprocal connections in the network divided by the maximum number of possible connections (McLean et al., 2017, p.376).
Interpretation: It gives an indication of how often a player was involved within a network. For example, a reciprocal pass occurs when player A passes to player B, who then passes back to player A (McLean et al., 2017, p.376).

Degree centrality
Meaning: Degree centrality consists of the number of ties incident upon a node (Ribeiro et al., 2017, p.6). Since in team sports players pass the ball in a specific direction from one player to another, the degree of a vertex can be defined according to two types of centrality: ‘indegree’ (number of passes directed to the player) and ‘outdegree’ (number of passes that the player directs to others) (Ribeiro et al., 2017, p.6).
Interpretation: Players with larger centrality scores are those who contributed more to their team’s offensive attempts through their passes to the other players of their team (Clemente, Martins and Mendes, 2016, p.49).

Density
Meaning: In graph theory, the density of a (directed) graph is the proportion of the maximum possible links that are present between nodes (Clemente, Martins and Mendes, 2016, p.73).
Interpretation: Describes the overall level of cooperation/coordination between teammates (Ribeiro et al., 2017). It measures the overall affection between teammates (Clemente, Martins and Mendes, 2016, p.73).

Diameter
Meaning: In graph theory, two players are connected if a sequence of players exists and their connections are adjacent. The diameter of a graph is the maximum distance (the length of the largest geodesic) between any two connected players (Clemente, Martins and Mendes, 2016, p.75).
Interpretation: Quantifies the distance between the two farthest players in the graph. A small diameter reflects a low maximum distance between teammates, which may reveal that the team’s passing game was diffused among most of its players (rather than a few acting as central ones) (Clemente, Martins, Kalamaras, Wong & Mendes, 2015, p.86).

Distance
Meaning: Computes the shortest path lengths between all node pairs. If no path exists between two nodes, then a distance of zero is given. The distance from a node to itself is also zero (ORA Documentation File, 2018).
Interpretation: Large distances may suggest that the ball takes a long time to move between teammates. The players who are closer to others may be able to exert more power than those who are more distant (Clemente, Martins and Mendes, 2016, p.74).

Eigenvector centrality
Meaning: Eigenvector centrality measures the influence of a vertex in a graph (Ribeiro et al., 2017).
Interpretation: A player with a very high eigenvector centrality is a player interacting with several important teammates, thus suggesting a central regulatory role. A player with low eigenvector centrality can be considered a peripheral teammate, interacting with few and non-central players (Clemente, Martins and Mendes, 2016, p.57).

Network intensity
Meaning: Network intensity refers to the total number of passes made by a team in a particular match (per minute of ball possession) (Grund, 2016, p.1265).

PageRank
Meaning: PageRank centrality is a recursive notion of ‘popularity’ or importance which follows the principle that ‘a player is popular if he gets passes from other popular players’ (Clemente, Martins and Mendes, 2016, p.60).
Interpretation: PageRank is the probability that each player will have the ball after a reasonable number of passes have been made (Peña & Touchette, 2012).

Path length
Meaning: Measures the separation between two vertices (e.g. players in team games) in a graph (global property) (Ribeiro et al., 2017).
Interpretation: It can reveal how many passes are needed for the ball to travel from one particular player to another (Ribeiro et al., 2017).

Prestige
Meaning: Degree prestige considers only inbound links and is often used as an indication of the “prestige” of each node among its peers. Nodes with high degree prestige are those that receive many inbound links from other nodes (Clemente et al., 2016, p.380).
Interpretation: In football, the players with higher degree prestige are those to whom their teammates preferred to pass the ball more often (Clemente et al., 2016, p.380). These players might be crucial for their team’s offensive development because they receive the ball more often than other players during their team’s attempts to attack (Clemente, Martins and Mendes, 2016, p.56).

Reciprocity
Meaning: Reciprocity is the tendency for mutuality in relations between people in a network. This is a key social process, indicated by the acceptance of a handshake, or the philosophy “you scratch my back, I scratch yours” (Lusher & Robins, 2010, p.218).
Interpretation: Reciprocity measures the tendency of pairs of players to form mutual connections with each other (Clemente, Martins and Mendes, 2016, p.80).

Transitivity
Meaning: Transitivity is the tendency to form triadic relations with others. It gives some indication of how the network as a whole may be held together and is the social mechanism that leads to cohesion or clustering in a network (Lusher & Robins, 2010, p.219).
Interpretation: This measure makes it possible to identify balanced triads and the “equilibrium” or natural state toward which triadic relationships tend. Transitivity indicates the capacity of a triad of players to act in a balanced way rather than, for example, tending to pass to the same player (Clemente, Martins and Mendes, 2016, p.80).
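To make several of the node-level metrics above more concrete, the following Python sketch computes weighted indegree and outdegree, closeness as a sum of shortest-path distances (so lower values mean closer, following the definition attributed to Ribeiro et al.), betweenness via Brandes' algorithm, and PageRank via power iteration on a small directed pass network. The pass counts, the player labels (GK, CB, CM, LW, ST) and the damping factor of 0.85 are hypothetical choices for illustration; distances and betweenness treat every pass relation as an unweighted link, and closeness is measured along outgoing passes, which is an assumption about direction. This is a minimal sketch, not the computation performed by ORA or by the cited studies.

```python
from collections import deque

# Hypothetical pass counts: PASSES[(a, b)] = number of passes from player a to player b.
PASSES = {
    ("GK", "CB"): 6, ("CB", "GK"): 2, ("CB", "CM"): 9, ("CM", "CB"): 4,
    ("CM", "LW"): 5, ("CM", "ST"): 3, ("LW", "CM"): 2, ("LW", "ST"): 4,
    ("ST", "CM"): 1,
}

def players(passes):
    return sorted({p for pair in passes for p in pair})

def successors(passes):
    """Unweighted directed links: a -> b exists if a passed to b at least once."""
    succ = {p: [] for p in players(passes)}
    for (a, b), n in passes.items():
        if n > 0:
            succ[a].append(b)
    return succ

def degrees(passes):
    """Weighted indegree (passes received) and outdegree (passes made)."""
    indeg = {p: 0 for p in players(passes)}
    outdeg = dict(indeg)
    for (a, b), n in passes.items():
        outdeg[a] += n
        indeg[b] += n
    return indeg, outdeg

def bfs_distances(succ, source):
    """Number of passes on the shortest path from source to each reachable player."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in succ[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_sum(succ):
    """Sum of distances to all reachable teammates; lower values mean closer."""
    return {v: sum(bfs_distances(succ, v).values()) for v in succ}

def betweenness(succ):
    """Brandes' algorithm for unweighted directed graphs (no normalisation)."""
    bc = {v: 0.0 for v in succ}
    for s in succ:
        stack, preds = [], {v: [] for v in succ}
        sigma = {v: 0 for v in succ}   # number of shortest paths from s
        dist = {v: -1 for v in succ}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in succ[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in succ}  # dependency of s on each player
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

def pagerank(passes, damping=0.85, iterations=100):
    """Power iteration; transition probabilities are each passer's share of passes."""
    nodes = players(passes)
    _, outdeg = degrees(passes)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iterations):
        # Players who never pass (dangling nodes) spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if outdeg[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (a, b), k in passes.items():
            if k > 0:
                new[b] += damping * rank[a] * k / outdeg[a]
        rank = new
    return rank

succ = successors(PASSES)
indeg, outdeg = degrees(PASSES)
closeness = closeness_sum(succ)
bc = betweenness(succ)
pr = pagerank(PASSES)
print(bc)
```

In this hypothetical network CM has the highest betweenness (9) and CB the second highest (6), because every route to or from the goalkeeper runs through CB. Computing distances on unweighted links keeps the example close to the textual definitions; weighting shortest paths by inverse pass counts would be a different, equally common, modelling choice.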
Appendix D
Density: The ratio of the number of links to the maximum possible number of links in a network.
Diameter: The maximum shortest path length between any two nodes in a unimodal network. If there exists a node that is not reachable from another node, then the diameter is technically infinite. In this case, the diameter returned is V*N, where V is the maximum link value in the network.
Diffusion: Computes the degree to which something could be easily diffused (spread) throughout the network. This is based on the shortest path length between nodes. A large diffusion value means that nodes are close to each other, and a small diffusion value means that nodes are farther apart.
Hierarchy: The degree to which a unimodal network exhibits a purely hierarchical structure, meaning that if there is a path from node A to node C, there is no path from C to A.
Interdependence: The fraction of links in a unimodal network that are Pooled or Reciprocal.
Network Centralization-Betweenness: Network centralization based on the betweenness score of each node in a square network.
Network Centralization-Closeness: Network centralization based on the closeness centrality of each node in a square network.
Network Centralization-Eigenvector: Network centralization based on the eigenvector centrality of each node in a square network.
Network Centralization-Total Degree: Network centralization based on the total-degree centrality of each node in a square network.
Reciprocity: The fraction of links in the network that go in both directions.
Transitivity: The fraction of link pairs {(i,j), (j,k)} in the network such that (i,k) is also a link in the network.
Source: ORA Documentation File, 2018
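Three of the ORA measures listed above are simple ratios over links (density, reciprocity and transitivity), so they can be sketched directly from their textual definitions. The Python sketch below assumes a binary, directed network without self-loops and uses a hypothetical five-player link set; it is not ORA's own code, and ORA's results may differ for weighted or multi-mode networks.

```python
def nodes_of(links):
    return {v for link in links for v in link}

def density(links):
    """Links present divided by the n(n-1) possible directed links."""
    n = len(nodes_of(links))
    return len(links) / (n * (n - 1)) if n > 1 else 0.0

def reciprocity(links):
    """Fraction of links (i, j) whose reverse link (j, i) also exists."""
    if not links:
        return 0.0
    return sum((j, i) in links for (i, j) in links) / len(links)

def transitivity(links):
    """Fraction of link pairs (i, j), (j, k) with i != k for which (i, k) is also a link."""
    pairs = closed = 0
    for i, j in links:
        for j2, k in links:
            if j2 == j and k != i:
                pairs += 1
                closed += (i, k) in links
    return closed / pairs if pairs else 0.0

# Hypothetical pass links (a, b): player a passed to player b at least once.
LINKS = {
    ("GK", "CB"), ("CB", "GK"), ("CB", "CM"), ("CM", "CB"), ("CM", "LW"),
    ("CM", "ST"), ("LW", "CM"), ("LW", "ST"), ("ST", "CM"),
}

print(density(LINKS), reciprocity(LINKS), transitivity(LINKS))
```

For these hypothetical links, density is 9/20 = 0.45, reciprocity is 8/9 (only LW to ST is unreciprocated) and transitivity is 3/10, since three of the ten two-step link pairs are closed by a direct link.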
Appendix E
k-Means
This clustering aims to partition $n$ observations into $k$ sets ($k \le n$), $S = \{S_1, S_2, \ldots, S_k\}$, so as to minimize the within-cluster sum of squares.
Source: ORA Documentation File, 2018
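Written out, the objective is $\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$, where $\mu_i$ is the mean of the observations in $S_i$. A common heuristic for this objective is Lloyd's algorithm, which alternates an assignment step and a mean-update step until no centroid moves. The Python sketch below is a minimal version of that heuristic; the random initialisation from the data (seeded for reproducibility), the iteration cap and the toy points are assumptions, and ORA's own k-means implementation may differ.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct observations as initial means
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:  # no centroid moved, so the assignment is stable
            break
        centroids = new
    return centroids, clusters

def wcss(centroids, clusters):
    """Within-cluster sum of squares, the quantity k-means tries to minimise."""
    return sum(math.dist(p, centroids[i]) ** 2 for i, c in enumerate(clusters) for p in c)

# Toy 2-D observations forming two well-separated groups.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(POINTS, k=2)
print(sorted(len(c) for c in clusters), round(wcss(centroids, clusters), 4))
```

For the toy points the procedure ends with two clusters of three observations and a within-cluster sum of squares of 8/3 (about 2.667). Lloyd's algorithm only guarantees a local optimum, so in practice it is usually restarted from several initialisations.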
Appendix F
Appendix G
where $p(x_i \mid x_{i-1}; C_k)$ is the transition probability from $x_{i-1}$ to $x_i$ in the Markov chain associated with cluster $C_k$.
The goal of sequence clustering is to estimate these parameters for all clusters $C_k$ (with $k = 1, 2, \ldots, K$) based on a set of input sequences. For that purpose, the algorithm relies on an Expectation–Maximization procedure [11] to improve the model parameters iteratively. For a given number of clusters $K$, the algorithm proceeds as follows:
1. Initialize randomly the state transition probabilities of the Markov chains associated with
each cluster.
2. For all input sequences, assign each sequence to the cluster that can produce it with the highest probability according to equation (1).
3. Compute the state transition probabilities of the Markov chain of each cluster, consider-
ing the sequences that were assigned to that cluster in step 2.
4. Repeat steps 2 and 3 until the assignment of sequences to clusters does not change, and
hence the cluster models do not change either.
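The four steps above can be sketched in Python as a hard-assignment procedure over first-order Markov chains. Several details are assumptions rather than part of the description: a sequence is scored by the product of its transition probabilities only (any initial-state term that equation (1) may contain is left out), log-probabilities are used to avoid numerical underflow, add-one smoothing keeps unseen transitions from receiving probability zero, and an iteration cap guards termination. The event labels in the usage example are hypothetical.

```python
import math
import random

def random_chain(states, rng):
    """Step 1: a random row-stochastic transition matrix over the event states."""
    chain = {}
    for a in states:
        weights = [rng.random() + 1e-3 for _ in states]  # strictly positive
        total = sum(weights)
        chain[a] = {b: w / total for b, w in zip(states, weights)}
    return chain

def log_likelihood(seq, chain):
    """Log of the product of p(x_i | x_{i-1}; C_k) over consecutive events."""
    return sum(math.log(chain[a][b]) for a, b in zip(seq, seq[1:]))

def estimate_chain(seqs, states, alpha=1.0):
    """Step 3: transition probabilities from counts, with add-alpha smoothing."""
    chain = {}
    for a in states:
        counts = {b: alpha for b in states}
        for seq in seqs:
            for x, y in zip(seq, seq[1:]):
                if x == a:
                    counts[y] += 1
        total = sum(counts.values())
        chain[a] = {b: c / total for b, c in counts.items()}
    return chain

def sequence_clustering(seqs, k, seed=0, max_iter=100):
    states = sorted({e for seq in seqs for e in seq})
    rng = random.Random(seed)
    chains = [random_chain(states, rng) for _ in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Step 2: each sequence goes to the cluster most likely to produce it.
        new = [max(range(k), key=lambda c: log_likelihood(seq, chains[c])) for seq in seqs]
        if new == assignment:  # Step 4: stop once assignments no longer change
            break
        assignment = new
        chains = [estimate_chain([s for s, c in zip(seqs, assignment) if c == j], states)
                  for j in range(k)]
    return assignment, chains

# Hypothetical possession sequences encoded as event labels.
SEQS = [
    ["pass", "pass", "pass", "shot"],
    ["pass", "pass", "shot"],
    ["tackle", "clearance", "tackle", "clearance"],
    ["tackle", "clearance", "clearance"],
]
assignment, chains = sequence_clustering(SEQS, k=2)
print(assignment)
```

Because assignments are hard and the initial chains are random, the procedure can settle in a local optimum (for example, with one cluster absorbing every sequence). Running it with several seeds and keeping the solution with the highest total log-likelihood is a common remedy.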
Appendix H
References
Anand, A. and Singh, M. D. (2011) ‘Understanding knowledge management: a literature review’,
International journal of engineering science and technology, vol. 3, no. 2, pp. 926–939.
Anderson, C. (2010) ‘Presenting and Evaluating Qualitative Research’, American journal of pharmaceutical education, vol. 74, no. 8, p. 141 [Online]. Available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987281/.
Anderson, C. and Sally, D. (2014) The numbers game: Why everything you know about football is
wrong, London, Penguin Books.
Bartlett, R. (2001) ‘Performance analysis: Is it the bringing together of biomechanics and notational analysis or an illusion?’, 19th International Symposium on Biomechanics in Sports. San Francisco, Exercise & Sport Science Dept., University of San Francisco, pp. 328–331.
Bartlett, R., Button, C., Robins, M., Dutt-Mazumder, A. and Kennedy, G. (2012) ‘Analysing Team
Coordination Patterns from Player Movement Trajectories in Soccer: Methodological Considera-
tions’, International Journal of Performance Analysis in Sport, vol. 12, no. 2, pp. 398–424.
Bartling, B., Brandes, L. and Schunk, D. (2015) ‘Expectations as Reference Points: Field Evidence
from Professional Soccer’, Management Science, vol. 61, no. 11, pp. 2646–2661.
Bedi, P. and Sharma, C. (2016) ‘Community detection in social networks’, Wiley Interdisciplinary
Reviews: Data Mining and Knowledge Discovery, vol. 6, no. 3, pp. 115–135.
Beetz, M., Kirchlechner, B. and Lames, M. (2005) ‘Computerized Real-Time Analysis of Football
Games’, IEEE Pervasive Computing, vol. 4, no. 3, pp. 33–39.
Benito Santos, A., Theron, R., Losada, A., Sampaio, J. E. and Lago-Peñas, C. (2018) ‘Data-Driven
Visual Performance Analysis in Soccer: An Exploratory Prototype’, Frontiers in Psychology,
vol. 9, p. 2416.
Ben-Naim, E., Vazquez, F. and Redner, S. (2007) ‘What is the most competitive sport?’, Journal of the Korean Physical Society, vol. 50 [Online]. Available at http://arxiv.org/pdf/physics/0512143v1.
Bennet, A. and Bennet, D. (2008) ‘The Decision-Making Process in a Complex Situation’, in
Burstein, F. and Holsapple, C. W. (eds) Handbook on decision support systems, Berlin, London,
Springer, pp. 3–20.
Bernstein, J. H. (2009) ‘The data-information-knowledge-wisdom hierarchy and its antithesis’,
Proceedings North American Symposium on Knowledge Organization. Syracuse, NY, pp. 68–75.
Bigus, J. P. (1996) Data mining with neural networks: Solving business problems from application development to decision support, New York, McGraw-Hill.
Bonaccorso, G. (2018) Mastering machine learning algorithms: Expert techniques to implement
popular machine learning algorithms and fine-tune your models, Birmingham, UK, Packt Publish-
ing.
Borrie, A., Jonsson, G. K. and Magnusson, M. S. (2002) ‘Temporal pattern analysis and its ap-
plicability in sport: An explanation and exemplar data’, Journal of sports sciences, vol. 20, no. 10,
pp. 845–852.
Bose, R. P. J. C. and van der Aalst, W. M. P. (2009) ‘Context Aware Trace Clustering: Towards
Improving Process Mining Results’, in Apte, C., Park, H., Wang, K. and Zaki, M. J. (eds) Proceed-
ings of the 2009 SIAM International Conference on Data Mining, Philadelphia, PA, Society for
Industrial and Applied Mathematics, pp. 401–412.
Bradley, P. S., Lago-Peñas, C. and Rey, E. (2014) ‘Evaluation of the match performances of substi-
tution players in elite soccer’, International journal of sports physiology and performance, vol. 9,
no. 3, pp. 415–424.
Bradley, P. S. and Noakes, T. D. (2013) ‘Match running performance fluctuations in elite soccer:
Indicative of fatigue, pacing or situational influences?’, Journal of sports sciences, vol. 31, no. 15,
pp. 1627–1638.
Buddhika, G. (2016) Evaluation of Trace Clustering techniques in Process Mining to detect normal and exceptional behavior, University of Ruhuna SC/2012/8565 [Online]. Available at https://www.academia.edu/31068869/Evaluation_of_Trace_Clustering_techniques_in_Process_Mining_to_detect_normal_and_exceptional_behavior.
Buijs, J. (2017a) Heuristics miner in ProM [Online], FutureLearn. Available at https://www.futurelearn.com/courses/process-mining.
Buijs, J. (2017b) Social Network Analysis in ProM [Online], FutureLearn. Available at https://www.futurelearn.com/courses/process-mining.
Button, C., Wheat, J. and Lamb, P. (2014) ‘Why coordination dynamics is relevant for studying
sport performance’, in Davids, K., Hristovski, R. and Araújo, D. (eds) Complex systems in sport,
London, New York, Routledge, pp. 44–61.
Camerino, O. F., Chaverri, J., Anguera, M. T. and Jonsson, G. K. (2012) ‘Dynamics of the game in
soccer: Detection of T-patterns’, European Journal of Sport Science, vol. 12, no. 3, pp. 216–224.
Carley, K. M. (2003) ‘Dynamic Network Analysis’, in National Research Council (ed) Dynamic
social network modeling and analysis: Workshop Summary and Papers, Washington, DC, The
National Academies Press, pp. 133–145.
Carling, C., Bloomfield, J., Nelsen, L. and Reilly, T. (2008) ‘The Role of Motion Analysis in Elite
Soccer: Contemporary Performance Measurement Techniques and Work Rate Data’, Sports medi-
cine, vol. 38, no. 10, pp. 839–862.
Carling, C., Espié, V., Le Gall, F., Bloomfield, J. and Jullien, H. (2010) ‘Work-rate of substitutes in
elite soccer: A preliminary study’, Journal of science and medicine in sport, vol. 13, no. 2,
pp. 253–255.
Carling, C., Reilly, T. and Williams, A. M. (2005) Handbook of soccer match analysis: A systemat-
ic approach to improving performance, London, Routledge.
Casarrubea, M., Jonsson, G. K., Faulisi, F., Sorbera, F., Di Giovanni, G., Benigno, A., Cresciman-
no, G. and Magnusson, M. S. (2015) ‘T-pattern analysis for the study of temporal structure of ani-
mal and human behavior: A comprehensive review’, Journal of neuroscience methods, vol. 239,
pp. 34–46.
Castañer, M., Barreira, D., Camerino, O., Anguera, M. T., Fernandes, T. and Hileno, R. (2017)
‘Mastery in Goal Scoring, T-Pattern Detection, and Polar Coordinate Analysis of Motor Skills
Used by Lionel Messi and Cristiano Ronaldo’, Frontiers in Psychology, vol. 8, no. 741, pp. 1–18.
Castellano, J., Alvarez-Pastor, D. and Bradley, P. S. (2014) ‘Evaluation of research using comput-
erised tracking systems (Amisco and Prozone) to analyse physical performance in elite soccer: A
systematic review’, Sports medicine (Auckland, N.Z.), vol. 44, no. 5, pp. 701–712.
Cavalera, C., Diana, B., Elia, M., Guldberg, K. J., Zurloni, V. and Anguera, M. T. (2015) ‘T-
pattern analysis in soccer games: Relationship between time and attack actions’, Cuadernos de
Psicología del Deporte, vol. 15, no. 1, pp. 41–50 [Online]. Available at http://revistas.um.es/cpd/article/view/223061.
Cazabet, R. (2017) Dynamic community detection: state of the art and first empirical comparisons
[Online], Skopje, Macedonia. Available at http://cazabetremy.fr/DyNo_PDF/ECML-PKDD-2017-talk.pdf.
Cazabet, R. and Amblard, F. (2014) ‘Dynamic Community Detection’, in Alhajj, R. and Rokne, J.
(eds) Encyclopedia of Social Network Analysis and Mining, New York, NY, Springer New York,
pp. 404–414.
ChyronHego (2017) Case Study: TRACAB Player Tracking [Online] (Accessed 28 November 2017).
Cintia, P., Rinzivillo, S. and Pappalardo, L. (2015) ‘Network-based Measures for Predicting the
Outcomes of Football Games’, Proceedings of the 2nd Workshop on Machine Learning and Data
Mining for Sports Analytics co-located with 2015 European Conference on Machine Learning and
Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2015), Porto, Portu-
gal, September 11th, 2015, pp. 46–54.
Clemente, F. M., Couceiro, M. S., Martins, F. M. L. and Mendes, R. (2013) ‘An Online Tactical
Metrics Applied to Football Game’, Research Journal of Applied Sciences, Engineering and Tech-
nology, vol. 5, no. 5, pp. 1700–1719.
Clemente, F. M., Couceiro, M. S., Martins, F. M. L., Mendes, R. and Figueiredo, A. J. (2013)
‘Measuring Tactical Behaviour Using Technological Metrics: Case Study of a Football Game’,
International Journal of Sports Science and Coaching, vol. 8, no. 4, pp. 723–739 [Online].
DOI: 10.1260/1747-9541.8.4.723.
Clemente, F. M., Couceiro, M. S., Martins, F. M. L. and Mendes, R. S. (2014) ‘Using network met-
rics to investigate football team players' connections: A pilot study’, Motriz: Revista de Educação
Física, vol. 20, no. 3, pp. 262–271.
Clemente, F. M., Couceiro, M. S., Martins, F. M. L. and Mendes, R. S. (2015) ‘Using network met-
rics in soccer: A macro-analysis’, Journal of human kinetics, vol. 45, pp. 123–134.
Clemente, F. M., José, F., Oliveira, N., Martins, F. M. L., Mendes, R. S., Figueiredo, A. J., Wong,
D. P. and Kalamaras, D. (2016) ‘Network structure and centralization tendencies in professional
football teams from Spanish La Liga and English Premier Leagues’, Journal of Human Sport and
Exercise, vol. 11, no. 3, pp. 376–389.
Clemente, F. M. and Martins, F. M. L. (2017) ‘Network structure of UEFA Champions League
teams: Association with classical notational variables and variance between different levels of suc-
cess’, International Journal of Computer Science in Sport, vol. 16, no. 1, pp. 39–50.
Clemente, F. M., Martins, F. M. L., Couceiro, M. S., Mendes, R. S. and Figueiredo, A. J. (2014) ‘A
network approach to characterize the teammates’ interactions on football: A single match analysis’,
Cuadernos de Psicología del Deporte, vol. 14, no. 3, pp. 141–148 [Online]. Available at http://revistas.um.es/cpd/article/view/211401.
Clemente, F. M., Martins, F. M. L., Kalamaras, D., Oliveira, J., Oliveira, P. and Mendes, R. S.
(2015) ‘The social network analysis of Switzerland football team on FIFA World Cup 2014’, Jour-
nal of Physical Education and Sport, vol. 15, no. 1, pp. 136–141.
Clemente, F. M., Martins, F. M. L., Kalamaras, D., Wong, D. P. and Mendes, R. S. (2015) ‘General
network analysis of national soccer teams in FIFA World Cup 2014’, International Journal of Per-
formance Analysis in Sport, vol. 15, no. 1, pp. 80–96.
Clemente, F. M., Martins, F. M. L. and Mendes, R. S. (2016) Social Network Analysis Applied to
Team Sports Analysis, Cham, Springer International Publishing.
Clemente, F. M., Mendes, R. and Martins, F. M. (2014) ‘Applying centrality metrics to identify the
prominent football players’, VIII Congreso Internacional de la Asociación Española de Ciencias
del Deporte: Libro de actas. Universidad de Extremadura, pp. 583–586.
Coelho, D. B., Coelho, L. G., Morandi, R. F., Ferreira-Júnior, J. B., Bouzas, J. C., Prado, L. S.,
Soares, D. D. and Silami-Garcia, E. (2012) ‘Effect of player substitutions on the intensity of sec-
ond-half soccer match play’, Revista Brasileira de Cineantropometria e Desempenho Humano,
vol. 14, no. 2, pp. 183–191.
Coleman, B. J. (2012) ‘Identifying the “Players” in Sports Analytics Research’, Interfaces, vol. 42,
no. 2, pp. 109–118 [Online]. DOI: 10.1287/inte.1110.0606.
Coscia, M., Giannotti, F. and Pedreschi, D. (2011) ‘A classification for community discovery
methods in complex networks’, Statistical Analysis and Data Mining, vol. 4, no. 5, pp. 512–546.
Cotta, C., Mora, A. M., Merelo, J. J. and Merelo-Molina, C. (2013) ‘A network analysis of the
2010 FIFA world cup champion team play’, Journal of Systems Science and Complexity, vol. 26,
no. 1, pp. 21–42.
Creswell, J. W. and Plano Clark, V. L. (2011) Designing and conducting mixed methods research,
2nd edn, Thousand Oaks, Sage.
Croft, H., Lamb, P. and Middlemas, S. (2015) ‘The application of self-organising maps to perfor-
mance analysis data in rugby union’, International Journal of Performance Analysis in Sport,
vol. 15, no. 3, pp. 1037–1046.
Davcheva, P., Schuster, B., Hille, M., Götz, R. and Zhang, J. (2016) Decision support system for
player substitution in football - prototypical implementation, Lehrstuhl Wirtschaftsinformatik, ins-
bes. im Dienstleistungsbereich. Universität Erlangen-Nürnberg.
Del Corral, J., Barros, C. P. and Prieto-Rodríguez, J. (2008) ‘The Determinants of Soccer Player
Substitutions’, Journal of Sports Economics, vol. 9, no. 2, pp. 160–172.
Delen, D. and Sharda, R. (2008) ‘Artificial Neural Networks in Decision Support Systems’, in
Burstein, F. and Holsapple, C. W. (eds) Handbook on decision support systems, Berlin, London,
Springer, pp. 557–580.
Diana, B., Zurloni, V., Elia, M., Cavalera, C. M., Jonsson, G. K. and Anguera, M. T. (2017) ‘How
Game Location Affects Soccer Performance: T-Pattern Analysis of Attack Actions in Home and
Away Matches’, Frontiers in Psychology, vol. 8, no. 1415, pp. 1–11.
Duarte, R., Araújo, D., Correia, V. and Davids, K. (2012) ‘Sports teams as superorganisms: Impli-
cations of sociobiological models of behaviour for research and practice in team sports perfor-
mance analysis’, Sports medicine (Auckland, N.Z.), vol. 42, no. 8, pp. 633–642.
Duch, J., Waitzman, J. S. and Amaral, L. A. (2010) ‘Quantifying the performance of individual
players in a team activity’, PloS one, vol. 5, no. 6, e10937.
Dutt-Mazumder, A., Button, C., Robins, A. and Bartlett, R. (2011) ‘Neural network modelling and
dynamical system theory: Are they relevant to study the governing dynamics of association football
players?’, Sports medicine (Auckland, N.Z.), vol. 41, no. 12, pp. 1003–1017.
Ferreira, D., Zacarias, M., Malheiros, M. and Ferreira, P. (2007) ‘Approaching Process Mining
with Sequence Clustering: Experiments and Findings’, in Alonso, G., Dadam, P. and Rosemann,
M. (eds) Business Process Management, Berlin, Heidelberg, Springer Berlin Heidelberg, pp. 360–374.
FIFA (2016a) FIFA/Coca-Cola World Ranking (April 07, 2016) [Online]. Available at https://www.fifa.com/fifa-world-ranking/ranking-table/men/rank/id11419/.
FIFA (2016b) Wearable technology industry visits FIFA to showcase their systems [Online]. Available at https://football-technology.fifa.com/en/media-tiles/wearable-technology-industry-visits-fifa-to-showcase-their-systems/ (Accessed 3 November 2017).
FIFA (2017a) About the IMS standard for wearable tracking devices [Online]. Available at https://football-technology.fifa.com/en/media-tiles/about-the-ims-standard-for-wearable-tracking-devices/ (Accessed 3 November 2017).
FIFA (2017b) EPTS Electronic performance and tracking systems [Online]. Available at https://football-technology.fifa.com/media/1031/epts_english.pdf (Accessed 29 November 2017).
FIFA (2017c) Player stats tablets to be tested live at Russia 2017 final [Online]. Available at http://www.fifa.com/confederationscup/news/y=2017/m=7/news=player-stats-tablet-to-be-tested-live-at-fifa-confederations-cup-2017--2899741.html (Accessed 3 November 2017).
Forouhar, A. S., Kellogg, M. M., Ohiomoba, K. and Akhmetgaliyev, E. (2015) Methods, systems
and software programs for enhanced sports analytics and applications [Online], Google Patents.
Available at https://www.google.com/patents/US20150131845.
Franks, I. M. (1993) ‘The Effects of Experience on the Detection and Location of Performance
Differences in a Gymnastic Technique’, Research Quarterly for Exercise and Sport, vol. 64,
pp. 227–231.
Franks, I. M. (2004) ‘The need for feedback’, in Hughes, M. and Franks, I. M. (eds) Notational
analysis of sport: Systems for better coaching and performance in sport, 2nd edn, London, Routledge, pp. 9–16.
Franks, I. M. and Goodman, D. (1986) ‘A systematic approach to analysing sports performance’,
Journal of sports sciences, vol. 4, no. 1, pp. 49–59 [Online]. DOI: 10.1080/02640418608732098.
Franks, I. M. and Miller, G. (1986) ‘Eyewitness testimony in sport’, Journal of Sport Behavior,
vol. 9, no. 1, pp. 38–45.
Franks, I. M. and Miller, G. (1991a) ‘Training coaches to observe and remember’, Journal of
sports sciences, vol. 9, no. 3, pp. 285–297.
Franks, I. M. and Miller, G. (1991b) ‘Training Coaches to Observe and Remember’, Journal of
sports sciences, vol. 9, no. 3, pp. 285–297.
Fraunhofer IIS (2017a) RedFIR: Echtzeit Performance Analyse [Online]. Available at https://www.iis.fraunhofer.de/content/dam/iis/de/doc/LN/Referenzprojekte/chip%20im%20ball%20bei%20redfir.pdf (Accessed 28 November 2017).
Fraunhofer IIS (2017b) RedFIR: Real Time High Precision Wireless Tracking [Online]. Available
at https://www.iis.fraunhofer.de/en/ff/lv/lok/proj/redfir.html (Accessed 28 November 2017).
Frické, M. (2009) ‘The knowledge pyramid: a critique of the DIKW hierarchy’, Journal of Infor-
mation Science, vol. 35, no. 2, pp. 131–142.
Galletta, A. (2013) Mastering the semi-structured interview and beyond: From research design to
analysis and publication, NYU Press.
Gama, J., Passos, P., Davids, K., Relvas, H., Ribeiro, J., Vaz, V. and Dias, G. (2014) ‘Network
analysis and intra-team activity in attacking phases of professional football’, International Journal
of Performance Analysis in Sport, vol. 14, no. 3, pp. 692–708.
Garganta, J. (2009) ‘Trends of tactical performance analysis in team sports: Bridging the gap be-
tween research, training and competition’, Revista Portuguesa de Ciências do Desporto, vol. 9,
no. 1, pp. 81–89 [Online]. Available at http://www.scielo.mec.pt/scielo.php?script=sci_arttext&pid=S1645-05232009000100008&lng=pt&nrm=iso.
Gederman, M. (2018) Real-Time Decision Making: Live Data On The Bench [Online]. Available
at https://chyronhego.com/bench-side-data/.
Géron, A. (2017) Hands-on machine learning with Scikit-Learn and TensorFlow: Concepts, tools,
and techniques to build intelligent systems, O'Reilly.
Geyer, H. (2009) Auswechselverhalten im Fußball: Eine theoretische und empirische Analyse,
Institut für Ökonomische Bildung, IÖB-Diskussionspapier 5/08 [Online]. Available at https://www.wiwi.uni-muenster.de/ioeb/sites/ioeb/files/downloads/IOEB_Diskussionspapiere/ioeb_5-08.pdf (Accessed 15 March 2015).
Given, L. M. (2015) 100 questions (and answers) about qualitative research, Thousand Oaks,
SAGE Publications, Inc.
Glendenning, B. (2016) ‘England vs. Iceland. Minute-by-minute report’, The Guardian, 2016
[Online]. Available at https://www.theguardian.com/football/live/2016/jun/27/england-v-iceland-euro-2016-live.
GlobeNewswire (2017) Worldwide $3.97 Billion Sports Analytics Market 2016-2022: Major Play-
ers are Stats, Catapult Sports, SportRadar, SAP, IBM, SAS, Tableau and Accenture [Online],
GlobeNewswire. Available at https://globenewswire.com/news-release/2017/01/12/905449/0/en/Worldwide-3-97-Billion-Sports-Analytics-Market-2016-2022-Major-Players-are-Stats-Catapult-Sports-SportRadar-SAP-IBM-SAS-Tableau-and-Accenture.html (Accessed 7 August 2017).
Goldschmied, N. P. and Vandello, J. A. (2012) ‘The Future is Bright: The Underdog Label, Avail-
ability, and Optimism’, Basic and Applied Social Psychology, vol. 34, no. 1, pp. 34–43.
Gomez, M.-A., Lago-Peñas, C. and Owen, L. A. (2016) ‘The influence of substitutions on elite
soccer teams’ performance’, International Journal of Performance Analysis in Sport, vol. 16, no. 2,
pp. 553–568.
Gregory, S. (2017) ‘How we assign credit in football’, OPTA Pro Blog, 2017 [Online]. Available
at http://www.optasportspro.com/about/optapro-blog/posts/2017/blog-how-we-assign-credit-in-football/.
Gréhaigne, J.-F. and Godbout, P. (1995) ‘Tactical Knowledge in Team Sports From a Constructiv-
ist and Cognitivist Perspective’, Quest, vol. 47, no. 4, pp. 490–505.
Gréhaigne, J.-F., Godbout, P. and Bouthier, D. (1999) ‘The Foundations of Tactics and Strategy in
Team Sports’, Journal of Teaching in Physical Education, vol. 18, no. 2, pp. 159–174.
Grund, T. U. (2012) ‘Network structure and team performance: The case of English Premier
League soccer teams’, Social Networks, vol. 34, no. 4, pp. 682–690.
Grund, T. U. (2016) ‘The Relational Value of Network Experience in Teams’, American Behavior-
al Scientist, vol. 60, no. 10, pp. 1260–1280.
Grunz, A., Memmert, D. and Perl, J. (2009) ‘Analysis and Simulation of Actions in Games by
Means of Special Self-Organizing Maps’, International Journal of Computer Science in Sport,
vol. 8, no. 1, pp. 22–36.
Grunz, A., Memmert, D. and Perl, J. (2012) ‘Tactical pattern recognition in soccer games by means
of special self-organizing maps’, Human movement science, vol. 31, no. 2, pp. 334–343.
Gudmundsson, J. and Horton, M. (2017) ‘Spatio-Temporal Analysis of Team Sports’, ACM Com-
puting Surveys, vol. 50, no. 2, pp. 1–34.
Günther, C. W. (2009) Process Mining in Flexible Environments, Dissertation, Eindhoven Univer-
sity of Technology.
Heckman, S. and Williams, L. (2011) ‘A systematic literature review of actionable alert identifica-
tion techniques for automated static code analysis’, Information and Software Technology, vol. 53,
no. 4, pp. 363–387.
Hirotsu, N., Ito, M., Miyaji, C., Hamano, K. and Taguchi, A. (2009) ‘Modeling Tactical Changes
of Formation in Association Football as a Non-Zero-Sum Game’, Journal of Quantitative Analysis
in Sports, vol. 5, no. 3.
Hirotsu, N. and Wright, M. (2002) ‘Using a Markov process model of an association football match
to determine the optimal timing of substitution and tactical decisions’, Journal of the Operational
Research Society, vol. 53, no. 1, pp. 88–96 [Online]. DOI: 10.1057/palgrave.jor.
Hirotsu, N. and Wright, M. (2003) ‘Determining the Best Strategy for Changing the Configuration
of a Football Team’, The Journal of the Operational Research Society, vol. 54, no. 8, pp. 878–887.
Hirotsu, N. and Wright, M. B. (2006) ‘Modeling Tactical Changes of Formation in Association
Football as a Zero-Sum Game’, Journal of Quantitative Analysis in Sports, vol. 2, no. 2.
Holme, P. (2015) ‘Modern temporal network theory: A colloquium’, The European Physical Jour-
nal B, vol. 88, no. 234.
Holme, P. and Saramäki, J. (2012) ‘Temporal networks’, Physics Reports, vol. 519, no. 3, pp. 97–
125.
Hompes, B. F. A., Buijs, J. C. A. M., van der Aalst, W. M. P., Dixit, P. M. and Buurman, J. (2015)
‘Discovering deviating cases and process variants using trace clustering’, 27th Benelux Conference
on Artificial Intelligence (BNAIC 2015), 5–6 November 2015. Hasselt, Belgium.
Hristovski, R., Serre, N. B. and Schöllhorn, W. (2014) ‘Basic notions in the science of complex
systems and nonlinear dynamics’, in Davids, K., Hristovski, R. and Araújo, D. (eds) Complex sys-
tems in sport, London, New York, Routledge, pp. 3–17.
Hughes, M. (2008) ‘An Overview of the Development of Notational Analysis’, in Hughes, M. and
Franks, I. M. (eds) The essentials of performance analysis: An introduction, Milton Park, Abing-
don, Oxon, New York, Routledge, pp. 51–84.
Hughes, M. and Franks, I. M. (2004) ‘Notational analysis—a review of the literature’, in Hughes,
M. and Franks, I. M. (eds) Notational analysis of sport: Systems for better coaching and performance
in sport, 2nd edn, London, Routledge, pp. 57–101.
Hughes, M. D. and Bartlett, R. M. (2002) ‘The use of performance indicators in performance anal-
ysis’, Journal of sports sciences, vol. 20, no. 10, pp. 739–754.
Hunziker, A. (2017) Package "GrowingSOM" (0.1.1) [Computer program]. Available at https://
mran.microsoft.com/snapshot/2017-05-21/web/packages/GrowingSOM/GrowingSOM.pdf.
IETFaraday (2008) Analysing The Game (Opta Sportsdata), YouTube.
Ingvaldsen, J. E. and Gulla, J. A. (2008) ‘Preprocessing Support for Large Scale Process Mining of
SAP Transactions’, Business process management workshops. BPM 2007., Springer, Berlin, Hei-
delberg, pp. 30–41.
James, N. (2006) ‘Notational analysis in soccer: Past, present and future’, International Journal of
Performance Analysis in Sport, vol. 6, no. 2, pp. 67–81.
Janetzko, H., Sacha, D., Stein, M., Schreck, T., Keim, D. A. and Deussen, O. (2014) ‘Feature-driven
visual analytics of soccer data’, 2014 IEEE Conference on Visual Analytics Science and
Technology (VAST). Paris, IEEE, pp. 13–22.
Janković, A. and Leontijević, B. (2006) ‘Substitution of players in function of efficiency increase
of tactic play plan in football’, Fizička kultura, vol. 60, no. 2, pp. 165–172.
Jennex, M. E. (2009) ‘Re-Visiting the Knowledge Pyramid’, 42nd Hawaii International Confer-
ence on System Sciences. Waikoloa, Hawaii, USA, IEEE, pp. 1–7.
Jonsson, G. K., Anguera, M. T., Sanchez-Algarra, P., Olivera, C., Campanico, J., Castaner, M.,
Torrents, C., Dinusova, M., Chaverri, J., Camerino, O. and Magnusson, M. S. (2010) ‘Application
of T-Pattern Detection and Analysis in Sports Research’, The Open Sports Sciences Journal, vol. 3,
pp. 95–104.
Joslyn, L. R., Joslyn, N. J. and Joslyn, M. R. (2017) ‘What Delivers an Improved Season in Men's
College Soccer?: The Relative Effects of Shots, Attacking and Defending Scoring Efficiency on
Year-to-Year Change in Season Win Percentage’, Sport Journal, vol. 19 [Online]. Available
at http://thesportjournal.org/article/what-delivers-an-improved-season-in-mens-college-soccer-the-
relative-effects-of-shots-attacking-and-defending-scoring-efficiency-on-year-to-year-change-in-
season-win-percentage/.
Kempe, M., Grunz, A. and Memmert, D. (2015) ‘Detecting tactical patterns in basketball: compari-
son of merge self-organising maps and dynamic controlled neural networks’, European Journal of
Sport Science, vol. 15, no. 4, pp. 249–255.
Kitchenham, B. (2004) Procedures for Performing Systematic Reviews: Joint Technical Report
[Online], Software Engineering Group, Department of Computer Science, Keele University
(TR/SE-0401). Available at https://pdfs.semanticscholar.org/d36f/
e9d7839596d58fa008121db57dc7cadda338.pdf?_ga=2.108699457.215578761.1512665710-
1501334832.1512665710 (Accessed 7 December 2017).
Kitchenham, B. and Charters, S. (2007) Guidelines for performing Systematic Literature Reviews
in Software Engineering [Online], Keele University and Durham University Joint Report (EBSE
2007-001) (Accessed 7 December 2017).
Kolaczyk, E. D. and Csárdi, G. (2014) Statistical Analysis of Network Data with R, New York,
Springer New York.
Kröckel, P. (2017) ‘Decision Support Enhancement for Player Substitution in Football: A Design
Science Approach’, in Abramowicz, W., Alt, R. and Franczyk, B. (eds) Business Information Sys-
tems Workshops: BIS 2016 International Workshops, Leipzig, Germany, July 6-8, 2016, Revised
Papers, Cham, Springer International Publishing, pp. 357–366.
Kröckel, P. and Piazza, A. (2017) ‘Tactical Insights from an Underdog Team: Network analysis of
Iceland in the Euro 2016 against the teams of Portugal and England’, European Conference on
Social Networks (EUSN) [Online]. Available at https://www.eusn2017.uni-mainz.de/files/2016/08/
EUSN2017_Book-of-Abstracts_25_09.pdf.
Kröckel, P., Piazza, A. and Neuhofer, K. (2017) ‘Dynamic Network Analysis of the Euro2016 Fi-
nal: Preliminary Results’, 2017 5th International Conference on Future Internet of Things and
Cloud Workshops (FiCloudW). Prague, 21-23 August, IEEE, pp. 114–119.
Kuper, S. and Szymanski, S. (2014) Soccernomics: Why England loses, why Germany and Brazil
win, and why the U.S., Japan, Australia, Turkey and even India are destined to become the kings of
the world's most popular sport, London, HarperSport.
Kuznetsov, N., Bonnette, S. and Riley, M. A. (2014) ‘Nonlinear time series methods for analyzing
behavioural sequences’, in Davids, K., Hristovski, R. and Araújo, D. (eds) Complex systems in
sport, London, New York, Routledge.
Laird, P. and Waters, L. (2008) ‘Eyewitness Recollection of Sport Coaches’, International Journal
of Performance Analysis in Sport, vol. 8, no. 1, pp. 76–84.
Lames, M. and McGarry, T. (2007) ‘On the search for reliable performance indicators in game
sports’, International Journal of Performance Analysis in Sport, vol. 7, no. 1, pp. 62–79 [Online].
Available at http://www.ingentaconnect.com/content/uwic/ujpa/2007/00000007/00000001/
art00008.
Larose, D. T. and Larose, C. D. (2015) Data mining and predictive analytics, Hoboken, New Jer-
sey, John Wiley & Sons.
Leemans, S. J. J., Fahland, D. and van der Aalst, W. M. P. (2014) Process and Deviation Exploration
with Inductive visual Miner [Online]. Available at http://www.processmining.org/_media/
blogs/pub2014/bpmdemoleemans.pdf.
Lees, A. (2002) ‘Technique analysis in sports: A critical review’, Journal of sports sciences,
vol. 20, no. 10, pp. 813–828.
Lenahan, T. and Solari, S. (2002) ‘The right players, right system: Choosing lineups, changing sys-
tems, making substitutions in Real Madrid's title run’, Soccer Journal, vol. 47, no. 4, pp. 17–21.
Lerman, K., Ghosh, R. and Kang, J. H. (2010) Centrality Metric for Dynamic Networks [Online].
Available at http://arxiv.org/pdf/1006.0526v1.
Lewis, T. (2014) ‘How computer analysts took over at Britain's top football clubs’, The Guardian,
9 March [Online]. Available at https://www.theguardian.com/football/2014/mar/09/premier-league-
football-clubs-computer-analysts-managers-data-winning (Accessed 7 August 2017).
Li, S., Xie, Y., Farajtabar, M., Verma, A. and Song, L. (2017) ‘Detecting Changes in Dynamic
Events Over Networks’, IEEE Transactions on Signal and Information Processing over Networks,
vol. 3, no. 2, pp. 346–359.
Li, W., Guo, D., Steeg, G. V. and Galstyan, A. (2017) Unifying Local and Global Change Detec-
tion in Dynamic Networks [Online]. Available at http://arxiv.org/pdf/1710.03035v1.
Liu, H., Hopkins, W., Gómez, A. M. and Molinuevo, S. J. (2013) ‘Inter-operator reliability of live
football match statistics from OPTA Sportsdata’, International Journal of Performance Analysis in
Sport, vol. 13, no. 3, pp. 803–821 [Online]. DOI: 10.1080/24748668.2013.11868690.
Lusher, D., Robins, G. and Kremer, P. (2010) ‘The Application of Social Network Analysis to
Team Sports’, Measurement in Physical Education and Exercise Science, vol. 14, no. 4, pp. 211–
224.
Mackenzie, R. and Cushion, C. (2013) ‘Performance analysis in football: A critical review and
implications for future research’, Journal of sports sciences, vol. 31, no. 6, pp. 639–676.
Macrae, D. (2017) STATS Edge Offers Football Clubs AI-Assisted Data Analytics On Opponents
[Online]. Available at http://www.silicon.co.uk/data-storage/bigdata/football-big-data-
218319?inf_by=5979c6c0671db807348b46b6 (Accessed 7 August 2017).
Magnusson, M. S. (2000) ‘Discovering hidden time patterns in behavior: T-patterns and their de-
tection’, Behavior Research Methods, Instruments, & Computers, vol. 32, no. 1, pp. 93–110.
Malik, S. (2005) Enterprise dashboards: Design and best practices for IT, Hoboken, N.J., Wiley.
Malyon, E. (2016) ‘Euro 2016 power rankings: All 24 teams rated ahead of the March international
friendlies’, Mirror, 23 March [Online]. Available at https://www.mirror.co.uk/sport/football/news/
euro-2016-power-rankings-24-7600082.
Marchette, D. (2012) ‘Scan statistics on graphs’, Wiley Interdisciplinary Reviews: Computational
Statistics, vol. 4, no. 5, pp. 466–473.
Mariano, D. C. B., Leite, C., Santos, L. H. S., Rocha, R. E. O. and Melo-Minardi, R. C. d. (2017) A
guide to performing systematic literature reviews in bioinformatics [Online]. Available at https://
arxiv.org/abs/1707.05813.
Maslovat, D. and Franks, I. M. (2008) ‘The need for feedback’, in Hughes, M. and Franks, I. M.
(eds) The essentials of performance analysis: An introduction, Milton Park, Abingdon, Oxon, New
York, Routledge, pp. 1–7.
Mayring, P. (2014) Qualitative content analysis: Theoretical foundation, basic procedures and
software solution [Online], Klagenfurt. Available at http://nbn-resolving.de/urn:nbn:de:0168-ssoar-
395173.
McCulloh, I. (2009) Detecting Changes in a Dynamic Social Network, Doctoral dissertation, Pitts-
burgh, PA, USA, Carnegie Mellon University [Online]. Available at https://dl.acm.org/
citation.cfm?id=1714092.
McCulloh, I. and Carley, K. M. (2008a) Dynamic Network Change Detection [Online]. Available
at http://www.casos.cs.cmu.edu/publications/papers/2008DynamicNetworkChange.pdf (Accessed 3
January 2018).
McCulloh, I. and Carley, K. M. (2008b) ‘Social Network Change Detection’, SSRN Electronic
Journal [Online]. DOI: 10.2139/ssrn.2726799.
McCulloh, I. and Carley, K. M. (2011) ‘Detecting Change in Longitudinal Social Networks’, Jour-
nal of Social Structure, vol. 12, no. 3, pp. 1–37 [Online]. Available at https://www.cmu.edu/joss/
content/articles/volindex.html.
McDonald, M. (1984) ‘Avoiding the pitfalls of player selection’, Coaching Science Update,
pp. 41–45.
McGann, R. (2014) ‘The use of GPS in sport’ [Blog], Online, Metrifit. Available at http://
metrifit.com/blog/gps-in-sport/ (Accessed 29 November 2017).
McGarry, T. (2009) ‘Applied and theoretical perspectives of performance analysis in sport: Scien-
tific issues and challenges’, International Journal of Performance Analysis in Sport, vol. 9, no. 1,
pp. 128–140 [Online]. Available at http://www.ingentaconnect.com/content/uwic/ujpa/2009/
00000009/00000001/art00011.
McGarry, T., Anderson, D. I., Wallace, S. A., Hughes, M. D. and Franks, I. M. (2002) ‘Sport com-
petition as a dynamical self-organizing system’, Journal of sports sciences, vol. 20, no. 10,
pp. 771–781.
McLean, S., Salmon, P. M., Gorman, A. D., Naughton, M. and Solomon, C. (2017) ‘Do inter-
continental playing styles exist?: Using social network analysis to compare goals from the 2016
EURO and COPA football tournaments knock-out stages’, Theoretical Issues in Ergonomics Sci-
ence, vol. 18, no. 4, pp. 370–383.
Memmert, D., Lemmink, K. A. P. M. and Sampaio, J. (2017) ‘Current Approaches to Tactical Per-
formance Analyses in Soccer Using Position Data’, Sports medicine (Auckland, N.Z.), vol. 47,
no. 1, pp. 1–10.
Memmert, D. and Raabe, D. (2017) Revolution im Profifußball: Mit Big Data zur Spielanalyse 4.0,
Berlin, Springer.
Mendes, R. S., Clemente, F. M. and Martins, F. M. L. (2015) ‘Network analysis of Portuguese
team on FIFA WorldCup 2014’, Revista de Ciencias del Deporte, vol. 11, no. 2, pp. 225–226.
Moher, D., Liberati, A., Tetzlaff, J. and Altman, D. G. (2009) ‘Preferred reporting items for sys-
tematic reviews and meta-analyses: The PRISMA statement’, PLoS medicine, vol. 6, no. 7,
e1000097.
Moody, J., McFarland, D. and Bender‐deMoll, S. (2005) ‘Dynamic Network Visualization’, Amer-
ican Journal of Sociology, vol. 110, no. 4, pp. 1206–1241 [Online]. DOI: 10.1086/421509.
Myers, B. R. (2012) ‘A Proposed Decision Rule for the Timing of Soccer Substitutions’, Journal of
Quantitative Analysis in Sports, vol. 8, no. 1.
Neil, J., Hash, C., Brugh, A., Fisk, M. and Storlie, C. B. (2013) ‘Scan Statistics for the Online De-
tection of Locally Anomalous Subgraphs’, Technometrics, vol. 55, no. 4, pp. 403–414.
Nevill, A., Atkinson, G. and Hughes, M. (2008) ‘Twenty-five years of sport performance research
in the Journal of Sports Sciences’, Journal of sports sciences, vol. 26, no. 4, pp. 413–426.
O'Brien, S. (2018) ‘FUTURISTIC Arsene Wenger predicts robotic managers and substitutions being
decided on social media in the future’, talkSPORT, 2018 [Online]. Available at https://
talksport.com/football/447960/arsene-wenger-robotic-managers-substitutions-social-media-future/.
O'Connor, D. (2013) ‘Coaching practice: turning the camera on yourself’, in Nunome, H., Drust, B.
and Dawson, B. (eds) Science and Football VII: The proceedings of the Seventh World Congress
on Science and Football, Hoboken, Taylor and Francis, pp. 397–402.
O'Donoghue, P. (2009) ‘Interacting Performances Theory’, International Journal of Performance
Analysis in Sport, vol. 9, no. 1, pp. 26–46.
O'Donoghue, P. (2010) Research methods for sports performance analysis, London, New York,
Routledge.
Ogden, M. (2011) ‘Manchester United's Ryan Giggs and Rio Ferdinand buy into cutting-edge
methods at Carrington’, The Telegraph, 21 January.
OPTA Sports (2017a) Introduction to Opta's Classic Data feeds [Online] (Accessed 28 November
2017).
OPTA Sports (2017b) Introduction to Opta's Core Data feeds [Online] (Accessed 28 November
2017).
OPTA Sports (2017c) Introduction to Opta's Performance Data feeds [Online]. Available at http://
www.optasports.com/services/media/data-feeds/performance-data-feeds.aspx (Accessed 28 No-
vember 2017).
OPTA Sports (2017d) Opta Overview [Online]. Available at http://www.optasports.com/media/
94007/gd-274_opta-overview-v3.pdf (Accessed 3 December 2017).
OPTA Sports (2017e) Partnerships [Online]. Available at http://www.optasportspro.com/about/
partnerships/ (Accessed 30 November 2017).
Orchard, J. (2012) ‘More research is needed into the effects on injury of substitute and interchange
rules in team sports’, British journal of sports medicine, vol. 46, no. 10, pp. 694–695.
Passos, P., Davids, K., Araújo, D., Paz, N., Minguéns, J. and Mendes, J. (2011) ‘Networks as a
novel tool for studying team ball sports as complex social systems’, Journal of science and medi-
cine in sport, vol. 14, no. 2, pp. 170–176.
Paul, S. (2016) ‘Euro 2016: Roy Hodgson defends controversial substitution in England draw’,
caughtoffside, 2016 [Online]. Available at https://www.caughtoffside.com/2016/06/12/euro-2016-
roy-hodgson-defends-controversial-substitution-in-england-draw/.
Pearce, B. and Vladimirov, M. (2014) ‘Interview with the Liverpool FC Head of Opposition Anal-
ysis’, Tomkins & Times, 2014 [Online]. Available at http://tomkinstimes.com/2014/09/interview-
with-lfcs-head-of-opposition-analysis/.
Peña, J. L. and Touchette, H. (2012) A network theory analysis of football strategies [Online].
Available at http://arxiv.org/pdf/1206.6904v1.
Perin, C., Vuillemot, R. and Fekete, J.-D. (2013) ‘SoccerStories: a kick-off for visual soccer analy-
sis’, IEEE transactions on visualization and computer graphics, vol. 19, no. 12, pp. 2506–2515.
Perl, J. (2001) ‘Artificial Neural Networks in Sports: New Concepts and Approaches’, Internation-
al Journal of Performance Analysis in Sport, vol. 1, no. 1, pp. 106–121.
Perl, J. (2002) ‘Game analysis and control by means of continuously learning networks’, Interna-
tional Journal of Performance Analysis in Sport, vol. 2, no. 1, pp. 21–35.
Perl, J. (2004) ‘A neural network approach to movement pattern analysis’, Human movement sci-
ence, vol. 23, no. 5, pp. 605–620.
Perl, J. and Dauscher, P. (2006) ‘Dynamic Pattern Recognition in Sport by Means of Artificial
Neural Networks’, in Nguyen, N., Begg, R. and Palaniswami, M. (eds) Computational Intelligence
for Movement Sciences, IGI Global, pp. 299–319.
Perl, J., Grunz, A. and Memmert, D. (2013) ‘Tactics Analysis in soccer: an advanced approach’,
International Journal of Computer Science in Sport, no. 12, pp. 33–44 [Online]. Available at http://
www.iacss.org/index.php?id=143.
Perl, J. and Memmert, D. (2016) ‘Soccer analyses by means of artificial neural networks, automatic
pass recognition and Voronoi-cells: An approach of measuring tactical success’, in Chung, P.,
Soltoggio, A., Dawson, C. W., Meng, Q. and Pain, M. (eds) Proceedings of the 10th International
Symposium on Computer Science in Sports (ISCSS), Cham, Springer International Publishing,
pp. 77–84.
Pfeiffer, M. and Perl, J. (2006) ‘Analysis of Tactical Structures in Team Handball by Means of
Artificial Neural Networks’, International Journal of Computer Science in Sport, vol. 5, no. 1,
pp. 4–14.
Power, D. J. (2013) Decision Support, Analytics, and Business Intelligence, Second Edition, New
York, Business Expert Press.
Premier League (2017) How clubs use GPS to find top level [Online]. Available at https://
www.premierleague.com/news/436544 (Accessed 30 November 2017).
PrNewsWire.com (2017) WellPlayed Research Predicts Global Market for Technologies that
Boost Professional Sports Performance to Reach $1 Billion USD in Five Years [Online] (Ac-
cessed 15 November 2017).
ProM (2017a) Questions Answered Based on an Event Log Only [Online]. Available at http://
www.promtools.org/doku.php?id=tutorial:answers.
ProM (2017b) Tutorial on ProM 6 [Online]. Available at http://www.promtools.org/
doku.php?id=tutorial:introduction.
Purnomo, H. D. and Wee, H.-M. (2015) ‘Soccer game optimization with substitute players’, Jour-
nal of Computational and Applied Mathematics, vol. 283, pp. 79–90.
Rampinini, E., Bishop, D., Marcora, S. M., Ferrari Bravo, D., Sassi, R. and Impellizzeri, F. M.
(2007) ‘Validity of simple field tests as indicators of match-related physical performance in top-
level professional soccer players’, International journal of sports medicine, vol. 28, no. 3, pp. 228–
235.
RapidProM (2017) Social Network Miner RapidProM - Description [Online].
Reed, D. and Hughes, M. (2006) ‘An Exploration of Team Sport as a Dynamical System’, Interna-
tional Journal of Performance Analysis in Sport, vol. 6, no. 2, pp. 114–125 [Online]. Available
at http://www.ingentaconnect.com/content/uwic/ujpa/2006/00000006/00000002/art00011.
Reep, C. and Benjamin, B. (1968) ‘Skill and Chance in Association Football’, Journal of the Royal
Statistical Society. Series A (General), vol. 131, no. 4, p. 581.
Reilly, T. (1998) ‘Introduction to science and soccer’, in Reilly, T. (ed) Science and soccer, London,
E & FN Spon, pp. 1–7.
Rein, R. and Memmert, D. (2016) ‘Big data and tactical analysis in elite soccer: Future challenges
and opportunities for sports science’, SpringerPlus, vol. 5, no. 1410.
Rey, E., Lago-Ballesteros, J. and Padrón-Cabo, A. (2015) ‘Timing and tactical analysis of player
substitutions in the UEFA Champions League’, International Journal of Performance Analysis in
Sport, vol. 15, no. 3, pp. 840–850.
Ribeiro, J., Silva, P., Duarte, R., Davids, K. and Garganta, J. (2017) ‘Team Sports Performance
Analysed Through the Lens of Social Network Theory: Implications for Research and Practice’,
Sports medicine, vol. 47, no. 9, pp. 1689–1696.
Rodrigues, P., Belguinha, A., Gomes, C., Cardoso, P., Vilas, T., Mestre, R. and Rodrigues, J. M.
(2013) ‘Open Source Technologies Involved in Constructing a Web-Based Football Information
System’, in Rocha, A., Correia, A., Wilson, T. and Stroetmann, K. (eds) Advances in Information
Systems and Technologies: Advances in Intelligent Systems and Computing, Berlin, Heidelberg,
Springer, pp. 715–723.
Rossetti, G. and Cazabet, R. (2017) Community Discovery in Dynamic Networks: A Survey
[Online]. Available at http://arxiv.org/pdf/1707.03186v2.
Rozinat, A. (2010) ProM Tips — Which Mining Algorithm Should You Use? [Online]. Available
at https://fluxicon.com/blog/2010/10/prom-tips-mining-algorithm/.
Rozinat, A. (2011) How Big Data Relates to Process Mining – And How It Doesn’t [Online], Flux-
icon Blog. Available at https://fluxicon.com/blog/2011/12/how-big-data-relates-to-process-mining-
and-how-it-doesnt/.
Rozinat, A. (2012) Data Requirements for Process Mining [Online]. Available at https://
fluxicon.com/blog/2012/02/data-requirements-for-process-mining/.
Rozinat, A. and Günther, C. W. (2015) Data Science of Process Mining – Understanding Complex
Processes [Online]. Available at https://www.kdnuggets.com/2015/09/data-science-process-
mining-understanding-complex-processes.html.
Saunders, B., Sim, J., Kingstone, T., Baker, S., Waterfield, J., Bartlam, B., Burroughs, H. and
Jinks, C. (2017) ‘Saturation in qualitative research: Exploring its conceptualization and operation-
alization’, Quality & Quantity, pp. 1–15 [Online]. DOI: 10.1007/s11135-017-0574-8.
Schrapf, N., Alsaied, S. and Tilp, M. (2017) ‘Tactical interaction of offensive and defensive teams
in team handball analysed by artificial neural networks’, Mathematical and Computer Modelling of
Dynamical Systems, vol. 23, no. 4, pp. 363–371.
Shi, L., Wang, C., Wen, Z., Qu, H., Lin, C. and Liao, Q. (2015) ‘1.5D Egocentric Dynamic Net-
work Visualization’, IEEE transactions on visualization and computer graphics, vol. 21, no. 5,
pp. 624–637.
Si, J., Nelson, B. J. and Runger, G. C. (2003) ‘Artificial Neural Network Models for Data Mining’,
in Ye, N. (ed) The handbook of data mining, Mahwah, N.J., London, Lawrence Erlbaum Associ-
ates, pp. 41–66.
Silva, P., Duarte, R., Esteves, P., Travassos, B. and Vilar, L. (2016) ‘Application of entropy
measures to analysis of performance in team sports’, International Journal of Performance Analy-
sis in Sport, vol. 16, no. 2, pp. 753–768.
Silva, R. M. and Swartz, T. B. (2016) ‘Analysis of substitution times in soccer’, Journal of Quanti-
tative Analysis in Sports, vol. 12, no. 3, pp. 113–122.
Sloot, P. M. A., Kampis, G. and Gulyás, L. (2013) ‘Advances in dynamic temporal networks: Un-
derstanding the temporal dynamics of complex adaptive networks’, The European Physical Journal
Special Topics, vol. 222, no. 6, pp. 1287–1293.
Smith, R. (2017) ‘How Arsenal and Arsène Wenger Bought Into Analytics’, The New York Times,
3 February, B10 [Online]. Available at https://www.nytimes.com/2017/02/03/sports/soccer/arsenal-
arsene-wenger-analytics.html (Accessed 7 August 2017).
Song, M., Günther, C. W. and van der Aalst, W. M. P. (2009) ‘Trace Clustering in Process Min-
ing’, in Ardagna, D., Mecella, M. and Yang, J. (eds) Business Process Management Workshops,
Berlin, Heidelberg, Springer Berlin Heidelberg, pp. 109–120.
Sporis, G., Jukic, I., Ostojic, S. M. and Milanovic, D. (2009) ‘Fitness profiling in soccer: Physical
and physiologic characteristics of elite players’, Journal of strength and conditioning research,
vol. 23, no. 7, pp. 1947–1953.
Stein, M., Janetzko, H., Seebacher, D., Jäger, A., Nagel, M., Hölsch, J., Kosub, S., Schreck, T.,
Keim, D. and Grossniklaus, M. (2017) ‘How to Make Sense of Team Sport Data: From Acquisition
to Data Modeling and Research Aspects’, Data, vol. 2, no. 1, p. 2.
Travassos, B., Araujo, D., Correia, V. and Esteves, P. (2010) ‘Eco-Dynamics Approach to the
study of Team Sports Performance’, The Open Sports Sciences Journal, vol. 3, no. 1, pp. 56–57.
Travassos, B., Davids, K., Araújo, D. and Esteves, P. T. (2013) ‘Performance analysis in team
sports: Advances from an Ecological Dynamics approach’, International Journal of Performance
Analysis in Sport, vol. 13, no. 1, pp. 83–95 [Online]. Available at http://www.ingentaconnect.com/
content/uwic/ujpa/2013/00000013/00000001/art00008.
Trequattrini, R., Lombardi, R. and Battista, M. (2015) ‘Network analysis and football team perfor-
mance: A first application’, Team Performance Management: An International Journal, vol. 21,
no. 1/2, pp. 85–110.
Trier, M. (2008) ‘Research Note—Towards Dynamic Visualization for Understanding Evolution of
Digital Communication Networks’, Information Systems Research, vol. 19, no. 3, pp. 335–350.
Tümer, I. (2016) Determinants of player substitution in football: A qualitative study, Bachelor thesis,
Germany, FAU Erlangen-Nuremberg.
UEFA (2016a) England vs. Russia - Official match statistics [Online]. Available at https://
www.uefa.com/uefaeuro/season=2016/matches/round=2000448/match=2017879/postmatch/
index.html.
UEFA (2016b) Match statistics of England vs. Iceland in the European Championship 2016
[Online]. Available at https://www.uefa.com/uefaeuro/season=2016/matches/round=2000744/
match=2018003/postmatch/statistics/index.html.
UEFA (2016c) UEFA EURO 2016 technical report [Online]. Available at http://www.uefa.com/
MultimediaFiles/Download/TechnicalReport/competitions/EURO/02/40/26/69/
2402669_DOWNLOAD.pdf.
van der Aalst, W., Adriansyah, A., Medeiros, A. K. A. de, Arcieri, F., Baier, T., Blickle, T., Bose,
J. C., van den Brand, P., Brandtjen, R., Buijs, J., Burattin, A., Carmona, J., Castellanos, M., Claes,
J., Cook, J., Costantini, N., Curbera, F., Damiani, E., Leoni, M. de, Delias, P., van Dongen, B. F.,
Dumas, M., Dustdar, S., Fahland, D., Ferreira, D. R., Gaaloul, W., van Geffen, F., Goel, S., Gün-
ther, C., Guzzo, A., Harmon, P., ter Hofstede, A., Hoogland, J., Ingvaldsen, J. E., Kato, K., Kuhn,
R., Kumar, A., La Rosa, M., Maggi, F., Malerba, D., Mans, R. S., Manuel, A., McCreesh, M.,
Mello, P., Mendling, J., Montali, M., Motahari-Nezhad, H. R., Zur Muehlen, M., Munoz-Gama, J.,
Pontieri, L., Ribeiro, J., Rozinat, A., Seguel Pérez, H., Seguel Pérez, R., Sepúlveda, M., Sinur, J.,
Soffer, P., Song, M., Sperduti, A., Stilo, G., Stoel, C., Swenson, K., Talamo, M., Tan, W., Turner,
C., Vanthienen, J., Varvaressos, G., Verbeek, E., Verdonk, M., Vigo, R., Wang, J., Weber, B.,
Weidlich, M., Weijters, T., Wen, L., Westergaard, M. and Wynn, M. (2012) ‘Process Mining Man-
ifesto’, in Daniel, F., Barkaoui, K. and Dustdar, S. (eds) Business Process Management Workshops,
Berlin, Heidelberg, Springer Berlin Heidelberg, pp. 169–194.
van der Aalst, W. M. P. (2011) Process Mining, Berlin, Heidelberg, Springer Berlin Heidelberg.
van der Aalst, W. M. P., Reijers, H. A. and Song, M. (2005) ‘Discovering Social Networks from
Event Logs’, Computer Supported Cooperative Work (CSCW), vol. 14, no. 6, pp. 549–593.
van der Aalst, W. M. P. and Song, M. (2004) ‘Mining Social Networks: Uncovering Interaction
Patterns in Business Processes’, in Kanade, T., Kittler, J., Kleinberg, J. M., Mattern, F., Mitchell, J.
C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., Sudan, M., Terzopoulos, D., Tygar, D.,
Vardi, M. Y., Weikum, G., Desel, J., Pernici, B. and Weske, M. (eds) Business Process Manage-
ment, Berlin, Heidelberg, Springer Berlin Heidelberg, pp. 244–260.
Varela-Quintana, C., del Corral Cuervo, J. and Prieto-Rodríguez, J. (2016) ‘The effect of an addi-
tional substitution in association football.: Evidence from the Italian Serie A’, Revista de Psi-
cología del Deporte, vol. 25, no. 1, pp. 101–105.
Veiga, G. M. (2009) Developing Process Mining Tools: An Implementation of Sequence Clustering
for ProM, Lisbon, Portugal, IST – Technical University of
Lisbon [Online]. Available at https://fenix.tecnico.ulisboa.pt/downloadFile/395139104449/
Dissertacao_54276.pdf.
Venkatesh, V., Brown, S. A. and Bala, H. (2013) ‘Bridging the Qualitative-Quantitative Divide:
Guidelines for Conducting Mixed Methods Research in Information Systems’, MIS Quarterly, vol. 37, no. 1,
pp. 21–54 [Online]. Available at https://misq.org/bridging-the-qualitative-quantitative-divide-
guidelines-for-conducting-mixed-methods-research-in-information-systems.html (Accessed 17
October 2016).
Vilar, L., Araújo, D., Davids, K. and Button, C. (2012) ‘The role of ecological dynamics in analys-
ing performance in team sports’, Sports medicine (Auckland, N.Z.), vol. 42, no. 1, pp. 1–10.
Wang, X. F. and Chen, G. (2003) ‘Complex networks: Small-world, scale-free and beyond’, IEEE
Circuits and Systems Magazine, vol. 3, no. 1, pp. 6–20.
Wäsche, H., Dickson, G., Woll, A. and Brandes, U. (2017) ‘Social network analysis in sport re-
search: An emerging paradigm’, European Journal for Sport and Society, vol. 14, no. 2, pp. 138–
165.
Wehrens, R. and Buydens, L. M. C. (2007) ‘Self- and Super-organizing Maps in R: The kohonen
Package’, Journal of Statistical Software, vol. 21, no. 5 [Online]. Available at https://
www.jstatsoft.org/v021/i05.
Wongsuphasawat, K. and Gotz, D. (2012) ‘Exploring Flow, Factors, and Outcomes of Temporal
Event Sequences with the Outflow Visualization’, IEEE transactions on visualization and comput-
er graphics, vol. 18, no. 12, pp. 2659–2668.