[ENG] Integrating ELO rating system and Inflation

Lucas Ferrantelli
Nov 7, 2020
11 min read

The ELO system, which originated in chess, has taken the world of competitive video games by storm. We are going to see how it can be integrated into a game, what liberties can be taken with the original system, how to adapt it to the game's own systems, and which mistakes to avoid.

Why the ELO

Some competitive games don't use the ELO system at all, in favor of systems that are often much simpler to understand. So why burden yourself with such a complex system?

First of all, the ELO system is actually not very complex, especially given the advantages it offers.

The ELO system is used because it is efficient. It makes it easy to:

A. Classify all players

It may sound silly, but there are actually very few effective ranking systems that can rank every player in the system.

Often, when I make this statement, I am told that the ELO can be replaced by a plain and simple comparison of win rates, for example. However, comparing win rates is not an effective way of ranking: a player who has played only 2 games can have a 100% win rate, and thus sit at the top of the ranking.

Another ranking system that players love is a very simple one: a victory is worth 1 point, a defeat is worth -1 point. I love this system, but it has a lot of problems in this context. First of all, it does not rank players efficiently: two players can easily have the same score, whereas the ELO uses a much wider scale. Secondly, this system does not take into account the player you played against. Thus, a victory against a very good player is rewarded as much as a victory against a novice.

I could go on presenting bad systems, but I think the ELO is interesting enough to deserve our time.

B. Predicting progression

The advantage of the ELO is that, because it follows simple mathematical rules, it is easy to simulate, which lets you anticipate and manage the player's experience.

C. Managing Ties

The ELO manages 3 outcomes in a match: victory, defeat, and tie, which are the 3 outcomes of a chess game. The advantage of the ELO system is that a tie does not amount to "cancel the game" as it does in other ranking systems: in the ELO system, a tie can make a player gain or lose points.

Note, on the other hand, that the ELO handles other outcomes poorly. Perhaps you are looking for a system that takes into account parameters other than a simple victory, defeat, or tie. In this case, you will have to work a little and create a variant of the ELO.

D. A modular system

Because yes, the ELO is just a few formulas with a handful of variables, and these can be changed without too much trouble.

E. A system neither too fast nor too slow

The ELO does not require a set number of games to get to the bottom or top of the rankings. If you don't change the basic formula too much, the ELO handles rises and falls in the ranking quite well.

In addition, the ELO needs very few games to determine a player's ranking.

F. A balance in a closed system

In the ELO, nothing is lost, nothing is created. If the sum of the ELO of all your players was worth 3 million 5 years ago, it will still be worth 3 million.

If you were paying attention, you will have noticed that, unfortunately, a video game is rarely a closed system: new players arrive regularly, and others leave. We will talk about solutions to this problem later.

G. Flow and matchmaking

Finally, the main advantage of ELO in video games is that it makes it easy to add a matchmaking system (which, contrary to popular belief, is not part of the ELO system itself).

What is the ELO system

In this section, we will look at the mathematical formulas of the ELO. I promise, although they can look scary, these formulas don't bite. If, on the other hand, you don't like math at all, you can skip this part.

The principle of the ELO system is really simple: estimate the probability of victory of each party in a game, then compare this theoretical value to the actual outcome. If the outcome was predictable, the players will win and lose a small number of points. The more unlikely the outcome was, the more points the players will exchange.

The term "point exchange" is important here: a player always wins exactly as many points as the opponent loses. This is why ELO works so well in a closed system: there is no creation or destruction of ELO, just exchanges.

Without further ado, let's introduce the ELO system.

A. Initialization

Each member of the system is given a starting ELO value (traditionally between 800 and 1200).

In a confrontation between two players (named A and B), Ra is the current ELO of player A, and Rb is the current ELO of player B.

B. Calculation of the chances of victory

It is then time to calculate the expected outcome of this confrontation, a value between 0 and 1 (0 means that the player has no chance of winning, 1 means that the player cannot lose).

The expected outcome for player A is called Ea:

Ea = 1 / (1 + 10^((Rb - Ra) / 400))

And, of course, we call Eb the expected outcome for player B, which uses the same formula with A and B swapped.

Let's try to understand what this formula implies:

Without writing out the proof, I ask you to take my word that the sum of Ea and Eb is equal to 1. This makes sense: whatever one player gains in chance of winning, the other loses. Thus, Eb = 1 - Ea and vice versa.
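
For readers who do want the proof, here is a short derivation. The symbol x is introduced here only for convenience and stands for the power of ten in the formula:

```latex
\begin{align*}
x &= 10^{(R_b - R_a)/400} \\
E_a &= \frac{1}{1 + x} \\
E_b &= \frac{1}{1 + 10^{(R_a - R_b)/400}} = \frac{1}{1 + 1/x} = \frac{x}{1 + x} \\
E_a + E_b &= \frac{1 + x}{1 + x} = 1
\end{align*}
```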

Moreover, let us notice that if Rb and Ra are equal, we end up with 1/(1+10^0), which is one out of two. This means that if the players have the same ELO, they have the same chance to win as to lose.

You may wonder where numbers as arbitrary as 10 and 400 come from. They are indeed arbitrary constants, and they define the scale of the ELO: for every 400 ELO points of difference, the odds in favor of the stronger player are multiplied by 10.

Example: if a player with 200 ELO faces a player with 1000 ELO, the chance of winning is about 1% for the first player, compared to 99% for the other. If, on the other hand, a player with 600 ELO faces a player with 1000 ELO, the chances are about 9% against 91%.
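
As a quick check of these numbers, here is a minimal Python sketch of the expected-outcome formula. The function name expected_score is my own choice for illustration, not part of any standard library.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected outcome Ea for player A against player B, between 0 and 1."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# 800 points of difference: the weaker player is expected to score about 1%.
print(round(expected_score(200, 1000), 4))

# 400 points of difference: the weaker player is expected to score about 9%.
print(round(expected_score(600, 1000), 4))

# Equal ratings: both players are expected to score one half.
print(expected_score(1000, 1000))
```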

In fact, many ELO systems consider a difference of more than 400 points to be no longer representative: the gap is too big to expect a player to win.

Note: I'm talking here about the "chance to win". This term is a bit dishonest because that's not exactly what it means, but I can't find a simple term to express what it means. For lack of a better term, this term fulfills its function rather well.

C. Theoretical versus real result

Now that we have the theoretical result of the confrontation, all that remains is to have A and B actually play each other. There are 3 possible outcomes: A wins, A loses, or A ties.

Let Sa be the actual outcome for A.

If A wins, Sa = 1

If A loses, Sa = 0

If A ties, Sa = 0.5

So now we just have to see if player A "surprised" us. This is very simple to do: we compare the theoretical score Ea (between 0 and 1) with the real score Sa (between 0 and 1) by subtracting one from the other: Sa - Ea

If the result is higher than 0, the actual score is higher than the expected score, and therefore A surprised us pleasantly. Conversely, if the result is less than 0, A surprised us unpleasantly.

We just have to add this result to the ELO of player A, and take the same value away from player B: our players have just exchanged ELO!

Usually, this result is multiplied by a coefficient (called K). Since Sa - Ea always lies between -1 and 1, K prevents players from gaining or losing at most a single point per game. The value of K is often 32. Chess uses this value of K = 32, but lowers it to K = 16 for players with a lot of ELO.

Overall, the higher K is, the easier it is to exchange a lot of ELO in a short period of time; conversely, the lower K is, the more matches it takes to win or lose a lot of ELO.
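
The whole update can be sketched in a few lines of Python. This is a minimal illustration under the assumptions of this section (a single shared K, Sa equal to 1, 0.5 or 0); the function names are mine.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected outcome Ea for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ra: float, rb: float, sa: float, k: float = 32.0):
    """Return the new (Ra, Rb) after one game.

    sa is the actual outcome for A: 1 for a win, 0.5 for a tie, 0 for a loss.
    """
    delta = k * (sa - expected_score(ra, rb))
    # A receives exactly what B gives up, so the total ELO is unchanged.
    return ra + delta, rb - delta


# Two players of equal rating: the winner takes K / 2 points.
print(update_ratings(1000, 1000, 1.0))

# An upset: the lower-rated player wins and takes most of the K points.
new_a, new_b = update_ratings(600, 1000, 1.0)
print(round(new_a, 1), round(new_b, 1))
```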

How to integrate ELO into your system

Well, now that the ELO has no more secrets for you, it's time to think about implementing it in your system. The raw system is probably not the best solution for you; it may not even be implementable as is.

A. Teamplay

This is obviously the first problem that comes to mind: in a team game, where several players compete against each other, how is the ELO calculated?

Note that the solution will necessarily be debatable. In any case, one of the members of a team will feel aggrieved if he or she knows the underlying system.

Note also that the more the players' ELOs differ, the greater the unfairness. In the perfect case, all players on both teams have exactly the same ELO. The further we move away from this situation, the less perfect the solution will be, hence the value of good matchmaking.

Without further ado, let's look at some possible solutions:

The most intuitive one, which probably came to your mind, is to average the ELO of each team and have the two averages compete against each other. This is indeed a very good answer. We calculate the average ELO of Team A and Team B, compare the theoretical result to the real result, and add this value to (or subtract it from) the personal ELO of each player. This solution is very good in the sense that, with teams of equal size, it neither creates nor destroys ELO: it respects the stability rule. On the other hand, it can be very unfair to the players: no matter the ELO of each player on the team, they all win or lose the same number of points, as if the player with 2500 ELO points and the player with 800 ELO points on the same team were equally responsible for the defeat or victory.
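
A minimal Python sketch of this team-average method, assuming both teams have the same number of players. The function names and the example ratings are my own choices for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_team_average(team_a, team_b, sa, k=32.0):
    """Team average against team average; every player gets the same change.

    team_a and team_b are lists of ratings, sa is the outcome for team A.
    Assumes equal team sizes, otherwise the total ELO is not conserved.
    """
    avg_a = sum(team_a) / len(team_a)
    avg_b = sum(team_b) / len(team_b)
    delta = k * (sa - expected_score(avg_a, avg_b))
    return [r + delta for r in team_a], [r - delta for r in team_b]


# The 2500 and 800 players share the same average as their opponents,
# so all four players move by the same amount, whatever their own rating.
print(update_team_average([2500, 800], [1650, 1650], 1.0))
```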

Another system is to disregard the team. All the players on each team are sorted according to their ELO. We then pit the highest-rated player of each team against each other, do the same for the second-highest of each team, and so on. It's as if each player had only played against one other player.

This system, once again, respects the principle of stability. On the other hand, I don't find it fully satisfactory, since each player's result only takes into account one opposing player.
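
This second method could be sketched as follows, again assuming teams of equal size. The names are hypothetical, and the pairing by sorted position is my reading of the description above.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_pairwise_by_rank(team_a, team_b, sa, k=32.0):
    """Pair players by rank inside their team, then apply the 1v1 update."""
    # Indices of each team sorted from the highest to the lowest rating.
    order_a = sorted(range(len(team_a)), key=lambda i: team_a[i], reverse=True)
    order_b = sorted(range(len(team_b)), key=lambda i: team_b[i], reverse=True)
    new_a, new_b = list(team_a), list(team_b)
    for ia, ib in zip(order_a, order_b):
        delta = k * (sa - expected_score(team_a[ia], team_b[ib]))
        new_a[ia] += delta  # each pair exchanges points like a normal duel
        new_b[ib] -= delta
    return new_a, new_b


# The two 1500 players are paired together, as are the two 1200 players.
print(update_pairwise_by_rank([1200, 1500], [1500, 1200], 1.0))
```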

The third system that I intend to present to you is less intuitive. It consists of pitting the personal ELO of a player against the average ELO of the opposing team. Imagine a 2vs2 game, so we have 4 different ELOs, noted Ra1, Ra2, Rb1, and Rb2. In addition to that, we compute the average ELO of the two Ra, noted Rta, and the average of the other team, noted Rtb.

We then apply the original formula four times: Ra1 against Rtb, Ra2 against Rtb, Rb1 against Rta, and Rb2 against Rta.

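Although it is not obvious at first glance, this system stays close to the principle of stability, but it does not guarantee it: the points gained and lost cancel out exactly only in particular cases, such as a 2vs2 game where both teams have the same average ELO. When the team averages differ, a small amount of ELO can be created or destroyed. The advantage of this system is that the average ELO of the opposing team is taken into account. Obviously, the problem is that the ELO of the allies is not taken into account at all.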
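
Here is a minimal sketch of this third method, with hypothetical function names. It also prints the total ELO before and after the game, so that the effect on the stability of the system can be checked for a given lineup.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_vs_opposing_average(team_a, team_b, sa, k=32.0):
    """Each player's own rating is compared to the opposing team's average."""
    avg_a = sum(team_a) / len(team_a)  # Rta
    avg_b = sum(team_b) / len(team_b)  # Rtb
    sb = 1.0 - sa
    new_a = [r + k * (sa - expected_score(r, avg_b)) for r in team_a]
    new_b = [r + k * (sb - expected_score(r, avg_a)) for r in team_b]
    return new_a, new_b


def total(team_a, team_b):
    return sum(team_a) + sum(team_b)


# Lineup 1: both teams have the same average (1300).
# Lineup 2: the team averages differ (1300 against 1000).
for team_a, team_b in [([1600, 1000], [1400, 1200]),
                       ([1600, 1000], [1000, 1000])]:
    new_a, new_b = update_vs_opposing_average(team_a, team_b, 1.0)
    print(round(total(team_a, team_b), 2), "->", round(total(new_a, new_b), 2))
```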

Here, I have presented 3 simple methods that I recommend. As you have seen, none of them is fully satisfactory; all three have their flaws. It's up to you to see which one best suits your game. Note that the best system for your game probably also depends on your matchmaking system, the number of players in each team, and the number of players in your community.

B. Changing constants

A subject that is a little less of a headache: constants!

In reality, there are 3 modifiable constants in the ELO. The first two are the 10 and the 400 of the first formula (Ea = 1 / (1 + 10^((Rb - Ra) / 400))). I don't recommend that you touch them. Changing them just makes the final value of the ELO counterintuitive. Your system will work just as well, but the scale will no longer have anything to do with that of the classic system. Anyone who has ever played a game with ELO knows that 2000 is very good, and 3000 is out of the ordinary. By modifying these two constants, you'll just have a scale that's not very readable.

On the other hand, the last constant, K, is very important: it's the number of points at stake during a game. As said before, it is often 16 or 32. Nothing prevents you from changing it. The higher K is, the more a single game matters; the lower K is, the less it matters.

Also, within your system, you can integrate different values of K. Often, K is given a very high value for new players, allowing them to be quickly ranked at their real level.
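
A possible sketch of a K that depends on experience. The thresholds and values below (10 placement games, K = 64 then K = 32) are purely hypothetical and should be tuned for your own game.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def k_factor(games_played, placement_games=10, k_new=64.0, k_regular=32.0):
    """Hypothetical rule: a higher K while the player is still being placed."""
    return k_new if games_played < placement_games else k_regular


def update_with_personal_k(ra, rb, sa, games_a, games_b):
    """Each player applies their own K to their own surprise value."""
    ea = expected_score(ra, rb)
    eb = 1.0 - ea
    sb = 1.0 - sa
    new_ra = ra + k_factor(games_a) * (sa - ea)
    new_rb = rb + k_factor(games_b) * (sb - eb)
    return new_ra, new_rb


# A brand-new player beats a veteran of equal rating.
print(update_with_personal_k(1000, 1000, 1.0, games_a=0, games_b=100))
```

Note that when the two players use different values of K, the points gained by one no longer equal the points lost by the other, so this convenience has a cost in terms of stability.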

C. Combining ELO with another system

If you were unfamiliar with the ELO, there is a good chance that you associated it with divisions (Silver, Gold, Platinum, Diamond, etc.). However, the ELO itself contains no such ranking. These divisions exist to make progression in the ranking more attractive. Indeed, a simple number like the ELO score is often not motivating enough for players. Coupling it with another system is often a good idea. This system can be cumulative or complementary.

Many systems use a hidden ELO, which then allows them to define the ranking of their other system.
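
As an illustration only, a hidden ELO could be mapped to a visible division like this. The thresholds are invented for the example; real games choose their own.

```python
import bisect

# Hypothetical thresholds: below 1200 is Silver, from 1200 Gold, and so on.
DIVISION_THRESHOLDS = [1200, 1600, 2000]
DIVISION_NAMES = ["Silver", "Gold", "Platinum", "Diamond"]


def division_for(hidden_elo: float) -> str:
    """Return the visible division that corresponds to a hidden ELO."""
    index = bisect.bisect_right(DIVISION_THRESHOLDS, hidden_elo)
    return DIVISION_NAMES[index]


for elo in (950, 1200, 1750, 2500):
    print(elo, division_for(elo))
```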

D. Mixed Victory or Minimal Loss

As seen previously, the ELO manages by default only 3 outcomes: defeat, victory, or tie. In reality, it is possible to add other outcomes; the important thing is that the outcomes of the two players add up to 1 (currently, Sa is equal to 1 if Sb is equal to 0, and vice versa; if Sa is equal to 0.5, so is Sb). Thus, it is possible to create a mixed victory case, which would be worth 0.75 for example. Or even a system that adds +0.1 for each remarkable action performed.

I specify once again that the sum of the two parties must be equal to 1; otherwise, you risk creating or destroying ELO points, leading to situations of inflation or deflation.
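
A small sketch of such extra outcomes. The outcome names and the 0.75 / 0.25 values are examples of mine; the only real constraint, taken from the text above, is that Sa + Sb = 1.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hypothetical outcome table, seen from player A's side.
OUTCOME_SCORES = {
    "win": 1.0,
    "mixed_win": 0.75,
    "tie": 0.5,
    "mixed_loss": 0.25,
    "loss": 0.0,
}


def update_with_outcome(ra, rb, outcome_a, k=32.0):
    """Apply the usual update with a custom score for A; B gets 1 - Sa."""
    sa = OUTCOME_SCORES[outcome_a]
    sb = 1.0 - sa  # guarantees that the two scores add up to 1
    ea = expected_score(ra, rb)
    eb = 1.0 - ea
    return ra + k * (sa - ea), rb + k * (sb - eb)


# A mixed victory between equal players moves half as many points as a win.
print(update_with_outcome(1000, 1000, "mixed_win"))
```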

E. Inflation, balance management and voluntary exchange of ELOs

It is now time for the warnings. The perfect theoretical system has been presented, let's see how it can be circumvented, manipulated, and deceived.

First of all, let's remember that in a closed system, the ELO is never destroyed or created; it is simply exchanged by the players. Unfortunately, the arrival of a player adds ELO, and the departure of a player removes ELO. The most problematic case is the creation of ELO, especially if it is done intentionally to inflate another player's ELO. One effective solution is the creation of barriers to entry. These are prerequisites a player must meet before being given an ELO (i.e., before entering the system). This barrier can be monetary (e.g., your game must be purchased), or it can be a deterrent, forcing the player to play several games before entering the rating system.
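
The "play several games first" barrier could look like the sketch below. The Player class, the 10-game requirement and the starting value of 1000 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

PLACEMENT_GAMES = 10   # hypothetical number of unranked games required
BASE_RATING = 1000.0   # hypothetical starting ELO


@dataclass
class Player:
    name: str
    unranked_games: int = 0
    rating: Optional[float] = None  # None until the player enters the system


def record_unranked_game(player: Player) -> None:
    """Count an unranked game and grant an ELO once the barrier is passed."""
    player.unranked_games += 1
    if player.rating is None and player.unranked_games >= PLACEMENT_GAMES:
        player.rating = BASE_RATING


newcomer = Player("newcomer")
for _ in range(PLACEMENT_GAMES):
    record_unranked_game(newcomer)
print(newcomer)
```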

Another problem with the ELO in its simplest form is that there are no restrictions on the exchange of points. Indeed, two players with 1000 ELO can theoretically approach a total exchange, bringing one player close to 2000 points and the other close to 0 (if you have the formulas in your head, you realize that this will be very long and tedious, but you should never underestimate the determination of the players!). To counteract this problem, it is necessary to prohibit exchanges of ELO between players whose ratings are too far apart (the limit is often set to 400).
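
A minimal sketch of such a restriction, simulating two colluding players where A "wins" every game. The function names are mine, and the rule "no exchange above a 400-point gap" is applied in its simplest form.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_if_allowed(ra, rb, sa, k=32.0, max_gap=400.0):
    """Apply the usual exchange, unless the two ratings are too far apart."""
    if abs(ra - rb) > max_gap:
        return ra, rb  # the game is played, but no ELO is exchanged
    delta = k * (sa - expected_score(ra, rb))
    return ra + delta, rb - delta


def farm(games, ra=1000.0, rb=1000.0):
    """Hypothetical collusion: B deliberately loses every game to A."""
    for _ in range(games):
        ra, rb = update_if_allowed(ra, rb, 1.0)
    return ra, rb


ra, rb = farm(200)
# The gap stops growing shortly after it exceeds max_gap.
print(round(ra, 1), round(rb, 1), round(ra - rb, 1))
```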

Finally, the last piece of advice I can give you is to reset the ELO of your players often (at least partially). This is the best way to make sure your system doesn't end up breaking. Moreover, having this safety net allows you to take liberties with some fundamental rules of the ELO, like the stability rule. For example, you could punish players who leave a game by removing more ELO from them. Eventually, this penalty should throw the ELO of your game out of balance, but if it is reset often, the impact will not have time to be felt on a large scale.
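
One possible form of partial reset is to pull every rating back toward the starting value. This is a sketch under my own assumptions (a base of 1000 and a keep factor of 0.5), not a rule from the ELO itself.

```python
def soft_reset(rating: float, base: float = 1000.0, keep: float = 0.5) -> float:
    """Keep only a fraction of the distance between a rating and the base.

    keep = 1.0 changes nothing, keep = 0.0 is a full reset to the base value.
    """
    return base + keep * (rating - base)


ratings = {"alice": 2000.0, "bob": 600.0, "carol": 1000.0}  # example players
after_reset = {name: soft_reset(r) for name, r in ratings.items()}
print(after_reset)
```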

That's it, that's the end of the ELO article. I tried to keep it as simple as possible. If, like me, you find this system fascinating, I advise you to have a look at the more complete ELO analyses that can be found on the Internet.

Finally, if you feel comfortable with mathematics and are looking for a system other than the ELO, I advise you to have a look at Glicko-2, a revised ELO that aims to reduce the shortcomings of this system.

Lucas Ferrantelli

Game Designer
