Design Philosophy: Game Balance

I'm Jules, and one of my jobs is balancing Legion TD 2. In this dev blog, I share our design philosophy on game balance and some of our secret sauce.

[h3]Principles[/h3]

Trust the data
Every time you play the game, your decisions and the resulting outcomes generate valuable data. That data covers hundreds of thousands of games and tens of thousands of players, not just the players who give feedback. It's also always recording, not just when players choose to give feedback (after being tilted by a loss). For balance data, we recommend Drachbot, which closely matches our internal data.
Listen to the community
Community feedback fills in the gaps when data is lacking or when a unit's design is problematic. If feedback conflicts with the data, that might mean a unit is frustrating or situationally unbalanced, even if it's balanced on average. We often make systemic changes based on community feedback, but not everyone speaks English or chooses to give feedback, so feedback isn't representative of the player base at large.
Balance for Ranked
Most of our effort goes towards balancing Ranked because it is our competitive mode with skill-based matchmaking. In Classic, we only balance outliers if they are heavily degrading the meta.
Balance for top ladder
One great thing about Legion is that the meta is similar at all levels of play. However, some units are stronger in top ladder because they are more decision-making-intensive (e.g. Pack Rat, Eggsack) or positioning-dependent (e.g. Priestess of the Abyss). We default to balancing for top ladder, which means that some of these high-skill-cap units are balanced or slightly strong in top ladder but slightly weak otherwise.
Balance for the present
Anyone can theorize about how strong a unit, combo, or strategy should be or will be once the community learns it. While their prediction might be correct, our philosophy is to balance for the current state of the game. If the prediction becomes reality, it will be reflected in the data and subsequently addressed.
Game balance is one of many tools for making the game fun
More balanced doesn't always mean more fun. Having exciting moments where you occasionally high roll a strong combo or win the game on wave 7, mixed with low moments where you have no tanks in your Yolo roll, is more fun than always having a neutral experience.

[h3]Rolls[/h3]

When balancing fighters, we primarily look at a unit's roll data: how often players pick and win with a unit in their roll, regardless of whether they build it. That's because once a unit is built, its win rate becomes confounded by the state of the game. For example, building a Doomsday Machine has a win rate above 50%, not because it is overpowered, but because players who build it tend to already be winning. Because Doomsday Machine is the most expensive unit in the game, only players who have a lot of gold (and who are likely already in a winning position) can afford it.
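A minimal Python sketch of the confounder, assuming a hypothetical per-player `GameRecord` with `in_roll`, `built`, and `won` flags (not a real Legion TD 2 or Drachbot data format). It compares a unit's roll win rate, counted over every game where the unit was in the roll, with its built win rate, counted only over games where it was built. The records are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameRecord:
    """Hypothetical per-player record for one unit in one game."""
    in_roll: bool  # the unit was in the player's roll
    built: bool    # the player built the unit at some point
    won: bool      # the player won the game


def win_rate(records: list[GameRecord]) -> float:
    """Fraction of `records` that ended in a win."""
    if not records:
        raise ValueError("no records")
    return sum(r.won for r in records) / len(records)


def roll_vs_built(records: list[GameRecord]) -> tuple[float, float]:
    """Return (roll win rate, built win rate) for one unit."""
    rolled = [r for r in records if r.in_roll]
    built = [r for r in rolled if r.built]
    return win_rate(rolled), win_rate(built)


# Made-up records: builders win 3 of 4 games, players who leave it unbuilt win 2 of 6.
records = (
    [GameRecord(in_roll=True, built=True, won=True)] * 3
    + [GameRecord(in_roll=True, built=True, won=False)] * 1
    + [GameRecord(in_roll=True, built=False, won=True)] * 2
    + [GameRecord(in_roll=True, built=False, won=False)] * 4
)
roll_wr, built_wr = roll_vs_built(records)
print(f"roll win rate: {roll_wr:.0%}, built win rate: {built_wr:.0%}")
```

With these toy records the roll win rate is 50% and the built win rate is 75%. Nothing in the data measures unit strength; the higher built win rate only reflects which games the unit happened to be built in, the same pattern as the Doomsday Machine example.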

Win rate and pick rate are both important variables.

In low-rated games, win rate is predictive of how strong a unit is (in that rating bracket). As you climb the ladder, players are better informed and play closer to optimally. They are deliberate about which units they pick. This makes pick rate the more predictive variable and win rate the less predictive one.

One way to think about it: On the far left of the rating distribution, players are effectively picking randomly or based on what art looks most exciting. On the far right of the rating distribution, players are sweatily drafting the comp that will maximize their chance of winning and counter-picking against the meta.

Because of this, win rates flatten out towards 50% once the meta has stabilized. Here's a good example:

Take the game of rock-paper-scissors, but instead of rock always beating scissors, scissors always beating paper, and paper always beating rock, imagine rock beats scissors 75% of the time, scissors beats paper 75% of the time, and paper beats rock 90% of the time.

In low-rated games, where players pick randomly, paper has the highest win rate, scissors is in the middle, and rock has the lowest.
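The random-picking case can be checked with a short Python sketch using exact fractions. One assumption not stated in the example: a mirror match (both players throw the same thing) counts as a 50% win.

```python
from fractions import Fraction

# P(my choice beats their choice), taken from the example above.
# Assumption: a mirror match (same choice) counts as a 50% win.
BEATS = {
    ("rock", "rock"): Fraction(1, 2),
    ("rock", "paper"): Fraction(1, 10),
    ("rock", "scissors"): Fraction(3, 4),
    ("paper", "rock"): Fraction(9, 10),
    ("paper", "paper"): Fraction(1, 2),
    ("paper", "scissors"): Fraction(1, 4),
    ("scissors", "rock"): Fraction(1, 4),
    ("scissors", "paper"): Fraction(3, 4),
    ("scissors", "scissors"): Fraction(1, 2),
}
CHOICES = ("rock", "paper", "scissors")


def win_rates(opponent_mix: dict[str, Fraction]) -> dict[str, Fraction]:
    """Expected win rate of each choice against an opponent throwing `opponent_mix`."""
    return {
        mine: sum(BEATS[(mine, theirs)] * p for theirs, p in opponent_mix.items())
        for mine in CHOICES
    }


# Low-rated games: every choice is thrown a third of the time.
uniform = {c: Fraction(1, 3) for c in CHOICES}
for choice, wr in win_rates(uniform).items():
    print(f"{choice}: {float(wr):.0%}")
```

Under that assumption, random picking gives paper 55%, scissors 50%, and rock 45%, matching the ordering in the text.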

However, in high-rated games, something interesting happens. Players take into account the different win rates and adjust the frequency at which they throw rock, paper, and scissors. At equilibrium, the win rate of all three choices is 50%. Win rate loses all predictive power.
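A sketch of that equilibrium in Python, under the same assumption that mirror matches count as 50%. It solves exactly for the mix at which rock and paper each win 50% and the frequencies sum to 1; because every matchup's win probabilities sum to 1, scissors then also wins 50%. The small Gauss-Jordan solver is written out by hand only to keep the example dependency-free.

```python
from fractions import Fraction

CHOICES = ("rock", "paper", "scissors")
# A[i][j] = P(choice i beats choice j); mirror matches assumed to be 50%.
A = [
    [Fraction(1, 2), Fraction(1, 10), Fraction(3, 4)],
    [Fraction(9, 10), Fraction(1, 2), Fraction(1, 4)],
    [Fraction(1, 4), Fraction(3, 4), Fraction(1, 2)],
]


def solve(matrix: list[list[Fraction]], rhs: list[Fraction]) -> list[Fraction]:
    """Solve a square, non-singular linear system exactly with Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_val = aug[col][col]
        aug[col] = [v / pivot_val for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def equilibrium_mix() -> dict[str, Fraction]:
    """Mix where rock and paper both win exactly 50% and frequencies sum to 1."""
    half = Fraction(1, 2)
    rows = [[a - half for a in A[i]] for i in range(2)]  # rock, paper: win rate - 1/2 = 0
    rows.append([Fraction(1)] * 3)  # rock + paper + scissors = 1
    mix = solve(rows, [Fraction(0), Fraction(0), Fraction(1)])
    return dict(zip(CHOICES, mix))


mix = equilibrium_mix()
for i, choice in enumerate(CHOICES):
    wr = sum(A[i][j] * mix[c] for j, c in enumerate(CHOICES))
    print(f"{choice}: thrown {float(mix[choice]):.1%}, win rate {float(wr):.0%}")
```

At this equilibrium players throw rock and paper about 27.8% of the time each and scissors about 44.4% of the time, yet every choice wins exactly 50%. The asymmetry that win rate no longer shows is still visible in how often each option is picked.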

This example doesn't perfectly describe Legion, as Legion is a much more complex game with an ever-evolving meta, but it illustrates why we can't only consider win rate and why we must also consider pick rate for game balance.

If a unit has high appeal or is especially fun, that affects its pick rate, and we do our best to factor that in. For example, Sacred Steed is fun to use, so we tolerate her having a higher pick rate than normal.

[h3]Upgrades[/h3]

Roll data helps us determine which rolls are strong or weak, but it doesn't tell us about the balance of individual units (base unit vs. upgrades). To determine that, we consider two factors:

Unit stats
Consider Sand Badger. If roll data shows Sand Badger is too strong, we can compare the HP/gold and DPS/gold of Sand Badger and its upgrade Iron Scales, since they have identical typing, range, and utility. If their HP/gold and DPS/gold are roughly the same, it's likely that Sand Badger and Iron Scales are both strong and in need of a nerf. When a base unit differs from its upgrade in typing, range, or utility, we can make reasonable approximations of the power budget of those differences or compare against other similar units. Additionally, we can consider unit usage and win rates.
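A minimal sketch of the HP/gold and DPS/gold comparison in Python. The `UnitStats` type and the stat values are hypothetical placeholders, not actual Sand Badger or Iron Scales numbers, and the sketch assumes `gold` means the total gold invested to field the unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitStats:
    """Hypothetical stat block with only the fields needed for this comparison."""
    name: str
    hp: float
    dps: float
    gold: float  # assumption: total gold invested to field the unit


def efficiency(unit: UnitStats) -> tuple[float, float]:
    """Return (HP per gold, DPS per gold)."""
    return unit.hp / unit.gold, unit.dps / unit.gold


def relative_efficiency(base: UnitStats, upgrade: UnitStats) -> tuple[float, float]:
    """Upgrade's HP/gold and DPS/gold divided by the base unit's.

    Ratios near 1.0 mean both are about equally gold-efficient, so if the roll is
    too strong, both are likely candidates for a nerf.
    """
    base_hp, base_dps = efficiency(base)
    up_hp, up_dps = efficiency(upgrade)
    return up_hp / base_hp, up_dps / base_dps


# Placeholder stats for illustration, not real Sand Badger / Iron Scales values.
base = UnitStats("base unit", hp=600, dps=30, gold=60)
upgrade = UnitStats("upgrade", hp=1800, dps=90, gold=180)
hp_ratio, dps_ratio = relative_efficiency(base, upgrade)
print(f"HP/gold ratio: {hp_ratio:.2f}, DPS/gold ratio: {dps_ratio:.2f}")
```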
Usage and win rates
In this context, usage rate is how often a unit is built, and win rate is how often players win when they build that unit. For example, compare Devilfish and its counterpart Seraphin. If the data shows Devilfish is used more than Seraphin and has a higher win rate when built, we'd consider shifting power from Devilfish to Seraphin. Remember that usage and win rates run into the confounder problem (the ability and choice to build a unit reflect your position in the game), so we have to adjust for it carefully.
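One standard way to adjust for this kind of confounder is stratification: compare win rates within buckets of similar game state, then average each unit's per-bucket rates with the same weights (direct standardization). The Python sketch below illustrates that technique; it is not necessarily the adjustment used for Legion TD 2, and the `ahead`/`behind` buckets, unit names, and records are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical build record: (unit, game state when the unit was built, won?).
# "ahead"/"behind" buckets are an assumption; any measure of game position could be used.
Build = tuple[str, str, bool]


def raw_win_rate(builds: list[Build], unit: str) -> float:
    """Win rate over every game in which `unit` was built (confounded by game state)."""
    results = [won for u, _, won in builds if u == unit]
    return sum(results) / len(results)


def pooled_weights(builds: list[Build]) -> dict[str, float]:
    """Share of all builds (any unit) that happened in each game-state bucket."""
    counts = Counter(state for _, state, _ in builds)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


def stratified_win_rate(builds: list[Build], unit: str, weights: dict[str, float]) -> float:
    """Per-bucket win rates of `unit`, averaged with the same `weights` for every unit."""
    wins: dict[str, int] = defaultdict(int)
    games: dict[str, int] = defaultdict(int)
    for u, state, won in builds:
        if u == unit:
            games[state] += 1
            wins[state] += won
    if any(games[state] == 0 for state in weights):
        raise ValueError(f"{unit} has no builds in some game-state bucket")
    return sum(w * wins[state] / games[state] for state, w in weights.items())


# Made-up records: unit_a is mostly built when ahead, unit_b mostly when behind,
# but within each bucket both win at the same rate (75% ahead, 25% behind).
builds: list[Build] = (
    [("unit_a", "ahead", True)] * 6 + [("unit_a", "ahead", False)] * 2
    + [("unit_a", "behind", True)] * 1 + [("unit_a", "behind", False)] * 3
    + [("unit_b", "ahead", True)] * 3 + [("unit_b", "ahead", False)] * 1
    + [("unit_b", "behind", True)] * 2 + [("unit_b", "behind", False)] * 6
)
weights = pooled_weights(builds)
for unit in ("unit_a", "unit_b"):
    print(f"{unit}: raw {raw_win_rate(builds, unit):.1%}, "
          f"adjusted {stratified_win_rate(builds, unit, weights):.1%}")
```

Here `unit_a` looks much stronger on raw win rate (about 58% vs. 42%), but only because it tends to be built in winning positions; compared within the same game state, the two units' adjusted win rates are identical.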

[h3]Openings[/h3]

We don't worry too much about opening balance specifically. If rolls are balanced, openings usually are too: if an opening is strong, players will pick and win with that unit more often, which is captured in roll data.

If too much of a unit's power budget is in its strength as an opening, we'll consider damage threshold adjustments to nerf it as an opening while buffing or maintaining its power elsewhere. An example is Fire Elemental, which used to perfectly 2-shot wave 3, making it a powerful, low-interaction opening. By lowering its damage slightly and compensating with increased attack speed, we kept its DPS the same while making it a weaker opening.
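The damage-threshold idea can be sketched with exact arithmetic in Python. The creep HP, damage, and attack speed below are placeholder numbers, not real Fire Elemental or wave 3 values, and the sketch ignores armor and attack-type modifiers and treats attack speed as attacks per second.

```python
import math
from fractions import Fraction


def shots_to_kill(damage: Fraction, target_hp: Fraction) -> int:
    """Hits needed to kill one target, ignoring armor, attack types, and regeneration."""
    return math.ceil(target_hp / damage)


def dps(damage: Fraction, attacks_per_second: Fraction) -> Fraction:
    """Damage per second, treating attack speed as attacks per second (an assumption)."""
    return damage * attacks_per_second


def trade_damage_for_speed(damage: Fraction, attacks_per_second: Fraction,
                           new_damage: Fraction) -> tuple[Fraction, Fraction]:
    """Lower damage per hit and raise attack speed so that DPS is unchanged."""
    new_speed = dps(damage, attacks_per_second) / new_damage
    return new_damage, new_speed


# Placeholder numbers, not the real Fire Elemental or wave 3 values.
creep_hp = Fraction(200)
old_damage, old_speed = Fraction(100), Fraction(1)
new_damage, new_speed = trade_damage_for_speed(old_damage, old_speed, Fraction(96))

print(shots_to_kill(old_damage, creep_hp), shots_to_kill(new_damage, creep_hp))  # 2 3
print(dps(old_damage, old_speed) == dps(new_damage, new_speed))  # True
```

A 4% damage cut here is enough to push the unit from two hits to three against this target, while the matching attack speed increase keeps sustained DPS exactly the same.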

[h3]Mercenaries[/h3]

For mercenary balance, we primarily consider usage and win rates, then make adjustments for the following:

Cheaper mercenaries are used more often
More expensive mercenaries have higher win rates (the same confounder we talked about earlier when building Doomsday Machine also applies when sending Kraken)
Power mercenaries have higher win rates (players that are already in a winning position are more likely to send power mercenaries)

[h3]Waves[/h3]

In Ranked, our target end-wave distribution is a bell curve centered around wave 15 or 16. This accomplishes the following:

Makes the average game last 20 minutes, with 95% of games lasting between 10 and 30 minutes
Avoids spikes in wave power and makes all waves viable to send on
Makes the lethality of a wave (the conditional probability that the game ends, given that you reached that wave) monotonically increasing, which means tension increases as the game goes on. For example, even if more games end on wave 16 than on wave 17, the probability of the game ending on wave 17 (given you reached wave 17) is higher than the probability of the game ending on wave 16 (given you reached wave 16).
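Lethality is a discrete hazard rate, so it can be computed directly from an end-wave distribution. The Python sketch below uses a made-up distribution of 100 games, not real match data.

```python
def lethality(end_counts: dict[int, int]) -> dict[int, float]:
    """P(game ends on wave w | game reached wave w), for each wave in `end_counts`.

    `end_counts` maps wave -> number of games that ended on that wave and should
    cover every game, so the games that reached wave w are those ending on w or later.
    """
    reached = sum(end_counts.values())
    result = {}
    for wave in sorted(end_counts):
        result[wave] = end_counts[wave] / reached
        reached -= end_counts[wave]
    return result


# Made-up end-wave counts for 100 games, roughly bell-shaped around waves 15-16.
END_COUNTS = {12: 5, 13: 10, 14: 20, 15: 25, 16: 25, 17: 10, 18: 4, 19: 1}

for wave, p in lethality(END_COUNTS).items():
    print(f"wave {wave}: {END_COUNTS[wave]:>2} games ended, lethality {p:.1%}")
```

In this toy distribution, more games end on wave 16 (25) than on wave 17 (10), yet lethality still rises from 62.5% on wave 16 to about 66.7% on wave 17.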

[h3]Closing Thoughts[/h3]
We care a lot about game balance in Legion TD 2. It's why every major patch has balance changes and why we respond quickly with hotfixes if something is breaking the meta.

When we buff or nerf a unit, the magnitude of the change is usually small, on the order of 1-3%. That's because the game's balance has been finely tuned over the years, and tiny changes make a big difference in a strategy game like Legion. For example, in one patch, Sakura went from one of the top-performing units to one of the bottom-performing units after a 1.5% nerf to her attack speed.

On a personal note, game balance is one of my greatest joys. I grew up being obsessed with game balance and numbers and would often rebalance board games and theorycraft about how to improve game balance in video games. In 2013, I graduated with a degree in statistics from Harvard. From 2012-2015, I worked as a data scientist and game designer at Riot Games, focusing primarily on game balance for League of Legends. I built most of League's game balance frameworks and dashboards, some of which are still in use today, 10 years later. After leaving Riot Games, I began development on Legion TD 2 with Lisk, and the rest is history.

Sincerely,
Jules from the Legion TD 2 Team