Co-op when? A couple (thousand) words on the subject

Hello Riftbreakers!

Today’s article is one that many of you have been eagerly waiting for. We will discuss the progress of the promised co-op mode for The Riftbreaker. In this article, you will find out what you can expect from our multiplayer module, what challenges we’ve overcome so far, and what still needs to be done. Strap yourselves in. This is going to be a long one!

There are some screenshots and gifs in this article. Most of them contain bugs and glitches, and they do not represent the final quality of the co-op mode. We've only included them to show you what we are working with. Things will look a lot better when we give you access!

HAMMER TIME

[h2]INTRODUCTION[/h2]

Let’s start with the basics. The Riftbreaker runs on our proprietary game engine - The Schmetterling 2.0. We have always enjoyed working with our own technology, as it gives us an unparalleled degree of freedom when it comes to choosing the right tools for the project. You can implement basically any feature you want, which is why we were able to significantly improve our game with technologies such as raytraced shadows and ambient occlusion, which were not available when we first started working on the game. The one caveat is that you need to implement all the changes yourself, and while some are pretty straightforward, others can be slightly more challenging.

The Riftbreaker project officially began in February 2018. Since we are an entirely self-funded Small Indie Company™, we knew that implementing co-op was just beyond our reach. We decided to limit the scope of the game to single-player only. However, thanks to The Riftbreaker’s popularity and your support, we were able to pull the trigger and start working on the multiplayer mode after the game’s 1.0 launch. We know that making this decision this late in the development cycle wasn’t the perfect solution. It would come at the cost of refactoring a substantial portion of game code and restructuring the engine, but we believe it will be well worth it.

Building is more efficient when there's two of you!

Before doing any kind of development work, we had to settle on the sort of architecture the multiplayer module was going to run on. We were faced with two alternatives:

Client-server - in this case, one computer is the session's host. It stores all data regarding the game state, conducts the majority of calculations, and relays the relevant information to other computers - the clients.
Determinism (or peer-to-peer) - all computers involved in the multiplayer session carry out the same operations, resulting in exactly the same game state on every client (in the case of this technique, there is no client-server division).

The latter option would require us to make absolutely sure that the game would always behave in the same, 100% predictable manner. For example, we would need to guarantee that a group of creatures attacking your base would always follow the same path and behave the same way (provided that all other conditions were the same in all cases). We were unsure whether we could pull this off or even if it was possible in the first place. A lot of things are simulated in The Riftbreaker every second, and ensuring determinism seemed very difficult. This is why we decided on the client-server architecture, eliminating that issue. However, this method comes with many problems we have to deal with anyway - but more on that later. If you would like to dive even deeper into the world of multiplayer game architecture, here’s an article we can recommend - https://gafferongames.com/post/what_every_programmer_needs_to_know_about_game_networking/.
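To see why determinism is so demanding, consider a toy lockstep simulation. This is a hypothetical Python sketch, not Schmetterling code: every peer runs the same function with the same random seed and the same list of player commands, then compares a hash of the resulting state. A different seed or a single missing command is enough to make the peers diverge, and in a real game every floating-point operation and every system update order would have to match as well.

```python
import hashlib
import random
import struct
from typing import List, Tuple

def simulate(seed: int, commands: List[Tuple[int, float]], ticks: int) -> bytes:
    """Toy lockstep simulation: four creatures wander randomly along a line.

    Every peer calls this with the same seed and the same (tick, push) commands
    and compares the returned hash of the final state.
    """
    rng = random.Random(seed)  # shared seed: every peer draws identical "random" numbers
    positions = [0.0] * 4
    for tick in range(ticks):
        for cmd_tick, push in commands:
            if cmd_tick == tick:
                positions[0] += push  # a player command nudges creature 0
        for i in range(len(positions)):
            positions[i] += rng.uniform(-1.0, 1.0)
    state = struct.pack(f"<{len(positions)}d", *positions)
    return hashlib.sha256(state).digest()
```

Peers in a deterministic game would exchange hashes like this periodically to detect desyncs. With thousands of simulated entities, keeping every one of them bit-identical on every machine is the guarantee that proved hard to make.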

Let me solo her.

[h2]THE ROAD SO FAR[/h2]

After settling on the client-server architecture, it was time to start laying the groundwork - the process began in February 2022. The first step was to port and implement the Valve Networking Sockets library into the Schmetterling engine. It’s an open-source solution that you can check out here: https://github.com/ValveSoftware/GameNetworkingSockets. Don’t be afraid of the name ‘Valve’ there - this library is cross-platform and is not tied to the Steam ecosystem. We made sure that it would support all platforms. This library allowed us to establish a connection between two game instances - we first achieved that in mid-April 2022. We started small - simply by running two Riftbreaker apps on the same PC. Each ran independently, which was a significant milestone but still far from our goal. As soon as that was done, we started the process of synchronizing entities between the two worlds.

The first layer of the game that we started transferring was the visuals. Step by step, we began synchronizing models, particles, and animations. That alone revealed how much work we had ahead of us. Synchronizing an empty map with just two mechs running around took as long as 400 milliseconds per update! This was, of course, the first “brute force” prototype. That’s when the first optimizations started materializing. Instead of synchronizing the entire client’s world with the server, we limited the number of entities by including just the ones currently visible on the screen. We reduced the frequency of updates. We switched from synchronizing snapshots of the entire world to updating only the entities that changed in a significant way.
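The first two optimizations can be sketched roughly as follows. This is a hypothetical Python illustration rather than engine code. Entity, Viewport, and the send_every interval are invented for the example. The server only considers entities inside a client's view, and it only sends updates every few ticks instead of on every tick.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Entity:
    id: int
    x: float
    y: float

@dataclass
class Viewport:
    """Hypothetical world-space rectangle that a client's camera can see."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, e: Entity) -> bool:
        return self.min_x <= e.x <= self.max_x and self.min_y <= e.y <= self.max_y

def entities_to_send(entities: List[Entity], view: Viewport, tick: int,
                     send_every: int = 3) -> List[Entity]:
    """Select what one client receives on a given server tick.

    Updates only go out on every send_every-th tick, and only for
    entities inside the client's view.
    """
    if tick % send_every != 0:
        return []
    return [e for e in entities if view.contains(e)]
```

A real implementation would probably pad the view rectangle so that entities don't pop in at the screen edges. The interval of 3 ticks here is arbitrary.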

You look smashing today.

The method of only synchronizing changes in the game world is much more efficient but requires us to solve a couple of interesting problems. What kind of change in state for any given entity is significant enough? How often do we need to synchronize the client with the server? How do we monitor and track changes in the game world? There are no clear-cut answers to these questions. We will likely be tweaking all of these aspects until the end of the development cycle. At present, detecting and synchronizing changes in the world state is by far the most significant performance overhead for The Riftbreaker. No effort we put into optimizing this task can be too much.
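One possible answer to "what counts as significant" is a per-field threshold. The sketch below shows this approach. The DeltaTracker class and both threshold values are assumptions made for illustration. The tracker remembers what it last sent for each entity and only reports fields that changed by more than their threshold.

```python
import math
from typing import Dict

POSITION_EPSILON = 0.05  # meters; assumed "significant movement" threshold
HEALTH_EPSILON = 1.0     # hit points; assumed threshold

class DeltaTracker:
    """Remembers what was last sent to a client and reports only significant changes."""

    def __init__(self) -> None:
        self.last_sent: Dict[int, Dict[str, float]] = {}

    def diff(self, entity_id: int, state: Dict[str, float]) -> Dict[str, float]:
        prev = self.last_sent.get(entity_id)
        if prev is None:
            changed = dict(state)  # entity is new to this client: send everything
        else:
            changed = {}
            if math.hypot(state["x"] - prev["x"], state["y"] - prev["y"]) > POSITION_EPSILON:
                changed["x"], changed["y"] = state["x"], state["y"]
            if abs(state["health"] - prev["health"]) >= HEALTH_EPSILON:
                changed["health"] = state["health"]
        if changed:
            # Advance the baseline only for fields actually sent, so sub-threshold
            # changes keep accumulating until they become significant.
            baseline = dict(prev or {})
            baseline.update(changed)
            self.last_sent[entity_id] = baseline
        return changed
```

It matters that the comparison uses the last value that was sent, not the previous frame. Many tiny movements below the threshold still add up and eventually trigger an update, so the client cannot slowly drift away from the server.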

After we had the foundations for a co-op game in place, we could finally start making attempts at creating a session between two PCs. To do that, we had to give the client a very significant upgrade - the ability to control the mech. It might seem simple at first, but it’s one of the most important systems when it comes to the title's playability, so there is no room for mistakes here. It would seem logical to send the inputs from the client to the server, make the server calculate the results, and send the info back to the client.

Instead of transferring raw inputs, we have a couple of systems to handle controlling the mech on the client PC in a more elegant way. When the player presses any key on the keyboard or gamepad, that input is received by the ActionSystem. The purpose of this system is to translate the button press into a game action according to the key bindings. For example, pressing ‘w’ is translated by the ActionSystem into a ‘move_up “1”’ command. Now that’s an understandable command that we can send to the server and call it a day, right?

Not exactly. To turn a ‘move_up’ command into an actual movement that you can see on the screen, we need a specialized system - something that we call the MechSystem (not the most original name, we know). The MechSystem manages all events concerning the player-controlled mech: movement, using weapons and items, health management, and everything else. When the command reaches the server, the MechSystem decides whether the client’s player can move their mech and calculates its final position. Since that result comes from the server itself, it is guaranteed to be correct and doesn’t need to be verified further on the client. The action is carried out, and the mech moves up. But wait, isn’t this exactly like ‘sending inputs’ that you said wouldn’t be good enough a couple of paragraphs earlier? Again, not exactly. By sending the command to the server instead of the raw input, we avoid any ambiguity. Pressing ‘W’ might mean different things on the client and on the server (for example, when players use different key bindings), but this way, we ensure consistent results.
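The whole chain can be sketched in a few lines. Everything below is hypothetical: the binding tables, the 30 ticks per second rate, the mech speed, and the blocked collision query. On the client, the ActionSystem turns a key into a command using the local bindings. On the server, the MechSystem validates that command and returns the authoritative position.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# --- Client side: hypothetical ActionSystem ---
class ActionSystem:
    """Translates raw key presses into game commands using this client's bindings."""

    def __init__(self, bindings: Dict[str, Tuple[str, str]]):
        self.bindings = bindings

    def translate(self, key: str) -> Optional[str]:
        action = self.bindings.get(key.lower())
        if action is None:
            return None  # unbound key: nothing is sent to the server
        name, value = action
        return f'{name} "{value}"'

# --- Server side: hypothetical MechSystem ---
TICK_SECONDS = 1 / 30  # assumed logic tick length
DIRECTIONS = {"move_up": (0.0, 1.0), "move_down": (0.0, -1.0),
              "move_left": (-1.0, 0.0), "move_right": (1.0, 0.0)}

@dataclass
class Mech:
    x: float
    y: float
    alive: bool = True
    speed: float = 6.0  # meters per second, assumed

def server_handle_command(mech: Mech, command: str,
                          blocked: Callable[[float, float], bool]) -> Tuple[float, float]:
    """Validate a client command and return the authoritative mech position."""
    name, _, _value = command.partition(" ")
    if not mech.alive or name not in DIRECTIONS:
        return mech.x, mech.y  # rejected: position unchanged
    dx, dy = DIRECTIONS[name]
    nx = mech.x + dx * mech.speed * TICK_SECONDS
    ny = mech.y + dy * mech.speed * TICK_SECONDS
    if not blocked(nx, ny):
        mech.x, mech.y = nx, ny
    return mech.x, mech.y  # this result is what gets sent back to the client
```

The server receives move_up "1" rather than a raw w keypress, so it does not matter that two players have bound that action to different keys.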

We will have to adjust our lighting system to this new client-server situation. By the time we're done, co-op will offer the same graphics as the single-player part of the game.

Another area that we realized we could easily optimize was the BuildModeSystem. The system is responsible for displaying the building menu for the player and allowing them to place structures on the map. It quickly became clear that we do not need to inform the server every time one of the players enters the build mode. Instead, we only send commands such as ‘build a solar panel at coordinates X, Y’ as simple events, just like we handle mech movement and abilities. Not only did this solution work great, but it also proved easy to apply to several other systems. We plan to adapt many more systems to work similarly. The first candidates for that are our menu screens, such as research, crafting, and inventory. More optimizations like these will undoubtedly be added to the list.
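A build request can be tiny. The sketch below uses Python's struct module to pack a hypothetical ‘build a solar panel at X, Y’ command into five bytes. The server accepts the command only if the target grid square is free. The building IDs, the byte layout, and the single-square footprint are assumptions for illustration; real structures cover several grid squares.

```python
import struct
from dataclasses import dataclass
from typing import Set, Tuple

# Hypothetical numeric IDs shared by the client and server builds of the game.
BUILDING_IDS = {"solar_panel": 1, "wind_turbine": 2}
BUILDING_NAMES = {v: k for k, v in BUILDING_IDS.items()}
WIRE_FORMAT = "<BHH"  # little-endian: 1-byte building ID, two 2-byte grid coordinates

@dataclass(frozen=True)
class BuildCommand:
    building: str
    cell_x: int
    cell_y: int

    def encode(self) -> bytes:
        return struct.pack(WIRE_FORMAT, BUILDING_IDS[self.building], self.cell_x, self.cell_y)

    @staticmethod
    def decode(data: bytes) -> "BuildCommand":
        type_id, x, y = struct.unpack(WIRE_FORMAT, data)
        return BuildCommand(BUILDING_NAMES[type_id], x, y)

def server_apply(cmd: BuildCommand, occupied: Set[Tuple[int, int]]) -> bool:
    """Server-side validation: accept the build only if the grid square is free."""
    cell = (cmd.cell_x, cmd.cell_y)
    if cell in occupied:
        return False
    occupied.add(cell)
    return True
```

In this model, opening the menu, browsing categories, and previewing placement all stay on the client. Only the final decision crosses the network.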

Naturally, we have done a lot more work behind the scenes - so much that if we were to describe all of it, we would end up with a book instead of an article. However, that brings us, more or less, to the point where we are today. To sum things up:

We decided to follow the client-server architecture for the co-op mode in The Riftbreaker.
We created a very rudimentary, brute-force prototype of the multiplayer mode. It connected several instances of the game launched on the same PC. It served as a basis for further development and indicated some early issues.
We moved attachment creation from the SkeletonComponent to the Skeleton resource itself and refactored the Model Editor to reflect that change. This removes the necessity of synchronizing skeleton attachment creation and removal between client and server.
We extracted the UniformComponent out of the MeshComponent. Uniforms hold parameters for shaders, and those change quite often, unlike materials and names that are parts of the MeshComponent too. By extracting this element, we eliminated unnecessary synchronizations.
For similar reasons as above, we extracted the information about animations from the SkeletonComponent to the AnimationComponent.
We had to create a division between client and server entity IDs. It was necessary to avoid conflicts between clients and the server. For example, if one of the clients created a local entity, such as a particle effect or a giblet, and gave it the same ID as one already on the server, the server could overwrite client data.
We have successfully implemented Valve Networking Sockets - a library that allows us to establish a client-server connection between computers over a network.
The server can run as a regular game client or in ‘headless mode’ - without any visual representation.
We can play the game on two different PCs with two players controlling their own mechs.
Some of the systems run independently on both the client and the server PCs, only synchronizing key elements. The ones we’ve already re-worked for this architecture are:
- BuildModeSystem - responsible for handling the building menu, as well as building, upgrading and deconstructing structures in your base,
- ActionComponent - handles most of the players’ interactions with the game,
- HealthBarSystem - displays the percentage value of hitpoints of any given entity in the form of a colored bar,
- HudSystem - responsible for displaying the GUI elements on the screen during gameplay,
- DisplayRadius - shows the working range of towers in the game,
- GuiTimerSystem - counts the time and progress of building construction, upgrades, and repairs,
- ResearchClientSystem - responsible for everything related to the Research Screen,
- GridRenderableSystem - displays the terrain grid and marks grids as empty, occupied, or filled with resources,
- SelectableSystem - allows you to highlight objects in the game and check their stats.
We have started optimizing game systems to ensure acceptable data transfer levels and reasonable calculation times. Minimizing both these costs will result in significant performance improvements.
We have identified some of the problems we are going to have to face when synchronizing the two game worlds. We will describe those in the next chapter.
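The split between client and server entity IDs can be done in many ways. One simple option, sketched here purely as an assumption, is to reserve the top bit of a 32-bit ID for client-local entities. That way, a locally spawned particle effect or giblet can never receive an ID that the server also hands out. EntityIdAllocator and is_client_local are hypothetical names.

```python
import itertools

CLIENT_LOCAL_BIT = 1 << 31  # assumption: top bit of a 32-bit ID marks client-local entities

class EntityIdAllocator:
    """Hands out entity IDs from disjoint ranges so server and client IDs never collide."""

    def __init__(self, is_server: bool) -> None:
        self._counter = itertools.count(1)  # 0 is left unused (reserved here as "invalid")
        self._base = 0 if is_server else CLIENT_LOCAL_BIT

    def allocate(self) -> int:
        n = next(self._counter)
        if n >= CLIENT_LOCAL_BIT:
            raise OverflowError("entity ID range exhausted")
        return self._base | n

def is_client_local(entity_id: int) -> bool:
    return bool(entity_id & CLIENT_LOCAL_BIT)
```

Because the ranges are disjoint, an incoming server update can never overwrite a client-local entity. A single bit test also tells the client which entities exist only on its own machine.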

[h2]WHERE TO GO NEXT[/h2]

Even though we have already made a lot of progress, there is still much to be done. At every step of the way, we learn about new potential problems and take measures to prevent them from becoming severe issues in the first place. Here’s what we are currently working on and some of the issues we still have to solve:
Separating events and their visual effects to reduce data transfer and optimize performance. We aim to adapt most of our systems to work this way. Up next, in no particular order, are the following:
- DestructionSystem - responsible for changing textures on models as they get damaged, spawning effects and creating gibs and debris,
- FogOfWarSystem - covers the unexplored parts of the minimap and keeps track of the area that is visible to the player,
- VegetationSystem - handles vegetation growth cycle and dynamic foliage reactions to external forces such as wind and shockwaves,
- AnimationGraphSystem - handles animation playback, transitions, and events connected to them,
- MechSystem - responsible for all actions of the player’s avatar.
Introducing a lag compensation system, which will require us to rework our PhysicsSystem.
Synchronizing our large data structures to prevent the need to transfer the entire component.
Adding support for disconnecting players from the session.
Making sure that the game considers players to be a part of one team.
Disabling bullet time effects in multiplayer.
Moving the simulation of VegetationSystem to the client PC.
Introducing measures that will compensate for packet losses.
Adding in-game chat options.
Adding a player lobby screen.
Creating a robust save system. This will also require us to decide who can actually save the game. Should the clients be able to do so?
Fixing rendering issues. We are currently experiencing severe problems with shadows and lighting. Raytracing does not work at all.
Updating the game’s overall design to encompass multiple players - should resources be shared or separated? How should biome travel work? How do we balance game difficulty?
…and many more.

The list above is just a broad summary of the work that still needs to be done. Each of these bullet points is a very broad task on its own. Let’s go a bit deeper into a few of the grander topics.

Why did you destroy them all by yourself?! You should have waited for me!

Step by step, we will separate more systems into client and server parts. Our end goal is to simulate all the events that do not directly affect gameplay on the client side. Let’s take a grenade explosion as an example. One of the players throws a grenade. The server has to calculate where the grenade will land, when it will explode, how much damage it will deal, and which entities will be affected by the explosion. That information is relevant to all players, so it has to come from the server side. However, all the rest can be easily simulated on a client’s machine. We don’t need any additional information to spawn the explosion particle effect, the sound, the gibs from destroyed enemies, or debris from destructible props. This will further reduce the strain on the server and the amount of data that needs to be synchronized.
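The grenade example splits neatly into one authoritative message and a set of local reactions. In the hypothetical sketch below, ExplosionEvent holds only the facts the server decides: where and when the grenade exploded, how much damage it dealt, and who was hit or killed. The client_on_explosion function derives the particles, sound, and gibs from that event through an invented spawn callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Position = Tuple[float, float]

@dataclass(frozen=True)
class ExplosionEvent:
    """Gameplay facts decided by the server and sent to every client (hypothetical)."""
    x: float
    y: float
    tick: int
    damage: int
    hit_entity_ids: Tuple[int, ...]
    killed_entity_ids: Tuple[int, ...]

def client_on_explosion(event: ExplosionEvent,
                        entity_positions: Dict[int, Position],
                        spawn: Callable[[str, Position], None]) -> None:
    """Derive all cosmetic effects locally; none of them are synchronized."""
    center = (event.x, event.y)
    spawn("explosion_particles", center)
    spawn("explosion_sound", center)
    for entity_id in event.killed_entity_ids:
        # Gibs appear where this client last saw the creature (fall back to the blast center).
        spawn("gibs", entity_positions.get(entity_id, center))
```

Each client spawns these effects as its own local entities, so none of them ever has to travel over the network.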

As we adapt more and more systems to be suitable for use in the multiplayer context, we will have to support an increasing number of interactions between systems. The world of The Riftbreaker is very dynamic - hundreds of entities can change at any given second. We will have to take that into account and introduce a lag-compensating prediction algorithm. You can learn more about prediction in the article we linked before - https://gafferongames.com/post/what_every_programmer_needs_to_know_about_game_networking/. This will allow us to prevent many latency-related issues. For example, let’s say that you are using a railgun to take out a group of enemy creatures. You aim at them and press ‘fire.’ We send the ‘fire’ command to the server - it checks that you have the weapon, the ammo, and your mech is alive. The shot is fired. Thanks to a prediction algorithm running on the client PC, you can clearly see your weapon firing. The railgun projectile hits enemies instantly. You can see that a group of creatures you were aiming for has taken damage. Yet, they didn’t die, even though they should. Why?

Got your back, buddy.

The answer is simple - they weren’t at the spot you were aiming for anymore. They moved a little bit to the side on the server, and when the information about you firing your railgun reached the server, there was no one there to kill anymore. We will have to introduce a lag compensation system to solve issues like these. We plan to keep a record of entity positions and their states as a reference for the server. Thanks to the historical data, the server will be able to make corrections and compensate for the delay in communication between the client and the server. Naturally, this will require us to change some systems. We need to allow our PhysicsSystem to ‘travel’ back in time, which will require us to refactor and simplify many of its components to ensure consistent results. The WeaponSystem will require significant refactoring as well.
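The core of the lag compensation idea is to keep a short history of entity positions and check each shot against the past. It can be sketched as follows. This is a simplified illustration built on our own assumptions: PositionHistory, the 30-tick window, and the ray_hits test are hypothetical. Rewinding a full physics simulation involves far more than looking up positions.

```python
import bisect
from typing import Callable, Dict, List, Tuple

Position = Tuple[float, float]

class PositionHistory:
    """Server-side record of recent entity positions, kept for a limited number of ticks."""

    def __init__(self, max_ticks: int = 30) -> None:  # about 1 s at an assumed 30 ticks/s
        self.max_ticks = max_ticks
        self.ticks: List[int] = []
        self.snapshots: List[Dict[int, Position]] = []

    def record(self, tick: int, positions: Dict[int, Position]) -> None:
        self.ticks.append(tick)  # ticks must be recorded in increasing order
        self.snapshots.append(dict(positions))
        if len(self.ticks) > self.max_ticks:
            self.ticks.pop(0)  # forget the oldest snapshot
            self.snapshots.pop(0)

    def at(self, tick: int) -> Dict[int, Position]:
        """Return the latest snapshot recorded at or before the given tick."""
        i = bisect.bisect_right(self.ticks, tick) - 1
        if i < 0:
            raise LookupError("tick is older than the stored history")
        return self.snapshots[i]

def rewound_hits(history: PositionHistory, fire_tick: int,
                 ray_hits: Callable[[Position], bool]) -> List[int]:
    """Check a hitscan shot against where targets were on the tick the client saw."""
    snapshot = history.at(fire_tick)
    return sorted(eid for eid, pos in snapshot.items() if ray_hits(pos))
```

In the railgun example, the server rewinds to the tick the player was looking at. The shot then hits the creatures that were on the player's screen, even though they have since moved on the server.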

Some problems stem from the fact that The Riftbreaker was initially designed as a single-player game. Several components worked perfectly fine in that setting but did not make much sense when it came to multiplayer. Many of those are what we call data components: the mission log, the journal, the database, and research trees. Currently, the only way to synchronize those components between the server and the clients is to send a copy of the entire component, which can get very big over time. Let’s take the mission log as an example. There are many timed missions in The Riftbreaker. We display a countdown to a certain event, ticking down every second. That needs to be reflected on the client’s side as well, so we need to synchronize it with every game logic tick. Every 33 milliseconds. The entire 3 MB of it. The problem is quite clear here - this is not a reasonable amount of data, and there is no way this could work. Therefore, data components are next in line for a rework.
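Some quick arithmetic shows the scale of the problem. Sending 3 MB every 33 milliseconds means roughly 30 full copies per second, or about 90 MB per second for each client. A common alternative is field-level change tracking. The sketch below uses a hypothetical MissionLog class: only the fields that changed since the last sync are sent, such as a countdown's remaining seconds.

```python
from typing import Any, Dict, Set, Tuple

class MissionLog:
    """Hypothetical data component that records which fields changed since the last sync."""

    def __init__(self) -> None:
        self.entries: Dict[str, Dict[str, Any]] = {}
        self._dirty: Set[Tuple[str, str]] = set()

    def set_field(self, entry_id: str, key: str, value: Any) -> None:
        entry = self.entries.setdefault(entry_id, {})
        if key not in entry or entry[key] != value:
            entry[key] = value
            self._dirty.add((entry_id, key))

    def take_delta(self) -> Dict[str, Dict[str, Any]]:
        """Return only the changed fields, grouped by entry, and reset the dirty set."""
        delta: Dict[str, Dict[str, Any]] = {}
        for entry_id, key in sorted(self._dirty):
            delta.setdefault(entry_id, {})[key] = self.entries[entry_id][key]
        self._dirty.clear()
        return delta
```

A ticking countdown then costs a few bytes per update instead of a copy of the whole log.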

Another example of a problem that we are going to have to face is the case of the TerrainGridComponent. Each level is mapped onto a simplified game logic representation that we call the “Grid” - it is composed of small 2x2 m squares that define the smallest chunk of terrain that can have an individual game logic state. Those squares can be empty, blocked, populated by resources, or have a building on top. In the base version of the game, the entire terrain grid information was held in one data structure. That is another 3 MB of data on just a basic survival map, which is 1280x1280 meters. Maps in The Riftbreaker can get much bigger than that - up to 3072x3072 meters. Since the amount of data grows with the area of the map, such a map holds almost six times as many grid squares. We have already introduced a basic optimization for that. The multiplayer version of the TerrainGridSystem is divided into 64x64 meter chunks, and the system only synchronizes the chunks that are relevant to the gameplay state. However, this solution is far from perfect, and there is still a lot of room for improvement here.
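The numbers behind the grid are easy to check. A 1280x1280 m map divided into 2x2 m squares has 640 x 640 = 409,600 squares, and a 3072x3072 m map has 1536 x 1536 = 2,359,296. A 64x64 m chunk covers 32 x 32 = 1,024 squares. The sketch below treats a chunk as "relevant" when one of its squares has changed, which is only one simple interpretation. It stores one byte per square purely for illustration; given the 3 MB figure, the real grid presumably holds more per square. ChunkedTerrainGrid and its state values are hypothetical.

```python
from typing import Dict, Set, Tuple

CELL_SIZE_M = 2    # 2x2 m grid squares, as described above
CHUNK_SIZE_M = 64  # 64x64 m synchronization chunks, as described above
CELLS_PER_CHUNK = CHUNK_SIZE_M // CELL_SIZE_M  # 32 squares along each side

EMPTY, BLOCKED, RESOURCE, BUILDING = range(4)  # hypothetical square states

class ChunkedTerrainGrid:
    """Terrain grid that remembers which chunks changed since the last sync."""

    def __init__(self, map_size_m: int) -> None:
        self.side = map_size_m // CELL_SIZE_M          # squares per map side
        self.cells = bytearray(self.side * self.side)  # one byte per square, all EMPTY
        self.dirty_chunks: Set[Tuple[int, int]] = set()

    def set_cell(self, cx: int, cy: int, state: int) -> None:
        i = cy * self.side + cx
        if self.cells[i] != state:
            self.cells[i] = state
            self.dirty_chunks.add((cx // CELLS_PER_CHUNK, cy // CELLS_PER_CHUNK))

    def chunk_payload(self, chx: int, chy: int) -> bytes:
        """Copy one chunk's squares row by row: the data a client would receive."""
        rows = []
        for y in range(chy * CELLS_PER_CHUNK, (chy + 1) * CELLS_PER_CHUNK):
            start = y * self.side + chx * CELLS_PER_CHUNK
            rows.append(bytes(self.cells[start:start + CELLS_PER_CHUNK]))
        return b"".join(rows)

    def take_dirty_payloads(self) -> Dict[Tuple[int, int], bytes]:
        payloads = {c: self.chunk_payload(*c) for c in sorted(self.dirty_chunks)}
        self.dirty_chunks.clear()
        return payloads
```

Placing one building on a 1280x1280 m map then sends a single 1,024-square chunk instead of the full grid of 409,600 squares.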

This is the grid that is the base of many important elements of the game, such as building and resource mining.

We said earlier that we would like to run all the visual aspects of the game locally on a client’s PC. Unfortunately, it is not always possible. Such was the case with animations. Every animated model in the game has a skeleton - a structure of interconnected points, fittingly called ‘bones.’ The animation graph is programmed to move certain bones within the model. Without the bone structure, it has no reference point and can’t move the model.

Animating objects in the game in the client-server architecture requires us to carefully consider the balance between performance and the amount of data transferred over the network. Since The Riftbreaker was designed to be a single-player game, a large part of the game logic and object behavior had been tied to the animation state. Let's take one of the most basic units in the game as an example - the Arachnoid. The unit's primary attack is shooting an acid projectile from the tip of its tail. What happens in the backend is quite simple - when the tip of the tail reaches a certain point in the animation (also known as an 'event' trigger), the game receives the signal to create a projectile. There are thousands of instances where animations are an integral part of object behavior in the game. This means that animations are not only a part of the visual representation but also an integral part of game logic. This forces us to simulate most of those on the server, which poses a question: how do we synchronize the animation states between the server and the clients?

One of the funnier bugs with the AnimationGraph at the moment. When a corpse leaves the screen and appears in the game view once more after a while, it will play the death animation again. You can do it over and over.

We can't simply synchronize the state of all skeleton bones of the simulated object. Each skeleton can consist of dozens of bones, and each bone stores information about its position, scale, and orientation. That's way too much data to transfer in the case of a game that often displays thousands of units in the scene. Another approach is to reconstruct the state of the skeleton based on metadata, for example, movement speed, aiming direction, whether the unit is attacking, and what kind of attack it is using. Based on that information, the client can attempt to reconstruct the animation state from the server. This reduces the amount of data transferred but doesn't come without its own fair share of issues. The animation system, or more specifically the AnimationGraph that we currently use in The Riftbreaker, will have to be simulated both by the server and the clients. If the server is also being used to play the game (not running in the 'headless mode' described earlier), it will have to pay the performance cost of that. Another issue is that animation states on the server and the clients can drift apart and desynchronize over time. This is one of the most significant issues we are trying to fix now. Compensating for animation drift requires us to implement a system that detects desynchronizations and takes steps to bring things back in order.
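A back-of-the-envelope comparison shows why metadata wins. Assume a skeleton of 40 bones, each stored as 32-bit floats: a 3-component position, a 3-component scale, and a 4-component rotation quaternion. One full pose is then 40 x 40 = 1,600 bytes. The hypothetical AnimMetadata record below packs into 13 bytes. The correct_drift helper shows the simplest form of desync correction: if the client's playback position strays too far from the server's, snap it back. The field names, the layout, and the 0.05 tolerance are all assumptions.

```python
import struct
from dataclasses import dataclass

BONE_BYTES = (3 + 3 + 4) * 4  # position, scale, rotation quaternion as 32-bit floats

@dataclass
class AnimMetadata:
    """Hypothetical compact state a client rebuilds the animation from."""
    move_speed: float  # meters per second
    aim_angle: float   # radians
    attack_id: int     # 0 = not attacking, otherwise which attack is in use
    anim_phase: float  # server's normalized playback position, 0.0 to 1.0

    def encode(self) -> bytes:
        return struct.pack("<ffBf", self.move_speed, self.aim_angle,
                           self.attack_id, self.anim_phase)

def correct_drift(client_phase: float, server_phase: float,
                  tolerance: float = 0.05) -> float:
    """Return the phase the client should use after comparing it with the server's.

    Looping animations wrap at 1.0, so distance is measured around the loop.
    """
    diff = abs(client_phase - server_phase) % 1.0
    distance = min(diff, 1.0 - diff)
    return server_phase if distance > tolerance else client_phase
```

A hard snap is the bluntest possible fix. Blending toward the server's phase over a few frames would hide the correction better.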

We can also solve the problem of calculating the AnimationGraph both on the server and clients by removing the AnimationGraph from the server altogether. This drastic measure will require us to completely separate the game logic from the animation states. In order to do that, we will need to rework all the units in the game. This effort will take a couple of months but bears the promise of performance gains and further reduction of data transfer. This will also allow us to replace animation metadata with some higher-level instructions. Instead of synchronizing the animation state between PCs, we will be able to instruct units on what we expect them to do - for example, start eating grass in 2 seconds, fire a projectile in 0.5 seconds, or play the idle animation version 3 in 5 seconds.
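Here is one way such higher-level instructions could look, sketched under our own assumptions. The server schedules each instruction for an absolute game tick, at an assumed 30 ticks per second. When that tick arrives, every client plays the instruction using its own animation code. UnitInstruction, InstructionQueue, and the action names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import List

TICKS_PER_SECOND = 30  # assumed game logic tick rate

@dataclass(order=True)
class UnitInstruction:
    start_tick: int
    unit_id: int
    action: str = field(compare=False)

class InstructionQueue:
    """Instructions scheduled for absolute game ticks, played back locally on each client."""

    def __init__(self) -> None:
        self._heap: List[UnitInstruction] = []

    def schedule(self, now_tick: int, unit_id: int, action: str, delay_seconds: float) -> None:
        start = now_tick + round(delay_seconds * TICKS_PER_SECOND)
        heapq.heappush(self._heap, UnitInstruction(start, unit_id, action))

    def due(self, tick: int) -> List[UnitInstruction]:
        """Pop every instruction whose start tick has been reached, earliest first."""
        ready = []
        while self._heap and self._heap[0].start_tick <= tick:
            ready.append(heapq.heappop(self._heap))
        return ready
```

The instruction "in 2 seconds" is converted into an absolute tick. A client that receives the message a little late therefore still starts the animation on the same tick as everyone else, as long as the message arrives before that tick.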

Player 2 reduced to the role of a human-driven mini-miner.

As you can probably guess from the above explanation, solving entity animation update problems within the client/server architecture is one of the most significant problems that we’re facing. There are a few solutions to this problem, and we’re currently researching the best path to take.

[h2]CONCLUSION[/h2]

One thing is obvious - there is still a lot of work to be done. We must rework multiple systems before the game can be played through a network connection. The good news is that we have created solid foundations to build on. Another aspect worth mentioning is that a rework of our systems is also an excellent opportunity to introduce even more optimizations and improvements to the entire game, so even if you’re not interested in playing The Riftbreaker online, your single-player experience will benefit from the additional layers of polish that we’re introducing.

Some things you might have noticed missing from this article are timelines and specific features, like the maximum player count, cross-platform availability, which game modes co-op is going to be available in, etc. The reason is very simple - we don’t know these things yet. It will all depend on how many of our plans we manage to implement. Our first goal is to get the game playable online in its most basic state. All of our plans and estimates up to that point will have a considerable margin of error, and we don’t want to promise any specific dates or features until we are confident that we can achieve them.

It's alive! Well, most of it. The lighting disagrees.

All in all, we are pretty happy with what we have managed to achieve already when it comes to implementing the co-op mode in The Riftbreaker. We have a clear plan and a list of things that need to be done, and we’re going through them one by one. We hope this article will provide more transparency into how our work on online multiplayer is moving forward.

As always, we are waiting for your feedback and questions! Post them here or on our Discord: www.discord.gg/exorstudios. We would also like you to know that as soon as we have a playable version of the co-op mode, we will start running closed beta tests. We will take applications on our social media channels, Discord server, and Steam Forums. Follow us so you don’t miss any news on that!

A gift we got from one of the fans. We know you want it!

See you next time!
EXOR Studios