Things that can muddle your replay feature

Sept. 30, 2016

Introduction

Replay features (in this text, "replay" means that the player can watch a recent game event again after it has happened) are very useful for games. They can be used to drive retention, guide players, encourage social gameplay, fast-track bug reporting, help performance testing, etc.

Implementing a replay feature in your game usually isn't a trivial task, since modern games have many moving components that must be tracked for replays. Because a replay feature might require work in multiple parts of the game engine, it is much easier to implement it while the game is in development than to patch it in after the game has been released.

As a rule of thumb, you should decide early on whether you want a replay feature in your game. If you choose to implement one, I suggest that you test it multiple times during the project to make sure it works as intended. A poorly working replay implementation can lead to, for example, game crashes or cheating accusations.

 

Types of replay

There are three different types of replays in computer/video games:

1. Record the player's (or players') input while the game is running and recreate the gameplay from that input when the player watches the recording.

e.g. player 1 pushes the Up button in frame 456, the Down button in frame 531 and the Attack button in frame 556.

 

2. Record the needed states of game objects at different frames and recreate the gameplay from those states (interpolating if needed).

e.g. in frame 32 the box has position x: 145 y: 32 and the player character has position x: 86 y: 54 with animation sprite #7 shown.

 

3. Capture video of the gameplay and play back that video file.

e.g. on an iOS device, use ReplayKit to record gameplay.

 

In this blog post I cover types 1 and 2, since video playback doesn't require recreating the gameplay in the game engine. Type 1 replays are the most error-prone to implement, but input-based replays also take the least storage space and are the best option for bug replication. Type 2 replays take more storage space than type 1 replays but are usually easier to implement.
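
As a rough illustration of a type 1 replay, the frame-stamped inputs from the example above could be stored in a small event log. This is only a sketch: the names `Button`, `InputEvent` and `InputRecording` are hypothetical and not taken from any engine.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; this is a sketch, not an engine API.
public enum Button { Up, Down, Attack }

// One recorded input: which player pressed which button on which frame.
public readonly struct InputEvent
{
    public readonly int Frame;
    public readonly int Player;
    public readonly Button Button;

    public InputEvent(int frame, int player, Button button)
    {
        Frame = frame;
        Player = player;
        Button = button;
    }
}

public sealed class InputRecording
{
    private readonly List<InputEvent> _events = new List<InputEvent>();

    // Called during live gameplay, in the order the inputs are processed.
    public void Record(int frame, int player, Button button)
    {
        _events.Add(new InputEvent(frame, player, button));
    }

    // Called during playback: yields the inputs recorded for the given frame,
    // in the same order in which they were recorded.
    public IEnumerable<InputEvent> EventsAt(int frame)
    {
        foreach (InputEvent e in _events)
        {
            if (e.Frame == frame) yield return e;
        }
    }
}
```

During playback the game loop would feed `EventsAt(frame)` into the same input handling code that live gameplay uses, instead of polling the controller.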

Naturally you can combine replay types (e.g. in an AR game you can store video for the background and frame-specific player inputs from the controller to get your replays), but it is usually easier to stick with one replay type.

 

Deterministic system

As the heading says, if you want to implement a proper type 1 replay feature for your game, you need a deterministic system. By deterministic system I mean that the same input should always produce the same output. Your game engine doesn't have to be completely deterministic to support a replay feature, but it has to support some kind of override mechanism for playing back recorded replays.

In most games you don't need a pixel-accurate replay of the video output or a sample-accurate replay of the audio output, so you can cut some corners when you implement a replay feature. Nobody is going to notice if, for example, your particles fall a bit differently every time the replay is played. But if the replay sometimes ends in a different gameplay state (e.g. the red team wins the round in the replay while the blue team actually won the round during gameplay), then you are in trouble.

Naturally this requirement for a deterministic system also applies to type 2 replays if you don't store every state of every object on every frame (e.g. if you only store every Nth frame as a key frame and interpolate between them).
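
To make the key frame case concrete, here is a minimal sketch of linear interpolation between two stored key frames. The `Keyframe` type and `PositionAt` helper are hypothetical, and the sketch assumes the first key frame has a strictly smaller frame number than the second.

```csharp
using System;

// Hypothetical key frame: an object's position at a recorded frame.
public readonly struct Keyframe
{
    public readonly int Frame;
    public readonly float X;
    public readonly float Y;

    public Keyframe(int frame, float x, float y)
    {
        Frame = frame;
        X = x;
        Y = y;
    }
}

public static class KeyframeInterpolation
{
    // Linearly interpolates the position for a frame between key frames a and b.
    // Assumes a.Frame < b.Frame and a.Frame <= frame <= b.Frame.
    public static (float X, float Y) PositionAt(Keyframe a, Keyframe b, int frame)
    {
        float t = (float)(frame - a.Frame) / (b.Frame - a.Frame);
        return (a.X + (b.X - a.X) * t, a.Y + (b.Y - a.Y) * t);
    }
}
```

If the interpolated values only drive what is drawn on screen, small differences between devices are harmless. If they feed back into gameplay decisions, the same calculation has to give the same result everywhere.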

So if you have to calculate game states during replay playback, make sure those calculations are exactly the same as they were when the game event actually happened.

 

Problems:

1. Random number generation (RNG)

If your game uses a random number generator, it can cause problems for your replay feature. You should always use a deterministic random bit generator (DRBG) when you need random numbers for your gameplay. And when you use a DRBG, make sure you store the seed used to initialize it in the replay file, and that you query numbers from it in the same order. That way you can guarantee that the RNG provides the same numbers during the replay as it did during gameplay.

 

- Examples of RNG that will break your replay -

* During gameplay an archer shoots a swordsman with a bow and the RNG rolls 6 points of damage (the damage range is 4-8). Since the RNG state wasn't stored in the replay file, the replay uses the default or current state of the random number generator. That mistake leads to a situation where the replay RNG produces 8 points of damage.

* During gameplay the archer's AI script is handled first, and it hits the swordsman with a critical hit (roll 1 from range 1-100), so the swordsman gets stunned and cannot attack. During the replay the swordsman's AI script goes first (because the order is wrong), and this time it rolls the 1 and the archer gets stunned.

 

While some systems might give you hardware-based random number generators, those are bad for replays since they usually don't guarantee determinism. Also, if you have to code your own RNG, make sure it is a DRBG by NOT using things like CPU usage percentage, system thread count, uninitialized variables or the process ID in number generation.

It is easier to detect and fix RNG-related problems if you create a single class for RNG-related functions and make sure that only it is used for generating random numbers. You should also write unit tests for those RNG functions to check their determinism.

You might also want to use separate RNGs for gameplay logic and for visual effects. That way visual changes specific to the playback device (e.g. fewer particles on low quality settings) won't alter the replay playback.
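
A minimal sketch of such a single RNG class is shown below. It uses a xorshift32 generator purely as an illustration (it is not cryptographically secure), and the class name `ReplayRandom` and its members are hypothetical. The point is that its output depends on nothing but the seed and the order of calls.

```csharp
using System;

// Hypothetical helper: a small xorshift32 generator whose output depends
// only on the seed and on how many numbers have been drawn so far.
public sealed class ReplayRandom
{
    private uint _state;

    // The value to write into the replay file.
    public uint Seed { get; }

    public ReplayRandom(uint seed)
    {
        Seed = seed;
        // xorshift must not start from zero, so map 0 to a fixed non-zero value.
        _state = seed != 0 ? seed : 0x9E3779B9u;
    }

    public uint NextUInt()
    {
        _state ^= _state << 13;
        _state ^= _state >> 17;
        _state ^= _state << 5;
        return _state;
    }

    // Returns a value in [min, max]. The modulo introduces a slight bias,
    // which is accepted here to keep the sketch short.
    public int Range(int min, int max)
    {
        uint span = (uint)(max - min + 1);
        return min + (int)(NextUInt() % span);
    }
}
```

Under these assumptions, gameplay code would create one instance from the stored seed (for example for damage rolls via `Range(4, 8)`) and a second, independent instance for visual effects, so that extra particle rolls never shift the gameplay sequence.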

 

2. Floating point math

Hardware-based floating-point values and the functions that operate on them can easily break your replays. The first big problem is related to optimizations, since a floating-point operation can produce different results in an optimized binary than in an unoptimized one. This means that a replay recorded with the release version of the game can play back differently in the debug version of the game.

The second big problem with hardware-based floating-point math is portability. If you run your replay on different hardware (or use a different compiler), you might get different results. This can become an issue if you have to verify client replays on a server, because the server could flag some of those replays as cheated if their states don't match.

Floating-point differences between the original gameplay and the replay can be very hard to spot with the naked eye, because they are usually very small and might even cancel each other out. So it is possible that some replays work correctly while others show minor differences between the original gameplay and the replay.

In some cases you might be able to design your game in such a way that it doesn't use floating-point math for gameplay (e.g. all positions are integers). Or, if you need real numbers in your gameplay and must be sure that they always behave the same way on every supported platform, you can choose to use fixed-point arithmetic.
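
As an illustration of the fixed-point idea, here is a minimal sketch of a 16.16 fixed-point type built on integer operations only. The type name `Fixed` and its members are hypothetical, and the sketch ignores overflow and division by zero, which a real implementation would have to handle.

```csharp
using System;

// Hypothetical 16.16 fixed-point number: 16 integer bits, 16 fraction bits.
// All arithmetic is done with integers, so results do not depend on the
// floating-point behavior of the hardware or compiler.
public readonly struct Fixed
{
    public const int FractionBits = 16;

    // The underlying integer; this is what would be stored in a replay.
    public readonly int Raw;

    private Fixed(int raw) { Raw = raw; }

    public static Fixed FromInt(int value) => new Fixed(value << FractionBits);
    public static Fixed FromRaw(int raw) => new Fixed(raw);

    public static Fixed operator +(Fixed a, Fixed b) => new Fixed(a.Raw + b.Raw);
    public static Fixed operator -(Fixed a, Fixed b) => new Fixed(a.Raw - b.Raw);

    // Widen to 64 bits so the intermediate product does not overflow.
    public static Fixed operator *(Fixed a, Fixed b) =>
        new Fixed((int)(((long)a.Raw * b.Raw) >> FractionBits));

    public static Fixed operator /(Fixed a, Fixed b) =>
        new Fixed((int)(((long)a.Raw << FractionBits) / b.Raw));

    // For display or debugging only; gameplay logic should stay on Raw.
    public double ToDouble() => Raw / 65536.0;
}
```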

 

3. Script/code execution order

This part doesn't apply to all games, since some game engines have a hard-coded code execution order (which guarantees that events always play out the same way). But if, for example, the processing order of your NPCs can differ between runs, you should make sure that during the replay the order is exactly the same as it was during gameplay.

 

- Examples of script execution order that will break your replay -

* During gameplay the archer's AI script is handled first, and it hits the swordsman with a critical hit (roll 1 from range 1-100), so the swordsman gets stunned and cannot attack. During the replay the swordsman's AI script goes first (because the order is wrong), and this time it rolls the 1 and the archer gets stunned.

* During gameplay the swordsman gets hit by two arrows (archer #1 shot the first arrow, archer #2 shot the second) and dies. Since archer #2 shot the latter one, he/she gets the kill score. During the replay the order of the archers is reversed, so this time archer #1 gets the kill score.

 

Keeping the right order becomes more difficult when scripts are dynamically added to or removed from the execution engine during gameplay. Another thing that can bite you is multithreading, when it is used to process multiple scripts of the same type at the same time.
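
One way to pin the order down, sketched here under the assumption that every script has a stable ID that is assigned the same way during gameplay and replay, is to always run scripts sorted by that ID on a single thread. The `UpdateScheduler` class is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scheduler: scripts run in ascending ID order, regardless of
// the order in which they were added.
public sealed class UpdateScheduler
{
    // SortedDictionary enumerates its entries in key order.
    private readonly SortedDictionary<int, Action<int>> _scripts =
        new SortedDictionary<int, Action<int>>();

    public void Add(int id, Action<int> update) => _scripts.Add(id, update);

    public bool Remove(int id) => _scripts.Remove(id);

    public void RunFrame(int frame)
    {
        // Take a snapshot so scripts may add or remove other scripts while
        // the frame is running; such changes take effect on the next frame.
        var snapshot = new List<Action<int>>(_scripts.Values);
        foreach (Action<int> update in snapshot)
        {
            update(frame);
        }
    }
}
```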

 

4. Replay serialization/deserialization

Many things can go wrong during serialization/deserialization of your replays, but here I have listed three that are somewhat common.

 

The first one is endianness, which isn't that big of a deal nowadays since most gaming platforms are little-endian. An endianness issue is usually very easy to spot because, for example, the little-endian uint32 value 1 turns into 16777216 when it is read as big-endian. If you have to support both byte orders, you should choose one of them and save all replays in that format (I would go with little-endian).
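
In C#, one way to fix the byte order regardless of the platform is the `BinaryPrimitives` class from `System.Buffers.Binary`. The sketch below assumes a .NET version that provides that class (or the System.Memory package); the wrapper name `ReplayBytes` is hypothetical.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helpers that always use little-endian byte order,
// no matter what the native byte order of the platform is.
public static class ReplayBytes
{
    public static byte[] WriteUInt32(uint value)
    {
        var buffer = new byte[4];
        BinaryPrimitives.WriteUInt32LittleEndian(buffer, value);
        return buffer;
    }

    public static uint ReadUInt32(byte[] buffer)
    {
        return BinaryPrimitives.ReadUInt32LittleEndian(buffer);
    }
}
```

Reading the bytes written for the value 1 with `BinaryPrimitives.ReadUInt32BigEndian` instead reproduces the 16777216 mentioned above.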

 

The second one is serialization of floating-point values. For example, if you run the following code in C#


float fValue = 123.44251f; 
Console.WriteLine(fValue.ToString());

you get 123.4425 on .NET Framework (and on .NET Core versions before 3.0), so some accuracy is lost. The right way is to use round-trip formatting when storing floats as strings that have to keep their accuracy


float fValue = 123.44251f;
Console.WriteLine(fValue.ToString("R"));

and this gives an output of 123.442513. This is also one of those issues that can be difficult to spot, since small differences in floating-point values can still lead to an end result that seems to be correct.
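
If the replay file is binary rather than text, another option is to skip the string conversion and store the raw 32 bits of the float. The sketch below assumes a .NET version that provides `BitConverter.SingleToInt32Bits` and `BitConverter.Int32BitsToSingle`; the wrapper name `FloatBits` is hypothetical.

```csharp
using System;

// Hypothetical helpers: store a float as its exact 32-bit pattern so that
// no decimal rounding can happen on the way to the replay file and back.
public static class FloatBits
{
    public static int ToBits(float value) => BitConverter.SingleToInt32Bits(value);

    public static float FromBits(int bits) => BitConverter.Int32BitsToSingle(bits);
}
```

The resulting integer can then be written with a fixed byte order like any other integer in the replay.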

 

The third one is the order of serialization/deserialization. If event A happened before event B during gameplay, the replay data structure should keep the same order of events, and replay playback should play the events in that order.

 

5. Physics engines and other third party plugins

"The first problem here is that the PhysX SDK is not deterministic. Especially when running different hardware setups, bus latencies can vary between runs, or on different machines. Even without hardware in the machine, we do not guarantee any type of determinism." PhysX Knowledge Base/FAQ

Since most physics engines aren't deterministic, they can cause big problems when you try to implement a replay feature. If you want to use a physics engine, in most cases you cannot get an accurate replication of the outcome with pure input-based replays (type 1), which usually means that you have to go with state recording (type 2) and partially or completely disable the physics engine while the replay is played.

In some situations partially disabling the physics engine can be a hard problem, since you could still need, for example, triggers and collision matrices but not gravity. This might lead to a situation where you have to process certain objects differently in the game engine while the replay is played back, and restore "normal" behavior for those objects in regular gameplay.

The same applies to all plugins (e.g. AI and pathfinding), since they might not be deterministic and might therefore also need to be partially or completely disabled.

With all the plugins that engines use, it is easy to miss the problems they might cause in certain setups (although in some cases it is obvious that a certain component does not behave as it should), because you can get exactly the same results on every run when you only test the replay feature on a single device.


(50 cubes dropped from same height one after another via script, 4 runs and 4 different outputs)

 

6. Game updates (logic or values)
