Monday, June 1, 2009

Optimizing 30,000+ Ships In Realtime In C#

My latest game, AI War: Fleet Command, has some of the highest unit counts in the RTS genre. Most campaigns against moderately-hard AIs have over 30,000 ships being simulated at once by the time they are done, and many multiplayer campaigns have upwards of 60,000 or more ships. The game is coded in C#, a JIT-compiled programming language that some people think doesn't have the power/efficiency of C++.

I don't have any numbers to really compare the processing power of one versus the other in an empirical sense, but looking at AI War I think this is a sizeable victory for the advocates of C# as a major language for heavy-duty applications. Some people have been wondering how on earth I managed to support so many ships at once in realtime, and this article pulls back the curtain.

The Game Must Be Load-Balanced In Multiplayer, Right?
Load-balancing sounds good in theory, but in most RTS games that just wouldn't be feasible except maybe on LAN games because of the frequency at which state of each of the units is changing. There is just too much unit data to pass back and forth, so all of the processing for the game simulation itself has to be run on all the computers in the game. Years ago there was a great article on Gamasutra called 1500 Archers on a 28.8: Network Programming in Age of Empires and beyond. That's a rather technical article, but it basically outlines the approach that pretty much all RTS games use for their networking and simulation. AI War is no different, and in fact as the unit counts go up we are even more dependent on this approach because there is even more data in the simulation.

Multithreaded AI On The Host Only
The big difference in AI War from any other RTS game that I know of is how the AI is handled. Essentially, what we did was separate out the real "thinking" portion of the AI into a separate thread. Most non-programmers don't know this, but that actually requires an entire copy of all of the relevant game data for the secondary thread (two threads cannot easily share data, or you get all sorts of "locking" and "blocking" issues that very much adversely affect performance). Anyway, so the main game thread passes data to the secondary AI thread, which then analyzes it on its own time and passes commands back to the main thread.

So far, most of that is done by other recent RTS games, such as for example Supreme Commander. The key difference here is that our AI thread is only run on the host computer. If you read the 28.8 article above, you already know that player-issued commands are requested during one 200ms "turn" and then simultaneously executed in the next turn by all computers in the game. My reasoning was: why can't the AI issue commands in the same way? Even though the commands often have different content from human-issued commands (since the AI has some differences in how it manages its economy, etc), the same basic approach works without increasing network load too heavily, and without requiring an AI simulation on any machine other than the host.

Do All AI War Players Need Multiple Cores, Then?
Short answer: no. All of this multithreading basically gets us to the point where the AI is on the host only, so the host greatly benefits from multiple cores, and yet the rest of the simulation is run on all of the clients in a game, and thus the primary requirements for the main processor in a game is the same all computers in the game. In early versions of the game, we topped out at around 10,000 units in a game before lag set in, but through a series of optimizations on our simulation code I got that up to 60,000 units, and now that number has increased to theoretically around 120,000 units (though I have never run more than 80,000 units in one game, since that just doesn't happen very much).

The requirements for the host are heavier because of the AI, but the clients won't benefit much at all from multiple cores. The host probably needs to be a dual core 2.4Ghz machine for (possibly) the highest difficulties and (definitely) for multiplayer games, but a 2.4Ghz single-core machine works reasonably well as the host of a single-player game. I've even had it running with minimal lag on a 1.8Ghz single-core, but that was a slightly below-average experience so we consider that under our minimum requirements (even though it is playable).

Optimizations vs. Inherent Advantages Of AI War's Design
Mainly the optimizations were something that just required thought, code profiling, and lots of time and testing (more on that in a minute), but there were also a few key advantages that I had based on the design of the game itself:

Advantage 1: Faux-2D Space
The game is set in space, and I am treating it as pseudo-3D space even though the game is 2D (aka, ships can pass one another because they are presumed to be on slightly different Z axes). This meant that collision-detection between ships is not really a major focus of the simulation, since the only thing that really stops movement is force fields. There are still shot-collisions, and collisions with mines, but I was able to get a lot of optimizations out of this aspect of setting the game in space in the manner that I did.

Advantage 2: Limited Pathfinding Needs
The game is set in space, and so the straightest path between two points is always a line. So that means that pathfinding, which is a huge part of the CPU load in terrestrial RTS games, was a nonfactor for AI War. That freed up some CPU cycles to do something else -- in my case, I decided to have much more intelligent auto-targeting of ships in the game, using that CPU that the lack of pathfinding freed up. This ultimately started using even more CPU than pathfinding typically does, but later optimizations resolved that.

Side note: My other game, Alden Ridge (which is not yet released, but actually predated AI War in terms of development), has very heavy pathfinding as part of the core of it. I actually found some awesome ways to optimize pathfinding for that other game, but scaling that up to an RTS game would have been a challenge.

Advantage 3: Multiple Planets
Like many space-based games, AI War is divided up into individual planets. This is a huge benefit to processing the simulation, because when two ships are on different planets, you know there can't possibly be any interactions between them. This basically acts as a massive pre-sorting of all the ships, so that when you are doing things like target-selection or collision detection or whatever, you are working with smaller batches of ships at a go (which can still be in excess of 5,000 or 6,000 ships at once, but it's not likely there would ever be 60,000 ships all in one system).

And Now For The Optimizations
Okay, so those were the advantages that the overall design created for me, and I decided to make full use of them. But even with those advantages, some serious optimization was needed in order for the game to support even 20,000 units (which is about double what you see in the largest versions of most other games in the genre). These optimizations were integrated on an ongoing process through our alpha and beta development, and even now occasionally one occurs to me and I put it in (such as with the 1.004 release).

Here's a list of a few key optimizations that were made:

Optimization 1: Fixed-Int Math
Fixed-int math is used instead of floating-point. This was needed to insure consistent simulations between differing CPU architectures, but it is also faster. A lot of handheld consoles, like the Nintendo DS for instance, do not even have a floating-point system and so fixed-point math is always used there. There were not many good examples of fixed-point math that I could find in C# (the language AI War is coded in), so there was some discussion on StackOverflow on how to do this best. If you are curious, here it is, along with the working code I ended up with for that: http://stackoverflow.com/questions/605124/fixed-point-math-in-c

Optimization 2: Distance Approximation
Through profiling, I discovered that the single-biggest CPU eater in alpha versions of the game was range checks. As in, "how far apart are these two objects?" Range checks are needed for all sorts of things, from collision detection to target selection to moving to a destination point. The range checks were particularly slow because the standard formula involves taking a square root, which is expensive in terms of processor (it is an iterative function in programming -- it makes 16 guesses, each of which gets closer to the real square root). With so many ships in the game, I was often having half a million range checks or so per second, and that was really limiting with the square root in there.

The solution turned out to be to move to a distance approximation method that avoids square roots (which unfortunately I don't have the link to, but it was a C implementation that I converted to use in C# -- if another dev or anyone else wants my code for that, I'll post it). Anyway, the realization was that most ranges don't need to be super precise, because it's more a question of "which ships are relatively closer." But even for things like movement and collision detection, the less precise range checks work until units are very close together, and then I switch to the more precise, more CPU-intensive range check. That really mitigates the CPU load.

Optimization 3: Cycles Per Second
Like most games, AI War has a game loop. It runs at 22 iterations per second, which through testing was determined to be the sweet spot for having a reasonable framerate but the best possible performance on midrange PCs. So this was a bit of a cheat, but no one has commented or has even seemed to notice. When given the choice between a 60fps game and a game that has huge numbers of units making complex decisions, I opted for the latter.

Optimization 4: Movement Calculation Caching
One sizable optimization involved calculating movement more efficiently. Units move at all sorts of angles in AI War, like in most RTS games, and I'm not using a sub-pixel position scheme, so that means that floating-point style math (fixed point math in my case) is needed. Basically, to move from point A to point B, a ship must move at a certain angle, but as the ship moves that angle changes slightly because of the integer nature of the pixel coordinates. So every cycle the game must recalculate every ship's movement angle (and thus the x/y components of that movement by taking the sin and cos), and that gives smooth, perfect lanes of travel.

That works great, but optimizations are sorely needed when you start talking about 30,000+ moving objects (which can happen if you play against two of the Special Forces Captain AI types, which have very few non-moving ships). The trick I am using is to only recalculate the angle every 1 turn (so roughly 4 times per second) instead of once per cycle (which would be 22 times per second). The visual difference is imperceptible to the players, but I'm doing much less processing for movement.

Optimization 5: Scheduled Collision Detection
Another big optimization was to do with collision detection. Ships are not allowed to stop on top of one another, so whenever a ship reaches a destination point it must check to see if it is on top of other ships, and if so then move to an open space. This is iterative, and as you can imagine eats up a lot of CPU on planets with many ships on them. If a lot of ships are collision-detecting at once, too much CPU is required, which in alpha versions would cause a lot of lag when big fleets came out of a wormhole, etc.

The solution was to spread out the collisions of this sort, since there's not really a practical reason that they have to be done right away. The game thus only processes 25 collisions of this sort per CPU cycle, or 550 per second. This means that sometimes you see ships bunch up when coming out of a wormhole or all moving to a big point, and then they fly out of the center in waves, which is actually a very cool visual effect that was completely unintentional to start with. Bear in mind that this is just referring to collision detection for ships that might have stopped on top of one another -- when talking about collisions with force fields or mines, that happens in closer to realtime, but there are fewer cross-checks to do with those, so that works out CPU-wise.

Optimization 6: Ship Rollups And Indexes
Pre-sorting of ships was a huge improvement across the board for AI War, and I don't think most RTS games bother with this because their unit counts are often too low to really see much of a benefit from it. I come from a SQL database background, though, where indexing tables is a key to good performance. That's another programming environment with vast numbers of data points (a table with 100,000 rows is a pretty small one), and I took a lot of ideas from my experience there. Basically, whenever I need to make decisions about ships, I pre-sort them into smaller per-purpose collections that can be looped over much more efficiently. These sorts of rollups cause more load when a new ship is created or when it moves from one planet to another, but that's a minuscule difference that is not compounded by looping, and it makes for huge gains in later processing.

Optimization 7: Compressed, Batched Network Commands
Commands that are sent across the network are batched and compressed before being sent. All of the requested commands (from human players and the AI) are batched on the host computer and then compressed as one big chunk and sent via TCP to the clients for decompression, parsing, and execution. This works well with the 200ms turn-based approach outlined in the 28.8 article above. Having commands to 4,000 units actually compresses pretty well, so you tend not to see any network lag on broadband connections even with 4 or more players. But you do need to turn off any background upload/download processes going on, because those can be hefty. And if the game host's connection is less than ideal, or there are a lot of players in the game, then one of the clients should be the voice chat host (using Skype or Teamspeak or whatever).

It might surprise other devs that we are using TCP instead of UDP, but we wanted robustness and reliability over speed. With compression, we already had all the performance we needed and more, even with so many units in the game. I actually started out with a UDP solution in early alpha versions, but it was problematic in many ways and so I just recoded everything using TCP sockets, and I've been much more pleased with the results of that.

Optimization 8: Scheduled Target Selection
The last major optimization I'll talk about is to do with target selection. The AI thread and the human players of course do a lot of the target selection themselves -- choosing targets to chase, or to move over to, etc. But one of the nice things about spaceships is that they can typically fire while moving. So if ships don't yet have a target, or are not yet in range of their AI/human-given target, they should shoot at whatever they pass on the way. And if they are left unattended somewhere by the player, they should shoot at whatever comes in range. And if players put them in attack-move mode or the new "free roaming defender" mode from version 1.004, then the ships need to respond to threats in an automated fashion. Plus then you have engineers which need to find targets to repair or assist, and you've got tractor beams that need to look for targets to grab, and tachyon beam emitters that need to find targets to tag, etc. All of this happens as part of the main simulation thread, because it needs to happen on an interval shorter than once per 200ms turn, which is the interval on which human/AI commands are issued.

Target selection thus became the single biggest bottleneck for having more units in the game, and especially for having thousands of ships from multiple teams on a single planet. There were a lot of small optimizations that I made with this process (the range-check efficiency improvement that was already mentioned being one key one), but the biggest thing was to also stagger these target selection checks. The game currently processes around 1,100 target-selection-checks for ships per second per player. In a game with 60,000 ships, that might not sound like a lot, but never fear -- there are rarely more than 10,000 ships all in battle at one time, and that's even a rare case with most battles being between 1,000 and 3,000 ships. The other things is that since that is per player, if you have 60,000 ships in your game you probably have at least 4 players, so that would only take it around 13-14 seconds to do a full target selection loop for all ships -- but in reality, as I've said already, it takes much less time since not all ships are in battle and they don't require target selection checks if there are no enemies on their planet.

Optimization 9: Doubling Ship Capacity In Version 1.0004
Related to the target-selection optimization, I'll talk briefly about the optimization in version 1.004 that lets the game make the leap to almost double capacity. Basically, I came to the realization that if there were 50,000 ships sitting on enemy planets that I am not attacking, and there are only 10,000 ships that I am actively engaged with attacking or defending against at the moment, then I can completely ignore those other ships unless they have specific orders (like moving or collision-detecting, etc). This lets the CPU be concentrated on ships that actually need it, and those other ships just don't do anything. This is an easy state to detect because of all the rollups in Optimization 6 above. This simple change has let me run the game at almost double-speed with 80,000 ships in the game -- previously the game would start lagging on a 2.4Ghz CPU well before that point.

What About Graphics Performance?
Everything so far has pretty much addressed CPU efficiency, but what about the graphics? It takes a lot to draw thousands of onscreen ships at once, even if the game is 2D.

Well, the game is 2D, but it uses Direct3D9 under the hood. That lets us do 3D effects like the zoom (the parallax effect of the planets is just a simple parallax effect, it's not true Z depth), and it also lets us take advantage of hardware acceleration. There were a lot of optimizations we had to make in that part of our engine (our entire engine is custom from the ground up, by the way), but those are fairly standard and prosaic, so I won't bore you with those.

For lower-end graphics cards, showing a few thousand ship sprites at once can still be a challenge because of the way that the graphics card has to do texture state changes. So we have a settings option that lets the game do more efficient sorting for the graphics card, but which does cause some strange/incorrect graphical overlap on occasion when you are all the way zoomed in. Most players don't even notice the difference, and the increase in performance is notable if you are on a laptop card (as two of our alpha testers were) or a card that is more than three or four years old.

In Conclusion
Optimization is important for any game -- or any other high-performance or high-throughput application -- but when you have such a vast number of ships doing expensive calculations, the need for optimization becomes even greater. The techniques outlined in this article are a great place to start, but there were even more other, smaller techniques used in AI War, and the degree to which any given technique is appropriate will vary by game. Optimization on the game Alden Ridge has been an entirely different experience, for instance.

Most of all, I hope that this article has shown not only the importance of optimization when creating games of massive scale, but also that C# is a very viable language for applications of this CPU intensity. The conventional wisdom is that a realtime game of this scope would have to be in a different language (ASM, C, or C++, usually), but that is clearly not true. C# is a much faster language to program in than C++, generally speaking, and this enabled me to create a game of this sort with only seven months of development. For developers large or small, that sort of speedup in the development process can be quite significant!

8 comments:

E said...

Great article!
I'm interested in the distance approximation method you used, could you possibly find it and post?

Cheers :)

Christopher M. Park said...

Hi E,

Glad you enjoyed the article! Here's another one that looks at distance approximation: Range Checks - Approximation vs Accurate

Unknown said...

Had fun reading.

Have you considered "Fast inverse square root" from Quake III in your distance approximation? How do you think would it affect the performance?

References here:
http://en.wikipedia.org/wiki/Fast_inverse_square_root
http://www.codemaestro.com/reviews/9

Christopher M. Park said...

I actually had not heard of that, but the fact that I am using fixed-int math instead of floating-point math probably complicates that. I took at look at some examples of that method, but I'm not comfortable enough with them to try translating them to fixed-in style. The distance approximation method that I'm linking to above does fewer overall multiplies and divides to get the result I need, anyway, though. But thanks for the links, those were very interesting!

Unknown said...

depending on exactly how you're using the distance-between-objects, a lot of times it's easier to simply compare the squares, so instead of:

dx = x1-x2;
dy = y1-y2;
if (sqrt(dx*dx + dy*dy) < dist)

do:

if (dx*dx + dy*dy < dist*dist)

I'm sure you've thought of that, but I'm just giving a suggestion.

Christopher M. Park said...

Kevin,

Believe it or not, I didn't think of that -- I've tried that out in a few specific places, and it really helps a lot! The main thing that is problematic with it is that because of the huge scale of distances in AI War, I need to use int64s instead of int32s, but the performance of that is still superior to even the distance approximation method. So as of version 1.014A, AI War will have three different methods of calculating distance -- one very precise, one precise-enough-for-all-but-close-ranges, and then your suggestion for precision comparisons.

Thanks very much for the note!

etherjoe said...

just installed AI Wars myself. I was also surprised for a second at the prerequisites. But I figure, #1 this is an indie game from a small shop, and #2 a .net update is pretty understandable.

You would think that a magazine focused on independent games would have a similar kind of sympathy.

Christopher M. Park said...

Ah, well. So it goes!