Sponsored: Improving Android game performance with Intel INDE GPA

Aug. 13, 2015

This tutorial presents a step-by-step guide to performance analysis, bottleneck identification, and rendering optimization of an OpenGL ES 3.0 application on Android. The sample application, entitled “City Racer,” simulates a road race through a stylized urban setting.  Performance analysis of the application is done using the Intel INDE Graphics Performance Analyzers (Intel INDE GPA) tool suite.

Acknowledgements

This tutorial is an Android and OpenGL ES 3.0 version of the Intel Graphics Performance Workshop for 3rd Generation Intel Core Processor (Ivy Bridge) (PDF) created by David Houlton.  It ships with Intel INDE GPA.

Tutorial Organization

This tutorial guides you through four successive optimization steps.  At each step the application is analyzed with Intel INDE GPA to identify specific performance bottlenecks.  An appropriate optimization is then toggled within the application to overcome the bottleneck, and the application is analyzed again to measure the performance gained.  The optimizations applied are generally in line with the guidelines provided in the Developer’s Guide for Intel Processor Graphics (PDF).

Over the course of the tutorial, the applied optimizations improve the rendering performance of City Racer by 83%.


The combined city and vehicle geometry consists of approximately 230K polygons (690K vertices) with diffuse-mapped materials lit by a single non-shadow-casting directional light.  The provided source material includes the code, project files, and art assets required to build the application, including the source code optimizations identified throughout this tutorial.

Prerequisites

City Racer Sample Application

City Racer is logically divided into race simulation and rendering subcomponents.  Race simulation includes modeling vehicle acceleration, braking, turning parameters, and AI for track following and collision avoidance.  The race simulation code is in the track.cpp and vehicle.cpp files and is not affected by any of the optimizations applied over the course of this tutorial.

The rendering component draws the vehicles and scene geometry using OpenGL ES 3.0 and our internally developed CPUT framework.  The initial version of the rendering code represents a first-pass effort and contains several performance-limiting design choices.

Mesh and texture assets are loaded from the Media/defaultScene.scene file.  Individual meshes are tagged as either pre-placed scenery items, instanced scenery with per-instance transformation data, or vehicles for which the simulation provides transformation data.  There are several cameras in the scene:  one follows each car and an additional camera allows the user to freely explore the scene.  All performance analysis and code optimizations are targeted at the vehicle-follow camera mode.
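The three mesh categories can be pictured as a small tagged type.  The sketch below is illustrative only: MeshKind, TransformSource, and TransformSourceFor are hypothetical names rather than identifiers from the City Racer or CPUT source, and it assumes that pre-placed scenery takes its placement directly from the scene file.

```cpp
// Hypothetical names throughout; the real scene loading is handled by CPUT.
enum class MeshKind { PrePlacedScenery, InstancedScenery, Vehicle };

enum class TransformSource { SceneFile, PerInstanceData, Simulation };

// Maps each mesh tag to the place its world transform comes from,
// following the three categories described above.
// Assumption: pre-placed scenery is positioned by the scene file itself.
constexpr TransformSource TransformSourceFor(MeshKind kind) {
    return kind == MeshKind::Vehicle          ? TransformSource::Simulation
         : kind == MeshKind::InstancedScenery ? TransformSource::PerInstanceData
                                              : TransformSource::SceneFile;
}
```

Only the Vehicle category depends on the race simulation, which is why a paused simulation can still give identical frames from one profiling run to the next.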

For the purpose of this tutorial, City Racer is designed to start in a paused state, which allows you to walk through each profiling step with identical data sets.  City Racer can be unpaused by unchecking the Pause check box in the City Racer HUD or by setting g_Paused = false at the top of CityRacer.cpp.

Optimization Potential

Consider the City Racer application as a functional but non-optimized prototype.  In its initial state it provides the visual result desired, but not the rendering performance.  It has a number of techniques and design choices in place that limit rendering performance and are representative of those you’d find in a typical game in development.  The goal of the optimization phase of development is to identify the performance bottlenecks one by one, make code changes to overcome them, and measure the improvements achieved.

Note that this tutorial addresses only a small subset of all possible optimizations that could be applied to City Racer.  In particular, it only considers optimizations that can be applied completely in source code, without any changes to the model or texture assets.  Other asset-changing optimizations are excluded here simply because they become somewhat cumbersome to implement in tutorial format, but they can be identified using Intel INDE GPA tools and should be considered in a real-world game optimization.

Performance numbers shown in this document were captured on an Intel Atom processor-based system (codenamed Bay Trail) running Android.  The numbers may differ on your system, but relative performance relationships should be similar and logically lead to the same performance optimizations.

The optimizations to be applied over the course of the tutorial are found in CityRacer.cpp. They can be toggled through City Racer’s HUD or through direct modification in CityRacer.cpp.

CityRacer.cpp

bool g_Paused = true;
bool g_EnableFrustumCulling = false;
bool g_EnableBarrierInstancing = false;
bool g_EnableFastClear = false;
bool g_DisableColorBufferClear = false;
bool g_EnableSorting = false;

They are enabled one by one as you progress through the optimization steps.  Each variable controls the substitution of one or more code segments to achieve the optimization for that step of the tutorial.
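As a rough illustration of how one such flag might substitute a code segment, the sketch below gates a frustum-culling path on a boolean.  It is not taken from CityRacer.cpp: Mesh, insideFrustum, and CountMeshesToDraw are hypothetical, and the flag is passed as a parameter instead of being read from the global so that the function stays self-contained.

```cpp
#include <cstddef>

// Hypothetical mesh record; City Racer's real mesh types come from CPUT.
struct Mesh {
    bool insideFrustum;  // assumed to be the result of an earlier visibility test
};

// Counts the meshes that would be submitted for drawing.
// With the toggle off, every mesh is submitted (the unoptimized path).
// With the toggle on, meshes outside the view frustum are skipped.
constexpr std::size_t CountMeshesToDraw(const Mesh* meshes, std::size_t count,
                                        bool enableFrustumCulling) {
    std::size_t drawn = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (!enableFrustumCulling || meshes[i].insideFrustum) {
            ++drawn;
        }
    }
    return drawn;
}
```

Keeping both paths behind a flag makes it possible to capture a frame with the optimization off, switch it on, and capture again under otherwise identical conditions.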

Optimization Tutorial

The first step is to build and deploy City Racer on an Android device.  If your Android environment is set up correctly, the buildandroid.bat file located in CityRacer/Game/Code/Android will perform these steps for you. 

Next, launch Intel INDE GPA Monitor, right-click the system tray icon, and select System Analyzer.

System Analyzer will show a list of possible platforms to connect to. Choose your Android x86 device and press “Connect.”

System Analyzer - Choose your Android x86 device

When System Analyzer connects to your Android device, it will display a list of applications available for profiling. Choose City Racer and wait for it to launch.

System Analyzer - a list of applications available for profiling

While City Racer is running, press the frame capture button to capture a snapshot of a GPU frame to use for analysis.

Capture a snapshot of a GPU frame to use for analysis

Examine the Frame

Open Frame Analyzer for OpenGL and choose the City Racer frame you just captured, which will allow you to examine GPU performance in detail.

Open Frame Analyzer for OpenGL* to examine GPU performance

The timeline corresponds to an OpenGL draw call

The timeline at the top is laid out in equally spaced ‘ergs’ of work, each of which usually corresponds to an OpenGL draw call.  For a more traditional timeline display, select GPU Duration for both the X and Y axes.  This quickly shows which ergs consume the most GPU time and where we should initially focus our efforts.  If no ergs are selected, the panel on the right shows the GPU time for the entire frame, which is 55 ms.
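To put the frame time in perspective, the sketch below converts a GPU frame time into the frame rate it would permit and picks out the most expensive erg from a list of durations.  Both helpers are hypothetical and are not part of Intel INDE GPA; the conversion also assumes that the GPU is the only limit on frame rate.

```cpp
#include <cstddef>

// Converts a per-frame GPU time in milliseconds into the frame rate it would
// allow, assuming the GPU is the only bottleneck (an assumption of this sketch).
constexpr double GpuBoundFps(double gpuFrameMs) {
    return 1000.0 / gpuFrameMs;
}

// Returns the index of the erg with the largest GPU duration, or `count`
// when the list is empty. Durations may be in any consistent unit.
constexpr std::size_t HottestErg(const double* durations, std::size_t count) {
    std::size_t hottest = count;
    for (std::size_t i = 0; i < count; ++i) {
        if (hottest == count || durations[i] > durations[hottest]) {
            hottest = i;
        }
    }
    return hottest;
}
```

For the 55 ms frame above, GpuBoundFps(55.0) works out to roughly 18 frames per second.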
