## Freescale Semiconductor White paper

## Document Number: IMX31GRAPHICSWP Rev. 0.0, 06/2005

# 2D/3D Graphics Support in the i.MX31 and i.MX31L Multimedia Applications Processors

by: Roman Mostinski and David Yoder

# 1 Executive Briefing

As multifunctional information devices such as smartphones and PDAs become ubiquitous, demand grows for ever more functionality in these devices. A rapidly emerging consumer demand is the ability to play 3D games on the device. Success in this marketplace requires graphics performance comparable to what users experience with 3D games on computers and game consoles. The ability to deliver 3D game performance may represent the next killer application for the next generation of applications processors.

Currently, even the most powerful ARM<sup>®</sup> processors cannot by themselves produce the necessary high-quality interactive 3D graphics. Such graphics require a high screen resolution (half-VGA and above), a color depth of more than 16 bits per pixel (bpp), a redraw rate of at least 30 frames per second (fps), and very low power consumption. To achieve the desired performance levels, it is necessary to add a Graphics Processing Unit (GPU) to the ARM CPU.

## Contents

| Section | Title |
|---|---|
| 1 | Executive Briefing |
| 2 | The i.MX31 Graphics Solution |
| 3 | Overview |
| 4 | GACC 2D/3D Graphics Features |
| 5 | Integration in the i.MX31 SoC |
| 6 | i.MX31 Graphics Performance |
| 7 | Expected Performance |
| 8 | Software |
| 9 | Conclusion |



Freescale's i.MX31 meets this need by integrating a dedicated 3D graphics accelerator (GACC), as shown in Figure 1.

## NOTE

The *integrated* GACC is not available in the i.MX31L processor.



Figure 1. i.MX31 with Internal GACC

After deciding to integrate a GPU on the processor, the design engineer must next consider how to provide the memory resources required for 3D graphics. A shared system memory (or unified memory architecture) uses the same memory for graphics operations and other system functions, whereas a dedicated graphics memory is available exclusively for graphics operations. Table 1 shows the advantages and disadvantages of unified and dedicated memory for both internal and external graphics accelerators.

|                                     | Unified Memory Architecture                                                                                                                                                                                                                                                                                                                                                                                                | Dedicated Graphics Memory                                                                                                                                                    |
|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Internal<br>Graphics<br>Accelerator | <ul> <li>The cost impact of an integrated GACC is relatively low: a larger applications processor die and a small increase in price.</li> <li>Memory usage is flexible.</li> <li>The graphics accelerator works directly with existing memory. Consequently, screen resolution and color depth are virtually unlimited. It may even be possible to support VGA at 24 bpp.</li> </ul> | <ul> <li>Memory resources are wasted when current applications are not using graphics acceleration.</li> <li>Additional pins are required for the graphics memory connection.</li> </ul> |
| External<br>Graphics<br>Accelerator | <ul> <li>Additional system production costs could be significant. In addition to the cost of the external chip, additional system board area is required for the external GACC. Other factors that affect the total cost include the requisite board-level tests and a lower MTBF.</li> </ul> | <ul> <li>Significant additional cost and board area.</li> <li>Flexibility in selecting GACC performance and features.</li> </ul> |

Table 1. Graphics Subsystem Architecture Comparison

# 2 The i.MX31 Graphics Solution

Both the i.MX31 and i.MX31L processors allow the designer to connect an external graphics accelerator via the Image Processing Unit (IPU) asynchronous port. The i.MX31 also has its own integrated GACC, which provides hardware acceleration for 2D and 3D graphics algorithms. The internal GACC has sufficient performance to run desktop-quality interactive graphics applications on displays with a screen resolution of VGA and above and a color depth of up to 32 bits per pixel. The GACC is built around the ARM MBX R-S graphics accelerator. This white paper is based, in part, on the *MBX R-S 3D Graphics Core (GX20) Technical Reference Manual*, rev r1p2, ARM DDI 0295E, and describes the ARM MBX R-S customization and integration in the i.MX31 SoC.

Figure 2 shows the GACC high-level block diagram and processing data flow.



Figure 2. GACC Block Diagram and Data Flow

# 3 Overview

The GACC operates on 3D scene data sent as batches of triangles. Triangles are written directly to the Tile Accelerator (TA) on a First In, First Out (FIFO) basis so that the CPU is not stalled. In addition, the Enhanced DMA (eDMA) of the i.MX31 can be used to perform batch transfers with very low CPU involvement. The TA performs advanced culling on the triangle data and writes the tiled, non-culled triangles to external memory.

The event manager uses SmartBuffer<sup>TM</sup> technology to ensure that any level of scene complexity is handled in a fixed display list buffer size.

The Hidden Surface Removal (HSR) engine reads the tiled data and implements per-pixel HSR with full Z-accuracy. The resulting visible pixels are textured and shaded in Internal True Color (ITC, 24 bits per pixel) before the final image is rendered to the display buffer.
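The tiling step described above can be illustrated with a minimal sketch. It bins each triangle into the screen tiles that its bounding box overlaps and drops triangles that lie entirely off-screen, which is the general idea behind a tile accelerator. It is not the MBX R-S implementation: the 16×16-pixel tile size, the function name `bin_triangles`, and the bounding-box test are assumptions made purely for illustration.

```python
from collections import defaultdict

TILE_W = 16  # hypothetical tile width in pixels (not taken from the MBX R-S manual)
TILE_H = 16  # hypothetical tile height in pixels


def bin_triangles(triangles, screen_w, screen_h):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box touches it.

    Triangles whose bounding box is entirely off-screen are culled (never binned).
    """
    tiles = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Cull triangles whose bounding box lies completely outside the screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= screen_w or min(ys) >= screen_h:
            continue
        # Clamp the bounding box to the screen, then convert to tile coordinates.
        x0 = max(int(min(xs)), 0) // TILE_W
        x1 = min(int(max(xs)), screen_w - 1) // TILE_W
        y0 = max(int(min(ys)), 0) // TILE_H
        y1 = min(int(max(ys)), screen_h - 1) // TILE_H
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tiles[(tx, ty)].append(index)
    return dict(tiles)


triangles = [
    [(2, 2), (10, 3), (5, 12)],          # fits inside tile (0, 0)
    [(12, 4), (20, 4), (16, 10)],        # straddles tiles (0, 0) and (1, 0)
    [(-30, -30), (-10, -5), (-20, -1)],  # entirely off-screen, culled
]
binned = bin_triangles(triangles, screen_w=320, screen_h=240)
print(binned)
```

A per-tile list like `binned` is what lets a later stage resolve visibility one tile at a time instead of over the whole frame.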

# 4 GACC 2D/3D Graphics Features

The GACC offers the following features necessary to perform high quality 3D graphics:

- Deferred texturing
- Screen tiling
- Flat and Gouraud shading
- Perspective correct texturing

- Specular highlight
- Floating-point Z-buffer
- 32-bit ARGB internal rendering and layer buffering
- Full tile blend buffer
- Z-load and store mode
- Per-vertex fog
- 16-bit RGB textures, 1555, 565, 4444, 8332, 88
- 32-bit RGB textures, 8888
- YUV 422 textures
- PVR-TC compressed textures
- One-bit textures for text acceleration
- Point, bilinear, trilinear and anisotropic filtering
- Full range of OpenGL<sup>®</sup> and Direct3D<sup>®</sup> (D3D) blend modes
- Dot3 bump mapping
- Alpha test
- Zero-cost full-scene anti-aliasing
- 2Dvia3D<sup>®</sup> 2D graphics acceleration

# 5 Integration in the i.MX31 SoC

Figure 3 shows GPU connectivity in the i.MX31 SoC:



Figure 3. GPU System Connectivity

## NOTE

The integrated GACC is not available in the i.MX31L processor.

The GPU has two interfaces:

• The register block interface of the GPU is an AMBA<sup>TM</sup> Advanced High-performance Bus (AHB) slave interface. It provides access to the control registers and also serves as the input to the tile accelerator. A high-performance scenario would involve 10k triangles per frame, with 32 bytes of geometry data per vertex and a 30-fps redraw rate, resulting in an average throughput of approximately 30 Mbyte/s. The TA input port is equipped with a FIFO buffer to prevent the CPU from being stalled while awaiting data. The eDMA can also be used to transfer the geometry data from system memory to the GACC with low CPU involvement.

The AHB interface is a fully compliant ARM AMBA AHB bus interface. It is designed to comply with Rev 2.0 of the AMBA specification and implements the retry-capable slave part of the specification. This interface has a maximum clock frequency of 133 MHz.

• The second interface allows the GACC to access pixel and texture data in external memory. All read and write requests from the GACC are served through the Graphics Port (GXP). This is a custom point-to-point interface, not AMBA, with the GACC acting as GXP master. The interface supports a simple handshake protocol and split (pipelined) transactions. The GXP must be connected to a matching slave interface port on the External Memory Interface controller. This interface has a maximum clock frequency of 133 MHz.
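The approximate 30-Mbyte/s figure quoted for the register block interface can be checked with a short calculation. The sketch below assumes three independent vertices per triangle (no vertex sharing through strips or fans), which the text does not state explicitly; the function name `geometry_throughput` is illustrative.

```python
def geometry_throughput(triangles_per_frame, bytes_per_vertex, frames_per_second,
                        vertices_per_triangle=3):
    """Return the average geometry traffic into the TA in bytes per second."""
    return (triangles_per_frame * vertices_per_triangle
            * bytes_per_vertex * frames_per_second)


# Scenario from the text: 10k triangles per frame, 32 bytes per vertex, 30 fps.
rate = geometry_throughput(10_000, 32, 30)
print(f"{rate / 1e6:.1f} Mbyte/s")
```

Under these assumptions the result is 28.8 Mbyte/s, consistent with the rounded figure in the text. Geometry that shares vertices between triangles would need less bandwidth.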

# 6 i.MX31 Graphics Performance

To provide meaningful data about the performance of the i.MX31 GACC, Section 6.1, "Real World Case Definitions," defines use cases drawn from real-world applications.

## 6.1 Real World Case Definitions

Several real world use cases are defined for target architecture evaluation (Table 2) based on in-simulation game statistics:

| Use Case  | Triangle<br>Count<br>[Tri/frame] | Depth<br>Complexity<br>[PixDrawn/Pixel] | Frame Rate<br>[Frame/sec] | Display<br>Size | Triangle<br>Rate<br>[Tri/sec] | Pixel<br>Rate<br>[Pix/sec] |
|-----------|-----------------------------------|-----------------------------------------|----------------------------|-----------------|--------------------------------|----------------------------|
| FPS 1 (2) | 2092                              | 3.4                                     | 30                         | QVGA            | 62760                          | 7.8M                       |
| FPS 2 (2) | 4525                              | 3.4                                     | 30                         | QVGA            | 135750                         | 7.8M                       |
| FPS 2 (1) | 5221                              | 9.7                                     | 30                         | QVGA            | 156630                         | 22M                        |
| FPS 1 (1) | 6358                              | 2.2                                     | 30                         | VGA             | 190740                         | 20.3M                      |
| FPS 2 (2) | 4525                              | 3.4                                     | 30                         | VGA             | 135750                         | 31.3M                      |
| FPS 2 (1) | 5221                              | 9.7                                     | 30                         | VGA             | 156630                         | 89.4M                      |

 Table 2. Real Game Use Case Definition
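The two rate columns in Table 2 follow from the other columns. The sketch below recomputes them, assuming the standard display sizes of 320×240 pixels for QVGA and 640×480 pixels for VGA; the function names are illustrative and not part of any Freescale tool.

```python
DISPLAY_PIXELS = {"QVGA": 320 * 240, "VGA": 640 * 480}  # assumed standard sizes


def triangle_rate(triangles_per_frame, fps):
    """Triangles per second that the geometry stage must accept."""
    return triangles_per_frame * fps


def pixel_rate(display, depth_complexity, fps):
    """Pixels drawn per second, counting overdraw (depth complexity)."""
    return DISPLAY_PIXELS[display] * depth_complexity * fps


# (use case, triangles/frame, depth complexity, fps, display), copied from Table 2
USE_CASES = [
    ("FPS 1 (2)", 2092, 3.4, 30, "QVGA"),
    ("FPS 2 (2)", 4525, 3.4, 30, "QVGA"),
    ("FPS 2 (1)", 5221, 9.7, 30, "QVGA"),
    ("FPS 1 (1)", 6358, 2.2, 30, "VGA"),
    ("FPS 2 (2)", 4525, 3.4, 30, "VGA"),
    ("FPS 2 (1)", 5221, 9.7, 30, "VGA"),
]

for name, tris, depth, fps, display in USE_CASES:
    print(f"{name} {display}: {triangle_rate(tris, fps)} Tri/sec, "
          f"{pixel_rate(display, depth, fps) / 1e6:.1f}M Pix/sec")
```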

## 6.2 Calculating Real World Performance

The ideal performance specified for the ARM MBX R-S 3D graphics core does not take into account limitations, such as memory latency and host load, and is specified for a case when all system delays and limitations are removed. In this case, MBX R-S parameters are (assuming 66 MHz GPU core clock):

- Geometry throughput, 0.8 MTri/sec
- Rendering throughput, about 100 MPix/sec

# 7 Expected Performance

Table 3 shows the expected maximum performance for each of the use cases defined in Table 2, in terms of redraw frame rate, graphics core utilization, and memory bandwidth utilization. The targeted performance is 30 fps and is evaluated using the following assumptions:

- The graphics core runs at 66 MHz.
- Only external memory is used as frame buffer and work area.
- The external memory operates at 133 MHz. Table 3 shows both 16-bit DDR and 32-bit DDR SDRAM memory configurations.
- The memory bandwidth available for graphics is limited to 70% of total memory bandwidth.
- Enough bandwidth remains for the LCD controller (in the Image Processing Unit) to read the front buffer.
- The ARM CPU with vector floating point support runs at 532 MHz, and 3D graphics calculations utilize less than 70% of the ARM's capacity.
- Quantifiable information is based on simulation.
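To give a sense of scale for the memory assumptions above, the following sketch computes the theoretical peak bandwidth of each memory configuration and the 70% share available to graphics. It assumes two data transfers per clock cycle, as is standard for DDR. These are upper bounds only: sustained bandwidth in a real system is lower, and the simulation behind Table 3 may have used different figures.

```python
def ddr_peak_bandwidth(clock_hz, bus_width_bits):
    """Theoretical peak of a DDR bus in bytes/s: two transfers per clock cycle."""
    return clock_hz * 2 * bus_width_bits // 8


GRAPHICS_SHARE = 0.70  # share of memory bandwidth available to graphics (from the list above)

for width in (16, 32):
    peak = ddr_peak_bandwidth(133_000_000, width)
    budget = peak * GRAPHICS_SHARE
    print(f"{width}-bit DDR @ 133 MHz: peak {peak / 1e6:.0f} Mbyte/s, "
          f"graphics budget {budget / 1e6:.1f} Mbyte/s")
```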

The evaluation presented in Table 3 shows that, in all cases, the 30 fps target is met by the i.MX31, and that maximum performance is limited by the external memory bus bandwidth.

| Use Case Description | Display Size | Max. Frame Rate Available,<br>16-bit DDR SDRAM (fps) | Max. Frame Rate Available,<br>32-bit DDR SDRAM (fps) | Graphics Core<br>Utilization @ 30 fps (%) | External Memory Utilization,<br>16-bit DDR SDRAM @ 30 fps (%) |
|----------------------|--------------|------|------|------|------|
| FPS 1 (2)            | QVGA         | 92   | 142  | 24   | 23   |
| FPS 2 (2)            | QVGA         | 56   | 66   | 56   | 37   |
| FPS 2 (1)            | QVGA         | 51   | 57   | 64   | 42   |
| FPS 1 (1)            | VGA          | 39   | 78   | 60   | 54   |
| FPS 2 (2)            | VGA          | 31   | 61   | 60   | 68   |
| FPS 2 (1)            | VGA          | 30   | 57   | 66   | 69   |

Table 3. Expected Graphics Performance
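One way to read Table 3 is sketched below. If the 16-bit DDR frame rate is limited by the 70% graphics bandwidth budget and memory traffic scales linearly with frame rate, then the memory utilization at 30 fps should be roughly 70% × 30 / (maximum fps). This relationship is an inference from the published numbers, not something the white paper states, and the function name is illustrative.

```python
GRAPHICS_SHARE_PCT = 70  # percent of memory bandwidth available to graphics (stated assumption)
TARGET_FPS = 30

# (use case, display, max fps with 16-bit DDR, reported memory utilization % at 30 fps)
TABLE_3 = [
    ("FPS 1 (2)", "QVGA", 92, 23),
    ("FPS 2 (2)", "QVGA", 56, 37),
    ("FPS 2 (1)", "QVGA", 51, 42),
    ("FPS 1 (1)", "VGA", 39, 54),
    ("FPS 2 (2)", "VGA", 31, 68),
    ("FPS 2 (1)", "VGA", 30, 69),
]


def estimated_utilization(max_fps):
    """Estimate memory utilization (%) at the target rate under linear scaling."""
    return GRAPHICS_SHARE_PCT * TARGET_FPS / max_fps


for name, display, max_fps, reported in TABLE_3:
    print(f"{name} {display}: estimated {estimated_utilization(max_fps):.1f}%, "
          f"reported {reported}%")
```

The estimates land within about one percentage point of the reported column, which is consistent with the statement that maximum performance is limited by the external memory bus bandwidth.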

Figure 4 shows the relative graphics performance of the i.MX31 compared to other systems.



Figure 4. i.MX31 Relative Graphics Performance Compared to Other Systems

## 7.1 Maximum Power Management

Running at 66 MHz, the GPU consumes approximately 40 mW. To reduce average power dissipation, the GPU continuously monitors its idle status by examining various internal signals for activity. If these signals indicate that the MBX R-S 3D Graphics Core sub-module is idle (either 3D or 2D + 3D), a count begins whose length is determined by the value programmed into the Startup Time-out Clock Cycle Count Register. When the count expires, the clock gate signals are asserted (driven HIGH), informing the MBX-wrapper (a submodule of the i.MX31 Clock Control Module) that the clocks can be safely disabled. Table 4 shows the GPU operation modes and appropriate clock gating.

| Operation Mode        | BusCLK    | MBXCLK    | MBX3DCLK  |
|-----------------------|-----------|-----------|-----------|
| Deep PowerDown        | Gated     | Gated     | Gated     |
| Idle BusCLK operating | Operating | Gated     | Gated     |
| 2D operations only    | Operating | Operating | Gated     |
| 3D operation          | Operating | Operating | Operating |

When the GACC detects an instruction, it decodes the instruction and enables the appropriate clocks based on the decoded data. A short period is then allowed for the clocks to settle. After this settling period, a special counter, initially loaded with the Time-out Clock Cycle Count value, is decremented. The counter is reset whenever additional activity is detected. If the counter expires with no activity and no indication that any instruction requires more clocks, the clocks are disabled again and activity detection resumes.
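The idle time-out behavior can be modeled with a small sketch. The clock states per mode are transcribed from Table 4; everything else is a simplification for illustration. The class name `IdleGate`, the mode keys, and the one-call-per-cycle interface are hypothetical, the settling period is omitted, and the model does not represent the actual register interface or hardware timing.

```python
# Clock state per operation mode, transcribed from Table 4
# (True = operating, False = gated). The dictionary keys are illustrative names.
CLOCKS_BY_MODE = {
    "deep_powerdown": {"BusCLK": False, "MBXCLK": False, "MBX3DCLK": False},
    "idle":           {"BusCLK": True,  "MBXCLK": False, "MBX3DCLK": False},
    "2d":             {"BusCLK": True,  "MBXCLK": True,  "MBX3DCLK": False},
    "3d":             {"BusCLK": True,  "MBXCLK": True,  "MBX3DCLK": True},
}


class IdleGate:
    """Toy model: clocks stay enabled until `timeout` idle cycles have elapsed."""

    def __init__(self, timeout):
        self.timeout = timeout  # stands in for the Time-out Clock Cycle Count value
        self.counter = 0
        self.mode = "idle"      # only BusCLK running, waiting for activity

    def cycle(self, activity=None):
        """Advance one clock cycle; `activity` is None, "2d", or "3d"."""
        if activity is not None:
            self.mode = activity         # enable the clocks this work needs
            self.counter = self.timeout  # activity reloads the time-out counter
        elif self.mode != "idle":
            self.counter -= 1
            if self.counter <= 0:
                self.mode = "idle"       # counter expired: gate the clocks again
        return CLOCKS_BY_MODE[self.mode]


gate = IdleGate(timeout=3)
activity_per_cycle = ("3d", None, None, "3d", None, None, None, None)
trace = [gate.cycle(a)["MBX3DCLK"] for a in activity_per_cycle]
print(trace)
```

In this run the second burst of 3D activity reloads the counter before it expires, so the 3D clock is gated only after three consecutive idle cycles following the last activity.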

# 8 Software

The software packages available for the GPU include OpenGL ES libraries and optimized drivers, low-level graphics libraries, and tools (MGL) provided by ARM. OpenGL ES is OS-neutral and provides high portability for applications. Other software components will be available on demand.

# 9 Conclusion

Today's mobile society desires wireless devices that serve as both a business tool and an on-the-go entertainment console. Manufacturers seeking to differentiate their wireless mobile devices demand an applications processor with the capability to deliver the raw processing power needed to provide a rich, eye-popping 3D gaming experience along with the long play times consumers crave. To achieve the desired performance levels, it is necessary to add a Graphics Processing Unit (GPU)—a dedicated 3D graphics accelerator—to the ARM CPU. This white paper presents a compelling case for the highly integrated silicon architecture that is found in the i.MX31 multimedia applications processor. The on-chip 2D/3D graphics support in this powerful multimedia chip delivers graphics performance that can provide consumers the 3D gaming experience they seek. The support allows mobile platforms to provide gaming console capabilities, thus becoming true mobile gaming consoles.

Information in this document is provided solely to enable system and software implementers to use Freescale Semiconductor products. There are no express or implied copyright licenses granted hereunder to design or fabricate any integrated circuits or integrated circuits based on the information in this document.

Freescale Semiconductor reserves the right to make changes without further notice to any products herein. Freescale Semiconductor makes no warranty, representation or guarantee regarding the suitability of its products for any particular purpose, nor does Freescale Semiconductor assume any liability arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. "Typical" parameters that may be provided in Freescale Semiconductor data sheets and/or specifications can and do vary in different applications and actual performance may vary over time. All operating parameters, including "Typicals", must be validated for each customer application by customer's technical experts. Freescale Semiconductor does not convey any license under its patent rights nor the rights of others. Freescale Semiconductor products are not designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which the failure of the Freescale Semiconductor product could create a situation where personal injury or death may occur. Should Buyer purchase or use Freescale Semiconductor products for any such unintended or unauthorized application, Buyer shall indemnify and hold Freescale Semiconductor and its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that Freescale Semiconductor was negligent regarding the design or manufacture of the part.

Freescale<sup>™</sup> and the Freescale logo are trademarks of Freescale Semiconductor, Inc. ARM and the ARM Powered logo are registered trademarks of ARM Limited. France Telecom – TDF – Groupe des ecoles des telecommunications Turbo codes patents license.

© Freescale Semiconductor, Inc. 2005. All rights reserved.

