Apple's iPad 2 Review: Tom's Goes Down The Tablet Rabbit Hole

Processor Performance: Now Dual-Core Flavored

Apple uses what’s referred to as a system-on-a-chip (SoC) in mobile devices like the iPad and iPhone. In this particular implementation, the SoC integrates the processor core (or cores) and graphics processing, with RAM stacked on top in a package-on-package (PoP) arrangement. Because these components sit so close together, data moves between them more efficiently. Moreover, less PCB space is consumed, since more functionality lives in a single on-board component.

The influence of an SoC isn't all positive, however. A heavily integrated IC still has specific physical and thermal constraints, so the SoC's constituent subsystems aren't as potent as they might be as discrete parts.

Intel's Sandy Bridge architecture is a good example. The company improved platform performance while trimming power compared to its previous-generation design. However, keeping processing, memory control, cache, and graphics within the same 95 W thermal envelope required concessions. The HD Graphics engine is perhaps the clearest indicator that Intel was working with a very specific transistor budget. Though the company's engineers created an engine deemed "good enough" for many desktop workloads, discrete graphics cards like AMD's Radeon HD 6970 and Nvidia's GeForce GTX 580 demonstrate how much more headroom there is without the constraints imposed on highly integrated solutions.

| | Apple A4 (iPad) | Apple A5 (iPad 2) |
|---|---|---|
| CPU | 1 GHz ARM Cortex-A8 (single-core) | 1 GHz ARM Cortex-A9 (dual-core) |
| Memory | 256 MB LP-DDR (single-channel?) | 512 MB LP-DDR2 (dual-channel) |
| Graphics | PowerVR SGX535 (single-core) | PowerVR SGX543MP2 (dual-core) |
| L1 Cache (instruction / data) | 32 KB / 32 KB | 32 KB / 32 KB |
| L2 Cache | 640 KB | 1 MB |

The iPad 2 features Apple's newest SoC, the A5, which is completely different from the A4 in the original iPad. Let's start with what changes in the CPU.

| | ARM Cortex-A8 | ARM Cortex-A9 |
|---|---|---|
| Package Size | 198.8 mm² | 238.8 mm² |
| Issue Width | | |
| Out-of-Order Execution | No | Yes |
| Execution Pipeline Depth | | |
| Processing Power | 2.0 DMIPS/MHz/Core | 2.5 DMIPS/MHz/Core |
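The DMIPS/MHz/Core rating only becomes a throughput figure once it is multiplied by clock speed and core count. The short Python sketch below does that arithmetic with the 1 GHz clocks from the A4/A5 comparison. It assumes both A9 cores run at the full 1 GHz simultaneously, which thermal limits and real workloads may not allow, so treat the result as a nominal Dhrystone ceiling rather than measured performance; the function name is illustrative.

```python
# Nominal Dhrystone throughput derived from the per-core ratings in the table.
# Assumption: every core runs at the full rated clock at the same time.

def nominal_dmips(dmips_per_mhz_per_core: float, clock_mhz: float, cores: int) -> float:
    """Peak DMIPS = rating (DMIPS/MHz/core) x clock (MHz) x number of cores."""
    return dmips_per_mhz_per_core * clock_mhz * cores

a4 = nominal_dmips(2.0, 1000, 1)  # Cortex-A8, single core, 1 GHz
a5 = nominal_dmips(2.5, 1000, 2)  # Cortex-A9, dual core, 1 GHz

print(f"A4: {a4:.0f} DMIPS, A5: {a5:.0f} DMIPS, ratio {a5 / a4:.1f}x")
```

On paper that is a 2.5x gap, with the second core contributing more of it than the per-core rating improvement does.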

Instead of the iPad’s ARM Cortex-A8, the iPad 2 uses a dual-core ARM Cortex-A9 with a total of 1 MB of L2 cache. At the architectural level, the major difference is out-of-order execution. This is regarded as a higher-performance approach than in-order execution, which executes instructions strictly in the order they appear in the program. An out-of-order design issues instructions based on the availability of their input data, so the pipeline can work on independent instructions instead of sitting idle while data is retrieved.

If you want a summertime analogy, consider preparing a glass of ice water. You could put the ice in the cup before getting the water, or fill the cup with water before getting the ice. Which order is quicker depends on where you are in relation to the refrigerator and the faucet. Out-of-order execution pipelines operate similarly, doing whichever work is ready first rather than following a fixed sequence.
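To make the in-order versus out-of-order distinction concrete, here is a deliberately simplified Python model. It is a toy, not a description of how the Cortex-A8 or Cortex-A9 actually schedule work: it issues at most one instruction per cycle, ignores register renaming and branches, and uses a hypothetical instruction stream (`PROGRAM`) with invented latencies. A slow load feeds one instruction, followed by three instructions that don't need the loaded value. The in-order model stalls behind the dependent instruction; the out-of-order model fills those stall cycles with the independent work.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    dest: str       # register this instruction writes
    srcs: tuple     # registers it must read
    latency: int    # cycles until its result is usable (hypothetical values)

# Hypothetical stream: a 4-cycle load feeds the add; mul/sub/xor are independent.
PROGRAM = [
    Instr("load r1", "r1", (), 4),
    Instr("add r2 = r1 + 1", "r2", ("r1",), 1),
    Instr("mul r3 = r4 * 2", "r3", ("r4",), 1),
    Instr("sub r5 = r4 - 1", "r5", ("r4",), 1),
    Instr("xor r6 = r4 ^ 3", "r6", ("r4",), 1),
]

def simulate(program, out_of_order: bool) -> int:
    """Return the cycle at which the last result is ready (single issue per cycle)."""
    ready_at = {}            # register -> cycle its value becomes available
    pending = list(program)  # instructions not yet issued, in program order
    cycle = 0
    finish = 0
    while pending:
        # In-order may only consider the oldest instruction; out-of-order may
        # pick the oldest instruction whose inputs are ready.
        candidates = pending if out_of_order else pending[:1]
        for ins in candidates:
            # Registers never written (like r4) count as available from cycle 0.
            if all(ready_at.get(s, 0) <= cycle for s in ins.srcs):
                ready_at[ins.dest] = cycle + ins.latency
                finish = max(finish, ready_at[ins.dest])
                pending.remove(ins)
                break  # only one issue per cycle
        cycle += 1
    return finish

print("in-order:    ", simulate(PROGRAM, out_of_order=False))  # 8 cycles
print("out-of-order:", simulate(PROGRAM, out_of_order=True))   # 5 cycles
```

The three cycles saved are exactly the ones the in-order model spends waiting on the load, which is the "fewer CPU cycles are wasted" benefit in miniature. The extra bookkeeping needed to track which instructions are ready is what costs real hardware transistors and power.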

The problem is that out-of-order execution requires many more transistors to implement, consuming additional die space and energy. That's one reason why Intel's small, power-efficient Atom architecture employs in-order execution. The benefit, however, is improved performance, as fewer CPU cycles are wasted. Apple's move to out-of-order execution indicates how much emphasis it placed on boosting the iPad 2's performance.

According to analysis by Chipworks, Apple also couples its dual-core ARM Cortex-A9 with 512 MB of LP-DDR2 (low-power DDR2). The original iPad used only 256 MB of LP-DDR. So not only do we have twice as much memory, it's delivered through a more modern memory technology (LP-DDR2 versus LP-DDR).

CPU Performance

Geekbench is a synthetic benchmark similar to SiSoftware's Sandra, and it's one of the few benchmarks available for iOS. The best part about Geekbench, however, is that it's offered on multiple platforms. That means we can use it to make apples-to-apples comparisons against low-power x86-based devices like netbooks.

[Chart: Geekbench v2 scores in points (higher is better) for the Apple iPad, Apple iPad 2, and Dell Mini 1012 (Atom N450); panel: Floating Point]

Single-threaded floating-point and integer performance is much stronger on the iPad 2 than on its predecessor. On average, performance nearly doubles.

The Cortex-A9 demonstrates a large lead in single-threaded scenarios thanks to its updated execution pipeline. Multi-threaded performance often sees an even larger boost, as the architecture's advantages are compounded by the parallelism of a second core.

Clearly, we've come a long way since the original iPad debuted. But tablets still fall well short of netbook-class performance: Intel's aging Atom N450 still outscores even Apple's latest hardware overall.

Geekbench v2 (detailed results)

| Test | Apple iPad | Apple iPad 2 | Dell Mini 1012 (Atom N450) |
|---|---|---|---|
| Integer Section | | | |
| Blowfish (single-threaded scalar) | 13.6 MB/s | 13.2 MB/s | 26.2 MB/s |
| Blowfish (multi-threaded scalar) | 14.3 MB/s | 26.0 MB/s | 41.5 MB/s |
| Text Compress (single-threaded scalar) | 1.25 MB/s | 1.49 MB/s | 2.49 MB/s |
| Text Compress (multi-threaded scalar) | 1.20 MB/s | 2.79 MB/s | 3.60 MB/s |
| Text Decompress (single-threaded scalar) | 1.13 MB/s | 2.07 MB/s | 3.22 MB/s |
| Text Decompress (multi-threaded scalar) | 1.09 MB/s | 3.24 MB/s | 4.86 MB/s |
| Image Compress (single-threaded scalar) | 3.26 Mpixels/s | 3.77 Mpixels/s | 6.00 Mpixels/s |
| Image Compress (multi-threaded scalar) | 3.38 Mpixels/s | 7.42 Mpixels/s | 8.81 Mpixels/s |
| Image Decompress (single-threaded scalar) | 6.12 Mpixels/s | 6.66 Mpixels/s | 9.98 Mpixels/s |
| Image Decompress (multi-threaded scalar) | 6.04 Mpixels/s | 12.8 Mpixels/s | 15.0 Mpixels/s |
| Lua (single-threaded scalar) | 173.5 Knodes/s | 272.6 Knodes/s | 340.4 Knodes/s |
| Lua (multi-threaded scalar) | 172.9 Knodes/s | 535.0 Knodes/s | 488.4 Knodes/s |
| Floating Point Section | | | |
| Mandelbrot (single-threaded scalar) | 79.9 MFLOPS | 278.8 MFLOPS | 339.6 MFLOPS |
| Mandelbrot (multi-threaded scalar) | 79.4 MFLOPS | 549.0 MFLOPS | 613.2 MFLOPS |
| Dot Product (single-threaded scalar) | 247.5 MFLOPS | 221.3 MFLOPS | 204.9 MFLOPS |
| Dot Product (multi-threaded scalar) | 246.2 MFLOPS | 435.5 MFLOPS | 361.5 MFLOPS |
| LU Decomposition (single-threaded scalar) | 50.5 MFLOPS | 207.3 MFLOPS | 309.7 MFLOPS |
| LU Decomposition (multi-threaded scalar) | 54.7 MFLOPS | 403.4 MFLOPS | 534.0 MFLOPS |
| Primality Test (single-threaded scalar) | 71.4 MFLOPS | 176.6 MFLOPS | 126.7 MFLOPS |
| Primality Test (multi-threaded scalar) | 69.2 MFLOPS | 316.8 MFLOPS | 194.5 MFLOPS |
| Sharpen Image (single-threaded scalar) | 1.51 Mpixels/s | 1.68 Mpixels/s | 482.1 Kpixels/s |
| Sharpen Image (multi-threaded scalar) | 1.52 Mpixels/s | 3.32 Mpixels/s | 858.9 Kpixels/s |
| Blur Image (single-threaded scalar) | 762.2 Kpixels/s | 664.4 Kpixels/s | 535.6 Kpixels/s |
| Blur Image (multi-threaded scalar) | 762.0 Kpixels/s | 1.31 Mpixels/s | 941.5 Kpixels/s |
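Averaging speedups across tests with different units is easiest with ratios. The Python sketch below takes the iPad and iPad 2 numbers from the detailed table, computes the per-test iPad 2 / iPad ratio, and summarizes each group with a geometric mean, a common convention for averaging ratios. This is an assumption on our part, not how Geekbench weights its own composite score, so the result can differ from the overall chart; the dictionary and function names are illustrative.

```python
from math import prod

# (iPad, iPad 2) results from the detailed Geekbench v2 table.
# Sharpen and Blur values are expressed in Kpixels/s so each pair shares a unit.
SINGLE = {
    "Blowfish": (13.6, 13.2),
    "Text Compress": (1.25, 1.49),
    "Text Decompress": (1.13, 2.07),
    "Image Compress": (3.26, 3.77),
    "Image Decompress": (6.12, 6.66),
    "Lua": (173.5, 272.6),
    "Mandelbrot": (79.9, 278.8),
    "Dot Product": (247.5, 221.3),
    "LU Decomposition": (50.5, 207.3),
    "Primality Test": (71.4, 176.6),
    "Sharpen Image": (1510.0, 1680.0),
    "Blur Image": (762.2, 664.4),
}
MULTI = {
    "Blowfish": (14.3, 26.0),
    "Text Compress": (1.20, 2.79),
    "Text Decompress": (1.09, 3.24),
    "Image Compress": (3.38, 7.42),
    "Image Decompress": (6.04, 12.8),
    "Lua": (172.9, 535.0),
    "Mandelbrot": (79.4, 549.0),
    "Dot Product": (246.2, 435.5),
    "LU Decomposition": (54.7, 403.4),
    "Primality Test": (69.2, 316.8),
    "Sharpen Image": (1520.0, 3320.0),
    "Blur Image": (762.0, 1310.0),  # 1.31 Mpixels/s converted to Kpixels/s
}

def speedups(results):
    """Per-test ratio iPad 2 / iPad; values above 1 mean the iPad 2 is faster."""
    return {name: new / old for name, (old, new) in results.items()}

def geomean(values):
    """Geometric mean: the n-th root of the product of n ratios."""
    values = list(values)
    return prod(values) ** (1 / len(values))

for label, data in (("single-threaded", SINGLE), ("multi-threaded", MULTI)):
    ratios = speedups(data)
    print(f"{label}: geometric mean speedup {geomean(ratios.values()):.2f}x")
```

Printing `speedups(SINGLE)` also shows where the averages hide variation: a few single-threaded tests (Blowfish, Dot Product, Blur Image) come out slightly below 1, while Mandelbrot and LU Decomposition gain several-fold.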

The Write Sequential and Stdlib Write memory tests in Geekbench confirm better RAM performance, but it's difficult to separate how much of this is due to memory technology and how much is attributable to the processor. At the end of the day, it really doesn't matter; what does matter is that throughput goes up.

Intel's Atom N450 still manages to remain top dog, despite its 64-bit single-channel interface. The Atom only falls behind in the Stdlib Allocate and Stdlib Write tests. Meanwhile, the N450's 1.97 GB/s score in Read Sequential is nearly 6x higher than the iPad 2's 342.2 MB/s.

Geekbench v2 (detailed results)

| Test | Apple iPad | Apple iPad 2 | Dell Mini 1012 (Atom N450) |
|---|---|---|---|
| Memory Score | | | |
| Read Sequential (single-threaded scalar) | 306 MB/s | 342.2 MB/s | 1.97 GB/s |
| Write Sequential (single-threaded scalar) | 849.1 MB/s | 1.02 GB/s | 1.32 GB/s |
| Stdlib Allocate (single-threaded scalar) | 1.99 Mallocs/s | 1.83 Mallocs/s | 1.25 Mallocs/s |
| Stdlib Write (single-threaded scalar) | 1.28 GB/s | 2.57 GB/s | 1.34 GB/s |
| Stdlib Copy (single-threaded scalar) | 830.4 MB/s | 474.8 MB/s | 1.03 GB/s |
| Stream Score | | | |
| Stream Copy (single-threaded scalar) | 465.5 MB/s | 449.9 MB/s | 1.18 GB/s |
| Stream Scale (single-threaded scalar) | 320.5 MB/s | 372.5 MB/s | 1.08 GB/s |
| Stream Add (single-threaded scalar) | 655.9 MB/s | 606.3 MB/s | 1.41 GB/s |
| Stream Triad (single-threaded scalar) | 427.4 MB/s | 426.6 MB/s | 1.11 GB/s |
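The Stream rows take their names from John McCalpin's STREAM benchmark, whose four kernels are Copy, Scale, Add, and Triad. As a sketch of what each kernel does and how a MB/s figure is derived, the Python below implements the four operations and follows the STREAM convention of counting bytes read plus bytes written per element (8-byte doubles) and reporting decimal megabytes. Whether Geekbench counts bytes exactly this way is an assumption, and pure-Python loops are limited by interpreter speed rather than DRAM, so the rates this prints say nothing about the hardware in the table. The function names are hypothetical.

```python
import time
from array import array

ELEM = 8  # bytes per double-precision element

# Bytes counted per element: Copy and Scale touch two arrays, Add and Triad three.
BYTES_PER_ELEM = {"Copy": 2 * ELEM, "Scale": 2 * ELEM, "Add": 3 * ELEM, "Triad": 3 * ELEM}

def run_kernel(name, a, b, c, q):
    """Apply one STREAM kernel in place over the whole arrays."""
    n = len(a)
    if name == "Copy":      # c[i] = a[i]
        for i in range(n):
            c[i] = a[i]
    elif name == "Scale":   # b[i] = q * c[i]
        for i in range(n):
            b[i] = q * c[i]
    elif name == "Add":     # c[i] = a[i] + b[i]
        for i in range(n):
            c[i] = a[i] + b[i]
    elif name == "Triad":   # a[i] = b[i] + q * c[i]
        for i in range(n):
            a[i] = b[i] + q * c[i]

def stream(n=200_000, q=3.0):
    """Run the kernels in STREAM order; return the arrays and {kernel: MB/s}."""
    a = array("d", [1.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [0.0]) * n
    rates = {}
    for name in ("Copy", "Scale", "Add", "Triad"):
        start = time.perf_counter()
        run_kernel(name, a, b, c, q)
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid divide-by-zero
        rates[name] = BYTES_PER_ELEM[name] * n / elapsed / 1e6  # decimal MB/s
    return a, b, c, rates

a, b, c, rates = stream()
for name, mbps in rates.items():
    print(f"Stream {name}: {mbps:.1f} MB/s (interpreter-bound)")
```

The byte accounting explains why Add and Triad can report higher MB/s than Copy on the same hardware: they move three arrays' worth of data per element instead of two.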