T-106.5800 Seminar on Software Techniques
Seminar on Multicore Programming

Multicore Technology in Mobile Devices

Antti P Miettinen
antti.p.miettinen@nokia.com
February 12, 2009

Abstract

Multicore design is ubiquitous among mobile hand-held devices. A general-purpose processor coupled with a digital signal processor has been the typical configuration even for the most basic mobile phones. Energy efficiency concerns have steered designs towards increasing integration and heterogeneous setups. Typical components in a mobile application processor are ARM and DSP cores, various hardware acceleration blocks, and a set of memory and peripheral interfaces.

Limited energy and power have been critical constraints for mobile device design, and trends point towards these challenges becoming continually more demanding. Increasing parallelism inside the different subsystems is one way to achieve better energy efficiency. Even though heterogeneity is likely to persist in the overall structure of mobile devices, symmetrically parallel subsystems will probably also be employed in the future.

1 Introduction

Energy efficiency is a central theme in the design of mobile hand-held devices. Always-online applications, multimedia, high-speed wireless networking, large displays, etc. are making the challenge continually more demanding. Software trends, such as the increasing use of web applications and dynamic programming languages, also make optimizing energy efficiency ever more important. Additionally, even if the available energy were not a limiting factor, a power budget of roughly three Watts for a hand-held device remains a valid rule because of thermal concerns [1].

Figure 1: Clock speed per power for a collection of ARM processor cores.

Even though parallel hardware is often viewed as a challenge, it is also an opportunity for mobile devices, because parallel processing is more energy efficient than sequential designs of comparable performance. Figure 1 shows an overview of data collected from ARM's public web pages about various ARM processor cores, with polynomial extrapolation curves for cores that have three data points. As can be seen, the clock speed achievable within a given power budget is at least an order of magnitude higher for small, low-performance cores than for bigger, high-performance cores. Even though larger cores have the potential to perform more work per cycle, this advantage is often diluted by the fact that the performance of modern software tends to be limited by memory effects.

2 Anatomy of a mobile device

Open discussion about the design and implementation of mobile devices is challenging because of the traditionally closed nature of, e.g., mobile phones. Fortunately, the public Internet contains quite interesting information about many mobile devices, and the increasing use of open source operating systems allows hardware features to be deduced from kernel device driver configurations.

As an example of a contemporary mobile hand-held device, we can look at the components of the Nokia N95 as described in [2]. The device uses a dual-chip design in which the two main processing engines are the application processor (Texas Instruments OMAP2420) and the cellular modem. Connected to these main components are NOR and NAND flash memories, DRAM memories, energy and power management chips, and various peripherals, e.g., camera modules, Bluetooth, accelerometer, WiFi, audio, infrared, display, USB and memory card interfaces.

This kind of design is typical of many versatile mobile handsets. The two major subsystems requiring high processing performance are the application subsystem and the cellular modem, and this is often reflected in a dual-chip design. Increasing integration has also enabled single-chip designs, where the cellular modem and the application subsystem are deployed within the same physical chip. Single-chip designs are common especially in the highest-volume devices, where cost optimization is the overriding design concern.

3 Design of mobile application processors

As with device design, discussion about mobile application processor details is hampered by the closed nature of the industry. However, Texas Instruments, for example, sometimes provides two variants of its OMAP processors. The OMAP3530 [3] is a catalog part that is available to anyone and has public documentation, while being functionally quite similar to the OMAP3430, which is targeted at mobile hand-held devices and is available only to high-volume customers.

The structure of the OMAP3530 processor is quite representative of the overall design of a modern mobile application processor. The main application core in the OMAP3530 is an ARM Cortex-A8 processor with 16 KB level-one instruction and data caches and a 256 KB unified level-two cache. The ARM core is connected to the level three (L3) interconnect together with a quite extensive set of other subsystems. For imaging, video and audio processing, the OMAP3530 contains a TMS320DM64x+ digital signal processor. For 2D and 3D graphics, a PowerVR SGX mobile graphics processing unit is provided. Dedicated interface blocks are provided for cameras, displays and USB. For connecting memories, an SDRAM controller and a general-purpose memory controller are included. For peripherals with more modest throughput and latency requirements, there is a level four (L4) interconnect with, e.g., UARTs, general-purpose I/O interfaces, timers and memory card interfaces.

While the main application core of a mobile application processor is nearly always based on the ARM architecture, there is considerable variation in the DSPs employed by different vendors. TI has its own TMS320 family of DSPs, whereas Freescale uses its StarCore DSPs and STMicroelectronics has its MMDSP family (see, e.g., [4] for an overview of DSP vendors). DSP cores also often appear inside imaging, video and audio subsystems, coupled with hardware accelerator blocks.

Many subsystems within a mobile application processor use commercial intellectual property (IP) blocks. The PowerVR MBX and SGX are good examples of popular mobile GPUs. The use of commercial IP blocks also extends to, e.g., the interconnects: the L3 and L4 interconnects inside the OMAP3530 are instantiations of Sonics interconnects from Sonics, Inc. For differentiation purposes, vendors also include their own IP blocks in their designs.

4 The ARM architecture

ARM Ltd is a fabless semiconductor company, i.e., it is not a chip manufacturer. Instead, it provides