Cloud computing has grown rapidly in recent years, providing more and more services to end users in a reliable and efficient manner. In this context, RAPID proposes the development of an efficient heterogeneous CPU-GPU cloud computing infrastructure, which can be used to seamlessly offload CPU-based and GPU-based (using the OpenCL API) tasks of applications running on low-power devices such as smartphones, notebooks, tablets, portable/wearable devices, robots, and cars to more powerful devices over a heterogeneous network (HetNet). In addition, RAPID proposes a secure unified model where almost any device can operate as an accelerated entity and/or as an accelerator serving other, less powerful devices. Finally, a RAPID device can probe a Directory Server, which holds information about the available accelerators, in order to automatically find and connect to the appropriate ones.
Embedded systems are becoming more and more powerful. For example, modern smartphones are equipped with multi-core processors and GPUs integrated in a single chip. On the other hand, mobile applications are becoming increasingly performance- and power-hungry, pushing the boundaries of devices' capabilities to the limits. The problem is even more noticeable when such applications are executed on older devices, which are not equipped with state-of-the-art technology. Cloud computing is becoming an attractive solution to this challenge, as it continues to develop, supplying high-performance resources at low cost. Moreover, mobile connectivity continues to improve, enabling access to cloud services with relatively low latencies and at high throughputs. However, the overhead in energy and response time involved in transmitting the migrated data and code via wireless networks may be greater than the offloading savings, and thus a judicious decision must be made on whether and which computation tasks to offload.
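The offloading trade-off above can be sketched as a simple cost comparison: offload a task only when the remote compute time plus the network transfer overhead beats local execution. The model and all parameter values below are illustrative assumptions, not RAPID's actual decision engine.

```java
// Sketch of a basic offloading break-even test. Offloading pays off when
// remote compute time plus transfer time is lower than local compute time.
// All parameters are illustrative assumptions, not RAPID's actual model.
public class OffloadDecision {

    /**
     * @param cycles      computation size of the task (CPU cycles)
     * @param localSpeed  local CPU speed (cycles/second)
     * @param remoteSpeed remote CPU speed (cycles/second)
     * @param dataBytes   code + state to transfer (bytes)
     * @param bandwidth   network throughput (bytes/second)
     * @param rttSeconds  round-trip latency of the link (seconds)
     * @return true if offloading is expected to be faster
     */
    public static boolean shouldOffload(double cycles, double localSpeed,
                                        double remoteSpeed, double dataBytes,
                                        double bandwidth, double rttSeconds) {
        double localTime = cycles / localSpeed;
        double remoteTime = cycles / remoteSpeed
                + dataBytes / bandwidth + rttSeconds;
        return remoteTime < localTime;
    }

    public static void main(String[] args) {
        // Heavy task over Wi-Fi: offloading pays off.
        System.out.println(shouldOffload(1e10, 1e9, 1e10, 1e6, 1e7, 0.05)); // true
        // Tiny task: transfer overhead dominates, so keep it local.
        System.out.println(shouldOffload(1e7, 1e9, 1e10, 1e6, 1e7, 0.05));  // false
    }
}
```

A production decision engine would also weigh energy cost and current battery level, as the following sections describe, but the same break-even structure applies.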
The RAPID acceleration mechanism mainly consists of the following three entities:
- Acceleration Client: This is a runtime library, which is employed by a locally accelerated application in order to find nearby accelerators and decide whether the tasks defined by the application developer should be executed locally or offloaded remotely.
The library supports offloading of Android/Java methods and native C/C++ functions embedded in Android/Java applications. Moreover, thanks to RAPID, developers can now implement Android and Java applications with CUDA support. For more information, see the RAPID Project on GitHub.
- Acceleration Compiler: The application developer specifies the tasks that may be executed remotely using a simple code annotation. The Acceleration Compiler will offer a source-to-source compilation platform in order to transform the application code into an equivalent code that uses the runtime functions of the Acceleration Client. This compiler essentially bridges the gap between the RAPID programming model and the Acceleration Client.
- Acceleration Server: This software receives tasks from Acceleration Clients and other Acceleration Servers, executes them, and returns the results. Two versions of the Acceleration Server are defined: the Plain and the Enhanced. The Plain Acceleration Server executes all incoming tasks locally. The Enhanced Acceleration Server uses the Acceleration Client library in order to decide, for each incoming task, whether it should be executed locally or forwarded to another Acceleration Server.
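To make the programming model concrete, the sketch below shows how an offloadable task might be marked with a simple code annotation and recognized at runtime. The `@Remote` annotation, the `ImageFilter` class, and the `isOffloadable` helper are hypothetical illustrations, not RAPID's actual API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical sketch of the RAPID programming model: the developer marks
// offloadable tasks with an annotation, and the compiler/runtime routes each
// call either locally or to an Acceleration Server. All names here are
// illustrative assumptions, not the project's actual API.
public class OffloadSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Remote {}          // marks a method as offloadable

    public static class ImageFilter {
        @Remote
        public int[] sharpen(int[] pixels) {   // candidate for remote execution
            int[] out = pixels.clone();
            for (int i = 0; i < out.length; i++) out[i] += 1; // dummy work
            return out;
        }
    }

    // A source-to-source compiler would rewrite calls to @Remote methods so
    // that they pass through the Acceleration Client's dispatcher.
    public static boolean isOffloadable(Method m) {
        return m.isAnnotationPresent(Remote.class);
    }

    public static void main(String[] args) throws Exception {
        Method m = ImageFilter.class.getMethod("sharpen", int[].class);
        System.out.println(isOffloadable(m)); // prints "true"
    }
}
```

The annotation keeps the source code portable: stripped of the RAPID runtime, the same method still compiles and runs locally.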
Several flavors of each of the aforementioned software entities should be developed depending on the target entity. For example, an Acceleration Server running on a public cloud will use more complex task-scheduling algorithms than an Acceleration Server running on a smartphone or a PC.
The figure below shows a complex scenario with a chain of Acceleration Servers. Certain tasks of an accelerated application running on Google Glass can be offloaded to the Acceleration Server on the smartphone, which in turn will execute some of them locally and forward the rest to the laptop or to the private cloud. If the battery of the laptop is low or the tasks require a lot of resources, the Acceleration Server on the smartphone may decide to send them all to the private cloud. On the other hand, the smartphone may determine that offloading the tasks to the private cloud would waste too much energy (especially if it uses 3G for connectivity, for example) and thus execute them locally. In the demonstrated example, the smartphone also runs another accelerated application.
Main Components of Acceleration Client
- A Design Space Exploration engine, which decides at execution time whether a task should be executed locally or offloaded remotely, and selects the appropriate accelerator if more than one is available. In order to provide a fast, accurate, and low-power solution, we plan to use profilers that collect device metrics (CPU status, network status, battery level, etc.), program metrics (execution time, memory required, I/O involved, etc.), and energy data, and feed them into an energy estimation model.
- The Dispatch and Fetch engines, which transfer the data (or state) and the executable code (an object, for object-oriented languages) from a mobile device to an Acceleration Server, and the results (or state) from an Acceleration Server back to the mobile device. Since many mobile devices are ARM-based while cloud execution tends to be on x86 hosts, using platform-independent Java bytecode with just-in-time (JIT) compilation, along with Java reflection to automatically identify the remoteable methods, may provide an efficient way to execute a Java method remotely. Further alternatives will be evaluated in the first part of the project.
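The reflection-based dispatch idea above can be sketched in a few lines: the client ships a class name, method name, and serializable arguments; the server resolves the method reflectively, invokes it, and returns the result. Network transport and bytecode shipping are omitted, and all class and method names below are illustrative assumptions.

```java
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Minimal sketch of the Dispatch/Fetch idea via Java reflection. The request
// object stands in for what would travel over the network; real transport and
// bytecode shipping are omitted. Names here are illustrative assumptions.
public class ReflectionDispatch {

    // What the Dispatch engine would serialize and send to the server.
    public static class TaskRequest implements Serializable {
        final String className, methodName;
        final Object[] args;
        public TaskRequest(String c, String m, Object... a) {
            className = c; methodName = m; args = a;
        }
    }

    // Server side: resolve and execute the requested method reflectively.
    public static Object execute(TaskRequest req) throws Exception {
        Class<?> cls = Class.forName(req.className);
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(req.methodName)
                    && m.getParameterCount() == req.args.length) {
                Object target = Modifier.isStatic(m.getModifiers())
                        ? null
                        : cls.getDeclaredConstructor().newInstance();
                return m.invoke(target, req.args);
            }
        }
        throw new NoSuchMethodException(req.methodName);
    }

    // Example "remoteable" task.
    public static class MathTask {
        public long fib(Integer n) {
            return n < 2 ? n : fib(n - 1) + fib(n - 2);
        }
    }

    public static void main(String[] args) throws Exception {
        TaskRequest req = new TaskRequest(
                "ReflectionDispatch$MathTask", "fib", 10);
        System.out.println(execute(req)); // prints "55"
    }
}
```

Because Java bytecode is platform independent, the same `MathTask` class file produced on an ARM device can be loaded and JIT-compiled on an x86 server, which is what makes this style of dispatch attractive.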
Main Components of Acceleration Server
- A task scheduler, which is responsible for scheduling the incoming tasks and distributing the cloud resources among the applications running on the mobile devices. RAPID will use virtualization in order to share the resources of the accelerator efficiently. The scheduler may use multiple virtual GPUs and virtual CPUs in order to execute a single task in parallel.
- An execution environment, which executes the scheduled tasks. In a cloud infrastructure, the task scheduler may use multiple heterogeneous VMs in parallel for executing a single task or multiple tasks in parallel. In order to support virtual GPUs, we plan to use NVIDIA hardware-assisted virtualization technology in public and private cloud-based accelerators, and GVirtuS, a software virtual GPU driver developed by UNP.
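A minimal version of the server-side scheduling loop can be sketched with a fixed worker pool, where each worker stands in for a VM (or virtual GPU) slice. The pool size and the dummy tasks are illustrative assumptions; a real scheduler would add priorities, per-client quotas, and VM placement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of an Acceleration Server task scheduler: incoming tasks are queued
// and distributed over a fixed pool of workers, each standing in for a VM
// slice. Pool size and the tasks themselves are illustrative assumptions.
public class TaskScheduler {
    private final ExecutorService vmPool;

    public TaskScheduler(int virtualMachines) {
        vmPool = Executors.newFixedThreadPool(virtualMachines);
    }

    // Submit one task; the pool decides which "VM" runs it.
    public <T> Future<T> submit(Callable<T> task) {
        return vmPool.submit(task);
    }

    public void shutdown() {
        vmPool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        TaskScheduler scheduler = new TaskScheduler(4); // 4 virtual machines
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int n = i;
            results.add(scheduler.submit(() -> n * n)); // dummy compute task
        }
        int sum = 0;
        for (Future<Integer> f : results) sum += f.get();
        scheduler.shutdown();
        System.out.println(sum); // prints "140"
    }
}
```

The `Future` returned on submission mirrors the Fetch engine's role: the client blocks (or polls) until the server-side result is ready.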