Multicore architecture for compute-intensive apps
Some of the available hardware can get very complex. The Freescale T4240, for example, has 12 multithreading cores that schedule 24 threads, two threads per core, with three clusters of four cores each sharing a 2MB L2 cache. Is it better to run a single OS domain that schedules all cores and threads, or to divide the compute power into many individual OS domains, each controlling its own set of tasks? It really depends on the application. Is the application parallel-safe, and is it data-intensive? Taking advantage of the shared level 2 cache may make a good partitioning boundary.
Other hardware choices include a standard set of CPUs, such as the Intel Core i7, combined with a built-in GPU. This system can hyper-thread eight threads across four cores on the CPU complex as well as leverage the GPU for general-purpose compute. While supporting the heterogeneous CPU-GPU mix adds to system complexity, it can be worth the trouble if the system achieves higher performance for compute-intensive applications.
Once it is understood how the application can be broken up, the next step is to determine which methods and languages are available to build it. In multi-OS configurations, with either symmetric or asymmetric CPUs, shared memory is typically employed to pass data and messages between the OS domains. It is not the only method, but the usual pattern is to pass a command with some data to an operating-system domain, where an interrupt handler processes the message and dispatches it to a thread. But which APIs can be used?
There are several to choose from. The Multicore Association maintains MCAPI (Multicore Communications API), which is designed specifically for the multi-OS paradigm. MCAPI (figure) can be built on top of an adjacent specification, MRAPI (Multicore Resource API), which provides low-level shared memory as a resource shared among multiple OS domains.
Figure: MCAPI is a message-passing application programming interface, together with protocol and semantic specifications for how its features must behave in any implementation.
Other choices for this type of architecture include similar APIs that are proprietary in nature. Whatever is easy to configure and maintain over the long run may be the best to implement. One important attribute is the overhead of such an interface. These cores typically share memory, which is much faster than communicating across Ethernet. If one of the reasons to divide the application into several OS domains is to prevent cache thrashing (where each thread of execution fights over the same cache line to read and write its data), then an efficient implementation is necessary.