The complexity of many System-on-Chip (SoC) designs is simply staggering. As an example, this year's HotChips symposium showcased a variety of new SoC designs for the edge and datacenter that expand our definition of a "big" chip. What are the system designs that require a leap in SoC complexity? It's not only big datacenter artificial intelligence (AI) chips, but also autonomous vehicles such as cars, trucks and drones; self-landing, reusable rockets; medical devices carrying out remote diagnostics; and connected machine tool controllers supporting smart manufacturing.

These chips are starting to be referred to as "monster chips" because of both their size and their complexity. Now, let us look at the factors behind the rise of these monster chip designs. The main reason is the exploitation of Internet connectivity, which provides not only big data but also distributed processing that helps make decisions. These Internet-connected systems need to make some or all decisions on their own by processing more than one trillion operations per second, and this drives new hardware and software innovations as well as dramatically increased complexity.


Figure 1: The complexity mandated in automated driving, machine learning, and blockchain processing is leading to a new generation of SoC designs. (Source: Arteris IP)

Yes, there are some applications that simply report information — but these are generally in small volume — representing lower-value markets and generating relatively low-profit margins. To be really valuable, connected systems have to be able to make decisions on their own, and that leads to unique software and hardware challenges.

For a start, to be able to make decisions, these systems require multiple types of sensors and high-performance data processing systems driven by sophisticated software algorithms. To be sure, the hardware is driven by the requirements of the software, but the device needs to support the algorithms that enable the system to interact with the real world.

Yes, you can put a couple of middleware layers between the software and the hardware, but this will sacrifice performance and raise the cost in many mission-critical applications.

Cache Coherency in Monster Chips

The age of the monster chips will lead to systems able to make automated decisions based on sophisticated hardware and software building blocks. Moreover, it will drive further development of IP blocks and EDA design tool technologies required to manage projects of this complexity without exceeding the ability of human designers to execute them within reasonable time and cost.

Take the case of multi-processor cache coherency in monster chips. Hardware cache coherency is being used in parts of the SoC to simplify the programming of secondary processing subsystems such as vision accelerators and other image processors.

The monster chips, in addition to the main CPU subsystem, have a hierarchy of processors for specialized data processing and efficient control. And the number of processors is increasing in order to boost processing power, requiring support for tens of cache-coherent ports, often running heterogeneous cache-coherence protocols.
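To see why hardware coherency simplifies programming, here is a toy two-cache MESI model (a generic textbook protocol, not any particular vendor's implementation; the cache names and transitions are illustrative assumptions). The point is that the state changes happen in hardware, so software never issues manual cache flushes:

```python
# Toy MESI sketch: two caches sharing one line. States are
# M(odified), E(xclusive), S(hared), I(nvalid).

class Cache:
    def __init__(self, name):
        self.name = name
        self.state = "I"   # coherence state of the tracked cache line

def read(requester, other):
    """Requester reads the line; a bus snoop updates the other cache."""
    if requester.state == "I":
        if other.state in ("M", "E", "S"):
            # The other holder supplies the data; both end up Shared.
            other.state = "S"
            requester.state = "S"
        else:
            requester.state = "E"   # only copy in the system

def write(requester, other):
    """Requester writes the line; any other copy is invalidated."""
    other.state = "I"
    requester.state = "M"

# Hypothetical agents: a CPU and a vision accelerator sharing data.
cpu = Cache("CPU")
accel = Cache("vision-accelerator")
read(cpu, accel)     # CPU gets the line Exclusive
read(accel, cpu)     # both drop to Shared
write(accel, cpu)    # accelerator owns it Modified; CPU copy invalidated
print(cpu.state, accel.state)   # -> I M
```

The invalidation in the last step is exactly the bookkeeping that, without hardware coherency, a programmer would have to perform with explicit cache-maintenance operations.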


Figure 2: A view of how cache coherency works in large SoC designs. (Source: Arteris IP)

Designs now employ up to four levels of on-chip cache memory. This hierarchy of caches is being utilized to improve memory bandwidth as well as minimize the latency of off-chip memory accesses. At the same time, high-bandwidth memory interfaces are becoming more common as designers seek to improve memory performance.
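A back-of-envelope average memory access time (AMAT) model shows why a four-level hierarchy pays off. The latencies and hit rates below are illustrative assumptions, not measured silicon:

```python
# Hedged AMAT sketch for a four-level on-chip cache hierarchy.
# All numbers are assumed, for illustration only.

levels = [
    # (level, hit latency in cycles, hit rate at this level)
    ("L1", 4, 0.90),
    ("L2", 12, 0.80),
    ("L3", 40, 0.70),
    ("L4", 90, 0.60),
]
DRAM_LATENCY = 300  # cycles for an off-chip access, assumed

def amat():
    time, miss_prob = 0.0, 1.0
    for _, latency, hit_rate in levels:
        time += miss_prob * latency     # accesses reaching this level pay its latency
        miss_prob *= (1 - hit_rate)     # fraction that misses and falls through
    time += miss_prob * DRAM_LATENCY    # survivors go off-chip
    return time, miss_prob

t, p = amat()
print(f"AMAT: {t:.2f} cycles; {p:.2%} of accesses reach DRAM")
```

Under these assumed numbers, the average access costs a small fraction of the raw DRAM latency because only a tiny share of requests ever leaves the chip, which is precisely the bandwidth and latency argument for deep cache hierarchies.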

Meshed-up Chips

The multiple processing subsystems generate a lot of data that has to be transported throughout the SoC device. Take, for example, the current automotive SoCs in advanced driver-assistance system (ADAS) applications, which can generate more than 20 GB of data per day. Therefore, in monster chips, multi-node mesh interconnects are utilized in on-chip deep learning subsystems to turn data into actionable objects.

Twenty mesh nodes are common today in edge device subsystems, but this number is likely to grow to over one hundred in high-end AI applications. For instance, the number of mesh nodes in convolutional neural network (CNN) accelerators for machine learning is increasing in order to better support both training and inference. Furthermore, the challenge is not going to be just data transformation inside the CNN nodes, but how efficiently you can move data between the nodes.
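The cost of moving data between mesh nodes can be sketched with dimension-order (XY) routing, a common policy in 2D NoC meshes (the routing policy and coordinates here are generic illustrations, not a description of any specific product):

```python
# Sketch of XY dimension-order routing on a 2D mesh: a packet travels
# fully along the x axis first, then along the y axis. Hop count grows
# with node distance, which is why placement matters as meshes scale
# from tens to over a hundred nodes.

def xy_route(src, dst):
    """Return the list of (x, y) mesh nodes a packet visits."""
    x, y = src
    path = [(x, y)]
    x_step = 1 if dst[0] > x else -1
    while x != dst[0]:          # resolve the x dimension first
        x += x_step
        path.append((x, y))
    y_step = 1 if dst[1] > y else -1
    while y != dst[1]:          # then resolve the y dimension
        y += y_step
        path.append((x, y))
    return path

path = xy_route((0, 0), (3, 2))   # corner-to-corner on a 4x3 mesh slice
print(len(path) - 1, "hops:", path)   # -> 5 hops
```

Each hop adds router latency and consumes link bandwidth, so a traffic pattern that keeps communicating nodes adjacent moves the same data with far fewer network resources than one that routinely crosses the mesh.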


Figure 3: An example of how machine learning and neural network architectures can be implemented in automotive SoCs. (Source: Arteris IP)

The number of power and frequency domains is also increasing to manage power consumption in these monster chips. Moreover, monster chip complexity is putting pressure on design productivity. This complexity demands IP blocks and EDA tools that combine the ability to manually optimize with automation that manages non-value-added complexity for the user.

Nervous System of Monster Chips

The SoCs for ADAS and autonomous cars are a classic example of monster chips. The automated vehicle is a software-driven application that may require up to 100 million lines of code to be able to interact with the real world of transportation. And the hardware supporting this software, the automotive SoC, has to be high-performance, low-power, cost-effective, and functionally safe and secure.

Not surprisingly, therefore, all this growth in complexity of both the processing and memory subsystems requires a new generation of interconnect IPs that are capable of facilitating huge data bandwidths, low latency, and efficient power utilization. In other words, the monster chips are putting pressure on the interconnect technology to be the nervous system of the SoC designs.

Yes, the processor is the most critical IP in the system, and memory bandwidth is king, but how an SoC is assembled using advanced interconnect IP now increasingly determines the viability of SoC designs. It is the interconnect IP that has a massive impact on SoC performance, cost, and schedule.