I-Class I6500 Multiprocessor Core

The MIPS I6500 multiprocessor core extends the variety and scalability of “off-the-shelf” licensable cores based on the proven and respected MIPS64® architecture, delivering a compelling solution for heterogeneous computing.

This IP core offering provides key features to deliver “heterogeneous inside and out”, many core/multi-cluster scalable processing, and real-time deterministic execution – even when utilizing its support for hardware virtualization – making the I6500 family one of the most scalable, flexible and powerful IP cores in the industry.

This flexibility and scalability make the I6500 ideal for the growing and varied requirements of heterogeneous computing applications, including advanced driver assistance systems (ADAS) and autonomous driving, high-performance networking, machine learning, drones, industrial automation, security, and video analytics.

The MIPS I6500-F version of the core adds features for functional safety.


Like the I6400 IP core family before it, the I6500 family is built on a multi-threaded superscalar CPU core, and a single multi-core cluster can contain up to six of these cores. As the basis for its “heterogeneous inside” capabilities, each core in the cluster can now be individually configured at silicon design time to align the performance, area and power of the total solution with application requirements. The configurable options include the number of hardware threads, the size of each L1 cache, and the optional inclusion of a SIMD/FPU processing unit.
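To make the per-core options concrete, the following is a minimal sketch of how a design team might describe and sanity-check a heterogeneous cluster before committing to a configuration. The `CoreConfig` class and `validate_cluster` function are hypothetical names invented for this illustration and are not part of any MIPS tool; the limits (up to six cores, up to four threads per core, 32 KB or 64 KB L1 caches) are taken from the feature lists on this page.

```python
from dataclasses import dataclass
from typing import Sequence

# Limits quoted in this document's feature lists.
MAX_CORES_PER_CLUSTER = 6
MAX_THREADS_PER_CORE = 4
L1_SIZES_KB = (32, 64)


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical description of one core's design-time options."""
    threads: int
    l1_icache_kb: int
    l1_dcache_kb: int
    has_simd_fpu: bool

    def validate(self) -> None:
        if not 1 <= self.threads <= MAX_THREADS_PER_CORE:
            raise ValueError(f"threads must be 1..{MAX_THREADS_PER_CORE}")
        for size in (self.l1_icache_kb, self.l1_dcache_kb):
            if size not in L1_SIZES_KB:
                raise ValueError(f"L1 size must be one of {L1_SIZES_KB} KB")


def validate_cluster(cores: Sequence[CoreConfig]) -> int:
    """Check every core and return the cluster's total hardware thread count."""
    if len(cores) > MAX_CORES_PER_CLUSTER:
        raise ValueError(f"a cluster holds at most {MAX_CORES_PER_CLUSTER} cores")
    for core in cores:
        core.validate()
    return sum(core.threads for core in cores)


# Illustrative mix (values are made up): two large four-thread cores with
# SIMD/FPU, plus two small single-thread cores without it.
example_cluster = [
    CoreConfig(threads=4, l1_icache_kb=64, l1_dcache_kb=64, has_simd_fpu=True),
    CoreConfig(threads=4, l1_icache_kb=64, l1_dcache_kb=64, has_simd_fpu=True),
    CoreConfig(threads=1, l1_icache_kb=32, l1_dcache_kb=32, has_simd_fpu=False),
    CoreConfig(threads=1, l1_icache_kb=32, l1_dcache_kb=32, has_simd_fpu=False),
]
total_threads = validate_cluster(example_cluster)
print(total_threads)
```

The point of the sketch is only that each core carries its own settings, so a single cluster can pair throughput-oriented cores with small, area-efficient ones.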

Optional features include data ScratchPad RAM (SPRAM) per core, up to four AXI ports for low-latency peripherals or cluster-level SPRAM, and inter-thread communication (ITC) support. Operating as a complement to the standard cached memory system, these features support deterministic, low-latency operation and fast-path messaging in embedded systems, as well as high-performance networking and data processing applications.

The combination of simultaneous multi-threading with hardware virtualization in the I6500 processor enables multiple execution environments to run simultaneously, isolated from each other, with zero context switch overhead.

Figure: I6500 block diagram

The operating frequency and voltage can also be varied per CPU, enabling dynamic runtime optimization of performance and power consumption, or even silicon design optimization of cores to different performance, area, and power consumption attributes within a single cluster. An example of such a configuration, as well as some of the features described above, can be seen in the figure above.

One of the key capabilities of the I6500 family is its support for multi-cluster and “heterogeneous outside” compute scalability. The primary system interface for an I6500 cluster is ACE compatible. As such, in conjunction with popular third-party SoC coherent interconnect/fabric alternatives, the I6500 family can be used in “many core” implementations of up to 64 clusters, or more than 1,500 processing elements (up to four threads per core × up to six cores per cluster × up to 64 clusters = 1,536). MIPS also provides multi-cluster subsystems as one of the extended licensing options.
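The processing-element figure quoted above follows directly from the three stated maxima. As a quick worked check, assuming each hardware thread counts as one processing element (which is how the parenthetical reads):

```python
# Maximum values stated in the text above.
MAX_THREADS_PER_CORE = 4
MAX_CORES_PER_CLUSTER = 6
MAX_CLUSTERS = 64

threads_per_cluster = MAX_THREADS_PER_CORE * MAX_CORES_PER_CLUSTER
max_processing_elements = threads_per_cluster * MAX_CLUSTERS

print(threads_per_cluster)       # 24
print(max_processing_elements)   # 1536
```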


The “heterogeneous outside” features of the I6500 family combine the multi-cluster CPU functionality at the foundation of this technology with the ability to work coherently with other processing and compute elements, including coherent multi-processing with ACE-compatible co-processors.

The I6500 family extends the “heterogeneous outside” framework further with a unique feature: support for building “accelerator only” clusters. A single I6500 cluster can be configured with up to eight I/O coherency units (IOCUs) connected together and no CPU in the cluster.

Custom-designed or third-party functional accelerators can be connected to these IOCU ports via a standard AXI4 interface, providing localized, concentrated compute resources for specific tasks or applications. Such a configuration gives the accelerator units a localized, shared, low-latency L2 cache, and it separates the processing and traffic of the accelerators from those of the CPUs by placing them in different clusters.

As a result, there is less competing traffic and bandwidth contention at each processing cluster’s L2 cache, while memory coherency is maintained between the respective L2 caches. This capability provides both a standards-based mechanism for creating clusters of accelerators and more localized, lower-latency communication with the CPUs than placing the accelerators further out on a coherent NoC fabric.
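The two cluster shapes described on this page can be sketched as a simple classification check. This is an illustration only: `classify_cluster` is a hypothetical helper, and it assumes the two listed configurations (up to six cores plus up to two IOCU ports, or up to eight IOCU ports with no CPU) are the only permitted shapes. The text does not say whether intermediate mixes are allowed, so the sketch rejects them purely for simplicity.

```python
# Port limits as listed in the multi-core feature list of this page.
MAX_CORES = 6
MAX_IOCUS_WITH_CORES = 2
MAX_IOCUS_ACCELERATOR_ONLY = 8


def classify_cluster(num_cores: int, num_iocus: int) -> str:
    """Return 'cpu' or 'accelerator-only', or raise if neither shape fits."""
    if num_cores < 0 or num_iocus < 0:
        raise ValueError("counts must be non-negative")
    if num_cores == 0:
        # Accelerator-only cluster: IOCU ports share the cluster's L2 cache.
        if 1 <= num_iocus <= MAX_IOCUS_ACCELERATOR_ONLY:
            return "accelerator-only"
        raise ValueError("an accelerator-only cluster needs 1..8 IOCU ports")
    if num_cores <= MAX_CORES and num_iocus <= MAX_IOCUS_WITH_CORES:
        return "cpu"
    raise ValueError("configuration is outside the shapes assumed here")


print(classify_cluster(6, 2))
print(classify_cluster(0, 8))
```

A multi-cluster design would then pair one or more "cpu" clusters with "accelerator-only" clusters over the coherent fabric, keeping each group's L2 traffic local.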

MIPS I-Class I6500 Series Key Features/Benefits:
  • Heterogeneous Inside: In a single cluster, designers can optimize power consumption with the ability to configure each CPU with different combinations of threads, different cache sizes, different frequencies, and even different voltage levels.
  • Heterogeneous Outside: The latest MIPS Coherence Manager provides an AMBA® ACE interface to popular ACE coherent fabric solutions, such as those from Arteris and NetSpeed, letting designers mix configurations of processing clusters on a chip – including PowerVR GPUs – for high system efficiency.
  • Simultaneous Multi-threading (SMT): Based on a superscalar dual issue design implemented across generations of MIPS CPUs, this proven feature enables execution of multiple instructions from multiple threads every clock cycle, providing higher utilization and CPU efficiency.
  • Hardware virtualization (VZ): I6500 builds on the real-time hardware virtualization capability pioneered in the MIPS I6400 core. Designers can save costs by safely and securely consolidating multiple CPU cores into a single core, save power where multiple cores are required, and dynamically and deterministically allocate CPU bandwidth per application.
  • SMT + VZ: The combination of SMT with VZ in the I6500 offers “zero context switching” for applications requiring real-time response. This feature, alongside the provision of scratchpad memory, makes the I6500 ideal for applications which require deterministic code execution.
  • Ideal for compute intensive, data processing and networking applications: The I6500 is designed for high-performance/high-efficiency data transfers to localized compute resources with data scratchpad memories per CPU, and features for fast path message/data passing between threads and cores.
  • Trusted: MIPS multi-domain security technology used across its processing families enables isolation of applications in trusted environments, providing a foundation for security by separation.
  • Straightforward software development: The I6500 is based on the mature MIPS ISA which is broadly supported in the development ecosystem by multiple vendors. Customers adopting the I6500 can enjoy a wide choice of compilers, debuggers, operating systems, hypervisors and application software all optimized for the MIPS ISA.
MIPS I-Class I6500 Base Core Features
  • 64-bit MIPS64® Release 6 Instruction Set Architecture
    • Proven, successful, well supported 64-bit architecture
    • Superset of MIPS32 – runs MIPS32 software directly
  • Balanced, 9-stage, dual-issue pipeline with Simultaneous Multi-Threading (SMT)
    • Superscalar on a single thread or two threads simultaneously per cycle
    • Up to four threads per core
    • Instruction bonding – merges sequential integer or floating-point loads or stores into one operation for up to a 2x performance increase on memory-intensive data movement routines
  • High-performance dual-issue FPU/SIMD Unit – optional
    • 32 x 128-bit register set, 128-bit loads/stores to/from SIMD unit
    • Native data types:
      • 8-/16-/32-/64-bit integer and fixed point, 16-/32-/64-bit floating point
    • IEEE-754 2008 compliant
  • Full hardware virtualization
    • Provides root and guest privilege levels for kernel and user space
    • Supports multiple guests, with full virtual CPU per guest = guest OSs run unmodified
    • Separate TLBs, COP0 contexts for root and guests –> full isolation, fast context switching, exception and interrupt handling by root
    • Complete SoC virtualization support (IOMMU and interrupt handling – see multi-core features)
  • L1 cache
    • Instruction and Data of 32 KB or 64 KB each with ECC, 4-way set associative
  • Data ScratchPad RAM (D-SPRAM)
    • Up to 1 MB with ECC, for deterministic low latency access and/or high performance data processing and movement outside of standard cached memory hierarchy (e.g. DMA directly into a core’s local D-SPRAM)
  • Programmable Memory Management Unit (MMU)
    • First and second level TLBs with arrays for variable and fixed page size support
MIPS I-Class I6500 Series Multi-Core & Multi-Cluster Features
  • Coherent multi-core and multi-cluster platform, providing extensible implementations in support of both homogeneous and heterogeneous computing applications
    • Flexibility in the mix of cores and I/O coherency unit (IOCU) ports enables compute and throughput to be optimized for the heterogeneous performance needs of the application
    • Support for multi-cluster implementations of up to 64 compute clusters
    • IP available as:
      • Single cluster IP deliverable for use in combination with coherent fabric alternatives (ACE-compatible) for multi-cluster scalability, or
      • Complete multi-cluster sub-system deliverable
  • Guest ID brought out on system i/f for integration into multi-cluster and virtualized SoC designs
  • Advanced debug capabilities – Debug and Trace
    • Debug unit (DBU) supporting JTAG or APB i/f for CoreSight™ compatibility
    • Program and Data Trace (PDtrace™), with on-chip or off-chip trace buffering
  • Per cluster multi-core system designed for maximum cluster-level bandwidth
    • Coherence Manager (CMv3.5)
      • Extensible to coherent multi-cluster implementations
      • Within a single cluster, supports multi-port configurations of up to:
        • Six cores in a single cluster (plus up to two hardware I/O coherency unit (IOCU) ports), or
        • Eight IOCU ports for “clustering” hardware accelerators (even without a CPU core on the same cluster)
      • New directory-based coherency scheme – improves power consumption, performance and scalability
      • High-bandwidth 256-bit internal data paths and external system interface
    • Integrated L2 cache (L2$): 16-way set associative, up to 8 MB of memory
      • Dual pipelines for maximizing bandwidth on L1$ misses
      • ECC option on L2$ RAM for higher data reliability
      • Configurable wait states to RAM for optimal L2$ design
      • L2$ hardware pre-fetch for higher throughput and performance
    • Up to four auxiliary AXI ports provide for enabling features such as:
      • Separate path for non-coherent memory transactions
      • Shared access to low latency peripherals
      • Shared access to low latency and deterministic SPRAM (within a cluster, or even across clusters)
    • Inter-Thread Communication (ITC)
      • Fast path, higher efficiency alternative for messaging/data passing between threads within a core or a cluster
    • Global interrupt controller (GIC) with 256 interrupts per cluster
    • Advanced power management
      • Core-level DVFS (dynamic voltage and frequency scaling) – each core can run at an independent clock frequency and voltage level
    • Virtualization support at system and SoC level
      • Up to 31 guest execution environments per cluster
      • IOCUs include I/O MMU; GIC has virtualized interrupts