I-Class I7200 Multiprocessor Core

CPU IP Designed for the Next Generation of High Performance Wireless Communications and Networking

In the wireless world, 5G promises to increase data bandwidth by an order of magnitude or more over existing LTE designs. Many techniques and technologies are being applied to meet this challenge, but most rely on increasing parallelism in the system to reach the higher aggregate bandwidth targets of the network.

Higher data rates and parallel processing are not unique to the LTE and 5G market and the modems in the products that serve it. With advances in process technology offering limited further gains from frequency scaling (as Moore’s Law slows), many applications in the broader communications and networking markets rely on parallel processing to keep pace with increasing data rates.

MIPS Multi-Threaded Multi-Processor IP Core

In anticipation of these market trends and needs, the MIPS I7200 processor core provides highly efficient, scalable, parallel processing performance, built on a foundation of hardware multi-threading and multi-core cluster CPU technologies.

MIPS introduced its first multi-threaded CPU in 2006 and extended multi-threading to multi-core processing in 2008. Building on more than a decade of that expertise, the MIPS I7200 is the latest generation in a popular line of performant, efficient IP cores based on these technologies. It is a substantial step forward from the previous-generation interAptiv™ series, delivering roughly 50% higher performance across a variety of popular benchmarks for only a 20% increase in core size. The gain comes from combining a dual-issue superscalar pipeline with vertical multi-threading: the core can issue two instructions from one thread in a clock cycle and can switch to a different thread on the next cycle.
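The performance and area figures above imply a net gain in performance per unit of silicon area. A quick check, treating the ~50% and 20% figures from the text as exact for the sake of the arithmetic:

```python
def perf_per_area_gain(perf_gain: float, area_gain: float) -> float:
    """Relative change in performance per unit area.

    perf_gain and area_gain are fractional increases, e.g. 0.50 for +50%.
    """
    return (1.0 + perf_gain) / (1.0 + area_gain) - 1.0

# ~50% more performance for a 20% larger core (figures quoted in the text)
gain = perf_per_area_gain(0.50, 0.20)
print(f"performance per unit area improves by about {gain:.0%}")
```

This is a back-of-the-envelope figure (1.5 / 1.2 = 1.25, or about 25% better performance density); actual gains vary by benchmark.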

[Figure: MIPS MT with Vertical Multi-Threading (VMT) on the I7200]

Beyond raw performance, multi-threading can also deliver very low-latency responses to high-priority and real-time events. Zero-overhead context switching between threads, hardware-supported priority scheduling, and the ability to dedicate threads to events and suspend them until a high-priority event occurs together provide the foundation for very low-latency response.
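As a conceptual illustration of hardware-prioritized, per-cycle thread selection, the model below picks the highest-priority runnable thread on every clock cycle and rotates round-robin among threads of equal priority. The policy, the priority convention, and the thread names are assumptions for illustration only; the I7200's actual scheduling policy is not described in this text.

```python
from dataclasses import dataclass

@dataclass
class HwThread:
    name: str
    priority: int          # higher value = more urgent (assumed convention)
    runnable: bool = True  # False models a thread suspended waiting for an event

def schedule(threads, cycles):
    """Return the name of the thread issued on each cycle (None = idle).

    Each cycle: choose the highest-priority runnable thread; among threads
    of equal priority, rotate round-robin so none of them starves.
    """
    last_pick = {}  # priority level -> index of the thread last issued at that level
    trace = []
    for _ in range(cycles):
        ready = [i for i, t in enumerate(threads) if t.runnable]
        if not ready:
            trace.append(None)  # nothing runnable: the pipeline idles
            continue
        top = max(threads[i].priority for i in ready)
        candidates = [i for i in ready if threads[i].priority == top]
        prev = last_pick.get(top, -1)
        # first candidate after the previously issued one, wrapping around
        nxt = next((i for i in candidates if i > prev), candidates[0])
        last_pick[top] = nxt
        trace.append(threads[nxt].name)
    return trace

threads = [HwThread("bg0", 1), HwThread("bg1", 1), HwThread("irq", 5, runnable=False)]
print(schedule(threads, 4))  # ['bg0', 'bg1', 'bg0', 'bg1']
threads[2].runnable = True   # a high-priority event wakes the suspended thread
print(schedule(threads, 2))  # ['irq', 'irq']
```

Because selection is recomputed every cycle in this model, a woken high-priority thread issues on the very next cycle with no save/restore step, which is the sense in which hardware thread switching is "zero overhead".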

The MIPS I7200 is also the first core in the MIPS lineup to implement the nanoMIPS™ instruction set architecture (ISA), a new version of the MIPS ISA designed to deliver best-in-class code size without sacrificing the high performance today’s applications require. When compiled for performance (the -O3 compiler flag), nanoMIPS code can be ~40% smaller than standard MIPS32 code.

Key Features/Benefits
  • Dual-issue superscalar design with Vertical Multi-Threading (VMT): 50% performance gain on a variety of benchmarks for only a 20% increase in core area. Highly efficient, area- and power-optimized design.
  • Real-time, low-latency response for high-priority events: Zero-cycle context switching, instruction queues per hardware thread, hardware-prioritized thread scheduling, and deterministic execution features such as ScratchPad RAM (SPRAM) and simple direct-mapped memory access with memory protection provide the foundation for very low-latency response in real-time systems.
  • nanoMIPS™ small-code-size ISA: Achieves best-in-class code size while delivering high performance. When compiling for performance (gcc with the -O3 flag), it can reduce code size by up to 40% compared with MIPS32 without sacrificing performance.
  • Multi-threaded multi-core processing: A highly scalable parallel processing platform that can be tailored to the requirements of each application. The number of threads per core and cores per cluster is customer-configurable at silicon design time, and at run time standard SMP operating systems (RTOSs or Linux) can use threads and cores as the current software workload requires.
  • Core features optimizable for Linux or RTOS-based software development: Configurable options for features such as memory management, use of caches and/or ScratchPad RAM memories, etc.
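The two levels of configurability compose multiplicatively: an SMP operating system can schedule onto every OS-visible execution context in the cluster. A minimal sketch, assuming each OS-visible VPE appears to the OS as one logical CPU and using the maxima quoted later in this document (4 cores per cluster, 3 VPEs per core); the function names and the static work split are hypothetical illustrations, not a vendor API.

```python
MAX_CORES = 4          # Coherence Manager limit quoted in this document
MAX_VPES_PER_CORE = 3  # MT ASE limit quoted in this document

def logical_cpus(cores: int, vpes_per_core: int) -> int:
    """Execution contexts an SMP OS could schedule onto.

    Assumption: one logical CPU per OS-visible VPE.
    """
    if not (1 <= cores <= MAX_CORES and 1 <= vpes_per_core <= MAX_VPES_PER_CORE):
        raise ValueError("configuration outside the quoted I7200 limits")
    return cores * vpes_per_core

def partition(n_items: int, n_cpus: int):
    """Split n_items work units into n_cpus contiguous chunks whose sizes
    differ by at most one (a simple static SMP work split)."""
    base, extra = divmod(n_items, n_cpus)
    chunks, start = [], 0
    for cpu in range(n_cpus):
        size = base + (1 if cpu < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

cpus = logical_cpus(4, 2)  # e.g. a 4-core cluster configured with 2 VPEs per core
print(cpus)                                    # 8
print([len(c) for c in partition(100, cpus)])  # [13, 13, 13, 13, 12, 12, 12, 12]
```

A real RTOS or Linux scheduler balances work dynamically rather than by a fixed split; the static partition only shows how the configured core and thread counts bound the available parallelism.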
[Figure: I7200 core block diagram]
MIPS I7200 Base Core Features
  • 32-bit nanoMIPS™ Instruction Set Architecture
    • Variable length instruction set architecture optimized to deliver best in class small code size at high performance
    • Includes MIPS DSP ASE – optimized instruction set extensions for integer DSP and 32-bit SIMD operations
  • Balanced, 9-stage, dual-issue pipeline with Vertical Multi-Threading (VMT)
    • Superscalar on a single thread per cycle
    • Zero overhead context switching – can switch threads every clock cycle
    • Implements MIPS MT ASE – up to 3 fully OS-visible Virtual Processor Elements (VPEs) per core, and up to 9 lightweight thread contexts (TCs) per core, assignable to the VPEs
  • Configurable memory subsystem
    • Support for caches, tightly coupled ScratchPad RAM (SPRAM), or both
    • L1 caches – 4-way set associative
    • 0-128KB each of instruction and data cache with the MPU
    • 0-64KB each with the full TLB-based MMU
  • ScratchPad RAM (SPRAM)
    • Deterministic, low-latency instruction/data access and/or high-performance data processing and movement outside the standard cached memory hierarchy (e.g. DMA directly into a core’s local D-SPRAM via a 128b AXI-4 interface)
    • 0-1MB each for instruction, data, or unified SPRAM implementations, with unified SPRAM shareable across CPUs in the cluster
  • Configurable memory management
    • Full TLB-based Memory Management Unit (MMU) for Linux and other virtual memory based operating systems
    • Simpler, deterministic direct memory access with an up-to-32-region Memory Protection Unit (MPU) for use with Real Time Operating Systems (RTOSs) and bare-metal programming
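Region-based protection of the kind an MPU provides can be modeled as a lookup over a small, fixed table of address ranges. Unlike a TLB-based MMU, there are no page tables to walk and no TLB refills, which is one reason this path suits deterministic RTOS and bare-metal code. The sketch below is a software model only: the region fields, the permission encoding, and the lowest-index-wins rule for overlapping regions are assumptions, not the I7200's register layout.

```python
from dataclasses import dataclass

MAX_REGIONS = 32  # "up to 32 region" MPU, per this document

@dataclass(frozen=True)
class Region:
    base: int     # first byte address covered
    size: int     # length in bytes
    read: bool
    write: bool
    execute: bool

def check_access(regions, addr: int, kind: str) -> bool:
    """Return True if an access of kind 'r', 'w' or 'x' to addr is allowed.

    Assumed semantics: the lowest-numbered matching region decides, and an
    address that matches no region is denied.
    """
    if len(regions) > MAX_REGIONS:
        raise ValueError("more regions than the MPU supports")
    for r in regions:
        if r.base <= addr < r.base + r.size:
            return {"r": r.read, "w": r.write, "x": r.execute}[kind]
    return False

regions = [
    Region(0x0000_0000, 0x10000, read=True, write=False, execute=True),  # code
    Region(0x2000_0000, 0x8000, read=True, write=True, execute=False),   # data
]
print(check_access(regions, 0x0000_1000, "x"))  # True
print(check_access(regions, 0x0000_1000, "w"))  # False
print(check_access(regions, 0x2000_8000, "r"))  # False (one byte past the data region)
```

In hardware the comparisons happen in parallel against all regions, so the check costs the same on every access; the sequential loop here only expresses the matching rule.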
[Figure: I7200 cluster block diagram]
MIPS I7200 Multi-Core Cluster Features
  • Coherent multi-threaded multi-core platform IP – includes all elements to build a complete coherent multi-core system
  • Coherence Manager (CM)
    • Support for up to 4 cores and up to 2 hardware I/O coherency units (IOCUs)
    • Integrated L2 cache (L2$): 8-way set associative
      • up to 8MB of memory
      • Configurable wait states to RAM for optimal L2$ design
  • Inter-Thread Communication Unit (ITU)
    • Fast path, higher efficiency alternative for messaging/data passing between threads within a core or a cluster
  • Global interrupt controller (GIC) with up to 256 interrupts
  • Cluster power controller (CPC) for advanced power management
    • Clock and voltage gating per CPU in multi-core cluster
  • Advanced debug capabilities – Debug and Trace
    • Aggregated debug support from all elements of cluster in MIPS Debug Hub (MDH)
    • Supports an APB interface for integration into CoreSight™-compatible system debug frameworks
  • Native AXI-4 external interfaces
    • Dual AXI-4 main system interface for separate cached (up to 256b) and uncached accesses (128b)
    • 128b interface for each IOCU port
    • 64b interface for each Memory Mapped low latency I/O (MMIO) port
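For the set-associative caches listed in this document (4-way L1, 8-way L2), the number of sets and the split of a 32-bit address into tag, set index, and byte offset follow directly from capacity, associativity, and line size. The line size is not given here, so the 32-byte (L1) and 64-byte (L2) values below are assumptions chosen for illustration.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int):
    """Return (sets, offset_bits, index_bits, tag_bits) for 32-bit addresses."""
    sets = size_bytes // (ways * line_bytes)
    for v in (sets, line_bytes):
        if v <= 0 or v & (v - 1):
            raise ValueError("sets and line size must be positive powers of two")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = 32 - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits

def split_address(addr: int, offset_bits: int, index_bits: int):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

print(cache_geometry(32 * 1024, 4, 32))        # 32KB 4-way L1: (256, 5, 8, 19)
print(cache_geometry(8 * 1024 * 1024, 8, 64))  # 8MB 8-way L2:  (16384, 6, 14, 12)
```

The set index selects one set, and the tag is then compared against all ways of that set in parallel (4 comparisons for the L1, 8 for the L2).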

Benchmarks and Specifications

Per Core
                 1 Thread   2 Threads
CoreMark/MHz     4.6        5.9
DMIPS/MHz        2.3        2.65

TSMC 28HPM
Area             From 0.27 mm² (fully floorplanned; includes logic and 32KB/32KB L1 caches)

TSMC 16FF+
Frequency        1.7 GHz (worst case), 2.1 GHz (typical)
Area             0.27 mm² (1.7 GHz implementation)
Power            150 mW dynamic @ 1.7 GHz
CoreMark         >40,000 @ 1.7 GHz

Worst case = SS-corner silicon, V = Vnom − 10%, T = 0 °C
Typical = TT silicon, V = Vnom, T = 85 °C
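The per-MHz scores in the table convert to absolute throughput by multiplying by the clock frequency in MHz. The table does not state what configuration the >40,000 CoreMark figure refers to; assuming it is a full 4-core cluster running 2 threads per core, the per-core score is consistent with it:

```python
COREMARK_PER_MHZ = {1: 4.6, 2: 5.9}  # per core, keyed by threads per core (table above)
DMIPS_PER_MHZ = {1: 2.3, 2: 2.65}
FREQ_MHZ = 1700                      # TSMC 16FF+ worst-case frequency

def throughput(score_per_mhz: float, freq_mhz: float, cores: int = 1) -> float:
    """Absolute benchmark score = score/MHz x MHz x number of cores."""
    return score_per_mhz * freq_mhz * cores

per_core = throughput(COREMARK_PER_MHZ[2], FREQ_MHZ)
cluster = throughput(COREMARK_PER_MHZ[2], FREQ_MHZ, cores=4)  # assumed 4-core cluster
print(round(per_core), round(cluster))                # 10030 40120
print(round(throughput(DMIPS_PER_MHZ[2], FREQ_MHZ)))  # 4505
```

The same table also shows what the second hardware thread buys on a single core: 5.9 / 4.6 is roughly 1.28, or about 28% more CoreMark throughput per MHz.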