P-Class P5600 Multiprocessor Core

The MIPS P5600 delivers industry-leading 32-bit performance with class-leading low power consumption, in a silicon footprint significantly smaller than comparable alternatives.

The P5600 CPU core was designed for the performance and features required for mainstream consumer electronics including connected TVs and set-top boxes, and the rich, broad feature set extends its applicability into a variety of networking applications from residential gateways to network appliances, as well as high-performance compute in embedded applications.

The MIPS P5600 CPU is based on a wide-issue, deeply out-of-order (OoO) implementation of the MIPS32 architecture, supporting up to six cores in a single cluster with high-performance cache coherency. Complementing its raw horsepower, the core also includes 128-bit integer and floating-point SIMD processing, hardware virtualization, and enhancements to physical and virtual addressing.

Power Management

The MIPS P5600 processor IP core delivers top line performance while being the most efficient CPU core in its class, making it ideal for a wide range of applications in the rapidly growing connected consumer electronics market.

The P5600 builds upon the existing proAptiv family microarchitecture, adding 128-bit SIMD, hardware virtualization with hardware table walk, 40-bit eXtended Physical Addressing (XPA), and substantial gains in performance on system-oriented software workloads.

The P5600 CPU also achieves 5.4 CoreMark/MHz per core, and 3.5 DMIPS/MHz, matching or exceeding other high-end CPU IP cores.

The P5600 processor delivers this performance in up to 30% less silicon area than leading IP core alternatives, assuming a common process geometry, similar configurations, and similar synthesis techniques. SoC designers can use this efficiency advantage for significant cost and power savings, or to implement additional cores to deliver a performance advantage against competing silicon.

P5600 Benefits
  • 128-bit SIMD – accelerates execution of audio, video, graphics, imaging, speech and other DSP-oriented software algorithms, with an instruction set designed for development in high-level languages such as C and OpenCL
  • MIPS multi-domain security technology based on hardware virtualization – ensuring that applications that need to be secure are effectively and reliably isolated from each other, as well as protected from non-secure applications
  • Advanced addressing extensions for Enhanced Virtual Address (EVA) and eXtended Physical Address (XPA)
    • EVA enables 3GB+ Linux (and similar OS) implementations without the use and overhead of HIGHMEM
    • XPA extends physical addressing up to 1 Terabyte (40 bits)
  • Multiple context security platform for enterprise/consumer partitioning, secure content access, payments/transactions, and isolating secure schemes from numerous content sources
  • Sophisticated branch prediction for maximizing utilization and performance on a deeply pipelined CPU
  • Load/Store bonding for optimum data movement performance
  • Industry-leading benchmark and real-world performance at smaller area and power than competing solutions
  • Broad software and ecosystem support and mature toolchain
  • Available as synthesizable IP, for implementation in any process node, with standard cells and memories
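
The address-space figures in the list above follow directly from the address widths. The sketch below is illustrative arithmetic only: the helper name `addressable_bytes` is hypothetical, and it assumes byte addressing and binary units (so "1 Terabyte" is read as 2^40 bytes).

```python
# Illustrative arithmetic only; addressable_bytes is a hypothetical helper,
# not part of any MIPS toolchain or SDK.

GIB = 1 << 30  # bytes in one gibibyte
TIB = 1 << 40  # bytes in one tebibyte


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable in a byte-addressed space of the given width."""
    return 1 << address_bits


virtual_32 = addressable_bytes(32)    # 32-bit MIPS32 virtual address space
physical_xpa = addressable_bytes(40)  # 40-bit XPA physical address space

print(f"32-bit virtual space: {virtual_32 // GIB} GiB")
print(f"40-bit XPA physical space: {physical_xpa // TIB} TiB")
print(f"XPA space is {physical_xpa // virtual_32}x a 32-bit space")
```

The 4 GiB ceiling on a 32-bit virtual space is why a 3GB+ kernel mapping depends on how that space is split between kernel and user segments, which is the part EVA makes programmable.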
Base Core Features
  • 32-bit MIPS32® Release 5 Instruction Set Architecture
  • High-performance, 16-stage, wide-issue, out-of-order (OoO) pipeline
    • Quad instruction fetch per cycle
    • Triple bonded dispatch per cycle
    • Instruction peak issue of 4 integer and 2 SIMD operations per cycle
    • Sophisticated branch prediction scheme, plus L0/L1/L2 branch target buffers (BTBs), Return Prediction Stack (RPS), Jump Register Cache (JRC)
    • Instruction bonding – merges two 32-bit integer accesses into one 64-bit access, or two 64-bit floating-point accesses into one 128-bit access, for up to a 2x performance increase on memory-intensive data movement routines
  • L1 cache size for Instruction and Data of 32KB or 64KB each, 4-way set associative
  • New high-performance dual-issue 128-bit SIMD Unit – optional
    • 32 x 128-bit register set, 128-bit loads/stores to/from SIMD unit
    • Native data types:
      • 8-/16-/32-bit integer and fixed point, 16-/32-/64-bit floating point
    • IEEE-754 2008 compliant
    • Runs at full speed with CPU core
  • Full hardware virtualization
    • Provides root and guest privilege levels for kernel and user space
    • Supports multiple guests, with a full virtual CPU per guest, so guest OSs run unmodified
    • Separate TLBs and COP0 contexts for root and guests, giving full isolation, fast context switching, and exception and interrupt handling by root
    • HW table walk support in TLB for optimal performance
    • Complete SoC virtualization support (IOMMU and interrupt handling – see multi-core features)
  • Programmable Memory Management Unit (MMU)
    • Enhanced Virtual Address (EVA) – Programmable kernel and user segment sizes
    • eXtended Physical Address (XPA) – Extends the physical address to 40 bits (1 TB)
    • 1st level micro TLBs (uTLBs) – 16 entry instruction TLB, 32 entry data TLB
    • 2nd level TLBs – simultaneous access, variable and fixed page sizes
      • 64×2 entry VTLB, 512×2 entry 4-way set associative FTLB
    • Hardware table walk for fast page refills
  • Power Management Features
    • Multi-core cluster power controller (CPC):
      • Register-based, visible to/controllable by operating system
      • Per CPU voltage domain gating; per CPU clock gating
      • Cluster level DVFS capable
    • Core level
      • Coarse- and fine-grained clock gating throughout core
      • Way prediction on data and instruction L1 caches
      • Instruction and register-based sleep modes
  • EJTAG/PDtrace debug blocks and interface
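
As a rough illustration of what the 128-bit SIMD unit's register geometry implies, the sketch below works out how many elements of each listed native width fit in one register, and the total size of the 32-entry register file. The names are hypothetical and the code is plain arithmetic on the figures quoted above, not a model of the hardware.

```python
# Illustrative arithmetic only; names are hypothetical, figures come from
# the feature list (32 x 128-bit registers; 8-/16-/32-/64-bit elements).

SIMD_REGISTER_BITS = 128
SIMD_REGISTER_COUNT = 32


def lanes(element_bits: int, register_bits: int = SIMD_REGISTER_BITS) -> int:
    """Number of elements of the given width that fit in one SIMD register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return register_bits // element_bits


for element_bits in (8, 16, 32, 64):
    print(f"{element_bits:>2}-bit elements: {lanes(element_bits)} lanes per register")

# Total architectural SIMD register storage, in bytes.
register_file_bytes = SIMD_REGISTER_COUNT * SIMD_REGISTER_BITS // 8
print(f"SIMD register file: {register_file_bytes} bytes")
```

Narrower elements give more lanes per operation, which is why 8-bit and 16-bit workloads such as imaging and audio tend to benefit most from a fixed-width SIMD register.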
Coherent Multi-Core Processor Features
  • Superscalar, deeply OoO multi-core processor
  • Advanced debug capabilities – PDtrace subsystem allows visibility to core- and cluster-level trace information
  • Complete multi-core system designed for maximum cluster-level bandwidth
    • Coherence Manager – supports multi-core configurations up to six cores in a single cluster
    • High-bandwidth 256-bit internal data paths and external system interface
    • Integrated L2 cache (L2$): 4-way set associative, up to 8MB of memory
      • ECC option on L2$ RAM for higher data reliability
      • Configurable wait states to RAM for optimal L2$ design
      • L2$ hardware pre-fetch for higher throughput and performance
    • Up to two IO Coherence Units (IOCU) per coherent processing system
    • Cluster Power Controller (CPC) for voltage/clock gating per-CPU
    • 256-interrupt Global Interrupt Controller (GIC)
    • Virtualization support at system level – IOCUs have IO MMU, and GIC has virtualized interrupts


Target: TSMC 28HPM
Frequency: 1 GHz – 2+ GHz*
CoreMark/MHz (per core): >5
Total CoreMark @ 1.5 GHz: >7500 per core
DMIPS/MHz (per core): 3.5
Total DMIPS @ 1.5 GHz: >5250 per core
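
The per-core totals above are the per-MHz scores scaled by clock frequency. The following minimal sketch reproduces that arithmetic; `total_score` is a hypothetical helper, and the CoreMark input is the stated lower bound of 5 rather than a measured value.

```python
# Illustrative arithmetic only; total_score is a hypothetical helper.

def total_score(score_per_mhz: float, frequency_mhz: float) -> float:
    """Scale a per-MHz benchmark score to a given clock frequency."""
    return score_per_mhz * frequency_mhz


FREQUENCY_MHZ = 1500  # 1.5 GHz, the frequency used in the table

coremark_floor = total_score(5.0, FREQUENCY_MHZ)  # lower bound from ">5"
dmips = total_score(3.5, FREQUENCY_MHZ)

print(f"CoreMark per core at 1.5 GHz: more than {coremark_floor:.0f}")
print(f"DMIPS per core at 1.5 GHz: {dmips:.0f}")
```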

*Frequencies indicated are based on pre-production P5600 RTL and compared with results for a fully floorplanned dual-core proAptiv implementation, and range from 12T SVt area-optimized in worst-case silicon corner to 12T MVt speed-optimized in typical silicon corner. Final production RTL results may vary.

Each base core configuration:
  • 32KB Data/Inst L1 caches with parity, BIST
  • New high-speed Integer + Floating Point (SP and DP) SIMD unit
  • Fully-featured MMU, using multi-level TLB (I/D uTLBs + 128 entry VTLB + 1024 entry FTLB)
  • PDtrace™ debug
Multi-core cluster configuration:
  • Dual fully-configured P5600 cores per above
  • Coherence Manager + integrated 1MB L2$ w/ECC
  • One hardware IO Coherence Unit (IOCU) port
  • Cluster level PDtrace
Implementation libraries/parameters – speed optimized, based on:
  • TSMC 28HPM 12T standard cells + Synopsys memories
  • Worst case, slow-slow corner silicon (zero temp, WCZ) with 10% OCV + 25ps clock jitter margins, except where noted at typical silicon