Intel ‘Lunar Lake’ Launched with Lion Cove and Skymont CPU Cores, Next-Gen Xe2 GPU, and NPU 4
At Computex 2024, Intel CEO Pat Gelsinger detailed the Lunar Lake client processor, highlighting advances in architecture, performance, and efficiency.
Intel’s Lunar Lake Architecture -
The Lunar Lake System-on-Chip (SOC) is built from seven main components, starting with the interposer package, the on-package memory, a stiffener, and the Base Tile. Using Foveros interconnect, the Base Tile hosts the compute tile and the Platform Controller Tile. Because it uses fewer tiles than Meteor Lake, Lunar Lake improves efficiency and lowers latency.
The compute tile is built on TSMC’s N3B process, while the Platform Controller Tile uses TSMC N6. Lunar Lake offers on-package LPDDR5X memory in 16 GB and 32 GB configurations, running at up to 8533 MT/s. The memory interface uses four 16-bit channels, and placing the memory on the package cuts PHY power by 40% and saves 250 mm² of area compared with conventional PCB-mounted memory.
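As a rough illustration of what these figures imply, the sketch below computes peak theoretical bandwidth from the quoted numbers alone: four 16-bit channels at up to 8533 MT/s. It assumes every transfer uses the full 64-bit bus and ignores refresh and command overhead, so sustained bandwidth in practice will be lower. The function name is hypothetical.

```python
# Peak theoretical bandwidth of the on-package LPDDR5X, from the quoted figures:
# four 16-bit channels at up to 8533 MT/s.
# Assumption: each transfer moves the full bus width; protocol overhead
# (refresh, command cycles, bank conflicts) is ignored.

CHANNELS = 4
CHANNEL_WIDTH_BITS = 16
TRANSFER_RATE_MTS = 8533  # mega-transfers per second


def peak_bandwidth_gbs(channels: int, width_bits: int, rate_mts: float) -> float:
    """Return peak bandwidth in GB/s (decimal units, 1 GB = 1e9 bytes)."""
    bus_width_bytes = channels * width_bits / 8  # 4 x 16 bits = 64 bits = 8 bytes
    bytes_per_second = bus_width_bytes * rate_mts * 1e6
    return bytes_per_second / 1e9


if __name__ == "__main__":
    bw = peak_bandwidth_gbs(CHANNELS, CHANNEL_WIDTH_BITS, TRANSFER_RATE_MTS)
    print(f"Peak bandwidth: {bw:.1f} GB/s")  # -> Peak bandwidth: 68.3 GB/s
```

The formula is simply bus width in bytes multiplied by transfers per second; under these assumptions it gives about 68.3 GB/s.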
The chip uses an 8-core hybrid design with four P-Cores and four E-Cores, coordinated by a new Thread Director. Each P-Core has 2.5 MB of L2 cache, and the P-Cores share up to 12 MB of L3 cache, while the E-Cores share 4 MB of L2 cache and offer double the vector and AI throughput of the previous generation. The Xe2 GPU in Lunar Lake has 8 Xe cores, 8 Ray Tracing Units, XMX support, and an 8 MB dedicated cache. Combined, the NPU, GPU, and CPU deliver a total of 120 TOPS.
Intel Lion Cove P-Core Architecture -
Lion Cove, the P-Core in Lunar Lake, is designed for both high performance and efficiency. It delivers a 14% increase in IPC (Instructions Per Cycle) over Redwood Cove cores, improving single-threaded performance, along with a 15% improvement in performance per watt and a 10% gain in performance per area.
Notable features of Lion Cove include an 8-wide allocation/rename unit, a 12-wide retirement unit, a 576-entry instruction window, and 18 execution ports. Each core has a three-level cache hierarchy: a 48 KB L0 cache, a 192 KB L1 cache, and a 2.5 MB L2 cache.
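The per-core hierarchy can be written down as a small data structure. This is a hypothetical representation for illustration only; it assumes binary units (1 MB = 1024 KB), as is conventional for cache sizes, and simply tallies the capacities listed above without modeling latency or inclusivity between levels.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLevel:
    name: str
    size_kib: int  # capacity in KiB


# Per-core private hierarchy as quoted for Lion Cove in Lunar Lake.
# Assumption: 1 MB = 1024 KB, so 2.5 MB = 2560 KiB.
LION_COVE_PER_CORE = (
    CacheLevel("L0", 48),
    CacheLevel("L1", 192),
    CacheLevel("L2", 2560),
)

P_CORE_COUNT = 4  # the four P-Cores in Lunar Lake


def private_capacity_kib(levels: tuple[CacheLevel, ...]) -> int:
    """Plain sum of capacities; ignores any duplication between inclusive levels."""
    return sum(level.size_kib for level in levels)


per_core = private_capacity_kib(LION_COVE_PER_CORE)
print(f"Private cache per P-Core: {per_core} KiB")
print(f"Across {P_CORE_COUNT} P-Cores: {per_core * P_CORE_COUNT} KiB")
```

Keeping each level as a named record makes it easy to compare hierarchies, for example against a core with a different L2 size, by swapping one entry.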
Intel Skymont E-Core Architecture -
Skymont, the "E" core in Lunar Lake, is engineered for efficiency while broadening workload coverage, vector and AI throughput, and scalability. Compared with Crestmont E-Cores, Skymont delivers a 38% IPC improvement on integer workloads and a 68% improvement on floating-point workloads.
Its out-of-order engine features 8-wide allocation, 16-wide retirement, and a 416-entry out-of-order window. Vector performance gets a notable upgrade from four 128-bit floating-point/SIMD vector pipelines, which also improves AI capability.
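A back-of-the-envelope sketch of what four 128-bit pipelines mean for element throughput. It assumes each pipeline accepts one full-width operation per cycle; which instructions and data types each pipe actually supports is not stated here, so treat the results as upper bounds. The names are hypothetical, and the two-pipe case is only a comparison point illustrating the doubling mentioned in the text.

```python
# Upper bound on per-cycle element throughput of Skymont's FP/SIMD pipelines,
# using the quoted figures (four 128-bit pipelines).
# Assumption: each pipeline accepts one full-width operation per cycle.
VECTOR_PIPES = 4
PIPE_WIDTH_BITS = 128


def elements_per_cycle(element_bits: int, pipes: int = VECTOR_PIPES) -> int:
    """Vector elements that can enter all pipes in one cycle (upper bound)."""
    return pipes * PIPE_WIDTH_BITS // element_bits


for name, bits in (("FP64", 64), ("FP32", 32)):
    print(f"{name}: {elements_per_cycle(bits)} elements/cycle")

# Hypothetical two-pipe baseline: halving the pipe count halves the bound.
print(f"FP32 with 2 pipes: {elements_per_cycle(32, pipes=2)} elements/cycle")
```

Because the bound scales linearly with pipe count, doubling the number of 128-bit pipes is what doubles peak vector throughput, independent of element type.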
Skymont's memory subsystem provides a 4 MB L2 cache shared by each four-core cluster, doubling L2 bandwidth and enabling faster L1-to-L1 transfers. Together, these changes bring significant efficiency gains, with Skymont E-Cores achieving up to 4x higher performance at peak power compared with Crestmont.
Power Management & Thread Director Enhancements -
Lunar Lake introduces a revamped Thread Director to make better use of both P-Core and E-Core resources. Improved algorithms and more precise workload classification raise overall efficiency. In addition, new OS Containment Zones regulate power and performance by confining tasks to specific core types.
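To make the idea of containment zones concrete, here is a toy model of zone-based placement. Everything in it is hypothetical: the zone names, the 0..1 "demand" hint, and the 0.6 threshold are illustrative assumptions, not Intel's Thread Director algorithm or any Windows scheduler API. It only shows the general pattern of keeping light work contained on E-Cores and letting demanding work escape to P-Cores.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class CoreType(Enum):
    P_CORE = "P-Core"
    E_CORE = "E-Core"


class Zone(Enum):
    # Hypothetical zone names for illustration only.
    EFFICIENCY = "efficiency"
    COMPUTE = "compute"


@dataclass
class Task:
    name: str
    demand: float  # hypothetical 0..1 performance-demand hint


# Core types each zone may use, in order of preference.
ZONE_CORES = {
    Zone.EFFICIENCY: [CoreType.E_CORE],
    Zone.COMPUTE: [CoreType.P_CORE, CoreType.E_CORE],
}


def pick_zone(task: Task, threshold: float = 0.6) -> Zone:
    """Contain light tasks on E-Cores; allow demanding ones onto P-Cores."""
    return Zone.COMPUTE if task.demand >= threshold else Zone.EFFICIENCY


def schedule(tasks: list[Task]) -> dict[str, CoreType]:
    """Place each task on the most preferred core type its zone allows."""
    return {task.name: ZONE_CORES[pick_zone(task)][0] for task in tasks}


placement = schedule([Task("video-call", 0.3), Task("compile", 0.9)])
for name, core in placement.items():
    print(f"{name} -> {core.value}")
```

The design choice being illustrated is that the zone, not the individual task, decides which core types are eligible, so a power policy can tighten or loosen a whole class of work by changing one mapping.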
The SOC's integrated power management offers three profiles: Best Efficiency Mode, Balanced Mode, and Performant Mode. These profiles dynamically adjust the SOC's frequency and scheduling to maximize power efficiency, cutting power by up to 35% in applications such as Microsoft Teams.
Intel Lunar Lake’s NPU -
NPU 4 in Lunar Lake is a substantial advance in AI processing, offering 48 peak TOPS, a 4.36x improvement over the NPU in Meteor Lake.
NPU 4 has 12K MACs across 6 Neural Compute Engines and runs at a higher clock of 1.95 GHz. Compared with its predecessor, this configuration delivers 12x the vector performance, 4x the AI TOPS, and double the IP bandwidth.
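The peak figure can be roughly reconstructed from these specs. This sketch assumes "12K" means 12,288 MACs, that the MACs are split evenly across the six engines, and the common convention of counting each multiply-accumulate as two operations. Under those assumptions it lands at about 47.9 TOPS, consistent with the quoted 48, and the 4.36x ratio implies a predecessor of roughly 11 TOPS.

```python
# Peak NPU throughput from the quoted specs.
# Assumptions (not stated in the text): "12K MACs" = 12 * 1024 = 12,288,
# MACs are split evenly across engines, and one MAC counts as two operations.
MACS = 12 * 1024
NCE_COUNT = 6
CLOCK_HZ = 1.95e9
OPS_PER_MAC = 2


def peak_tops(macs: int, clock_hz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Peak tera-operations per second: MACs x ops per MAC x clock."""
    return macs * ops_per_mac * clock_hz / 1e12


npu4_tops = peak_tops(MACS, CLOCK_HZ)  # about 47.9, in line with the quoted 48
macs_per_engine = MACS // NCE_COUNT    # 2048 under the even-split assumption
predecessor_tops = 48 / 4.36           # about 11.0 TOPS implied for Meteor Lake's NPU

print(f"NPU 4 peak: {npu4_tops:.1f} TOPS")
print(f"MACs per engine: {macs_per_engine}")
print(f"Implied Meteor Lake NPU: {predecessor_tops:.1f} TOPS")
```

Peak TOPS is a ceiling that assumes every MAC is busy every cycle; real workloads are limited by memory bandwidth and operator support.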
Lunar Lake’s IO and Connectivity -
Lunar Lake brings improved connectivity, including support for Wi-Fi 7 and Thunderbolt 4. It supports up to three Thunderbolt 4 ports, which deliver speeds up to 25% faster when used with Thunderbolt 5 SSDs.
The integrated Wi-Fi 7 solution also brings several improvements, including a 28% smaller silicon footprint, an 11 Gbps CNVio 3 interface, and better reliability through Multi-Link Operation (MLO). In addition, Lunar Lake includes hardware security engines such as Intel SSE, GSC, CSME, and PSE.
Availability -
Intel expects more than 80 Lunar Lake designs from over 20 partners to launch in Q3 2024, with broader availability in Q4 2024. An AI PC developer kit based on Lunar Lake will also be available, supporting upcoming CPUs such as Panther Lake.