Latest news

Lunar Lake with 2nd generation Core Ultra presented – New AI notebooks: faster, more efficient and with a V in the suffix

Intel today unveiled the new Core Ultra processors for notebooks based on the Lunar Lake architecture at IFA 2024 in Berlin. They represent a significant evolution of Intel's mobile CPU lineup, with a strong focus on AI capabilities, energy efficiency and a fully integrated system-on-chip (SoC) design.

Here is a clear summary of the most important technical details for anyone who didn’t watch the live stream or would like to read up on it again. However, I also have the latest performance data and the popular slide collection for you at the end, so it’s worth it.

Architectural innovations

Lunar Lake introduces a fully integrated SoC design with the system memory mounted directly on the package, eliminating the need for separate memory modules or a traditional chipset and making it a highly compact and efficient solution. The processors feature a hybrid architecture that combines performance cores (P-cores) and efficiency cores (E-cores) to optimize performance per watt for different workloads. This design uses Intel’s updated Thread Director to dynamically assign tasks to the appropriate cores, improving both performance and energy efficiency.
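
As a small practical aside: on a Linux machine you can see which logical CPUs belong to the P-core and E-core groups of such a hybrid processor via sysfs. The sketch below assumes a recent kernel that exposes the hybrid PMU devices cpu_core and cpu_atom; it is meant as an illustration, not a Lunar Lake-specific tool.

```python
# Minimal sketch: list P-cores vs. E-cores of a hybrid Intel CPU under Linux.
# Assumes a recent kernel exposing the hybrid PMU devices cpu_core and cpu_atom.
from pathlib import Path

def read_cpu_list(path: str) -> str:
    p = Path(path)
    return p.read_text().strip() if p.exists() else "not available"

if __name__ == "__main__":
    print("P-cores (cpu_core):", read_cpu_list("/sys/devices/cpu_core/cpus"))
    print("E-cores (cpu_atom):", read_cpu_list("/sys/devices/cpu_atom/cpus"))
```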

Intel’s “Lion Cove” cores, which are integrated as P-cores in the new Lunar Lake processors, bring several technological improvements aimed at higher performance and efficiency. The most important improvement is a 14% increase in instructions per cycle (IPC) compared to the previous generation Redwood Cove used in Meteor Lake. This performance increase results from a number of architectural changes, including a significantly enlarged cache hierarchy and optimized prediction mechanisms.

The Lion Cove cores introduce a redesigned cache structure: the former first-level cache now acts as an L0 level, and a new 192 KB L1 cache has been inserted between it and the L2 cache. This additional level helps to reduce the average latency of memory accesses, increasing the efficiency and responsiveness of the cores. At the same time, the L2 cache has been enlarged to 2.5 MB, so it can hold more data close to the core and therefore improves overall performance.
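
To illustrate why an extra cache level pays off, here is a back-of-the-envelope calculation of the average memory access time (AMAT). All hit rates and latencies are made-up example values, not Intel figures; only the three-level structure (L0, new 192 KB L1, L2) mirrors the description above.

```python
# Illustrative AMAT model for a multi-level cache in front of memory.
# All hit rates and cycle latencies are assumed example values, NOT Intel data.
def amat(levels, mem_latency):
    """levels: list of (hit_rate, latency_cycles), ordered from closest to the core."""
    total, reach_prob = 0.0, 1.0
    for hit_rate, latency in levels:
        total += reach_prob * latency      # accesses that reach this level pay its latency
        reach_prob *= (1.0 - hit_rate)     # fraction that misses and travels further out
    return total + reach_prob * mem_latency

two_level   = amat([(0.90, 4), (0.85, 18)], 220)                # L0 + L2 only
three_level = amat([(0.90, 4), (0.70, 9), (0.85, 18)], 220)     # L0 + new 192 KB L1 + L2
print(f"AMAT without mid-level cache: {two_level:.1f} cycles")
print(f"AMAT with mid-level cache:    {three_level:.1f} cycles")
```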

Another important development is the improvement of the prediction and execution architecture. Intel has tripled the bandwidth between the first-level caches and the L2 cache and doubled the instruction fetch bandwidth from 64 to 128 bytes per cycle. In addition, the decode bandwidth has been increased, resulting in faster and more efficient processing. These changes contribute to better performance of the Lion Cove cores, especially in single-threaded applications.
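
As a quick sanity check of what that fetch width means in absolute terms, the snippet below converts bytes per cycle into raw bandwidth; the clock frequencies are arbitrary example values, not Lunar Lake specifications.

```python
# Convert instruction-fetch width (bytes per cycle) into raw bandwidth.
# The clock frequencies are arbitrary example values, not Intel specifications.
def fetch_bandwidth_gb_s(bytes_per_cycle: int, freq_ghz: float) -> float:
    return bytes_per_cycle * freq_ghz   # bytes/cycle * 1e9 cycles/s = GB/s

for freq in (3.0, 5.0):
    old, new = fetch_bandwidth_gb_s(64, freq), fetch_bandwidth_gb_s(128, freq)
    print(f"{freq:.1f} GHz: {old:.0f} GB/s -> {new:.0f} GB/s instruction fetch")
```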

One notable move was the removal of Hyper-Threading (HT) from the Lion Cove cores. This decision was made to improve energy efficiency and simplify thermal management, which is particularly beneficial for use in ultra-thin notebooks. Without HT, Intel can provide a simpler and more energy-efficient architecture that still delivers high performance in single-threaded applications.

Additionally, power management has been refined through the use of AI-based controllers that dynamically adapt to operating conditions. This adaptation allows the cores to control clock rates more finely, ensuring more accurate power management. The combination of these improvements significantly increases the efficiency and performance of the Lion Cove cores in the Lunar Lake processors and strengthens Intel’s position in competition with other manufacturers such as Apple, AMD and Qualcomm in the mobile processor market.

The “Skymont” E-cores in Intel’s Lunar Lake processors represent a significant evolution in architecture aimed at efficiency and performance. These E-cores offer a significant increase in instructions per cycle (IPC) compared to the previous “Crestmont” E-cores used in Meteor Lake: up to 68% higher IPC for floating-point operations and up to 38% higher IPC for integer computations. This makes them particularly efficient for demanding workloads, while consuming less power than their predecessors.

The improvements of the “Skymont” E-cores result from several architectural changes. Key innovations include a widened 9-wide decode unit (up from the previous 6-wide unit) and improved branch prediction, which significantly increases the efficiency of instruction execution. In addition, the number of integer ALUs has been doubled to eight, which further increases parallelism and processing speed. These E-cores also have increased bandwidth within the cache and register files, which improves overall performance and data processing efficiency.
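
For readers who want to translate those IPC numbers into throughput: performance scales roughly with IPC × clock frequency. The sketch below applies the quoted 38% and 68% uplifts to a normalized baseline; the clock values are placeholders, not Intel specifications.

```python
# Relative throughput estimate: performance ∝ IPC * clock frequency.
# The baseline is normalized to 1.0; clock values are placeholders, not Intel specs.
def relative_perf(ipc_gain: float, freq_ghz: float, base_freq_ghz: float = 3.0) -> float:
    return (1.0 + ipc_gain) * (freq_ghz / base_freq_ghz)

print("Integer, same clock:        x", round(relative_perf(0.38, 3.0), 2))
print("Floating point, same clock: x", round(relative_perf(0.68, 3.0), 2))
print("Integer, 10% lower clock:   x", round(relative_perf(0.38, 2.7), 2))
```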

In addition, the four “Skymont” E-cores share a 4 MB L2 cache, with the L2 bandwidth doubled to enable faster data accesses. This helps to reduce power consumption while maximizing performance in multi-core applications. These cores are also arranged as a low-power “compute island”, which enables efficient use of compute resources without relying on Hyper-Threading and results in further power savings.

AI and graphics improvements

A standout feature of Lunar Lake is its focus on the much-hyped AI capabilities. The new NPU 4 (Neural Processing Unit) achieves up to 48 TOPS (tera operations per second) of INT8 performance and is specifically designed to handle AI tasks such as those required for Microsoft Copilot.

A key feature of the NPU 4 is its improved ability to work with different precisions such as FP16 (16-bit floating point), resulting in more accurate and efficient calculations. In addition, the NPU 4 offers four times higher vector computing performance compared to its predecessor, the NPU 3, and significantly improves performance on transformer models and large language models (LLMs). This enables faster and more energy-efficient processing of complex neural networks.
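
The TOPS figure itself follows from a simple relationship: number of MAC units × 2 operations per MAC (multiply plus accumulate) × clock frequency. The MAC count and clock in the sketch are assumed round numbers chosen only to show how a value around the quoted 48 INT8 TOPS can come about; Intel did not break these parameters down in the presentation.

```python
# Rough TOPS estimate: MAC units * 2 ops (multiply + accumulate) * clock frequency.
# MAC count and clock are illustrative assumptions, not official Intel figures.
def tops(mac_units: int, freq_ghz: float) -> float:
    return mac_units * 2 * freq_ghz / 1000.0   # giga-ops -> tera-ops

print(f"Example NPU configuration: {tops(12288, 1.95):.1f} INT8 TOPS")  # lands near the quoted 48 TOPS
print(f"FP16 at half rate:         {tops(12288, 1.95) / 2:.1f} TOPS")   # FP16 often runs at half the INT8 rate
```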

The architecture of NPU 4 also includes an optimized pipeline for inference tasks, supporting more complex and sophisticated neural network models with higher speed and accuracy. These improvements, along with increased IP bandwidth and advanced data conversion techniques, make NPU 4 a compelling solution for demanding AI workloads on mobile platforms.

In addition, Intel has optimized the NPU’s frequency and voltage curves through the use of AI techniques, resulting in an additional reduction in power consumption of up to 20%. This is particularly relevant as modern applications require more and more performance while consuming less power.

Intel’s Xe2 GPU, which is integrated into the new Lunar Lake processors, represents a significant advance in graphics performance and efficiency. The Xe2 GPU is based on the “Battlemage” architecture and offers 1.5 times higher performance at the same power consumption compared to the previous generation. This GPU contains eight second generation Xe cores, 64 vector units, two geometry pipelines and eight ray tracing units, resulting in improved graphics and AI performance.

It supports advanced features such as ray tracing and XeSS (Xe Super Sampling) technology, which improves image quality at lower compute loads. The GPU also offers broad support for output standards, including HDMI 2.1, DisplayPort 2.1 and eDP 1.5, and can drive up to three 4K60 HDR displays or one 8K60 HDR display.

The Xe2 GPU also makes an important contribution to AI applications by providing additional computing power for machine learning and other AI workloads. In combination with the NPU 4 and the CPU cores, the system can achieve a total of up to 120 TOPS (tera operations per second), making it ideal for modern AI PCs and demanding applications. The GPU alone can process up to 67 TOPS, a significant increase over the previous generation. In addition, it offers hardware acceleration for matrix operations, significantly boosting AI processing capabilities compared to previous generations.
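
The quoted platform total is easy to verify as a simple sum of the individual engines; the roughly 5 TOPS attributed to the CPU cores is simply the remainder implied by the numbers above.

```python
# Platform AI throughput as the sum of the individual engines (TOPS, INT8).
npu_tops = 48                           # NPU 4, quoted above
gpu_tops = 67                           # Xe2 GPU, quoted above
cpu_tops = 120 - npu_tops - gpu_tops    # remainder attributed to the CPU cores
print(f"CPU contribution: ~{cpu_tops} TOPS")
print(f"Platform total:   {npu_tops + gpu_tops + cpu_tops} TOPS")
```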

The GPU also features an improved media engine that supports 8K60 HDR decoding and encoding, as well as the new H.266/VVC codec for decoding. These improvements certainly make the Xe2 GPU a powerful component for multimedia applications and intensive gaming in thin and light laptops.


Performance and efficiency

Lunar Lake’s architectural improvements result in significant performance and efficiency gains. The E-cores see up to 68% higher IPC (in floating-point workloads), while the P-cores achieve a 14% IPC increase compared to the Meteor Lake architecture. This efficiency is supported by a new “side cache” that reduces data movement and power consumption across the SoC.

This side cache is an additional caching layer that sits at the SoC level, behind the cores’ own L1 and L2 caches and in front of the memory controller, to improve the efficiency of memory accesses. It is an 8 MB cache shared between all processing units and acts as a kind of L4 cache, although technically it is a memory-side cache rather than a further CPU cache level. It is designed to improve data locality and reduce data movement between the different processing units on the chip, resulting in energy savings.

The introduction of the side cache enables more efficient use of cache capacity by ensuring that frequently used data is available faster without having to be fetched from system memory. This improves overall performance, especially for applications that require high memory bandwidth.
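
Continuing the small cache model from the Lion Cove section: the shared 8 MB side cache is simply one more level in front of the memory controller that intercepts part of the traffic before it reaches DRAM. Hit rate and latency are again made-up illustrative values.

```python
# Extend the earlier AMAT sketch with an SoC-level side cache in front of memory.
# All hit rates and latencies are illustrative assumptions, not measured values.
def amat(levels, mem_latency):
    total, reach_prob = 0.0, 1.0
    for hit_rate, latency in levels:
        total += reach_prob * latency
        reach_prob *= (1.0 - hit_rate)
    return total + reach_prob * mem_latency, reach_prob   # reach_prob = share of accesses going to DRAM

core_caches = [(0.90, 4), (0.70, 9), (0.85, 18)]
no_side, dram_a   = amat(core_caches, 220)
with_side, dram_b = amat(core_caches + [(0.50, 60)], 220)  # 8 MB side cache, assumed 50% hit rate
print(f"Without side cache: {no_side:.1f} cycles, {dram_a:.2%} of accesses go to DRAM")
print(f"With side cache:    {with_side:.1f} cycles, {dram_b:.2%} of accesses go to DRAM")
```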

In addition to this side cache, Lunar Lake also places LPDDR5X memory directly on the package, which also helps to reduce energy consumption and further increases the overall efficiency of the system. This configuration not only reduces latency, but according to Intel also lowers the power consumption of the memory interface by up to 40% compared to traditional designs with separate memory modules. These changes in memory architecture are part of Intel’s strategy to improve the performance and power efficiency of its processors to compete with rival products such as Apple’s M-series and Qualcomm’s Snapdragon chips.
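
For context on what the on-package memory delivers, peak bandwidth is simply transfer rate × bus width. The LPDDR5X-8533 speed grade and 128-bit bus used below are the commonly reported configuration of the Core Ultra 200V series, i.e. an assumption on my part rather than a figure from the text above.

```python
# Peak memory bandwidth = transfer rate (transfers/s) * bus width (bytes per transfer).
# LPDDR5X-8533 on a 128-bit bus is an assumed configuration, not stated above.
transfers_per_s = 8533e6            # 8533 MT/s
bytes_per_transfer = 128 // 8       # 128-bit bus -> 16 bytes
bandwidth_gb_s = transfers_per_s * bytes_per_transfer / 1e9
print(f"Peak bandwidth: {bandwidth_gb_s:.1f} GB/s")   # ~136.5 GB/s
```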

Performance and first benchmarks

Of course, Intel leaves nothing to chance: with one exception, all live benchmarks were shown only on suitable Intel systems – all of them devices that can be pre-ordered from today and purchased after IFA.

But of course the press deck attached below also contains further comparative benchmarks in slide form, which I don’t want to withhold from you:

IFA 2024 Press Deck

 

Manufacturing and process nodes

Intel’s Lunar Lake processors are produced using an optimized manufacturing process based on a collaboration with TSMC, the world’s largest contract chip manufacturer. The key components, known as “tiles”, are produced on TSMC’s state-of-the-art manufacturing processes, in particular the N3B node for the compute tile, which contains the CPU cores, GPU and NPU. The platform controller tile is manufactured on TSMC’s N6 node, which enables a finer structure and higher density of circuits. Only the base tile, which serves as the interconnect between the different tiles, is manufactured by Intel itself on its 22FFL process.

The decision to use TSMC’s manufacturing technology was based on its lead at the time of production. Intel made this choice to achieve a mix of performance, scalability and efficiency that was not achievable with Intel’s own process nodes (Intel 4 or Intel 3) at that point. This allows the Lunar Lake processors to offer higher energy efficiency and improved overall performance, which is especially important for mobile and low-power applications.

In addition to advanced manufacturing techniques, Intel also uses its Foveros packaging technology to connect the different tiles of the SoC. This technology enables tight integration of components, which contributes to improved performance and reduced latency. The use of a smaller bump pitch (25 microns compared to 36 microns for Meteor Lake) enables denser communication between units and helps reduce power consumption.
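
Since the bumps sit on a two-dimensional grid, the connection density scales roughly with the square of the pitch ratio – a quick calculation shows what the step from 36 to 25 microns buys:

```python
# Bump density scales roughly with 1/pitch^2 for a regular 2D bump grid.
meteor_lake_pitch_um = 36
lunar_lake_pitch_um = 25
density_gain = (meteor_lake_pitch_um / lunar_lake_pitch_um) ** 2
print(f"Approx. density increase: x{density_gain:.2f}")   # ~2.07x more bumps per area
```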

Target market and availability

Intel is positioning the Lunar Lake processors primarily for the mobile market, with a focus on thin and light devices that require high performance and efficient AI capabilities. The processors launch in the third quarter of 2024 and are expected to compete strongly against Qualcomm’s Snapdragon X chips and Apple’s M-series offerings.

Intel’s Lunar Lake processors signal a strong push towards AI-centric computing and improved power efficiency, aiming to deliver best-in-class performance in a compact, integrated package that meets the demands of modern mobile computing. What the end customer will actually notice and benefit from in practice will have to be shown in proper tests.

Intel Core Ultra 200V Series - Product Brief

 

The information was provided by Intel as part of an event under NDA. The only condition was compliance with the publication deadline. The costs for accommodation and meals during the event were borne by Intel.


Thank you for the donation



You found this article interesting and would like to support us? Great!

Here you can find out how: Donate here.

You can donate via PayPal here.

About the author

Igor Wallossek

Editor-in-chief and name-giver of igor'sLAB, the content successor of Tom's Hardware Germany, whose license was returned in June 2019 in order to better meet the qualitative demands of web content and the challenges of new media such as YouTube with its own channel.

Computer nerd since 1983, audio freak since 1979 and pretty much open to anything with a plug or battery for over 50 years.

Follow Igor:
YouTube Facebook Instagram Twitter
