Choosing the Right SSD for a Skylake-U System
by Ganesh T S on May 9, 2016 8:00 AM EST
Our Skylake NUC review had a brief section on the performance of the storage subsystem. The comments section raised a few questions about the inability of SSDs such as the Samsung SSD 950 PRO to achieve maximum performance in the NUC. After some discussion with Intel, we discovered some interesting aspects in the design of Skylake-U systems that have a bearing on the performance of some M.2 PCIe SSDs. These can affect the consumer's choice of SSDs for a Skylake-U system, be it a NUC or a user-upgradeable notebook.
Intel has a wide range of CPUs based on the Skylake microarchitecture. These target a variety of markets ranging from tablets / 2-in-1s and Compute Sticks to the traditional tower desktops. The same microarchitecture is able to serve different markets because of the scalable nature of the TDP / power envelope (from 4.5W to 91W).
While the high-performance H-, S- and K- CPUs need a separate Intel 100 Series platform controller hub (Sunrise Point PCH), the Skylake-U and Skylake-Y are Multi-Chip Packages (MCP) that have the Sunrise Point-LP PCH die integrated with the CPU in a single package.
The communication between the CPU and the PCH in the H-, S- and K- systems is via the Direct Media Interface (DMI 3.0), a proprietary link protocol developed by Intel. Skylake-U/-Y series processors, on the other hand, have an On Package DMI interconnect interface, termed OPI. Unlike DMI 3.0, the OPI in Skylake-U/-Y can be configured to meet the desired power or performance needs of a mobile system design. The following table summarizes the differences between DMI and the two configurable OPI options in Skylake systems.
|Skylake CPU - PCH(-LP) Communication Link Characteristics|
|Aspect||DMI 3.0||OPI GT2||OPI GT4|
|Transfer Rate per Lane||8 GT/s||2 GT/s||4 GT/s|
|Max. Theoretical Bandwidth||3.94 GBps||2 GBps||4 GBps|
For all practical purposes, DMI 3.0 and PCIe 3.0 are equivalent, and this is important when a PCIe 3.0 x4 SSD is connected to a Skylake-H/-S/-K system using PCIe lanes from the PCH. Any other peripheral communicating with the CPU at the same time as the PCIe SSD would end up creating a bottleneck at the CPU-PCH link. On the other hand, Skylake-U/-Y systems that have a PCIe 3.0 x4 SSD connected to the Sunrise Point-LP PCIe lanes will be directly impacted by the configuration of the OPI. The GT4 configuration should have enough bandwidth to get full performance from a PCIe 3.0 x4 SSD, but a GT2 configuration could end up throttling such a device.
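The bandwidth figures in the table can be reproduced with a little arithmetic. The sketch below is illustrative only. It assumes DMI 3.0 behaves like a four-lane PCIe 3.0 link with 128b/130b encoding, and it takes the OPI numbers directly from the table, since the OPI link width and encoding are not given here. The function and variable names are hypothetical.

```python
# Raw signalling rate per lane, in GT/s, for each PCIe generation.
TRANSFER_RATE_GT = {1: 2.5, 2: 5.0, 3: 8.0}

# Line-encoding efficiency: payload bits per transmitted bit.
# PCIe 1.x/2.0 use 8b/10b, PCIe 3.0 uses 128b/130b.
ENCODING = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}


def pcie_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    return TRANSFER_RATE_GT[gen] * ENCODING[gen] * lanes / 8


# OPI bandwidths copied from the table above (GBps); not derived here.
OPI_GBYTES = {"GT2": 2.0, "GT4": 4.0}


def opi_throttles(mode: str, gen: int = 3, lanes: int = 4) -> bool:
    """True if the OPI mode offers less bandwidth than the SSD's PCIe link."""
    return OPI_GBYTES[mode] < pcie_bandwidth_gbytes(gen, lanes)


if __name__ == "__main__":
    print(f"PCIe 3.0 x4 / DMI 3.0: {pcie_bandwidth_gbytes(3, 4):.2f} GB/s")
    for mode in ("GT2", "GT4"):
        verdict = "throttles" if opi_throttles(mode) else "does not throttle"
        print(f"OPI {mode} {verdict} a PCIe 3.0 x4 SSD")
```

Under these assumptions, a PCIe 3.0 x4 link works out to roughly 3.94 GB/s, which sits above the GT2 figure and just below the GT4 figure.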
Analyzing the Skylake NUC6i5SYK Storage Subsystem
In order to determine whether the Skylake NUC6i5SYK is affected by the OPI capabilities, it is essential to understand the board design and the way each of the peripheral ports connects to the CPU.
The above block diagram should be considered in conjunction with the Skylake PCH-LP high-speed I/O (HSIO) configuration options depicted below. One of the x4 links multiplexed with a SATA lane is used for the M.2 22x42/80 SSD slot. One of the PCIe lanes that gets multiplexed with GbE is connected to the Intel I-219V Ethernet Adapter, and yet another PCIe lane is used for the WLAN adapter. The important aspect to note here is that any M.2 SSD can have full PCIe 3.0 x4 connectivity to the Sunrise Point-LP PCH.
Intel's current technical documentation (PDF) for the Skylake NUC board mentions that the maximum possible performance for any M.2 SSD is around 1600 MBps. The Samsung SSD 950 PRO and SM951 PCIe 3.0 x4 NVMe SSDs claim performance numbers in excess of 2000 MBps. This obviously means that there is a bottleneck between the Skylake CPU and the Sunrise Point-LP.
Intel's Skylake-U/-Y reference designs are optimized for lower power and default the OPI to GT2 rates. In the development of the NUC6i5SY product family, the Intel team used the reference designs and their default settings, including the GT2 rate for the OPI. Therefore, PCIe 3.0 x4 SSDs connected to the M.2 port of the NUC6i5SYK (BIOS v0042) are effectively limited to PCIe 2.0 x4 rates. This throttling makes sense for battery-operated devices like 2-in-1s, but not so much for UCFF desktops like the NUCs.
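One way to picture the throttling is to treat the path from the SSD to the CPU as a chain whose sequential throughput cannot exceed its slowest hop. The sketch below is a simplified model, not a measurement. The 2200 MBps drive rating is a hypothetical value chosen only to stand in for "in excess of 2000 MBps", the OPI figures are the theoretical ones from the earlier table, and the function name is made up for illustration.

```python
# Theoretical link ceilings in MB/s (one direction).
PCIE3_X4_MBPS = 8000 * 4 * (128 / 130) / 8   # ~3938, PCIe 3.0 x4 M.2 slot
PCIE2_X4_MBPS = 5000 * 4 * (8 / 10) / 8      # 2000, PCIe 2.0 x4 for comparison
OPI_MBPS = {"GT2": 2000.0, "GT4": 4000.0}    # from the DMI/OPI table


def sequential_ceiling_mbps(ssd_rated_mbps: float, opi_mode: str) -> float:
    """Upper bound on sequential throughput: the slowest hop in the chain."""
    return min(ssd_rated_mbps, PCIE3_X4_MBPS, OPI_MBPS[opi_mode])


if __name__ == "__main__":
    hypothetical_ssd = 2200  # MB/s, assumed rating for illustration only
    for mode in ("GT2", "GT4"):
        ceiling = sequential_ceiling_mbps(hypothetical_ssd, mode)
        print(f"OPI {mode}: ceiling {ceiling:.0f} MB/s")
```

In this model the GT2 ceiling equals the theoretical PCIe 2.0 x4 rate of 2000 MBps, while GT4 leaves the drive itself as the limit. The roughly 1600 MBps quoted in Intel's documentation is lower than the theoretical GT2 ceiling, presumably because of protocol overhead that this model ignores.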
After we brought this to Intel's attention, the development team decided to complete the necessary changes and validation to support the maximum PCIe 3.0 performance. Intel sent over a development BIOS (v1142) that turned on the higher performance OPI GT4 rate. This BIOS is scheduled to be made public before the end of May 2016 (after completion of internal validation).
Evaluating the NUC6i5SYK Storage Subsystem
The rest of this review deals with two major aspects: a quantitative measurement of the effectiveness of different types of SSDs in the Skylake NUC, and an evaluation of the improvements resulting from ramping up the OPI to GT4 rates (i.e., a comparison of the performance using BIOS v0042 and BIOS v1142). In order to do this, we processed various benchmarks while keeping everything other than the M.2 SSD and the BIOS version constant.
|Intel NUC6i5SYK Benchmarked Configuration|
|Processor||Intel Core i5-6260U; Skylake, 2C/4T, 1.8 GHz (Turbo to 2.9 GHz), 14nm, 4MB L3, 15W TDP|
|Memory||Corsair CMSX16GX4M2A2400C16 DDR4; 15-15-15-35 @ 2133 MHz|
|Graphics||Intel Iris Graphics 540 (Skylake-U GT3e)|
|Disk Drive(s)||Various M.2 SSDs|
|Operating System||Windows 10 Pro x64|
|Full Specifications||Intel NUC6i5SYK Specifications|
The various benchmarks presented in the next few sections were all processed with the M.2 SSD as the primary drive. The drive was initialized with two partitions. The primary OS partition was set to be 120GB in size, while the remaining space was allocated to the secondary partition. Both of the partitions were formatted in NTFS with default settings.
In the next section, we will first take a look at the specifications of the four M.2 SSDs that were evaluated in the NUC6i5SYK, along with CrystalDiskMark scores for each in both the BIOS versions. Following this, we move on to real-world benchmarks - SYSmark 2014, PCMark 8 Storage Bench and a slightly tweaked AnandTech DAS Suite. Prior to our concluding remarks, we take a look at a few miscellaneous aspects - power consumption, thermal characteristics and pricing.
sorcio46 - Monday, May 9, 2016
Is there a reason why these flash SSDs have a lower 4K read speed compared with 4K write?
James5mith - Monday, May 9, 2016
My guess is writes are buffered vs. reads straight from the raw NAND. But I have no idea if it's actually true.
hojnikb - Monday, May 9, 2016
More or less this.
rossjudson - Thursday, May 12, 2016
Flash drives use variations on log-structured storage. The basic thing going on is that the *logical* block numbers being written (which are random) are not the same as the *physical* blocks being written. Drives create 0-N append points, and all those random writes end up becoming sequential writes to pages. At the high end, your write rate can get limited by the page erase rate, which basically translates to an energy/thermal issue (it takes a fair bit of power to erase flash memory pages). The best high end drives can sustain very high mixed read/write rates -- and the key is "sustain" -- for hours/days. Lots of drives out there can handle a short burst of activity for a few tens of seconds, caching everything in RAM on the hardware until the RAM runs out.
Random reads are tougher, because you actually have to go to a random storage block and pull the data. Sequential reads admit lookahead, but random reads don't.
Kristian Vättö - Monday, May 9, 2016
Small writes are cached to DRAM for write combining i.e. multiple IOs are written to NAND at once as the IOs are smaller than the page size. Once the IOs hit the DRAM cache, they are considered complete, hence the higher speed.
dzezik - Thursday, May 12, 2016
SF-2281 has no DRAM. this is not a DRAM cache. the old SandForce 2281 was designed for SLC. the performance with MLC is mediocre
bug77 - Monday, May 9, 2016
And I'm going to make this point again: if even when using NVMe, your random reads are still limited at ~50MB/s, you're only missing on sequential transfers if you stick with AHCI and SATA. Because right now, the bottleneck is elsewhere.
Also, for Skylake-U (mobile), SATA offers lower standby power.
Kristian Vättö - Monday, May 9, 2016
50MB/s is a ~50% upgrade over ~30MB/s that SATA offers. It's not even close to what HDD to SSD offers, of course, but we will have to wait for next generation memory for the next huge upgrade.
bug77 - Monday, May 9, 2016
SATA can do better than ~30MB/s (not sure whether Skylake-U limits the performance in any way, however).
NVMe/PCIe still makes sense, because the price premium is not that large. But I'd like to see more reviews highlighting that if you need to save ~$20, going AHCI/SATA is a better option than getting a smaller drive.
vladx - Monday, May 9, 2016
Price premium is not large? LMAO it's almost double over SATA ones.