
The Native MSFT Stack: S2D & RDS – Part 3

Part 1: Intro and Architecture

Part 2: Prep & Configuration

Part 3: Performance & Troubleshooting (you are here)

Make sure to check out my series on RDS in Server 2016 for more detailed information on designing and deploying RDSH or RDVH.

 

Performance

This section will give an idea of disk activity and performance given this specific configuration as it relates to S2D and RDS.

Here is the 3-way mirror in action during collection provisioning: three data copies, one per host, with all hosts active, as expected:

 

Real-time disk and RDMA stats during collection provisioning:

 

RDS provisioning isn't the speediest solution in the world by default, as it creates VMs one by one. But it's fairly predictable at roughly 1 VM per minute, so plan accordingly.

 

Optionally, you can adjust the concurrency level to control how many VMs RDS creates in parallel using the Set-RDVirtualDesktopConcurrency cmdlet. Server 2012 R2 supported a max of 20 concurrent operations, but make sure the infrastructure can handle whatever you set here.

Set-RDVirtualDesktopConcurrency -ConnectionBroker "RDCB name" -ConcurrencyFactor 20
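To check the current setting on the broker before changing it, the matching Get cmdlet should work (the broker name is a placeholder):

Get-RDVirtualDesktopConcurrency -ConnectionBroker "RDCB name"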

 

To illustrate the capacity impact of 3-way mirroring, take my lab configuration: the 3-node cluster has 14TB of total capacity, each node contributing 4.6TB. I have 3 fixed-provisioned volumes totaling 4TB, with just under 2TB of raw capacity remaining. The math works out to roughly 30% of raw capacity being usable, as expected for a 3-way mirror. The efficiency story improves at larger scales, starting at 4 nodes where parity resiliency becomes an option, so keep this in mind when building your deployment.
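If you want to pull these capacity numbers yourself, something like the following should work against the default pool name (S2D*); the calculated properties are just there to convert bytes to TB:

Get-StoragePool S2D* | Select-Object FriendlyName,
    @{n='CapacityTB';e={[math]::Round($_.Size/1TB,2)}},
    @{n='AllocatedTB';e={[math]::Round($_.AllocatedSize/1TB,2)}}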

 

Here is the disk performance view from one of the Win10 VMs within the pool (VDI-23) running Crystal DiskMark, which runs Diskspd commands behind the scenes. Read IOPS are respectable at an observed max of 25K during the sequential test.


 

Write IOPS are also respectable at a max observed of nearly 18K during the Sequential test.


 

Writing to the same volume and running the same benchmark, but from the host itself, I saw some wildly differing results: similar bandwidth numbers but a poorer showing on most tests for some reason. One run yielded some staggering numbers on the random 4K tests which I was unable to reproduce reliably.


Running DiskSpd directly tells a slightly different story, with inputs more representative of a real-world VDI workload: write intensive. Given this punishing workload, we see impressive disk performance numbers and almost zero latency. The following results were generated from a run against a 500MB file, 4K blocks weighted at 70% random writes, using 4 threads with caching enabled.

Diskspd.exe -b4k -d60 -L -o2 -t4 -r -w70 -c500M c:\ClusterStorage\Volume1\RDSH\io.dat


 

Just for comparison, here is another run with cache disabled. As expected, not so impressive.
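For reference, DiskSpd disables software caching and hardware write caching with the -Sh flag, so a cache-disabled version of the same run would look something like this:

Diskspd.exe -b4k -d60 -L -o2 -t4 -r -w70 -Sh -c500M c:\ClusterStorage\Volume1\RDSH\io.dat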


 

Troubleshooting

Here are a few things to watch out for along the way.

When deploying your desktop collection, if you get an error about a node not having a virtual switch configured:

 

One possible cause is that the default SETSwitch was not configured for RDMA on that particular host. Confirm and then enable per the steps above using Enable-NetAdapterRDMA.
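A quick way to check and remediate from PowerShell (the vNIC names below match the ones created in Part 2; adjust to your own naming):

Get-NetAdapterRdma | Format-Table Name, Enabled

Enable-NetAdapterRDMA "vEthernet (SMB_1)", "vEthernet (SMB_2)"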

 

On the topic of VM creation within the cluster, here is something kooky to watch out for. If you're working locally on one of your cluster nodes and you try to add a new VM to be hosted on a different node, you may have trouble when you get to the OS install portion, where the UI offers to attach an ISO to boot from. If you point to an ISO sitting on a network share, you may get the following error: failing to add the DVD device due to lack of permissions to open attachment.

 

You will also be denied if you try to access any of the local paths such as desktop or downloads.

 

The reason for this is that the dialog is using the remote file browser on the server where the VM is being created. Counter-intuitive perhaps, but the workaround is to either log directly into the host where the VM is to be created or copy your ISO local to that host.
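For example, staging the ISO on the target node before launching the wizard sidesteps the issue entirely; the source share and destination volume below are purely illustrative:

# Example paths only - substitute your own share and CSV/local folder
Copy-Item \\fileserver\iso\WindowsServer2016.iso -Destination C:\ClusterStorage\Volume1\ISO\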

 

Watch this space, more to come…

 

Part 1: Intro and Architecture

Part 2: Prep & Configuration

Part 3: Performance & Troubleshooting (you are here)

 

Resources

S2D in Server 2016

S2D Overview

Working with volumes in S2D

Cache in S2D

The Native MSFT Stack: S2D & RDS – Part 2

Part 1: Intro and Architecture

Part 2: Prep & Configuration (you are here)

Part 3: Performance & Troubleshooting

Make sure to check out my series on RDS in Server 2016 for more detailed information on designing and deploying RDSH or RDVH.

 

Prep

The number 1 rule of clustering with Microsoft is homogeneity: all nodes within a cluster must have identical hardware and patch levels. This is very important. The first step is to check all disks installed to participate in the storage pool, bring them all online and initialize them. This can be done via PowerShell or the UI.
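Bringing everything online from PowerShell is a one-liner; this sketch assumes the only offline disks are the ones destined for the pool:

Get-Disk | ? IsOffline -eq $true | Set-Disk -IsOffline:$false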

Once initialized, confirm that all disks can pool:

Get-PhysicalDisk -CanPool $true | sort model

 

Install Hyper-V and Failover Clustering on all nodes:

Install-WindowsFeature -name Hyper-V, Failover-Clustering -IncludeManagementTools -ComputerName InsertName -Restart

 

Run cluster validation; note this command does not exist until the Failover Clustering feature has been installed. Replace the node names with your own. Hardware configuration and patch level should be identical between nodes or you will be warned in this report.

Test-Cluster -Node node1, node2, etc -include "Storage Spaces Direct", Inventory, Network, "System Configuration"

 

The report will be stored in c:\users\<username>\AppData\Local\Temp

Networking & Cluster Configuration

We'll be running the new Switch Embedded Teaming (SET) vSwitch feature, introduced for Hyper-V in Server 2016.

 

Repeat the following steps on all nodes in your cluster. First, I recommend renaming the interfaces you plan to use for S2D to something easy and meaningful. I'm running Mellanox cards, so I called mine Mell1 and Mell2. There should be no NIC teaming applied at this point!
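Renaming is a quick one-liner per interface; the original adapter name below is just an example:

# "Ethernet 3" is an example name - check Get-NetAdapter for yours
Rename-NetAdapter -Name "Ethernet 3" -NewName "Mell1"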

 

Configure the QoS policy. First enable the Data Center Bridging (DCB) feature, which will allow us to prioritize certain services that traverse the Ethernet.

Install-WindowsFeature -Name Data-Center-Bridging

 

Create a new policy for SMB Direct with a priority value of 3 which marks this service as “critical” per the 802.1p standard:

New-NetQosPolicy "SMBDirect" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

 

Allocate at least 30% of the available bandwidth to SMB for the S2D solution:

New-NetQosTrafficClass "SMBDirect" -Priority 3 -BandwidthPercentage 30 -Algorithm ETS

 

Create the SET switch using the adapter names you specified previously, substituting your own names or naming standards. Note that this SETSwitch is ultimately a vNIC and can receive an IP address itself:

New-VMSwitch -Name SETSwitch -NetAdapterName "Mell1","Mell2" -EnableEmbeddedTeaming $true
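To confirm the team formed as expected, Get-VMSwitchTeam should list both physical adapters as members:

Get-VMSwitchTeam -Name SETSwitch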

 

Add host vNICs to the SET switch you just created; these will be used by the management OS. Substitute your own names or naming standards. Assign static IPs to these vNICs as required, as these interfaces are where RDMA will be enabled.

Add-VMNetworkAdapter -SwitchName SETSwitch -Name SMB_1 -ManagementOS

Add-VMNetworkAdapter -SwitchName SETSwitch -Name SMB_2 -ManagementOS

 

Once created, the new virtual interfaces will be visible in the network adapter list by running Get-NetAdapter.


 

Optional: configure VLANs for the new vNICs. They can be the same or different VLANs, but IP them uniquely. If you don't intend to tag VLANs in your cluster, or have a flat network with one subnet, skip this step.

Set-VMNetworkAdapterVlan -VMNetworkAdapterName "SMB_1" -VlanId 00 -Access -ManagementOS

Set-VMNetworkAdapterVlan -VMNetworkAdapterName "SMB_2" -VlanId 00 -Access -ManagementOS

 

Verify your VLANs and vNICs are correct. Notice mine are all untagged since this demo is in a flat network.

Get-VMNetworkAdapterVlan -ManagementOS


 

Restart each vNIC to activate the VLAN assignment.

Restart-NetAdapter "vEthernet (SMB_1)"

Restart-NetAdapter "vEthernet (SMB_2)"

 

Enable RDMA on each vNIC.

Enable-NetAdapterRDMA "vEthernet (SMB_1)", "vEthernet (SMB_2)"

 

Next, each vNIC should be tied directly to a preferential physical interface within the SET switch. In this example we have 2 vNICs and 2 physical NICs for a 1:1 mapping. Important to note: although this operation designates a preferential pNIC for each vNIC, should the assigned pNIC fail, the SET switch will still load balance that vNIC's traffic across the surviving pNICs. It may not be immediately obvious that this is the resulting and expected behavior.

Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName "SMB_1" -ManagementOS -PhysicalNetAdapterName "Mell1"

Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName "SMB_2" -ManagementOS -PhysicalNetAdapterName "Mell2"

 

To quickly prove this, I have my vNIC “SMB_1” preferentially tied to the pNIC “Mell1”. SMB_1 has the IP address 10.50.88.82


 

Notice that even though I have manually disabled Mell1, the IP still responds to a ping from another host as SMB_1’s traffic is temporarily traversing Mell2:


 

Verify the RDMA capabilities of the new vNICs and associated physical interfaces. RDMA Capable should read true.

Get-SMBClientNetworkInterface


 

Build the cluster with a dedicated IP address and subnet mask in slash notation. The command defaults to /24, but it still might fail unless the mask is explicitly specified.

New-Cluster -name "Cluster Name" -Node Node1, Node2, etc -StaticAddress 0.0.0.0/24

 

Checking the report output stored in c:\windows\cluster\reports\, validation flagged the lack of a suitable disk witness. This will auto-resolve later once the cluster is up.

 

The cluster will come up with no disks claimed, and if there is any prior formatting they must first be wiped and prepared. In PowerShell ISE, run the following script, substituting your own cluster name for S2DCluster.

icm (Get-Cluster -Name S2DCluster | Get-ClusterNode) {
    Update-StorageProviderCache
    Get-StoragePool | ? IsPrimordial -eq $false | Set-StoragePool -IsReadOnly:$false -ErrorAction SilentlyContinue
    Get-StoragePool | ? IsPrimordial -eq $false | Get-VirtualDisk | Remove-VirtualDisk -Confirm:$false -ErrorAction SilentlyContinue
    Get-StoragePool | ? IsPrimordial -eq $false | Remove-StoragePool -Confirm:$false -ErrorAction SilentlyContinue
    Get-PhysicalDisk | Reset-PhysicalDisk -ErrorAction SilentlyContinue
    Get-Disk | ? Number -ne $null | ? IsBoot -ne $true | ? IsSystem -ne $true | ? PartitionStyle -ne RAW | % {
        $_ | Set-Disk -IsOffline:$false
        $_ | Set-Disk -IsReadOnly:$false
        $_ | Clear-Disk -RemoveData -RemoveOEM -Confirm:$false
        $_ | Set-Disk -IsReadOnly:$true
        $_ | Set-Disk -IsOffline:$true
    }
    Get-Disk | ? Number -ne $null | ? IsBoot -ne $true | ? IsSystem -ne $true | ? PartitionStyle -eq RAW | Group -NoElement -Property FriendlyName
} | Sort -Property PsComputerName, Count

Once successfully completed, you will see an output with all nodes and all disk types accounted for.

 

Finally, it's time to enable S2D. Run the following command, substituting your own cluster name, and select "yes to all" when prompted.

Enable-ClusterStorageSpacesDirect -CimSession S2DCluster

 

Storage Configuration

Once S2D is successfully enabled, there will be a new storage pool created and visible within Failover Cluster Manager. The next step is to create volumes within this pool. S2D will make some resiliency choices for you depending on how many nodes are in your cluster: 2 nodes = 2-way mirroring, 3 nodes = 3-way mirroring; with 4 or more nodes you can specify mirror or parity. When using a hybrid configuration of 2 disk types (HDD and SSD), the volumes reside on the HDDs, as the SSDs simply provide caching for reads and writes. In an all-flash configuration only the writes are cached.

Cache drive bindings are automatic and will adjust based on the number of each disk type in place. In my case, I have 4 SSDs + 5 HDDs per host. This nets a 1:1 cache:capacity mapping for three of the SSDs and a 1:2 mapping for the remaining one. Microsoft's recommendation is to make the number of capacity drives a multiple of the number of cache drives, for simple symmetry. If a host experiences a cache drive failure, the cache-to-capacity mapping will readjust to heal, which is why a minimum of 2 cache drives per host is recommended.
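To see how S2D claimed and bound the disks, grouping the cluster's physical disks by media type and usage gives a quick sanity check (one approach among several):

Get-StorageSubSystem Cluster* | Get-PhysicalDisk | Group-Object MediaType, Usage -NoElement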

 

Volumes can be created using PowerShell or Failover Cluster Manager by selecting the “new disk” option. This one simple PowerShell command does three things: creates the virtual disk, places a new volume on it and makes it a cluster shared volume. For PowerShell the syntax is as follows:

New-Volume -FriendlyName "Volume Name" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -Size xTB

 

CSVFS_ReFS is recommended but CSVFS_NTFS may also be used. Once created these disks will be visible under the Disks selection within Failover Cluster Manager. Disk creation within the GUI is a much longer process but gives meaningful information along the way, such as showing that these disks are created using the capacity media (HDD) and that the 3-server default 3-way mirror is being used.

 

The UI also shows us the remaining pool capacity and that storage tiering is enabled.

 

Once the virtual disk is created, we next need to create a volume within it using another wizard. Select the vDisk created in the last step:

 

Storage tiering dictates that the volume size must match the size of the vDisk:

 

Skip the assignment of a drive letter, assign a label if you desire, confirm and create.

 

 

The final step is to add this new disk to a Cluster Shared Volume via Failover Cluster Manager. You’ll notice that the owner node will be automatically assigned:

 

Repeat this process to create as many volumes as required.

 

RDS Deployment

Install the RDS roles and features required and build a collection. See this post for more information on RDS installation and configuration. There is nothing terribly special that changes this process for an S2D cluster. As far as RDS is concerned, this is an ordinary Failover Cluster with Cluster Shared Volumes. The fact that RDS is running on S2D and "HCI" is truly inconsequential.
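If you prefer PowerShell to the Server Manager wizard for the initial deployment, New-RDVirtualDesktopDeployment stands up the broker, web access and virtualization host roles in one shot; the server names below are placeholders:

# Server names are placeholders - use your own FQDNs
New-RDVirtualDesktopDeployment -ConnectionBroker "RDCB name" -WebAccessServer "RDWA name" -VirtualizationHost "Node1","Node2","Node3"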

As long as the RDVH or RDSH roles are properly installed and enabled within the RDS deployment, the RDCB will be able to deploy and broker connections to them. One of the most important configuration items is ensuring that all servers, virtual and physical, that participate in your RDS deployment are listed in the Servers tab. RDS is reliant on Server Manager, and Server Manager has to know who the players are; this is how you tell it.


 

To use the S2D volumes for RDS, point your VM deployments at the proper CSVs within the C:\ClusterStorage paths. When building your collection, make sure to have the target folder already created within the CSV or collection creation will fail. Per the example below, the folder “Pooled1” needed to be manually created prior to collection creation.
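Creating that folder ahead of time is trivial; the CSV path below is illustrative and should match the volume you intend to use:

New-Item -ItemType Directory -Path C:\ClusterStorage\Volume1\Pooled1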

 

During provisioning, you can select a specific balance of VMs to be deployed to the RDVH-enabled hosts you select, as can be seen below.

 

RDS is cluster aware in that the VMs it creates can be made highly available within the cluster, and the RD Connection Broker (RDCB) remains aware of which host is running which VMs should they move. In the event of a failure, VMs will be moved to a surviving host by the cluster service and the RDCB will keep track.

 

Part 1: Intro and Architecture

Part 2: Prep & Configuration (you are here)

Part 3: Performance & Troubleshooting

 

Resources

S2D in Server 2016

S2D Overview

Working with volumes in S2D

Cache in S2D

The Native MSFT Stack: S2D & RDS – Part 1

Part 1: Intro and Architecture (you are here)

Part 2: Prep & Configuration

Part 3: Performance & Troubleshooting

Make sure to check out my series on RDS in Server 2016 for more detailed information on designing and deploying RDSH or RDVH.

 

 

For Microsoft purists looking to deploy VDI or another workload on the native Microsoft stack, this is the solution for you! Storage Spaces Direct (S2D) is Microsoft's new kernel-mode software-defined storage model exclusive to Server 2016. In the last incarnation of Storage Spaces on Server 2012 R2, JBODs were required, with SAS cables running to each participating node that could then host SMB namespaces. No longer! Server 2016 Datacenter Edition, local disks and Ethernet (10Gb + RDMA recommended) are all you need now. The setup is executed almost exclusively in PowerShell, which can of course be scripted. Microsoft has positioned this, their first true HCI stack, at the smaller end of the market with a 16-node maximum scale. The S2D architecture runs natively on a minimum of 2 nodes configured in a Failover Cluster with 2-way mirroring.

NVMe, SSD or HDDs installed in each host are contributed to a 2-tier shared storage architecture using clustered mirror or parity resiliency models. S2D supports hybrid or all-flash models, with usable disk capacity calculated using only the capacity tier. SSD or NVMe can be used for caching and SSD or HDD for capacity, or a mix of all three can be used with NVMe providing cache and SSD and HDD providing capacity. Each disk tier is extensible and can be modified up or down at any time. Dynamic cache bindings ensure that the proper ratio of cache to capacity disks remains consistent for any configuration, regardless of whether cache or capacity disks are added or removed. The same is true in the case of drive failure, in which case S2D self-heals to ensure a proper balance. Similar to VMware vSAN, the storage architecture changes a bit between hybrid and all-flash: in the S2D hybrid model, the cache tier is used for both writes and reads; in the all-flash model, the cache tier is used only for writes. Cluster Shared Volumes are created within the S2D storage pool and shared across all nodes in the cluster. Consumed capacity in the cluster is determined by provisioned volume sizes, not actual storage usage within a given volume.

 

A new Switch Embedded Teaming (SET) model brings NIC teaming integration to Hyper-V in a way that provides easy pNIC service protection and vNIC scale-out for distributed management architectures. By contrast, in 2012 R2 the teaming mechanism exists at the Windows OS layer with a few load balancing modes included. For Hyper-V in 2016, SET is the model you want to use. Important to note that this should not be combined with any other NIC teaming option; you need to use one or the other. vNICs are created to connect VMs to the SET switch or are used by the management OS for clustering or other services. If you want to use different VLANs for specific services, a vNIC per VLAN will be required.

 

NTFS is available, but ReFS is the recommended file system for most implementations, though it does not currently support deduplication. Storage capacity efficiency is directly related to the resiliency model you choose to implement for your volumes. Mirroring does as it suggests, creating 2 or 3 copies of data on discrete hosts. Parity, also known as erasure coding, is similar to a RAID model which uses less storage for resiliency but at the cost of disk performance. New to Server 2016 is a mixed parity mode that allows a volume to exist as part mirror and part parity; writes first go to the mirror portion and are later de-staged to the parity portion. The desired level of fault tolerance affects the minimum number of nodes required for a given configuration, as outlined in the table below.

 

Resiliency   | Fault Tolerance | Storage Efficiency | Minimum # hosts
2-way mirror | 1               | 50%                | 2
3-way mirror | 2               | 33%                | 3
Dual parity  | 2               | 50-80%             | 4
Mixed        | 2               | 33-80%             | 4

 

Much more information on S2D resiliency and examples here. As a workload, RDS layers on top of S2D as it would with any other Failover Cluster configuration. Check out my previous series for a deeper dive into the native RDS architecture.

 

Lab Environment

For this exercise I’m running a 3-node cluster on Dell R630’s with the following specs per node:

  • Dual E5-2698v4
  • 256GB RAM
  • HBA330
  • 4 x 400GB SAS SSD
  • 5 x 1TB SAS HDD
  • Dual Mellanox ConnectX-3 NICs, RDMA over Converged Ethernet (RoCE)
  • Windows 2016 Datacenter (1607)

 

The basic high-level architecture of this deployment:

 

 

 

Part 1: Intro and Architecture (you are here)

Part 2: Prep & Configuration

Part 3: Performance & Troubleshooting

 

Resources

S2D in Server 2016

S2D Overview

Working with volumes in S2D

Cache in S2D

New PC Build for 2017

Credit: BitFenix

I can't believe it's time already, 2 years have flown by! I've managed to stick to a 2-year upgrade cadence for the last 8 years, performing meaningful upgrades at each interval. This year I debated skipping, but I pressed forward as there are a few key technologies I want to enable: Pascal GPU, M.2 NVMe, DDR4 RAM and the Z270 chipset. 2 years is also a great time frame for reasonable cost reclamation on my used parts. The new Kaby Lake CPU doesn't look to be a massive jump from the Haswell chip I ran for the past 2 years (4790K), and an even smaller step from Skylake. In case you want to check out my previous builds, they can be found here and here.

I plan to reuse my BitFenix Phenom M Micro-ATX case which I absolutely love, my amazing Noctua NH-U12S CPU cooler + case fans and my Silverstone ST55F PSU, but those will be the only holdovers.

 

 

Motherboard

Micro-ATX is still the smallest form factor that works for me given what I want to achieve, and I endeavor to go as small as possible. The Asus ROG Strix Z270i came very close to pushing me into Mini-ITX. For this build I need 2 x M.2 slots, and I like the idea of having a 2nd PCIe slot should I want to do SLI at some point. Two DIMM slots would be doable for the 32GB of RAM I need but, again, would leave no room for expansion while reusing my existing investment. So I stayed with mATX and went with the Asus Prime Z270M-Plus, which suits my needs without a lot of extra pizazz. There are a number of considerations in the Asus mobo lineup as well as in the Intel chipsets. The graphic below highlights the key features of my new board. Most notably: RAM clock support up to 3866MHz, a USB Type-C connector in the IO panel, dual x4 M.2 slots positioned optimally below the PCIe slots, Intel Gb NICs and SafeSlot Core, which reinforces the PCIe slot to hold those heavy GPUs.

 

Credit: Asus

 

Asus put together a great blog post that details all their current motherboard product lines and how they are differentiated. In short: the Prime line is the all-around mainstream offer without a ton of extravagances, TUF is the military spec with drop resistance, Strix is geared at the gaming crowd with 5 levels (low to high) and support for higher RAM clocks, and ROG is aimed at people who consider themselves enthusiasts, with extra lighting support, overclocking options and the highest available RAM clocks. Having bought ROG boards in the past and not used a fraction of the extra stuff they come with, I took a serious look at exactly which features I need in a motherboard. Prime suits my needs just fine and the saved $ is welcomed.

 

Chipset

The 270 is the new chipset for the 7th-gen Intel processors and comes in "Z" or "H" variations: Z, for consumers, includes features like overclocking, while H is geared more toward corporate builds with fewer features. New in the 200-series chipset is support for Intel's Optane NVMe and dual M.2 slots with dedicated PCIe lanes. Both Skylake and Kaby Lake CPUs use the same 1151 socket and are interchangeable on either 100- or 200-series boards. Below is a PCH feature comparison of the 4 most relevant configurations at the moment. You'll notice the biggest difference between the 170 and 270 is more I/O and PCIe lanes. Also, on the 170 chipset an M.2 device costs 2 x SATA ports since these share PCIe lanes (same on the Z97 boards); this tax has been removed on the 270.

 

Chipset | Intel Z270 | Intel H270 | Intel Z170 | Intel H170
SKU Focus Segment | Consumer | Consumer / Corporate | Consumer | Consumer / Corporate
CPU Support | Kaby Lake-S / Skylake-S | Kaby Lake-S / Skylake-S | Kaby Lake-S / Skylake-S | Kaby Lake-S / Skylake-S
CPU PCI-e Configuration | 1 x 16 or 2 x 8 or 1 x 8 + 2 x 4 | 1 x 16 | 1 x 16 or 2 x 8 or 1 x 8 + 2 x 4 | 1 x 16
Independent DisplayPort | 3 | 3 | 3 | 3
Memory DIMMs | 4 | 4 | 4 | 4
Overclocking | Yes | No | Yes | No
Intel SmartSound Technology | Yes | Yes | Yes | Yes
Intel Optane Technology | Yes | Yes | No | No
Intel Rapid Storage Technology | 15 | 15 | 14 | 14
Intel Rapid Storage Technology From PCIe Storage Drive Support | Yes | Yes | Yes | Yes
RAID 0,1,5,10 | Yes | Yes | Yes | Yes
Intel Smart Response Technology | Yes | Yes | Yes | Yes
I/O Port Flexibility | Yes | Yes | Yes | Yes
Maximum High Speed I/O Lanes | 30 | 30 | 26 | 22
Total USB Ports (Max USB 3.0) | 14 (10) | 14 (8) | 14 (10) | 14 (8)
Max SATA 6 Gbps Ports | 6 | 6 | 6 | 6
Max PCI-E 3.0 Lanes | 24 | 20 | 20 | 16
Max Intel RST for PCIe Storage Ports (x2 M.2 or x4 M.2) | 3 | 2 | 3 | 2

 

CPU

Intel is making it less difficult to choose a desktop CPU these days, with far fewer overall options if you want a higher-end unlocked CPU under $500 and aren't upgrading from a recent gen. If you want the best 8-core desktop CPU Intel has to offer and have no budget, look no further than the i7-6900K. If you're upgrading from a 4th or earlier gen i7 the choice is simple: i7-7700K (14nm). If you have a Skylake CPU already (i7-6700K, also 14nm) there is no need to upgrade, as any performance gain will be minimal. If you're like me, coming from Haswell (i7-4790K), the performance gain will be at best in the 10% range and may not be worth it unless, also like me, you want to upgrade to enable other specific features. It's odd to put this in print, but for this build the CPU is actually the least exciting thing I'm adding! It also appears that this trend will continue for the next few generations of Intel CPUs, so I might not be upgrading again in 2 years. Between the 6th and 7th generation CPUs mentioned above, the only improvements in Kaby Lake are +200MHz base clock (times 4 cores), +300MHz to Turbo, increased DDR4 speeds, and a new embedded graphics generation; same socket, same 91W TDP. There just isn't much here to warrant an upgrade from 6th to 7th gen, sadly.

 

Another alternative, if you're tired of the stale and uninspiring Intel lineup, is the AMD Ryzen 7 coming in a few weeks. Early benchmarks show Ryzen beating the pants off the highest-end Intel chips. Promising to ship for a fraction of the cost ($329) and boasting 8 cores with a higher base clock but lower TDP, Ryzen should get serious consideration. This will ultimately be a good thing for Intel, pushing them to up their game and compete. Who knows, in 2 years I may be rebuilding with AMD.

Credit: AMD

CPU Cooler

Since the mounting for Intel CPUs hasn't changed meaningfully since the 4th-gen parts, I was able to reuse my Noctua NH-U12S tower cooler, which is designed to support many different mounting requirements. This all-copper, 6 heat-pipe cooler has 45mm fins so it will not encroach on any memory heat spreader, regardless of height. With the NF-F12 fan attached to the cooler, cold air is pushed across the fins, carrying hot air straight out the back of my case. This remains an excellent part and an excellent choice for a SFF Kaby Lake PC build.

Credit: Noctua

Memory

Another reason I wanted to do this build was for the DDR4 upgrade, as my Haswell build was DDR3. The memory market is a bit more complicated than the last time I looked at it, with most higher-end vendors playing the same marketing game. Most of the DIMMs available in this space are really 2133MHz DIMMs, but using XMP2 profiles enabled in the BIOS, the modules clock much higher. These modules are marketed using this higher frequency, which is not the default; this is the difference between the 1.2v SPD value of the module and the "tested" value enabled via higher voltage in XMP. What you pay for ultimately is this higher tested speed, a pretty heat spreader and a lifetime warranty. After reading several articles comparing various DDR4 clock speeds across various benchmarks and applications, what is glaringly apparent is that the usefulness of faster memory fits into a very limited use case. In other words, almost no one will truly benefit from paying too much for high-clocked RAM. My research showed that 3000MHz is about the sweet spot for DDR4 right now; anything faster is just more expensive with unlikely-to-achieve returns.

Now that said, I was initially sold on a 16GB x 2 kit from the Corsair Vengeance LPX line marketed at 3200MHz. I went with these because Corsair is a well-regarded brand and they were priced like a lower-clocked pair. I was never able to achieve the marketed speed of 3200MHz; the closest I got, stable, was 3100MHz, and to get there I had to overclock the CPU as well. With only XMP enabled on these modules, I was never able to even POST, so I sent them back. There were several other reviews stating the same problem, which I should have heeded. Read the negative reviews and don't settle for sub-par quality!

My second attempt was a 32GB 3000MHz TridentZ kit from G.Skill, F4-3000C15D-32GTZ. These modules are amazing: not only did the XMP profile take immediately as expected with no CPU OC required, they are quite possibly the most beautiful DIMMs I've ever laid eyes on. Pictures really don't do them justice.

Credit: G.Skill

 

You'll notice in the G.Skill spec table below the difference between the SPD and Tested values; these are sold at the tested speeds, as you can see on the DIMM label above. Nevertheless, I REALLY like this memory!

 

XMP2 profile applied, RAM performing as tested, as expected.

 

One note on capacity: how much RAM do you really need? Personally, I leave a ton of stuff open all the time on my PC: 50+ tabs in Chrome, Word docs, Visio diagrams, RDP sessions, etc. I like never having to worry about not having enough memory, no matter the workload I need to run. Dishonored 2 by itself can use upwards of 8GB! The other thing I like is not having to use a pagefile, which saves wear and tear on my SSDs and also guarantees high performance for all applications. To pull this off, watch the committed value on the Memory tab in Task Manager. If it grows anywhere near the amount of physical RAM you have, the pagefile should probably be left enabled. If not, you could disable the pagefile and ease the burden on your SSD.
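If you'd rather watch this from PowerShell than Task Manager, the same picture is available from the standard memory counters (a quick sketch):

Get-Counter '\Memory\Committed Bytes', '\Memory\Commit Limit' |
    Select-Object -ExpandProperty CounterSamples |
    Format-Table Path, @{n='GB';e={[math]::Round($_.CookedValue/1GB,1)}} -AutoSize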

Data

One of the largest motivators for my upgrade this year was moving to Non-Volatile Memory Express (NVMe), specifically the really exciting modules in the new Samsung 960 line. I'll continue my three-tier model for OS, apps and data, using NVMe for the first two and spinning media for the third. Eventually I'd like to replace the data tier with an SSD, when the time is right. For this build I have a 512GB Samsung 960 Pro for my C drive, a 500GB Samsung 960 Evo for the app drive and a 4TB Seagate BarraCuda 3.5" SATA disk for archive data. These new Samsung NVMe modules are outperforming anything else in this space right now, with the following factory performance profiles.

 

Module Capacity Sequential Reads Sequential Writes
960 Pro 512GB 3500MB/s 2100MB/s
960 Evo 500GB 3200MB/s 1900MB/s

 

These optimistic factory numbers are several times faster than any SSD from any previous build. Crazy! As of this writing, the 512GB Pro module is $330 with the 500GB Evo at $250. Definitely a healthy cost differential for what you could argue is a near-negligible performance delta. I wanted what was perceived as the fastest NVMe on the market, so it seemed worth the price. Capacities go up to 2TB on the Pro module and 1TB on the Evo. The other big difference is that the Pro modules come with a 5-year warranty vs 3 years on the Evo. These are x4 PCIe 3.0 modules using Samsung's new famed Polaris controller and 48-layer V-NAND memory.

 

Credit: Samsung

 

For local archive data I’m running the 4TB Seagate BarraCuda which is 6Gb SATA and 3.5”. Not a lot to say about this guy, who is a dinosaur in the making.

Credit: Seagate

 

My basic data scheme goes like this: C: OS only, D: Apps, games and a copy of working data (Dropbox), X: maintains a copy of working data from D and anything else too large or unimportant enough to replicate. Important data is then replicated to a 3rd target: NAS.

 

Graphics

Upgrading my graphics capability was easily the most noticeable upgrade from my prior build. My previous MSI GTX 970 was so solid, I went that way again with an MSI GTX 1070 Gaming X 8G. I'm a huge fan of these cards. This one in particular comes with a back plate, 8GB of GDDR5 memory @ 8108MHz, a tunable GPU clock and fans that only run when temps exceed 60C (which isn't very often). The 1080 model gets you higher GPU and memory clocks with more CUDA cores, but looking at the benchmarks it barely outperforms its lesser sibling the 1070. I haven't yet been able to throw anything at this card that it couldn't handle at ultra settings; so far that includes Dishonored 2, Mafia III and Deus Ex: Mankind Divided.

Credit: MSI

PSU

Reusing now for its third build in a row: my trusty Silverstone ST55F. Full modularity providing up to 600W max output in a 140mm footprint at 80Plus Gold efficiency... hard to beat this one, unless your case can't accept the depth. One step above is the newer ST55F variant with the Platinum efficiency rating. The PP05 short cable kit worked well again and kept cable clutter to a minimum, although I don't require many cables anymore. And now with the NVMe drives I use one fewer cable, no longer needing to power 2 extra SATA devices. As always, make sure to check JonnyGuru.com first for thorough and detailed PSU reviews; the ST55F scores very high there.

Credit: Silverstone

 

Fans

High-quality SSO-bearing cooling with a low decibel rating and zero vibration is worth the small upcharge. Noctua makes the best fans on the market, so I outfitted the entire case with them: dual NF-F12 PWM 120mm at the top for downward intake and 1 x NF-A14 140mm fan at the rear for exhaust, in addition to the 120mm NF-F12 on the CPU cooler. I can't recommend these highly enough.

Credit: Noctua

Case and Complete Build

Once again I built into my BitFenix Phenom M, which remains my favorite and is easily the best case I've ever built with. If you go this route I highly recommend you flip the side panels, putting the panel with the buttons and USB ports on the left side of the case instead of the default right side. Otherwise, any time you need to open the case you will have to contend with cords and possibly unplug cables from a power switch or USB header.

 

Credit: BitFenix

 

The Phenom is a mix of steel and soft plastic with the same basic airflow design as the SG09: top-down cooling with rear and optional bottom exhaust. This goes against the natural order of things (heat rising) but works very well, with 2 x 120mm NF-F12 fans blowing cool air down into the case, directly into the GPU and down into the CPU cooler's fan blowing across the tower fins, and the 140mm at the back to exhaust. The GPU breathes cool air, blows it across the PCB and out the rear slot. The rear fan can be either 120mm or 140mm; I opted for the latter, which is briefly audible at times under load but provides extremely effective cooling. Otherwise this build is silent! The black removable dust filter along the top of the case is the only filtered air input aside from the mesh in the front cover, used solely by the PSU, which exhausts downward below the case. The case supports a 120mm radiator, such as the Corsair H80i, if desired. The base section can support dual 120mm fans, a single 200mm fan, or 2 x 3.5" HDDs. The case also comes with a removable heat shield underneath to prevent exhaust heat from the PSU getting sucked back into the case. Lots of different configurations are possible with this case, which is fantastic!

The image below shows my completed build and illustrates the airflow scheme per my design selections. Ultimately this case works extremely well and keeps all components well cooled. There aren't a lot of nooks to tuck cables into in this case, so bundling is about as good as it gets, which is complicated when using shorter cables. The key is to not block any airflow with the bundle.

 

CPU Performance

The tools I used to benchmark the CPU are PassMark PerformanceTest 9 and Cinebench. The CPU Mark results with the 2 reference parts tell the tale: there is only a very small incremental gain over my previous 4790K. The gain is even smaller versus the reference 6700K Skylake part, reinforcing the fact that if you own Skylake now, there is really no point at all in upgrading to Kaby Lake. My results for all components ranked within the 99th percentile according to PassMark.

 

Similar story on Cinebench but as is the case with all the benchmarks, there’s a ton of variance and a wide spread in the scores.

 

Memory Performance

For the Memory Mark I included a few other high-end PC builds with varying memory speeds. Not sure how useful this bench is honestly due to that top result on slower DDR3, shown below. Generally not a lot of variance here further suggesting that paying for higher RAM clocks is probably not worth it.

 

Graphics Performance

At the time I bought my new MSI GTX 1070 Gaming X board, the benchmarks showed this card barely below the top-dog 1080, but for a significant cash savings. Coming from the 970, the 1070 is improved in every way: more memory, more cores, higher clocks and higher supported resolutions (4K). There was nothing my prior 970 couldn't handle as long as I had my games properly optimized, which meant setting quality, at best, to "high". With the 1070 I can run very intensive titles like Dishonored 2 at ultra settings with no trouble at all.

 

+3500 points in the 3D graphics mark test:

 

 

Old vs new on the Unigine Heaven bench, pretty significant difference even running the 1070 a step higher at Ultra:

  

 

 

 

960 Pro vs 960 Evo

Ok, prepare to be amazed: here are the 960s in action. This test consists of running a simple CrystalDiskMark bench while monitoring IOPS with Perfmon. The 960 Pro generates nearly 200,000 read IOPS while clocking a staggering 3255MB/s, which, as amazing as that is, is actually much lower than what this part should be capable of!

 

 

Writes clock in around 140,000 IOPS with 1870MB/s on the 32 queues sequential test.

 

 

The highest numbers I observed in these tests were 3422MB/s reads and 1990MB/s writes. Impressive, but still below the advertised 3500/2100MB/s, and the performance of this module is not nearly as consistent as it should be. The 960 Evo is no slouch either and actually outperforms the Pro in these tests!! This definitely should not be the case.

 

Running the same bench, the 960 Evo generated 203,000 IOPS while pushing 3335 MB/s. This is higher than the Pro module in both areas.

 

Writes are also impressive here; while the Evo doesn't beat the Pro in MB/s, it does clock in 13,000 IOPS higher.

 

Running the benchmark within Samsung Magician draws an even starker picture. The Evo absolutely kills the Pro in read performance, but the Pro wins the write bench by a small margin. Again, this should not be the case.

 

In light of this, the Pro module is going back, and I'll update this post with new numbers once I receive my replacement part. If the replacement doesn't perform, it too will be going back, to be replaced by another Evo module instead. As it stands, the Evo's performance is very predictable and uncomfortably close to the Pro, and for $65 less it may be the better buy, if you don't mind the shorter 3-year warranty.

 

Just for comparison, here is the DiskMark for my older Samsung 840 Pro which is a solidly performing SSD:

 

Temps & Power Consumption

Even with my case’s upside-down airflow design, it does a good job keeping things cool. This is also in no small part due to the incredible Noctua fans which are literally the best you can buy and well worth it. I run my PC using the default Balanced power plan which allows the CPU to throttle down during idle or low activity periods. At idle with fans running at low speeds and the core voltage at .7v, the CPU chirps along at a cozy 38C with the fans barely spinning.

 

Under benchmark load with the CPU pegged at 100%, temps shift between the low 50s and low 60s, with the fans shifting their RPMs upward in kind. I touched 66 a couple of times but couldn't capture the screenshot. 63 degrees under load is, I'd say, very good for this CPU! Once the Arctic Silver 5 thermal compound breaks in completely, this should get a touch better.

 

During the PassMark bench I recorded the temps, voltage and frequency as shown below. Here you can see the full effect of the Kaby Lake stock Turbo in play, with the CPU running its max clock of 4.5GHz: 55 degrees recorded with the fans at less than 50%. You can also see the XMP2 profile in effect with the RAM clock at 3000MHz.

 

I recorded power draw at idle and under load with a heavy graphics bench, where the PC will draw its highest wattage with the GPU thirstily slurping juice. At its lowest observed reading with absolutely nothing happening, I recorded 41W at the wall. Very efficient.

 

Under the heavy graphics bench, the highest reading I observed was 244w at the wall. As it stands, given my components, their power draw and my hungriest graphical workload, my 550W PSU is more than sufficient for this build.

 

Build Summary

All in all this is a pretty exciting year to build a high-end PC with loads of improvements available in graphics and disk.

 

Motherboard | Asus Z270M-Plus | $150
CPU | Intel Core i7-7700K | $379
CPU Cooler | Noctua NH-U12S | (reuse)
Memory | G.Skill TridentZ | $200
OS NVMe | Samsung 960 Pro | $330
Apps NVMe | Samsung 960 Evo | $250
Data HDD | Seagate BarraCuda | $130
Graphics | MSI GTX 1070 Gaming X | $399
PSU | Silverstone SST-ST55F-G | (reuse)
Fans | Noctua NF-F12 PWM / NF-A14 PWM | (reuse)
Case | BitFenix Phenom M | (reuse)
OS | Windows 10 Pro |
Monitors | 2 x Dell U2415b | (reuse)
Mouse & Keyboard | Corsair M65 / K70 | (reuse)
Total invested | | $1838
