
Google Wifi Behind a Firewall

More specifically, behind a FIOS firewall, though much of the information below applies to any firewall you might put Google Wifi behind. I’ll dig into what you need to know and how to make key services work while buried in a double-NAT, double-firewall deployment.

Google Wifi (gWifi) is a relatively new mesh networking solution from Google that has a simple goal: Provide stupid easy, crazy fast, blanketed wifi coverage for your entire house regardless of size. This is done via very small (4.17” diameter) but very aesthetically pleasing Wifi “Points”, as Google calls them, that you place in key areas of your home to provide coverage. One Wifi Point will act as primary and connect via a BaseT Ethernet cable to your existing modem or router. The other Points will connect to the primary wirelessly and expand the signal of a single Wireless LAN (WLAN). The gWifi solution operates on both the 2.4GHz and 5GHz bands but unlike most other wireless routers, gWifi enables seamless and automatic switching between the bands as required, using a single SSID. Every client available today is supported here: 802.11a/b/g/n/ac. No need to set up and manually switch between discrete SSIDs for each band. gWifi will place your device on the best possible band given its capabilities and proximity to the gWifi Points.

Easily one of the prettiest networking devices around

With the primary router cabled to the FIOS router, there is one additional switched Ethernet port available on that device and two ports available on each secondary mesh Point for wired Ethernet clients. Need more coverage? Add more gWifi Points. Unlike most routers, there is no web interface to access for management; you must use the gWifi app installed on your mobile device. The available features are intentionally limited to keep ease of use high, so those looking for advanced features like VPN may not find what they need here. Luckily, everything I need appears to be intact: custom DNS, DHCP reservations and port forwarding. Another important callout is that you cannot currently change the IP range that gWifi uses for DHCP assignment: it is and always will be 192.168.86.0/24. If this range is already in use on your network for some reason, gWifi will shift to 192.168.85.0/24.


General setup is fairly simple: Connect an Ethernet cable to the WAN port of your primary gWifi Point from the switch side of your modem or firewall and power on. Download the gWifi app to your phone or tablet, find the device and follow the prompts to configure. Additional Points are easily added through the setup process which will help to ensure ideal placement for maximum coverage. Connect your wireless clients to the new network like normal and you’re done. The rest of this post will dig into more advanced topics around presenting services connected to the gWifi network out to the internet, or to clients connected to the FIOS network.

View of my gWifi mesh network

Network Architecture

In Google’s Wifi documentation, the clear preferred deployment model is the Google router connected directly to a broadband modem. This enables Google Wifi to be the primary router, AP and firewall of the house, brokering all connections to the internet. Well, what if you don’t have a cable modem because you have FIOS or similar service? Or don’t really want gWifi as your barrier to the big bad web? In my particular case I use FIOS for internet access only; my TV service is provided via PlayStation Vue, so I have no Set Top Boxes (STBs), and my phone service is provided by Ooma. For those of us with FIOS, we have ONT boxes in our garages that connect to a router inside the house. If you have speeds above 100Mbps, this connection is made using a Cat5e cable; lower speeds use coax. I’ve read that while it appears possible to replace the FIOS router with gWifi, results are mixed. Really, this should only be considered if you are an internet-only customer, in my view. If you have STBs, this won’t work, as you need a coax connection between them and the router to complete the MoCA network. The easiest thing to do is simply leave the FIOS router in place as is and install the gWifi pieces behind it.

What this creates is two stateful firewalls, directly connected LAN to WAN, each with its own separate subnet, with the Google device having a leg in both networks. Leave DHCP enabled on the FIOS router to serve the clients directly connected via the Ethernet switch, including the WAN port of the Google router. There is currently no way to manually specify a static IP address for the gWifi router, so it must receive its address via DHCP.

It’s important to note that the gWifi router can be put into bridge mode, making your FIOS router ‘primary’, but doing so will disable the mesh capabilities of the gWifi system. Only do this if you have a single gWifi Point or knowingly want to disable mesh (why buy gWifi??).


The diagram below displays my home network and how I installed the gWifi system. The FIOS router is still my primary firewall and gateway to the internet; everything ultimately connects behind this device. I have a few devices that connect directly to the FIOS network, plus a 1Gb run between floors that feeds the primary gWifi router from the FIOS router. The only device currently hard-wired to the gWifi network is my media server (Plex/NAS). This gives me 1Gb wired end to end from my PC to my NAS downstairs. All other devices, like my PS4, smart TVs, the kids’ devices and Chromecasts, connect via wifi. The gWifi devices are depicted as G1, G2 and G3 below.


You’ll notice that the G1 router depicted above has two IP addresses: 192.168.1.22 on the WAN side and 192.168.86.1 on the LAN side. The .22 address was assigned via DHCP from the FIOS router, which I reserved so the G1 device will always receive this IP. This is important, as we will configure an upstream static routing rule later that points to this address, so we need it to not change. The gWifi routers assign the IPs for all clients they service directly, wired or wireless. The G2 and G3 routers simply serve as extensions of G1 and will serve any clients that connect via proximity. By default, any device attached to the FIOS network (192.168.1.0) will have no knowledge of the .86 network, nor how to get to anything that lives there. So we have to tell the FIOS router how to find the .86 network, if I ever want my PC to connect to file shares hosted on my NAS, for example.


The main configuration activities I’ll cover here are:

  • Reserving IP addresses
  • Routing between networks
  • Port Forwarding key services
  • Configuring the Windows Firewall
  • Plex remote access


Reserving IP Addresses

In FIOS

First, find the IP assigned to the WAN side of the Google router via the gWifi app under Network Device Settings: Settings > Network & general > wifi points > [primary device] :

Login to the FIOS router and navigate to: Advanced > IP Address Distribution, then click the “Connection List” button. Find the gWifi WAN IP assignment in the connection list, click the pencil to edit and check the box to set the “Static Lease type”. This will ensure that the Google WAN port will always receive this IP.

In gWifi

Port forwarding rules can only be applied to reserved IP addresses, so lock in any PCs or servers you intend to configure rules for. To reserve an IP address assigned to any client on the gWifi network, open the app and navigate to: Network & general > Advanced networking > DHCP IP reservations and tap the green + button in the lower corner. Select the chosen device in the list and tap next. The MAC address and type of device is displayed, along with the gWifi Point to which the device is attached. Accept the current IP assignment or change it to suit your needs and tap next to save.

Routing

As the network exists right now, my PC can’t reach my media server on the .86 network as it doesn’t know how to get there. I could add a persistent route on my PC, which would solve the problem for that one machine, but a better option is to configure a global routing rule on the FIOS router. This enables any client on the FIOS network to connect to hosts or clients on the gWifi network. Let the router do its job: route traffic. The WAN port of the gWifi router is the gateway for any traffic destined to the .86 network; all traffic in or out of the .86 network will pass through this port. So my routing rule needs to send all traffic from the 192.168.1.0/24 network destined for the 192.168.86.0/24 network to 192.168.1.22. To illustrate this further, the image below depicts the connection from my PC to the media server and associated hops along the way:

On the FIOS router, navigate to Advanced > Routing and click the “New Route” link in red. Enter the pertinent details and click apply. Traffic is now flowing in the right directions.
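
For reference, if you’d rather take the per-PC approach mentioned above, a persistent route can be added on a Windows client from an elevated PowerShell prompt (a sketch; the interface alias “Ethernet” is an assumption, substitute your NIC’s name from Get-NetAdapter):

New-NetRoute -DestinationPrefix 192.168.86.0/24 -InterfaceAlias "Ethernet" -NextHop 192.168.1.22

The legacy equivalent is route add 192.168.86.0 mask 255.255.255.0 192.168.1.22 -p.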


Port Forwarding

All network-accessible services running on a PC or server listen for connections on network ports opened by those applications. Specific TCP or UDP port numbers are assigned to applications and made available for access, such as a web server (HTTP) listening on TCP 80. For my scenario, I have several ports on my media server that I need to make accessible to my PC. Per the diagram below: TCP 445 = SMB (file services), TCP 3389 = RDP and TCP 32400 = Plex server. Anything on the gWifi network that you wish to expose to the internet will need to be port forwarded on both the gWifi and FIOS routers, unless it advertises itself as a UPnP (Universal Plug and Play) capable service that can set its own rule on the router, which gWifi will allow. Plex happens to be one of those services, so TCP 32400 is automatically opened through the gWifi router. Currently there is no way to view or control UPnP-configured rules in the gWifi app, so you could have a slew of ports opened and not even know it. The only option at the moment is to disable UPnP entirely within the gWifi app. Hopefully Google will fix this in the near future.


The other two ports are opened with manual port forwarding rules in the gWifi app. Navigate to: Network & general > Advanced networking > Port forwarding and tap the green + button. Choose the device to forward a port to, which will only be possible if you reserved an IP address for that device. Select TCP, UDP or both, as well as the internal and external ports; both are required here. Tapping done will create the rule and set it as active.

     


After this step all required rules are in place on both the FIOS and gWifi router and the media server is accessible from the FIOS network. The image below shows everything configured so far along with the traffic flow.
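
To quickly confirm that a forwarded port is reachable from a client on the FIOS network, PowerShell’s Test-NetConnection works well (a sketch; 192.168.1.22 is the gWifi WAN address from my example and TCP 3389 one of the forwarded ports). If the check fails, the Windows Firewall section below is the likely culprit.

Test-NetConnection -ComputerName 192.168.1.22 -Port 3389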


Windows Firewall

Because my media server is a Windows box, connections from the FIOS side to resources it hosts, such as file shares or Plex, must also be allowed through the Windows Firewall. By default, the Windows Firewall blocks all inbound connections unless a specific rule exists otherwise. Create a new custom firewall rule on the PC or server allowing traffic from the FIOS network: 192.168.1.0/24. This will treat all traffic sourced from the FIOS network as trusted. Remember, this entire network segment is already behind a firewall to the internet, the FIOS router, so you can safely port forward hosts on the gWifi network to the inside of the FIOS network. The new rule should be a custom rule allowing all programs and all ports, scoped to the remote FIOS subnet. If you want to get more granular and only allow access to the specific ports you are exposing, that is an option as well.
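
The same rule can be created from an elevated PowerShell prompt on the server (a sketch; the rule names are arbitrary, and the second, more granular variant limits access to just the three ports used in this example):

New-NetFirewallRule -DisplayName "Allow FIOS subnet" -Direction Inbound -RemoteAddress 192.168.1.0/24 -Action Allow

New-NetFirewallRule -DisplayName "Allow FIOS SMB/RDP/Plex" -Direction Inbound -Protocol TCP -LocalPort 445,3389,32400 -RemoteAddress 192.168.1.0/24 -Action Allow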



Plex Remote Access

One additional step is required if you want to configure Plex with remote access. An additional port must be forwarded on the FIOS router to allow an inbound connection to reach TCP 32400 internally. Log into the FIOS router and navigate to Firewall Settings > Port Forwarding. Add a new rule and make sure to click the Advanced button. Enter the IP addresses, ports and dropdowns per the screenshot below: choose specify IP, enter the IP address of the Plex server, and forward to port 32400. For the destination port, which is what will be exposed to the internet, you can either pick your own number or match the random 5-digit port generated in the Plex settings. Either way, the port numbers must match between the Plex server and the FIOS router. It’s probably best not to expose TCP 32400 directly to the internet; I don’t know of a Plex-specific exploit, but the service behind this port is well known, so best not to advertise what it is.
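
Once both rules are in place, you can sanity check the external port from a machine outside your network, for example a laptop tethered to a phone hotspot (a sketch; the hostname and port below are placeholders for your public IP or DDNS name and your chosen external port):

Test-NetConnection -ComputerName your-public-hostname.example.com -Port 32499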

 

Once the rule is in place and active, the Plex service should report as fully accessible. If the Remote Access dialog in the Plex settings reports any issues, try specifying a manual public port and recreate the port forwarding rule on the FIOS router.



The diagram below depicts the remote access connection originating on the public internet, forwarding to the Plex service inside through both routers. Now your Plex server is accessible from anywhere on the internet. Remember that the public port can be any number you want.



Final Thoughts

Overall my experience with the gWifi system has been quite good. Elegant, unobtrusive, high performance and so simple. There’s a lot of high-end competition in the mesh wifi business right now and Google does a really good job, assuming you don’t require too many advanced features. Google also includes some decent performance testing tools in the gWifi app, so you can gauge internet speed, wifi client performance and the health of the mesh itself, as well as figure out where any problems might lie.

   


You can view a device’s network consumption in real time, shut down any client you choose, or give a device network priority for up to 4 hours. Being able to group and schedule internet access for particular devices is especially useful as a parent. If you have Sonos in the house, make sure to set up all devices in Standard mode, connected to the new gWifi network. Boost mode will still work if required, but make sure to keep all devices on the same network for ease of interoperability. It appears problematic to have a controller on the other side of a firewall from the Sonos devices, due to the number of ports and connections required.

VMware vSphere HA Mode for vGPU

The following post was written and scenario tested by my colleagues and friends Keith Keogh and Andrew Breedy of the Limerick, Ireland CCC engineering team. I helped with editing and art. Enjoy.

 

VMware has added High Availability (HA) for virtual machines (VMs) with an NVIDIA GRID vGPU shared pass-through device in vSphere 6.5. This feature allows any vGPU VM to automatically restart on another available ESXi host with an identical NVIDIA GRID vGPU profile if the VM’s hosting server fails.
To better understand how this feature works in practice, we decided to create some failure scenarios. We tested how HA for vGPU virtual machines performed in the following situations:
1)  Network outage: Network cables get disconnected from the virtual machine host.
2)  Graceful shutdown: A host has a controlled shut down via the ESXi console.
3)  Power outage: A host has an uncontrolled shutdown, i.e. the server loses AC power.
4)  Maintenance mode: A host is put into maintenance mode.


The architecture of the scenario used for testing was laid out as follows:
  


 

Viewed another way, we have 3 hosts, 2 with GPUs, configured in an HA VSAN cluster with a pool of Instant Clones configured in Horizon.

 

1)  For this scenario, we used three Dell vSAN Ready Node R730 servers configured per our C7 spec, with the ESXi 6.5 hypervisor installed, clustered and managed through a VMware vCenter 6.5 appliance.
2)  VMware vSAN 6.5 was configured and enabled to provide shared storage.
3)  Two of the three servers in the cluster had an NVIDIA M60 GPU card and the appropriate drivers installed. The third compute host did not have a GPU card.
4)  16 vGPU virtual machines were placed on one host. The vGPU VMs were a pool of linked clones created using VMware Horizon 7.1. The Windows 10 master image had the appropriate NVIDIA drivers installed and an NVIDIA GRID vGPU shared PCI pass-through device added.
5)  All 4GB of allocated memory was reserved on the VMs and the M60-1Q profile was assigned, which allows a maximum of 8 vGPUs per physical GPU. Two vCPUs were assigned to each VM.
6)  High Availability was enabled on the vSphere cluster.
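
For reference, cluster HA can also be toggled with PowerCLI rather than the vSphere web client (a minimal sketch, assuming the VMware PowerCLI module is installed and a connection to vCenter; the server and cluster names here are hypothetical):

Connect-VIServer -Server vcenter.lab.local

Set-Cluster -Cluster "vGPU-Cluster" -HAEnabled:$true -Confirm:$false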


Note that we intentionally didn’t install an M60 GPU card in the third server of our cluster. This allowed us to test the mechanism that restricts vGPU VMs to restarting only on a host with the correct GPU hardware installed, rather than attempting to restart them on a host without a GPU. In all cases this worked flawlessly: the VMs always restarted on the correct backup host with the M60 GPU installed.

Scenario One


To test the first scenario of a network outage, we simply unplugged the network cables from the host server running the vGPU VMs. The host showed as not responding in the vSphere client and the VMs showed as disconnected within a minute of unplugging the cables. After approximately another 30 seconds, all the VMs came back online and restarted normally on the other host in the cluster that contained the M60 GPU card.

Scenario Two


A controlled shutdown of the host through ESXi worked in a similar manner. The host showed as not responding in the vSphere client and the VMs showed as disconnected as the ESXi host began shutting down the vSAN services as part of its shutdown procedure. Once the host had shut down, the VMs restarted quite quickly on the backup host. In total, the time from starting the shutdown of the host until the VMs restarted on the backup host was approximately a minute and a half.

Scenario Three


To simulate a power outage, we simply pulled the power cord of the host running the vGPU VMs. After approximately one minute, the host showed as not responding and the VMs showed as disconnected in the vSphere client. Roughly 10 seconds later, the vGPU VMs had all restarted on the backup host.

Scenario Four


Placing the host in maintenance mode worked as expected, in that any powered-on VMs must first be powered off on that host before the operation can complete. Once powered off, the VMs were then moved to the backup host but not powered on (the ‘Move powered-off and suspended virtual machines to other hosts in the cluster’ option was left ticked when entering maintenance mode). The vGPU-enabled VMs were moved to the correct host each time we tested this: the host with the M60 GPU card.

In conclusion, the HA failover of vGPU virtual machines worked flawlessly in all cases we tested. In terms of speed of recovery, we made the following observations. The unplanned power outage test recovered the quickest at 1 minute 10 seconds, followed by the controlled shutdown, with the network outage taking slightly longer to recover.

Failure Scenario       Approximate Recovery Time
Network Outage         1 min, 30 sec
Controlled Shutdown    1 min, 30 sec
Power Outage           1 min, 10 sec


In a real-world scenario, HA failover for vGPU VMs would of course require an unused GPU card in a backup server for it to work. In most vGPU environments you would probably want to fill the GPU to its maximum capacity of vGPU-based VMs to gain maximum value from your hardware. Providing a backup for a fully populated vGPU host would then mean having a fully unused server and GPU card lying idle until circumstances required it. It is also possible to split the vGPU VMs between two GPU hosts, placing, for example, 50% of the VMs on each host. In the event of a failure, the VMs will fail over to the other host; in this scenario, though, each GPU would be used to only 50% of its capacity.


An unused backup GPU host may be OK in larger or mission-critical deployments, but may not be very cost effective for smaller ones. For smaller deployments, it would be recommended to leave a resource buffer on each GPU server so that, in the event of a host failure, there is room on the surviving GPU-enabled servers to accommodate the vGPU-enabled VMs.

The Native MSFT Stack: S2D & RDS – Part 3

Part 1: Intro and Architecture

Part 2: Prep & Configuration

Part 3: Performance & Troubleshooting (you are here)

Make sure to check out my series on RDS in Server 2016 for more detailed information on designing and deploying RDSH or RDVH.

 

Performance

This section will give an idea of disk activity and performance given this specific configuration as it relates to S2D and RDS.

Here is the 3-way mirror in action during the collection provisioning activity: 3 data copies, 1 per host, with all hosts active, as expected:

 

Real-time disk and RDMA stats during collection provisioning:

 

RDS provisioning isn’t the speediest solution in the world by default as it creates VMs one by one. But it’s fairly predictable at 1 VM per minute, so plan accordingly.

 

Optionally, you can adjust the concurrency level to change the number of VMs RDS can create in parallel by using the Set-RDVirtualDesktopConcurrency cmdlet. Server 2012 R2 supported a max of 20 concurrent operations, but make sure the infrastructure can handle whatever you set here.

Set-RDVirtualDesktopConcurrency -ConnectionBroker "RDCB name" -ConcurrencyFactor 20

 

To illustrate the capacity impact of 3-way mirroring, take my lab configuration, which provides my 3-node cluster with 14TB of total capacity, each node contributing 4.6TB. I have 3 fixed-provisioned volumes totaling 4TB, with just under 2TB remaining in the pool. The math is simple: every terabyte of volume consumes three terabytes of raw capacity (4TB of volumes eats roughly 12TB of the 14TB pool), so only about a third of raw capacity is usable, as expected. This equation gets better at larger scales, since parity resiliency becomes an option starting at 4 nodes, so keep this in mind when building your deployment.

 

Here is the disk performance view from one of the Win10 VMs within the pool (VDI-23) running Crystal DiskMark, which runs Diskspd commands behind the scenes. Read IOPS are respectable at an observed max of 25K during the sequential test.


 

Write IOPS are also respectable at a max observed of nearly 18K during the Sequential test.


 

Writing to the same volume, running the same benchmark but from the host itself, I saw some wildly differing results. Similar bandwidth numbers but poorer showing on most tests for some reason. One run yielded some staggering numbers on the random 4K tests which I was unable to reproduce reliably.


Running Diskspd tells a slightly different story, with more real-world inputs specific to VDI: write intensive. Here, given the punishing workload, we see impressive disk performance numbers and almost zero latency. The following results were generated from a 60-second run against a 500MB test file, using 4K blocks, fully random, weighted 70% writes, 4 threads, 2 outstanding I/Os, with cache enabled.

Diskspd.exe -b4k -d60 -L -o2 -t4 -r -w70 -c500M c:\ClusterStorage\Volume1\RDSH\io.dat


 

Just for comparison, here is another run with cache disabled. As expected, not so impressive.


 

Troubleshooting

Here are a few things to watch out for along the way.

When deploying your desktop collection, if you get an error about a node not having a virtual switch configured:

 

One possible cause is that the default SETSwitch was not configured for RDMA on that particular host. Confirm and then enable per the steps above using Enable-NetAdapterRDMA.
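
A quick way to confirm is to check the RDMA state of the host vNICs on the node in question (assuming the vNIC names used in Part 2 of this series) and enable it if missing:

Get-NetAdapterRdma "vEthernet (SMB_1)", "vEthernet (SMB_2)"

Enable-NetAdapterRDMA "vEthernet (SMB_1)", "vEthernet (SMB_2)"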

 

On the topic of VM creation within the cluster, here is something kooky to watch out for. If you are working on one of your cluster nodes locally and try to add a new VM to be hosted on a different node, you may have trouble when you get to the OS install portion, where the UI offers to attach an ISO to boot from. If you point to an ISO sitting on a network share, you may get the following error: Failing to add the DVD device due to lack of permissions to open attachment.

 

You will also be denied if you try to access any of the local paths such as desktop or downloads.

 

The reason for this is that the dialog is browsing the file system of the server where the VM is being created, not your local machine. Counter-intuitive perhaps, but the workaround is to either log in directly to the host where the VM is to be created, or copy your ISO locally to that host.

 

Watch this space, more to come…

 

Part 1: Intro and Architecture

Part 2: Prep & Configuration

Part 3: Performance & Troubleshooting (you are here)

 

Resources

S2D in Server 2016

S2D Overview

Working with volumes in S2D

Cache in S2D

The Native MSFT Stack: S2D & RDS – Part 2

Part 1: Intro and Architecture

Part 2: Prep & Configuration (you are here)

Part 3: Performance & Troubleshooting

Make sure to check out my series on RDS in Server 2016 for more detailed information on designing and deploying RDSH or RDVH.

 

Prep

The number 1 rule of clustering with Microsoft is homogeneity. All nodes within a cluster must have identical hardware and patch levels. This is very important. The first step is to check all disks intended to participate in the storage pool, bring them all online and initialize them. This can be done via PoSH or the UI.
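
For example, the offline disks on each node can be brought online in one shot (a sketch; this targets every offline, non-boot disk and then lists what is present):

Get-Disk | Where-Object {$_.IsOffline -and -not $_.IsBoot} | Set-Disk -IsOffline $false

Get-Disk | Sort-Object Number | Format-Table Number, FriendlyName, Size, PartitionStyle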

Once initialized, confirm that all disks can pool:

Get-PhysicalDisk -CanPool $true | sort model

 

Install Hyper-V and Failover Clustering on all nodes:

Install-WindowsFeature -name Hyper-V, Failover-Clustering -IncludeManagementTools -ComputerName InsertName -Restart

 

Run cluster validation; note this command does not exist until the Failover Clustering feature has been installed. Replace the portions in red with your custom inputs. Hardware configuration and patch level should be identical between nodes or you will be warned in this report.

Test-Cluster -Node node1, node2, etc -include "Storage Spaces Direct", Inventory, Network, "System Configuration"

 

The report will be stored in c:\users\<username>\AppData\Local\Temp

Networking & Cluster Configuration

We'll be running the Switch Embedded Teaming (SET) vSwitch feature, new for Hyper-V in Server 2016.

 

Repeat the following steps on all nodes in your cluster. First, I recommend renaming the interfaces you plan to use for S2D to something easy and meaningful. I'm running Mellanox cards, so I called mine Mell1 and Mell2. There should be no NIC teaming applied at this point!

 

Configure the QoS policy: first enable the Data Center Bridging (DCB) feature, which will allow us to prioritize certain services that traverse the Ethernet.

Install-WindowsFeature -Name Data-Center-Bridging

 

Create a new policy for SMB Direct with a priority value of 3 which marks this service as “critical” per the 802.1p standard:

New-NetQosPolicy "SMBDirect" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

 

Allocate at least 30% of the available bandwidth to SMB for the S2D solution:

New-NetQosTrafficClass "SMBDirect" -Priority 3 -BandwidthPercentage 30 -Algorithm ETS

 

Create the SET switch using the adapter names you specified previously. Items in red should match your choices or naming standards. Note that this SETSwitch is ultimately a vNIC and can receive an IP address itself:

New-VMSwitch -Name SETSwitch -NetAdapterName "Mell1","Mell2" -EnableEmbeddedTeaming $true

 

Add host vNICs to the SET switch you just created, these will be used by the management OS. Items in red should match your choices or naming standards. Assign static IPs to these vNICs as required as these interfaces are where RDMA will be enabled.

Add-VMNetworkAdapter -SwitchName SETSwitch -Name SMB_1 -ManagementOS

Add-VMNetworkAdapter -SwitchName SETSwitch -Name SMB_2 -ManagementOS

 

Once created, the new virtual interface will be visible in the network adapter list by running get-netadapter.


 

Optional: Configure VLANs for the new vNICs which can be the same or different but IP them uniquely. If you don’t intend to tag VLANs in your cluster or have a flat network with one subnet, skip this step.

Set-VMNetworkAdapterVlan -VMNetworkAdapterName "SMB_1" -VlanId 00 -Access -ManagementOS

Set-VMNetworkAdapterVlan -VMNetworkAdapterName "SMB_2" -VlanId 00 -Access -ManagementOS

 

Verify your VLANs and vNICs are correct. Notice mine are all untagged since this demo is in a flat network.

Get-VMNetworkAdapterVlan -ManagementOS


 

Restart each vNIC to activate the VLAN assignment.

Restart-NetAdapter "vEthernet (SMB_1)"

Restart-NetAdapter "vEthernet (SMB_2)"

 

Enable RDMA on each vNIC.

Enable-NetAdapterRDMA "vEthernet (SMB_1)", "vEthernet (SMB_2)"

 

Next each vNIC should be tied directly to a preferential physical interface within the SET switch. In this example we have 2 vNICs and 2 physical NICs for a 1:1 mapping. Important to note: Although this operation essentially designates a vNIC assignment to a preferential pNIC, should the assigned pNIC fail, the SET switch will still load balance vNIC traffic across the surviving pNICs. It may not be immediately obvious that this is the resulting and expected behavior.

Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName "SMB_1" -ManagementOS -PhysicalNetAdapterName "Mell1"

Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName "SMB_2" -ManagementOS -PhysicalNetAdapterName "Mell2"

 

To quickly prove this, I have my vNIC “SMB_1” preferentially tied to the pNIC “Mell1”. SMB_1 has the IP address 10.50.88.82


 

Notice that even though I have manually disabled Mell1, the IP still responds to a ping from another host as SMB_1’s traffic is temporarily traversing Mell2:


 

Verify the RDMA capabilities of the new vNICs and associated physical interfaces. RDMA Capable should read true.

Get-SMBClientNetworkInterface


 

Build the cluster with a dedicated IP and a subnet mask in CIDR (slash) notation. The command defaults to /24 but may still fail unless the mask is explicitly specified.

New-Cluster -Name "Cluster Name" -Node Node1, Node2, etc -StaticAddress 0.0.0.0/24

 

Checking the report output stored in c:\windows\cluster\reports\, it flagged not having a suitable disk witness. This will auto-resolve later once the cluster is up.

 

The cluster will come up with no disks claimed, and if the disks carry any prior formatting, they must first be wiped and prepared. In PowerShell ISE, run the following script, entering your cluster name in the red text.

icm (Get-Cluster -Name S2DCluster | Get-ClusterNode) {
    Update-StorageProviderCache
    # Remove any existing (non-primordial) storage pools and their virtual disks
    Get-StoragePool | ? IsPrimordial -eq $false | Set-StoragePool -IsReadOnly:$false -ErrorAction SilentlyContinue
    Get-StoragePool | ? IsPrimordial -eq $false | Get-VirtualDisk | Remove-VirtualDisk -Confirm:$false -ErrorAction SilentlyContinue
    Get-StoragePool | ? IsPrimordial -eq $false | Remove-StoragePool -Confirm:$false -ErrorAction SilentlyContinue
    Get-PhysicalDisk | Reset-PhysicalDisk -ErrorAction SilentlyContinue
    # Wipe every non-boot, non-system disk that still has a partition style
    Get-Disk | ? Number -ne $null | ? IsBoot -ne $true | ? IsSystem -ne $true | ? PartitionStyle -ne RAW | % {
        $_ | Set-Disk -isoffline:$false
        $_ | Set-Disk -isreadonly:$false
        $_ | Clear-Disk -RemoveData -RemoveOEM -Confirm:$false
        $_ | Set-Disk -isreadonly:$true
        $_ | Set-Disk -isoffline:$true
    }
    # Summarize the now-RAW disks per node
    Get-Disk | ? Number -ne $null | ? IsBoot -ne $true | ? IsSystem -ne $true | ? PartitionStyle -eq RAW | Group -NoElement -Property FriendlyName
} | Sort -Property PsComputerName, Count

Once successfully completed, you will see an output with all nodes and all disk types accounted for.

 

Finally, it’s time to enable S2D. Run the following command and select “yes to all” when prompted. Make sure to use your cluster name in red.

Enable-ClusterStorageSpacesDirect -CimSession S2DCluster

 

Storage Configuration

Once S2D is successfully enabled, there will be a new storage pool created and visible within Failover Cluster Manager. The next step is to create volumes within this pool. S2D will make some resiliency choices for you depending on how many nodes are in your cluster: 2 nodes = 2-way mirroring, 3 nodes = 3-way mirroring; with 4 or more nodes you can specify mirror or parity. When using a hybrid configuration of 2 disk types (HDD and SSD), the volumes reside on the HDDs, as the SSDs simply provide caching for reads and writes. In an all-flash configuration, only the writes are cached. Cache drive bindings are automatic and will adjust based on the number of each disk type in place. In my case, I have 4 SSDs + 5 HDDs per host. This nets a 1:1 cache:capacity mapping for three of the SSD/HDD pairs and a 1:2 mapping for the remaining SSD and two HDDs. Microsoft’s recommendation is to make the number of capacity drives a multiple of the number of cache drives, for simple symmetry. If a host experiences a cache drive failure, the cache-to-capacity mapping will readjust to heal. This is why a minimum of 2 cache drives per host is recommended.
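
To see how the cache and capacity devices landed after S2D was enabled, a quick check of the pool’s physical disks works (a sketch, assuming the default S2D* pool name; in a hybrid setup the cache devices typically report their Usage as Journal):

Get-StoragePool S2D* | Get-PhysicalDisk | Group-Object MediaType, Usage -NoElement | Sort-Object Name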

 

Volumes can be created using PowerShell or Failover Cluster Manager by selecting the “new disk” option. This one simple PowerShell command does three things: creates the virtual disk, places a new volume on it and makes it a cluster shared volume. For PowerShell the syntax is as follows:

New-Volume -FriendlyName "Volume Name" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -Size xTB

 

CSVFS_ReFS is recommended but CSVFS_NTFS may also be used. Once created these disks will be visible under the Disks selection within Failover Cluster Manager. Disk creation within the GUI is a much longer process but gives meaningful information along the way, such as showing that these disks are created using the capacity media (HDD) and that the 3-server default 3-way mirror is being used.

 

The UI also shows us the remaining pool capacity and that storage tiering is enabled.

 

Once the virtual disk is created, next we need to create a volume within it using another wizard. Select the vDisk created in the last step:

 

Storage tiering dictates that the volume size must match the size of the vDisk:

 

Skip the assignment of a drive letter, assign a label if you desire, confirm and create.

 

 

The final step is to add this new disk to a Cluster Shared Volume via Failover Cluster Manager. You’ll notice that the owner node will be automatically assigned:

 

Repeat this process to create as many volumes as required.

 

RDS Deployment

Install the required RDS roles and features and build a collection. See this post for more information on RDS installation and configuration. There is nothing terribly special that changes this process for an S2D cluster. As far as RDS is concerned, this is an ordinary Failover Cluster with Cluster Shared Volumes. The fact that RDS is running on S2D and “HCI” is truly inconsequential.

As long as the RDVH or RDSH roles are properly installed and enabled within the RDS deployment, the RDCB will be able to deploy and broker connections to them. One of the most important configuration items is ensuring that all servers, virtual and physical, that participate in your RDS deployment, are listed in the Servers tab. RDS is reliant on Server Manager and Server Manager has to know who the players are. This is how you tell it.


 

To use the S2D volumes for RDS, point your VM deployments at the proper CSVs within the C:\ClusterStorage paths. When building your collection, make sure to have the target folder already created within the CSV or collection creation will fail. Per the example below, the folder “Pooled1” needed to be manually created prior to collection creation.
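
This folder can be created ahead of time from any node with a one-liner (using the Volume1/Pooled1 example above):

New-Item -ItemType Directory -Path C:\ClusterStorage\Volume1\Pooled1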

 

During provisioning, you can select a specific balance of VMs to be deployed across the RDVH-enabled hosts you select, as can be seen below.

 

RDS is cluster-aware in that the VMs it creates can be made highly available within the cluster, and the RD Connection Broker (RDCB) remains aware of which host is running which VMs should they move. In the event of a failure, VMs will be moved to a surviving host by the cluster service and the RDCB will keep track.

 

Part 1: Intro and Architecture

Part 2: Prep & Configuration (you are here)

Part 3: Performance & Troubleshooting

 

Resources

S2D in Server 2016

S2D Overview

Working with volumes in S2D

Cache in S2D
