SPARC Logical Domains: Alternate Service Domains Part 1
In this series, we will go over configuring alternate I/O and service domains, with the goal of increasing the serviceability of SPARC T-Series servers without impacting other domains on the hypervisor. In essence, this enables rolling maintenance without having to rely on live migration or downtime. It is important to note that this is not a cure-all: base firmware updates, for example, are still disruptive, but minor firmware updates, such as those for disks and I/O cards, should be able to be rolled.
In Part One we will go through the initial Logical Domain configuration and map out the devices we have, deciding whether each will belong to the primary or the alternate domain.
In Part Two we will go through the process of creating the alternate domain and assigning the devices to it, thus making it independent of the primary domain.
In Part Three we will create redundant services to support our Logical Domains as well as create a test Logical Domain to utilize these services.
Initial Logical Domain Configuration
I am going to assume that your configuration is currently at the factory default and that you, like me, are using Solaris 11.2 on the hypervisor.
# ldm ls
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
primary active -n-cv- UART 256 511G 0.4% 0.3% 6h 24m
The first thing we need to do is remove some of the resources from the primary domain so that we can assign them to other domains. Since the primary domain is currently active and using these resources, we will enable delayed reconfiguration mode. This mode accepts all changes and then applies them when that domain reboots (in this case primary, which is the control domain, or the physical machine).
# ldm start-reconf primary
Initiating a delayed reconfiguration operation on the primary domain.
All configuration changes for other domains are disabled until the primary
domain reboots, at which time the new configuration for the primary domain
will also take effect.
Now we can start reclaiming some of those resources. I will assign 2 cores (16 virtual CPUs, since each core on this machine presents 8 threads) and 16GB of RAM to the primary domain.
# ldm set-vcpu 16 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
# ldm set-memory 16G primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
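As a quick check on the numbers, here is a minimal sketch of the arithmetic behind the set-vcpu value. It assumes 8 hardware threads per core, which fits the 256 VCPUs that ldm ls reported on this machine; the variable names are only illustrative, so adjust threads_per_core for other hardware.

```shell
# Assumption: each core presents 8 hardware threads (virtual CPUs).
threads_per_core=8
cores=2

# ldm set-vcpu counts virtual CPUs, not cores, so convert first.
vcpus=$((cores * threads_per_core))

# Print the command rather than running it, so this is safe anywhere.
echo "ldm set-vcpu ${vcpus} primary"
```

This only prints the command it would suggest; nothing is changed on the system.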
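I like to save configurations often when making a lot of changes. The add-config subcommand stores the current configuration on the service processor under the name we give it.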
# ldm add-config reduced-resources
Next we will need some services to allow us to provision disks to domains and to connect to the console of domains for the purposes of installation or administration.
# ldm add-vdiskserver primary-vds0 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
# ldm add-vconscon port-range=5000-5100 primary-vcc0 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
Let’s add another configuration to bookmark our progress.
# ldm add-config initial-services
We need to enable the Virtual Network Terminal Server service (vntsd), which allows us to telnet from the control domain into the consoles of the other domains.
# svcadm enable vntsd
Finally, a reboot will put everything into action.
# reboot
When the system comes back up we should see a drastically different LDM configuration.
Identify PCI Root Complexes
All the T5-2s that I have looked at have been laid out the same. In the listing below, the SAS HBAs and onboard NICs sit on pci_0 and pci_3 (along with one PCI slot each), while the remaining PCI slots hang off pci_1 and pci_2. So, to give each domain its own SAS HBA and onboard NIC, pci_0 and pci_2 stay with the primary, while pci_1 and pci_3 go to the alternate. However, so that you understand how we know this, I will walk you through identifying the root complexes as well as the discrete types of devices.
# ldm ls -l -o physio primary

NAME
primary

IO
DEVICE PSEUDONYM OPTIONS
pci@340 pci_1
pci@300 pci_0
pci@3c0 pci_3
pci@380 pci_2
pci@340/pci@1/pci@0/pci@4 /SYS/MB/PCIE5
pci@340/pci@1/pci@0/pci@5 /SYS/MB/PCIE6
pci@340/pci@1/pci@0/pci@6 /SYS/MB/PCIE7
pci@300/pci@1/pci@0/pci@4 /SYS/MB/PCIE1
pci@300/pci@1/pci@0/pci@2 /SYS/MB/SASHBA0
pci@300/pci@1/pci@0/pci@1 /SYS/MB/NET0
pci@3c0/pci@1/pci@0/pci@7 /SYS/MB/PCIE8
pci@3c0/pci@1/pci@0/pci@2 /SYS/MB/SASHBA1
pci@3c0/pci@1/pci@0/pci@1 /SYS/MB/NET2
pci@380/pci@1/pci@0/pci@5 /SYS/MB/PCIE2
pci@380/pci@1/pci@0/pci@6 /SYS/MB/PCIE3
pci@380/pci@1/pci@0/pci@7 /SYS/MB/PCIE4
This shows us that pci@300 = pci_0, pci@340 = pci_1, pci@380 = pci_2, and pci@3c0 = pci_3.
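If you would rather not pick the root complexes out by eye, the following is a rough sketch that filters the physio listing down to just those rows. The function name list_root_complexes is my own invention, and it assumes the root complex rows always look like a bare pci@NNN followed by a pci_N pseudonym, as they do in the listing above.

```shell
# Print "bus-address pseudonym" for each root complex found on stdin.
# A root complex row is a bare pci@NNN device with a pci_N pseudonym;
# slot and onboard device rows have longer paths and are skipped.
list_root_complexes() {
    awk '$1 ~ /^pci@[0-9a-f]+$/ && $2 ~ /^pci_[0-9]+$/ { print $1, $2 }' | sort
}

# On the control domain this would be fed from the real command:
#   ldm ls -l -o physio primary | list_root_complexes
# Here a few rows copied from the listing above stand in for it.
list_root_complexes <<'EOF'
DEVICE PSEUDONYM OPTIONS
pci@340 pci_1
pci@300 pci_0
pci@3c0 pci_3
pci@380 pci_2
pci@340/pci@1/pci@0/pci@4 /SYS/MB/PCIE5
pci@300/pci@1/pci@0/pci@2 /SYS/MB/SASHBA0
EOF
```

The sort is only there to make the output easier to read; the mapping itself comes straight from the first two columns.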
Map Local Disk Devices To PCI Root
First we need to determine which disk devices are in the zpool, so that we know which ones cannot be removed.
# zpool status rpool
pool: rpool
state: ONLINE
scan: resilvered 70.3G in 0h8m with 0 errors on Fri Feb 21 05:56:34 2014
config:

NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c0t5000CCA04385ED60d0 ONLINE 0 0 0
c0t5000CCA0438568F0d0 ONLINE 0 0 0

errors: No known data errors
Next we must use mpathadm to find the Initiator Port Name. To do that we must look at slice 0 of c0t5000CCA04385ED60d0.
# mpathadm show lu /dev/rdsk/c0t5000CCA04385ED60d0s0
Logical Unit: /dev/rdsk/c0t5000CCA04385ED60d0s2
mpath-support: libmpscsi_vhci.so
Vendor: HITACHI
Product: H109060SESUN600G
Revision: A606
Name Type: unknown type
Name: 5000cca04385ed60
Asymmetric: no
Current Load Balance: round-robin
Logical Unit Group ID: NA
Auto Failback: on
Auto Probing: NA

Paths:
Initiator Port Name: w5080020001940698
Target Port Name: w5000cca04385ed61
Override Path: NA
Path State: OK
Disabled: no

Target Ports:
Name: w5000cca04385ed61
Relative ID: 0
Our output shows us that the initiator port is w5080020001940698.
# mpathadm show initiator-port w5080020001940698
Initiator Port: w5080020001940698
Transport Type: unknown
OS Device File: /devices/pci@300/pci@1/pci@0/pci@2/scsi@0/iport@1
Initiator Port: w5080020001940698
Transport Type: unknown
OS Device File: /devices/pci@300/pci@1/pci@0/pci@2/scsi@0/iport@2
Initiator Port: w5080020001940698
Transport Type: unknown
OS Device File: /devices/pci@300/pci@1/pci@0/pci@2/scsi@0/iport@8
Initiator Port: w5080020001940698
Transport Type: unknown
OS Device File: /devices/pci@300/pci@1/pci@0/pci@2/scsi@0/iport@4
So we can see that this particular disk is on pci@300, which is pci_0.
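With several iport entries in the output, it can help to boil them down to the distinct root complexes they sit on. Here is a small sketch that does that; initiator_roots is a made-up helper name, and it assumes every path of interest appears on an "OS Device File:" line beginning with /devices/pci@, as in the output above.

```shell
# Read `mpathadm show initiator-port` output on stdin and print each
# distinct root-complex bus address found in the OS Device File lines.
initiator_roots() {
    sed -n 's|^.*OS Device File: */devices/\(pci@[0-9a-f]*\)/.*$|\1|p' | sort -u
}

# Real usage on the control domain would look like:
#   mpathadm show initiator-port w5080020001940698 | initiator_roots
# Two entries copied from the output above stand in for it here.
initiator_roots <<'EOF'
Initiator Port: w5080020001940698
Transport Type: unknown
OS Device File: /devices/pci@300/pci@1/pci@0/pci@2/scsi@0/iport@1
Initiator Port: w5080020001940698
Transport Type: unknown
OS Device File: /devices/pci@300/pci@1/pci@0/pci@2/scsi@0/iport@2
EOF
```

If more than one bus address comes back, the initiator spans root complexes and deserves a closer look before anything is moved.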
Map Ethernet Cards To PCI Root
First we must determine the underlying device for each of our network interfaces.
# dladm show-phys net0
LINK MEDIA STATE SPEED DUPLEX DEVICE
net0 Ethernet up 10000 full ixgbe0
In this case it is ixgbe0. We can then look at the device tree to see where it points and find which PCI root this device is connected to.
# ls -l /dev/ixgbe0
lrwxrwxrwx 1 root root 53 Feb 12 2014 /dev/ixgbe0 -> ../devices/pci@300/pci@1/pci@0/pci@1/network@0:ixgbe0
Now we can see that it is using pci@300, which translates into pci_0.
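The two hops here (symlink target to bus address, bus address to pseudonym) can be strung together. The sketch below does so for a single line of ls -l output; the helper names pseudonym_of and link_to_pseudonym are hypothetical, and the bus-to-pseudonym table is hard-coded from the physio listing on my machine, so it would need adjusting elsewhere.

```shell
# Map a root-complex bus address to its ldm pseudonym.
# Assumption: the table matches the physio listing shown earlier.
pseudonym_of() {
    case "$1" in
        pci@300) echo pci_0 ;;
        pci@340) echo pci_1 ;;
        pci@380) echo pci_2 ;;
        pci@3c0) echo pci_3 ;;
        *)       echo unknown ;;
    esac
}

# Take one line of `ls -l /dev/<device>` output and report the pseudonym.
link_to_pseudonym() {
    # The symlink target is the last whitespace-separated field.
    target=$(printf '%s\n' "$1" | awk '{ print $NF }')
    # The root complex is the first path component after /devices/.
    bus=$(printf '%s\n' "$target" | sed -n 's|^.*/devices/\(pci@[0-9a-f]*\)/.*$|\1|p')
    pseudonym_of "$bus"
}

link_to_pseudonym 'lrwxrwxrwx 1 root root 53 Feb 12 2014 /dev/ixgbe0 -> ../devices/pci@300/pci@1/pci@0/pci@1/network@0:ixgbe0'
```

The same helper works for any device node whose symlink points under /devices, so it is not limited to Ethernet.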
Map Infiniband Cards to PCI Root
Again, let’s determine the underlying device name of our InfiniBand interfaces. On my machine they defaulted to net2 and net3; however, I had previously renamed the links to ib0 and ib1 for simplicity. This procedure is very similar to the one for Ethernet cards.
# dladm show-phys ib0
LINK MEDIA STATE SPEED DUPLEX DEVICE
ib0 Infiniband up 32000 unknown ibp0
In this case our device is ibp0. So now we just check the device tree.
# ls -l /dev/ibp0
lrwxrwxrwx 1 root root 83 Nov 26 07:17 /dev/ibp0 -> ../devices/pci@380/pci@1/pci@0/pci@5/pciex15b3,673c@0/hermon@0/ibport@1,0,ipib:ibp0
We can see from the path that this is using pci@380, which is pci_2.
Map Fibre Channel Cards to PCI Root
Now perhaps we need to split up some Fibre Channel HBAs as well. The first thing we must do is look at the cards themselves.
# luxadm -e port
/devices/pci@300/pci@1/pci@0/pci@4/SUNW,qlc@0/fp@0,0:devctl NOT CONNECTED
/devices/pci@300/pci@1/pci@0/pci@4/SUNW,qlc@0,1/fp@0,0:devctl NOT CONNECTED
We can see here that these use pci@300 which is pci_0.
The Plan
Basically, we are going to split our PCI devices by even and odd root complex, with even staying with the primary and odd going to the alternate. On the T5-2, this will result in the PCI-E cards on the left side belonging to the primary, and the cards on the right to the alternate.
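To make the even and odd rule concrete, here is a tiny sketch that applies it to the four pseudonyms. The function name domain_for and the label "alternate" are placeholders of mine; the alternate domain has not been created or named yet.

```shell
# Decide which domain a root complex goes to under the even/odd plan.
# Assumption: pseudonyms have the form pci_N; "alternate" is a placeholder name.
domain_for() {
    n=${1#pci_}                      # strip the prefix, leaving the number
    if [ $((n % 2)) -eq 0 ]; then
        echo primary                 # even root complexes stay put
    else
        echo alternate               # odd root complexes move
    fi
}

for rc in pci_0 pci_1 pci_2 pci_3; do
    echo "$rc -> $(domain_for "$rc")"
done
```

Under this rule pci_0 and pci_2 stay with the primary, and pci_1 and pci_3 go to the alternate.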
Here is a diagram of how the physical devices are mapped to PCI Root Complexes.
FIGURE 1.1 – Oracle SPARC T5-2 Front View
FIGURE 1.2 – Oracle SPARC T5-2 Rear View
References
SPARC T5-2 I/O Root Complex Connections – https://docs.oracle.com/cd/E28853_01/html/E28854/pftsm.z40005601508415.html
SPARC T5-2 Front Panel Connections – https://docs.oracle.com/cd/E28853_01/html/E28854/pftsm.bbgcddce.html#scrolltoc
SPARC T5-2 Rear Panel Connections – https://docs.oracle.com/cd/E28853_01/html/E28854/pftsm.bbgdeaei.html#scrolltoc