Failed Disk in x4500 and hdtool
Writing about web page http://download.oracle.com/docs/cd/E19962-01/820-1120-22/chapter2.html#d0e1462
Today we had a disk failure in one of the x4500s. These machines hold 48 500 GB SATA disk drives, and the units are what Oracle terms CRUs (Customer Replaceable Units). That, of course, means there has to be a straightforward and reliable way to identify the physical location of any given disk from the information given in the messages file. There is a very useful tool on the ‘x4500 Tools and Drivers CD’ called ‘hd’, or hdtool.
The Tools and Drivers CD can be downloaded from My Oracle Support (MOS): under Patches and Updates, search using “Product or Family”, set the product to search for to ‘x4500’, and you should find the Tools and Drivers CD in the list.
Once downloaded, unzip it into /var/spool/pkg and find the .iso contained within;
bash-3.00# unzip p10335199_160_Generic.zip
Archive: p10335199_160_Generic.zip
creating: Tools_and_Drivers/
inflating: Tools_and_Drivers/X4500_Tools_And_Drivers_common_47001.zip
inflating: Tools_and_Drivers/license_agreement1.html
inflating: Tools_and_Drivers/MD5SUM-SoftwareAndDocumentation.txt
inflating: Tools_and_Drivers/readme.html
inflating: Tools_and_Drivers/X4500_Tools_And_Drivers_linux_47001.tar.bz2
inflating: Tools_and_Drivers/X4500_Tools_And_Drivers_windows_47001.zip
inflating: Tools_and_Drivers/X4500_Tools_And_Drivers_read_me.html
inflating: Tools_and_Drivers/X4500_Tools_And_Drivers_solaris_47001.tar.bz2
inflating: Tools_and_Drivers/X4500_Tools_And_Drivers_CD_47001.iso
Lofi mount it;
bash-3.00# lofiadm -a /var/spool/pkg/Tools_and_Drivers/X4500_Tools_And_Drivers_CD_47001.iso
/dev/lofi/1
bash-3.00# mount -F hsfs /dev/lofi/1 /mnt
bash-3.00# cd /mnt
Then change into /mnt/solaris/tools/hdtool and pkgadd SUNWhd-1.07.pkg;
bash-3.00# pkgadd -d SUNWhd-1.07.pkg
The following packages are available:
1 SUNWhd Sun Fire X4500/X4540 Hard Disk Suite
(i386) 1.07
Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]:
Processing package instance <SUNWhd> from </mnt/solaris/tools/hdtool/SUNWhd-1.07.pkg>
Sun Fire X4500/X4540 Hard Disk Suite(i386) 1.07
Copyright 2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Using </opt> as the package base directory.
## Processing package information.
## Processing system information.
## Verifying package dependencies.
## Verifying disk space requirements.
## Checking for conflicts with packages already installed.
## Checking for setuid/setgid programs.
This package contains scripts which will be executed with super-user
permission during the process of installing this package.
Do you want to continue with the installation of <SUNWhd> [y,n,?] y
Installing Sun Fire X4500/X4540 Hard Disk Suite as <SUNWhd>
## Installing part 1 of 1.
/opt/SUNWhd/hd/bin/hd
/opt/SUNWhd/hd/bin/hd.html
/opt/SUNWhd/hd/bin/hdadm
/opt/SUNWhd/hd/bin/hdadm.html
/opt/SUNWhd/hd/bin/read_cache
/opt/SUNWhd/hd/bin/write_cache
[ verifying class <none> ]
## Executing postinstall script.
Installation of <SUNWhd> was successful.
bash-3.00#
You can now use this tool to view information regarding the disks in your system;
bash-3.00# ./hd -?
./hd: illegal option -- ?
Usage: hd [ -c(olor mode) ] [ -s(ummary) ] [ -p(latform) ]
[ -b(ypass) to print SunFireX4500 map ]
[ -d(iagnose)] [-f { syslog_file } ]
[ -m { adjacent | cross | front2back | diagonal } Mapping pairs ]
[ -w { <pci_disk_device_path> } ]
[ -a (fdisk pArtition type) ]
[ -q (list drive slot number in seQuential list) ]
[ -g (list drive slot number in seQuential list with temperature ) ]
[ -l (List SunFireX4500/X4540 available disk in physical orders) ]
[ -r (List SMART data for all disks in drive slot number) ]
[ -R (List SMART data's indivdual id in landscape view for all disks) ]
[ -e <cXtY> (List SMART data for specified disk) ]
[ -E <cXtY> (List raw hex SMART data for specified disk) ]
[ -j (List SunFireX4500/X4540 HBA controller numbers and pci nodes) ]
[ -T (List vtoc for all drives for SunFireX4500/X4540 platform) ]
[ -t (List vtoc for specified drives) ]
[ -i (List cXtY, sd# and PCI path) ]
[ -o (List LSI HBA#, Drive Target# and cXtY) ]
[ -x (Generate hd_map.html) ]
The HD map it generates includes, amongst other details, a simple way of showing the status of each drive:
++: Device is present and accessible.
Red: Device not enumerated or no drive in physical slot/location.
--: Device is not accessible, absent/empty or down.
For example, the following shows the failure of c4t4;
---------------------SunFireX4500------Rear----------------------------
36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47:
c3t3 c3t7 c2t3 c2t7 c5t3 c5t7 c4t3 c4t7 c1t3 c1t7 c0t3 c0t7
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35:
c3t2 c3t6 c2t2 c2t6 c5t2 c5t6 c4t2 c4t6 c1t2 c1t6 c0t2 c0t6
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23:
c3t1 c3t5 c2t1 c2t5 c5t1 c5t5 c4t1 c4t5 c1t1 c1t5 c0t1 c0t5
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11:
c3t0 c3t4 c2t0 c2t4 c5t0 c5t4 c4t0 c4t4 c1t0 c1t4 c0t0 c0t4
^b+ ^b+ ^++ ^++ ^++ ^++ ^++ ^-- ^++ ^++ ^++ ^++
-------*-----------*-SunFireX4500--*---Front-----*-----------*----------
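With 48 drives it is easy to miss a single ^-- marker by eye. The following is a minimal sketch, not part of the SUNWhd package, that reads a map like the one above on standard input and prints the slot number and device name of every drive marked ^--. The function name failed_slots is hypothetical, and it assumes the plain-text layout shown here (a row of slot numbers, then a row of cXtY names, then a row of status markers) with colour mode off. On Solaris 10 you may need to substitute nawk or /usr/xpg4/bin/awk for awk.

```shell
#!/bin/sh
# Hypothetical helper: read an hd map on stdin and report every drive
# whose status marker is ^-- (not accessible, absent/empty or down).
# Assumes rows arrive in the order: slot numbers, cXtY names, markers.
failed_slots() {
    awk '
        # Row of slot numbers, e.g. "0: 1: 2: ..."
        $1 ~ /^[0-9]+:$/        { for (i = 1; i <= NF; i++) slot[i] = $i; next }
        # Row of device names, e.g. "c3t0 c3t4 ..."
        $1 ~ /^c[0-9]+t[0-9]+$/ { for (i = 1; i <= NF; i++) dev[i] = $i; next }
        # Row of status markers, e.g. "^++ ^b+ ^--"
        substr($1, 1, 1) == "^" {
            for (i = 1; i <= NF; i++)
                if ($i == "^--")
                    # slot[i] + 0 drops the trailing colon ("7:" becomes 7)
                    print "slot " (slot[i] + 0) ": " dev[i]
        }
    '
}
```

Something like `/opt/SUNWhd/hd/bin/hd | failed_slots` would then list only the drives that need attention, assuming hd is run without the colour option.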
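More complete documentation can be found on the web page linked at the top of this entry. Running hd with no options prints a listing of every disk, followed by the map;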
bash-3.00# cd /opt/SUNWhd/hd/bin/
bash-3.00# ./hd
platform = Sun Fire X4500
Device Serial Vendor Model Rev Temperature
------ ------ ------ ----- ---- -----------
c0t4d0p0 F400P6G4ES5F ATA HITACHI HUA7250S A90A 26 C (78 F)
c0t3d0p0 F400P6G4XGYF ATA HITACHI HUA7250S A90A 32 C (89 F)
c5t6d0p0 F400P6G4VU4F ATA HITACHI HUA7250S A90A 30 C (86 F)
c5t1d0p0 F400P6G4LH8F ATA HITACHI HUA7250S A90A 28 C (82 F)
c1t4d0p0 F400P6G50N6F ATA HITACHI HUA7250S A90A 26 C (78 F)
c1t3d0p0 A570P6G4LSBF ATA HITACHI HUA7250S A90A 31 C (87 F)
c6t7d0p0 F400P6G4WKDF ATA HITACHI HUA7250S A90A 31 C (87 F)
c6t0d0p0 F500P6G4VJDF ATA HITACHI HUA7250S A90A 25 C (77 F)
c4t6d0p0 F400P6G4XX2F ATA HITACHI HUA7250S A90A 32 C (89 F)
c4t1d0p0 F400P6G4ZH6F ATA HITACHI HUA7250S A90A 30 C (86 F)
c3t5d0p0 F400P6G4RBWF ATA HITACHI HUA7250S A90A 29 C (84 F)
c3t2d0p0 F400P6G4X16F ATA HITACHI HUA7250S A90A 31 C (87 F)
c5t0d0p0 F400P6G0JWTF ATA HITACHI HUA7250S A90A 26 C (78 F)
c5t7d0p0 A570P6G4G25F ATA HITACHI HUA7250S A90A 31 C (87 F)
c0t2d0p0 F400P6G4UY0F ATA HITACHI HUA7250S A90A 30 C (86 F)
c0t5d0p0 F400P6G4KVDF ATA HITACHI HUA7250S A90A 30 C (86 F)
c3t3d0p0 A570P6G4LA4F ATA HITACHI HUA7250S A90A 32 C (89 F)
c3t4d0p0 F500P6G4XWYF ATA HITACHI HUA7250S A90A 26 C (78 F)
c4t0d0p0 F500P6G531WF ATA HITACHI HUA7250S A90A 27 C (80 F)
c4t7d0p0 A570P6G4R5VF ATA HITACHI HUA7250S A90A 32 C (89 F)
c6t1d0p0 F400P6G4WW1F ATA HITACHI HUA7250S A90A 28 C (82 F)
c6t6d0p0 F400P6G4XEHF ATA HITACHI HUA7250S A90A 30 C (86 F)
c1t2d0p0 F400P6G4Z54F ATA HITACHI HUA7250S A90A 30 C (86 F)
c1t5d0p0 F400P6G4SL0F ATA HITACHI HUA7250S A90A 28 C (82 F)
c3t7d0p0 F400P6G4MAZF ATA HITACHI HUA7250S A90A 32 C (89 F)
c3t0d0p0 F500P6G5538F ATA HITACHI HUA7250S A90A 27 C (80 F)
c4t4d0p0 F400P6G0JSGF ATA HITACHI HUA7250S A90A 28 C (82 F)
c4t3d0p0 F400P6G4YUUF ATA HITACHI HUA7250S A90A 34 C (93 F)
c6t5d0p0 F400P6G4SLEF ATA HITACHI HUA7250S A90A 27 C (80 F)
c6t2d0p0 F400P6G4Z4KF ATA HITACHI HUA7250S A90A 30 C (86 F)
c1t6d0p0 F400P6G4X5BF ATA HITACHI HUA7250S A90A 30 C (86 F)
c1t1d0p0 F400P6G0JYUF ATA HITACHI HUA7250S A90A 29 C (84 F)
c5t4d0p0 F400P6G42U3F ATA HITACHI HUA7250S A90A 26 C (78 F)
c5t3d0p0 F400P6G4X8KF ATA HITACHI HUA7250S A90A 31 C (87 F)
c0t6d0p0 F400P6G4X9XF ATA HITACHI HUA7250S A90A 32 C (89 F)
c0t1d0p0 F400P6G4KGYF ATA HITACHI HUA7250S A90A 29 C (84 F)
c1t0d0p0 F400P6G4XVSF ATA HITACHI HUA7250S A90A 25 C (77 F)
c1t7d0p0 F400P6G49X1F ATA HITACHI HUA7250S A90A 32 C (89 F)
c6t3d0p0 A570P6G4G5LF ATA HITACHI HUA7250S A90A 31 C (87 F)
c6t4d0p0 F400P6G4W3TF ATA HITACHI HUA7250S A90A 25 C (77 F)
c4t2d0p0 F400P6G4WK6F ATA HITACHI HUA7250S A90A 32 C (89 F)
c4t5d0p0 F400P6G4UHMF ATA HITACHI HUA7250S A90A 29 C (84 F)
c3t1d0p0 F400P6G0K0HF ATA HITACHI HUA7250S A90A 29 C (84 F)
c3t6d0p0 F400P6G4U62F ATA HITACHI HUA7250S A90A 30 C (86 F)
c0t0d0p0 F400P6G0JVZF ATA HITACHI HUA7250S A90A 26 C (78 F)
c0t7d0p0 F400P6G4X75F ATA HITACHI HUA7250S A90A 32 C (89 F)
c5t2d0p0 F400P6G4U47F ATA HITACHI HUA7250S A90A 30 C (86 F)
c5t5d0p0 F400P6G4N7LF ATA HITACHI HUA7250S A90A 27 C (80 F)
---------------------SunFireX4500------Rear----------------------------
36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47:
c4t3 c4t7 c3t3 c3t7 c6t3 c6t7 c5t3 c5t7 c1t3 c1t7 c0t3 c0t7
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35:
c4t2 c4t6 c3t2 c3t6 c6t2 c6t6 c5t2 c5t6 c1t2 c1t6 c0t2 c0t6
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23:
c4t1 c4t5 c3t1 c3t5 c6t1 c6t5 c5t1 c5t5 c1t1 c1t5 c0t1 c0t5
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11:
c4t0 c4t4 c3t0 c3t4 c6t0 c6t4 c5t0 c5t4 c1t0 c1t4 c0t0 c0t4
^b+ ^b+ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
-------*-----------*-SunFireX4500--*---Front-----*-----------*----------
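When logging a call it can be handy to have the serial number of the affected drive. As a rough sketch, assuming the listing format shown above (device name in the first column, serial number in the second), the hypothetical function below picks the serial for a given cXtY out of hd output supplied on standard input. It is not part of hdtool, and on Solaris 10 you may need nawk or /usr/xpg4/bin/awk in place of awk for the -v option.

```shell
#!/bin/sh
# Hypothetical helper: print the serial number for one disk.
# Usage: hd | disk_serial c4t4
# Assumes listing lines look like "c4t4d0p0 SERIAL VENDOR MODEL ...".
disk_serial() {
    # Match on "<cXtY>d" at the start of column 1, so that c4t4
    # cannot accidentally match a longer target such as c4t40.
    awk -v dev="$1" 'index($1, dev "d") == 1 { print $2 }'
}
```

For example, `/opt/SUNWhd/hd/bin/hd | disk_serial c4t4` would print the second column of the c4t4d0p0 line if the drive still responds; a drive that has dropped off the bus entirely may not appear in the listing at all.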
As for the disk failure, I have logged a call with Oracle and am waiting to hear back from dispatch. Fortunately, the disk that failed was being used as a spare drive in the main datapool. We have 4 other spares at the moment, so replacement isn’t urgent, although we are running with slightly lower resilience to failure in the meantime.
Paul.