This content originally appeared on DEV Community 👩‍💻👨‍💻 and was authored by Ruslan Kh.
Maintaining a server's health is crucial for ensuring smooth operation of a business or organization's IT infrastructure. One of the tasks involved in maintaining a server's health is replacing failed or failing disks. However, identifying which disk needs to be replaced can be a difficult and dangerous task if the correct procedures are not followed.
It is often necessary to identify disks for a number of reasons, including replacing a failed disk, searching for a specific disk, or upgrading to faster or larger disks.
When many servers are spread across different data centers, you cannot always replace a disk yourself. In that case the replacement is performed by data center engineers, who must be told exactly which drive to replace, along with the rack and the unit in which the server is located.
To reduce the risks, it is important to have a clear method of indicating which drive needs to be replaced. This can be done using platform tools such as disk LED indicators.
In this article, we will discuss the importance of highlighting the disk that needs to be replaced, the difficulties involved in identifying the disk in the server, and the dangers of disconnecting the wrong disk. We will also provide an overview of the different methods available for highlighting the disk that needs to be replaced, so that administrators can make informed decisions about which solution best suits their needs.
Why is disk identification important?
In many cases, servers are configured with RAID arrays, which provide redundancy and improve performance. If a disk fails, it is essential to identify the correct disk to replace in order to maintain the integrity of the system. When upgrading disks, it is also important to ensure that the correct disk is targeted to avoid any potential data loss.
Risks of incorrectly chosen disks.
If the wrong disk is identified, there is a high risk of data loss and system damage, particularly in the case of RAID1 arrays. Before performing any disk-related operations, it is important to take precautions to prevent such risks.
In this article, we will cover two popular server systems: Huawei and Supermicro, and provide a guide on how to identify disks in both systems.
Before performing any disk-related operations, check the contents of every locate file and set its value to 0, so that the only indicator lit afterwards is the one for the disk you intend to work on.
find /sys -name 'locate' | while IFS= read -r i; do echo 0 > "$i"; done
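Before resetting, it can help to see which locate indicators are currently on. The following is a minimal sketch, assuming the locate attributes are readable regular files under /sys. The function name list_lit_locate and its optional root-directory argument are hypothetical, introduced here only for illustration.

```shell
# Hypothetical helper: print every locate file whose value is not 0.
# The optional first argument overrides the search root (default: /sys).
list_lit_locate() {
    root=${1:-/sys}
    # Read paths line by line so that names such as "Slot 08" stay intact.
    find "$root" -name locate -type f 2>/dev/null | while IFS= read -r f; do
        val=$(cat "$f" 2>/dev/null) || continue
        if [ "$val" != "0" ]; then
            printf '%s=%s\n' "$f" "$val"
        fi
    done
}
```

Calling list_lit_locate with no arguments searches /sys. Empty output means no locate indicator is currently set.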
Huawei.
Disk Identification on Huawei servers.
We will assume that the failed disk is /dev/sdaj.
To identify a disk in a Huawei server, you can use the following steps:
- Access the system shell
- Locate the disk:
> ls /sys/class/enclosure/*/*/device/block/sdaj <TAB>
- Use the TAB key to autocomplete the device name, for example:
> ls /sys/class/enclosure/0\:0\:38\:0/ArrayDevice23/device/block/sdaj/
- To find the location of the ArrayDevice, run the following command:
> find /sys -name "locate" | grep ArrayDevice23
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:0/expander-0:0/port-0:0:38/end_device-0:0:38/target0:0:38/0:0:38:0/enclosure/0:0:38:0/ArrayDevice23/locate
- Turn on the UID LED:
> echo 1 > '/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:0/expander-0:0/port-0:0:38/end_device-0:0:38/target0:0:38/0:0:38:0/enclosure/0:0:38:0/ArrayDevice23/locate'
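The ls and find steps above can be combined into one lookup. The sketch below is one possible way to do that, assuming the /sys/class/enclosure/<enclosure>/<slot>/device/block/<disk> layout shown in the steps. locate_path_for is a hypothetical helper name, and its second argument exists only so the function can be tried against a test directory.

```shell
# Hypothetical helper: print the locate file for a block device name
# such as "sdaj". Assumes the sysfs enclosure layout shown above.
locate_path_for() {
    dev=$1
    class_root=${2:-/sys/class/enclosure}
    for d in "$class_root"/*/*/device/block/"$dev"; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$d" ] || continue
        # Drop the trailing /device/block/<dev> to get the slot directory.
        slot_dir=${d%/device/block/"$dev"}
        printf '%s\n' "$slot_dir/locate"
    done
}
```

For example, locate_path_for sdaj prints a path under /sys/class/enclosure. The entries in that directory are symbolic links into /sys/devices, so the printed path refers to the same locate file as the longer path returned by find.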
Supermicro.
Disk Identification on Supermicro servers.
When using Supermicro, I came across two ways to identify disks. The only difference is the presence of a space in the slot name.
Method #1
We will assume that the failed disk is /dev/sda.
- Access the system shell
- Locate the disk:
> ls /sys/class/enclosure/*/*/device/block/sda <TAB>
- Use the TAB key to autocomplete the device name, for example:
> ls /sys/class/enclosure/1\:0\:13\:0/Slot00/device/block/sda/
- To find the locate file for the slot, run the following command:
> find /sys -name "locate" | grep Slot00
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host1/port-1:1/expander-1:1/port-1:1:11/end_device-1:1:11/target1:0:25/1:0:25:0/enclosure/1:0:25:0/Slot00/locate
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host1/port-1:0/expander-1:0/port-1:0:13/end_device-1:0:13/target1:0:13/1:0:13:0/enclosure/1:0:13:0/Slot00/locate
- Turn on the UID LED, using the path whose enclosure ID (1:0:13:0) matches the one found in the ls step:
> echo 1 > '/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host1/port-1:0/expander-1:0/port-1:0:13/end_device-1:0:13/target1:0:13/1:0:13:0/enclosure/1:0:13:0/Slot00/locate'
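The find output above lists two Slot00 entries, one per enclosure. The sketch below narrows such a list to a single enclosure. It assumes that the enclosure ID shown by the ls step (1:0:13:0 in this example) is the enclosure that holds the disk. match_enclosure is a hypothetical name used only for illustration.

```shell
# Hypothetical filter: read locate paths on stdin and keep only those
# that sit under the given enclosure ID, for example "1:0:13:0".
match_enclosure() {
    enc_id=$1
    while IFS= read -r p; do
        case $p in
            */enclosure/"$enc_id"/*) printf '%s\n' "$p" ;;
        esac
    done
}
```

A possible use is find /sys -name locate | grep Slot00 | match_enclosure 1:0:13:0, which leaves only the paths below that enclosure.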
Method #2
We will assume that the failed disk is /dev/sdh.
- Access the system shell
- Locate the disk:
> ls /sys/class/enclosure/*/*/device/block/sdh/
- Use the TAB key to autocomplete the device name, for example:
> ls /sys/class/enclosure/0\:0\:25\:0/Slot\ 08/device/block/sdh/
- To find the locate file for the slot, run the following command:
> find /sys -name "locate" | grep Slot\ 08
/sys/devices/pci0000:00/0000:00:02.2/0000:03:00.0/host0/port-0:1/expander-0:1/port-0:1:0/end_device-0:1:0/target0:0:25/0:0:25:0/enclosure/0:0:25:0/Slot 08/locate
/sys/devices/pci0000:00/0000:00:02.2/0000:03:00.0/host0/port-0:0/expander-0:0/port-0:0:24/end_device-0:0:24/target0:0:24/0:0:24:0/enclosure/0:0:24:0/Slot 08/locate
- Turn on the UID LED, using the path whose enclosure ID (0:0:25:0) matches the one found in the ls step:
> echo 1 > '/sys/devices/pci0000:00/0000:00:02.2/0000:03:00.0/host0/port-0:1/expander-0:1/port-0:1:0/end_device-0:1:0/target0:0:25/0:0:25:0/enclosure/0:0:25:0/Slot 08/locate'
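Slot names that contain a space are easy to mistype, so it can be worth wrapping the write in a small function that quotes the path and reads the value back. This is a minimal sketch. set_locate is a hypothetical helper, and the read-back check assumes the locate file reports the value that was just written.

```shell
# Hypothetical helper: write 0 or 1 to a locate file and verify the result.
# Usage: set_locate '/path/with/Slot 08/locate' 1
set_locate() {
    path=$1
    val=$2
    case $val in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 2 ;;
    esac
    [ -w "$path" ] || { echo "not writable: $path" >&2; return 1; }
    # The quotes keep a path such as ".../Slot 08/locate" as one argument.
    printf '%s\n' "$val" > "$path" || return 1
    # Succeed only if the file now holds the requested value.
    [ "$(cat "$path")" = "$val" ]
}
```

Calling the same function with 0 switches the indicator off again once the disk has been replaced.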
In some cases, identifying a disk in a server requires a disk that is compatible with a controller able to analyze errors and light a fault indicator, usually red. The process may differ depending on the platform, so it is recommended to consult the user manual or the vendor's documentation for specific instructions.
In exceptional cases where there is no UID LED indication, you can use less reliable methods, for example loading the drive with reads, which causes a continuous green LED indication.
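One way to generate such a read load is dd with a time limit. The sketch below is only an illustration: read_load is a hypothetical helper, the 30-second default is an arbitrary choice, and it assumes the data being read is not already in the page cache, since cached reads never reach the disk.

```shell
# Hypothetical helper: read from a device for a limited time so that
# its activity LED stays lit. Usage: read_load /dev/sdX 60
read_load() {
    dev=$1
    secs=${2:-30}
    [ -r "$dev" ] || { echo "cannot read: $dev" >&2; return 1; }
    rc=0
    # timeout stops dd after $secs seconds; the data itself is discarded.
    timeout "$secs" dd if="$dev" of=/dev/null bs=1M 2>/dev/null || rc=$?
    # 0 means dd reached the end of the device, 124 means timeout stopped it.
    [ "$rc" -eq 0 ] || [ "$rc" -eq 124 ]
}
```

Only reads are issued, so the contents of the disk are not modified. Other disks in the same array may also show activity, which is one reason this method is less reliable than a locate indicator.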
In conclusion, disk identification is an important aspect of server management that must be performed with care to prevent data loss and system damage. By following the steps outlined in this guide, you can identify disks in Huawei and Supermicro servers with confidence. I hope my experience will help you to identify disks when it is not possible to enable disk UID indication using IPMI.
Ruslan Kh. | Sciencx (2023-02-05T17:58:40+00:00) Identifying Disks in Dedicated Servers. Retrieved from https://www.scien.cx/2023/02/05/identifying-disks-in-dedicated-servers/