RAID arrays are fundamentally about trading drive cost and capacity for increased performance or redundancy, or both.
Let’s say you’re setting up a new server and want to use four 2TB SATA drives. You want to balance speed and data safety.
# Example: Creating a RAID 10 array with mdadm on Linux
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
Here’s how that works under the hood:
- RAID 0 (Striping): Data is split across multiple drives. If you have two drives, the first block goes to drive A, the second to drive B, the third to A, the fourth to B, and so on. This can roughly double sequential read/write throughput, because each drive handles only half the I/O. The catch? If any drive fails, all data is lost. It’s pure speed, zero redundancy.
- RAID 1 (Mirroring): Data is written identically to two drives. Drive A gets the data, and Drive B gets the exact same data. This provides excellent redundancy – if one drive dies, the other has a perfect copy. However, you only get the capacity of one drive (e.g., two 2TB drives give you 2TB usable space), and write performance is often slightly slower because every write must go to both drives. Read performance can improve, since reads can be served from either drive.
- RAID 5 (Striping with Parity): This is a popular balance. Data is striped across drives, and one drive’s worth of capacity is used for parity information (the parity blocks are rotated across all drives rather than living on a dedicated parity disk, which is what RAID 4 does). Parity is like a checksum that allows the system to reconstruct data if one drive fails. So, with four 2TB drives, you get 6TB usable space (4 drives - 1 drive’s worth of parity = 3 drives × 2TB). It offers good read speeds and decent write speeds, and can tolerate one drive failure. However, rebuilds can be slow and I/O intensive, and it’s vulnerable if a second drive fails during a rebuild.
- RAID 6 (Striping with Double Parity): Similar to RAID 5, but uses two drives’ worth of capacity for parity. With four 2TB drives, you’d get 4TB usable space (4 drives - 2 drives for parity = 2 drives * 2TB). This allows the array to tolerate two drive failures, making it much more robust, especially for large arrays or when rebuild times are long. Write performance is a bit slower than RAID 5 due to the extra parity calculation.
- RAID 10 (1+0 - Mirrored Stripes): This combines RAID 0 and RAID 1. You create pairs of mirrored drives (RAID 1), and then you stripe across those pairs (RAID 0). With four 2TB drives, you’d create two RAID 1 pairs (each mirrored pair of 2TB drives yields 2TB usable), then stripe across the two pairs, giving you 4TB usable space. This offers the performance benefits of RAID 0 and the redundancy of RAID 1. It can tolerate multiple drive failures, as long as both drives in the same mirror pair don’t fail. It’s generally considered the best of both worlds for performance and redundancy, but it’s the most expensive in terms of usable capacity.
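The parity trick that RAID 5 relies on can be sketched with plain XOR arithmetic. This is a toy illustration, not anything mdadm actually runs — each “block” is reduced to a single integer, and the values are arbitrary:

```shell
# Toy RAID 5 stripe: three data blocks, parity = XOR of all of them
d1=170; d2=204; d3=240
parity=$(( d1 ^ d2 ^ d3 ))

# Simulate losing the drive holding d2: XOR the surviving blocks with
# the parity block to reconstruct the missing data
rebuilt=$(( d1 ^ d3 ^ parity ))
echo "rebuilt=$rebuilt"   # prints rebuilt=204, matching the original d2
```

Because XOR is its own inverse, any single missing block in a stripe can be recovered from the others plus the parity — which is exactly why RAID 5 survives one drive failure but not two.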
The mdadm command is the standard tool on Linux for software RAID. The --create flag creates and starts the array. /dev/md0 is the name of the new RAID block device that will be created (analogous to /dev/sda). --level=10 specifies RAID 10. --raid-devices=4 tells mdadm how many drives to use for this array. Finally, /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 are the partitions that will form the array. You’d typically use either entire drives (/dev/sdb, /dev/sdc, etc.) or one dedicated partition on each of four separate drives — putting all four members on partitions of a single physical disk would defeat both the redundancy and the performance gains.
For RAID 10 with the default (near) layout, mdadm pairs up consecutive devices for mirroring and then stripes across those pairs. mdadm may ask for confirmation before proceeding — for example, if it detects an existing filesystem or old array metadata on one of the devices. Once created, you’d format and mount /dev/md0 like any other drive.
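The typical follow-up steps look something like this (illustrative commands, run as root; the mdadm.conf path shown is the Debian/Ubuntu convention — other distributions use /etc/mdadm.conf):

```shell
mdadm --detail /dev/md0                          # check array state and initial sync progress
mkfs.ext4 /dev/md0                               # put a filesystem on the array
mkdir -p /mnt/raid
mount /dev/md0 /mnt/raid
mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # persist the array definition across reboots
```

The last step matters: without a saved definition, the array may come up under a different name (such as /dev/md127) after a reboot.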
With mdadm’s default layout, the physical arrangement of this RAID 10 array is that drives 1 and 2 form one mirror, drives 3 and 4 form another mirror, and data is striped across these two mirror pairs. This means if drive 1 fails, its mirror (drive 2) takes over, and the array is degraded but functional. If drive 3 fails, drive 4 takes over. If drives 1 and 3 both fail, the array is still functional. But if drives 1 and 2 both fail, the entire array is lost.
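That failure logic is simple enough to check with a few lines of shell. This is a sketch assuming the mirror pairs are (1,2) and (3,4), as in the layout described above:

```shell
# Succeeds if the RAID 10 array survives the given set of failed drives.
# Mirrors are (1,2) and (3,4); the array dies only if a whole pair is lost.
survives() {
  local m1=2 m2=2 d          # surviving drives in each mirror pair
  for d in "$@"; do
    case $d in
      1|2) m1=$((m1 - 1)) ;;
      3|4) m2=$((m2 - 1)) ;;
    esac
  done
  [ "$m1" -ge 1 ] && [ "$m2" -ge 1 ]
}

survives 1   && echo "drive 1 down: degraded but OK"
survives 1 3 && echo "drives 1+3 down: degraded but OK"
survives 1 2 || echo "drives 1+2 down: array lost"
```

Note that of the six possible two-drive failure combinations, four are survivable and two (each complete mirror pair) are fatal — which is why RAID 10’s tolerance is usually described as “up to one failure per mirror” rather than a flat drive count.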
The most surprising thing about RAID parity is how computationally cheap it is: RAID 5 parity is a simple XOR across the data blocks, and RAID 6’s second parity, while based on more involved Galois-field arithmetic, is still fast on modern CPUs. The real cost of parity RAID is usually not the calculation itself but the extra reads and writes needed to update parity on small writes — and either way, the complexity is managed by the RAID controller or software.
One last note: mdadm accepts a --spare-devices=N option to designate hot spares at creation time. A RAID 5 array without a spare still tolerates a single drive failure, but it then runs degraded — with no remaining redundancy — until you manually attach a replacement with mdadm --add and the rebuild completes.