GlusterFS is a distributed file system that aggregates disk storage resources from multiple servers into a single global namespace.

Let’s see it in action with a simple setup. Imagine we have two servers, gluster1 and gluster2, each with a directory we want to make available as part of a distributed volume.

On gluster1:

sudo mkdir -p /gluster/brick1
# No ownership change is needed: the packaged glusterd daemon runs as root
# Assuming GlusterFS is installed and the glusterd service is active
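If GlusterFS isn’t installed yet, the setup on Debian/Ubuntu looks roughly like this (package names are an assumption for this sketch; RHEL/CentOS uses the same glusterfs-server package via yum/dnf):

```shell
# Install the server package and start the management daemon
sudo apt-get update && sudo apt-get install -y glusterfs-server
sudo systemctl enable --now glusterd
systemctl is-active glusterd   # should report "active"
```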

On gluster2:

sudo mkdir -p /gluster/brick2

Now, let’s create a volume named gv0 that uses these two directories as its bricks. We’ll use the replicated volume type for data redundancy: every file written to gv0 will be stored in full on both gluster1 and gluster2.

On either server (let’s use gluster1), first join the two servers into a trusted storage pool, then create and start the volume:

sudo gluster peer probe gluster2
sudo gluster volume create gv0 replica 2 transport tcp gluster1:/gluster/brick1 gluster2:/gluster/brick2 force
sudo gluster volume start gv0

The replica 2 part tells GlusterFS to keep two copies of every file. The transport tcp option specifies the network protocol (TCP is the default). gluster1:/gluster/brick1 and gluster2:/gluster/brick2 are the "bricks" – the actual storage locations on each server. The force option is needed here because the bricks sit on the root filesystem, which GlusterFS normally refuses (it recommends a dedicated partition per brick); it also suppresses the interactive warning that a two-way replica is susceptible to split-brain, which is why replica 3 or an arbiter brick is recommended in production.
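After starting the volume, it’s worth confirming its state. These commands (run against the live cluster, so output will depend on your setup) show the volume type, the brick list, and whether each brick process is online:

```shell
# Confirm the volume exists and both bricks are listed
sudo gluster volume info gv0
# Check that the brick processes are actually running
sudo gluster volume status gv0
```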

After starting, we can mount this volume on a client machine. Let’s say our client is client1.

On client1:

sudo apt-get update && sudo apt-get install glusterfs-client -y # For Debian/Ubuntu
# Or sudo yum install glusterfs-client -y # For RHEL/CentOS
sudo mkdir -p /mnt/gluster/gv0
sudo mount -t glusterfs gluster1:/gv0 /mnt/gluster/gv0
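To make the mount survive reboots, you can add an /etc/fstab entry. As a sketch: the _netdev option delays mounting until the network is up, and the GlusterFS FUSE client’s backup-volfile-servers option lets the client fetch the volume definition from gluster2 if gluster1 is down at mount time:

```shell
# /etc/fstab entry (one line)
gluster1:/gv0  /mnt/gluster/gv0  glusterfs  defaults,_netdev,backup-volfile-servers=gluster2  0  0
```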

Now, anything written to /mnt/gluster/gv0 on client1 will be replicated across gluster1 and gluster2. (The gluster1 in the mount command is only used to fetch the volume definition; once mounted, the client talks to all bricks directly.)

echo "Hello GlusterFS!" | sudo tee /mnt/gluster/gv0/testfile.txt

If you check the bricks: On gluster1:

cat /gluster/brick1/testfile.txt

Output:

Hello GlusterFS!

On gluster2:

cat /gluster/brick2/testfile.txt

Output:

Hello GlusterFS!

This demonstrates how GlusterFS creates a unified namespace and keeps replicas in sync. The core problem GlusterFS solves is the need for scalable, reliable, distributed storage without a single point of failure and without a centralized metadata server (the role a NameNode plays in HDFS or a metadata server plays in Lustre). It achieves this by using a distributed hash table (DHT) to locate files across bricks by hashing their names, and by employing translators that define how data is distributed and replicated. (An older stripe translator also split a single file’s data across bricks, but it has been deprecated in favor of sharding.)

The mental model is that GlusterFS is made of "bricks" (directories on servers) that are assembled into "volumes." Volumes are built from translators such as replicate (for redundancy) and distribute (for spreading whole files across bricks), and the two can be combined into distributed-replicated volumes. The client mounts the volume, and GlusterFS handles the underlying complexity of finding and writing data on the correct bricks.
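These translators compose. As a sketch, assuming four servers (gluster1 through gluster4, each already probed into the pool) with one brick each, the following would create a distributed-replicated volume: files are spread across two replica pairs, and each file is stored on exactly two servers. Bricks are grouped into replica sets in the order listed, so the first two bricks form one pair and the last two form the other:

```shell
# 4 bricks with replica 2 => 2 replica pairs
# (gluster1+gluster2 is one pair, gluster3+gluster4 the other)
sudo gluster volume create gv1 replica 2 transport tcp \
  gluster1:/gluster/brick1 gluster2:/gluster/brick2 \
  gluster3:/gluster/brick3 gluster4:/gluster/brick4 force
```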

When you create a volume, GlusterFS doesn’t move any data; it builds an internal graph of translators describing how data should be placed and replicated. For a replicated volume, the client sends each write to all replica bricks, and reads are served from a healthy replica; consistency checks (self-heal) run when files are accessed and via a background self-heal daemon. For a distributed volume, a hash of the filename determines which single brick holds the file.
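The placement idea can be sketched in a few lines of shell. This is a toy illustration, not GlusterFS’s actual algorithm (which hashes names into 32-bit ranges assigned to bricks): here we just hash the filename with cksum and take it modulo the brick count, so the same name always maps to the same brick and no metadata server is ever consulted:

```shell
# Toy stand-in for DHT placement: hash the filename, pick a brick index.
brick_for() {
    name=$1
    brick_count=$2
    # cksum prints "CRC byte-count"; keep only the CRC
    crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    echo $(( crc % brick_count ))
}

# The same filename always lands on the same brick:
brick_for photos.tar 2
brick_for photos.tar 2
brick_for notes.txt 2
```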

A common misconception is that GlusterFS requires dedicated network interfaces for replication traffic. Separating client and replication traffic is a best practice for performance, but GlusterFS will happily use any available network interface and TCP/IP for both client access and inter-server communication. The transport tcp option is the default and by far the most common; an rdma transport existed for high-throughput networks but has been deprecated in recent releases.

The next concept you’ll likely encounter is volume performance tuning, especially understanding how different volume types (distributed, replicated, dispersed, or combinations thereof) affect read/write speeds and data durability.

Want structured learning?

Take the full Storage course →