A file system is just a highly opinionated way of mapping bytes to names.

Let’s watch a block storage system in action. Imagine a virtual machine, let’s call it vm-01. This VM needs to store its operating system and data. It doesn’t talk to a disk in the traditional sense; instead, it talks to a block device. This block device is presented by a storage system, say an AWS Elastic Block Store (EBS) volume or an iSCSI LUN exported by a NetApp filer.

Here’s what vm-01 sees:

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda       8:0    0   50G  0 disk
└─sda1    8:1    0   50G  0 part /

sda is the block device. It’s a raw sequence of fixed-size blocks (traditionally 512 bytes, though 4 KiB is common on modern devices), numbered from 0 upward. The VM’s operating system formats this raw space with a file system, like ext4 or XFS, which then organizes these blocks into directories, files, and metadata. When the VM wants to read myfile.txt, the kernel determines which blocks hold the file’s data and tells the block storage system, "Give me blocks X, Y, and Z from device sda." The storage system then retrieves those blocks and returns them. The key is that the VM only understands blocks; the how of retrieving those blocks is abstracted away.
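The arithmetic behind block addressing is simple enough to sketch. Below is a minimal illustration using an in-memory bytearray as a stand-in for a raw device; the block size and block numbers are illustrative, and a real device would be accessed through the device node (e.g. with os.pread on /dev/sda) rather than a Python buffer.

```python
BLOCK_SIZE = 512  # classic sector size; modern devices often use 4 KiB

def read_blocks(device: bytes, block_numbers: list) -> bytes:
    """Fetch the given blocks by turning block numbers into byte offsets."""
    chunks = []
    for n in block_numbers:
        offset = n * BLOCK_SIZE  # block number -> byte offset on the device
        chunks.append(device[offset:offset + BLOCK_SIZE])
    return b"".join(chunks)

# Simulate a tiny 8-block "disk", write some data into block 2,
# then read blocks 2 and 5 the way a file system driver would.
disk = bytearray(BLOCK_SIZE * 8)
disk[2 * BLOCK_SIZE:2 * BLOCK_SIZE + 5] = b"hello"
data = read_blocks(bytes(disk), [2, 5])
```

Note that nothing here knows about files or names; that mapping lives entirely in the file system layered on top.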

Now, contrast this with a file system. A Network File System (NFS) or Server Message Block (SMB) share is different. The client machine doesn’t see raw blocks; it sees files and directories directly.

On a client machine mounting an NFS share:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
server:/exports/data  2.0T  500G 1.5T  25% /mnt/nfs/data

When the client wants to read myfile.txt from /mnt/nfs/data, it sends an NFS READ request to the NFS server. This request specifies the file handle (an opaque identifier the server issued for myfile.txt) and an offset within the file. The NFS server, which does manage the underlying block storage (or object storage, or even another file system) for its exported directories, translates this READ request into requests for specific blocks on its own storage. The crucial difference is that the client is thinking in terms of files and paths, not raw blocks. The file system protocol (NFS, SMB) handles the translation and management of file-level operations.
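The handle-plus-offset model above can be sketched as a toy server. This is not the NFS wire protocol; the class and method names are illustrative, and the point is only the shape of the interaction: the client resolves a path to an opaque handle once, then reads by (handle, offset, count).

```python
import secrets

class ToyNfsServer:
    """Illustrative sketch of NFS-style READ semantics (not real NFS)."""

    def __init__(self):
        self._by_handle = {}  # opaque handle -> file contents
        self._by_path = {}    # exported path -> handle

    def export(self, path: str, contents: bytes) -> None:
        handle = secrets.token_bytes(8)  # opaque to the client
        self._by_handle[handle] = contents
        self._by_path[path] = handle

    def lookup(self, path: str) -> bytes:
        """LOOKUP-like step: resolve a path to an opaque file handle."""
        return self._by_path[path]

    def read(self, handle: bytes, offset: int, count: int) -> bytes:
        """READ-like step: return `count` bytes starting at `offset`."""
        return self._by_handle[handle][offset:offset + count]

server = ToyNfsServer()
server.export("/exports/data/myfile.txt", b"hello from nfs")
handle = server.lookup("/exports/data/myfile.txt")
chunk = server.read(handle, offset=6, count=4)
```

How the server satisfies that read, from local disk blocks, another file system, or something else entirely, is invisible to the client.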

Object storage is the most abstract. Here, data is stored as discrete units called "objects," each with a unique key (often a user-chosen string like documents/mydoc-v1.pdf), a payload (the data itself), and a set of metadata. There’s no concept of a hierarchical file system or raw blocks from the client’s perspective.

Imagine interacting with an S3-compatible object store:

# Upload a file
aws s3 cp mydocument.pdf s3://my-bucket/documents/mydoc-v1.pdf

# Get metadata
aws s3api head-object --bucket my-bucket --key documents/mydoc-v1.pdf
{
    "LastModified": "2023-10-27T10:30:00.000Z",
    "ContentLength": 1048576,
    "ContentType": "application/pdf",
    "Metadata": {
        "project": "alpha"
    }
    // ... other metadata
}

# Download the file
aws s3 cp s3://my-bucket/documents/mydoc-v1.pdf downloaded_doc.pdf

The client interacts with the object store via an API (typically HTTP-based). It doesn’t mount anything, and it doesn’t see blocks or directories in the traditional sense. It simply puts or gets objects by their unique keys. The object store itself manages how these objects are physically stored, often across many disks and machines, and handles redundancy and durability. This makes object storage highly scalable and durable, ideal for large, unstructured data like images, videos, backups, and archives.
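The client-visible surface of an object store is small enough to sketch in a few lines. This in-memory model (class and method names are mine, not any real SDK) mirrors the three CLI calls above: put an object, fetch its metadata, get it back by key.

```python
class ObjectStore:
    """Minimal in-memory sketch of the object-store model: flat keys,
    opaque payloads, user metadata. No directories, no blocks."""

    def __init__(self):
        self._objects = {}  # key -> (payload, metadata)

    def put(self, key: str, payload: bytes, metadata: dict) -> None:
        self._objects[key] = (payload, dict(metadata))

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def head(self, key: str) -> dict:
        """Metadata only, like `head-object`: no payload transferred."""
        payload, metadata = self._objects[key]
        return {"ContentLength": len(payload), "Metadata": metadata}

store = ObjectStore()
store.put("documents/mydoc-v1.pdf", b"%PDF-...", {"project": "alpha"})
info = store.head("documents/mydoc-v1.pdf")
```

Notice that "documents/" in the key is just a naming convention; to the store it is part of one flat string, not a directory.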

The real magic in object storage, and what makes it so scalable, is that an object’s key identifies it logically, never physically. When you request an object, the object store’s control plane looks up the object’s metadata, which records which storage nodes hold the object’s data chunks, and directs your request to those nodes. Critically, the object store can move data between nodes, rebalance storage, and handle failures without the client ever knowing. The client just presents the object key, and the system finds it. This decoupling of the logical object identifier from its physical location is fundamental.
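One common placement scheme is to hash the key to pick a node, a sketch of which follows. This is an assumption about one possible design, not how any particular store works: real systems typically combine hashing with a metadata index precisely so data can move without the key changing. Node names here are made up.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # illustrative storage nodes

def place(key: str, nodes: list) -> str:
    """Deterministically map an object key to a storage node by hashing.
    The client never sees this mapping; only the control plane does."""
    digest = hashlib.sha256(key.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

node = place("documents/mydoc-v1.pdf", NODES)
```

Because the mapping lives behind the API, the operator can change NODES, rebalance, or replicate, and the client keeps presenting the same key.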

The next evolution involves combining these paradigms, like how some modern distributed file systems use object storage under the hood to achieve massive scalability while still presenting a file system interface.

Want structured learning?

Take the full System Design course →