You’re not actually limiting resources with systemd cgroup settings; you’re telling the kernel’s scheduler how to share the existing resources.
Let’s see this in action. Imagine we have a simple service that does nothing but busy-loop on the CPU:
# /etc/systemd/system/my-cpu-hog.service
[Unit]
Description=A service that hogs CPU
[Service]
Type=simple
ExecStart=/bin/bash -c 'while true; do :; done'
If we start this with systemctl start my-cpu-hog.service, it’ll consume as much CPU as it can, potentially impacting other services. Now, let’s limit it.
To control CPU, we use CPUQuota and CPUShares. CPUShares is a relative weighting. If service A has CPUShares=1024 and service B has CPUShares=512, service A gets twice as much CPU time as service B when both are actively trying to use CPU. It’s like a lottery ticket system. (On newer systemd versions using the unified cgroup v2 hierarchy, CPUShares is deprecated in favor of CPUWeight, but the idea is the same.)
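To make the weighting concrete, here is a quick arithmetic sketch (plain bash, nothing systemd-specific) of how a fully contended core would be split between those two share values:

```shell
#!/bin/bash
# Illustrative model only: two services compete for one fully loaded core.
shares_a=1024
shares_b=512
total=$((shares_a + shares_b))

# Each service's slice is proportional to its fraction of the total weight.
pct_a=$((100 * shares_a / total))   # integer percent of the core for A
pct_b=$((100 * shares_b / total))   # integer percent of the core for B

echo "service A: ${pct_a}%"   # -> 66%
echo "service B: ${pct_b}%"   # -> 33%
```

Remember that this split only applies under contention; if service B is idle, service A may use the whole core (up to its quota, if one is set).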
# /etc/systemd/system/my-cpu-hog.service
[Unit]
Description=A service that hogs CPU
[Service]
Type=simple
ExecStart=/bin/bash -c 'while true; do :; done'
CPUQuota=50%
CPUShares=1024
Here, CPUQuota=50% tells the kernel that this unit should never use more than 50% of one CPU core’s worth of processing time. CPUShares=1024 is the default share value, so if we had another service with CPUShares=512, this service would get twice the CPU time up to its quota. If this service is the only one trying to use CPU, and CPUQuota is 50%, it will only use 50% of a core, leaving the other 50% for other tasks or the system.
Let’s check the cgroup settings directly. After starting the service with limits, on a host using the legacy cgroup v1 hierarchy you can find its cgroup path under /sys/fs/cgroup/cpu,cpuacct/system.slice/my-cpu-hog.service/.
The key files to look at are:
- cpu.cfs_quota_us: This reflects CPUQuota. If you set CPUQuota=50%, this file will contain 50000 (50000 microseconds out of each 100000-microsecond period, i.e. 50%). If you set CPUQuota=20% (200ms of CPU time per second), it will contain 20000.
- cpu.cfs_period_us: This is the period over which cpu.cfs_quota_us is applied. It defaults to 100000 (100ms). CPUQuota=50% is equivalent to cpu.cfs_quota_us=50000 with cpu.cfs_period_us=100000.
- cpu.shares: This reflects CPUShares. It’s a relative weight, defaulting to 1024.
So, if you set CPUQuota=20% and CPUShares=2048 in your .service file, you’d see 20000 in cpu.cfs_quota_us and 2048 in cpu.shares. CPUQuota is an absolute hard cap, while CPUShares is a soft, relative allocation.
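The translation between the systemd directive and the kernel file is just arithmetic. A small bash sketch of the conversion, assuming the default 100000µs period (the live cat is shown commented out since the path only exists while the unit runs):

```shell
#!/bin/bash
period_us=100000   # kernel default cpu.cfs_period_us

# Convert a CPUQuota percentage into the cpu.cfs_quota_us value.
quota_pct=50
quota_us_50=$((period_us * quota_pct / 100))
echo "CPUQuota=50% -> cpu.cfs_quota_us=${quota_us_50}"   # -> 50000

quota_pct=20
quota_us_20=$((period_us * quota_pct / 100))
echo "CPUQuota=20% -> cpu.cfs_quota_us=${quota_us_20}"   # -> 20000

# On a live cgroup v1 host with the unit running, you could confirm with:
# cat /sys/fs/cgroup/cpu,cpuacct/system.slice/my-cpu-hog.service/cpu.cfs_quota_us
```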
For memory, it’s simpler: MemoryLimit. This is a hard ceiling.
# /etc/systemd/system/my-memory-hog.service
[Unit]
Description=A service that hogs memory
[Service]
Type=simple
ExecStart=/usr/bin/stress --vm 1 --vm-bytes 512M --vm-hang 1000000
MemoryLimit=256M
If this service tries to allocate more than 256MB of memory, the kernel’s OOM killer will step in for that cgroup, terminating processes within my-memory-hog.service rather than the whole system.
You can verify this by checking the memory.limit_in_bytes file in the unit’s cgroup directory (e.g., /sys/fs/cgroup/memory/system.slice/my-memory-hog.service/memory.limit_in_bytes on cgroup v1; on the unified v2 hierarchy the equivalent file is memory.max). Setting MemoryLimit=256M will write 268435456 (256 * 1024 * 1024) into this file.
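The value the kernel stores is a plain byte count, so the conversion is easy to sanity-check from a shell (the cat is shown commented out since the cgroup path only exists while the unit runs):

```shell
#!/bin/bash
# MemoryLimit=256M becomes a byte count in the cgroup file.
limit_bytes=$((256 * 1024 * 1024))
echo "$limit_bytes"   # -> 268435456

# On a cgroup v1 host with the unit running, compare against:
# cat /sys/fs/cgroup/memory/system.slice/my-memory-hog.service/memory.limit_in_bytes
```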
The "surprising" thing is how CPUShares interacts with CPUQuota. Suppose you set CPUQuota=10% and CPUShares=4096 on one service while another runs with CPUShares=2048. The 2:1 share ratio would entitle the first service to two thirds of a contended core, but the quota still caps it at 10% of a core, full stop. Shares only decide how services split CPU relative to each other; the quota is always the absolute limit.
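A bash sketch of that interaction (an illustrative model, not a kernel algorithm): the share ratio proposes a split of a contended core, and the quota clamps it:

```shell
#!/bin/bash
# Illustrative model: one contended core, two competing services.
quota_pct=10      # CPUQuota=10%: hard cap as a percent of one core
shares_a=4096
shares_b=2048
total=$((shares_a + shares_b))

# What the share ratio alone would give service A on a contended core:
proposed=$((100 * shares_a / total))   # two thirds, i.e. 66

# The quota always wins: take the minimum of the two.
actual=$(( proposed < quota_pct ? proposed : quota_pct ))
echo "proposed by shares: ${proposed}%, actual after quota: ${actual}%"
```

However large the share weight, the second number never exceeds the quota.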
You can also set a soft limit with MemoryHigh. If memory usage exceeds MemoryHigh, the kernel aggressively reclaims memory from that cgroup (throttling its processes) but won’t kill anything unless the hard limit is also breached. Note that MemoryHigh only exists on the unified cgroup v2 hierarchy, where the hard cap is spelled MemoryMax rather than the legacy MemoryLimit.
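A unit can combine the two so that reclaim kicks in early and the OOM killer stays a last resort. A sketch of such a fragment, with directive names as used on a cgroup v2 / unified-hierarchy systemd (MemoryMax is the v2 spelling of the hard cap; the 192M threshold is an arbitrary example value):

```ini
[Service]
MemoryHigh=192M
MemoryMax=256M
```

Above 192M the service gets throttled and reclaimed; only past 256M does the OOM killer act on the cgroup.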
The next concept you’ll run into is how these cgroup settings are inherited and how they interact across different cgroup hierarchies (like system.slice vs. user.slice).