Linux/cgexec

Introduction

Imagine you have multiple applications running on your computer, and you want to ensure that one application doesn't consume all the CPU power or memory, leaving the others struggling for resources. This is where cgroups come into play. Cgroups help you set limits and control the usage of resources for different processes.

Linux Control Groups, v2

Linux Control Groups, commonly referred to as cgroups, is a feature in the Linux kernel that provides a way to organize and manage system resources for processes. It allows you to allocate and limit resources like CPU, memory, and I/O bandwidth among different processes or groups of processes.

There are two versions of Control Groups: the original v1 and the newer 'unified' v2.

With cgroup v1, the support for multiple hierarchies posed challenges, limiting flexibility and leading to complex configurations. Controllers like the freezer were confined to a single hierarchy, and once bound, controllers couldn't be moved. This inflexibility resulted in managing numerous hierarchies separately, hindering cooperation between controllers. In contrast, cgroup v2 adopts a unified hierarchy approach, addressing these issues with a more practical and streamlined configuration management.

NOTE. Unless stated otherwise, this wiki page focuses solely on the use of Control Groups v2.

The usual way of prioritising CPU between processes is to use the traditional nice tool to set a process's priority between -20 and 19, where -20 is highest priority and 19 the lowest.

With cgroups, it is possible to create a hierarchy and assign limits to each level and cgroup: not only CPU priorities (weight), but also I/O, memory and other types of limits. Nested cgroups are bound by their parent's limits, which makes cgroups a powerful tool for controlling system resources.

Linux Control Groups have no pre-defined names. Distributions may choose their own names, though you may also use your own naming scheme and hierarchical structure.

A cgroup can be created or removed using mkdir and rmdir.

# mkdir /sys/fs/cgroup/my-group
# mkdir /sys/fs/cgroup/my-group/group-a
# mkdir /sys/fs/cgroup/my-group/group-b
# rmdir /sys/fs/cgroup/my-group/group-a

A cgroup can only be removed when it is empty, meaning it has no processes and no child cgroups. Otherwise rmdir fails:

# rmdir /sys/fs/cgroup/my-group/
rmdir: failed to remove 'my-group': Device or resource busy
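
The emptiness rule above can be checked before calling rmdir. The following is a minimal sketch, not part of cgexec; the helper name cgroup_is_empty is made up for this example. It assumes only that member PIDs are listed in cgroup.procs and that child cgroups appear as subdirectories.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the cgroup directory has
# no member processes and no child cgroups, so that rmdir should succeed.
cgroup_is_empty() {
    cg=$1
    # Files on cgroupfs report a size of 0, so read the content
    # instead of relying on "test -s".
    if [ -n "$(cat "$cg/cgroup.procs" 2>/dev/null)" ]; then
        return 1
    fi
    # Child cgroups appear as subdirectories of the cgroup.
    for child in "$cg"/*/; do
        if [ -d "$child" ]; then
            return 1
        fi
    done
    return 0
}

# Example use (as root):
#   cgroup_is_empty /sys/fs/cgroup/my-group && rmdir /sys/fs/cgroup/my-group
```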

Here's an example showing how CPU time is shared based on each group's cpu.weight. Weights are only enforced when there is contention. If only one process asks for 100% CPU, it will get it until other processes start to compete. The diagram below shows how the CPU time (bandwidth) would be shared if each cgroup tried to use as much CPU as possible.

ROOT (usually '/sys/fs/cgroup')
├── user                   # cpu.weight: 100, effective cpu time share (40%)
│   ├── user-1000          # cpu.weight: 100, effective cpu time share (20%)
│   └── user-1001          # cpu.weight: 100, effective cpu time share (20%)
├── cgroup-1               # cpu.weight: 50, effective cpu time share (20%)
│   ├── cgroup-A           # cpu.weight: 200, effective cpu time share (6.67%)
│   └── cgroup-B           # cpu.weight: 400, effective cpu time share (13.33%)
└── cgroup-2               # cpu.weight: 100, effective cpu time share (40%)
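
The percentages in the diagram follow from one rule: a cgroup's share is its parent's share multiplied by its own weight and divided by the sum of the weights of all siblings at that level. The sketch below reproduces the cgroup-1 branch with awk. The function name effective_share is an assumption made for this example, and the calculation only holds under full contention.

```shell
#!/bin/sh
# effective_share PARENT_SHARE WEIGHT WEIGHT_SUM
# Prints the share (in percent) a cgroup receives when every sibling
# competes for CPU. Illustrative helper, not a kernel or cgexec interface.
effective_share() {
    awk -v p="$1" -v w="$2" -v s="$3" 'BEGIN { printf "%.2f\n", p * w / s }'
}

# Top level: user (100), cgroup-1 (50) and cgroup-2 (100) sum to 250.
top=$(effective_share 100 50 250)    # cgroup-1: 20.00
echo "cgroup-1: $top"

# Below cgroup-1: cgroup-A (200) and cgroup-B (400) sum to 600.
echo "cgroup-A: $(effective_share "$top" 200 600)"    # 6.67
echo "cgroup-B: $(effective_share "$top" 400 600)"    # 13.33
```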

cgroup controllers

A controller is responsible for managing a type of resource. The available controllers can be listed via the cgroup.controllers file:

# cat /sys/fs/cgroup/cgroup.controllers

cpuset cpu io memory hugetlb pids rdma misc
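
A script may want to confirm that a controller is available before relying on it. Here is a minimal sketch; has_controller is a hypothetical helper name, and it assumes only that cgroup.controllers contains a space-separated list as shown above.

```shell
#!/bin/sh
# has_controller CGROUP_DIR CONTROLLER
# Succeeds if CONTROLLER is listed in CGROUP_DIR/cgroup.controllers.
# Hypothetical helper for illustration.
has_controller() {
    cg=$1
    ctrl=$2
    # Put each controller name on its own line, then match the whole line
    # so that "cpu" does not match "cpuset".
    tr ' ' '\n' < "$cg/cgroup.controllers" | grep -qxF "$ctrl"
}

# Example use:
#   has_controller /sys/fs/cgroup memory || echo "memory controller missing" >&2
```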

Controller	Description
cpu	The "cpu" controller regulates distribution of CPU cycles.
cpuset	The "cpuset" controller provides a mechanism for constraining the CPU and memory node placement of tasks. Especially useful on NUMA systems.
memory	The "memory" controller regulates distribution of memory. It also tracks file cache, kernel memory and TCP sockets that a process may use.
io	The "io" controller regulates the distribution of I/O resources. This controller implements both weight-based and absolute bandwidth or IOPS limit distribution. Note that this works with the CFQ and BFQ I/O schedulers but not with deadline or noop.
hugetlb	The "hugetlb" controller limits HugeTLB usage per control group.
pids	The process number controller is used to allow a cgroup to stop any new tasks from being fork()'d or clone()'d after a specified limit is reached.
rdma	The "rdma" controller regulates the distribution and accounting of RDMA resources.
misc	The Miscellaneous cgroup provides the resource limiting and tracking for resources which cannot be abstracted like the other cgroup resources.

The number of controllers that are available depends on the kernel used. A full description of each controller and how to configure it is available in the Linux kernel documentation https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

cgroup interface files

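Each cgroup is configured and inspected through the interface files in its directory. Files prefixed with cgroup. belong to the cgroup core; the others belong to their respective controller.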

Interface File	Description
cgroup.controllers	A list of controllers available to the cgroup.
cgroup.events	State indicators for the cgroup (populated, frozen).
cgroup.freeze	Freezes or unfreezes the tasks in the cgroup.
cgroup.kill	Writing "1" to the file causes the cgroup and all descendant cgroups to be killed.
cgroup.max.depth	Maximum depth of the cgroup hierarchy.
cgroup.max.descendants	Maximum number of descendants a cgroup can have.
cgroup.pressure	Enables (1) or disables (0) pressure stall information (PSI) accounting for the cgroup.
cgroup.procs	List of process IDs in the cgroup.
cgroup.stat	Statistics for the cgroup.
cgroup.subtree_control	A list of controllers enabled for child cgroups.
cgroup.threads	List of thread IDs in the cgroup.
cgroup.type	Type of cgroup.
cpu.idle	Writing 1 makes the scheduling policy of the cgroup SCHED_IDLE. The default is 0.
cpu.max	Maximum bandwidth limit for the CPU.
cpu.max.burst	Maximum burst bandwidth for the CPU.
cpu.pressure	CPU pressure events.
cpuset.cpus	CPUs assigned to the cpuset.
cpuset.cpus.effective	Effective CPUs in the cpuset.
cpuset.cpus.partition	Partition state of the cpuset (member, root or isolated).
cpuset.mems	Memory nodes assigned to the cpuset.
cpuset.mems.effective	Effective memory nodes in the cpuset.
cpu.stat	CPU statistics.
cpu.stat.local	Throttled time for individual children cgroups.
cpu.weight	CPU bandwidth weight.
cpu.weight.nice	Nice-adjusted CPU bandwidth weight.
hugetlb.1GB.current	Current usage of 1GB huge pages.
hugetlb.1GB.events	Events for 1GB huge pages.
hugetlb.1GB.events.local	Local events for 1GB huge pages.
hugetlb.1GB.max	Maximum limit for 1GB huge pages.
hugetlb.1GB.numa_stat	NUMA statistics for 1GB huge pages.
hugetlb.1GB.rsvd.current	Current reserved 1GB huge pages.
hugetlb.1GB.rsvd.max	Maximum reserved 1GB huge pages.
hugetlb.2MB.current	Current usage of 2MB huge pages.
hugetlb.2MB.events	Events for 2MB huge pages.
hugetlb.2MB.events.local	Local events for 2MB huge pages.
hugetlb.2MB.max	Maximum limit for 2MB huge pages.
hugetlb.2MB.numa_stat	NUMA statistics for 2MB huge pages.
hugetlb.2MB.rsvd.current	Current reserved 2MB huge pages.
hugetlb.2MB.rsvd.max	Maximum reserved 2MB huge pages.
io.latency	I/O latency.
io.max	Maximum bandwidth limit for I/O.
io.pressure	I/O pressure events.
io.prio.class	I/O priority class.
io.stat	I/O statistics.
io.weight	I/O bandwidth weight.
io.bfq.weight	Weight for the BFQ I/O scheduler.
memory.current	Current memory usage.
memory.events	Memory events.
memory.events.local	Local memory events.
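memory.high	Memory usage throttle limit. Above it, processes are throttled and put under heavy reclaim pressure.
memory.low	Best-effort memory protection. Memory below this amount is reclaimed only if nothing else can be reclaimed.
memory.max	Hard limit for memory usage. If usage cannot be reduced below it, the OOM killer is invoked in the cgroup.
memory.min	Hard memory protection. Memory below this amount is never reclaimed.
memory.numa_stat	NUMA statistics for memory.
memory.oom.group	If set to 1, the OOM killer treats the cgroup as an indivisible workload and kills all of its tasks together.
memory.peak	Peak memory usage.
memory.pressure	Memory pressure events.
memory.reclaim	Write-only file that triggers proactive memory reclaim in the cgroup.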
memory.stat	Memory statistics.
memory.swap.current	Current swap usage.
memory.swap.events	Swap events.
memory.swap.high	High swap usage threshold.
memory.swap.max	Maximum limit for swap usage.
memory.swap.peak	Peak swap usage.
memory.zswap.current	Current zswap usage.
memory.zswap.max	Maximum limit for zswap usage.
misc.current	Current usage for miscellaneous resources.
misc.events	Events for miscellaneous resources.
misc.max	Maximum limit for miscellaneous resources.
pids.current	Current number of processes.
pids.events	Process events.
pids.max	Maximum limit for the number of processes.
pids.peak	Peak number of processes.
rdma.current	Current usage for RDMA resources.
rdma.max	Maximum limit for RDMA resources.
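
Interface files are read with cat and changed by writing a value to them. The sketch below wraps the write in a small check. The name cg_write is made up for this example, and the paths in the usage comments reuse the my-group cgroup from earlier on this page.

```shell
#!/bin/sh
# cg_write CGROUP_DIR FILE VALUE
# Writes VALUE to an interface file after checking that the file exists.
# Hypothetical helper for illustration.
cg_write() {
    cg=$1
    file=$2
    value=$3
    # Controller files only appear once the controller is enabled in the
    # parent's cgroup.subtree_control.
    if [ ! -e "$cg/$file" ]; then
        echo "cg_write: $cg/$file not found (controller not enabled?)" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$cg/$file"
}

# Example use (as root):
#   cg_write /sys/fs/cgroup/my-group cpu.weight 50
#   cg_write /sys/fs/cgroup/my-group memory.max 1G
#   cat /sys/fs/cgroup/my-group/memory.current
```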

mount options

Linux Control Groups are accessed through the cgroup2 filesystem.

# mount -t cgroup2 none <MOUNT_POINT>

Many Linux distributions automatically mount it at /sys/fs/cgroup or /sys/fs/cgroup/unified.
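
Because the mount point differs between distributions, a script can look it up instead of hard-coding it. This is a minimal sketch that assumes the usual /proc/mounts layout (device, mount point, filesystem type, options); find_cgroup2_mount is a hypothetical name.

```shell
#!/bin/sh
# find_cgroup2_mount [MOUNTS_FILE]
# Prints the mount point of the first cgroup2 filesystem listed in
# MOUNTS_FILE (default: /proc/mounts). Prints nothing if none is mounted.
find_cgroup2_mount() {
    mounts=${1:-/proc/mounts}
    # The third field is the filesystem type, the second the mount point.
    awk '$3 == "cgroup2" { print $2; exit }' "$mounts"
}

# Example use:
#   root=$(find_cgroup2_mount)
#   [ -n "$root" ] || echo "cgroup2 is not mounted" >&2
```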

cgroup v2 currently supports the following mount options:

mount option	Description
nsdelegate	Consider cgroup namespaces as delegation boundaries. This option is system wide and can only be set on mount or modified through remount from the init namespace.
favordynmods	Reduce the latencies of dynamic cgroup modifications such as task migrations and controller on/offs at the cost of making hot path operations such as forks and exits more expensive.
memory_localevents	Only populate memory.events with data for the current cgroup, and not any subtrees. This is legacy behaviour; the default without this option is to include subtree counts.
memory_recursiveprot	Recursively apply memory.min and memory.low protection to entire subtrees, without requiring explicit downward propagation into leaf cgroups. This allows protecting entire subtrees from one another, while retaining free competition within those subtrees. This should have been the default behavior but is a mount option to avoid regressing setups relying on the original semantics (e.g. specifying bogusly high 'bypass' protection values at higher tree levels).
memory_hugetlb_accounting	Count HugeTLB memory usage towards the cgroup's overall memory usage for the memory controller (for the purpose of statistics reporting and memory protection).

cgexec - Execute a Command in a cgroup

cgexec is a Bash script I wrote that allows users to execute a command within a cgroup, providing control over resource limits such as CPU, I/O, and memory. It requires Linux Control Groups (cgroups) v2, also known as unified cgroups.

The package libcgroup also provides a cgexec command. It is also used to control cgroup hierarchies, but uses a more complex setup and methods for assigning processes.

Usage

# cgexec -h

Attaches a program <cmd> to a cgroup with defined limits.
Requires Linux Control Groups v2.
Usage: cgexec [options] <cmd> [cmd args]
Options:
 -c cpu.weight   (0-10000)  Set CPU priority
 -C cpu.max      (1-100)    Set max CPU time in percent
 -i io.weight    (1-10000)  Set I/O weight
 -m memory.high  (0-max)    Set soft memory limit
 -M memory.max   (0-max)    Set hard memory limit
 -g group        Create or attach to existing cgroup. Default is to use an ephemeral group
 -b path         Use <path> as cgroup root

Option	Description
cpu.weight	Set CPU priority. 1-10000 where 10000 is highest priority, similar to `nice -n -19`. `0` is special and enables the `idle` CPU scheduler.
cpu.max	Sets the maximum allowed CPU time as a percentage. A value less than 100 will throttle the process periodically.
io.weight	Sets the I/O priority. It is similar to CPU weight and limits the bandwidth available to a process if there is contention.
memory.high	Sets a soft memory limit. If a process tries to allocate more, it will be throttled instead of triggering the OOM killer.
memory.max	Sets an absolute limit to how much memory a process can allocate.
group	Use an existing cgroup, or create it if it doesn't exist. Default is to use a temporary cgroup named `cmd-xxxx` where xxxx is a random string.
path	Use a specific cgroup root. Can be a nested cgroup. Useful if you want to attach the process to an existing cgroup hierarchy.
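
The kernel's cpu.max file does not take a percentage. It expects a quota and a period in microseconds, for example "50000 100000". How cgexec performs the conversion is not shown on this page, so the sketch below is only one plausible way to do it. The function name percent_to_cpu_max, the default period of 100000 and the reading of 100 percent as one full CPU are all assumptions.

```shell
#!/bin/sh
# percent_to_cpu_max PERCENT [PERIOD_USEC]
# Prints a "QUOTA PERIOD" pair suitable for writing to cpu.max.
# Assumption: 100 percent corresponds to one full CPU per period.
percent_to_cpu_max() {
    percent=$1
    period=${2:-100000}
    # Accept plain positive integers only.
    case $percent in
        ''|*[!0-9]*)
            echo "percent must be an integer between 1 and 100" >&2
            return 1
            ;;
    esac
    if [ "$percent" -lt 1 ] || [ "$percent" -gt 100 ]; then
        echo "percent must be an integer between 1 and 100" >&2
        return 1
    fi
    echo "$((percent * period / 100)) $period"
}

# Example use (as root):
#   percent_to_cpu_max 50 > /sys/fs/cgroup/my-group/cpu.max
```

With these assumptions, percent_to_cpu_max 50 prints 50000 100000, which allows 50 ms of CPU time in every 100 ms period.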

Examples

Execute a command with default settings

cgexec echo "Hello, cgroups!"

Lower the CPU weight and set a soft memory limit for a command

cgexec -c 50 -m 1G my_command

Attach to an existing cgroup

cgexec -g mygroup my_command

Use a custom cgroup root

cgexec -b /sys/fs/cgroup/mysubsystem my_command
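
For readers curious about the underlying mechanism, the core idea of running a command inside a cgroup can be sketched in a few lines. This is not the cgexec script itself, only an illustration under the assumption that the caller may create the cgroup directory and write to its cgroup.procs. The name run_in_cgroup is made up for this example.

```shell
#!/bin/sh
# run_in_cgroup CGROUP_DIR CMD [ARGS...]
# Creates CGROUP_DIR if needed, then starts CMD as a member of it.
# Illustrative sketch only; it sets no limits and does no cleanup.
run_in_cgroup() {
    cg=$1
    shift
    mkdir -p "$cg" || return 1
    # The child shell writes its own PID to cgroup.procs and then replaces
    # itself with the command, so the command starts inside the cgroup.
    # Inside the child, $0 is the cgroup path and "$@" is the command.
    sh -c 'echo "$$" > "$0/cgroup.procs" && exec "$@"' "$cg" "$@"
}

# Example use (as root):
#   run_in_cgroup /sys/fs/cgroup/my-group/demo sleep 60
```

Joining the cgroup before exec means the command never runs outside it, and any children it starts inherit the same cgroup.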

User Cgroups

While it is possible to give a user ownership of a cgroup, the user cannot use it directly. All processes initially belong to the root cgroup (/sys/fs/cgroup/cgroup.procs), and moving a process requires write access to the cgroup.procs file of the common ancestor of the source and destination cgroups. So even though users can write to their own cgroup, they cannot move themselves out of the root cgroup.

This catch-22 can be solved by using sudo or a root user to create a cgroup, change the owner to a user and then move that user's processes to it.

One solution is to let the user start a screen or tmux session and then, as root, check which PIDs belong to the user session and add them to the user cgroup:

# ps af -u forza
5309 pts/0    S+     0:00  |     \_ screen
5310 ?        Ss     0:00  |         \_ SCREEN
5311 pts/1    Ss     0:00  |             \_ -/bin/bash
5483 pts/1    R+     0:00  |                 \_ ps af -u forza

# echo 5309 > /sys/fs/cgroup/user/forza/main/cgroup.procs
# echo 5310 > /sys/fs/cgroup/user/forza/main/cgroup.procs
# echo 5311 > /sys/fs/cgroup/user/forza/main/cgroup.procs

Any process started from that session inherits the cgroup of its parent and therefore automatically belongs to the same cgroup.

The user can now also use cgexec to create nested cgroups under their own cgroup.
