Linux Isolation Basics

User and group access restrictions are one of the most basic forms of isolating what a particular application can do.

Users are often warned to avoid running applications as the admin user (in most cases root) as much as possible. To accomplish this, privileges can be dropped to an unprivileged user at execution time through setuid and setgid system calls.

For example, take a simple echo server in C, modified to bind to a port that requires root privileges:

servaddr.sin_family = AF_INET;
/* sin_addr.s_addr is a 32-bit field, so it takes htonl(), not htons() */
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
servaddr.sin_port = htons(7);

bind(listen_fd, (struct sockaddr *) &servaddr, sizeof(servaddr));

Run as-is, this requires root access, because ports below 1024 are privileged. To improve on this, the program can drop its privileges once it has bound to the port:

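servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
servaddr.sin_port = htons(7);

bind(listen_fd, (struct sockaddr *) &servaddr, sizeof(servaddr));

/* Drop privileges now that the port is bound: group first, then user. */
if (setgid(65534) != 0 || setuid(65534) != 0)
    exit(1);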

Here 65534 is the user ID of the nobody user on the particular system this is running on. Production code, unlike this example, should accept user names as strings as well as numeric user IDs.
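
As a rough sketch of the name-based approach, the lookup and the privilege drop could be wrapped as shown here. The helper names lookup_user and drop_privileges are hypothetical and not part of the echo server. The sketch assumes the process starts as root, and it also clears supplementary groups with setgroups, which the example code above does not do.

```c
#define _GNU_SOURCE
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve a user name to its UID and primary GID. Returns 0 on success. */
int lookup_user(const char *name, uid_t *uid, gid_t *gid)
{
    struct passwd *pw = getpwnam(name);
    if (pw == NULL)
        return -1;
    *uid = pw->pw_uid;
    *gid = pw->pw_gid;
    return 0;
}

/* Permanently drop root privileges to the named user (hypothetical helper).
 * Order matters: groups and GID must change while the process is still
 * root, because setuid() takes away the right to change them. */
int drop_privileges(const char *name)
{
    uid_t uid;
    gid_t gid;

    if (lookup_user(name, &uid, &gid) != 0) {
        fprintf(stderr, "unknown user: %s\n", name);
        return -1;
    }
    if (setgroups(0, NULL) != 0 || setgid(gid) != 0 || setuid(uid) != 0) {
        perror("drop_privileges");
        return -1;
    }
    /* Sanity check: regaining root must fail from here on. */
    if (uid != 0 && setuid(0) == 0) {
        fprintf(stderr, "privileges were not dropped\n");
        return -1;
    }
    return 0;
}
```

In the echo server, a call such as drop_privileges("nobody") would take the place of the hard-coded 65534, placed right after bind().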

When the program is run with this new setuid addition:

# ./echo
Echoing back – Testing
# ps aux

nobody 4031 0.0 0.0 4044 344 pts/3 S+ 16:07 0:00 ./echo

This process is now running as the nobody user while still being able to properly accept connections. However, as it stands, the server has the potential to access any files on the system that the nobody user can access. While not too much of a concern for this simple echo server, it could cause security issues in a production environment, should the server be compromised.

Filesystem isolation can be used to prevent this.

Filesystem Isolation

A chroot is a basic form of isolation at the filesystem level; the name is an abbreviation of "change root". Essentially, it changes the root of the filesystem for one process and all child processes under it.

A few practical uses of chroots include:

Development environments
Isolation of services
Restricted user SSH access

Let’s continue working with our echo server. Firstly, we need to create a basic directory structure, and then copy over some essential shared libraries:

# mkdir chroot
# mkdir chroot/bin
# mkdir chroot/lib64
# cp echo chroot/bin
# ldd chroot/bin/echo
 (0x00007fff0adff000)
 => /lib64/ (0x00007f516c7cf000)
/lib64/ (0x00007f516cb74000)
# cp /lib64/ chroot/lib64/
# cp /lib64/ chroot/lib64/

Now that these files are copied over, it's time to run the program in the isolated filesystem:

# chroot chroot /bin/echo
Echoing back – Testing
nobody 15285 0.0 0.0 4048 388 pts/3 S+ 16:51 0:00 /bin/echo
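
The chroot command used above is a thin wrapper around the chroot(2) system call, so a server can also confine itself. The following is a minimal sketch under that assumption; enter_chroot is a hypothetical helper name, and the call only succeeds while the process still has root privileges.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>

/* Confine the calling process to `dir`. Needs root (CAP_SYS_CHROOT).
 * Returns 0 on success and -1 on failure. */
int enter_chroot(const char *dir)
{
    if (chroot(dir) != 0) {
        perror("chroot");
        return -1;
    }
    /* chroot() does not change the working directory; without this
     * chdir() the process would keep a reference outside the new root. */
    if (chdir("/") != 0) {
        perror("chdir");
        return -1;
    }
    return 0;
}
```

If a program uses both techniques, it would call enter_chroot() before dropping privileges: chroot(2) requires root, and a process that stays root inside the new root has ways to break out of it.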

Even though filesystem isolation is present, there’s more we can do. The server is still using the host’s resources without any real limitation. This might be a problem if there are other processes running on this host that are competing for resources. What we need to do now is to isolate those resources.

Control Group Resource Isolation

Control groups, or cgroups for short, are a way to isolate shared resources. These resources include block IO, memory, CPU, and so on.

Let's look at IO for a second. For a disk on AWS EC2, hdparm shows:

# hdparm --direct -t /dev/xvda
Timing O_DIRECT disk reads: 382 MB in 3.02 seconds = 126.68 MB/sec

So IO is around 126 MB/sec.

Now, let’s throttle that to 1 MB/sec using control groups.

First a control group needs to be created:

# cgcreate -g blkio:throttled-io

Here, blkio is the name of the subsystem (block IO) we’re going to restrict and throttled-io is the name of the control group we’re creating.

Throttling works on specific devices, so the major/minor identifier of the device needs to be obtained:

# ls -l /dev/xvda
brw-rw---- 1 root disk 202, 0 Apr 24 10:06 /dev/xvda

In this case it is 202, 0.
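
The same pair of numbers can be read programmatically. A minimal sketch, assuming Linux with glibc: stat(2) fills in st_rdev for device nodes, and the major() and minor() macros from <sys/sysmacros.h> split it. The function name device_numbers is hypothetical.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Look up the major/minor numbers of a device node.
 * Returns 0 on success, -1 if the path is missing or not a device. */
int device_numbers(const char *path, unsigned int *maj_out,
                   unsigned int *min_out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    /* st_rdev is only meaningful for block and character devices. */
    if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
        return -1;

    *maj_out = major(st.st_rdev);
    *min_out = minor(st.st_rdev);
    return 0;
}
```

Called on /dev/xvda on the instance above, this would be expected to yield the same 202 and 0 that the listing shows.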

Next, cgset is used to set the actual throttling:

# cgset -r blkio.throttle.read_bps_device="202:0 1048576" throttled-io
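
On a cgroup v1 hierarchy, a setting like this ends up as a line written to a control file in the group's directory. The sketch below shows the idea from C. The function names are hypothetical, and the control file location depends on where the blkio controller is mounted; /sys/fs/cgroup/blkio/throttled-io/blkio.throttle.read_bps_device is a common location but is an assumption here.

```c
#include <stdio.h>
#include <string.h>

/* Build a throttle rule of the form "<major>:<minor> <bytes_per_second>".
 * Returns 0 on success, -1 if the buffer is too small. */
int format_throttle_rule(char *buf, size_t len, unsigned int maj,
                         unsigned int min, unsigned long long bps)
{
    int n = snprintf(buf, len, "%u:%u %llu", maj, min, bps);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the rule to a cgroup control file (the path is supplied by the
 * caller, since the mount point varies). Returns 0 on success. */
int write_throttle_rule(const char *ctl_path, unsigned int maj,
                        unsigned int min, unsigned long long bps)
{
    char rule[64];
    FILE *f;
    size_t len;

    if (format_throttle_rule(rule, sizeof rule, maj, min, bps) != 0)
        return -1;

    f = fopen(ctl_path, "w");
    if (f == NULL)
        return -1;

    len = strlen(rule);
    if (fwrite(rule, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    /* The kernel sees the write when the stream is flushed on close. */
    return fclose(f) == 0 ? 0 : -1;
}
```

With the values from this article, the rule would be built from major 202, minor 0 and 1048576 bytes per second (1024 * 1024, or 1 MB/sec).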

Now we can run hdparm with this new control group using cgexec:

# cgexec -g blkio:throttled-io hdparm --direct -t /dev/xvda
Timing O_DIRECT disk reads: 4 MB in 4.00 seconds = 1023.59 kB/sec

As shown, the IO rate is now throttled to around 1 MB/sec. Success!

This is just one example of the many cgroup subsystems that are available for resource management. Read more about the others in the Red Hat docs.

However, the service has the potential to see process information it really shouldn’t. This leads into the next form of isolation: namespaces.

Linux Namespaces

Namespaces are a way to isolate areas like network and process space.

Due to the rather complex nature of network namespaces with isolated applications and chroots, we’ll discuss that in more detail in part two of this series instead. For now we’ll focus on process space isolation.

So, we need to modify the code for this. The full code is available in this gist, but the important parts are here:

printf("Child: PID=%ld PPID=%ld\n", (long) getpid(), (long) getppid());

char *stack;
char *stackTop;

stack = malloc(STACK_SIZE);
if (stack == NULL) {
    printf("malloc failed\n");
    return 1;
}

/* The stack grows downward, so clone() takes a pointer to its top. */
stackTop = stack + STACK_SIZE;
pid_t child_pid = clone(echo_server, stackTop, CLONE_NEWPID | SIGCHLD, NULL);

First, in the child process, the PID is printed out so that we can verify the process namespace is working properly. If it is, the PID will show as 1, which is normally the init process on the host system, but will be our top level process once it’s isolated. The clone() function creates the new namespace and will execute the server with it. This new namespace will have an entirely isolated process space.

We can see that by running:

# chroot chroot /bin/echo
Child: PID=1 PPID=0

The process has a PID of 1, showing that the process space isolation is working.
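
To see both sides of the namespace boundary at once, the clone() call can be paired with a waitpid() in the parent. This is a self-contained sketch rather than the code from the gist: report_pid and run_in_pid_namespace are hypothetical names, the child only reports its PID instead of running the echo server, and creating the namespace is assumed to require root.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Runs inside the new namespace; exit status 0 means "I am PID 1". */
static int report_pid(void *arg)
{
    (void)arg;
    printf("Child: PID=%ld PPID=%ld\n", (long)getpid(), (long)getppid());
    fflush(stdout); /* the child does not flush stdio buffers on return */
    return getpid() == 1 ? 0 : 1;
}

/* Run fn in a new PID namespace and wait for it.
 * Returns the child's exit status, or -1 if clone() or waitpid() failed
 * (for example when not running as root). */
int run_in_pid_namespace(int (*fn)(void *), void *arg)
{
    int status;
    pid_t pid;
    char *stack = malloc(STACK_SIZE);

    if (stack == NULL)
        return -1;

    fflush(stdout); /* avoid duplicating buffered output in the child */
    pid = clone(fn, stack + STACK_SIZE, CLONE_NEWPID | SIGCHLD, arg);
    if (pid == -1) {
        perror("clone");
        free(stack);
        return -1;
    }

    /* Seen from the parent, the child has an ordinary host PID. */
    printf("Parent: child has PID=%ld on the host\n", (long)pid);

    if (waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

When the clone succeeds, the child reports PID 1 while the parent prints a regular host PID for the same process, which is the isolation described above viewed from outside.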



27th Feb 2019