Advertisement
Advertisement


What is RSS and VSZ in Linux memory management


Question

What are RSS and VSZ in Linux memory management? In a multithreaded environment how can both of these can be managed and tracked?

2014/01/10
1
345
1/10/2014 4:13:24 PM


RSS is Resident Set Size (physically resident memory - this is currently occupying space in the machine's physical memory), and VSZ is Virtual Memory Size (address space allocated - this has addresses allocated in the process's memory map, but there isn't necessarily any actual memory behind it all right now).

Note that in these days of commonplace virtual machines, physical memory from the machine's view point may not really be actual physical memory.

2016/10/05

Minimal runnable example

For this to make sense, you have to understand the basics of paging: How does x86 paging work? and in particular that the OS can allocate virtual memory via page tables / its internal memory book keeping (VSZ virtual memory) before it actually has a backing storage on RAM or disk (RSS resident memory).

Now to observe this in action, let's create a program that:

  • allocates more RAM than our physical memory with mmap
  • writes one byte on each page to ensure that each of those pages goes from virtual only memory (VSZ) to actually used memory (RSS)
  • checks the memory usage of the process with one of the methods mentioned at: Memory usage of current process in C

main.c

#define _GNU_SOURCE
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    unsigned long size,resident,share,text,lib,data,dt;
} ProcStatm;

/* https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c/7212248#7212248 */
void ProcStat_init(ProcStatm *result) {
    const char* statm_path = "/proc/self/statm";
    FILE *f = fopen(statm_path, "r");
    if(!f) {
        perror(statm_path);
        abort();
    }
    if(7 != fscanf(
        f,
        "%lu %lu %lu %lu %lu %lu %lu",
        &(result->size),
        &(result->resident),
        &(result->share),
        &(result->text),
        &(result->lib),
        &(result->data),
        &(result->dt)
    )) {
        perror(statm_path);
        abort();
    }
    fclose(f);
}

int main(int argc, char **argv) {
    ProcStatm proc_statm;
    char *base, *p;
    char system_cmd[1024];
    long page_size;
    size_t i, nbytes, print_interval, bytes_since_last_print;
    int snprintf_return;

    /* Decide how many ints to allocate. */
    if (argc < 2) {
        nbytes = 0x10000;
    } else {
        nbytes = strtoull(argv[1], NULL, 0);
    }
    if (argc < 3) {
        print_interval = 0x1000;
    } else {
        print_interval = strtoull(argv[2], NULL, 0);
    }
    page_size = sysconf(_SC_PAGESIZE);

    /* Allocate the memory. */
    base = mmap(
        NULL,
        nbytes,
        PROT_READ | PROT_WRITE,
        MAP_SHARED | MAP_ANONYMOUS,
        -1,
        0
    );
    if (base == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }

    /* Write to all the allocated pages. */
    i = 0;
    p = base;
    bytes_since_last_print = 0;
    /* Produce the ps command that lists only our VSZ and RSS. */
    snprintf_return = snprintf(
        system_cmd,
        sizeof(system_cmd),
        "ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == \"%ju\") print}'",
        (uintmax_t)getpid()
    );
    assert(snprintf_return >= 0);
    assert((size_t)snprintf_return < sizeof(system_cmd));
    bytes_since_last_print = print_interval;
    do {
        /* Modify a byte in the page. */
        *p = i;
        p += page_size;
        bytes_since_last_print += page_size;
        /* Print process memory usage every print_interval bytes.
         * We count memory using a few techniques from:
         * https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c */
        if (bytes_since_last_print > print_interval) {
            bytes_since_last_print -= print_interval;
            printf("extra_memory_committed %lu KiB\n", (i * page_size) / 1024);
            ProcStat_init(&proc_statm);
            /* Check /proc/self/statm */
            printf(
                "/proc/self/statm size resident %lu %lu KiB\n",
                (proc_statm.size * page_size) / 1024,
                (proc_statm.resident * page_size) / 1024
            );
            /* Check ps. */
            puts(system_cmd);
            system(system_cmd);
            puts("");
        }
        i++;
    } while (p < base + nbytes);

    /* Cleanup. */
    munmap(base, nbytes);
    return EXIT_SUCCESS;
}

GitHub upstream.

Compile and run:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
sudo dmesg -c
./main.out 0x1000000000 0x200000000
echo $?
sudo dmesg

where:

  • 0x1000000000 == 64GiB: 2x my computer's physical RAM of 32GiB
  • 0x200000000 == 8GiB: print the memory every 8GiB, so we should get 4 prints before the crash at around 32GiB
  • echo 1 | sudo tee /proc/sys/vm/overcommit_memory: required for Linux to allow us to make a mmap call larger than physical RAM: maximum memory which malloc can allocate

Program output:

extra_memory_committed 0 KiB
/proc/self/statm size resident 67111332 768 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
  PID    VSZ   RSS
29827 67111332 1648

extra_memory_committed 8388608 KiB
/proc/self/statm size resident 67111332 8390244 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
  PID    VSZ   RSS
29827 67111332 8390256

extra_memory_committed 16777216 KiB
/proc/self/statm size resident 67111332 16778852 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
  PID    VSZ   RSS
29827 67111332 16778864

extra_memory_committed 25165824 KiB
/proc/self/statm size resident 67111332 25167460 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
  PID    VSZ   RSS
29827 67111332 25167472

Killed

Exit status:

137

which by the 128 + signal number rule means we got signal number 9, which man 7 signal says is SIGKILL, which is sent by the Linux out-of-memory killer.

Output interpretation:

  • VSZ virtual memory remains constant at printf '0x%X\n' 0x40009A4 KiB ~= 64GiB (ps values are in KiB) after the mmap.
  • RSS "real memory usage" increases lazily only as we touch the pages. For example:
    • on the first print, we have extra_memory_committed 0, which means we haven't yet touched any pages. RSS is a small 1648 KiB which has been allocated for normal program startup like text area, globals, etc.
    • on the second print, we have written to 8388608 KiB == 8GiB worth of pages. As a result, RSS increased by exactly 8GIB to 8390256 KiB == 8388608 KiB + 1648 KiB
    • RSS continues to increase in 8GiB increments. The last print shows about 24 GiB of memory, and before 32 GiB could be printed, the OOM killer killed the process

See also: https://unix.stackexchange.com/questions/35129/need-explanation-on-resident-set-size-virtual-size

OOM killer logs

Our dmesg commands have shown the OOM killer logs.

An exact interpretation of those has been asked at:

The very first line of the log was:

[ 7283.479087] mongod invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

So we see that interestingly it was the MongoDB daemon that always runs in my laptop on the background that first triggered the OOM killer, presumably when the poor thing was trying to allocate some memory.

However, the OOM killer does not necessarily kill the one who awoke it.

After the invocation, the kernel prints a table or processes including the oom_score:

[ 7283.479292] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[ 7283.479303] [    496]     0   496    16126        6   172032      484             0 systemd-journal
[ 7283.479306] [    505]     0   505     1309        0    45056       52             0 blkmapd
[ 7283.479309] [    513]     0   513    19757        0    57344       55             0 lvmetad
[ 7283.479312] [    516]     0   516     4681        1    61440      444         -1000 systemd-udevd

and further ahead we see that our own little main.out actually got killed on the previous invocation:

[ 7283.479871] Out of memory: Kill process 15665 (main.out) score 865 or sacrifice child
[ 7283.479879] Killed process 15665 (main.out) total-vm:67111332kB, anon-rss:92kB, file-rss:4kB, shmem-rss:30080832kB
[ 7283.479951] oom_reaper: reaped process 15665 (main.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:30080832kB

This log mentions the score 865 which that process had, presumably the highest (worst) OOM killer score as mentioned at: https://unix.stackexchange.com/questions/153585/how-does-the-oom-killer-decide-which-process-to-kill-first

Also interestingly, everything apparently happened so fast that before the freed memory was accounted, the oom was awoken again by the DeadlineMonitor process:

[ 7283.481043] DeadlineMonitor invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

and this time that killed some Chromium process, which is usually my computers normal memory hog:

[ 7283.481773] Out of memory: Kill process 11786 (chromium-browse) score 306 or sacrifice child
[ 7283.481833] Killed process 11786 (chromium-browse) total-vm:1813576kB, anon-rss:208804kB, file-rss:0kB, shmem-rss:8380kB
[ 7283.497847] oom_reaper: reaped process 11786 (chromium-browse), now anon-rss:0kB, file-rss:0kB, shmem-rss:8044kB

Tested in Ubuntu 19.04, Linux kernel 5.0.0.


I think much has already been said, about RSS vs VSZ. From an administrator/programmer/user perspective, when I design/code applications I am more concerned about the RSZ, (Resident memory), as and when you keep pulling more and more variables (heaped) you will see this value shooting up. Try a simple program to build malloc based space allocation in loop, and make sure you fill data in that malloc'd space. RSS keeps moving up. As far as VSZ is concerned, it's more of virtual memory mapping that linux does, and one of its core features derived out of conventional operating system concepts. The VSZ management is done by Virtual memory management of the kernel, for more info on VSZ, see Robert Love's description on mm_struct and vm_struct, which are part of basic task_struct data structure in kernel.

2015/08/02

They are not managed, but measured and possibly limited (see getrlimit system call, also on getrlimit(2)).

RSS means resident set size (the part of your virtual address space sitting in RAM).

You can query the virtual address space of process 1234 using proc(5) with cat /proc/1234/maps and its status (including memory consumption) thru cat /proc/1234/status


Source: https://stackoverflow.com/questions/7880784
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]