GitHunt
BA

multimonitor

A convenient command line utility to log system and process metrics.

$ multimonitor --utc_nice --gpu=min --process valley_x64 --process Xorg 
# Waiting for process valley_x64
# Waiting for process valley_x64
# Waiting for process valley_x64
# For process name valley_x64 found pids: [1996445]
# For process name Xorg found pids: [2066]
# ticks_per_second: 100
# With interval 200 ms and 100 ticks/s, expect CPU% error of ± 5.0%
#                                                                                                           Xorg
#                                                                                    valley_x64               |
#                                                                                         |                   |
#                                                                                     1996445                2066
              DATETIME-UTC            TIME      RELTIME    GPU%      VRAM      SCLK     CPU%        RSS     CPU%        RSS
2020-12-31T22:28:44.688313   709668.550807     0.200063    0.0%  309.6MiB  386.7MHz    0.00%      77MiB    5.00%     407MiB
2020-12-31T22:28:44.888319   709668.750763     0.400018    0.0%  309.6MiB  386.7MHz    0.00%      77MiB    0.00%     407MiB
2020-12-31T22:28:45.088325   709668.950799     0.600054    0.0%  311.6MiB  326.5MHz   15.00%      82MiB   10.00%     407MiB
2020-12-31T22:28:45.288331   709669.150770     0.800025    0.0%  311.6MiB  326.5MHz    0.00%      82MiB    0.00%     407MiB
2020-12-31T22:28:45.488337   709669.350800     1.000056    0.0%  311.6MiB  326.5MHz    0.00%      82MiB   10.00%     407MiB
2020-12-31T22:28:45.688343   709669.550775     1.200030    0.0%  311.6MiB  326.5MHz    0.00%      82MiB    5.00%     407MiB
2020-12-31T22:28:45.888350   709669.750795     1.400050    0.0%  311.6MiB  326.5MHz    0.00%      82MiB    5.00%     407MiB
2020-12-31T22:28:46.088356   709669.950932     1.600187    0.0%  311.6MiB  326.5MHz    0.00%      82MiB    0.00%     407MiB
2020-12-31T22:28:46.288362   709670.150692     1.799947    0.0%  294.1MiB  588.2MHz    5.01%      82MiB    5.01%     407MiB
2020-12-31T22:28:46.488368   709670.350815     2.000071    0.0%  294.1MiB  588.2MHz    0.00%      82MiB    5.00%     407MiB
2020-12-31T22:28:46.688374   709670.550755     2.200010    0.0%  294.1MiB  588.2MHz    5.00%      82MiB    5.00%     407MiB
2020-12-31T22:28:46.888381   709670.750808     2.400063    0.0%  294.1MiB  588.2MHz    0.00%      82MiB   10.00%     407MiB
2020-12-31T22:28:47.088387   709670.950767     2.600023    0.0%  294.1MiB  588.2MHz    0.00%      82MiB    5.00%     407MiB
2020-12-31T22:28:47.288393   709671.150816     2.800071    0.0%  298.2MiB  724.0MHz    5.00%      82MiB   10.00%     407MiB
2020-12-31T22:28:47.488399   709671.350754     3.000009    0.0%  298.2MiB  724.0MHz    0.00%      82MiB    0.00%     407MiB
2020-12-31T22:28:47.688405   709671.550808     3.200063    0.0%  298.2MiB  724.0MHz    0.00%      82MiB    0.00%     407MiB
2020-12-31T22:28:47.888411   709671.750837     3.400093    0.0%  298.2MiB  724.0MHz    0.00%      82MiB    5.00%     407MiB
2020-12-31T22:28:48.088418   709671.950742     3.599998    0.0%  298.2MiB  724.0MHz    0.00%      82MiB    0.00%     407MiB

Installation

To build multimonitor from source, you will need a D programming language
compiler. GDC, LDC2 and DMD are all supported. On most Linux distributions it is
easiest to install gdc, which is part of gcc, and prepackaged on most
distributions.

After obtaining source code, just execute ./build.sh (you can adjust options in
that script), or use dub to build it.

You should then get a multimonitor binary to use.

Options

Multimonitor - sample information about system and processes.
                  --sub Launch a single external command, monitor it just like
                        --pid and finish once all of them finish
                 --pids List of process pids to monitor
              --process List of process names to monitor
          --process_map Assign short names to processes, i.e. a=firefox,b=123
                  --cpu Overall CPU stats, i.e. load, average and max frequency
              --loadavg System-wide load average. Avaiable: none, min (1-min avg),
                        med (+5 min avg), max (+runnables and tasks count,
                        and forks per second)
             --cpu_temp CPU temperature
                --sched CPU scheduler details
                   --vm Virtual memory subsystem
           --interrupts Interrupts details
                   --io System-wide IO details. Available: none, min, max
                  --net System-wide networking metrics
                  --gpu Gather GPU stats. Available: none, min, max
         --mangohud_fps Gather FPS information for given processes using MangoHud RPC
                 --exec Run external command with arbitrary output once per sample
           --exec_async Run external command with arbitrary output asynchronously
                 --pipe Run external command and consume output lines as they come
     --async_delay_msec Change how often to run --exec_async and --gpu commands.
                        (default: 200ms)
         --wait_for_all Wait until all named processes are up
   --find_new_when_dead If the named process is dead, try searching again
       --exit_when_dead Stop collecting metrics and exit, when any of requested
                        pids exits too.
     --sum_all_matching For named processes, sum all matching processes metrics
                        (sum CPU, smart memory sum)
          --auto_output Automatically create timestamped output file in current
                        working directory with data, instead of using standard
                        output. (default: false)
        --interval_msec Target interval for main metric sampling and output.
                        (default: 200ms)
         --duration_sec How long to log. (default: forever)
                 --time Time mode, one of: relative, boottime, absolute, all.
                        (default: all)
             --utc_nice Show absolute time as ISO 8601 formated date and time in
                        UTC timezone. Otherwise Unix time is printed.
                        (default: false)
-H     --human_friendly Use human friendly (pretty), but still fixed units
                        (default: true)
              --verbose Show timeing loop debug info
-h               --help This help information.

Primary purpose is debugging processes, system load, memory usage, memory leaks,
GPU usage, frame rate tests, etc.

Combination of ps, top, iotop, powertop, radeontop, vmstat, free,
mpstat, pidstat, slabtop, cpufreq-info, mangohud and more, all in one.
In some areas the accuracy is significantly better than any of the above tools.

Supported

  • Extremely accurate timestamps. Absolute (ISO 8601),
    monotonic from system start, and relative from tool startup.
  • Very accurate repetition rate (usually <10us).
  • Syscall delay compensation.
  • Asynchronous sampling of expensive statistics.
  • Automatic compensation of delays when calculating rates.
  • Rich set of available metrics:
    • System CPU
    • System Memory
    • System IO, total and per-device
    • CPU temperature, frequency, and scaling governor
    • GPU frequency, temperature, load and memory usage
    • (multiple) Process CPU usage, thread count
    • (multiple) Process memory usage
    • (multiple) OpenGL / Vulkan FPS / frame time measurements (using MangoHud
      RPC system)
    • (multiple) Process IO statistics
    • (multiple) Process scheduling and IO priority logging
    • System networking statistics
    • System-wide and per-process context switches and interrupts
    • (multiple) Custom asynchronous metrics (using external scripts), like:
      • System power / current from PSU / SMBus
      • Number of files in a directory
      • ZFS statistics
      • GPU power save mode
      • Window resolution of a benchmarked game
      • Many more easy to add on the fly
    • Custom events annotation (i.e. using external scripts)
    • Ability to sample some metrics at slower rate than main metrics.
  • Human and machine friendly output in one format.
  • Self documenting output. Clear units.
  • Monitor processes by pid, or by name.
    • Continue monitoring even if pid dies, or stop. Configurable.
    • Ability to sum multiple pids or names. I.e. sum all processes
      with given name under one column.
  • Autostart/prestart - wait for a process by name, and start logging as
    soon as one is found.
  • Fast. 5Hz by default, 100Hz possible.
  • Extremely low CPU and memory overhead (<0.5% CPU, <5MiB).
  • Integration with Gnuplot for plotting.
  • Automatic detection of various failures, like signals, interrupts,
    processes death, slow syscalls / preemption, system sleep,
    process SIGSTOP, clock jumps, etc. Automatically output nan,
    or empty lines when discontinuities are detected, so plotting
    can use the data correctly.
  • Automatically calculate expected error / accuracy, and warn
    if it is low.

Many metrics support both rates and cumulative figures. Some other measure only
"gauges" (i.e. memory usage, GPU load or frequency).

The output format is a simple text format with nicely aligned and annotated
columns, but also optimized to be easily parsed by automated tools, most notably
Gnuplot. Relative and absolute timestamps allows correlating with other tools and
events, as well overlying multiple runs for comparison. Most of the data in
various columns use fixed non-configurable formats. This is mostly done to
speed-up processing and output, reduce memory allocations, and make it less
likely for user to mess things up. It also means the logs produced now, will have
exactly same format and units as the ones produced years from now, no matter the
used options. Which is great for comparing measurements with year old logs.

Examples of various modes

Some commands, like --pid, --pids, --process, --sub, --exec,
--exec_async, --pipe can be repated multiple times to monitor multiple
processes or pluging. Some, like --pids, also accept a comma separated lists.

Relative ordering of columns in the output, will in general follow relative
order of arguments. But some system level information will be output in more
rigid order. I.e. --cpu, --io, --gpu, will be displayed after timestamps,
in this order, and before any sub-process / per-process related ones.

Here is a general ordering:

  • Timestamps (influenced by --utc_nice and --time)
  • --cpu
  • --loadvg
  • --cpu_temp
  • --mem
  • --gpu
  • --sched
  • --vm
  • --io
  • --net
  • --pids, --pid
  • --process
  • --sub
  • --mangohud_fps
  • --exec
  • --exec_async
  • --pipe

Some ordering restrictions might be relaxed in the future.

--utc_nice

$ multimonitor --utc_nice
              DATETIME-UTC            TIME      RELTIME
2020-12-31T22:28:44.688313   709668.550807     0.200063
2020-12-31T22:28:44.888319   709668.750763     0.400018
2020-12-31T22:28:45.088325   709668.950799     0.600054

Without this flag, Unix time is output in the first column instead.

# multimonitor ...
        UNIX-TIME            TIME      RELTIME
1609465795.020941   721738.570333     0.200061
1609465795.220947   721738.770319     0.400047
1609465795.424953   721738.970353     0.600081
1609465795.620959   721739.170320     0.800048

Note, Unix time is roughly number of seconds since Unix Epoch
(1970-01-01 00:00:00 "UTC") minus leap seconds. A day in Unix time always has
exactly 86400 seconds. Unix time is often (even in official standards and
manuals) colloquially refereed as "seconds since the Epoch", even if that is not
strictly true.

--time

To select just one of the 3 timestamp to be present in the output, use --time
with one of absolute, boottime or relative.

$ multimonitor --time=absolute
        UNIX-TIME
1609655314.023888
1609655314.223895
1609655314.423901
1609655314.623907
$ multimonitor --time=absolute --utc_nice
              DATETIME-UTC
2021-01-03T06:28:31.495809
2021-01-03T06:28:31.695815
2021-01-03T06:28:31.895821
2021-01-03T06:28:32.095828
$ multimonitor --time=boottime
           TIME
  911283.419439
  911283.619396
  911283.819425
  911284.019410
  911284.219418
$ multimonitor --time=relative
     RELTIME
    0.200084
    0.400050
    0.600066
    0.800058
    1.000044

--pids / --pid

Monitor CPU% and RSS of processes by PID. Multiple --pids and --pid can be
specified.

CPU% of 100% means a fully utilized (by user space and system time spent in
kernel on behalf of the process) one core (or logical thread on the core). So for
example, a 8 thread CPU-bound process, on a 8 core system which is otherwise
idle, will show close to 800%. You can think of it as CPU seconds per second.
Utilization of 100% doesn't actually mean the core is fully utilized, it just
means that 100% of the time it was assigned to particular process, and not
assigned to other processes, interrupt handling or idle/sleep process. 100% CPU
can still have a lot of resources available that are not utilized (this is more
complicated with SMT), because of memory stalls, page faults, long dependency
chains, etc. If process is migrated between cores, the CPU% will track a total
time it was assigned and running on some cores. For multi-threaded processes, the
CPU% is a sum of all its threads.

RSS stands for "Resident (segment) size". It basically is a total amount of
used physical memory, so for example, it doesn't include swapped out memory. Nor
does it count mapped memory, that wasn't yet allocated to physical memory. Be
careful when interpreting these numbers, for complex processes, and multi-process
setups, as shared memory (mostly libraries, but also buffers for communication)
can be counted multiple times for different processes. (For example: You can't
just add up all RSS figures for Firefox or Chrome, to get a total memory usage
for them).

$ multimonitor --pids 1
#                                                 systemd
#                                                     |
#                                                     1
        UNIX-TIME            TIME      RELTIME     CPU%        RSS
1609465795.020941   721738.570333     0.200061    0.00%      12MiB
1609465795.220947   721738.770319     0.400047    0.00%      12MiB
1609465795.424953   721738.970353     0.600081    0.00%      12MiB
1609465795.620959   721739.170320     0.800048    0.00%      12MiB
1609465795.824966   721739.370335     1.000063    0.00%      12MiB
1609465796.020972   721739.570303     1.200031    0.00%      12MiB
1609465796.224978   721739.770336     1.400064    0.00%      12MiB

Multiple PIDs can be specified, as a comma separated list (--pids 1,2,3), or
repeated arguments (--pids 1 --pids 2), or even --pid 1,1 --pids 1,1.

Columns will be ordered in the same order as requested pid in the list(s).

Above each CPU% figure a name of the process (comm) will be displayed,
together with its pid.

Note: On Linux, you can track individual threads using --pids, by passing
thread task id. These are not pthread_t IDs (POSIX Thread IDs). One way of
finding thread pids, is by checking ls -1 /proc/<pid>/task entries. Other
tools, like ps and top can also display all threads on the system, and do
filtering. Thread pids, can also be obtained from inside the process using
gettid call. There is no easy way to convert pthread_t to tid, even on
Linux, without hacky hacks. There is also no easy way to convert tid back to
pthread_t, or pthread name and such. This is because technically pthread to
kernel task threads doesn't need to be 1:1 mapping.

--process

Monitor CPU% and RSS of process similar to --pid, but instead search for
process by name. Additionally by default, logging (and RELTIME) will not start
counting until all requested processes are found. Multiple --process can be
specified.

$ multimonitor --process steam
# Waiting for process steam
# Waiting for process steam
# Waiting for process steam
# For process name steam found pids: [2066]
#                                                   steam
#                                                      |
#                                                   2066
        UNIX-TIME            TIME      RELTIME     CPU%        RSS
1609465961.498080   721905.041175     0.200079   14.99%     405MiB
1609465961.694086   721905.241138     0.400043   10.00%     405MiB
1609465961.894092   721905.441157     0.600062   15.00%     405MiB

--sub

Executes external command in the shell, and monitors it just like the --pids.
Multiple --sub can be specified. They will be started in left to right order,
and displayed also from left to right.

Because a shell is used, and the top most process is monitored, usually an
shell's exec need to be performed to switch execution to a desired process.
Otherwise one would be monitoring CPU and RSS of the shell itself, which is
usually not what one wants.

Shell is the intermediary, to allow using features like shell file globing,
redirection, piping, pre-start configuration and environment variable overrides,
or starting some auxiliary background processes.

Example:

$ multimonitor --io=min --sub "exec sha256sum /dev/random >/dev/null"
        UNIX-TIME            TIME      RELTIME       READ       WRITE    CPU%     RSS
1609566507.918133   822470.120617     0.200078    64MiB/s      0MiB/s  100.0%   1.8MiB
1609566508.122140   822470.320565     0.400026    65MiB/s      0MiB/s  100.0%   1.9MiB
1609566508.318146   822470.520595     0.600056  1001MiB/s      0MiB/s  100.0%   1.8MiB
1609566508.518152   822470.720580     0.800041  1010MiB/s      0MiB/s  100.0%   1.8MiB
1609566508.718158   822470.920588     1.000049  1031MiB/s      0MiB/s  100.0%   1.8MiB
1609566508.918164   822471.120587     1.200048  1022MiB/s      0MiB/s  100.0%   1.8MiB
1609566509.118170   822471.320584     1.400045  1031MiB/s      0MiB/s  100.0%   1.8MiB
1609566509.318177   822471.520587     1.600048  1022MiB/s      0MiB/s  100.0%   1.8MiB
...

The standard output is unchanged (it remains same as standard output of
multimonitor). If the launched programs do have substantial own output, it
might be wise to use shell redirection in each --sub invocation, or use
--auto_output.

If --duration_sec is specified, the after duration passes, all sub-processes
will immedietly receive SIGTERM, and if after short delay they still are
not terminated, then after additional short delay, all non-terminated ones
will receive SIGKILL and be waited for.

If the child process dies, SIGCHLD will be ignored, but the zombie process,
will still be sampled, it will read as 0% CPU and 0MiB RSS. In the future, it
might be possible to handle this signal, and instead show nan, while also
handling the death process to clean it from process tables.

If the ^C is hit, or multimonitor dies in some other way, SIGHUP will be
delivered by kernel to the child processes, then reparanted under some other
system specific process (often pid 1). In the future, there might be an option to
call something like prctl(PR_SET_PDEATHSIG, SIGTERM); before doing actual
exec after the fork.

Or does it maybe sends SIGINT already? To check.

The processes for --pipe and --exec will most likely receive the SIGPIPE, in
addition to other signals mentioned above.

--gpu

$ multimonitor --utc_nice --gpu=min
              DATETIME-UTC            TIME      RELTIME    GPU%      VRAM      SCLK
2020-12-31T22:28:44.688313   709668.550807     0.200063    0.0%  309.6MiB  386.7MHz
2020-12-31T22:28:44.888319   709668.750763     0.400018    0.0%  309.6MiB  386.7MHz
2020-12-31T22:28:45.088325   709668.950799     0.600054    0.0%  311.6MiB  326.5MHz

--gpu=max could provide more information, including various GPU sub-system
loads, memory clocks, temperature, and such.

GPU stats will always be displayed before any of monitored processes (specified
via --pids, --pid, --process or --sub), as well plugins (--pipe,
--exec*).

--io

Shows system-wide block device IO.

$ multimonitor --utc_nice --io=min
              DATETIME-UTC            TIME      RELTIME      RDkB/s      WRkB/s
2021-02-02T05:43:48.568163   875927.542727     0.200118  164384KB/s       0KB/s
2021-02-02T05:43:48.768168   875927.742620     0.400011  165211KB/s       0KB/s
2021-02-02T05:43:48.968174   875927.942660     0.600052  163808KB/s       0KB/s
2021-02-02T05:43:49.168179   875928.142829     0.800221  164340KB/s       0KB/s

--io=max additionally provides information about swap bandwidth.

$ multimonitor --utc_nice --io=max
              DATETIME-UTC            TIME      RELTIME      RDkB/s      WRkB/s SWAPRDkB/s SWAPWRkB/s
2021-02-02T05:43:45.804091   875924.776632     0.200065  164425KB/s       0KB/s      0KB/s      0KB/s
2021-02-02T05:43:46.004096   875924.976592     0.400025  163875KB/s       0KB/s      0KB/s      0KB/s
2021-02-02T05:43:46.204101   875925.176618     0.600051  165097KB/s       0KB/s      0KB/s      0KB/s
2021-02-02T05:43:46.404107   875925.376602     0.800034  164493KB/s       0KB/s      0KB/s      0KB/s

IO stats will always be displayed before any of monitored processes (specified
via --pids, --pid, --process or --sub), as well plugins (--pipe,
--exec*).

--pipe

Launches an asynchronous process and reads back lines from each. The process
should output fixed width and consistent output on each line.

$ multimonitor --pipe "while true; do date '+%s.%N'; sleep 1; done" \
               --pipe "while true; do cat /proc/loadavg; sleep 1; done" \
               --pipe "while true; do cat /proc/uptime; sleep 1; done"
        UNIX-TIME            TIME      RELTIME                 PIPE           PIPE                                 PIPE
1609465621.259576   721564.812290     0.200065 1609465621.112124784 3.44 3.73 3.86 4/1628 2125166 721583.31 22579423.26
1609465621.459583   721565.012258     0.400033 1609465621.112124784 3.44 3.73 3.86 4/1628 2125166 721583.31 22579423.26
1609465621.659589   721565.212438     0.600213 1609465621.112124784 3.44 3.73 3.86 4/1628 2125166 721583.31 22579423.26

You can think of these processes as multimonitor plugins.

--exec

Executes synchronously external command on each sample. Newline characters from
the output are converted to spaces.

$ multimonitor --exec "awk '/^(nr_free_pages|nr_zone_inactive_anon)/ {print \$2;}' /proc/vmstat"
        UNIX-TIME            TIME      RELTIME             EXEC
1609478558.602970   734501.821340     0.200089 13816115 8242963
1609478558.802977   734502.021271     0.400020 13816141 8243038
1609478559.002983   734502.221306     0.600056 13816141 8243046
1609478559.202989   734502.421289     0.800038 13816141 8243050
1609478559.402995   734502.621292     1.000041 13816177 8243169

You can think of these processes as multimonitor plugins, to augment with extra
capabilities quickly.

--exec_async

Executes asynchronously external command. Newlines from the output are converted
to spaces.

$ multimonitor --exec_async 'echo 42 $(date +%s.%N)'
        UNIX-TIME            TIME      RELTIME                    EXEC
1609476390.052024   732333.326417     0.200084 42 1609476390.058566475
1609476390.252030   732333.526350     0.400016 42 1609476390.258601844
1609476390.452037   732333.726391     0.600057 42 1609476390.458978396
1609476390.652043   732333.926383     0.800050 42 1609476390.658840135

Caching behaviour can be changed with --async_delay_msec (default: 200ms).

A difference between the two can be ilustrated here:

$ multimonitor --exec "date +%s.%N" \
               --exec_async "date +%s.%N" \
               --async_delay_msec=1000
        UNIX-TIME            TIME      RELTIME                 EXEC                 EXEC
1609478808.394682   734751.606517     0.200071 1609478808.401597625 1609478808.200443789
1609478808.594688   734751.806484     0.400039 1609478808.601534096 1609478808.200443789
1609478808.794694   734752.006507     0.600062 1609478808.801675976 1609478808.200443789
1609478808.994700   734752.206494     0.800049 1609478809.001560833 1609478808.200443789
1609478809.194707   734752.406500     1.000055 1609478809.201414051 1609478808.200443789
1609478809.394713   734752.606497     1.200052 1609478809.401672778 1609478809.202125682
1609478809.594719   734752.806509     1.400064 1609478809.601640469 1609478809.202125682
1609478809.794725   734753.006488     1.600043 1609478809.801603550 1609478809.202125682
1609478809.994731   734753.206505     1.800060 1609478810.001602391 1609478809.202125682
1609478810.194737   734753.406499     2.000053 1609478810.201564425 1609478809.202125682
1609478810.394744   734753.606650     2.200204 1609478810.401723765 1609478810.203523733
1609478810.594750   734753.806403     2.399958 1609478810.601920961 1609478810.203523733
1609478810.794756   734754.006529     2.600083 1609478810.801647833 1609478810.203523733
1609478810.994762   734754.206483     2.800037 1609478811.001605257 1609478810.203523733
1609478811.194768   734754.406504     3.000059 1609478811.201543125 1609478810.203523733
1609478811.394774   734754.606501     3.200056 1609478811.401612507 1609478811.204954820
1609478811.594781   734754.806497     3.400051 1609478811.601693833 1609478811.204954820

Notice how the --exec part is executed every sample, but --exec_async on
average is executed every 5 samples.

--exec_async is excellent for more expensive computations, like calculating a
hash of a file, traversing big filesystem tree, reading from network, or reading
sysfs files that could be very slow. While the command is executing, the previous
saved state will be displayed, allowing one to continue logging uninterrupted
other things, as well asynchronously execute other command that use
--exec_async.

--pipe with --interval_msec=0

multimonitor can act as a multiplexer from multiple sources in parallel. It can
for example prefix log files or outputs of a command with timestamps, but do so
even when using multiple files and processes in parallel. And monitor other
system metrics in parallel if desired. This is not a main usage of
multimonitor, but it is sometimes handy. A bit more convenient tool is ts
from moreutils is worth looking at (there are other handy tools there that
could be used together with multimonitor, for example ifdata, pee and
sponge).

When using --interval_msec=0, instead of sleeping, a pooling on all pipes to
consume their output will be performed.

$ sudo multimonitor --pipe "exec tail -f /var/log/syslog" \
                    --pipe "exec tail -f /var/log/kern.log" \
                    --interval_msec=0
...
...

This is not a super useful feature at the moment, because by default text on each
column is right aligned, making it somehow silly to use.

Example of --pipe to monitor extra metrics for the process

Here we monitor GPU, process CPU and RSS, but additionally we monitor
number of memory mappings and VmSize (in KiB) using external script, run
continously. The --pipe scripts has a bit of extra boilerplate to find
automatically the process, and preserve the format (number of columns),
if the process is gone (so it is easier to parse later in other tools).

$ multimonitor \
  --gpu=med \
  --process cat \
  --pipe 'while true; do P=$(pidof -s cat); if [ "$P" != "" ]; then while wc -l "/proc/$P/maps" 2>/dev/null; do sleep 0.22; done; else echo "nan" "-"; fi; done' \
  --pipe 'while true; do P=$(pidof -s cat); if [ "$P" != "" ]; then while awk "/^VmSize/ {print \$2;}" "/proc/$P/status" 2>/dev/null; do sleep 0.22; done; else echo "nan"; fi; done'
# ProcStat Initializing ticks_per_second = 100
# ProcStat Initializing page_size_kb = 4
# Arguments: ["multimonitor", "--gpu=med", "--process", "cat", "--pipe", "while true; do P=$(pidof -s cat); if [ \"$P\" != \"\" ]; then while wc -l \"/proc/$P/maps\" 2>/dev/null; do sleep 0.22; done; else echo \"nan\" \"-\"; fi; done", "--pipe", "while true; do P=$(pidof -s cat); if [ \"$P\" != \"\" ]; then while awk \"/^VmSize/ {print \\$2;}\" \"/proc/$P/status\" 2>/dev/null; do sleep 0.22; done; else echo \"nan\"; fi; done"]
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# For process name cat found pids: [451314]
# Spawned 451315 for --pipe: while true; do P=$(pidof -s cat); if [ "$P" != "" ]; then while wc -l "/proc/$P/maps" 2>/dev/null; do sleep 0.22; done; else echo "nan" "-"; fi; done
# Spawned 451320 for --pipe: while true; do P=$(pidof -s cat); if [ "$P" != "" ]; then while awk "/^VmSize/ {print \$2;}" "/proc/$P/status" 2>/dev/null; do sleep 0.22; done; else echo "nan"; fi; done
        UNIX-TIME            TIME      RELTIME GPU%       VRAM    SCLK  GPUT    GPUP      CPU%        RSS                 PIPE PIPE
1620578670.951151    30070.936277     0.200060   0%   209.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB  
1620578671.151156    30071.136242     0.400025   0%   209.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578671.351161    30071.336259     0.600041   0%   201.9MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578671.551167    30071.536250     0.800033   0%   201.9MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578671.751172    30071.736255     1.000037   0%   201.9MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578671.951177    30071.936253     1.200035   0%   201.9MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578672.151183    30072.136258     1.400041   0%   202.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578672.351188    30072.336251     1.600034   0%   202.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578672.551194    30072.536254     1.800036   0%   202.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578672.751199    30072.736253     2.000035   0%   202.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578672.951204    30072.936253     2.200036   0%   202.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578673.151210    30073.136254     2.400036   0%   202.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578673.351215    30073.336257     2.600040   0%   202.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578673.551220    30073.536251     2.800034   0%   202.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578673.751226    30073.736254     3.000036   0%   202.4MiB 1050MHz  35°C 112.13W     0.00%       0MiB 24 /proc/451314/maps 5440
1620578673.951231    30073.936253     3.200036   0%   202.4MiB 1050MHz  35°C 112.13W      nan%       0MiB 24 /proc/451314/maps 5440
1620578674.151236    30074.136263     3.400046   0%   202.4MiB 1050MHz  35°C 112.13W      nan%       0MiB                nan -  nan
1620578674.351242    30074.336259     3.600042   0%   202.4MiB 1050MHz  35°C 112.15W      nan%       0MiB                nan -  nan
1620578674.551247    30074.536256     3.800039   0%   202.4MiB 1050MHz  35°C 112.15W      nan%       0MiB                nan -  nan
1620578674.751252    30074.736259     4.000041   0%   202.4MiB 1050MHz  35°C 112.15W      nan%       0MiB                nan -  nan
1620578674.951258    30074.936257     4.200040   0%   203.7MiB 1050MHz  35°C 112.13W      nan%       0MiB                nan -  nan
1620578675.151263    30075.136259     4.400041   0%   203.7MiB 1050MHz  35°C 112.13W      nan%       0MiB                nan -  nan
1620578675.351269    30075.336263     4.600045   0%   203.7MiB 1050MHz  35°C 112.13W      nan%       0MiB                nan -  nan

Note, how we use single quotes to pass --pipe script - this allows
easier use of $ inside the script.

Often, it might be easier to put such --pipe scripts into own script
files, for easier reuse.

Also note that we use slightly bigger sleep (220ms), compared to the
interval of multimonitor (200ms), so we are not flooded by output of
--pipe script, which by the time is display is stale / old (by many
lines). Instead, we allow the reader to block asynchronously, and display
previous value for one cycle, but updated value will come just on the
next cycle / line, not 10 or 100 lines later, depending on pipe/fifo
buffering. (PS. Note that in some locales sleep 0.22 will not work,
use a comma, like sleep 0,22, or switch to more sane locale).

Alternatively a simple approach is to use --exec_async, which is easier
to use, but has a bit extra overhead, and might be delayed a bit extra
due to internal caching of commands executed using --exec_async:

$ multimonitor \
  --gpu=med \
  --process cat \
  --exec_async 'wc -l "/proc/$(pidof -s cat)/maps" 2>/dev/null || echo nan -' \
  --exec_async 'awk "/^VmSize/ {print \$2;}" "/proc/$(pidof -s cat)/status" || echo nan'

Of course, if you know the pid before hand and the target process is
already running, you can compute it and pass once before launching the
multimonitor, to make script even simpler and slightly faster too.

$ P=$(pidof -s cat)
$ multimonitor \
  --gpu=med \
  --pid $P \
  --exec_async "wc -l '/proc/$P/maps' 2>/dev/null || echo nan -" \
  --exec_async "awk '/^VmSize/ {print \$2;}' '/proc/$P/status' || echo nan"

Notice the inversion of single quotes and double quotes, to make it work
correctly.

--exit_when_dead with --sub, --pids or --process

When --exit_when_dead=true is used, then multimonitor will finish
(and terminate other processes) as soon as any of the monitored processes
is in zombie, dead state or gone. Even if that is before the end of
--duration_sec settings.

$ multimonitor --exit_when_dead --sub="exec sleep 2" --duration_sec=3600
# Spawned 2708230 for --sub: exec sleep 2
        UNIX-TIME            TIME      RELTIME      CPU%        RSS
1614694484.970314   768700.437572     0.200172     0.00%       0MiB
1614694485.170316   768700.637394     0.399994     0.00%       0MiB
1614694485.370317   768700.837564     0.600165     0.00%       0MiB
1614694485.570318   768701.037498     0.800098     0.00%       0MiB
1614694485.770320   768701.237419     1.000019     0.00%       0MiB
1614694485.970321   768701.437574     1.200174     0.00%       0MiB
1614694486.170322   768701.637485     1.400085     0.00%       0MiB
1614694486.370324   768701.837537     1.600137     0.00%       0MiB
1614694486.570325   768702.037491     1.800092     0.00%       0MiB
1614694486.770326   768702.237535     2.000135     0.00%       0MiB
$
$ multimonitor_ldc --exit_when_dead --sub="exec sleep 2" --sub="exec sleep 100" --duration_sec=3600
# Spawned 2727867 for --sub: exec sleep 2
# Spawned 2727868 for --sub: exec sleep 100
        UNIX-TIME            TIME      RELTIME      CPU%        RSS      CPU%        RSS
1614694874.680933   769090.149657     0.200049     0.00%       0MiB     0.00%       0MiB
1614694874.880935   769090.349626     0.400018     0.00%       0MiB     0.00%       0MiB
1614694875.080936   769090.549686     0.600078     0.00%       0MiB     0.00%       0MiB
1614694875.280938   769090.749626     0.800018     0.00%       0MiB     0.00%       0MiB
1614694875.480940   769090.949682     1.000074     0.00%       0MiB     0.00%       0MiB
1614694875.680942   769091.149658     1.200050     0.00%       0MiB     0.00%       0MiB
1614694875.880943   769091.349651     1.400043     0.00%       0MiB     0.00%       0MiB
1614694876.080945   769091.549656     1.600048     0.00%       0MiB     0.00%       0MiB
1614694876.280947   769091.749663     1.800055     0.00%       0MiB     0.00%       0MiB
1614694876.480949   769091.949641     2.000033     0.00%       0MiB     0.00%       0MiB
# Sending SIGTERM to not yet terminated pid 2727868

Without this argument, the multimonitor will keep reporting (including
possibly other metrics) until duration is finished, even if any or all
processes are done:

$ ./multimonitor_ldc --exit_when_dead=false --sub="exec sleep 2" --duration_sec=3600
# Spawned 2708485 for --sub: exec sleep 2
        UNIX-TIME            TIME      RELTIME      CPU%        RSS
1614694563.114823   768778.583917     0.200051     0.00%       0MiB
1614694563.314824   768778.783901     0.400035     0.00%       0MiB
1614694563.514825   768778.983913     0.600046     0.00%       0MiB
1614694563.714827   768779.183920     0.800053     0.00%       0MiB
1614694563.914828   768779.383904     1.000038     0.00%       0MiB
1614694564.114829   768779.583923     1.200057     0.00%       0MiB
1614694564.314831   768779.783897     1.400031     0.00%       0MiB
1614694564.514832   768779.983923     1.600056     0.00%       0MiB
1614694564.718833   768780.184043     1.800176     0.00%       0MiB
1614694564.914835   768780.383834     1.999968     0.00%       0MiB
1614694565.114836   768780.583934     2.200068     0.00%       0MiB
1614694565.314837   768780.783898     2.400032     0.00%       0MiB
1614694565.514838   768780.983894     2.600028     0.00%       0MiB
1614694565.714840   768781.183896     2.800029     0.00%       0MiB
1614694565.914841   768781.383904     3.000037     0.00%       0MiB
1614694566.114842   768781.583919     3.200052     0.00%       0MiB
1614694566.314844   768781.783887     3.400021     0.00%       0MiB
1614694566.514845   768781.983901     3.600034     0.00%       0MiB
1614694566.714846   768782.183894     3.800028     0.00%       0MiB
1614694566.914848   768782.383899     4.000033     0.00%       0MiB
1614694567.114849   768782.583880     4.200013     0.00%       0MiB
1614694567.314850   768782.783921     4.400055     0.00%       0MiB
...
...
...

During that period, dead processes might have their CPU% figure
reported as NaN% or nan% or 0.00%. For external processes also the
RSS might be reported as NaNMiB or nanMiB or 0MiB, after they are
gone, for example:

$ ( sleep 3 & );  # Process we will be monitoring.
$ sleep 1;        # Give it some time to start.
$ multimonitor --exit_when_dead=false --process="sleep" --duration_sec=4
# For process name sleep found pids: [2717459]
        UNIX-TIME            TIME      RELTIME      CPU%        RSS
1614694744.711994   768960.183894     0.200052     0.00%       0MiB
1614694744.911996   768960.383852     0.400010     0.00%       0MiB
1614694745.111997   768960.583890     0.600047     0.00%       0MiB
1614694745.311998   768960.783867     0.800024     0.00%       0MiB
1614694745.512000   768960.983882     1.000039     0.00%       0MiB
1614694745.712001   768961.183871     1.200029     0.00%       0MiB
1614694745.912002   768961.383893     1.400050     0.00%       0MiB
1614694746.112003   768961.583865     1.600023     0.00%       0MiB
1614694746.312005   768961.783884     1.800041      nan%       0MiB
1614694746.512006   768961.983868     2.000026      nan%       0MiB
1614694746.712007   768962.183880     2.200038      nan%       0MiB
1614694746.912009   768962.383890     2.400047      nan%       0MiB
1614694747.112010   768962.583872     2.600029      nan%       0MiB
1614694747.312011   768962.783895     2.800052      nan%       0MiB
1614694747.512012   768962.983883     3.000040      nan%       0MiB
1614694747.712014   768963.183895     3.200053      nan%       0MiB
1614694747.912015   768963.383888     3.400046      nan%       0MiB
1614694748.112016   768963.583888     3.600046      nan%       0MiB
1614694748.312018   768963.783887     3.800045      nan%       0MiB
1614694748.512019   768963.983889     4.000047      nan%       0MiB
$

TODO(baryluk): Implement --exit_when_dead=any and --exit_when_dead=all.

TODO(baryluk): Implement --exit_when_dead=anysub and
--exit_when_dead=allsub, which would only take into account the --sub
processes in the early termination logic, not the other ones.

TODO(baryluk): If all (at least one) monitored processes are of --sub
type, automatically use --exit_when_dead=allsub.

TODO(baryluk): Implement / fix this for --pipe too?

--buffered

By default multimonitor flushes unconditionally output after each line. This
makes it nicer to use with other tools like tee, grep, awk, kolumny or
tail -f, for post-processing output in real time, as well saving to a file and
displaying in terminal at the same time. Without this, it would often appear
there is no output, despite small interval used, and for reasonable intervals it
makes sense to disable full output buffering, to make multimonitor more
convenient to use.

However, for very small intervals, it might come at somehow high overhead. If you
use --interval_msec=50, or less, it might be better to use buffered output. To
enable it use --buffered. A standard C library FILE buffers will be used.
This is usually 4KiB, and the behavior can be manipulated using stdbuf command
for example (stdbuf --output=1M multimonitor --buffered --interval_msec=10
can be used to use large - 1MiB - output buffer). Note however, that because
multimonitor does syscalls to clock_nanosleep and clock_gettime at least
once per each output line, the overheads savings aren't going to that
significant.

Consuming output

You are free to do whatever you like with the output. Very often it will be just
scrolling in the terminal for human consumption. However, often it will be saved
to a file or processed in real time by other tools. Here are some suggestions.

  • Usually column 3 will be used as x axis, as it is easiest to use,
    but there are cases where other columns will be more appropriate.

gnuplot

Gnuplot is an advanced command-line driven graphing utility.
https://www.gnuplot.info/

This is just scratching the surface of what is possible with gnuplot, but should
provide some ideas what is possible. The author is using Gnuplot for 20 years and
still discovering new useful features in it.

  • As an input to gnuplot. For example gnuplot -e "plot 'mm.txt' using 3:4; pause -1"
    A good idea for long runs is to output multimonitor stdout to file
    (redirecting in shell, using --auto_output or using tee). Then as it
    grows, see results in gnuplot by doing replot, or re executing your own
    gnuplot script. This makes it a very interactive and insightful, even while
    the output is still being created by multimonitor.
  • Gnuplot can process multiple columns, either as single plot, or multiple
    plots. For example: plot "mm.txt" using 3:($4+$6) will sum up columns
    4 and 6 and plot them as one line. And plot "mm.txt" u 3:4, "" u 3:6
    will plot them as two separate lines on the same plot ("" means to use
    same input file as previous one).
  • For very short runs, especially when using --duration and/or --sub, it
    might be possible to use multimonitor directly in the gnuplot using its
    popen functionality, like this:
    plot "<multimonitor --gpu --duration_sec=5 --sub 'exec glxgears'" using 3:4,
    but this do have limited usability (it is hard to plot more than one column
    from single multimonitor run), and usually it is better to save output to
    the file instead (this could be done using gnuplot system of course,
    or some external script or by hand). In general I don't recommend it.
  • When using gnuplot, and trying to plot multiple separate runs, it is handy
    to use for construct. Like
    plot for [filename in system("ls -1 mm*.txt")] filename u 3:4 w step title filename noenhanced
    or similar. This will automatically create multiple plots for multiple files
    using same column specifiers. Of course multiple other plots can be ploted
    on the same plot, some possible using rigth axis, or using multiplot
    functionality. noenhahced is to not convert underscore as a hint to do
    subscripts. Without it mm_1.txt, would convert into mm₁.txt instead,
    which is ugly, and not what you usually want.
  • When processing multiple files in gnuplot using for, it might be handy to
    modify the title based on a part of filename, for example:
    … t sprintf("CPU %s", strstrt(filename, "zink") > 0 ? "Z" : "R"))
    to extract some details from a filename, instead of a full filename.
  • Similar trick can be applied to offsets of time or normalization of values:
    plot for [...] filename u (offset(filename, $3):4 w step
    where offset, could be defined similar to this:
    offset(filename, t) = strstrt(filename, "zink" > 0 ? 123.1 : 125.0.
  • When using multiplot feature. It often is a good idea to make left and
    right margins a fixed size, so the time axis aligns properly and perfectly.
    Combined with grid, and disabling xtics for all but the last plot, is
    another neat trick to increase the information density.
  • Using set term sixelgd (or GNUPLOT_TERM=sixelgd) in a terminal emulator,
    supporting sixel protocol, allows to easily view graphs directly in your
    terminal (even over SSH). mlterm, xterm, terminology for X11 do support it.
    yarf for Linux framebuffer (without X11). libvte (Gnome/Mate Terminal,
    Terminator) support is coming. Otherwise, using pngcairo, pdfcairo,
    svg for saving to files, or using wxt for interactive display is
    recommended.

See example *.gnuplot files for some inspirations.

http://gnuplot.sourceforge.net/demo_5.5/ is also helpful for new users.

awk, sed, grep, ...

  • As input to awk (or sometimes grep or sed), for example to do simple
    calculations between columns, or to detect specific patterns. Example:
    multimonitor --gpu | awk '{ if ($4 > 80) print $0; }' will only display
    lines with high GPU usage. Doing sums, moving averages, or ratios between
    different columns, or computing own rates is another option.
  • The awk (or other tools) could be processing that in real time using
    pipes, or as a post-processing step later, either in a script, or i.e. in
    popen construct of gnuplot.
    Example: plot "<awk '{print $3, $5+$7;}' mm.txt using 1:2

kolumny

kolumny (https://github.com/baryluk/kolumny) is a type of streaming command
line spreadsheet, primarly used for processing multiple input files in parallel.
It is a good match to processing multimonitor output, as well for many many
more uses.

  • It is extremely handy for doing comparisons between separate runs of
    multimonitor, because of its ability to do mathematical
    post-processing between multiple files. Example:
    kolumny mm1.txt u t1:=3,~a:=4 mm2.txt u ~t2:=3,~b:=4 :~check(isclose(t1,t2)) :a/b
    • Here, the t1:=3 means to use column 3 from file mm1.txt and assign it to
      variable t1, and print it.
    • ~a:=4 means to read column 4 from file
      mm1.txt and assign it to variable a, without printing it.
    • Reading multiple columns at once is possible using arrays too, for example
      ~pids:=4...12.
    • After input file definitions and main variables, various statements and
      expressions are used. Each expression starts with : (or
      with #, which means to turn it off).
    • :~check(isclose(t1,t2)) is to ensure t1 and t2 are close (which use
      3rd column, so that RELTIME from both files are aligned on each row).
    • ~ is to suppress output of True to the output.
    • :a/b computes value of a/b (ratio between 4th columns) and outputs it.
    • It is also possible to declare new variables, i.e. :~ratio:=a/b, and use
      them in other expression (as long there are no cycles, they will be
      evaluated in proper order, while maintaining same print order as given on
      command line - forward references are supported). For example
      :(1.0-ratio)**2 to output a new column that uses other column, columns
      or expressions defined on command line.
    • Any Python expression can be used, and extra functions can be imported
      using --import.
    • There are also tricks to carry variables and state across rows, do dynamic
      column lookups, arrays (multiple columns) and other tricks.
    • u is shorthand for using (just like Gnuplot).
    • Similarly s is shorthand for skip (just like Gnuplot).
    • kolumny also supports reading directly from standard input, or
      subprocesses <program (Just like Gnuplot), so
      multimonitor --pids 1,2,3 | kolumny - u t:=3,~cm:=4...9 ':sum(cm[0:2:])'
      is an option for example (to sum CPU usage of 3 processes).
      Or equivalently:
      kolumny "<multimonitor --pids 1,2,3" u t:=3,~cm:=4...9 ':sum(cm[0:2:])'"
      The second form supports multiple conccurent input processes (and other
      input files), if required.
    • kolumny can also read variable amount of columns into its arrays, by
      indexing from right:
      kolumny "<multimonitor --process firefox" u t:=3,~cpu_mem:=4...-1 ':~cpu:=cpu_mem[0:2:]' ':sum(cpu)'
      will sum CPU usage of all firefox processes, no matter how many there are.
  • Combining kolumny and gnuplot in one, example:
    plot "<kolumny mm1.txt using t1:=3,~a:=4 mm2.txt using ~t2:=3,~b:=4 :~check(isclose(t1,t2)) :a/b" using 1:2 with lines
    to compute ratio between column 4 from file mm1.txt and mm2.txt,
    without needing to code this in a separate file or script. The CLI syntax
    of kolumny was optimized to not require extensive use of quoting and
    escaping when using with tandem with gnuplot (this is why usage of spaces
    is limited, and varaiables doesn't use $ like awk for example).
  • Of course it is also to combine all in one gnuplot+multimonitor+kolumny,
    but even on a single command line, but these can get out of hand quickly,
    if you are not very familiar with these tools.

Other simple uses, paste, Python

  • For extremely simple cases, i.e. two files being compared, as primitive
    substitute for kolumny is to use paste, then combine with other tools
    (like awk or gnuplot). Example:
    paste mm1.txt mm2.txt | awk '{print $3, $4/$9; }' could be roughly
    equivalent to example above, assuming 5 columns in mm1.txt file.
    kolumny has advantage of easier processing of bigger number of files,
    and not needing to manually count columns in paste output, which is very
    tedious and errorprone, especially when changing output format.
  • Consuming data in various programming languages like Python or R, should
    be equally easy. Often as easy as doing line.split().

Spreadsheets

Importing or just pasting into your favorite spreadsheet (LibreOffice Calc,
Google Sheets, Microsoft Excel, Gnumeric, Calligra Sheets, etc) is obviously also
a reasonable option, if you are into that.

  • Be aware of date handling and autoformating / "autocorrecting" that many of
    these programs do. Usually incorrectly!
    • For example nan will not be recognized usually.
    • Similarly Unix time might not be fully preserved with full accuracy,
      and usually microseconds will be silently dropped.
    • Also some software might convert Unix time back to date and time not
      correctly, so double check your software (Example 1609631793.656 should
      translate to 2021-01-02T23:56:33.656+00:00).
  • These tools does not like too much units attached directly to numbers,
    so using --human_friendly=false is probably good idea. If you already
    captured some files, something like
    sed -E -e 's/[KMGT]i?(B|Hz|)\b//gi' -e 's/%//g' could often do
    a decent job removing these units.

Other formats

Right now multimonitor doesn't support output in other formats. Author thinks
that the simple column and white space design is super easy to use with many
tools (as can be seen above with various example, like gnuplot, awk,
kolumny). If there is sufficient demand, from users, it might be possible to
easily add csv, tsv or ods output formats for example.

For now, using output as is, or with --human_friendly=false, should work well.

Tips and tricks

In --pipe, --exec and --exec_async be aware of shell escaping rules.
This is quite important for example when using awk. See examples above how it
could be made to work.

If the text returned by --pipe, --exec and --exec_async, is going to have a
variable number of elements, or a text with unknown number of words, AND it is
not a last column of multimonitor, it is recommended to put such output into
double quotes, so Gnuplot (and CSV) can consider it a single column anyway.

Example:

$ multimonitor --exec 'echo "\"$(date)\""'
SECONDS-FROM-EPOCH            TIME      RELTIME                              EXEC
 1609563691.299229   819653.500171     0.200085 "Sat 02 Jan 2021 05:01:31 AM UTC"
 1609563691.499235   819653.700140     0.400054 "Sat 02 Jan 2021 05:01:31 AM UTC"
 1609563691.699241   819653.900147     0.600061 "Sat 02 Jan 2021 05:01:31 AM UTC"
 1609563691.899248   819654.100148     0.800062 "Sat 02 Jan 2021 05:01:31 AM UTC"

$ LC_ALL=pl_PL.UTF-8 TZ=Europe/Zurich multimonitor --exec 'echo "\"$(date)\""'
SECONDS-FROM-EPOCH            TIME      RELTIME                            EXEC
 1609563713.831924   819676.035970     0.200090 "sob, 2 sty 2021, 06:01:53 CET"
 1609563714.031930   819676.235925     0.400045 "sob, 2 sty 2021, 06:01:54 CET"
 1609563714.231936   819676.435947     0.600067 "sob, 2 sty 2021, 06:01:54 CET"
 1609563714.431942   819676.635932     0.800052 "sob, 2 sty 2021, 06:01:54 CET"

Similar, it might be useful when using ls -l, which might format modification
times, in multitude of ways (affected by relative time, time zone, locale
installed, locale set, user preferences set in shell aliases or environment
variables etc, etc).

Usually these problems can be completely avoided, by properly constructing
environment, or command line options. In other cases (i.e. reading actual content
of some dynamic file, or output from network), it might still be good to do
quoting.

TODO(baryluk): Add --qexec (quoted exec), to do it automatically (as well turn
any other quotes into escaped quotes in the output).

Sometimes when using shell pipes in --pipe, it might be wise to unbuffer the
output, this can be done using unbuffer or stdbuf -oL, example:

$ multimonitor --pipe 'sudo stdbuf -oL tail -f /var/log/syslog | stdbuf -oL sed -E -e "s/^(.* kernel: )//"'
SECONDS-FROM-EPOCH            TIME      RELTIME                                                                       PIPE
 1609565358.558647   821320.761188     0.200214 [821272.965974] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
 1609565358.758653   821320.960940     0.399966 [821277.038003] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
 1609565358.958659   821321.161257     0.600282 [821281.117976] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
 1609565359.158665   821321.360902     0.799928 [821285.201977] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
 1609565359.358672   821321.561090     1.000115 [821289.273981] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
...

TODO(baryluk): Maybe add option to suppress new line output, or compress it to
empty, if the output of --exec, --pipe, etc, is same as before. Could be
interesting to mark event transitions, while maintaining a readability.

TODO(baryluk): It would be nice to interpret numbers from the output as
cumulative, and let the multimonitor calculate rates. There is many sources
that would work really well for this, for example: /proc/<pid>/io,
/proc/vmstat, ip -s l. Maybe --exec_rate ?

TODO(baryluk): Conversely, how about also --exec_cumul, so we SUM the numbers
from the pipe, or integrate in time, interpreting samples as average value, but
also taking into account the delay between calls. We can also do version that
just does sum (i.e. if the command returns current cumulative number, and then
resets it to zero).

TODO(baryluk): A more modular approach would be better, for example
--exec:async:in_cumul:out_rate:cache=5s "exec something", this would allow
passing extra options and flags easily to each exec, and also allow for changing
order of options, without needing to remember all things. Other options sync,
text, format, etc. This might be handing when reading multiple values from a
single command, where some values are rates, some others are not, some should be
ignored, etc. Also, :off, could be used to keep it on the command line, parse,
but ultimately ignore. (similar to '#' prefix in kolumny, could be supported
too probably).

Notes

multimonitor is written in D programming language, and can be compiled using
GDC, LDC2 or DMD compilers. multimonitor does use D standard library Phobos,
included with these compilers. There are no other dependencies. Some parts of the
code are written with @nogc to ensure smooth and predictable performance. Other
parts do use GC, but very little allocations are actually performed. It is normal
to see about 1 GC collection cycle (each ≈1ms) per hour in steady state. Code is
optimized for correctness, and speed, with extensive usage of meta programming
facilities of D programming language.

Only Linux is supported. Linux kernel 2.6.32+ required. There are no plans to
support other operating system, as there is a lot of Linux specific code. FreeBSD
version is a reasonable option tho, but would require some porting and testing.
multimonitor extensively use pread syscall, but readfile from Linux 5.11 is
also a possibility.

multimonitor is not a replacement for generic monitoring frameworks and tools
like Prometheus (including node-exporter), collectd, Nagios, etc. These tools are
extensible, support very long collections (even years), from multiple sources and
machines, query languages, dashboards, alerting, application specific
instrumentation, dynamic configuration changes, resampling, time realignment of
multiple metrics, interpolation, etc. Also most of these tools do sample about
once a minute, sometimes once every 5 minutes.

multimonitor instead is used for ad-hoc high-precision high-rate logging of
specific metrics, especially for monitoring few apps, without extra
instrumentation. multimonitor output is also designed to be easily consumed by
other tools like gnuplot, awk, kolumny, etc.

multimonitor is not indented for extremely high rate sampling. It is not a
general data acquisition system, nor it is suitable for isochronous data sampling
with extremely low jitter. 100Hz would be practical limit of the multimonitor.
Higher is possible, but not recommended, due to better tools available (with
lower disk usage, and lower jitter, and lower CPU overheads). But i.e. sampling
external sensors once per second, like temperature, or power usage, are
reasonable use cases, as long as the fact that the output format is somehow
verbose is acceptable. For very constrained systems (microcontrollers or small
SBC systems, with limited storage, small write throughput or slow network
connectivity) other solutions should be considered.

multimonitor is also not well suited for very big number of metrics collected.
40-60 is probably reasonable max, but is already hard to keep track. Sure, more
can be done, but there are better tools available for this. multimonitor is
also not well suited for gathering variable length statistics of generic type.
I.e. want to see temperature of all SCSI drives in the system, or core frequency
of all 32 core on the CPU? Not the best idea, as the number of columns or order
in the output will very between systems, making it very hard to process such
files.

The author use it mostly to monitor GPU benchmarks during run, and log CPU load,
GPU load, GPU frequency, GPU temperature, GPU VRAM, benchmark CPU load, benchmark
RSS usage, benchmark thread count, benchmark active thread count, benchmark
min-avg-max frame rates. These can be then fed to Gnuplot to do plots, possibly
from different runs, for example with different GPU driver version, different
compiler options, etc.

Author was just tired of running 3 or 4 different tools concurrently (from
command line or ad-hoc scripts, which always got lost and forgotten, so needed to
be reinvented each time, usually with slightly different format), all with own
latencies, timestamps formats (or lack of them), significant measurement
inaccuracies, different units, too many columns to easily count in gnuplot, then
needing to do time offsets between multiple input files to align them in time,
and between different runs to correlate changes, setup weird awk or kolumny
scripts to, bring the formats to sanity, do computations between them, or plot
with titles, and labels, etc. multimonitor makes things more uniform, faster,
more accurate and easier to use.

Other tools, were too limiting, too high overhead or too much time to setup. I.e.
Prometheus can't really do sub-second scraping, and was too inaccurate at high
repetition rates, and took hours to setup or fully automate.

With multimonitor graphs like this to do comparisons of many metrics between
different setups are easy to do:

Future / TODO / Ideas:

  • Linux perf integration, i.e. IPC, context switches, CPU migrations, cache
    hits / misses, TLS misses, branch mispredicions, etc.
  • Use pidfd or dirfd/openat for processing processes in /proc and
    /sys. Similarly for searching hwmon entries.
  • Disk usage.
  • Output to file with file rotation / compression.
  • Trigger external tools periodically (i.e. generate webpage or image
    with plot) with new data.
  • More testing on Intel and Nvidia GPUs
  • More testing on SBCs, like Raspberry Pi, Orange Pi, Banana Pi, Odroid,
    etc.
  • Multi-arch QA (i386, amd64, arm64, ppc64el, riscv64, alpha, m68k, s390x).
  • More generic plugin framework, where each "Reader" class can dynamically
    register, providing own formatting of header and columns, adding own
    command line option parsing, own preferences of sync vs async, caching
    policy for async, preferred rates, etc.
  • Support for BSD systems (FreeBSD, NetBSD, OpenBSD, DragonFly).
  • More networking statistics (i.e. routing, iptable, etc)
  • NTP and general time and realtime offset / jitter monitoring.
  • Sampling metrics from Prometheus
  • Netlink TASKSTAT metrics
  • More Linux VM statistics for memory
  • Per-core CPU metrics
  • Better multi GPU measurements, i.e. iGPU + dGPU, 2x dGPU
  • Per-file system statistics, i.e. IO/s, kB/s.
  • Support more hwmon stuff, i.e. NVMe temperature.
  • smartmontools integration, i.e. HDD temperature.
  • Linux namespaces support.
  • Linux cgroups support.
  • KVM and Xen guest monitoring.
  • PCIe power status / link status / speed
  • Battery level readback. Charging power. Input / output voltage for USB PD, etc.
  • USB power / current output from host.
  • IMPI / OpenBMC integration for power
  • PMBus integration
  • A framework for I2C and One-Wire sensors maybe.
  • Ability to compute number of OTHER processes / threads active
    in the system, to estimate any background noise when doing benchmarks.
  • Lower overhead output formats, with nesting and per-object labels:
    • Protocol buffer output
    • JSON output
    • These could help with some stuff, like supporting multiple CPU cores,
      multiple network interfaces, multiple processes, multiple hard drives,
      while supporting order independence by including proper names, or
      other stable ids, etc.
  • GPIB / IEEE 488 and LX support, i.e. reading voltage, current, resistance
    from multimeters, frequency counters, power supplies, electronic loads,
    etc. etc.
  • Logic analyzer support.
  • MIDI input, i.e. reading rotary encoders, pressure sensitive keyboards, etc.
  • SNMP readback sampling. It is an atrocious format, but could be useful to
    read metrics for example from network switches, or printers.
  • Octoprint / 3D printer status sampling, i.e. progress of 3D printing,
    amount of used filament, temperature, etc.
  • Execute read queries in databases, i.e. arbitrary SELECT in SQLite, MySQL
    or PostgreSQL to track something, or performance of these queries
    (i.e. latency).
  • ODB 2 (automotive) support.
  • Configurable system and per-user daemon to capture some log with fixed
    format, with output file rotation.

Explicitly not planned:

  • Generic logging.
  • Event logging.
  • Streaming to databases, or such. You can use other tools to do that with
    multimonitor output.
  • Query language integration. Instead you can feed data from multimonitor
    to lets say kolumny, Python's pandas or numpy, or SQL database of your
    choice.
  • Built-in data processing. These can be done using other tools easily,
    like gnuplot, kolumny, awk, or small Python program.
  • GUI integrated into it. It should be possible to create a separate
    GUI program ("driver") that launches multimonitor and shows results
    on graphs, possibly in real-time, and allows to easily configure
    command line, if needed.
  • Cross-machine integration. You can just call multimonitor on different
    machines using ssh and merge files if needed, either directly, using
    kolumny, or feed directly to tools that don't require time alignment,
    like Gnuplot.
  • Windows or MacOS support. Too much work, complications to the code and
    author doesn't use them.
  • Config files - all options should be passed explicitly on the command line
    for full reproducibility. (However: It might be wise to allow setting
    --utc_nice or --interval_msec by default using an environment variable,
    without loosing column numbering).
baryluk/multimonitor | GitHunt