Memory Logger is a simple tool for gathering information about a running process. Such things as memory usage, stdout and stack traces can be sampled, either on a single process or on multiple nodes in a job scheduling environment. Currently PBS is the only job scheduling system supported.

Basic usage

In its simplest use case scenario, you encapsulate the command you would normally run with out Memory Logger as an argument to Memory Logger:

memory_logger.py --run "mpiexec -n 2 /absolute/path/to/moose_test-opt -i simple_diffusion.i -r 5"

Note: Important! Always use absolute paths between those --run double quotes. Memory Logger has no knowledge on paths like: '../../ or ~/some/path'

Memory Logger will launch your process as normal, and begin to track the two processes that launched (according to mpiexec -n 2). When Memory Logger detects that the process/s has exited (for good or for ill), it will explain where the log was saved. You can then instruct Memory Logger to open this file, and display its information either using Gnuplot (--plot), or by formatting the information and outputting it to stdout (--read).

memory_logger.py --read simple_diffusion_memory.log
Date Stamp                 Memory Usage | Percent of MAX memory used: ( 264,260 K )
12 May 09:24:30.972598        118,012 K | -------------------| 44.65%
12 May 09:24:31.236609        118,260 K | -------------------| 44.75%
12 May 09:24:31.503649        143,272 K | --------------------------| 54.21%
12 May 09:24:31.768457        149,708 K | ----------------------------| 56.65%
12 May 09:24:32.032284        195,968 K | ----------------------------------------| 74.15%
12 May 09:24:32.297472        216,764 K | ---------------------------------------------| 82.02%
12 May 09:24:32.561243        221,748 K | ----------------------------------------------| 83.91%
12 May 09:24:32.824947        251,940 K | ------------------------------------------------------| 95.33%
12 May 09:24:33.088981        259,356 K | --------------------------------------------------------| 98.14%
12 May 09:24:33.353685        261,136 K | ---------------------------------------------------------| 98.81%
12 May 09:24:33.617438        263,812 K | ---------------------------------------------------------| 99.83%
12 May 09:24:33.881309        264,208 K | ---------------------------------------------------------| 99.98%
12 May 09:24:34.145606        264,260 K | ------------------------------------------------------------| 100%
12 May 09:24:34.410900              0 K | | 0%
Date Stamp                 Memory Usage | Percent of MAX memory used: ( 264,260 K )

The items displayed are self-explanatory except perhaps for the percentage. The percentage is the current percent of memory used at this point in time out of the total that will be consumed. This is a post processing technique on the log file itself.

If we wanted, we can ask memory_logger to increase its sample rate to obtain more accurate usage:

memory_logger.py  --repeat-rate .01 \
--run "mpiexec -n 2 /absolute/path/to/moose_test-opt -i simple_diffusion.i -r 5"
memory_logger.py --read simple_diffusion_memory.log
Date Stamp                 Memory Usage | Percent of MAX memory used: ( 264,256 K )
12 May 09:43:43.159996        118,904 K | --------------------| 44.99%
12 May 09:43:43.185155        118,908 K | --------------------| 44.99%
12 May 09:43:43.209589        118,908 K | --------------------| 44.99%
12 May 09:43:43.233664        118,912 K | --------------------| 44.99%
12 May 09:43:43.260108        118,916 K | --------------------| 45.00%
12 May 09:43:43.285351        118,932 K | --------------------| 45.00%
12 May 09:43:43.311781        118,968 K | --------------------| 45.01%
12 May 09:43:43.336671        118,976 K | --------------------| 45.02%
12 May 09:43:43.361786        119,004 K | --------------------| 45.03%
12 May 09:43:43.387484        119,032 K | --------------------| 45.04%
12 May 09:43:43.411622        119,040 K | --------------------| 45.04%
12 May 09:43:43.434505        119,040 K | --------------------| 45.04%
12 May 09:43:43.457495        119,356 K | --------------------| 45.16%
12 May 09:43:43.481471        120,484 K | --------------------| 45.59%
12 May 09:43:43.505902        121,644 K | --------------------| 46.03%
12 May 09:43:43.530162        122,808 K | ---------------------| 46.47%
12 May 09:43:43.553933        123,992 K | ---------------------| 46.92%
12 May 09:43:43.577552        125,148 K | ---------------------| 47.35%
12 May 09:43:43.602607        126,196 K | ---------------------| 47.75%
12 May 09:43:43.627065        131,136 K | -----------------------| 49.62%
12 May 09:43:43.652713        138,600 K | -------------------------| 52.44%
12 May 09:43:43.676421        138,608 K | -------------------------| 52.45%
12 May 09:43:43.700441        139,004 K | -------------------------| 52.60%
12 May 09:43:43.725156        140,740 K | -------------------------| 53.25%
12 May 09:43:43.749215        138,880 K | -------------------------| 52.55%
12 May 09:43:43.773056        138,880 K | -------------------------| 52.55%
12 May 09:43:43.796269        144,932 K | --------------------------| 54.84%
12 May 09:43:43.820610        145,744 K | ---------------------------| 55.15%
12 May 09:43:43.844566        147,432 K | ---------------------------| 55.79%
12 May 09:43:43.867456        147,440 K | ---------------------------| 55.79%
12 May 09:43:43.890581        147,440 K | ---------------------------| 55.79%
12 May 09:43:43.915913        147,456 K | ---------------------------| 55.80%
12 May 09:43:43.939711        147,472 K | ---------------------------| 55.80%
12 May 09:43:43.962963        147,476 K | ---------------------------| 55.80%
12 May 09:43:43.988611        147,480 K | ---------------------------| 55.80%
12 May 09:43:44.013140        148,536 K | ---------------------------| 56.20%
12 May 09:43:44.036609        148,712 K | ---------------------------| 56.27%
12 May 09:43:44.059794        157,524 K | ------------------------------| 59.61%
12 May 09:43:44.083616        161,044 K | -------------------------------| 60.94%
12 May 09:43:44.108447        171,956 K | ---------------------------------| 65.07%
12 May 09:43:44.133661        171,956 K | ---------------------------------| 65.07%
12 May 09:43:44.157677        173,428 K | ----------------------------------| 65.62%
12 May 09:43:44.181663        175,576 K | ----------------------------------| 66.44%
12 May 09:43:44.206320        179,792 K | -----------------------------------| 68.03%
12 May 09:43:44.230308        182,856 K | ------------------------------------| 69.19%
12 May 09:43:44.254167        186,556 K | -------------------------------------| 70.59%
12 May 09:43:44.278363        190,628 K | --------------------------------------| 72.13%
12 May 09:43:44.302360        193,552 K | ---------------------------------------| 73.24%
12 May 09:43:44.324826        194,168 K | ---------------------------------------| 73.47%
12 May 09:43:44.349295        194,168 K | ---------------------------------------| 73.47%
12 May 09:43:44.373849        194,168 K | ---------------------------------------| 73.47%
12 May 09:43:44.398137        194,232 K | ---------------------------------------| 73.50%
12 May 09:43:44.421162        194,320 K | ---------------------------------------| 73.53%
12 May 09:43:44.444478        194,320 K | ---------------------------------------| 73.53%
12 May 09:43:44.468703        196,940 K | ----------------------------------------| 74.52%
12 May 09:43:44.492602        203,156 K | ------------------------------------------| 76.87%
12 May 09:43:44.516616        209,884 K | -------------------------------------------| 79.42%
12 May 09:43:44.540679        212,052 K | --------------------------------------------| 80.24%
12 May 09:43:44.565432        214,568 K | ---------------------------------------------| 81.19%
12 May 09:43:44.590527        216,984 K | ---------------------------------------------| 82.11%
12 May 09:43:44.615556        218,756 K | ----------------------------------------------| 82.78%
12 May 09:43:44.640158        219,604 K | ----------------------------------------------| 83.10%
12 May 09:43:44.664301        220,772 K | ----------------------------------------------| 83.54%
12 May 09:43:44.688804        220,772 K | ----------------------------------------------| 83.54%
12 May 09:43:44.713501        220,772 K | ----------------------------------------------| 83.54%
12 May 09:43:44.739387        221,340 K | ----------------------------------------------| 83.75%
12 May 09:43:44.762476        221,360 K | ----------------------------------------------| 83.76%
12 May 09:43:44.787937        221,364 K | ----------------------------------------------| 83.76%
12 May 09:43:44.812091        221,364 K | ----------------------------------------------| 83.76%
12 May 09:43:44.836021        221,736 K | ----------------------------------------------| 83.90%
12 May 09:43:44.860136        222,304 K | -----------------------------------------------| 84.12%
12 May 09:43:44.883518        222,872 K | -----------------------------------------------| 84.33%
12 May 09:43:44.907631        223,432 K | -----------------------------------------------| 84.55%
12 May 09:43:44.931405        224,000 K | -----------------------------------------------| 84.76%
12 May 09:43:44.955857        224,568 K | -----------------------------------------------| 84.98%
12 May 09:43:44.980299        225,228 K | -----------------------------------------------| 85.23%
12 May 09:43:45.005989        225,476 K | -----------------------------------------------| 85.32%
12 May 09:43:45.030230        225,476 K | -----------------------------------------------| 85.32%
12 May 09:43:45.055491        225,476 K | -----------------------------------------------| 85.32%
12 May 09:43:45.081710        242,036 K | ----------------------------------------------------| 91.59%
12 May 09:43:45.106176        248,488 K | -----------------------------------------------------| 94.03%
12 May 09:43:45.131717        252,436 K | ------------------------------------------------------| 95.52%
12 May 09:43:45.154943        252,864 K | -------------------------------------------------------| 95.68%
12 May 09:43:45.180223        252,864 K | -------------------------------------------------------| 95.68%
12 May 09:43:45.204101        252,864 K | -------------------------------------------------------| 95.68%
12 May 09:43:45.231400        252,908 K | -------------------------------------------------------| 95.70%
12 May 09:43:45.256618        252,952 K | -------------------------------------------------------| 95.72%
12 May 09:43:45.280726        252,952 K | -------------------------------------------------------| 95.72%
12 May 09:43:45.305591        252,956 K | -------------------------------------------------------| 95.72%
12 May 09:43:45.329648        252,956 K | -------------------------------------------------------| 95.72%
12 May 09:43:45.352770        260,284 K | --------------------------------------------------------| 98.49%
12 May 09:43:45.378819        260,284 K | --------------------------------------------------------| 98.49%
12 May 09:43:45.402456        260,284 K | --------------------------------------------------------| 98.49%
12 May 09:43:45.427618        260,300 K | --------------------------------------------------------| 98.50%
12 May 09:43:45.453548        260,320 K | --------------------------------------------------------| 98.51%
12 May 09:43:45.479022        260,320 K | --------------------------------------------------------| 98.51%
12 May 09:43:45.501936        260,320 K | --------------------------------------------------------| 98.51%
12 May 09:43:45.528891        260,336 K | --------------------------------------------------------| 98.51%
12 May 09:43:45.552825        262,052 K | ---------------------------------------------------------| 99.16%
12 May 09:43:45.577857        262,052 K | ---------------------------------------------------------| 99.16%
12 May 09:43:45.601203        262,052 K | ---------------------------------------------------------| 99.16%
12 May 09:43:45.625882        262,060 K | ---------------------------------------------------------| 99.16%
12 May 09:43:45.649436        262,060 K | ---------------------------------------------------------| 99.16%
12 May 09:43:45.673419        262,060 K | ---------------------------------------------------------| 99.16%
12 May 09:43:45.698650        262,060 K | ---------------------------------------------------------| 99.16%
12 May 09:43:45.724584        262,076 K | ---------------------------------------------------------| 99.17%
12 May 09:43:45.748744        262,076 K | ---------------------------------------------------------| 99.17%
12 May 09:43:45.772850        262,076 K | ---------------------------------------------------------| 99.17%
12 May 09:43:45.796818        262,076 K | ---------------------------------------------------------| 99.17%
12 May 09:43:45.821326        262,076 K | ---------------------------------------------------------| 99.17%
12 May 09:43:45.845563        262,076 K | ---------------------------------------------------------| 99.17%
12 May 09:43:45.869381        262,128 K | ---------------------------------------------------------| 99.19%
12 May 09:43:45.894707        262,128 K | ---------------------------------------------------------| 99.19%
12 May 09:43:45.918825        262,128 K | ---------------------------------------------------------| 99.19%
12 May 09:43:45.944556        262,128 K | ---------------------------------------------------------| 99.19%
12 May 09:43:45.968172        262,152 K | ---------------------------------------------------------| 99.20%
12 May 09:43:45.991548        263,108 K | ---------------------------------------------------------| 99.56%
12 May 09:43:46.014727        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.040446        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.066373        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.093189        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.119405        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.143494        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.169162        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.193289        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.218197        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.243629        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.268366        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.293263        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.316888        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.341042        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.365135        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.393536        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.417729        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.443079        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.466258        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.492176        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.518379        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.544906        264,256 K | -----------------------------------------------------------| 100%
12 May 09:43:46.568232        237,440 K | --------------------------------------------------| 89.85%
12 May 09:43:46.592186        239,492 K | ---------------------------------------------------| 90.62%
12 May 09:43:46.615820        240,656 K | ---------------------------------------------------| 91.06%
12 May 09:43:46.640438        244,368 K | ----------------------------------------------------| 92.47%
12 May 09:43:46.665082        235,456 K | --------------------------------------------------| 89.10%
12 May 09:43:46.688966        230,344 K | -------------------------------------------------| 87.16%
12 May 09:43:46.713106        195,528 K | ----------------------------------------| 73.99%
12 May 09:43:46.739572        177,140 K | -----------------------------------| 67.03%
12 May 09:43:46.767171              0 K | | 0%
Date Stamp                 Memory Usage | Percent of MAX memory used: ( 264,256 K )

Okay... A bit too accurate, as several samples remained unchanged during tracking. However sometimes more is desirable, though just not printed in this fashion. For data dumps like this, using Gnuplot is far more efficient.

Using Gnuplot

plot_multi.png
We can visualize the results by plotting the data with Gnuplot:

memory_logger.py --plot simple_diffusion-r4_memory.log simple_diffusion-r6_repeat-rate0.01_memory.log

We can render multiple logs simultaneously to allow an easy comparison.

Tracking PBS jobs

Memory Logger has the ability to track your processes across multiple nodes. In order for this to work correctly, we must launch an interactive job (qsub -I). The reason for this, is we can not have PBS launch a bunch of memory_loggers all in the same fashion... Instead we need one memory_logger, acting as the server, while a bunch of others acting as agents gathering data. The only thing we need to do is provide the --pbs argument.

headnode #> qsub -I

node #> memory_logger.py --pbs \
--run "mpiexec /absolute/path/to/moose_test-opt -i simple_diffusion.i -r 5"

When Memory Logger encounters a --pbs argument, it will look at the contents of your $PBS_NODEFILE, to determine what other machines will be used to process your job. Memory Logger will remote into these machines (SSH), and launch its own memory_logger process, instructing it, how to communicate back to the original memory_logger you launched interactively.

Stack Traces, Dark Mode and other cool things

Obtaining stack traces while tracking memory usage on a single machine or across PBS nodes is the key feature of this tool. In order to do so we need to supply an additional two arguments; --pstack and --debugger gdb|lldb. Memory Logger supports two debuggers for the purpose of obtaining a stack trace (gdb or lldb).

memory_logger.py --pstack --debugger gdb \
--run "mpiexec /absolute/path/to/moose_test-opt -i simple_diffusion.i -r 5"

The --pstack argument is used for both tracking and plotting. It tells Memory Logger to actually display stack trace information (if available in the log file).

memory_logger.py --pstack --plot simple_diffusion_memory.log

Stack traces are represented in the form of points along the Gnuplot line graph. Clicking on these dots will cause memory_logger.py to print the actual stack trace gleamed at that time to stdout. However, if you choose to display all this junk on the terminal, you can with --read:

memory_logger.py --pstack --read simple_diffusion_memory.log
<not going to paste this data here. I am telling ya, its pages and pages>

You can also display stdout along the Gnuplot graph:

memory_logger.py --pstack --stdout --plot simple_diffusion_memory.log

That white back ground to bright for you? Try dark mode:

memory_logger.py --pstack \
--darkmode \
--plot simple_diffusion-r4_memory.log simple_diffusion-r6_repeat-rate0.01_memory.log

darkmode.png