loeg 3 days ago

But what was the shutdown bug you were trying to identify? Was this destructor logging actually useful? The article teases the problem and provides detailed instructions for reproducing the logging, but doesn't actually describe solving the problem.

kccqzy 2 days ago

One of my favorite hacky workarounds is to simply call _exit(0) for an immediate exit without running destructors. Most of the time, the destructors are just freeing memory that will be reclaimed by the OS anyways so they are not worth running if you know the program is exiting. And even if the destructor does more than just freeing memory, maybe the work it's doing isn't needed if you know the process is ending soon: maybe it's joining threads or releasing mutexes or deleting timers.

You will find that in a typical C++ codebase, the destructors that do genuinely useful work (say, flushing buffers, closing files, etc.) are relatively few.
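
A minimal sketch of the trick (using std::_Exit from <cstdlib>; the POSIX _exit(0) behaves the same for this purpose):

    #include <cstdio>
    #include <cstdlib>

    struct Expensive {
        ~Expensive() { std::puts("destructor ran"); }  // never printed below
    };

    int main() {
        Expensive e;
        std::puts("work done, exiting");
        std::fflush(stdout);  // flushing is not guaranteed with _Exit, so flush anything you care about
        std::_Exit(0);        // immediate exit: no destructors, no atexit handlers
    }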

  • ignoramous 2 days ago

    Sub-optimal if the exiting process must relinquish resources external to the current process or the host.

    • gpderetta 2 days ago

      Optimal if you are writing crash-only, i.e. reliable, software.

      • TeMPOraL 2 days ago

        Suboptimal if you're relying on RAII not for "regular" resources, but more abstract ones like "correctly bookended piece of persisted data".

        Then again, I guess it's optimal if you're writing fault-resistant software.

    • akira2501 2 days ago

      And you guarantee you can release those resources without error or exception all the time? If this is not the case then a destructor is the wrong place to be doing this work anyways. If an external service requires a client experiencing an error to manually release resources, then that external service has significant design or protocol issues.

      • ignoramous a day ago

        > guarantee you can release those resources without error or exception all the time

        releasing resources (optimally) and resource exhaustion / ageing (crash recovery) are orthogonal.

        > a destructor is the wrong place

        I commented on GP's use of exit() not on finalizers / destructors.

        > If an external service requires a client experiencing an error to manually release resources...

        You perhaps missed what I wrote: It isn't optimal, if a failing client never signals release (think: distributed lock).

        • akira2501 15 hours ago

          > releasing resources (optimally) and resource exhaustion / ageing (crash recovery) are orthogonal.

          In concept. In the reality of implementation they are not.

          > I commented on GP's use of exit() not on finalizers / destructors.

          You said it was "suboptimal", implying there is an available optimal solution. I'm challenging that exact notion.

          > It isn't optimal,

          You perhaps missed the point. It _can't_ possibly be optimal given the nature of the problem itself. These are the wrong terms to understand the problem in.

MontagFTB 2 days ago

I consider Tracy the state of the art for profiling C++ applications. It’s straightforward to integrate, toggle, gather data, analyze, and respond. It’s also open source, but rivals any product you’d have to pay for:

https://github.com/wolfpld/tracy
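
For reference, basic integration really is just a header plus a macro per scope. A sketch (exact header path and build setup depend on the Tracy version you vendor in; build with TRACY_ENABLE defined and link the Tracy client):

    #include <tracy/Tracy.hpp>

    void update_physics() {
        ZoneScoped;            // zone named after the enclosing function
        // ... work ...
    }

    void render_frame() {
        ZoneScopedN("render"); // explicitly named zone
        // ... work ...
    }

    int main() {
        for (int i = 0; i < 600; ++i) {
            update_physics();
            render_frame();
            FrameMark;         // marks a frame boundary on the profiler timeline
        }
    }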

  • Veserv 2 days ago

    Looks fine, but it does not look like there is an automatic full function entry/exit trace, just sampling. The real benefit is when you do not even need to insert manual instrumentation points: you just hit run and you get a full system trace.

    How well does the visualizer handle multi-TB traces? Usually pretty uncommon, but a 10-100 GB trace is not that hard to produce when doing full tracing.

    • jms55 2 days ago

      Of note is that tracy is aimed at games, where sampling is often too expensive and not fine-grained enough. Hence the manual instrumenting.

      For the Bevy game engine, we automatically insert tracy spans for each ECS system. In practice, users can just compile with the tracy feature enabled, and get a rough but very usable overview of which part of their game is taking a long time on the CPU.

      • Veserv 2 days ago

        I was talking about automatic instrumentation of every single function call by default. No manual instrumentation needed because everything is already instrumented.

        To be fair, you do still want some manual instrumentation to correlate higher level things, but full trace everywhere answers most questions. You also want to be able to manually suppress calls for small functions since that can be performance relevant or distorting, but the point is “default on, manual off” over “default off, manual on”.

        • jpc0 2 days ago

          How would you implement this, may I ask? C++ does not have reflection in the language, so at best you can do that by hooking into the running application, but C++ also aggressively inlines functions at anything except -O0, which means your function call might never be a function call. Running at -O0 is also just generally a bad idea, since many instances of UB will never get caught.

          The only way I can see doing this at compile time is with a compiler extension, but then you are entirely locked in to one compiler.

          Maybe if you compile with debug symbols but then well, you are shipping debug symbols...

          • TickleSteve 2 days ago

            gcc has "-finstrument-functions". This calls your code on every function entry and exit. I've used this previously for tracing as described here and to move memory-protection windows around based on the running code.
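
            A bare-bones sketch of what the hooks look like (the hooks themselves must be excluded from instrumentation or they recurse; the raw addresses can be mapped back to names later with addr2line or dladdr):

                // build: g++ -finstrument-functions trace.cpp
                #include <cstdio>

                #define NO_TRACE __attribute__((no_instrument_function))

                extern "C" NO_TRACE
                void __cyg_profile_func_enter(void* fn, void* call_site) {
                    std::fprintf(stderr, "enter %p (from %p)\n", fn, call_site);
                }

                extern "C" NO_TRACE
                void __cyg_profile_func_exit(void* fn, void* call_site) {
                    std::fprintf(stderr, "exit  %p\n", fn);
                }

                int work(int x) { return x * 2; }  // gets an enter/exit pair automatically

                int main() { return work(21) == 42 ? 0 : 1; }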

          • Veserv a day ago

            If you want something passable, most compilers have function prologue/epilogue hooks that you can write in plain code. Realistically, development and test systems are only going to target like 1 or 2 compilers, so it is not very much work. Unless you are distributing a source library and you want to get full-trace telemetry from customer systems in-development, that is probably all you really need to do.

            If you want to really zoom, you need to get the hooks inlined and probably just written in straight assembly. You then need to optimize your binary format and recording system. You then need to start optimizing your memory bandwidth usage when that becomes the bottleneck. Your overhead in the end is basically limited by memory bandwidth; you can only shovel so many tens of GB/s of logging into memory. Note that persistence has likely been infeasible for the last 2 or so orders of magnitude; RAM is likely the only storage consistently fast enough for the data rates you want to generate when doing this.
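
            If it helps make the numbers concrete, the recording side can be as simple as a fixed-size thread-local ring of packed records written by the entry/exit hooks (illustrative sketch only; x86 assumed for the timestamp):

                #include <cstddef>
                #include <cstdint>
                #include <x86intrin.h>   // __rdtsc()

                struct Record {
                    uint64_t tsc;    // timestamp counter at the event
                    void*    fn;     // function address
                    uint64_t kind;   // 0 = enter, 1 = exit
                };

                constexpr size_t kRecords = 1 << 20;   // power of two; ~24 MB per thread

                struct Ring {
                    Record rec[kRecords];
                    size_t head = 0;
                };

                thread_local Ring* g_ring = new Ring;  // one ring per thread, no locking needed

                inline void record(void* fn, uint64_t kind) {
                    Ring& ring = *g_ring;
                    Record& r = ring.rec[ring.head++ & (kRecords - 1)];  // wraps: keeps the last N events
                    r.tsc  = __rdtsc();
                    r.fn   = fn;
                    r.kind = kind;
                }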

            • jpc0 a day ago

              Makes sense, and thanks for the detailed answer. May put that one on the list of things to play with in the future.

        • andersa 2 days ago

          This would be unbelievably inefficient; game engines will be running hundreds of millions of function calls per second. And if the code runs 10x slower with the trace active, then it's no longer sensible.

          We use sampling for the cases where this level of detail is needed as it has lower overhead.

          What use case did you find this useful for?

          • Veserv 2 days ago

            No, it is quite reasonable with an efficient implementation: 10-50% overhead or so depending on function size distribution (since there is a small fixed overhead per call, smaller functions result in a greater fraction of overhead). 10x for just function entry/exit recording would be grotesquely inefficient; you can do inefficient time travel debugging recording for less than that.

            You do need to allocate a ton of memory for the recording buffer to record sizable amounts of trace data: a GB per core-second of trace or so (it's a ring buffer, so you get to see the last N seconds, not a requirement to run for less than N seconds), but that is fine during development on normal dev machines.

            It is useful for everything. Why would you not want full traces for everything? It is amazing. We use it for everything internally where I work. Or rather, it is part of it. We actually prefer full time travel debugging during development and automated testing (again, overhead is low enough) but it is not available for everything. So sometimes we are stuck with just traces.

          • donadigo 2 days ago

            Every function would be pretty overkill, but you can automatically install instrumentation on functions on demand or with a pre-determined user selection. You can even instrument on a line basis if your instrumentation is cheap enough. I've experimented with this in a VS extension I'm developing and I could easily browse through a non-trivial game codebase without causing noticeable performance overhead [1]. In the demo, the instrumentation is auto-installed on all functions within the file you opened. Obviously, this is just one project I was testing on but it shows that this type of tracing is feasible.

            [1] https://www.youtube.com/watch?v=3PnVG49SFmU

  • rerdavies 2 days ago

    Alas, not for Linux. I've been using the unloved and mostly abandoned (and mostly awful) google perf tools on Linux. :-(

    • jchw 2 days ago

      Hmm? I haven't used Tracy yet, but the demo trace they show at the URL linked on GitHub[1] sure looks like a trace from an application running on Linux. The documentation[2] also seems to cover what you need to run it on Linux, the NixOS derivation[3] suggests it runs on at least Linux and macOS, and I was able to run several of the binaries, including the UI and the capture binary. I still hesitate to doubt you on this because I haven't figured out how one is supposed to actually use it, but it surely seems to support Linux. (I will definitely find a use for this, it looks amazing.)

      [1]: https://tracy.nereid.pl/

      [2]: https://github.com/wolfpld/tracy/releases/latest/download/tr...

      [3]: https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/devel...

    • dkersten 2 days ago

      I’ve used Tracy on Linux about two months ago. Works fine.

    • HellsMaddy 2 days ago

      Tracy works fine on Linux! Just used it today.

      • rerdavies a day ago

        OK. Thanks for straightening me out. I saw the .msi, but no linux packages. I will definitely check it out!

        • las_balas_tres a day ago

          That GitHub repo contains the sources, so you can also build Tracy yourself by cloning it.

rqtwteye 3 days ago

I did this a long time ago with macros. It helped me to find a ton of leaks in a huge video codec codebase.

I still don't understand the hate for the C preprocessor. It enables doing things like this without any overhead. Set a flag and you get constructor/destructor logging and whatever else you want. Don't set it and you get the regular behavior. Zero overhead.
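
Something along these lines, for the idea (macro names here are illustrative, not the ones from that codebase):

    #include <cstdio>

    #ifdef TRACE_OBJECTS
      #define TRACE_CTOR(cls) std::fprintf(stderr, "+ %s %p\n", #cls, (void*)this)
      #define TRACE_DTOR(cls) std::fprintf(stderr, "- %s %p\n", #cls, (void*)this)
    #else
      #define TRACE_CTOR(cls) ((void)0)   // compiles away entirely when the flag is off
      #define TRACE_DTOR(cls) ((void)0)
    #endif

    struct Frame {
        Frame()  { TRACE_CTOR(Frame); }
        ~Frame() { TRACE_DTOR(Frame); }
    };

    int main() {
        Frame f;   // with -DTRACE_OBJECTS: logs "+ Frame <addr>" then "- Frame <addr>"
    }

Pairing the + and - lines by address is enough to spot objects that are constructed but never destroyed.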

  • jonathrg 3 days ago

    The hate might have to do with it being such a primitive and blunt tool; doing anything moderately complex becomes extremely complicated and fragile.

    • akira2501 2 days ago

      Complex arrangements _are_ fragile.

      There is no magic environment that can fix this for you. If you feel you've seen one, then you've focused on the parts of it that were important to you, while ignoring the parts of it everyone else actually needs.

    • tialaramex 2 days ago

      Yeah, this very primitive tool easily creates the programming equivalent of the "iwizard problem".

      [You straightforwardly replace "mage" with "wizard" and oops, now your images are "iwizards" and your "magenta" is "wizardnta"]

    • immibis 2 days ago

      By contrast, something like AspectJ for C++ (if such a thing existed) could express this requirement cleanly.

  • synergy20 2 days ago

    do you have a write-up on how you did it? I'm interested, thanks.

    • naruhodo 2 days ago

      I did a similar thing, in C++, 3 decades ago. I used a macro, FUNC(), that I would put at the start of functions. It took no arguments and declared a local instance, using the __FUNC__ preprocessor builtin to pass the function name to the Trace constructor:

          Trace trace##__LINE__(__FUNC__);
      
      The Trace instance would generate one log on construction and another on destruction. It also kept track of function call nesting (a counter) in a static member that would increment in the constructor and decrement in the destructor. It was inherently single-threaded, because I used a static member, but it could be adapted to multiple threads using thread local storage. I paired it with a LINE("Var x is " << x); macro for arbitrary ostreams-style logging. And building on that, EXPR(x) would do LINE(#x " = " << (x)). The output was along the lines of:

          ,- A::f()
          | ,- A::g()
          | | ,- B::B()
          | | `- B::~B()
          | | x = 12
          | | About to do a thing...
          | | ,- A::doAThing()
          | | `- A::doAThing()
          | `- A::g()
          `- A::f()
      
      The macros could be disabled (defined to do nothing) by a preprocessor symbol.
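
      A compilable sketch of the same idea in current C++, for anyone who wants to try it (it uses the standard __func__ rather than the old builtin, and a thread_local counter for the nesting):

          #include <iostream>
          #include <string>

          class Trace {
          public:
              explicit Trace(const char* fn) : fn_(fn) {
                  std::cout << indent() << ",- " << fn_ << "()\n";
                  ++depth();
              }
              ~Trace() {
                  --depth();
                  std::cout << indent() << "`- " << fn_ << "()\n";
              }
          private:
              static int& depth() { static thread_local int d = 0; return d; }
              static std::string indent() {
                  std::string s;
                  for (int i = 0; i < depth(); ++i) s += "| ";
                  return s;
              }
              const char* fn_;
          };

          #ifdef ENABLE_TRACE
            #define FUNC() Trace functionTrace_(__func__)
          #else
            #define FUNC() ((void)0)
          #endif

          void g() { FUNC(); }
          void f() { FUNC(); g(); }

          int main() { f(); }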

neverartful 3 days ago

I did something similar once but my implementation didn't rely on any compiler features. I made tracing macros for constructors, destructors, and regular c++ methods. If the tracing was turned on in the macros, the information given to the macro (class name, method name, etc.) would be passed to the tracing manager. The tracing manager would serialize to a string and send it through a TCP socket. I also wrote a GUI tracing monitor that would listen on a socket for tracing messages and then display the trace messages received (including counts by class and method). The tracing monitor had filters to tweak. It was a nice tool to have and was very instrumental in finding memory leaks and obscure crashes. This was back in the late 1990s or early 2000s.
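
For flavor, the send side of such a scheme can be quite small; a rough sketch with made-up names (POSIX sockets, newline-delimited text messages):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdint>
    #include <string>

    // Hypothetical trace manager: serializes "Class::method enter/exit" lines to a monitor.
    class TraceManager {
    public:
        bool connect_to(const char* host, uint16_t port) {
            fd_ = ::socket(AF_INET, SOCK_STREAM, 0);
            sockaddr_in addr{};
            addr.sin_family = AF_INET;
            addr.sin_port = htons(port);
            ::inet_pton(AF_INET, host, &addr.sin_addr);
            return ::connect(fd_, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == 0;
        }
        void send_event(const char* cls, const char* method, const char* what) {
            std::string line = std::string(cls) + "::" + method + " " + what + "\n";
            ::send(fd_, line.data(), line.size(), 0);
        }
        ~TraceManager() { if (fd_ >= 0) ::close(fd_); }
    private:
        int fd_ = -1;
    };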

rerdavies 2 days ago

Just spent three days of debugging hell getting my app to shut down gracefully, so that it cleanly turns off all the things that it asynchronously turned on, without performing any use-after-delete. I can sympathise with that.

meindnoch 3 days ago

And what was the bug in the end?

meisel 3 days ago

I’d say address sanitizer is a better starting point, and likely to show memory issues faster than this

  • tempodox 2 days ago

    Yep, ASAN to find use-after-free, and valgrind memcheck to find forgotten-to-free.
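
    The usual invocations, in case it helps someone (typical flags; details vary by toolchain):

        # AddressSanitizer: use-after-free, overflows, and (via LeakSanitizer) leaks
        g++ -fsanitize=address -g app.cpp -o app && ./app

        # Valgrind memcheck: reports definitely/indirectly lost blocks at exit
        g++ -g app.cpp -o app && valgrind --leak-check=full ./app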

tempodox 2 days ago

Too bad, it doesn't work like this on macOS.