Tracking down macOS memory error(s) with ASAN [closed]

3 weeks ago 24
ARTICLE AD BOX

I'm casting a bit about in the dark here. I'm experiencing a memory corruption error unpredictably/rarely on macOS that I am having a difficult time tracking down.

The crash in the debugger (with ASAN enabled), when it randomly occurs, looks like this:

==51174==ERROR: AddressSanitizer: stack-use-after-scope on address 0x000114f5cc40 at pc 0x00010f6d1576 bp 0x7ff7b0889ba0 sp 0x7ff7b0889b98 READ of size 1 at 0x000114f5cc40 thread T0 #0 0x00010f6d1575 in std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>::__is_long[abi:de200100]() const+0x45 (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x100060575) #1 0x00010f6d14f8 in std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>::size[abi:de200100]() const+0x18 (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x1000604f8) #2 0x00010f6e3524 in std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>::length[abi:de200100]() const+0x14 (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x100072524) #3 0x00010f6a39a4 in wxString::length() const+0x14 (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x1000329a4) #4 0x00010f6a5db4 in wxString::Length() const+0x14 (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x100034db4) #5 0x000135e59709 (<unknown module>) #6 0x00010f712c0f in _GLOBAL__sub_I_myfile.cpp+0x4f (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x1000a1c0f) #7 0x000110631beb in ...normal stack trace from here down... (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x100fc0beb)

(I'm using wxWidgets, where for instance wxString is a thin wrapper around std::string. I'm assuming at this point because wxWidgets is widely used and well-tested that this is not a library issue, because I haven't seen it reported elsewhere and there's nothing in the wx code that looks at all complicit.)

or:

==2272==ERROR: AddressSanitizer: heap-use-after-free on address 0x604002385158 at pc 0x00010f6e041b bp 0x7ff7b088aea0 sp 0x7ff7b088ae98 READ of size 8 at 0x604002385158 thread T0 #0 0x00010f6e041a in std::__1::vector<TextLeaf*, std::__1::allocator<TextLeaf*>>::size[abi:de200100]() const+0x3a (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x10006e41a) #1 0x00010f94e254 in wxBaseObjectArray<TextLeaf, wxObjectArrayTraitsForTextLeaves>::GetCount() const+0x14 (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x1002dc254) #2 0x00010f957c3b in Paragraph::GetText()+0x4b (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x1002e5c3b) #3 0x000115e4085e (<unknown module>) #4 0x00010f713b9f in _GLOBAL__sub_I_myfile.cpp+0x4f (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x1000a1b9f) #5 0x00010fc0b54d in ...normal stack trace from here down... (/MyApp.app/Contents/MacOS/MyApp:x86_64+0x10059954d)

In either case, "...normal stack trace..." is some random yet easily traceable point in MyApp.

When a user hits it in a release build (as I'm assuming it's the same thing), the macOS crash reporter output looks more like:

Thread 0:: Dispatch queue: com.apple.main-thread Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 libsystem_kernel.dylib 0x7ff81929c82e __pthread_kill + 10 1 libsystem_pthread.dylib 0x7ff8192d8b5a pthread_kill + 259 2 libsystem_c.dylib 0x7ff8191e93a6 abort + 126 3 libsystem_malloc.dylib 0x7ff8190ddbff malloc_vreport + 857 4 libsystem_malloc.dylib 0x7ff81910b84f malloc_zone_error + 183 5 libsystem_malloc.dylib 0x7ff8190f749c nanov2_guard_corruption_detected + 34 6 libsystem_malloc.dylib 0x7ff8190f7466 nanov2_allocate_outlined + 407 7 libsystem_malloc.dylib 0x7ff8190f5d4d nanov2_malloc_type + 525 8 libc++abi.dylib 0x7ff81929071b operator new(unsigned long) + 52 9 MyApp 0x10a028ca3 std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>::__init_copy_ctor_external(wchar_t const*, unsigned long) + 99

Or, for a more interesting one:

Thread 8 Crashed:: Dispatch queue: CI::LoadLibrariesQueue 0 libsystem_kernel.dylib 0x7ff80d0f982e __pthread_kill + 10 1 libsystem_pthread.dylib 0x7ff80d135b5a pthread_kill + 259 2 libsystem_c.dylib 0x7ff80d0463a6 abort + 126 3 libsystem_malloc.dylib 0x7ff80cf3abff malloc_vreport + 857 4 libsystem_malloc.dylib 0x7ff80cf6884f malloc_zone_error + 183 5 libsystem_malloc.dylib 0x7ff80cf5449c nanov2_guard_corruption_detected + 34 6 libsystem_malloc.dylib 0x7ff80cf54466 nanov2_allocate_outlined + 407 7 libsystem_malloc.dylib 0x7ff80cf533b6 nanov2_calloc_type + 617 8 CoreFoundation 0x7ff80d1793ee _CFRuntimeCreateInstance + 397 9 CoreFoundation 0x7ff80d178bb3 __CFStringCreateImmutableFunnel3 + 2542 10 CoreFoundation 0x7ff80d1781b7 CFStringCreateWithCString + 62 11 Metal 0x7ff819491283 MTLLibraryDataWithArchive::allocateFunctionNames() + 153 12 Metal 0x7ff81949137c MTLLibraryDataWithArchive::functionNames() + 14 13 CoreImage 0x7ff81a41335b invocation function for block in CI::StitchableKernels::StitchableKernels(CI::MetalContext const*) + 242 14 libdispatch.dylib 0x7ff80cf8849a _dispatch_block_async_invoke2 + 85 15 libdispatch.dylib 0x7ff80cf91bee _dispatch_client_callout + 6 16 libdispatch.dylib 0x7ff80cf8245b _dispatch_lane_serial_drain + 779 17 libdispatch.dylib 0x7ff80cf82ea9 _dispatch_lane_invoke + 382 18 libdispatch.dylib 0x7ff80cf8bd42 _dispatch_root_queue_drain_deferred_wlh + 275 19 libdispatch.dylib 0x7ff80cf8b705 _dispatch_workloop_worker_thread + 871 20 libsystem_pthread.dylib 0x7ff80d132869 _pthread_wqthread + 298 21 libsystem_pthread.dylib 0x7ff80d131843 start_wqthread + 15

(interesting because it's not Thread 0, or MyApp itself, but a CoreImage thread)

So here's where I'm at:

In the debugger stack traces, there's a call into _GLOBAL__sub_I_myfile.cpp, which as I understand it is static initialization of something in myfile.cpp. But the thing is, I'm not sure why static initialization is happening at that point, if I understand correctly what it's doing, and nothing in the preceding "...normal stack trace from here down..." calls anything anywhere near myfile.cpp. (In fact, myfile.cpp is not normally used, and will mostly just return from any calls into it.)

In the past I've noticed ASAN will flag a stack use after scope error with "allocated here..." or a heap error with "previously allocated by..." but in these cases the ASAN output indicates only something like:

[2976, 3016) 'ref.tmp364' (line 385) [3056, 3096) 'ref.tmp372' (line 385) [3136, 3176) 'msg' (line 413) <== Memory access at offset 3136 is inside this variable

where, unfortunately, I cannot find any misuse.

I can only assume I'm walking on memory at some point and it's jumping up randomly later, but cannot for the life of me find where that could possibly be. This isn't a new application: it's been heavily used in the world for 10+ years. The issue is relatively new, but I've been poring over commits without anything popping up as a culprit.

I don't know if any of that is useful but, as I mentioned, I'm at a loss. Given what I have here, is there anything else I should be looking at to try to track down this sort of error?

Read Entire Article