ValidateCommandSequence error message exceeds gRPC metadata limit on large command lists#10810
Open
goingforstudying-ctrl wants to merge 1 commit into
Conversation
9a11487 to
5d064e8
Compare
…ta overflow When ValidateCommandSequence reports an invalid command order, the error message includes the full command list. For workflows with hundreds of commands, this can exceed gRPC's default HTTP/2 header limit (~8 KB), causing the server to send RST_STREAM or a HeaderListSizeException instead of a useful error. Add a truncatedCommandTypes helper that caps the serialized command list at 2048 characters, cutting on a comma boundary so individual command names are never broken. If the list is short, the output is unchanged.
5d064e8 to
f904a15
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Was looking through the shutdown path after a crash report (NETDATA-AGENT-43R) and realized the thread cleanup has a nasty race.
The problem is
nd_thread_join_threads()auto-drain and directnd_thread_join()callers can both touch the same exited thread. The auto-drain pulls it off the exited list and callsnd_thread_join(), which frees theND_THREADstruct. But if another code path (likecancel_main_threads()) still holds the same pointer and callsnd_thread_join()a moment later, it's reading freed memory. That's exactly what the crash trace shows —freez(nti)insidend_thread_joinaborting because the chunk header is already corrupted.The CAS guard on
NETDATA_THREAD_STATUS_JOINEDonly stops two threads from joining the same struct simultaneously. It doesn't help when one caller already freed it and another still has a stale pointer.Fix: add a refcount to
ND_THREAD.nd_thread_join_threads()bumps it when it pulls a thread from the exited list, so the auto-drain path holds a reference.nd_thread_join()decrements it. Whoever reaches zero frees the struct.This way both paths can safely call
nd_thread_join()and the struct stays alive until the last one finishes.Also added a cmocka test covering:
Fixes #22716