fix(langchain): convert prompt variables with non-word characters#1708
fix(langchain): convert prompt variables with non-word characters#1708i-anubhav-anand wants to merge 2 commits into
Conversation
_get_langchain_prompt_string only matched {{ \w+ }} when converting Langfuse
mustache variables to langchain single-brace placeholders, so variables whose
names contain hyphens, spaces, or unicode (all valid Langfuse variable names)
were left as {{name}} and silently lost: PromptTemplate reads {{ as a literal
brace, so the variable became un-fillable.
Broaden the capture to [^{}"]+? so any variable name converts, while still
excluding already-escaped JSON (which appears as {{"key": ...}} and must stay
doubled). Adds a unit test (fails before: variable left as {{user-name}}).
| # with hyphens, spaces, unicode, etc.), not just \w+. The character class | ||
| # excludes braces and quotes so already-escaped JSON (which appears as | ||
| # {{"key": ...}} after _escape_json_for_langchain) is left untouched. | ||
| return re.sub(r'{{\s*([^{}"]+?)\s*}}', r"{\g<1>}", json_escaped_content) |
There was a problem hiding this comment.
_escape_json_for_langchain doubles braces for both " and '-prefixed content (line 213: text[j] in {"'", '"'}). A prompt like The result is {'key': 'value'} is escaped to {{'key': 'value'}}, but [^{}"]+? does not exclude ', so the new regex matches it and converts it to {'key': 'value'} — a LangChain placeholder — instead of leaving the brace-pair literal. The old \w+ was accidentally safe here; this PR introduces a regression for single-quote–prefixed content. Adding ' to the exclusion class keeps parity with the escaper.
Prompt To Fix With AI
This is a comment left during a code review.
Path: langfuse/model.py
Line: 170
Comment:
`_escape_json_for_langchain` doubles braces for both `"` **and** `'`-prefixed content (line 213: `text[j] in {"'", '"'}`). A prompt like `The result is {'key': 'value'}` is escaped to `{{'key': 'value'}}`, but `[^{}"]+?` does not exclude `'`, so the new regex matches it and converts it to `{'key': 'value'}` — a LangChain placeholder — instead of leaving the brace-pair literal. The old `\w+` was accidentally safe here; this PR introduces a regression for single-quote–prefixed content. Adding `'` to the exclusion class keeps parity with the escaper.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Good catch — fixed in the latest commit. _escape_json_for_langchain doubles both quote styles, so I broadened the exclusion class to [^{}"']+? and added a regression test (test_get_langchain_prompt_leaves_quoted_json_escaped) covering single-quote JSON.
Address review feedback: _escape_json_for_langchain doubles braces for both
"-prefixed and '-prefixed content, so the variable regex must exclude both
quote styles or it un-escapes single-quote JSON like {{'key': 'value'}}.
Broaden exclusion class to [^{}"'] and add a regression test.
When converting a Langfuse prompt to a LangChain template,
_get_langchain_prompt_stringonly matched\w+variable names:But Langfuse variable names are permissive (the template parser just
.strip()s the name), so a valid prompt likeHello {{user-name}}!is left unchanged. LangChain then reads{{as a literal brace, so the variable is silently dropped and can never be filled:Fix broadens the capture to
[^{}"]+?, so any variable name (hyphens, spaces, unicode, …) converts, while still excluding already-escaped JSON — which appears as{{"key": ...}}after_escape_json_for_langchainand must stay doubled. Verified the JSON case stays correctly escaped (the existing JSON-escaping tests still pass).Test
Adds a unit test in
TestLangchainPromptCompilationfor a hyphenated variable (fails before: left as{{user-name}}). All 33 tests in the file pass.ruff check/formatclean.Type of change
Greptile Summary
This PR fixes a silent variable-drop bug in
_get_langchain_prompt_stringwhere Langfuse variable names containing non-word characters (hyphens, spaces, unicode) were never converted to LangChain single-brace format, leaving them as un-fillable{{...}}literals.\w+to[^{}"]+?, so any variable name not containing braces or double-quotes now converts correctly.test_get_langchain_prompt_preserves_special_variable_names) covers the hyphen case end-to-end throughPromptTemplate.format.model.py; all existing JSON-escaping tests continue to pass per the PR description.Confidence Score: 3/5
The core regex fix is correct for double-quoted JSON content, but it introduces a regression for single-quote–prefixed content that
_escape_json_for_langchainalso escapes.The escape function at line 213 doubles braces for both
'and"prefixes, so a prompt containing{'key': 'value'}is escaped to{{'key': 'value'}}. The new[^{}"]+?character class excludes"but not', so that escaped pair is still matched and incorrectly converted to a LangChain placeholder instead of being left as a literal. The old\w+was accidentally immune to this. The fix is one character away ([^{}"']+?), but as written the PR leaves this gap.langfuse/model.py — specifically the character class in
_get_langchain_prompt_stringand whether the new test suite covers single-quote–delimited content going through_escape_json_for_langchain.Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "fix(langchain): convert prompt variables..." | Re-trigger Greptile