
fix(langchain): convert prompt variables with non-word characters #1708

Open
i-anubhav-anand wants to merge 2 commits into langfuse:main from i-anubhav-anand:fix/langchain-prompt-special-var-names

i-anubhav-anand commented Jun 14, 2026

When converting a Langfuse prompt to a LangChain template, _get_langchain_prompt_string only matched \w+ variable names:

re.sub(r"{{\s*(\w+)\s*}}", r"{\g<1>}", json_escaped_content)

But Langfuse variable names are permissive (the template parser just .strip()s the name), so a valid prompt like Hello {{user-name}}! is left unchanged. LangChain then reads {{ as a literal brace, so the variable is silently dropped and can never be filled:

client.get_langchain_prompt()                     # "Hello {{user-name}}!"  (not converted)
PromptTemplate.from_template(...).input_variables  # []
.format(...)                                       # "Hello {user-name}!"   (un-fillable)
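
A minimal sketch reproducing the behavior described above, using only the standard library. `convert_old` and `field_names` are hypothetical helpers written for this illustration; `string.Formatter` stands in for LangChain's f-string parsing, which is an assumption rather than a call into LangChain itself.

```python
import re
from string import Formatter

# Pattern quoted in the PR description: only \w+ names are captured.
OLD_PATTERN = r"{{\s*(\w+)\s*}}"


def convert_old(content: str) -> str:
    """Rewrite double-brace variables to single braces with the original capture."""
    return re.sub(OLD_PATTERN, r"{\g<1>}", content)


def field_names(template: str) -> list:
    """Replacement fields found by str.format-style parsing (stdlib stand-in)."""
    return [name for _, name, _, _ in Formatter().parse(template) if name is not None]


word_only = convert_old("Hello {{name}}!")
hyphenated = convert_old("Hello {{user-name}}!")

print(word_only)                # converted to single braces
print(hyphenated)               # left untouched by the \w+ capture
print(field_names(hyphenated))  # no replacement fields are detected
print(hyphenated.format())      # doubled braces collapse to literal braces
```

The hyphenated name never becomes a replacement field, so there is nothing for a caller to fill in.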

The fix broadens the capture to [^{}"]+?, so any variable name (hyphens, spaces, unicode, …) converts, while still excluding already-escaped JSON, which appears as {{"key": ...}} after _escape_json_for_langchain and must stay doubled. I verified that the JSON case stays correctly escaped (the existing JSON-escaping tests still pass).
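
The broadened capture can be tried in isolation. This is a sketch under the assumption that the input has already been through JSON escaping; `convert_broad` is a hypothetical stand-in for the one-line substitution, not the library function.

```python
import re

# Capture from the first commit: anything except braces and double quotes.
BROAD_PATTERN = r'{{\s*([^{}"]+?)\s*}}'


def convert_broad(content: str) -> str:
    """Rewrite double-brace variables to single braces with the broadened capture."""
    return re.sub(BROAD_PATTERN, r"{\g<1>}", content)


hyphen = convert_broad("Hello {{user-name}}!")
spaced = convert_broad("Hi {{ first name }}")
escaped_json = convert_broad('Return {{"key": "value"}}')

print(hyphen)        # hyphenated name now converts
print(spaced)        # surrounding whitespace is trimmed by the \s* groups
print(escaped_json)  # double-quoted JSON keeps its doubled braces
```

One detail worth keeping in mind: in Python 3, `\w` on `str` patterns already matches Unicode letters, so the names that benefit are those containing non-word characters such as hyphens, spaces, or symbols.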

Test

Adds a unit test in TestLangchainPromptCompilation for a hyphenated variable (fails before: left as {{user-name}}). All 33 tests in the file pass. ruff check/format clean.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Greptile Summary

This PR fixes a silent variable-drop bug in _get_langchain_prompt_string where Langfuse variable names containing non-word characters (hyphens, spaces, unicode) were never converted to LangChain single-brace format, leaving them as un-fillable {{...}} literals.

  • The regex capture group is broadened from \w+ to [^{}"]+?, so any variable name not containing braces or double-quotes now converts correctly.
  • A new unit test (test_get_langchain_prompt_preserves_special_variable_names) covers the hyphen case end-to-end through PromptTemplate.format.
  • The fix is one line in model.py; all existing JSON-escaping tests continue to pass per the PR description.

Confidence Score: 3/5

The core regex fix is correct for double-quoted JSON content, but it introduces a regression for single-quote–prefixed content that _escape_json_for_langchain also escapes.

The escape function at line 213 doubles braces for both ' and " prefixes, so a prompt containing {'key': 'value'} is escaped to {{'key': 'value'}}. The new [^{}"]+? character class excludes " but not ', so that escaped pair is still matched and incorrectly converted to a LangChain placeholder instead of being left as a literal. The old \w+ was accidentally immune to this. The fix is one character away ([^{}"']+?), but as written the PR leaves this gap.

langfuse/model.py: specifically the character class in _get_langchain_prompt_string, and whether the new test suite covers single-quote-delimited content going through _escape_json_for_langchain.
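
The regression described in this summary can be reproduced with the first-commit pattern alone. The sketch below feeds in a string that is assumed to be the output of the escaper (it does not call `_escape_json_for_langchain`), and again uses `string.Formatter` as a stdlib stand-in for LangChain's parsing.

```python
import re
from string import Formatter

# First-commit pattern: excludes braces and double quotes, but not single quotes.
FIRST_COMMIT_PATTERN = r'{{\s*([^{}"]+?)\s*}}'

# Assumed escaper output for the prompt: The result is {'key': 'value'}
already_escaped = "The result is {{'key': 'value'}}"

unescaped = re.sub(FIRST_COMMIT_PATTERN, r"{\g<1>}", already_escaped)
fields = [name for _, name, _, _ in Formatter().parse(unescaped) if name is not None]

print(unescaped)  # the doubled braces have been collapsed to single braces
print(fields)     # the dict literal is now parsed as a replacement field
```

The literal dict is turned into something a format-style parser treats as a placeholder, which is the behavior the review flags.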

Reviews (1): Last reviewed commit: "fix(langchain): convert prompt variables..."

Greptile also left 1 inline comment on this PR.

_get_langchain_prompt_string only matched {{ \w+ }} when converting Langfuse
mustache variables to langchain single-brace placeholders, so variables whose
names contain hyphens, spaces, or unicode (all valid Langfuse variable names)
were left as {{name}} and silently lost: PromptTemplate reads {{ as a literal
brace, so the variable became un-fillable.

Broaden the capture to [^{}"]+? so any variable name converts, while still
excluding already-escaped JSON (which appears as {{"key": ...}} and must stay
doubled). Adds a unit test (fails before: variable left as {{user-name}}).

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Comment thread langfuse/model.py Outdated
# with hyphens, spaces, unicode, etc.), not just \w+. The character class
# excludes braces and quotes so already-escaped JSON (which appears as
# {{"key": ...}} after _escape_json_for_langchain) is left untouched.
return re.sub(r'{{\s*([^{}"]+?)\s*}}', r"{\g<1>}", json_escaped_content)

Contributor

P1 _escape_json_for_langchain doubles braces for both " and '-prefixed content (line 213: text[j] in {"'", '"'}). A prompt like The result is {'key': 'value'} is escaped to {{'key': 'value'}}, but [^{}"]+? does not exclude ', so the new regex matches it and converts it to {'key': 'value'} — a LangChain placeholder — instead of leaving the brace-pair literal. The old \w+ was accidentally safe here; this PR introduces a regression for single-quote–prefixed content. Adding ' to the exclusion class keeps parity with the escaper.

Path: langfuse/model.py, line 170

Author

Good catch — fixed in the latest commit. _escape_json_for_langchain doubles both quote styles, so I broadened the exclusion class to [^{}"']+? and added a regression test (test_get_langchain_prompt_leaves_quoted_json_escaped) covering single-quote JSON.

Address review feedback: _escape_json_for_langchain doubles braces for both
"-prefixed and '-prefixed content, so the variable regex must exclude both
quote styles or it un-escapes single-quote JSON like {{'key': 'value'}}.
Broaden exclusion class to [^{}"'] and add a regression test.
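
A sketch of how the final exclusion class behaves, with regression checks in the spirit of the tests mentioned in this thread. `convert_final` and the two test functions are hypothetical and only exercise the regular expression; they are not the tests added in the PR, and the inputs are assumed to be already escaped.

```python
import re

# Final capture: anything except braces and either quote style.
FINAL_PATTERN = r"""{{\s*([^{}"']+?)\s*}}"""


def convert_final(content: str) -> str:
    """Rewrite double-brace variables to single braces, skipping quoted content."""
    return re.sub(FINAL_PATTERN, r"{\g<1>}", content)


def test_hyphenated_variable_converts():
    assert convert_final("Hello {{user-name}}!") == "Hello {user-name}!"


def test_quoted_json_stays_escaped():
    # Both quote styles must keep their doubled braces.
    for escaped in ('{{"key": "value"}}', "{{'key': 'value'}}"):
        assert convert_final(escaped) == escaped


test_hyphenated_variable_converts()
test_quoted_json_stays_escaped()

# Trade-off of the exclusion class: a name containing a quote is not converted.
with_apostrophe = convert_final("Hi {{user's name}}")
print(with_apostrophe)
```

Excluding quotes keeps the regex in step with the escaper, at the cost that a variable name containing `'` or `"` is left in double braces.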