Don't record private IP literals as outbound hostnames (Zen alert flood) #308

Merged: bitterpanda63 merged 1 commit into main from fix/dns-collector-skip-private-ip-literals on Jun 26, 2026

Mishenevd (Collaborator) commented Jun 25, 2026

Summary

The Zen Java agent flooded the "New outbound connection detected" feature with private/internal IP addresses on port 0 (e.g. 10.0.0.0, 172.16.0.0, 192.168.0.0, 169.254.0.0, 100.64.0.0, 127.0.0.1, 10.20.x.x). This was reported by a customer after a Spring Boot 4.1 upgrade (Minze / "Lutastic API", agent v1.1.29).

This PR stops the agent from recording private IP literals as outbound hostnames. Real DNS names (including internal ones that resolve to private IPs) are still recorded, public IP literals are still recorded, and SSRF checks are unaffected. Outbound blocking is skipped only for private IP literals.

Why the bug existed

DNSRecordCollector.report() is invoked from the getAllByName hook (InetAddressWrapper) and recorded every getAllByName(host) argument into HostnamesStore — the store that powers the outbound-domains/connections feature — without distinguishing a real domain from a raw IP literal:

// before
if (!ports.isEmpty()) {
    for (int port : ports) HostnamesStore.incrementHits(hostname, port);
} else {
    HostnamesStore.incrementHits(hostname, 0); // <- records IP literals too
}

getAllByName also accepts IP literals: it parses them and returns the address without performing a DNS lookup. So whenever the runtime resolves a private IP literal directly, the agent recorded it as a brand-new "outbound domain". The port is 0 because no HTTP URL/port is associated with these resolutions (URLCollector only registers a pending port for http(s) URLs).
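The literal-parsing behaviour can be confirmed in isolation. The following is a minimal sketch that uses only the JDK; the class and method names are illustrative and are not part of the agent.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class LiteralLookupSketch {
    /** Passes an IP literal through getAllByName and returns the textual address it yields. */
    static String lookupLiteral(String literal) {
        try {
            // For a literal, the JDK only validates the address format; no resolver is queried.
            InetAddress[] result = InetAddress.getAllByName(literal);
            return result[0].getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("lookup failed for: " + literal, e);
        }
    }

    public static void main(String[] args) {
        // Addresses taken from the alerts described above.
        for (String literal : new String[] {"10.0.0.0", "169.254.169.253", "0.0.0.0"}) {
            System.out.println(literal + " -> " + lookupLiteral(literal));
        }
    }
}
```

Because the hook sits on getAllByName itself, each of these calls would reach DNSRecordCollector.report() with the literal as its hostname argument, even though nothing was resolved.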

Observed sources of private-IP-literal getAllByName calls:

  • Reactor Netty (WebClient) DNS-resolver bootstrap: resolves the system nameserver/gateway addresses from /etc/resolv.conf and wildcard binds (0.0.0.0, ::, 10.x.x.x, 192.168.x.x). On ECS/Fargate these are, for example, 169.254.169.253 and the VPC DNS address in the 10.x range.
  • Service discovery / connect-by-IP — the app connects to already-resolved task IPs (10.20.x.x).
  • A startup library building a private-IP matcher — the *.0.0 CIDR base addresses in the alerts (10.0.0.0, 172.16.0.0, …) match the RFC1918 ranges exactly, i.e. something resolves each range's base address once at startup.
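To illustrate why every address in the alerts counts as private, here is a rough IPv4-only classifier built on the JDK's InetAddress predicates. It is a sketch under assumptions, not the agent's IsPrivateIP implementation: the class name, the choice of ranges, and the manual 100.64.0.0/10 check are mine, and IPv6 literals such as :: are left out for brevity.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class PrivateIpLiteralSketch {
    /** Parses a dotted-quad IPv4 literal by hand (no DNS); returns null for anything else. */
    static byte[] parseIpv4(String host) {
        String[] parts = host.split("\\.", -1);
        if (parts.length != 4) return null;
        byte[] out = new byte[4];
        for (int i = 0; i < 4; i++) {
            String part = parts[i];
            if (part.isEmpty() || part.length() > 3) return null;
            for (int j = 0; j < part.length(); j++) {
                char c = part.charAt(j);
                if (c < '0' || c > '9') return null;
            }
            int value = Integer.parseInt(part);
            if (value > 255) return null;
            out[i] = (byte) value;
        }
        return out;
    }

    /** True only for IPv4 literals in private, loopback, link-local, wildcard or CGNAT space. */
    static boolean isPrivateIpLiteral(String host) {
        if (host == null) return false;
        byte[] v4 = parseIpv4(host);
        if (v4 == null) return false; // DNS name or IPv6 literal: out of scope for this sketch
        InetAddress addr;
        try {
            addr = InetAddress.getByAddress(v4); // never performs a lookup
        } catch (UnknownHostException e) {
            return false;
        }
        int first = v4[0] & 0xFF;
        int second = v4[1] & 0xFF;
        boolean cgnat = first == 100 && (second & 0xC0) == 64; // 100.64.0.0/10
        return addr.isAnyLocalAddress()      // 0.0.0.0
                || addr.isLoopbackAddress()  // 127.0.0.0/8
                || addr.isLinkLocalAddress() // 169.254.0.0/16
                || addr.isSiteLocalAddress() // 10/8, 172.16/12, 192.168/16
                || cgnat;
    }

    public static void main(String[] args) {
        String[] samples = {"10.0.0.0", "172.16.0.0", "192.168.0.0", "169.254.0.0",
                "100.64.0.0", "127.0.0.1", "8.8.8.8", "intsvc.local"};
        for (String sample : samples) {
            System.out.println(sample + " private literal? " + isPrivateIpLiteral(sample));
        }
    }
}
```

Parsing the octets by hand and using InetAddress.getByAddress keeps the check free of DNS traffic, which matters for a predicate that runs inside a getAllByName hook.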

The framework version itself is not the cause (see below); the upgrade changed which HTTP client / resolver path is exercised.

The fix

DNSRecordCollector.report() returns early when the looked-up host is a private IP literal, before it records anything or runs outbound blocking:

Set<Integer> ports = PendingHostnamesStore.getAndRemove(hostname);

// Don't report private/internal IP literals as outbound connections (consistent
// with the other Zen agents). Full return so we also skip outbound blocking;
// otherwise lockdown mode (blockNewOutgoingRequests) would block these internal
// resolutions and break the app.
if (IsPrivateIP.isPrivateIp(hostname)) {
    return;
}
// ... record hostname, outbound blocking, SSRF (unchanged) ...

A first pass only skipped the HostnamesStore record but still fell through to the outbound-blocking check, which blocks private IPs in lockdown mode. The early return skips both.
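The ordering argument (consume the pending ports, then return before both recording and blocking) can be sketched with in-memory stand-ins. Everything here is hypothetical: ReportFlowSketch, its fields, the "host:port" record format and the allow-list model of lockdown are simplifications for illustration and do not mirror the agent's real stores or blocking logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class ReportFlowSketch {
    enum Outcome { IGNORED, RECORDED, BLOCKED }

    final Map<String, Set<Integer>> pendingPorts = new HashMap<>(); // stand-in for PendingHostnamesStore
    final List<String> recorded = new ArrayList<>();                // stand-in for HostnamesStore, "host:port"
    final Set<String> allowedInLockdown = new HashSet<>();          // assumed allow-list for lockdown mode
    final Predicate<String> isPrivateIpLiteral;                     // stand-in for IsPrivateIP.isPrivateIp
    boolean lockdown;

    ReportFlowSketch(Predicate<String> isPrivateIpLiteral) {
        this.isPrivateIpLiteral = isPrivateIpLiteral;
    }

    Outcome report(String hostname) {
        // Pending ports are consumed first, so they cannot leak into a later lookup.
        Set<Integer> ports = pendingPorts.remove(hostname);

        // Early return: the literal is neither recorded nor passed to the lockdown check.
        if (isPrivateIpLiteral.test(hostname)) {
            return Outcome.IGNORED;
        }

        if (ports == null || ports.isEmpty()) {
            recorded.add(hostname + ":0");
        } else {
            for (int port : ports) recorded.add(hostname + ":" + port);
        }

        // Lockdown still applies to everything that was not a private IP literal.
        if (lockdown && !allowedInLockdown.contains(hostname)) {
            return Outcome.BLOCKED;
        }
        return Outcome.RECORDED;
    }
}
```

In this model, moving the private-literal check below the recording step but above the lockdown check would reproduce the first-pass behaviour: nothing recorded, but the literal still blocked in lockdown mode.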

  • The outbound-domains feature is for domains. Internal infra IPs are not domains, and the other Zen agents already ignore private IPs.
  • Real hostnames that resolve to private IPs (e.g. keycloak.internal...) are not literals, so they still flow through: recorded by name, subject to lockdown, and SSRF-checked.
  • SSRF / stored-SSRF is unaffected. It never fires on an IP literal anyway (hostname == ip is treated as "no resolution, safe").
  • Public IP literals are unaffected and stay visible.

Behaviour

| Scenario | Result |
| --- | --- |
| Resolve or connect to a private IP literal (getAllByName("10.20.11.143"), or Netty bootstrap resolving 0.0.0.0 / nameservers) | Fully ignored. Not recorded as an outbound connection, and not run through outbound blocking, so lockdown mode does not block it. |
| Private IP reached via a URL (http://10.0.0.1:8080) | URLCollector registers the pending port, then getAllByName("10.0.0.1") returns early. Nothing recorded, not blocked in lockdown, and the pending port is still consumed. |
| Outbound to a real domain, including internal names that resolve to a private IP (keycloak.internal...) | Unchanged. Hostname recorded, lockdown still applies, SSRF / stored-SSRF still run on the resolved IPs. |
| Public IP literal | Unchanged. Still recorded. |

How it reproduces

A plain Spring Boot app + agent, making a WebClient (Reactor Netty) call, is enough — the resolver bootstrap records private infra IPs on port 0. Recording is identical on Spring Boot 3.3.5 and 4.1.0 (so it is client/runtime-driven, not a framework-version regression). RestTemplate and the JDK HttpClient record the hostname instead and do not leak.

How we tested it (e2e, offline)

The test runs fully locally, with no cloud dependency. A mock captures the heartbeat payload (the hostnames array that would be sent to Zen), and a Spring Boot app runs first under the released agent and then under the patched agent, probing each HTTP client against a hostname that resolves to a private IP.
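A capture mock of this kind can be built with the JDK's built-in HttpServer. This is a sketch of the idea, not the mock used for this PR: the class name is made up, it accepts any path because the agent's endpoint paths are not listed here, it stores bodies as raw UTF-8 text (a compressed payload would need decoding first), and the "{}" reply is a placeholder that the real agent may not accept as a valid config response.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class MockCloudSketch {
    /** One entry per request: "METHOD path body". */
    final List<String> bodies = new CopyOnWriteArrayList<>();
    private final HttpServer server;

    MockCloudSketch(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            // Capture the request before replying so callers can inspect it afterwards.
            byte[] body = exchange.getRequestBody().readAllBytes();
            bodies.add(exchange.getRequestMethod() + " " + exchange.getRequestURI()
                    + " " + new String(body, StandardCharsets.UTF_8));

            byte[] reply = "{}".getBytes(StandardCharsets.UTF_8); // placeholder response
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, reply.length);
            exchange.getResponseBody().write(reply);
            exchange.close();
        });
        server.start();
    }

    /** Actual bound port (useful when constructed with port 0). */
    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    public static void main(String[] args) throws IOException {
        MockCloudSketch mock = new MockCloudSketch(9999);
        System.out.println("mock cloud listening on http://localhost:" + mock.port() + "/");
    }
}
```

With the mock on port 9999, the captured entries can be searched for IP literals after the app has made its WebClient call.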

Result (Spring Boot 4.1.0, identical load):

| Agent build | Hostnames kept | Private IP literals recorded | Verdict |
| --- | --- | --- | --- |
| released 1.1.29 | intsvc.local, localhost | 0.0.0.0, 10.2.0.1, :: | 🔴 leak |
| patched | intsvc.local, localhost | none | 🟢 clean |

Names are preserved; private IP literals no longer reach the cloud.

⚠️ Build note: agent.jar shades agent_api, and the getAllByName advice loads DNSRecordCollector via a parent-first classloader — so the shaded copy in agent.jar wins. Both agent.jar and agent_api.jar must be rebuilt for the fix to take effect.
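When it is unclear which copy of a class actually won, the JDK can report where a loaded class came from. The helper below is a generic diagnostic sketch, not something shipped with the agent: the class name is made up, and it would have to be called from code running inside the instrumented JVM, with the collector's fully qualified class name (not given in this PR) passed as the argument.

```java
import java.security.CodeSource;

final class ClassOriginSketch {
    /** Returns the jar or directory a class was loaded from, if the JVM knows it. */
    static String loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader (and some generated classes) have no code source.
        if (source == null || source.getLocation() == null) {
            return "(no code source)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to a JDK class so the sketch runs without the agent on the classpath.
        Class<?> type = Class.forName(args.length > 0 ? args[0] : "java.lang.String");
        System.out.println(type.getName() + " <- " + loadedFrom(type));
    }
}
```

If the printed location is a stale jar, the patched class is on disk but not the one being executed.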

CLI commands

# Unit tests for the collector
./gradlew agent_api:test --tests "collectors.DNSRecordCollectorTest"

# Rebuild BOTH jars (agent.jar shades agent_api -> both required)
./gradlew agent:shadowJar agent_api:shadowJar
cp agent/build/libs/agent*-all.jar     dist/agent.jar
cp agent_api/build/libs/agent*-all.jar dist/agent_api.jar

# Run a Spring Boot app under the agent, pointed at a local mock cloud,
# with debug logging so getAllByName interceptions are visible
export AIKIDO_TOKEN="AIK_RUNTIME_dummy"
export AIKIDO_BLOCK=false
export AIKIDO_DEBUG=true
export AIKIDO_ENDPOINT="http://localhost:9999/"
export AIKIDO_REALTIME_ENDPOINT="http://localhost:9999/"
java -javaagent:dist/agent.jar -jar target/app.jar
# -> trigger a WebClient/Reactor Netty call; confirm no private IP literals
#    appear in the heartbeat 'hostnames' payload.

Tests

  • testPrivateIpLiteralNotRecordedAsOutboundHostname — private IP literals (incl. RFC1918 base addresses, 10.20.x.x, 127.0.0.1) are not recorded.
  • testPrivateIpLiteralWithPendingPortStillConsumedButNotRecorded — pending port consumed, nothing recorded.
  • testHostnameResolvingToPrivateIpStillRecorded — internal DNS name still recorded by name.
  • testPublicIpLiteralStillRecorded — public IP literals unaffected.
  • testPrivateIpLiteralNotBlockedInLockdownMode — a private IP literal is not blocked in lockdown mode.
  • testPrivateIpLiteralViaUrlInLockdownNotBlockedNorRecorded — private IP via URL: not recorded, not blocked, pending port consumed.

🤖 Generated with Claude Code

DNSRecordCollector recorded every getAllByName() argument as an outbound
hostname, including raw IP literals. When something resolves a private/internal
IP literal directly (Reactor Netty DNS-resolver bootstrap resolving
nameserver/gateway addresses, service discovery connecting by IP, a library
building a private-IP matcher at startup, ...), the agent flooded the
"new outbound connection" feature with private IPs on port 0.

Skip recording into HostnamesStore when the looked-up host is a private IP
literal (IsPrivateIP.isPrivateIp). Real DNS names that resolve to private IPs
are still recorded by name; public IP literals are unaffected; SSRF/stored-SSRF,
stats and outbound-domain blocking are unchanged. Pending ports are still
consumed so they can't leak into a later lookup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

bitterpanda63 merged commit 8987c88 into main on Jun 26, 2026
246 of 247 checks passed
bitterpanda63 deleted the fix/dns-collector-skip-private-ip-literals branch on June 26, 2026, 09:21