Drop DOCKER-ISOLATION rules #49981

Status: Open, 1 commit to merge into base branch master

robmry
Contributor

@robmry robmry commented May 14, 2025

- What I did

Simplify the iptables rules by dropping the Inter-Network Communication (INC) rules. This makes behaviour more consistent. It also avoids carrying the inconsistent behaviour forward into nftables, without introducing semantic differences between iptables and nftables networks.

With this change:

  • Published ports on host addresses can be accessed by containers in other networks (even without the userland-proxy).
  • The rules for direct routing between bridge networks are the same as the rules for direct routing from outside the Docker host (allowed for gw modes "routed" and "nat-unprotected", disallowed for "nat").
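
As a rough illustration of the second point, the post-change policy for direct routing to a container address can be modelled as a function of the destination network's gateway mode alone. This is a minimal sketch, not the daemon's code: the `gwMode` type and `directRoutingAllowed` are hypothetical names, and only the three mode strings come from the description above. It does not model per-port filtering (published versus unpublished ports), and treating unknown modes as disallowed is an assumption of the sketch.

```go
package main

import "fmt"

// gwMode is a hypothetical type for this sketch; the mode names are the
// ones used in the PR description.
type gwMode string

const (
	gwNAT            gwMode = "nat"
	gwNATUnprotected gwMode = "nat-unprotected"
	gwRouted         gwMode = "routed"
)

// directRoutingAllowed models the behaviour described above: whether the
// source is another bridge network or a host outside the Docker host, the
// answer depends only on the destination network's gateway mode.
// Unknown modes are treated as disallowed (an assumption of this sketch).
func directRoutingAllowed(dst gwMode) bool {
	switch dst {
	case gwRouted, gwNATUnprotected:
		return true
	default:
		return false
	}
}

func main() {
	for _, m := range []gwMode{gwNAT, gwNATUnprotected, gwRouted} {
		fmt.Printf("%-16s direct routing allowed: %v\n", m, directRoutingAllowed(m))
	}
}
```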

Fewer rules, so it's simpler, and perhaps slightly faster.

Background ...

The Inter-Network Communication rules in the iptables chains DOCKER-ISOLATION-STAGE-1 / DOCKER-ISOLATION-STAGE-2 (which are called from filter-FORWARD) currently:

  • Block access from containers in one bridge network, to ports published to host addresses by containers in other bridge networks, when the userland-proxy is disabled.
    • But, that access is allowed when the proxy is enabled.
  • Block access to all ports on container addresses in gateway mode "nat-unprotected" networks.
    • But, those ports can be accessed from anywhere else, including other hosts. Just not other bridge networks.
  • Allow access from containers in "nat" bridge networks to published ports on container addresses in "routed" networks. But, to do that, extra INC rules are added for the routed network.

Since #48724, the INC rules are no longer needed to block access from containers in one network to unpublished ports on container addresses in other networks.

Since #49325, direct routing to containers in NAT networks is blocked by the "raw-PREROUTING" rules that block access from untrusted interfaces (all interfaces apart from the network's own bridge).

- How I did it

Dropped the INC rules to resolve the inconsistencies listed above.

Internal networks (with no access to networks outside the host) were also implemented using rules in the DOCKER-ISOLATION chains. This change moves those rules to a new chain, DOCKER-INTERNAL, and drops the DOCKER-ISOLATION chains.

- How to verify it

New and updated tests.

Also tested manually: started a daemon without this change and created normal and internal networks, so that INC rules were added to the DOCKER-ISOLATION chains. Stopped that daemon, started one with the change, and checked that the DOCKER-ISOLATION chains were removed and the internal network's rules had migrated to DOCKER-INTERNAL. (With and without live-restore enabled.)
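
The manual check described above could be scripted roughly as follows. This is a sketch under assumptions, not part of the PR's tests: `chainsIn` and `checkMigrated` are hypothetical helpers, the input is assumed to be `iptables-save` style text in which chains are declared on lines beginning with `:`, and the sample text in `main` is illustrative rather than captured from a real daemon. It also assumes at least one internal network exists, so that a DOCKER-INTERNAL chain is expected.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// chainsIn returns the set of chain names declared in iptables-save style
// output, where a chain is declared on a line like ":DOCKER-INTERNAL - [0:0]".
func chainsIn(dump string) map[string]bool {
	chains := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, ":") {
			continue // not a chain declaration
		}
		fields := strings.Fields(strings.TrimPrefix(line, ":"))
		if len(fields) > 0 {
			chains[fields[0]] = true
		}
	}
	return chains
}

// checkMigrated returns an error if either DOCKER-ISOLATION chain is still
// declared, or if DOCKER-INTERNAL is missing.
func checkMigrated(dump string) error {
	chains := chainsIn(dump)
	for _, old := range []string{"DOCKER-ISOLATION-STAGE-1", "DOCKER-ISOLATION-STAGE-2"} {
		if chains[old] {
			return fmt.Errorf("chain %s still present", old)
		}
	}
	if !chains["DOCKER-INTERNAL"] {
		return fmt.Errorf("chain DOCKER-INTERNAL not found")
	}
	return nil
}

func main() {
	// Illustrative sample input, not captured from a real daemon.
	after := "*filter\n:FORWARD ACCEPT [0:0]\n:DOCKER-INTERNAL - [0:0]\nCOMMIT\n"
	if err := checkMigrated(after); err != nil {
		fmt.Println("not migrated:", err)
		return
	}
	fmt.Println("migrated")
}
```

In practice the text would come from running `iptables-save` (and `ip6tables-save`) on the host after restarting the daemon.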

- Human readable description for the release notes

- The iptables rules for bridge networks have been updated, including removal of the `DOCKER-ISOLATION-STAGE-1` and `DOCKER-ISOLATION-STAGE-2` chains. With these changes:
  - Containers can now access ports published to host addresses by containers in other networks when the userland-proxy is not running.
  - Containers can now access ports on container addresses in other networks that have gateway mode "nat-unprotected".

@robmry robmry self-assigned this May 14, 2025
@robmry robmry added this to the 29.0.0 milestone May 14, 2025
@robmry robmry marked this pull request as ready for review May 15, 2025 09:50
@thaJeztah
Member

@robmry I see the milestone v29; does this one have to wait for that, or was that just "need some milestone?"

@robmry
Contributor Author

robmry commented May 15, 2025

> @robmry I see the milestone v29; does this one have to wait for that, or was that just "need some milestone?"

This one should wait for v29.

Comment on lines +701 to +723
checkHTTP := func(t *testing.T, addr string, expResp bool) {
	t.Parallel()
	t.Helper()
	url := "http://" + net.JoinHostPort(addr, "80")
	res := container.RunAttach(ctx, t, c,
		container.WithNetworkMode(clientNetName),
		container.WithCmd("wget", "-O-", "-T3", url),
	)
	if expResp {
		// 404 Not Found means the server responded, but it's got nothing to serve.
		assert.Check(t, is.Contains(res.Stderr.String(), "404 Not Found"), "url: %s", url)
	} else {
		assert.Check(t, is.Contains(res.Stderr.String(), "download timed out"), "url: %s", url)
	}
}
t.Run("w", func(t *testing.T) { // Wait for the parallel tests to complete.
	t.Run("ipv4/pub", func(t *testing.T) { checkHTTP(t, pub4, tc.expPubResp) })
	t.Run("ipv6/pub", func(t *testing.T) { checkHTTP(t, pub6, tc.expPubResp) })
	t.Run("ipv4/unpub", func(t *testing.T) { checkHTTP(t, unpub4, tc.expUnpubResp) })
	t.Run("ipv6/unpub", func(t *testing.T) { checkHTTP(t, unpub6, tc.expUnpubResp) })
})
Contributor


Suggested change
checkHTTP := func(addr string, expResp bool) func(*testing.T) {
	return func(t *testing.T) {
		t.Parallel()
		t.Helper()
		url := "http://" + net.JoinHostPort(addr, "80")
		res := container.RunAttach(ctx, t, c,
			container.WithNetworkMode(clientNetName),
			container.WithCmd("wget", "-O-", "-T3", url),
		)
		if expResp {
			// 404 Not Found means the server responded, but it's got nothing to serve.
			assert.Check(t, is.Contains(res.Stderr.String(), "404 Not Found"), "url: %s", url)
		} else {
			assert.Check(t, is.Contains(res.Stderr.String(), "download timed out"), "url: %s", url)
		}
	}
}
t.Run("w", func(t *testing.T) { // Wait for the parallel tests to complete.
	t.Run("ipv4/pub", checkHTTP(pub4, tc.expPubResp))
	t.Run("ipv6/pub", checkHTTP(pub6, tc.expPubResp))
	t.Run("ipv4/unpub", checkHTTP(unpub4, tc.expUnpubResp))
	t.Run("ipv6/unpub", checkHTTP(unpub6, tc.expUnpubResp))
})

@thaJeztah
Member

Let me temporarily move to draft to prevent trigger-happy

@thaJeztah thaJeztah marked this pull request as draft May 20, 2025 19:18
@robmry
Contributor Author

robmry commented May 29, 2025

Rebased to resolve conflicts.

The Inter-Network Communication rules in the iptables chains
DOCKER-ISOLATION-STAGE-1 / DOCKER-ISOLATION-STAGE-2 (which are
called from filter-FORWARD) currently:
- Block access from containers in one bridge network, to ports
  published to host addresses by containers in other bridge
  networks, when the userland-proxy is disabled.
  - But, that access is allowed when the proxy is enabled.
- Block access to all ports on container addresses in gateway
  mode "nat-unprotected" networks.
  - But, those ports can be accessed from anywhere else, including
    other hosts. Just not other bridge networks.
- Allow access from containers in "nat" bridge networks to published
  ports on container addresses in "routed" networks. But, to do that,
  extra INC rules are added for the routed network.

The INC rules are no longer needed to block access from containers
in one network to unpublished ports on container addresses in
other networks. Direct routing to containers in NAT networks is
blocked by the "raw-PREROUTING" rules that block access from
untrusted interfaces (all interfaces apart from the network's
own bridge).

Drop these INC rules to resolve the inconsistencies listed above.
With this change:
- Published ports on host addresses can be accessed from containers
  in other networks (even without the userland-proxy).
- The rules for direct routing between bridge networks are the same
  as the rules for direct routing from outside the Docker host
  (allowed for gw modes "routed" and "nat-unprotected", disallowed
  for "nat").

Fewer rules, so it's simpler, and perhaps slightly faster.

Internal networks (with no access to networks outside the host)
are also implemented using rules in the DOCKER-ISOLATION chains.
This change moves those rules to a new chain, DOCKER-INTERNAL,
and drops the DOCKER-ISOLATION chains.

Signed-off-by: Rob Murray <rob.murray@docker.com>
@robmry
Contributor Author

robmry commented May 30, 2025

Marked as ready for review again ... the master branch is now 29.x, and we'll cherry-pick for 28.x releases.

@robmry robmry marked this pull request as ready for review May 30, 2025 08:34