NPEP-133: FQDN Selector for Egress Traffic¶

Issue: #133
Status: Implementable

TLDR¶

This enhancement proposes adding a new optional selector to specify egress peers using Fully Qualified Domain Names (FQDNs).

Goals¶

Provide a selector to specify egress peers using a Fully Qualified Domain Name (for example kubernetes.io).
Support basic wildcard matching capabilities when specifying FQDNs (for example *.cloud-provider.io).
Currently only ALLOW type rules are proposed.
Safely enforcing DENY rules based on FQDN selectors is difficult as there is no guarantee a Network Policy plugin is aware of all IPs backing a FQDN policy. If a Network Policy plugin has incomplete information, it may accidentally allow traffic to an IP belonging to a denied domain. This would constitute a security breach.

By contrast, ALLOW rules, which may also have an incomplete list of IPs, would not create a security breach. In case of incomplete information, valid traffic would be dropped as the plugin believes the destination IP does not belong to the domain. While this is definitely undesirable, it is at least not an unsafe failure.
Currently only AdminNetworkPolicy is the intended scope for this proposal.
Since Kubernetes NetworkPolicy does not have a FQDN selector, adding this capability to BaselineAdminNetworkPolicy could result in writing baseline rules that can't be replicated by an overriding NetworkPolicy. For example, if BANP allows traffic to example.io, but the namespace admin installs a Kubernetes Network Policy, the namespace admin has no way to replicate the example.io selector using just Kubernetes Network Policies.
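The fail-safe argument for restricting FQDN selectors to ALLOW rules can be illustrated with a small Go sketch. It is not part of the proposed API: the `evaluate` function, the `verdict` type, and the example IPs are hypothetical, and the Deny branch exists only to show the failure mode this proposal avoids.

```go
package main

import "fmt"

// verdict is the outcome for a single packet (hypothetical type for this sketch).
type verdict string

const (
	allowed verdict = "allowed"
	dropped verdict = "dropped"
)

// evaluate applies one FQDN rule followed by the opposite catch-all default,
// using only the IPs the plugin currently knows for the domain.
//   - allowRule == true:  Allow the domain, deny everything else.
//   - allowRule == false: Deny the domain, allow everything else.
func evaluate(allowRule bool, knownIPs map[string]bool, dst string) verdict {
	matched := knownIPs[dst]
	if allowRule {
		if matched {
			return allowed
		}
		return dropped
	}
	if matched {
		return dropped
	}
	return allowed
}

func main() {
	// Assume the domain is really backed by 1.2.3.4 and 5.6.7.8, but the
	// plugin has only learned about 1.2.3.4 so far.
	known := map[string]bool{"1.2.3.4": true}

	// Allow rule with incomplete data: valid traffic is dropped (safe failure).
	fmt.Println("Allow rule, unknown IP:", evaluate(true, known, "5.6.7.8"))
	// Deny rule with incomplete data: denied traffic leaks through (unsafe failure).
	fmt.Println("Deny rule, unknown IP: ", evaluate(false, known, "5.6.7.8"))
}
```

With incomplete information the Allow rule errs towards dropping traffic, while the Deny rule errs towards permitting it, which is the asymmetry described above.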

Non-Goals¶

This enhancement does not include a FQDN selector for allowing ingress traffic.
This enhancement only describes enhancements to the existing L4 filtering as provided by AdminNetworkPolicy. It does not propose any new L7 matching or filtering capabilities, like matching HTTP traffic or URL paths.
This selector should not control what DNS records are resolvable from a particular workload.
This selector provides no capability to detect traffic destined for different domains backed by the same IP (e.g. CDN or load balancers).
This enhancement does not add any new mechanisms for specifying how traffic is routed to a destination (egress gateways, alternative SNAT IPs, etc). It just adds a new way of specifying packets to be allowed or dropped on the normal egress data path.
This enhancement does not require any mechanism for securing DNS resolution (e.g. DNSSEC or DNS-over-TLS). Unsecured DNS requests are expected to be sufficient for looking up FQDNs.

Introduction¶

FQDN-based egress controls are a common enterprise security practice. Administrators often prefer to write security policies using DNS names such as “www.kubernetes.io” instead of capturing all the IP addresses the DNS name might resolve to. Keeping up with changing IP addresses is a maintenance burden, and hampers the readability of the network policies.

User Stories¶

As a cluster admin, I want to allow all Pods in the cluster to send traffic to an external service specified by a well-known domain name. For example, all Pods must be able to talk to my-service.com.
As a cluster admin, I want to allow Pods in the "monitoring" namespace to be able to send traffic to a logs-sink, hosted at logs-storage.com.
As a cluster admin, I want to allow all Pods in the cluster to send traffic to any of the managed services provided by my Cloud Provider. Since the cloud provider has a well known parent domain, I want to allow Pods to send traffic to all sub-domains using a wild-card selector -- *.my-cloud-provider.com
As a cluster admin, I want to allow Pods in the cluster to send traffic to an entire tree of domains. For example, our CDN has domains of the format <session>.<random>.<region>.my-app.cdn.com. I want to be able to use a wild-card selector to allow the full tree of subdomains below **.my-app.cdn.com.

Future User Stories¶

These are some user stories we want to keep in mind, but due to limitations of the existing Network Policy API, cannot be implemented currently. The design goal in this case is to ensure we do not make these unimplementable down the line.

As a cluster admin, I want to switch the default disposition of the cluster to be default deny. This is enforced using a BaselineAdminNetworkPolicy. I also want individual namespace owners to be able to specify their egress peers. Namespace admins would then use a FQDN selector in the Kubernetes NetworkPolicy objects to allow my-service.com.

API¶

This NPEP proposes adding a new type of AdminNetworkPolicyEgressPeer called FQDNPeerSelector which allows specifying domain names.

// DomainName describes one or more domain names to be used as a peer.
//
// DomainName can be an exact match, or use the wildcard specifier '*' to match
// one or more labels.
//
// '*', the wildcard specifier, matches one or more entire labels. It does not
// support partial matches. '*' may only be specified as a prefix.
// 
//  Examples:
//    - `kubernetes.io` matches only `kubernetes.io`.
//      It does not match "www.kubernetes.io", "blog.kubernetes.io",
//      "my-kubernetes.io", or "wikipedia.org".
//    - `blog.kubernetes.io` matches only "blog.kubernetes.io".
//      It does not match "www.kubernetes.io" or "kubernetes.io".
//    - `*.kubernetes.io` matches subdomains of kubernetes.io.
//      "www.kubernetes.io", "blog.kubernetes.io", and
//      "latest.blog.kubernetes.io" match, however "kubernetes.io", and
//      "wikipedia.org" do not.
//
// +kubebuilder:validation:Pattern=`^(\*\.)?([a-zA-Z0-9]([-a-zA-Z0-9_]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([-a-zA-Z0-9_]*[a-zA-Z0-9])?\.?$`
type DomainName string

type AdminNetworkPolicyEgressPeer struct {
    <snipped>
    // DomainNames provides a way to specify domain names as peers.
    // 
    // DomainNames is only supported for Allow rules. In order to control
    // access, DomainNames Allow rules should be used with a lower priority
    // egress deny -- this allows the admin to maintain an explicit "allowlist"
    // of reachable domains.
    // 
    // Support: Extended
    //
    // <network-policy-api:experimental>
    // +optional
    // +listType=set
    // +kubebuilder:validation:MinItems=1
    DomainNames []DomainName `json:"domainNames,omitempty"`
}
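The matching rules documented on DomainName can be sketched in Go as follows. This is a minimal illustration, not the reference implementation: `matchesDomainName` and `normalize` are hypothetical helpers, the validation pattern is written with `a-zA-Z0-9` character ranges, and treating names as case-insensitive with an optional trailing dot is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// domainNamePattern mirrors the kubebuilder validation pattern on DomainName.
var domainNamePattern = regexp.MustCompile(
	`^(\*\.)?([a-zA-Z0-9]([-a-zA-Z0-9_]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([-a-zA-Z0-9_]*[a-zA-Z0-9])?\.?$`)

// normalize lower-cases a name and drops a trailing root dot so that
// "Kubernetes.io." and "kubernetes.io" compare equal (assumption of this sketch).
func normalize(name string) string {
	return strings.TrimSuffix(strings.ToLower(name), ".")
}

// matchesDomainName reports whether the queried name is selected by selector.
// An exact selector matches only itself; a "*." prefix matches one or more
// whole labels in front of the remaining suffix, never a partial label.
func matchesDomainName(selector, queried string) bool {
	if !domainNamePattern.MatchString(selector) {
		return false // invalid selectors select nothing
	}
	s, q := normalize(selector), normalize(queried)
	if strings.HasPrefix(s, "*.") {
		suffix := s[1:] // keep the leading dot, e.g. ".kubernetes.io"
		return len(q) > len(suffix) && strings.HasSuffix(q, suffix)
	}
	return s == q
}

func main() {
	fmt.Println(matchesDomainName("kubernetes.io", "www.kubernetes.io"))           // false
	fmt.Println(matchesDomainName("*.kubernetes.io", "latest.blog.kubernetes.io")) // true
	fmt.Println(matchesDomainName("*.kubernetes.io", "kubernetes.io"))             // false
}
```

Keeping the leading dot in the suffix is what prevents `*.kubernetes.io` from matching `my-kubernetes.io`, since the wildcard only stands for entire labels.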

Examples¶

Pods in `monitoring` namespace can talk to `my-service.com` and `*.cloud-provider.io`¶

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: allow-my-service-egress
spec:
  priority: 55
  subject:
    namespaces:
      matchLabels:
        kubernetes.io/metadata.name: "monitoring"
  egress:
  - name: "allow-to-my-service"
    action: "Allow"
    to:
    - domainNames:
      - "my-service.com"
      - "*.cloud-provider.io"
    ports:
    - portNumber:
        protocol: TCP
        port: 443

Maintaining an allowlist of domains¶

There are a couple of ways to maintain an allowlist:

This example includes the DENY rule in the same ANP object. It's also possible to use another ANP object with a lower priority (e.g. 100 in this example):

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: allow-my-service-egress
spec:
  priority: 55
  subject:
    namespaces:
      matchLabels:
        kubernetes.io/metadata.name: "monitoring"
  egress:
  - name: "allow-to-my-service"
    action: "Allow"
    to:
    - domainNames:
      - "my-service.com"
      - "*.cloud-provider.io"
    ports:
    - portNumber:
        protocol: TCP
        port: 443
  - name: "default-deny"
    action: "Deny"
    to:
    - networks:
      - "0.0.0.0/0"

This example uses a default-deny BaselineAdminNetworkPolicy to create the allowlist:

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: allow-my-service-egress
spec:
  priority: 55
  subject:
    namespaces:
      matchLabels:
        kubernetes.io/metadata.name: "monitoring"
  egress:
  - name: "allow-to-my-service"
    action: "Allow"
    to:
    - domainNames:
      - "my-service.com"
      - "*.cloud-provider.io"
    ports:
    - portNumber:
        protocol: TCP
        port: 443
---
apiVersion: policy.networking.k8s.io/v1alpha1
kind: BaselineAdminNetworkPolicy
metadata:
  name: default
spec:
  subject:
    namespaces: {}
  egress:
    - action: Deny
      to:
      - networks:
        - "0.0.0.0/0"

Expected Behavior¶

A FQDN egress policy does not grant the workload permission to communicate with any in-cluster DNS services (like kube-dns). A separate rule needs to be configured to allow traffic to any DNS servers.
FQDN policies should not affect the ability of workloads to resolve domains, only their ability to communicate with the IP backing them. Put another way, FQDN policies should not result in any form of DNS filtering.
For example, if a policy allows traffic to kubernetes.io, any selected Pods can still resolve wikipedia.org or my-services.default.svc.cluster.local, but can not send traffic to them unless allowed by a different rule.
Each implementation will provide guidance on which DNS name-server is considered authoritative for resolving domain names. This could be the kube-dns Service or potentially some other DNS provider specified in the implementation's configuration.
DNS record querying and lifetimes:
Pods are expected to make a DNS query for a domain before sending traffic to it. If the Pod fails to send a DNS request and instead just sends traffic to the IP (either because of caching or a static config), traffic is not guaranteed to flow.
Pods should respect the TTL of DNS records they receive. Trying to establish new connections using expired DNS records is not guaranteed to work.
When the TTL for a DNS record expires, the implementor should stop allowing new connections to that IP. Existing connections will still be allowed (this is consistent with NetworkPolicy behavior on long-running connections).
Implementations must support at least 100 unique IPs (either IPv4 or IPv6) for each domain. This is true for both explicitly specified domains, as well as for each domain selected by a wild-card rule. For example, the rule *.kubernetes.io supports 100 IPs each for both docs.kubernetes.io and blog.kubernetes.io.
PTR records are not required to properly configure a FQDN selector. For example, as long as an A record exists mapping my-hostname to 1.2.3.4, the Network Policy implementation should allow traffic to 1.2.3.4. There is no requirement that a PTR record for 4.3.2.1.in-addr.arpa exist or that it points to my-hostname (it is allowed to point to other-host).
Targeting in-cluster endpoints with FQDN selector is not recommended. There are other selectors which can more precisely capture intent. However, if in-cluster endpoints are selected:
✅︎ Supported:
- Selecting Pods using their generated DNS record (for example pod-ip-address.my-namespace.pod.cluster.local). This is analogous to selecting the Pod by its IP address using the Network selector.
- Headless Services can be selected using their generated DNS record because the generated DNS records contain a list of all the Pod IPs that back the service.
❌ Not Supported:
- ClusterIP Services can not be selected using their generated DNS record (for example my-svc.my-namespace.svc.cluster.local). This is consistent with the behavior when selecting the Service VIP using the Network selector.
- ExternalName Services return a CNAME record. See the entry below about CNAME support.
- Any record which points to the IPs used for LoadBalancer type services. This includes the externalIPs and the .status.loadBalancer.ingress fields
If the specified domain in a FQDN selector resolves to a CNAME record, the behavior of the implementor depends on the returned response.

If the upstream resolver used CNAME chasing to fully resolve the domain to an A/AAAA record and returns the resulting chain, the implementor can use this information to allow traffic to the specified IPs. However, the implementor does not need to perform its own CNAME chasing or to understand resolutions across multiple DNS requests.

For example, if the FQDN selector is allowing traffic to www.kubernetes.io:

- If a DNS query to the upstream resolver returns a single response with the following records:

www.kubernetes.io  --  CNAME to kubernetes.io
kubernetes.io      --  A     to 1.2.3.4

The implementor can use this response to allow traffic to 1.2.3.4.

- If the DNS query only responds with a CNAME record, the implementor is not required to allow traffic even if subsequent requests resolve the full chain:

# REQUEST 1

www.kubernetes.io  --  CNAME to kubernetes.io

# REQUEST 2

kubernetes.io      --  A     to 1.2.3.4

The implementor can still deny traffic to 1.2.3.4 because no single response contained the full chain required to resolve the domain.
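The single-response rule for CNAME records can be sketched in Go as below. This is only an illustration under stated assumptions: the `record` type and `ipsFromResponse` function are hypothetical, names are compared case-insensitively without trailing dots, and a real implementation would work on parsed DNS messages and also track TTLs.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a simplified DNS resource record (hypothetical type for this sketch).
type record struct {
	Name   string // owner name, e.g. "www.kubernetes.io"
	Type   string // "A", "AAAA" or "CNAME"
	Target string // IP address for A/AAAA, canonical name for CNAME
}

// ipsFromResponse returns the IPs that a single DNS response proves belong to
// fqdn. CNAME records are followed only within resp; no state is kept across
// responses, so a chain split over several requests yields no IPs.
func ipsFromResponse(fqdn string, resp []record) []string {
	var ips []string
	current := strings.ToLower(fqdn)
	// Bound the number of hops by the record count so CNAME loops terminate.
	for hop := 0; hop <= len(resp); hop++ {
		next := ""
		for _, r := range resp {
			if strings.ToLower(r.Name) != current {
				continue
			}
			switch r.Type {
			case "A", "AAAA":
				ips = append(ips, r.Target)
			case "CNAME":
				next = strings.ToLower(r.Target)
			}
		}
		if len(ips) > 0 || next == "" {
			break
		}
		current = next
	}
	return ips
}

func main() {
	full := []record{
		{Name: "www.kubernetes.io", Type: "CNAME", Target: "kubernetes.io"},
		{Name: "kubernetes.io", Type: "A", Target: "1.2.3.4"},
	}
	// One response carrying the whole chain: 1.2.3.4 can be allowed.
	fmt.Println(ipsFromResponse("www.kubernetes.io", full))
	// Request 1 and request 2 seen separately: neither proves an IP for
	// www.kubernetes.io, so the result is empty both times.
	fmt.Println(len(ipsFromResponse("www.kubernetes.io", full[:1])))
	fmt.Println(len(ipsFromResponse("www.kubernetes.io", full[1:])))
}
```

Because the function never remembers a CNAME from an earlier response, it reflects the minimum behavior described above; an implementor is free to do more.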

Alternatives¶

IP Block Selector¶

IP blocks are an important tool for specifying Network Policies. However, they do not address all user needs and have a few shortcomings when compared to FQDN selectors:

IP-based selectors can become verbose if a single logical service has numerous IPs backing it.
IP-based selectors pose an ongoing maintenance burden for administrators, who need to be aware of changing IPs.
IP-based selectors can result in policies that are difficult to read and audit.

L4 Proxy¶

Users can also configure an L4 proxy (e.g. using SOCKS) to inspect their traffic and implement egress firewalls. Proxies present a few trade-offs when compared to a FQDN selector:

Additional configuration and maintenance burden of the proxy application itself.
Configuring new routes to direct traffic leaving the application to the L4 proxy.

L7 Policy¶

Another alternative is to provide an L7 selector, similar to the policies provided by Service Mesh providers. While L7 selectors can offer more expressivity, they often come with trade-offs that are not suitable for all users:

L7 selectors necessarily support a select set of protocols. Users may be using a custom protocol for application-level communication, but still want the ability to specify endpoints using DNS.
L7 selectors often require proxies to perform deep packet inspection and enforce the policies. These proxies can introduce undesirable latency in the datapath of applications.

References¶

NPEP #126: Egress Control in ANP

Implementations¶

The following is a best-effort breakdown of the capabilities of different NetworkPolicy providers, as of 2023-09-25. This information may be out of date or inaccurate.

| | Antrea | Calico | Cilium | OpenShift (current) | OpenShift (future) |
|---|---|---|---|---|---|
| Implementation | DNS Snooping + Async DNS | DNS Snooping | DNS Snooping | Async DNS | DNS Snooping |
| Wildcards | ✅ | ✅ | ✅ | ❌ | ✅ |
| Egress Rules | ✅ | ✅ | ✅ | ✅ | ✅ |
| Ingress Rules | ❌ | ❌ | ❌ | ❌ | ❌ |
| Allow Rules | ✅ | ✅ | ✅ | ✅ | ✅ |
| Deny Rules | ✅ | ❌(?) | ❌ | ✅ | ❌(?) |

Appendix¶

CNAME Records¶

CNAME records are a type of DNS record (like A or AAAA records) that direct the resolver to query another name to retrieve the actual A/AAAA records.

For example:

$ dig www.kubernetes.io

... Omitted output ...

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.kubernetes.io.     IN  A

;; ANSWER SECTION:
www.kubernetes.io.  3600    IN  CNAME   kubernetes.io.
kubernetes.io.      3600    IN  A   147.75.40.148

... Omitted Output ...

CNAME Chasing¶

CNAME chasing refers to an optional behavior for DNS resolvers whereby they perform subsequent lookups to resolve CNAMEs returned for a particular query. In the above example, querying for www.kubernetes.io. returned a CNAME record for kubernetes.io.. When CNAME chasing is enabled, the DNS server will automatically resolve kubernetes.io. and return both records as the DNS response.