HOW DNSSEC RESOLUTION PLATFORMS BENEFIT FROM LOAD BALANCING TRAFFIC ACCORDING TO FULLY QUALIFIED DOMAIN NAME

Daniel Migault
Orange Labs, Issy-les-Moulineaux, France
[email protected]

Maryline Laurent
Institut TELECOM, TELECOM SudParis, CNRS Samovar UMR 5157, Evry, France
[email protected]
Abstract: In July 2008, the Kaminsky attack showed that DNS is vulnerable to cache poisoning, and DNSSEC is considered the long-term solution to mitigate this attack. Compared to DNS, DNSSEC resolution requires cryptographic operations such as signature checks and hashing. Tests on a lab platform, as well as deployment feedback, show that migrating a DNS platform to DNSSEC requires five times more nodes than DNS. ISPs that already operate large DNS platforms can hardly migrate to DNSSEC with platforms as they are currently designed. In fact, current platforms split DNS traffic over an n-node platform according to the IP addresses of the query. This paper provides an alternative architecture where DNS(SEC) traffic is load balanced according to the Fully Qualified Domain Name (FQDN) of the query. Simulation shows that such an architecture is 1.409 times more efficient than the current architecture. This paper briefly describes the DNSSEC protocol; then, based on experimental measurements, it estimates the cost of migrating from DNS to DNSSEC. The last section compares different load balancing strategies and shows that load balancing traffic according to the FQDN seems promising.

Index Terms: DNS, DNSSEC, performance
I. WHAT IS DNSSEC

DNSSEC [2], [3], [4] provides mechanisms to authenticate the origin of an RRset, protect the integrity of RRsets, build a chain of trust, and prove the non-existence of an FQDN.

Authentication and integrity protection are performed by signatures (RRSIG). This means that each DNSSEC zone MUST be assigned an identity, the Key Signing Key (KSK), and a key that signs the DNS zone, the Zone Signing Key (ZSK). The KSK is itself a cryptographic key: it is used both to identify a zone and to sign the ZSK, so that each ZSK is bound to the proper KSK.

The chain of trust implies that once you trust one entry, you can securely trust its subdomains. This trust delegation is performed through the Delegation Signer (DS) RRset, in which a parent specifies a digest of the KSK of its child.

The proof of non-existence can be performed through two different mechanisms: NSEC [2], [3], [4] and NSEC3 [5]. Both mechanisms are based on ordering all RRsets of a zone file as in a dictionary. Each FQDN has a specific place in the zone file, and NSECx RRsets provide the link between consecutive FQDNs. NSECx proves that an FQDN does not exist by showing that it is absent from the place it would occupy in that order. Furthermore, the response is signed by the authoritative server. By ordering and providing FQDNs, NSEC enables zone walking [1], that is to say downloading the whole zone file even though AXFR is disabled. NSEC3 addresses this problem by considering the hash of the FQDN instead of the FQDN itself.

II. PERFORMANCE ISSUE

Suppose a resolver sends a DNS query to get the IP address of a given FQDN. In our case we also suppose that the queried FQDN exists and has a valid answer. The response has an ANSWER section that contains a few RRsets, such as the different IP addresses that correspond to the queried FQDN (RRsets of type A for IPv4 or AAAA for IPv6). The response also has an AUTHORITY section that contains the names of the authoritative name servers (RRset of type NS), and an ADDITIONAL section that contains the corresponding IP addresses of the authoritative servers (RRsets of type A or of type AAAA). With DNSSEC each different type comes with a signature, which means that in our case we have up to 4 signatures: one for the IP address of the queried FQDN (A or AAAA), one for the AUTHORITY section (NS), and one or two for the ADDITIONAL section, depending on whether it holds one family of IP addresses or both IPv4 and IPv6 addresses.

On the other hand, if the FQDN does not exist, we have to consider the NSEC3 RRsets that prove its non-existence. We consider the simple case where no wildcard is involved. The ANSWER section is empty. The AUTHORITY section has one NSEC3 RRset that proves the FQDN does not exist. Then we have to prove that the zone exists, to clearly indicate that the mismatch only occurs with the FQDN. This involves another NSEC3 RRset. Finally, we have to prove that the FQDN could not result from a wildcard expansion. In other words, we have to prove that the FQDN *.Zone does not exist. This makes 3 signatures to check, as well as 3 hashes to perform.
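To give an idea of what each of these hashes involves, the following minimal sketch computes an NSEC3 hashed owner name as we understand the definition in RFC 5155 [5]: SHA-1 over the lowercase wire-format name concatenated with the salt, re-hashed `iterations` more times, then encoded in Base32 with the extended hex alphabet. The function names, the salt and the iteration count are illustrative assumptions, not values taken from the measurements in this paper.

```python
import base64
import hashlib

# Translation from the standard Base32 alphabet to the "extended hex"
# alphabet used for NSEC3 owner names.
_B32_TO_B32HEX = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    "0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def name_to_wire(fqdn: str) -> bytes:
    """Encode a domain name in lowercase DNS wire format."""
    labels = [label for label in fqdn.lower().rstrip(".").split(".") if label]
    wire = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return wire + b"\x00"  # the empty root label terminates the name


def nsec3_hash(fqdn: str, salt_hex: str, iterations: int) -> str:
    """Return the Base32hex NSEC3 hash of fqdn (SHA-1 only in this sketch)."""
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(name_to_wire(fqdn) + salt).digest()
    # One initial hash plus `iterations` additional rounds, each salted.
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    encoded = base64.b32encode(digest).decode("ascii")
    return encoded.translate(_B32_TO_B32HEX).lower()


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    print(nsec3_hash("www.example.com", "aabbccdd", 12))
```

Each negative answer requires a validating resolver to run this loop for every name it has to match against an NSEC3 RRset, which is why the number of iterations directly affects the CPU cost of a resolution.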
In the examples above, we assumed that the resolver already had the cryptographic material needed to perform the signature checks and the hashes. If not, the resolver may have to request the public key of the authoritative server. As a result, DNSSEC responses require many more CPU cycles to process than DNS responses.

On the server side, Comcast reported at NANOG45 [8] that DNSSEC increases the memory footprint between 5 and 9 times for the authoritative infrastructure and that the recursive infrastructure requires additional recursive clusters.

[6] provided a performance view of DNSSEC. It considers tests for resolving servers as well as authoritative servers. In this paper we focus on the impact of DNSSEC on the resolving server. From the experimental measurements of [6] we evaluate the cost of migrating from a DNS platform to a DNSSEC platform: we simulate the cost of migrating to DNSSEC for two different implementations of DNS(SEC) resolving servers, BIND 9.6.0-P1 and UNBOUND 1.2.1. Costs are provided in terms of number of nodes and in terms of Response Time (RT). The simulation was performed with parameters taken from an ISP live network capture, that is to say a daily query rate of 40,000 q/s, a maximum query rate of up to 120,000 q/s and a Cache Hit Rate (CHR) of 70%. We estimate that the number of signature checks per resolution is 3, and that the maximum CPU load on each server is 60%. Table I shows that the cost can be up to 425% for the CPU and 499% for the response time. Note that the simulation is performed with data measured on an experimental platform with Pentium IIIs, with 1 GHz CPU and 384 MB of RAM, which are definitely not the nodes we have in our operational platform, and thus the results must be considered cautiously. However, real experiments provided similar results, and one should keep in mind a factor of 5.

TABLE I
DNSSEC IMPACT ON THE RESOLVING PLATFORM: (*) PR, (**) IR

(a) Number of nodes

              BIND9          UNBOUND        UNBOUND/BIND9 (IR**)
DNS           80 (1*)        20 (1*)        0.25
DNSSEC        87 (1.09*)     24 (1.20*)     0.28
DNSSEC-VAL    160 (2.00*)    85 (4.25*)     0.53

(b) Response Time (µs)

              BIND9          UNBOUND        UNBOUND/BIND9 (IR**)
DNS           1402 (1*)      401 (1*)       0.20
DNSSEC        1161 (0.80*)   366 (0.92*)    0.20
DNSSEC-VAL    2300 (1.63*)   2000 (4.99*)   0.87
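The starred and double-starred values in the node counts of Table I(a) can be re-derived from the raw node counts. The sketch below assumes that the starred value is the ratio to the DNS row of the same implementation and that the double-starred value is the ratio of UNBOUND to BIND9; this reading is our assumption, and it reproduces the published figures for part (a) only. Decimal arithmetic is used so that 87/80 rounds to 1.09 as printed.

```python
from decimal import Decimal, ROUND_HALF_UP

# Node counts from Table I(a): row -> (BIND9, UNBOUND).
NODES = {
    "DNS": (80, 20),
    "DNSSEC": (87, 24),
    "DNSSEC-VAL": (160, 85),
}


def ratio(numerator: int, denominator: int) -> Decimal:
    """Exact ratio rounded half-up to two decimals."""
    value = Decimal(numerator) / Decimal(denominator)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def ratios(table):
    """Return row -> (BIND9 vs DNS, UNBOUND vs DNS, UNBOUND vs BIND9)."""
    base_bind, base_unbound = table["DNS"]
    return {
        row: (ratio(bind, base_bind), ratio(unbound, base_unbound), ratio(unbound, bind))
        for row, (bind, unbound) in table.items()
    }


if __name__ == "__main__":
    for row, values in ratios(NODES).items():
        print(row, *values)
```

Under this reading, the 4.25 ratio for UNBOUND with validation is the source of the 425% CPU cost quoted in the text.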
III. FQDN LOAD BALANCER

ISPs operate DNS platforms of about 18 nodes, and migrating to DNSSEC would grow such a platform to around 100 nodes, which leads us to consider how the resolving platform could be optimized. DNS architectures may differ from one ISP to another, but in this paper we consider a centralized platform composed of one or a few clusters that respond to all end-user queries, such as the 18-node clusters operated by Orange.

Load balancers are in charge of splitting the DNS queries among the n different nodes. For each incoming query, the load balancer XORs the source and destination IP addresses of the query, and then performs a modulo n operation over the 24 least significant bits [7]. We call this architecture XOR. This architecture is well designed to balance the incoming traffic, and XOR happens to be an efficient, low-cost hash function. The main advantage of this architecture is that each node of the platform is independent, which eases management and scaling. In fact, when more CPU is required, the network administrator only has to add a node to the platform.

However, this platform is not so efficient in terms of DNS resolution. Since the traffic is split according to the IP addresses, popular FQDNs end up being resolved by all nodes of the platform. With DNSSEC the cost of a resolution is much higher than with DNS; thus, when migrating to DNSSEC, we would like the platform to avoid performing redundant resolutions. One way to do so is to split the DNS traffic according to the FQDN rather than the IP address. By doing so, a given FQDN is resolved by only one node, thus avoiding multiple resolutions. The architecture that splits DNS traffic among the n nodes of the platform according to the FQDN is called FQDN.

Compared to XOR, we expect FQDN to be more efficient, more scalable and to better protect the end users' privacy. First, FQDN is expected to be more efficient as it implicitly assigns each FQDN to a single node that is responsible for performing the resolution for that FQDN. The number of resolutions is expected to decrease, thus reducing the CPU consumption. Second, the platform is expected to be more scalable: when one adds a node to XOR, the added node performs a lot of redundant resolutions, so the platform does not take full advantage of the newly introduced node. This is avoided with FQDN, where no redundant resolutions are performed. Third, the platform better protects end users' privacy since the optimization of load balancing is performed on the FQDN.
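The two node-selection rules can be sketched as follows. The XOR rule follows the description above (XOR of source and destination addresses, 24 least significant bits, modulo n). The FQDN rule is only an illustration: the choice of SHA-1 over the lowercase name, and the function names, are our assumptions, since any hash of the FQDN would serve the same purpose.

```python
import hashlib
import ipaddress


def xor_node(src_ip: str, dst_ip: str, n: int) -> int:
    """XOR architecture: XOR both addresses, keep 24 LSBs, reduce modulo n."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return ((src ^ dst) & 0xFFFFFF) % n


def fqdn_node(fqdn: str, n: int) -> int:
    """FQDN architecture (illustrative): hash the queried name, reduce modulo n."""
    canonical = fqdn.lower().rstrip(".")  # DNS names are case-insensitive
    digest = hashlib.sha1(canonical.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % n


if __name__ == "__main__":
    n = 18
    resolver = "198.51.100.53"  # documentation address, hypothetical platform
    for client in ("192.0.2.1", "192.0.2.2"):
        print(client, xor_node(client, resolver, n), fqdn_node("www.example.com", n))
```

With these two example clients, the XOR rule selects two different nodes for the same name, so both nodes would resolve it; the FQDN rule selects the same node regardless of who asks.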
In fact, if we want to optimize the load balancing rather than rely only on a single hash function, an optimized XOR would have to decide which IP addresses to assign to each node. This means that end-user behaviour is monitored and that end users are assigned to a dedicated node. An optimized FQDN, on the other hand, would decide which FQDNs to assign to each node, thus monitoring the DNS traffic rather than each individual end user's IP address.

To compare the efficiency of FQDN and XOR, we compute the Occupancy Time (OT) of the nodes of the platform. The Occupancy Time is computed by replaying a live capture of DNS traffic: we consider a 10-minute capture taken at peak time (19:33) on our 18-node platform. For each node we count the number of queries answered from the cache and the number of resolutions. The global Occupancy Time is derived by summing the CPU required to perform the resolutions, each with 3 signature checks, and the CPU required to perform the DNS cache lookups. From [6], we found that for UNBOUND a resolution that requires one signature check costs 0.241 %CPU; when three signature checks are required, we multiply this by three. A query whose response is already in the cache costs 0.005 %CPU.

The XOR and FQDN Occupancy Time distributions are represented for DNS and DNSSEC in Figure 1. In order to compare the distributions, we also plot IP, which balances the traffic according to the IP addresses but applies a SHA1 hash function instead of a XOR hash function. IP works like XOR and makes it possible to compare the efficiency of XOR and SHA1. We also plot the distribution obtained when DNS queries are randomly spread over the different nodes. Randomly means that neither the IP address nor the FQDN is considered. Such load balancing is called Random in this paper.

XOR and IPs are very similar, and perform better than Random. FQDN, on the other hand, provides a two-cluster distribution: a low-OT group and a high-OT group. For DNS, the low-OT group has a high variance, and the mean OT of the high-OT group almost equals the OT of IPs. With DNSSEC things are better: there are still two clusters, but they have smaller variance, which means that the OT is more uniformly distributed. The mean OT values of the two groups are closer to each other than in the case of DNS, and in any case much lower than with Random or IPs/XOR.

FQDN seems promising since it offers a mean Occupancy Time lower than any of the other architectures. FQDN happens to be 1.172 times more efficient for UNBOUND with DNS, and 1.409 times more efficient with DNSSEC. However, its major drawback is that it presents a non-uniform distribution. In the case of our 18-node DNS platform, DNSSEC migration without modifying the architecture would require around 90 nodes. The proposed FQDN-based load balancing technique requires only 64 nodes.
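The Occupancy Time accounting described above can be sketched on a toy trace. The per-operation costs are the UNBOUND figures quoted from [6]; everything else is a simplifying assumption made for illustration: the trace is made up, the cache never expires entries (no TTL), clients are integer identifiers, and CRC32 stands in for the FQDN hash. The last helper divides the 90-node estimate by the 1.409 efficiency gain, which appears consistent with the 64 nodes quoted in the text.

```python
import math
import zlib

SIG_CHECK_COST = 0.241  # %CPU, UNBOUND resolution with one signature check [6]
CACHE_HIT_COST = 0.005  # %CPU, response already in the cache [6]


def occupancy_time(resolutions: int, cache_hits: int, sig_checks: int = 3) -> float:
    """CPU spent on resolutions (sig_checks checks each) plus cache lookups."""
    return resolutions * sig_checks * SIG_CHECK_COST + cache_hits * CACHE_HIT_COST


def replay(trace, assign, n):
    """Replay (client, fqdn) queries; return the Occupancy Time of each node."""
    caches = [set() for _ in range(n)]  # simplification: entries never expire
    resolutions = [0] * n
    hits = [0] * n
    for client, fqdn in trace:
        node = assign(client, fqdn, n)
        if fqdn in caches[node]:
            hits[node] += 1
        else:
            caches[node].add(fqdn)
            resolutions[node] += 1
    return [occupancy_time(r, h) for r, h in zip(resolutions, hits)]


def by_client(client, fqdn, n):
    """Stand-in for IP-based balancing: the client identifier picks the node."""
    return client % n


def by_fqdn(client, fqdn, n):
    """Stand-in for FQDN-based balancing: only the queried name picks the node."""
    return zlib.crc32(fqdn.encode("ascii")) % n


def nodes_needed(baseline_nodes: int, efficiency_gain: float) -> int:
    """Nodes required once the platform is efficiency_gain times more efficient."""
    return math.ceil(baseline_nodes / efficiency_gain)


# Toy trace: four clients ask for the same popular name.
TRACE = [(client, "www.example.com") for client in range(4)]

if __name__ == "__main__":
    print("per client:", sum(replay(TRACE, by_client, 2)))
    print("per FQDN:  ", sum(replay(TRACE, by_fqdn, 2)))
    print("nodes:     ", nodes_needed(90, 1.409))
```

On this trace, per-client balancing performs two resolutions (one per node) while per-FQDN balancing performs one, which is the redundancy the FQDN architecture is meant to remove. The sketch also shows the drawback noted above: all the load for the popular name lands on a single node.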
[Figure: two plots of the Number of Nodes (y-axis) against Occupancy Time (x 10^4 %CPU, x-axis) for the Random, FQDN, IPs and XOR load balancers. (a) DNS - UNBOUND. (b) DNSSEC SIG - UNBOUND.]
Fig. 1. Occupancy Time
IV. CONCLUSION

Load balancing the traffic according to the FQDN seems promising. However, some challenges still need to be overcome. First, using a single hash function does not provide a uniform distribution. One way to provide a uniform distribution is to define a routing table for the k most popular FQDNs and to use a hash function for the others. Second, load balancers that balance traffic according to the FQDN are not widely available on the market. One way to overcome this is to consider Pastry-based architectures in conjunction with the existing IP-based load balancers. The idea is that the IP-based load balancer splits the traffic among nodes, each of which either performs the resolution if it is responsible for that FQDN or forwards the query to the node responsible for that FQDN.

REFERENCES

[1] DNSSEC Walker. https://www.dns-oarc.net/tools/dnssecwalker, Jan. 2008.
[2] R. Arends, R. Austein, M. Larson, D. Massey, and S. Rose. DNS Security Introduction and Requirements. RFC 4033 (Proposed Standard), Mar. 2005.
[3] R. Arends, R. Austein, M. Larson, D. Massey, and S. Rose. Protocol Modifications for the DNS Security Extensions. RFC 4035 (Proposed Standard), Mar. 2005. Updated by RFC 4470.
[4] R. Arends, R. Austein, M. Larson, D. Massey, and S. Rose. Resource Records for the DNS Security Extensions. RFC 4034 (Proposed Standard), Mar. 2005. Updated by RFC 4470.
[5] B. Laurie, G. Sisson, R. Arends, and D. Blacka. DNS Security (DNSSEC) Hashed Authenticated Denial of Existence. RFC 5155 (Proposed Standard), Mar. 2008.
[6] D. Migault, C. Girard, and M. Laurent. A performance view on DNSSEC migration. In CNSM 2010, Oct. 2010.
[7] N. Networks. Alteon OS 21.0, Alteon Application Switch, Sept. 2003.
[8] R. K. Oberman. DNSSEC implementation, Jan. 2009.