Ilya Grigorik’s great course on web performance made me aware of how important DNS server performance is, and how poorly DNS servers are often maintained. The Domain Name System was invented in the early ’80s and is one of the oldest core services of the Internet; however, it is often disregarded, as it is assumed to be fast and one usually connects to whatever servers DHCP offers. DNS requires very few resources: it uses UDP, client and server caches, and highly optimized code. DNS is also very reliable, as clients have a pool of servers to connect to and requests can be forwarded between servers. However, DNS servers are in general poorly maintained and rarely optimized, because most of the time they just “work”.
Ilya suggested trying namebench, an open-source tool to benchmark DNS servers and help you choose the most appropriate ones for your location. What’s cool about the tool (besides being Python-based and having a multi-platform GUI) is that its benchmark can take domain names from your browser’s history and produce graphical reports.
Your local provider’s DNS servers should have the lowest latencies, but on average (and in median) that is often not the case. I have been using namebench at home, at work, and at the university, both as a client and on servers. For home ADSL connections, Google’s Public DNS is usually the best choice. I was surprised this was also the case for the university (I have reported the issue), as university connections are usually well maintained and run on good networking equipment.
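As a quick sanity check without the full tool, you can time a raw DNS query against a specific resolver over UDP. A minimal Python sketch (the resolver IP and domain in the usage comment are just placeholders):

```python
import socket
import struct
import time

def build_query(domain, qid=0x1234):
    # DNS header: id, flags (standard query, recursion desired), 1 question
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(bytes([len(p)]) + p.encode() for p in domain.split("."))
    # QTYPE=1 (A record), QCLASS=1 (IN)
    return header + qname + b"\x00" + struct.pack(">HH", 1, 1)

def time_resolver(server, domain, timeout=2.0):
    """Return the round-trip time of one DNS query in milliseconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    query = build_query(domain)
    start = time.perf_counter()
    sock.sendto(query, (server, 53))
    sock.recv(512)
    return (time.perf_counter() - start) * 1000

# Usage (requires network access):
#   print(f"{time_resolver('8.8.8.8', 'example.com'):.1f} ms")
```

A single query is noisy, so in practice you would repeat it many times per server and compare the distributions, which is essentially what namebench does at scale.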
At work we found something interesting: the main DNS server was slower than the secondary. Both are virtual machines (on separate physical hosts), the main one actually composed of two identical servers with failover. On average, the secondary server was 20ms faster than either primary; in median, 50ms faster, which is a big difference. Out of curiosity, we performed several tests:
Checked the servers’ load and CPU utilization. Almost idle.
Tested both primary servers directly, bypassing the failover. The failover mechanism didn’t add latency.
Tested the second primary, which had no load. Same result.
Increased the virtual machine’s RAM from 256MB to 4GB. Same result.
Increased the number of cores. Same result.
When we have more time, we will keep investigating what could be causing the 20 to 50ms overhead in the primary servers. This reaffirms the point that if the name servers are working, no one periodically checks their performance; nobody knew we were losing this time on every resolution. Every site you visit requires resolving 15 names on average, and that can go up to 50 (source: HTTP Archive). So, if you manage DNS servers, check their performance; and for the rest of us, check your clients from different locations!
Something missing from namebench, though, is the ability to include private (internal) domains in the benchmark; including them would hurt the results of external DNS servers, which cannot resolve them.
The next figure is also important: it shows the best response time you can get from each of your configured and public DNS servers.
In this case my local servers (SYS-1 and SYS-2) could serve my requests in less than 3ms. There are also two other public servers that take less than 5ms. The best case for Google’s public service is 34ms, yet on average it is still the best. This means that if the local servers were optimized further, we could gain up to 30ms per DNS request.
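A rough back-of-the-envelope calculation shows why this matters, assuming the figures above and the average of 15 names per site (this is an upper bound: in practice lookups are often cached or done in parallel):

```python
# Best-case latencies from the namebench report above, in milliseconds
local_best_ms = 3    # SYS-1 / SYS-2 best response time
google_best_ms = 34  # Google Public DNS best response time

saving_per_lookup_ms = google_best_ms - local_best_ms
lookups_per_page = 15  # average names per site (HTTP Archive)

print(saving_per_lookup_ms * lookups_per_page,
      "ms potential saving per page (upper bound)")
```

Even if caching and parallel resolution reduce the real-world impact, a few hundred milliseconds of headroom per page is worth chasing.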