As we run the authoritative nameservers for the .nz ccTLD (country code top-level domain), we receive a large volume of DNS queries and answer each one with a DNS response. We capture these queries and responses and store them in a Hadoop cluster for further analysis. From this data, we generate daily statistics about our DNS traffic and publish them on IDP (Internet Data Portal) as the .nz DNS Statistics.
We now have a clean, continuous dataset dating back to 2015, which lets us apply time series analysis to explore trends. This post presents some interesting results from that analysis.
A DNS query contains several attributes, such as the domain being queried, the query type (the type of resource record requested for the domain) and the set of DNS header flags. By aggregating counts for each attribute, we can explore the data across a variety of dimensions.
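To make "aggregated counts" concrete, here is a minimal Python sketch of a per-day aggregation over captured queries. The `QueryRecord` schema and the `daily_counts` function are hypothetical and only illustrate the idea; our real pipeline runs on Hadoop and is not shown here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    """One captured query (hypothetical schema, for illustration only)."""
    day: str      # UTC date, e.g. "2017-03-01"
    domain: str   # queried name, e.g. "example.co.nz"
    qtype: str    # query type, e.g. "A", "AAAA", "NS"
    rd: bool      # Recursion Desired flag from the DNS header


def daily_counts(records):
    """Aggregate per-day counts along several attribute dimensions."""
    by_qtype = Counter((r.day, r.qtype) for r in records)
    by_rd = Counter((r.day, r.rd) for r in records)
    names_per_day = {}
    for r in records:
        names_per_day.setdefault(r.day, set()).add(r.domain)
    return {
        "qtype": by_qtype,
        "rd": by_rd,
        # number of distinct names queried on each day
        "unique_domains": {day: len(names) for day, names in names_per_day.items()},
    }
```

Each dimension is keyed by `(day, value)`, so one pass yields a daily time series per attribute value.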
Registered Domains Queried
IDP Dataset: Unique registered domains queried
We see a great many domains being queried in our traffic. Not all of them are registered: many do not exist in our register, and queries for them receive an NXDOMAIN response. By filtering on the response code in the DNS response message, we can extract the registered domains that were queried. Because the number of unique registered domains queried depends on the size of the register, we normalize it by dividing by that day's register size, which is available from the .nz registration statistics.
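The normalization can be sketched as follows, assuming each queried name has already been reduced to its registered domain (for example, `www.example.co.nz` to `example.co.nz`) and that a registered domain is answered with NOERROR while an unregistered one gets NXDOMAIN. The function name and input shape are hypothetical.

```python
NOERROR = 0   # DNS RCODE for a successful response (RFC 1035)
NXDOMAIN = 3  # DNS RCODE for "name does not exist" (shown for reference)


def normalized_registered_queried(responses, register_size):
    """Fraction of the register that was queried on one day.

    responses: iterable of (registered_domain, rcode) pairs for a single day.
    register_size: number of domains in the register on that day.
    """
    # Names are case-insensitive in DNS, so normalize before de-duplicating.
    registered = {domain.lower() for domain, rcode in responses if rcode == NOERROR}
    return len(registered) / register_size
```

The result is a fraction of the register, which is why it can never exceed 1.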
We then applied Prophet, Facebook's forecasting library, to the normalized series. Because the normalized value is a fraction of the register and cannot exceed 1, we used the logistic growth trend model with a carrying capacity of 1, and obtained the following result.
From the plot, we can see that activity in the .nz namespace follows an upward trend over the past two years and is forecast to keep growing over the next year.
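The logistic trend with a carrying capacity of 1 saturates as it approaches 1. The sketch below shows that shape in plain Python, ignoring the changepoints Prophet adds to the growth rate; the `k` and `m` values in the usage are illustrative, not values fitted to our data. In Prophet itself, this corresponds to creating the model with `Prophet(growth='logistic')` and adding a `cap` column set to 1 to both the training dataframe and the dataframe returned by `make_future_dataframe`.

```python
import math


def logistic_trend(t, k, m, cap=1.0):
    """Logistic growth curve of the kind Prophet uses, without changepoints.

    t:   time, e.g. days since the start of the series
    k:   growth rate
    m:   offset, the time at which the trend reaches cap / 2
    cap: carrying capacity; 1.0 because the normalized series is a fraction
    """
    return cap / (1.0 + math.exp(-k * (t - m)))


# Illustrative parameters only: the curve rises towards, but never exceeds, 1.
for t in (0, 100, 200, 400):
    print(t, round(logistic_trend(t, k=0.01, m=100), 3))
```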
Prophet uses an additive model in which a non-linear trend is combined with yearly and weekly seasonal components. From the components plot below, we can see the trend, the weekly variation and the yearly seasonality.
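For reference, Prophet's additive model can be written as below, where $g$ is the trend, $s$ the seasonal components, $h$ holiday effects and $\varepsilon_t$ noise. With the logistic trend used here (changepoints ignored), $g$ saturates at $C = 1$; each seasonal component is a truncated Fourier series with period $P$ (7 days for weekly, 365.25 days for yearly).

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t

g(t) = \frac{C}{1 + \exp\bigl(-k\,(t - m)\bigr)}, \qquad C = 1

s(t) = \sum_{n=1}^{N} \left[ a_n \cos\!\left(\frac{2\pi n t}{P}\right)
       + b_n \sin\!\left(\frac{2\pi n t}{P}\right) \right]
```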
The weekly and yearly seasonality are quite interesting. Our data is timestamped in UTC, which is 12 hours behind NZ standard time (13 hours during daylight saving), so each daily bucket runs from midday to midday in NZ. Allowing for this shift, weekly activity ramps up around Thursday and stays high until Saturday. We suspect the increase comes partly from business-related queries on Thursday and Friday, and partly from weekend leisure queries on Friday and Saturday.
In the yearly subplot, we see a peak in March, which could relate to financial planning for the year (the financial year commonly ends in March in NZ). The number of registered domains queried decreases across July, August and September, which could correlate with lower business activity during the winter months. Finally, the low point at Christmas could be explained by the holiday period.
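To see which NZ local hours fall into a UTC daily bucket, the following sketch converts the bucket boundaries. It assumes a fixed UTC+12 offset (NZ standard time); during daylight saving the offset is UTC+13, which this simplified version deliberately ignores. The function name is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed NZ standard time offset; does not model daylight saving (UTC+13).
NZST = timezone(timedelta(hours=12))


def utc_day_in_nz(day):
    """Return the NZ local start and end of the UTC calendar day `day`."""
    start_utc = datetime.combine(day, time(0, 0), tzinfo=timezone.utc)
    end_utc = start_utc + timedelta(days=1)
    return start_utc.astimezone(NZST), end_utc.astimezone(NZST)


start, end = utc_day_in_nz(date(2017, 6, 1))  # a Thursday in UTC
print(start.isoformat(), end.isoformat())
```

For example, the UTC day 2017-06-01 (a Thursday) runs from Thursday 12:00 to Friday 12:00 NZ time, so a "Thursday" bucket in the UTC data already contains half of the NZ Friday.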
IDP Dataset: Query types
The query type (the type of resource record requested for a domain) indicates how the domain is being used. Please refer to DNS in a Glimpse for definitions of the query types. We examine the query volume for each major query type to see how the usage of .nz domains has evolved over the years. DS and DNSKEY are the two major query types related to DNSSEC, so we show them in a separate plot as an indicator of DNSSEC deployment progress.
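Separating the DNSSEC-related types from the rest before plotting can be sketched like this. The function name and the input (one query type string per query) are assumptions for illustration.

```python
from collections import Counter

# Query types shown in the separate DNSSEC plot.
DNSSEC_TYPES = {"DS", "DNSKEY"}


def split_by_qtype(qtypes):
    """Count queries per type, returning (other_types, dnssec_types) dicts."""
    counts = Counter(q.upper() for q in qtypes)  # type mnemonics are case-insensitive
    dnssec = {t: n for t, n in counts.items() if t in DNSSEC_TYPES}
    other = {t: n for t, n in counts.items() if t not in DNSSEC_TYPES}
    return other, dnssec
```

Keeping the DNSSEC counts in their own series stops the much larger A and AAAA volumes from flattening their trend on a shared axis.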
We use Plotly for interactive plotting.
We can see that A and AAAA remain the two most frequently queried types for .nz domains. Specifically:
- Queries for A records (the IPv4 address of a domain) show steady growth with occasional spikes.
- Queries for AAAA records (the IPv6 address of a domain) grew strongly across 2015, dropped steeply in July 2016 and then recovered gradually. The drop in July 2016 is probably related to a fix for the large number of AAAA queries for two of the .nz nameservers, as explained in this presentation.
- Queries for NS records (locating the nameservers for a domain) were low in volume until they jumped in February 2016, after which they remained steady at the higher level. Extremely high volumes were seen in early 2017. We explore these anomalies later in this post.
- Queries for MX records (locating the mail server for a domain) should reflect email being sent to addresses within the .nz namespace, including spam. These volumes are steady, with strong weekly and monthly seasonality.
- Queries for DS records (used by validating resolvers to authenticate delegations) show a rising trend, which reflects the progress of DNSSEC deployment.
- Queries for DNSKEY records (the keys used to validate signed records) show a slower rising trend. These queries are normally directed at the delegated zone's own nameservers; since we are authoritative mainly for the top and second levels, we see only a small number of DNSKEY queries.
IDP Dataset: RD bit
The DNS message header contains an RD (Recursion Desired) bit. It is usually set in queries sent by end users to their resolver. Because we are authoritative for the .nz namespace, most queries reaching us should not have this bit set, so we do not expect to see many queries with the RD bit set. The plot below shows their daily volume.
We can see a big jump in February 2016, similar to the one in NS queries mentioned in the previous section. We explore this anomaly later in this post.
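The RD bit lives in the 16-bit flags field of the 12-byte DNS header (RFC 1035, section 4.1.1), alongside the QR bit and the response code used earlier to separate registered from non-existent domains. The sketch below decodes these fields from raw wire-format bytes using Python's `struct` module; `parse_header_flags` is a hypothetical helper, not part of our pipeline.

```python
import struct


def parse_header_flags(message):
    """Decode selected flags from the 12-byte DNS header (RFC 1035, 4.1.1)."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    # ID, FLAGS, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT: six big-endian 16-bit fields.
    _id, flags, _qd, _an, _ns, _ar = struct.unpack("!6H", message[:12])
    return {
        "qr": bool((flags >> 15) & 1),  # 1 = response, 0 = query
        "rd": bool((flags >> 8) & 1),   # Recursion Desired
        "rcode": flags & 0xF,           # response code, e.g. 3 = NXDOMAIN
    }


# A query header with only the RD bit set (flags = 0x0100).
print(parse_header_flags(struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0)))
```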
Our traffic also shows the source IP address of each query and the protocols clients use to reach us: UDP or TCP, and IPv4 or IPv6. This lets us explore trends in protocol usage across our clients' infrastructure. In this section, we draw the comparison plots on a log scale, because the quantities being compared differ greatly in size.
UDP vs. TCP
IDP Dataset: UDP and TCP
The use of UDP and TCP in DNS is driven by message size and other factors as described in RFC7766:
"Most DNS [RFC1034] transactions take place over UDP [RFC768]. TCP [RFC793] is always used for full zone transfers (using AXFR) and is often used for messages whose sizes exceed the DNS protocol's original 512-byte limit. The growing deployment of DNS Security (DNSSEC) and IPv6 has increased response sizes and therefore the use of TCP. The need for increased TCP use has also been driven by the protection it provides against address spoofing and therefore exploitation of DNS in reflection/amplification attacks. It is now widely used in Response Rate Limiting [RRL1] [RRL2]. Additionally, recent work on DNS privacy solutions such as [DNS-over-TLS] is another motivation to revisit DNS-over-TCP requirements."
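The size rule in the quoted text can be sketched as a simple check: a UDP response that exceeds the client's limit is truncated (TC bit set), and the client retries over TCP. This simplified version assumes the limit is 512 bytes without EDNS(0) and the client-advertised UDP payload size otherwise (RFC 6891 treats advertised values below 512 as 512). It does not model other server policies such as response rate limiting, and the function name is hypothetical.

```python
CLASSIC_UDP_LIMIT = 512  # bytes: the original DNS-over-UDP message limit


def must_fall_back_to_tcp(response_size, edns_udp_size=None):
    """True if a response of this size would be truncated over UDP.

    edns_udp_size: UDP payload size advertised by the client via EDNS(0),
    or None if the query carried no OPT record.
    """
    if edns_udp_size is None:
        limit = CLASSIC_UDP_LIMIT
    else:
        # Advertised sizes below 512 are treated as 512.
        limit = max(edns_udp_size, CLASSIC_UDP_LIMIT)
    return response_size > limit
```

Larger DNSSEC-signed answers make the first branch fire more often, which is one reason DNSSEC deployment pushes up TCP usage.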
We compare the UDP and TCP trends in our traffic in two ways:
- UDP vs. TCP query volume
- The number of unique source addresses through UDP vs. TCP
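Both comparisons can be computed in a single pass over a day's queries. In this sketch, the input shape (one `(source_ip, transport)` pair per query) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def per_transport_stats(records):
    """Return {transport: (query_volume, unique_sources)} for one day.

    records: iterable of (source_ip, transport) pairs, one per query.
    """
    volume = defaultdict(int)
    sources = defaultdict(set)
    for src, transport in records:
        key = transport.upper()  # "udp" and "UDP" are the same transport
        volume[key] += 1
        sources[key].add(src)
    return {t: (volume[t], len(sources[t])) for t in volume}
```

The two numbers can move independently: a few busy resolvers switching to TCP raise TCP volume far more than they raise the count of TCP sources.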
From the two plots, we can see that both the query volume and the number of unique source addresses over TCP increased significantly in the first half of 2016 and then stabilized. In contrast, the query volume over UDP grew slowly over the years, while the number of unique source addresses over UDP decreased slightly.
Overall, we found that the total number of unique source addresses has been decreasing since late 2015. Three of our nameservers are hosted by offshore providers, and we do not capture data from them. We speculate that the decrease could reflect traffic moving to those offshore nameservers.
IPv4 vs. IPv6
IDP Dataset: IPv4 and IPv6
We can make a similar comparison between IPv4 and IPv6:
- Query volume from IPv4 vs. IPv6 source addresses
- The number of unique IPv4 vs. IPv6 source addresses
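Classifying sources by IP version can be done with Python's standard `ipaddress` module. This sketch assumes one source address string per query; the function name is hypothetical. Parsing the strings first means that different spellings of the same IPv6 address (for example, upper and lower case) are counted as one source.

```python
import ipaddress
from collections import Counter


def per_ip_version_stats(source_ips):
    """Return (query_volume, unique_sources), each a Counter keyed "IPv4"/"IPv6".

    source_ips: iterable of source address strings, one entry per query.
    """
    addrs = [ipaddress.ip_address(ip) for ip in source_ips]
    volume = Counter(f"IPv{a.version}" for a in addrs)
    unique = Counter(f"IPv{a.version}" for a in set(addrs))
    return volume, unique
```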
Since 2016, both the query volume from IPv6 addresses and the number of IPv6 source addresses have grown. IPv4 query volume has grown more slowly, while the number of IPv4 source addresses has decreased since 2016. The reason for this decrease may be the same as the one suggested in the UDP/TCP analysis.
We also investigated the weekly and yearly seasonality of each metric. As the patterns vary widely, we show just two typical examples related to IPv4 queries.
- A weekend dip is typical in several of the query volume and unique source address metrics. It is probably due to lower business activity at weekends.
- The yearly seasonality in query volume from IPv4 addresses shows low points during winter and the Christmas holidays.
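A quick way to check for a weekend dip without fitting a model is a weekday profile: average the daily counts for each weekday and scale them so the seven averages have a mean of 1. This is a deliberately simpler technique than Prophet's Fourier-series seasonality and, unlike Prophet, does not remove the trend first. The function name and input shape are assumptions.

```python
from datetime import date
from statistics import mean


def weekday_profile(daily: dict[date, float]) -> dict[int, float]:
    """Mean count per weekday (0 = Monday), scaled so the seven means average 1."""
    by_weekday = {wd: [] for wd in range(7)}
    for day, count in daily.items():
        by_weekday[day.weekday()].append(count)
    means = {wd: mean(vals) for wd, vals in by_weekday.items() if vals}
    overall = mean(means.values())
    return {wd: m / overall for wd, m in means.items()}
```

Values below 1 on Saturday and Sunday (weekdays 5 and 6) indicate a weekend dip. Because the data is bucketed by UTC day, the same midday-to-midday shift discussed earlier applies when reading the result in NZ terms.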
One Increase Across Multiple Metrics
During the time series analysis, we found simultaneous abrupt increases in several metrics, shown below.
This looked highly anomalous, so we analyzed our raw data to find out what was happening. We identified a group of source IP addresses that generated these increases. From 2016-02-04, each of these addresses began sending about 63k NS queries per day for non-existent domains, with the RD bit set in every query. This later rose to about 91k per day and has remained at that level. As a result, a large number of unique non-existent domains are queried every day, and these queries are continuing.
We checked these IP addresses in our query logs and found that none of them appeared before February 2016. Their sudden appearance and highly specific behavior have caught our attention. We will monitor these source addresses and research the cause further, so that we can suggest the most appropriate solution.
Using the daily .nz DNS statistics, we undertook time series analysis to show trends and anomalies in our DNS traffic. It revealed several interesting patterns: some are easy to explain, while others need further research. By sharing the daily DNS statistics as open data, we hope anyone who is interested can use them to better understand the Internet in New Zealand.
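The filtering described above can be sketched as follows: keep only NS queries with the RD bit set that received NXDOMAIN, count them per source and day, and flag sources above a daily threshold. The record layout, function name and default threshold are hypothetical choices for illustration, not the exact query we ran.

```python
from collections import Counter

NXDOMAIN = 3  # DNS RCODE for "name does not exist"


def suspicious_sources(records, min_daily=50_000):
    """Sources that sent at least `min_daily` matching queries on some day.

    records: iterable of (day, source_ip, qtype, rd, rcode) tuples.
    """
    per_source_day = Counter(
        (src, day)
        for day, src, qtype, rd, rcode in records
        if qtype == "NS" and rd and rcode == NXDOMAIN
    )
    return sorted({src for (src, _day), n in per_source_day.items() if n >= min_daily})
```

Requiring all three conditions keeps ordinary resolvers, which rarely set RD towards an authoritative server, out of the result.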
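Checking when each source first appeared amounts to taking the earliest day per address. This sketch assumes days are ISO `YYYY-MM-DD` strings, which sort correctly as plain strings; the function names are hypothetical.

```python
def first_seen(records):
    """Earliest day on which each source address appears.

    records: iterable of (day, source_ip) pairs, day as an ISO "YYYY-MM-DD" string.
    """
    earliest = {}
    for day, src in records:
        if src not in earliest or day < earliest[src]:
            earliest[src] = day
    return earliest


def new_since(records, cutoff):
    """Sources whose first appearance is on or after `cutoff`."""
    return sorted(src for src, day in first_seen(records).items() if day >= cutoff)
```

Running `new_since` over the full log with a cutoff of `"2016-02-01"` would list addresses that, like the ones above, never appeared before February 2016.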