This is a follow-up post on registrant classification. Before reading this post, make sure to check out Registrant Classification using Machine Learning.
In the last post, we introduced the models we tried for the registrant classification problem. In this post, we look at our final model and analyse registrations based on its classifications.
To achieve the best accuracy, we tried different feature extraction methods and various models. The final model is an ensemble method called a stacking classifier: a set of base learners is trained to generate predictions, and a meta-classifier learns to combine these predictions into a final prediction. To improve it further, we applied a Bayesian optimisation search to the classifiers to find the best hyperparameters. The best accuracy we achieved is 96.7%.
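To make the idea of stacking concrete, here is a minimal sketch using scikit-learn's `StackingClassifier`. It is not our production model: the toy names, the character n-gram features and the choice of base learners are all assumptions made for illustration, and the Bayesian hyperparameter search is left out.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; real registrant names are far messier.
names = [
    "John Smith", "Mary Jones", "Aroha Ngata",
    "David Brown", "Sarah Wilson", "Peter Taylor",
    "Kiwi Plumbing Ltd", "Southern Motors Limited", "Harbour Cafe Ltd",
    "Wellington Trust Board", "ABC Holdings Limited", "Green Valley Farms Ltd",
]
labels = ["person"] * 6 + ["organisation"] * 6

# Base learners produce out-of-fold predictions (cv=2 because the toy set
# is tiny); the meta-classifier learns how to combine them.
stack = StackingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
    ],
    final_estimator=LogisticRegression(),
    cv=2,
)

# Character n-grams are an assumed feature extraction step.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    stack,
)
model.fit(names, labels)

new_names = ["Acme Widgets Ltd", "Jane Doe"]
predictions = model.predict(new_names)
probabilities = model.predict_proba(new_names)
for name, label in zip(new_names, predictions):
    print(name, "->", label)
```

With so little data the predictions are not meaningful; the point is only the structure of base learners feeding a meta-classifier.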
Let’s start by looking at the whole register. Snapshots of registrant data were taken weekly from 2016-09-19 to 2018-03-19, and the registrants were classified as either person or organisation. The following figure shows how the number of registrants of each type changed over time. The total number of registrants grew steadily from 2016-09-19 to 2018-03-19, a 17% increase. There are 50% more organisation registrants than person registrants, and the ratio also shows an upward trend. There were drops due to special events, and at each drop the organisation registrants seem to drop more at first and then quickly catch up again. In March 2018, there were 596,479 registered companies in New Zealand. With 155,856 NZ-based organisation registrants on 2018-03-19, this means 26.13% of the companies in NZ have at least one domain registered.
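The percentage above is a simple ratio of the two counts. The short check below reproduces it; the variable names are ours, and the calculation assumes that each NZ-based organisation registrant corresponds to one distinct registered company.

```python
# Figures quoted in the text (March 2018).
nz_registered_companies = 596_479
nz_organisation_registrants = 155_856

# Assumption: one organisation registrant == one distinct company.
share = nz_organisation_registrants / nz_registered_companies
print(f"{share:.2%}")
```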
The portfolio size ranges from 1 to more than 2600 domains. The figure below shows that most registrants are small portfolio holders with fewer than 5 domains. 67% of registrants own 30% of domains. Fewer than 1% of registrants have more than 50 domains.
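Shares like these can be derived by counting domains per registrant. The sketch below shows one way to do it; the `(domain, registrant_id)` pairs and the function name `portfolio_summary` are hypothetical and do not reflect the actual register schema.

```python
from collections import Counter

# Hypothetical (domain, registrant_id) pairs.
registrations = [
    ("a.nz", "r1"), ("b.nz", "r1"), ("c.nz", "r1"),
    ("d.nz", "r1"), ("e.nz", "r1"), ("f.nz", "r1"),
    ("g.nz", "r2"),
    ("h.nz", "r3"), ("i.nz", "r3"),
    ("j.nz", "r4"),
]

def portfolio_summary(registrations, threshold):
    """Share of registrants with fewer than `threshold` domains,
    and the share of all domains those registrants hold."""
    sizes = Counter(registrant for _, registrant in registrations)
    small = [n for n in sizes.values() if n < threshold]
    registrant_share = len(small) / len(sizes)
    domain_share = sum(small) / sum(sizes.values())
    return registrant_share, domain_share

registrant_share, domain_share = portfolio_summary(registrations, threshold=5)
print(f"{registrant_share:.0%} of registrants hold {domain_share:.0%} of domains")
```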
The bar chart below illustrates that, for small portfolio groups, the split between persons and organisations is even, but in larger portfolio groups domains are mostly held by organisations. Interestingly, 6 of the top 18 portfolio holders are persons. In fact, the largest portfolio holder, with more than 2600 domains (almost double the size of the second-largest portfolio), is a person!
An important question we wanted to answer was whether domains owned by one type of registrant tend to live longer than those owned by the other type, i.e. do they behave differently in terms of retention? The figure below reveals that organisations own most of the domains that are older than 8 years. Comparing the data on 2016-09-19 and 2018-03-19, 82.18% of the organisation-owned domains stayed active in the register, while the percentage for person-owned domains is lower. This is evidence that domains owned by different types of registrants do behave differently, and that organisation-owned domains have a higher retention rate.
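A retention rate of this kind can be computed by comparing two snapshots: take the domains present in the first one and check which are still active in the last. The following is a minimal sketch with made-up data; the function name and data layout are assumptions, not our actual pipeline.

```python
from collections import Counter

# Hypothetical first snapshot: domain -> registrant type.
first_snapshot = {
    "a.nz": "organisation", "b.nz": "organisation", "c.nz": "organisation",
    "d.nz": "organisation", "e.nz": "organisation",
    "f.nz": "person", "g.nz": "person", "h.nz": "person", "i.nz": "person",
}
# Hypothetical set of domains still active in the last snapshot.
last_snapshot = {"a.nz", "b.nz", "c.nz", "d.nz", "f.nz", "g.nz"}

def retention_by_type(first_snapshot, last_snapshot):
    """Fraction of domains from the first snapshot that are still
    active in the last snapshot, per registrant type."""
    totals, retained = Counter(), Counter()
    for domain, registrant_type in first_snapshot.items():
        totals[registrant_type] += 1
        if domain in last_snapshot:
            retained[registrant_type] += 1
    return {t: retained[t] / totals[t] for t in totals}

rates = retention_by_type(first_snapshot, last_snapshot)
print(rates)
```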
The figure below shows the retention rate for different age groups. The retention rate of organisation-owned domains is significantly higher than that of person-owned ones in the first 5 years of age; after that, person-owned domains start to catch up. This means that a domain's age in years and its registrant type should be considered jointly when modelling its retention behaviour.
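One way to consider age and registrant type jointly is to group domains by the pair (age in years, registrant type) before computing retention. The sketch below illustrates this with made-up records; the field layout and helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date

def age_in_years(created, on):
    """Whole years between the creation date and the snapshot date."""
    had_birthday = (on.month, on.day) >= (created.month, created.day)
    return on.year - created.year - (0 if had_birthday else 1)

def retention_by_age_and_type(domains, first_date, active_at_end):
    """Retention rate keyed by (age in years at first_date, registrant type)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [retained, total]
    for name, created, registrant_type in domains:
        key = (age_in_years(created, first_date), registrant_type)
        counts[key][1] += 1
        if name in active_at_end:
            counts[key][0] += 1
    return {key: kept / total for key, (kept, total) in counts.items()}

# Hypothetical records: (domain, creation date, registrant type).
domains = [
    ("a.nz", date(2015, 10, 1), "organisation"),
    ("b.nz", date(2015, 11, 5), "organisation"),
    ("c.nz", date(2016, 1, 10), "person"),
    ("d.nz", date(2016, 3, 2), "person"),
    ("e.nz", date(2009, 5, 1), "organisation"),
    ("f.nz", date(2009, 2, 14), "person"),
]
active_at_end = {"a.nz", "b.nz", "c.nz", "e.nz", "f.nz"}

rates_by_group = retention_by_age_and_type(domains, date(2016, 9, 19), active_at_end)
for key in sorted(rates_by_group):
    print(key, rates_by_group[key])
```

A table like this could then serve as a joint feature or baseline in a retention model, rather than treating age and registrant type separately.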
Going a little further afield, let us look at another field of the registrant data: the email address. We expected organisations to be the first to use multiple email accounts to manage multiple domains, since they have more capacity to do so. Counterintuitively, the figure below tells us that it is the majority of person registrants who start to use multiple email accounts as their portfolio size grows.
There are several interesting future applications of the registrant classifier. Because registrant names are unstandardised free text provided by the registrants themselves, they naturally contain anomalies. With the registrant classifier, we will be able to find strange names that have a low probability of occurring and check the registrations under those names for further anomaly detection. Now that we know the two types of registrants differ in retention behaviour, we will treat registrant type as an important factor when modelling individual domain retention. And some great news: registrant classification is being considered as a potential feature in the .nz registrar portal! Our registrars will be able to know their registrants better in the future.