Explaining our recent outage – 5 July, 2022

July 8th, 2022 - jameso

Last Tuesday, 5 July 2022, we had an outage that affected a large portion of our customer base.

We believe it’s better to own the mistake and tell you how we’ll do better.

Even though we were able to fix the issue within the hour, we still wanted to let you know what happened, what we did to fix it and what we’re doing now that it’s resolved.

What happened?

There was an error with our DNS infrastructure.

DNS stands for Domain Name System. Its job is to translate website addresses, also called domain names (e.g. www.google.com.au), into IP addresses (e.g. 8.8.4.4), which is how computers on the internet find and talk to each other.

The internet works in numbers. An IP address is like a home address: it tells data where to go, including which server to ask for a webpage.

So, instead of you having to remember long strings of numbers to reach your favourite websites, DNS does all the hard work for you. You only need to remember the website address.
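For readers who like to see this in action, here is a minimal sketch of a DNS lookup in Python. It only illustrates the general idea of turning a name into IP addresses and is not a picture of our own DNS infrastructure. The function name resolve is made up for this example, and the addresses you get back will depend on your network.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses that DNS reports for hostname.

    `resolve` is a hypothetical helper written for this illustration only.
    """
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The lookup failed: the name does not exist or no DNS server answered.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the IP address is the first item of sockaddr.
    return sorted({sockaddr[0] for *_, sockaddr in results})


if __name__ == "__main__":
    # "nonexistent.invalid" uses the reserved .invalid ending, so it should not resolve.
    for name in ("www.google.com.au", "nonexistent.invalid"):
        print(name, "->", resolve(name) or "no address found")
```

When a lookup fails, the sketch returns an empty list. That is roughly the position your browser is in when DNS is unavailable: it knows the name it wants but has no address to send the request to.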

With our DNS infrastructure going down, internet services couldn’t communicate with the webpages they were requesting.

It’s a bit like the postie not being able to deliver a letter because there’s no address on the envelope.

How did this happen?

When we kicked off an automated process to perform a routine check of our DNS infrastructure, an error in the code caused the process to time out before it could be completed.

This should not have happened.

How we fixed it

As soon as our team member realised what had happened, they reversed their previous steps and restored the system.

Post-outage actions

We have completed a review of the incident – including evaluating our procedures (the steps taken by our team which resulted in the error) and the automation system we use for DNS maintenance.

We’ve already found a few areas for improvement, and we will be implementing the recommended changes to make sure this doesn’t happen again.

Again, we apologise if you were affected by the outage. We hope this helps explain what happened.