Blake, dknoben and botanica,
We understand how important the API is to many of our users, and I am encouraged by how you've used the API to further automate workflows. The team has been investigating what went wrong, and why our monitoring did not pick up on the failure. Here is what I have found out from the team so far from their investigation:
The API is hosted on a cluster of servers, each with monitoring setup (which issue downtime alerts to our team) according to that server's availability.
What we have discovered so far is that only one server in the cluster had the API service go down during this period - so, fortunately, the outage did not affect all API users - but unfortunately it was the server your requests landed on and were sticky to. While the API service appears to have been down during this time on that server, the actual server was otherwise functioning properly and therefore was not automatically removed from the cluster's load balancer. We did find, however, that this one server had monitoring setup incorrectly, with the incorrect level of reporting (writing only to log files instead of sending pager alerts).
While we continue to investigate the root cause of why the API service went down on this server, we have resolved its monitoring configurations, and also double-checked all the monitoring configurations on other API servers to ensure this issue does not happen again in the future without our team being the first to know.
Thank you for your patience,
Paul
Paul Jackson
Founder & CEO