We understand how important the API is to many of our users. The team has been investigating what went wrong and why our monitoring did not pick up on the failure. Here is what our investigation has found so far:
The API is hosted on a cluster of servers, each with monitoring set up (which issues downtime alerts to our team) according to
What we have discovered so far is that the API service went down on only one server in the cluster during this period, so, fortunately, the outage did not affect all API users. While the API appears to have been down on that server during this time, the server itself was otherwise functioning properly and therefore was not automatically removed from the cluster's load balancer. We did find, however, that monitoring on this one server was set up incorrectly, with the wrong level of reporting (writing to log files instead of sending pager alerts).
While we continue to investigate the root cause of why the API service went down on this server, we have corrected its monitoring configuration and double-checked the monitoring configurations on all other API servers, so that if this issue happens again, our team will be the first to know.
Thank you for your patience,
Founder & CEO