This page monitors all of our critical services. If there is a disruption in service, incident notes will be posted here. Subscribing to updates is the best way to stay informed about any issues affecting our services' availability.
Affected Service:
Platform, Admin Panel, Public Status Page, Website
August 14, Fri
Resolved
The StatusKit server crashed or restarted in the early morning of May 30, 2020 and did not recover correctly.
On reboot, the Docker config and the nginx config did not load in the proper order. As a result, users of the service saw either the default nginx web server message or a browser failure-to-load message, as if the service were completely offline.
In January, when the SSL wildcard cert was changed, the process installed a default nginx web server config that was not previously on the production servers. The SSL certs were installed correctly, and the service functioned normally for almost 5 months with no reboots.
When the service crashed and restarted, the correct web service config was not loaded due to the error introduced in January.
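A failure of this kind can be caught by a check that runs after a reboot. The following is a minimal sketch, not what StatusKit runs: it assumes the stock nginx page still contains its usual "Welcome to nginx!" title, and the names `looks_like_default_nginx` and `check_site` are hypothetical.

```python
import urllib.request

# Marker text from the stock nginx welcome page (assumed unmodified).
DEFAULT_NGINX_MARKER = "Welcome to nginx!"


def looks_like_default_nginx(body: str) -> bool:
    """Return True if the response body looks like the stock nginx page."""
    return DEFAULT_NGINX_MARKER.lower() in body.lower()


def check_site(url: str, timeout: float = 5.0) -> str:
    """Fetch url and classify it as 'ok', 'default-nginx', or 'unreachable'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read only the first 64 KiB; the marker appears near the top.
            body = resp.read(65536).decode("utf-8", errors="replace")
    except OSError:
        # Covers connection failures, timeouts, and HTTP error responses.
        return "unreachable"
    return "default-nginx" if looks_like_default_nginx(body) else "ok"
```

Calling `check_site` against the public URL from a post-boot script or external monitor would separate the two symptoms described above: the default web server page versus a service that does not load at all.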
No customer data was impacted. No customer data was lost. Databases functioned normally throughout the incident.
We believe we have fixed the error and will test the fix again in a future maintenance window.
The IP address of the statuskit.com service changed as a result of the downtime. Please refresh your cache if your clients experience issues.
During this extended downtime on May 31, 2020, we expanded the disk space available to the application, performed log maintenance, and greatly expanded the size of the database to accommodate future growth.
Affected Service:
Platform, Admin Panel
April 23, Sun
Investigating
StatusKit is currently investigating a database connection issue that is causing timeouts and 500 server errors for users of the Admin Panel and Platform. StatusKit pages are up and running, and end users should not experience problems.
We have rebooted our RDS instances in AWS and
Event logs show this issue starting late on 4/20/2017 and continuing through the present, 4/22/2017.
Please report any issues to hello@statuskit.com.
Thank you for your patience.
18:08, Apr 22 UTC
Update
A fix has been implemented and is in testing. We're monitoring the situation and event logs before further updating status. Thank you.
20:23, Apr 22 UTC
Resolved
The issue, which we believe was caused by an internal DNS timeout in our Amazon cloud when accessing the RDS database, appears to be behind us.
At the time of this update, this exception had not been seen for 8 hours.
https://www.screencast.com/t/NIqlAZqoIf
RDS loads were not out of the ordinary, leading us to conclude that DNS lookups internal to our AWS environment were at fault in this intermittent issue. We made the changes mentioned above on 4/22 and have configuration changes ready to relax timeout thresholds should performance degrade in the future.
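For readers facing similar intermittent DNS timeouts, one common mitigation is to retry the lookup with a short backoff before giving up. This is a minimal sketch of that general technique, not the change StatusKit made; the function name `resolve_with_retry`, the host name in the usage note, and the default attempt count and delay are all hypothetical.

```python
import socket
import time


def resolve_with_retry(host, port=None, attempts=3, delay=0.5,
                       resolver=socket.getaddrinfo):
    """Resolve host, retrying on DNS failure.

    Returns (sorted list of IP addresses, number of attempts used).
    Re-raises the last socket.gaierror if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            infos = resolver(host, port)
            # Each entry ends with a sockaddr tuple whose first field is the IP.
            return sorted({info[4][0] for info in infos}), attempt
        except socket.gaierror as exc:
            last_error = exc
            if attempt < attempts:
                # Linear backoff: wait longer after each failed attempt.
                time.sleep(delay * attempt)
    raise last_error
```

An application could call `resolve_with_retry("db.internal.example")` before opening its database connection and log the number of attempts used, which gives a rough signal of how often internal lookups are failing.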
We are tagging this as RESOLVED and will continue to monitor. Thank you!
*** PS: We regret any performance issues with our service, but as we type this update we remind ourselves that this is why we exist: to be there for you when similar issues occur!
Editing an incident currently results in an internal server error, so incident data cannot be updated. We’re working to resolve this issue and hope to have it all fixed soon. Thanks so much for your patience and understanding, and sorry for any inconvenience.
15:04, Sep 22 UTC
Fixing
We’re working to resolve this issue now and hope to have it all fixed soon.
15:10, Sep 22 UTC
Fixing
A fix has been implemented and deployed. We're monitoring the situation to make sure everything runs smoothly.