Following the Platform Maintenance roll-out we started to experience slow loading times and, in some cases, timeout issues. From Monday May 23th it became clear that the impact was higher than initially perceived and overall platform performance got worse. A degraded performance incident was created and an investigation to identify the cause was started. AskCody Status - Degraded performance
Better monitoring was established specifically for this incident and the following investigation found that connection time to the database was causing problems for requests within the platform. This resulted in the long response times and some individual requests "hanging" for minutes, resulting in timeouts. Tuesday May 24th in the afternoon we scaled our solution to handle the issue concerning connection times to the database. The data bandwidth was sufficient but there was a bottleneck in actually opening new connections to the database. When this was solved the average response time dropped and platform performance was fully restored.
Following the implementation of the solution, the incident status was changed to "monitoring" and the following day it was fully closed. The Performance readout suggests that not only is overall platform performance "back to normal" but exceeds what it was before the Platform Maintenance. We sincerely regret the inconvenience this has had and continue to monitor and maintain the platform to make sure the risk of further incidents is kept to a minimum.