Issues resolved: Interruptions to our services
We’re grateful to see that our services are being so well received and used by our customers. We are seeing a steady increase in activity and we are doing our best to scale our systems so that they can handle the upcoming load peaks without problems.
Unfortunately, this could not be 100% guaranteed in our Public Cloud environment over the last few days.
On Monday, we were confronted with a previously unknown problem that resulted in functional restrictions for some users (e.g. board copying and board export functions were temporarily unavailable). When analyzing the problem, we found that two separate load-related developments had coincided and affected a core component of the application, and that data storage was also partly affected. Because of this, we had to carry out unannounced maintenance at short notice. The service was therefore unavailable on Tuesday morning between 2:14am and 4:06am (CEST).
After the maintenance, the system was fully functional again, but the measures needed to prevent the problem from recurring had not yet been taken. These were in preparation and planned for release on Wednesday morning, that is, today. Then, early this morning, before that release could take place, the cause of the problem reappeared, so a second unannounced maintenance window at short notice was necessary. The service was therefore not available between 9:46am and 11:44am (CEST) today.
During the second maintenance window, we were able to implement measures to prevent the problem from recurring, so we do not expect any further interruptions of this type.
Beyond the total of almost four hours of service unavailability, it is particularly serious that a handful of customers lost recently added board content from Monday and, in one case, from the weekend. We are already in contact with these users. I would like to sincerely apologize for this loss and for the inconvenience it caused. I am also very sorry for the unavailability caused by maintenance carried out at such short notice. These occurrences absolutely do not meet our standards and goals, and we will work hard to prevent them in the future.
In the coming days and weeks, we will continue to investigate these events and implement improvements to the system and to our processes, so that we can identify similar problems earlier and rule them out in the long term. Further information will follow as part of our regular release communication.
For updates on our service status, and news on current events, follow our Twitter feed. For the current live status of the service, as well as for our availability track record, please visit our status page. For any further questions and feedback, feel free to reach out to us at firstname.lastname@example.org.
Again, I thank you for your patience.