Renkulab down
Incident Report for renkulab
Postmortem

On November 4th, 2021, at around 10:00, Renkulab suffered an outage that lasted about 20 minutes. During that time Renkulab was not accessible, and all user sessions that were running at the time of the incident were lost.

The outage was caused by an automated process that ran when some changes were merged into the repository where the Renku team keeps the configuration for all the clusters we manage. The automated process deleted the Renku deployment (for reasons that are still being investigated), rendering the service inaccessible.

The team intervened immediately to restore Renkulab and revert the rogue code, and at around 10:20 Renkulab was available again.

Precautions to prevent rogue code from disrupting our production cluster have already been taken.

The Renku team apologizes for any inconvenience experienced by our users.

Posted Nov 04, 2021 - 17:57 CET

Resolved
This incident has been resolved.
User sessions were shut down due to this incident; however, there should be an auto-save branch containing any uncommitted or unpushed changes.
Posted Nov 04, 2021 - 10:42 CET
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Nov 04, 2021 - 10:31 CET
Identified
The issue has been identified and a fix is being implemented.
Posted Nov 04, 2021 - 10:17 CET
This incident affected: Renkulab web UI, Knowledge Graph, GitLab, Renkulab sessions, and Loud.