Edu-ID authentication issues

Incident Report for renkulab

Postmortem

Incident Postmortem: SAML Edu-ID Login Outage

Incident Summary

Field | Details
Date | March 11, 2026 – March 18, 2026
Duration | 7 days
Severity | High (complete SAML Edu-ID login unavailability)
Affected System | Keycloak SAML Edu-ID authentication (Renku realm), Renkulab 2.15.0
Status | Resolved

Timeline

Date | Event
Feb 11, 2026 | Renkulab 2.14.0 deployed, which included Keycloak 21. Keycloak 21 introduced a breaking change stopping decryption of SAML assertions encrypted with realm signing keys; the -Dkeycloak.saml.deprecated.encryption=true workaround flag was applied to maintain compatibility.
Mar 11, 2026 | Renkulab 2.15.0 deployed, which included Keycloak 25. Keycloak 25 removed the -Dkeycloak.saml.deprecated.encryption=true flag entirely, eliminating the workaround. SAML Edu-ID login outage begins.
Mar 11, 2026 | Workaround implemented using OIDC Edu-ID. This resolved the issue for Edu-ID users without a separate academic profile.
Mar 11–13, 2026 | Investigation began. Root cause not immediately identified. Keycloak Identity Provider for SAML Edu-ID was recreated in an attempt to resolve the issue, introducing additional risk.
Mar 13–17, 2026 | Root cause identified: the removal of the deprecated encryption flag in Keycloak 25 meant assertions encrypted with the realm signing key could no longer be decrypted.
Mar 17–18, 2026 | New encryption certificate created and added to the Renku realm as a dedicated encrypting certificate. Certificate registered in the Switch Edu-ID portal.
Mar 18, 2026 | SAML Edu-ID login restored. Incident resolved.

Root Cause Analysis

Starting with Keycloak version 21 (introduced in Renkulab 2.14.0, deployed Feb 11, 2026), Keycloak stopped decrypting SAML assertions that were encrypted using a realm key originally generated for signing purposes. This is a documented breaking change described in the Keycloak upgrade guide (SAML SP Metadata Changes).

A temporary mitigation had been applied at the time: the JVM flag -Dkeycloak.saml.deprecated.encryption=true was passed to restore the legacy behaviour. However, this flag was removed entirely in Keycloak 25, which was bundled with Renkulab 2.15.0 (deployed Mar 11, 2026). This eliminated the workaround and triggered the outage.

Resolution

The issue was resolved by the following steps:

  1. Generated a new dedicated encryption certificate.
  2. Added the new certificate to the Renku realm in Keycloak, specifically designated as an encrypting certificate (distinct from the existing signing certificate).
  3. Registered the new certificate in the Switch Edu-ID portal as an accepted certificate for the service provider.
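
One way to confirm that step 2 took effect is to check that the service provider metadata now advertises a key for encryption and not only for signing. The following is a minimal sketch, assuming the SP metadata XML has already been obtained by some means; the function names and the sample document are hypothetical and are not taken from the Renku deployment.

```python
import xml.etree.ElementTree as ET

# Namespace of SAML 2.0 metadata documents.
MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


def key_uses(metadata_xml: str) -> set[str]:
    """Return the key purposes declared by KeyDescriptor elements."""
    root = ET.fromstring(metadata_xml)
    uses: set[str] = set()
    for descriptor in root.iter(f"{{{MD_NS}}}KeyDescriptor"):
        use = descriptor.get("use")
        if use is None:
            # A KeyDescriptor without a `use` attribute covers both purposes.
            uses.update({"signing", "encryption"})
        else:
            uses.add(use)
    return uses


def has_encryption_key(metadata_xml: str) -> bool:
    """True if the metadata declares at least one key usable for encryption."""
    return "encryption" in key_uses(metadata_xml)


# Hypothetical metadata with a signing key only (certificate data omitted).
SAMPLE = """<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" entityID="https://example.org/sp">
  <md:SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:KeyDescriptor use="signing"/>
  </md:SPSSODescriptor>
</md:EntityDescriptor>"""

if __name__ == "__main__":
    print(has_encryption_key(SAMPLE))
```

For the sample above the check returns False, because only a signing key is declared. A check of this kind could run after any change to realm keys, before the metadata is registered with the identity provider.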

Contributing Factors & Mistakes

Delayed Root Cause Identification

The root cause was not identified immediately. As a result, the team attempted to recreate the Keycloak Identity Provider for SAML Edu-ID, which is a fragile and complex operation with many potential pitfalls.

XML Metadata Incompatibility

An incompatibility exists between the XML metadata exported by Switch AAI and the format expected by Keycloak, making it impossible to directly import the metadata. This significantly complicated any attempt to reconfigure the Identity Provider from scratch.

Database Recovery Risk

Recovering the SAML configuration from a database backup was considered but assessed as risky, as it could have introduced configuration drift between user records across different databases. The backup was from 10 hours before the update was applied.

Lack of Test Coverage for Edu-ID SAML

The Edu-ID SAML login method was not configured in the development cluster, meaning the breaking change introduced in Keycloak 21 (and the subsequent removal of the workaround flag in Keycloak 25) was not caught during pre-production testing of the Renkulab 2.15.0 upgrade.

Lessons Learned

The Keycloak configuration for Edu-ID SAML is both fragile and critical. A misconfigured option can silently break the login method in ways that are difficult to debug, and recovery options are limited and risky.

Key takeaways:

  • Keycloak SAML configuration should be treated as precious infrastructure. Manual changes carry a high risk of causing an outage.
  • Breaking changes in Keycloak upgrade notes must be reviewed carefully before each upgrade, especially around SAML and authentication flows.
  • Deprecated workaround flags (such as -Dkeycloak.saml.deprecated.encryption=true) should be tracked as technical debt with a clear remediation plan, tied to the specific Keycloak version in which they are scheduled for removal.
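
The last takeaway could be made concrete with a small registry of active workarounds that is checked before each upgrade. This is only a sketch under stated assumptions: the `Workaround` structure, the `WORKAROUNDS` list, and the `blocking_workarounds` helper are hypothetical names introduced for illustration, and the version numbers are the ones given in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workaround:
    flag: str           # JVM flag or option that implements the workaround
    applied_in: int     # Keycloak major version where it was first applied
    removed_in: int     # Keycloak major version where it stops working
    remediation: str    # The permanent fix that replaces the workaround


# Hypothetical registry; the single entry reflects the flag from this incident.
WORKAROUNDS = [
    Workaround(
        flag="-Dkeycloak.saml.deprecated.encryption=true",
        applied_in=21,
        removed_in=25,
        remediation=(
            "Add a dedicated encryption certificate to the realm and "
            "register it in the Switch Edu-ID portal"
        ),
    ),
]


def blocking_workarounds(target_major: int, registry=WORKAROUNDS):
    """Return workarounds that no longer function at the target version."""
    return [w for w in registry if target_major >= w.removed_in]


if __name__ == "__main__":
    for w in blocking_workarounds(25):
        print(f"BLOCKED by {w.flag}: {w.remediation}")
```

With this registry, an upgrade to Keycloak 24 reports nothing, while an upgrade to Keycloak 25 reports the SAML encryption flag as a blocker together with its remediation.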

Action Items

Action | Priority | Status
Investigate and implement Keycloak-as-code tooling (e.g. Terraform provider or realm export/import automation) to capture the Edu-ID SAML configuration in version-controlled code. | High | Open
Configure Edu-ID SAML in the development cluster so that future Keycloak upgrades can be tested end-to-end before being applied to production. | High | Open
Document the Edu-ID SAML configuration (certificate management, Switch Edu-ID portal steps, Keycloak realm settings) as an operational runbook. | Medium | Open
Establish a pre-upgrade checklist that includes reviewing Keycloak release notes for SAML and authentication-related breaking changes, and taking a Postgres backup. | Medium | Open
Posted Mar 20, 2026 - 09:56 CET

Resolved

This incident has been resolved.
Posted Mar 18, 2026 - 11:51 CET

Update

Edu-ID users with both personal and private profiles might encounter problems signing in. We are still trying to resolve this. Please contact us for help signing in.
Posted Mar 17, 2026 - 12:17 CET

Monitoring

We have identified the issue and implemented a fix. If you still encounter problems, please contact us.
Posted Mar 11, 2026 - 11:18 CET

Investigating

The Edu-ID IdP integration is currently experiencing issues. We are investigating.
Posted Mar 11, 2026 - 09:05 CET
This incident affected: Authentication.