ProctorExam Status Notification
Incident Report for ProctorExam
Postmortem

WebRTC infrastructure failure: Test takers not able to start sessions

  • On 7 October 2023, at 14:17 CEST, we were made aware of issues with our WebRTC infrastructure: test takers on various environments were unable to start their exams. Unfortunately, no engineers were immediately available to investigate the issue.
  • At 16:15, engineers began investigating and found that the signaling controller, which starts the WebRTC media servers needed for each session, could no longer communicate with our infrastructure server because of an authentication failure.
  • At 16:51 the issue was resolved by restarting the signaling infrastructure.

Impact: all customers on EU environments were affected, as their sessions were not able to start during the incident period.

Lead-up: A failure in the infrastructure configuration, caused by an expired Kubernetes API authentication token.

Resolution: Restarting servers refreshed the authentication token.

Prevention measures: We will create an alert for Kubernetes API server authentication errors so that we can react more quickly to such incidents in the future. Longer-term remedy: auto-renewal of the token.
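
As an illustration of the kind of check that could support these measures, here is a minimal sketch that reads the expiry time from a token and flags it for renewal before it lapses. It assumes the token is a JWT carrying a standard `exp` claim, which this report does not confirm; the function names and the one-hour threshold are hypothetical and do not describe our actual tooling.

```python
import base64
import json
import time
from typing import Optional


def token_seconds_remaining(token: str, now: Optional[float] = None) -> float:
    """Return seconds until the token's `exp` claim; negative if already expired.

    Assumption: the token is a three-part JWT (header.payload.signature).
    The signature is NOT verified here; this only inspects the expiry claim.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT")
    payload_b64 = parts[1]
    # JWT segments are base64url-encoded without padding; restore the padding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    if "exp" not in claims:
        raise ValueError("token has no exp claim")
    current = time.time() if now is None else now
    return claims["exp"] - current


def needs_renewal(token: str, threshold_seconds: float = 3600,
                  now: Optional[float] = None) -> bool:
    """True when the token expires within the (hypothetical) threshold."""
    return token_seconds_remaining(token, now) < threshold_seconds
```

A periodic job could call `needs_renewal` on the token in use and either raise an alert or trigger a refresh when it returns `True`, rather than waiting for authentication errors to appear.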

We apologise for the great inconvenience this has caused for some of you.

Posted Oct 16, 2023 - 16:22 UTC

Resolved
At 14:17 CEST, we were made aware of issues with our WebRTC infrastructure: test takers on various environments were unable to start their exams.

At 16:51 the issue was resolved by restarting the signaling infrastructure.
Impact: all customers on EU environments were affected, as their sessions were not able to start during the incident period.
Posted Oct 07, 2023 - 12:00 UTC