Live Proctoring incident experienced on CA region
Incident Report for ProctorExam
Postmortem

Executive summary (All times in UTC)

  • On November 26th, 2022, a live proctoring exam on the ca-central-1 instance of Proctorroulette (the service that links test takers to live proctors) pushed DynamoDB write usage above the provisioned write capacity of 1
  • Because autoscaling was not enabled for the provisioned write capacity, DynamoDB started throttling requests
  • Impact (CA region only): Because of the throttling, students could not be assigned to proctors, so proctors couldn’t see the students
  • Lead-up: No changes had been made to this region in almost 2 years, and previous exams ran without issue. The trigger for the much higher than usual load was the simultaneous disconnection and reconnection of all proctors (who shared the same internet connection), which caused a self-aggravating loop of timeouts and retries.
  • Resolution: We have switched our DynamoDB tables to on-demand mode, so that they can scale to meet demand. We have also changed the configuration of Proctorroulette to allow more time for each request to be handled, so that throttled requests have more time to be processed.
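
The loop of timeouts and retries described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Proctorroulette's actual code: the function name `simulate`, the tick-based clock, and all numbers are hypothetical. A simple queue stands in for writes waiting on a throttled table, and a request that its client has already given up on is assumed to still consume write capacity when it is finally handled.

```python
from collections import deque


def simulate(clients, capacity_per_tick, timeout_ticks, max_ticks=200):
    """Toy model of a retry storm against a fixed write capacity.

    Returns (ticks_elapsed, total_requests_sent, clients_served).
    All clients send one write at tick 0 (the simultaneous reconnect).
    """
    queue = deque()              # pending writes as (client, attempt)
    attempt = [0] * clients      # current attempt number per client
    sent_at = [0] * clients      # tick at which the current attempt was sent
    done = [False] * clients
    total_requests = 0

    for c in range(clients):
        queue.append((c, 0))
        total_requests += 1

    for tick in range(max_ticks):
        # A client whose current attempt has timed out sends a new request.
        # The abandoned request stays in the queue (assumption).
        for c in range(clients):
            if not done[c] and tick - sent_at[c] >= timeout_ticks:
                attempt[c] += 1
                sent_at[c] = tick
                queue.append((c, attempt[c]))
                total_requests += 1

        # The table handles at most capacity_per_tick writes, oldest first.
        for _ in range(capacity_per_tick):
            if not queue:
                break
            c, a = queue.popleft()
            if a == attempt[c]:  # only a non-abandoned request helps the client
                done[c] = True

        if all(done):
            return tick + 1, total_requests, clients

    return max_ticks, total_requests, sum(done)


# Short timeout: retries pile up behind abandoned requests.
print(simulate(clients=20, capacity_per_tick=1, timeout_ticks=5))
# Longer timeout: every client is served by its first request.
print(simulate(clients=20, capacity_per_tick=1, timeout_ticks=30))
```

In this model, a timeout shorter than the time needed to drain the initial burst means capacity is spent on requests nobody is waiting for, so most clients are never served. With a timeout longer than the drain time, the same capacity handles every client with one request each. This matches the reasoning behind allowing more time per request, while on-demand mode addresses the capacity limit itself.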
Posted Dec 01, 2022 - 11:37 UTC

Resolved
We experienced an issue with Live Proctoring in the CA region on Saturday, November 26th, between 13:20 UTC and 14:15 UTC.
Posted Nov 26, 2022 - 12:20 UTC