One of our servers that routes requests experienced a hardware failure, causing a small percentage of requests to fail for 8 minutes. We were alerted to the issue by our monitoring systems as it happened, and were able to fail over to a standby server. In response to this incident, we have already deployed improvements to our automated health checks so that we can recover within seconds vs. minutes if this happens again.