good
This incident has been resolved.
Oct 12, 1:11 AM UTC
major
We’re continuing to work towards recovery of code search service.
Oct 12, 12:46 AM UTC
major
We’ve identified the issue with code search and are working towards recovery of service.
Oct 12, 12:14 AM UTC
major
We’re continuing to investigate issues with code search.
Oct 11, 11:31 PM UTC
major
We’re continuing to investigate issues with code search. Copilot and Actions services are recovered and operating normally.
Oct 11, 10:57 PM UTC
major
Copilot is operating normally.
Oct 11, 10:16 PM UTC
major
We are rolling out a fix to address the network connectivity issues. Copilot is seeing recovery. support.github.com is recovered.
Oct 11, 10:14 PM UTC
major
Actions is operating normally.
Oct 11, 9:46 PM UTC
major
We continue to work on mitigations. Actions is starting to see recovery.
Oct 11, 9:28 PM UTC
major
The mitigation attempt did not resolve the issue and we are working on a different resolution path. In addition to the previously listed impacts, some Actions runs will see delays in starting.
Oct 11, 8:52 PM UTC
major
Actions is experiencing degraded performance. We are continuing to investigate.
Oct 11, 8:48 PM UTC
major
We continue to work on mitigations. In addition to the previously listed impact, code search is also unavailable.
Oct 11, 8:15 PM UTC
major
A mitigation for the network connectivity issues is being tested.
Oct 11, 8:05 PM UTC
major
We continue to work on mitigations to restore network connectivity. In addition to the previously listed impact, access to support.github.com is also impacted.
Oct 11, 7:28 PM UTC
major
We have identified the problem and are working on mitigations. In addition to the previously listed impact, new Artifact Attestations cannot be created.
Oct 11, 7:05 PM UTC
major
We have identified that the problem is related to maintenance performed on our networking infrastructure. We are working to restore connectivity.

Copilot users in organizations or enterprises that have opted into the Content Exclusions feature will experience disabled completions in their editors.

Customer migrations remain paused as well.
Oct 11, 6:41 PM UTC
major
We are investigating network connectivity issues. Some Copilot customers will see errors on API calls and experiences. We have also paused the remaining customer migration queue while we investigate due to an increase in errors.
Oct 11, 6:25 PM UTC
major
We are investigating reports of issues with service(s): Copilot. We will continue to keep users updated on progress towards mitigation.
Oct 11, 5:58 PM UTC
major
Copilot is experiencing degraded availability. We are continuing to investigate.
Oct 11, 5:56 PM UTC
major
We are currently investigating this issue.
Oct 11, 5:53 PM UTC
good
This incident has been resolved.
Oct 08, 11:32 PM UTC
minor
Codespaces is operating normally.
Oct 08, 11:32 PM UTC
minor
Codespace creation has been remediated in this region.
Oct 08, 11:32 PM UTC
minor
We are once again seeing signs of increased latency for codespace creation in this region, but are at the same time recovering previously unavailable resources.
Oct 08, 10:54 PM UTC
minor
Recovery continues slowly, and we are investigating strategies to speed up the recovery process.
Oct 08, 10:10 PM UTC
minor
We are continuing to see gradual recovery in the region and continue to validate the persistent fix.
Oct 08, 9:39 PM UTC
minor
The persistent fix has been applied, and we are beginning to see improvements in the region. We are still working through follow-on effects, however, and expect recovery to be gradual.
Oct 08, 9:06 PM UTC
minor
We are nearing full application of the persistent fix and will provide more updates soon.
Oct 08, 8:26 PM UTC
minor
Mitigations we have put in place are yielding improvements in Codespace creation success rates in the affected region. We expect full recovery once the persistent fix fully rolls out.
Oct 08, 7:51 PM UTC
minor
We are continuing to work on mitigations while the more persistent fix rolls out.
Oct 08, 7:17 PM UTC
minor
We are continuing to apply mitigations while we deploy the more persistent fix. Full recovery is expected in 2 hours or less, but more updates will be coming soon.
Oct 08, 6:44 PM UTC
minor
We have applied some mitigations that are improving creation success rates while we work on the more comprehensive fix.
Oct 08, 6:08 PM UTC
minor
We have identified a possible root cause and are working on the fix.
Oct 08, 5:43 PM UTC
minor
Some Codespaces are failing to create successfully in the Western EU region. Investigation is ongoing.
Oct 08, 5:11 PM UTC
minor
Codespaces is experiencing degraded performance. We are continuing to investigate.
Oct 08, 5:08 PM UTC
minor
We are currently investigating this issue.
Oct 08, 5:02 PM UTC
good
On September 30th, 2024 from 10:43 UTC to 11:26 UTC Codespaces customers in the Central India region were unable to create new Codespaces. Resumes were not impacted. Additionally, there was no impact to customers in other regions.

The cause was traced to storage capacity constraints in the region and was mitigated by temporarily redirecting create requests to other regions. Afterwards, additional storage capacity was added to the region and traffic was routed back.

A bug was also identified that caused some available capacity to not be utilized, artificially constraining capacity and halting creations in the region prematurely. We have since fixed this bug as well, so that available capacity scales as expected according to our capacity planning projections.
Sep 30, 11:26 AM UTC
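
A minimal sketch of the capacity-aware fallback described above, routing create requests away from an exhausted region; the region names, capacity figures, and function are hypothetical and not GitHub's implementation:

    # Hypothetical free-creation-slot counts per region (illustrative only).
    REGION_CAPACITY = {"central-india": 0, "southeast-asia": 120, "east-asia": 80}

    def pick_region(preferred: str) -> str:
        """Route a create request to the preferred region, falling back to the
        region with the most free capacity when the preferred one is exhausted."""
        if REGION_CAPACITY.get(preferred, 0) > 0:
            return preferred
        fallback = max(REGION_CAPACITY, key=REGION_CAPACITY.get)
        if REGION_CAPACITY[fallback] == 0:
            raise RuntimeError("no creation capacity available in any region")
        return fallback

    print(pick_region("central-india"))  # falls back to "southeast-asia"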
major
Codespaces is operating normally.
Sep 30, 11:26 AM UTC
major
We are seeing signs of recovery in Codespaces creations and starts. We are continuing to monitor for full recovery.
Sep 30, 11:25 AM UTC
major
Codespaces is experiencing degraded performance. We are continuing to investigate.
Sep 30, 11:24 AM UTC
major
We are investigating a high number of errors in Codespaces creation and start.
Sep 30, 11:09 AM UTC
major
We are investigating reports of degraded availability for Codespaces
Sep 30, 11:08 AM UTC
good
Between September 27, 2024, 15:26 UTC and September 27, 2024, 15:34 UTC the Repositories Releases service was degraded. During this time 9% of requests to list releases via the API or the webpage received a `500 Internal Server Error`. This was due to a bug in our software rollout strategy. The rollout was reverted starting at 15:30 UTC, which began to restore functionality. The rollback was completed at 15:34 UTC. We are continuing to improve our testing infrastructure to ensure that bugs such as this one can be detected before they make their way into production.
Oct 03, 5:37 PM UTC
good
Between September 25, 2024, 22:20 UTC and September 26, 2024, 5:00 UTC the Copilot service was degraded. During this time Copilot chat requests failed at an average rate of 15%.

This was due to a faulty deployment in a service provider that caused server errors from multiple regions. Traffic was routed away from those regions at 22:28 UTC and 23:39 UTC, which partially restored functionality, while the upstream service provider rolled back their change. The rollback was completed at 04:41 UTC.

We are continuing to improve our ability to respond more quickly to similar issues through faster regional redirection and working with our upstream provider on improved monitoring.

Sep 26, 5:08 AM UTC
minor
Monitors continue to see improvements. We are declaring full recovery.
Sep 26, 5:08 AM UTC
minor
Copilot is operating normally.
Sep 26, 5:03 AM UTC
minor
We've applied a mitigation to fix the issues and are seeing improvements in telemetry. We are monitoring for full recovery.
Sep 26, 3:51 AM UTC
minor
We believe we have identified the root cause of the issue and are monitoring to ensure the problem does not recur.
Sep 26, 2:34 AM UTC
minor
We are continuing to investigate the root cause of the previously observed latency to ensure there is no recurrence and to improve stability going forward.

Sep 26, 1:46 AM UTC
minor
We are continuing to investigate the root cause of the previously observed latency to ensure there is no recurrence and to improve stability going forward.
Sep 26, 1:03 AM UTC
minor
Copilot users should no longer see request failures. We are still investigating the root cause of the issue to ensure that the experience will remain uninterrupted.
Sep 26, 12:29 AM UTC
minor
We are seeing recovery for requests to Copilot API in affected regions, and are continuing to investigate to ensure the experience remains stable.
Sep 25, 11:55 PM UTC
minor
We have noticed a degradation in performance of Copilot API in some regions. This may result in latency or failed responses to requests to Copilot. We are investigating mitigation options.

Sep 25, 11:40 PM UTC
minor
We are investigating reports of degraded performance for Copilot
Sep 25, 11:39 PM UTC
good
On September 25th, 2024 from 18:32 UTC to 19:13 UTC, the Actions service experienced a degradation during a production deployment, leading to actions failing to be downloaded at the start of a job. On average, 21% of Actions workflow runs failed to start during the course of the incident. The issue was traced back to a bug in an internal service responsible for generating the URLs used by the Actions runner to download actions.

To mitigate the impact, we rolled back the offending deployment. We are implementing new monitors to improve our detection and response time for this class of issues in the future.
Sep 25, 7:19 PM UTC
minor
We're seeing issues related to Actions runs failing to download actions at the start of a job. We're investigating the cause and working on mitigations for customers impacted by this issue.
Sep 25, 7:14 PM UTC
minor
We are investigating reports of degraded performance for Actions and Pages
Sep 25, 7:11 PM UTC
good
On September 25, 2024 from 14:31 UTC to 15:06 UTC the Git Operations service experienced a degradation, leading to 1,381,993 failed git operations. The overall error rate during this period was 4.2%, with a peak error rate of 12.5%.

The root cause was traced to a bug in a build script for a component that runs on the file servers that host git repository data. The build script incurred an error that did not cause the overall build process to fail, resulting in a faulty set of artifacts being deployed to production.

To mitigate the impact, we rolled back the offending deployment.

To prevent further occurrences of this cause in the future, we will be addressing the underlying cause of the ignored build failure and improving metrics and alerting for the resulting production failure scenarios.
Sep 25, 4:03 PM UTC
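
To illustrate the failure mode above in general terms: a build step can error without failing the overall build when its exit status is ignored. Below is a minimal sketch, with hypothetical commands rather than the actual build system, of propagating step failures so faulty artifacts are never published:

    import subprocess
    import sys

    def run_step(cmd: list[str]) -> None:
        # check=True turns a non-zero exit status into an exception instead of
        # letting a failed step pass silently and produce "successful" artifacts.
        subprocess.run(cmd, check=True)

    def build() -> None:
        try:
            run_step(["make", "component"])   # hypothetical compile step
            run_step(["make", "package"])     # hypothetical packaging step
        except subprocess.CalledProcessError as err:
            # Fail the whole build loudly rather than deploying faulty artifacts.
            sys.exit(f"build step failed: {err}")

    if __name__ == "__main__":
        build()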
minor
We are investigating reports of issues with both Actions and Packages, related to a brief period of time where specific Git Operations were failing. We will continue to keep users updated on progress towards mitigation.
Sep 25, 3:34 PM UTC
minor
We are investigating reports of degraded performance for Git Operations
Sep 25, 3:25 PM UTC
good
On September 24th, 2024 from 08:20 UTC to 09:04 UTC the Codespaces service experienced an interruption in network connectivity, leaving 175 codespaces unable to be created or resumed. The overall error rate during this period was 25%.

The cause was traced to an interruption in network connectivity caused by SNAT port exhaustion following a deployment, causing individual Codespaces to lose their connection to the service.

To mitigate the impact, we increased port allocations to give enough buffer for increased outbound connections shortly after deployments, and will be scaling up our outbound connectivity in the near future, as well as adding improved monitoring of network capacity to prevent future regressions.
Sep 24, 9:04 PM UTC
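
A back-of-the-envelope sketch of why an outbound connection burst right after a deployment can exhaust SNAT ports; the per-instance port allocation and instance count are assumed figures, not GitHub's actual configuration:

    PORTS_PER_INSTANCE = 1024   # assumed SNAT ports allocated per instance
    INSTANCES = 10              # assumed instance count behind the same outbound IP

    def snat_utilization(concurrent_outbound: int) -> float:
        """Fraction of the shared SNAT port pool held by concurrent outbound connections."""
        return concurrent_outbound / (PORTS_PER_INSTANCE * INSTANCES)

    # Reconnect bursts shortly after a deployment can briefly exceed the pool,
    # at which point new outbound connections fail until ports are released.
    for burst in (4_000, 9_000, 12_000):
        util = snat_utilization(burst)
        print(f"{burst:>6} connections -> {util:.0%} of SNAT pool"
              + (" (exhausted)" if util > 1 else ""))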
major
Codespaces is operating normally.
Sep 24, 9:04 PM UTC
major
We have successfully mitigated the issue affecting create and resume requests for Codespaces. Early signs of recovery are being observed in the impacted region.
Sep 24, 9:01 PM UTC
major
Codespaces is experiencing degraded performance. We are continuing to investigate.
Sep 24, 9:00 PM UTC
major
We are investigating issues with Codespaces in the US East geographic area. Some users may not be able to create or start their Codespaces at this time. We will update you on mitigation progress.
Sep 24, 8:56 PM UTC
major
We are investigating reports of degraded availability for Codespaces
Sep 24, 8:54 PM UTC
good
On September 16, 2024, between 21:11 UTC and 22:20 UTC, Actions and Pages services were degraded. Customers who deploy Pages from a source branch experienced delayed runs. Approximately 1,100 runs were delayed long enough to get marked as abandoned. The runs that weren't abandoned completed successfully after we recovered from the incident. Actions jobs experienced average delays of 23 minutes, with some jobs experiencing delays as high as 45 minutes. During the course of the incident, 17% of runs were delayed by more than 5 minutes. At peak, as many as 80% of runs experienced delays exceeding 5 minutes. The root cause was a misconfiguration in the service that manages runner connections, which caused CPU throttling and led to a performance degradation in that service.

We mitigated the incident by diverting runner connections away from the misconfigured nodes. We are working to improve our internal monitoring and alerting to reduce our time to detection and mitigation of issues like this one in the future.
Sep 16, 10:08 PM UTC
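
One hedged way to surface the kind of CPU throttling described above is to watch the cgroup v2 throttling counters on each node; the path and alert threshold below are illustrative assumptions, not GitHub's monitoring setup:

    from pathlib import Path

    def throttle_ratio(cpu_stat: str = "/sys/fs/cgroup/cpu.stat") -> float:
        """Fraction of CFS enforcement periods in which this cgroup was throttled."""
        stats = {}
        for line in Path(cpu_stat).read_text().splitlines():
            key, value = line.split()
            stats[key] = int(value)
        periods = stats.get("nr_periods", 0)
        return stats.get("nr_throttled", 0) / periods if periods else 0.0

    if __name__ == "__main__":
        ratio = throttle_ratio()
        print(f"throttled in {ratio:.1%} of periods")
        if ratio > 0.25:  # assumed alert threshold
            print("warning: sustained CPU throttling; check CPU limits on this service")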
major
Actions is experiencing degraded performance. We are continuing to investigate.
Sep 16, 9:55 PM UTC
major
The team is investigating issues with some Actions jobs being queued for a long time and a percentage of jobs failing. A mitigation has been applied and jobs are starting to recover.
Sep 16, 9:53 PM UTC
major
Pages is operating normally.
Sep 16, 9:52 PM UTC
major
Actions is experiencing degraded availability. We are continuing to investigate.
Sep 16, 9:37 PM UTC
major
We are investigating reports of degraded performance for Actions and Pages
Sep 16, 9:31 PM UTC
good
On September 16, 2024, between 13:24 UTC and 14:28 UTC, the Git Operations service experienced a degradation, leading to intermittent SSH connection drops. The overall SSH error rate during this period was 0.0005%, with a peak error rate of 0.3%. The root cause was traced to a regression in the service reload mechanism, which resulted in SSH hosts dropping connections on an hourly basis. As SSH hosts were rebooted for routine security updates, the issue progressively affected more hosts. To mitigate the impact, we removed the affected hosts from production traffic. The SSH regression has since been identified and resolved, with all SSH hosts fully restored. Additionally, we have implemented new monitoring to alert us of any SSH connection refusals moving forward.
Sep 16, 2:28 PM UTC
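
A minimal sketch of the kind of SSH reachability probe such monitoring could be based on; the probed host list and timeout are placeholders, not the actual alerting configuration:

    import socket

    def ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
        """Open a TCP connection and read the SSH banner; raises OSError on refusal or timeout."""
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return sock.recv(256).decode(errors="replace").strip()

    for host in ("github.com",):  # in practice, probe every SSH frontend host
        try:
            print(host, ssh_banner(host))
        except OSError as err:
            print(host, "UNREACHABLE:", err)  # repeated refusals here would page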
minor
We are no longer seeing dropped Git SSH connections and believe we have mitigated the incident. We are continuing to monitor and investigate to prevent reoccurrence.
Sep 16, 2:27 PM UTC
minor
We have taken suspected hosts out of rotation and have not seen any impact in the last 20 minutes. We are continuing to monitor to ensure the problem is resolved and are investigating the cause.
Sep 16, 2:11 PM UTC
minor
We are seeing up to 2% of Git SSH connections failing.

We have taken suspected problematic hosts out of rotation and are monitoring for recovery and continuing to investigate.
Sep 16, 1:38 PM UTC
minor
We are investigating failed connections for Git SSH. Customers may be experiencing failed SSH connections both in CI and interactively. Retrying the connection may be successful. Git HTTP connections appear to be unaffected.
Sep 16, 1:30 PM UTC
minor
We are currently investigating this issue.
Sep 16, 1:29 PM UTC
good
On September 14th, 2024 from 20:45 UTC to 22:31 UTC commit creation operations, most commonly Pull Request merges, failed for some repositories. 226 repositories were impacted.

The root cause was a hardware fault in a Git file server, where merge commits are calculated. To mitigate the issue we marked the file server as offline.

Detection was slower than typical because of lower weekend traffic. We're making improvements to monitoring to decrease time to detection in the future.
Sep 14, 10:43 PM UTC
minor
Pull Requests is operating normally.
Sep 14, 10:43 PM UTC
minor
We believe we have mitigated the issue and are confirming recovery.
Sep 14, 10:41 PM UTC
minor
We are investigating reports of degraded performance for Pull Requests
Sep 14, 10:10 PM UTC
good
On Sep 13, 2024, between 05:03 UTC and 07:13 UTC, the Webhooks and Actions services were degraded resulting in some customers experiencing delayed processing of Webhooks and Actions Runs. 0.5% of Webhook deliveries were delayed more than 2 minutes during the incident. 15% of Actions Runs started between 05:03 and 05:24 UTC saw run start delays or failures. At 05:24 UTC, we implemented a mitigation to shift traffic to healthy infrastructure and new Actions Runs resumed normal operations. During the rest of the incident window, Actions runs started before 05:24 UTC continued to see delays publishing logs or job results. No Actions runs or Webhook deliveries were lost, only delayed.

We mitigated the incident by immediately shifting traffic to a healthy cluster while investigating. The incident was caused by an erroneous configuration change on our eventing platform. A permanent fix was deployed at 06:22 UTC after which services began to recover and burn down their backed up queues, with full recovery by 07:13 UTC.

We are working to reduce our time to detection and develop test automation to prevent issues like this one in the future.

Sep 13, 7:13 AM UTC
minor
We are seeing improvements in telemetry and are monitoring the delivery of delayed Webhooks and Actions job statuses.
Sep 13, 6:49 AM UTC
minor
We've applied a mitigation to fix the issues being experienced in some cases with delays to webhook deliveries, and the delayed reporting of the outcome of some running Actions jobs. We are monitoring for full recovery.
Sep 13, 6:23 AM UTC
minor
Actions is experiencing degraded performance. We are continuing to investigate.
Sep 13, 5:59 AM UTC
minor
We are investigating reports of degraded performance for Issues, Pull Requests and Webhooks
Sep 13, 5:42 AM UTC
good
Between August 27, 2024, 15:52 UTC and September 5, 2024, 17:26 UTC the GitHub Connect service was degraded. This specifically impacted GHES customers who were enabling GitHub Connect for the first time on a GHES instance. Previously enabled GitHub Connect GHES instances were not impacted by this issue.

Customers experiencing this issue would have received a 404 response during GitHub Connect enablement and subsequent messages about a failure to connect. This was due to a recent change in configuration to GitHub Connect which has since been rolled back.

Subsequent enablement failures on re-attempts were caused by data corruption which has been remediated. Customers should now be able to enable GitHub Connect successfully.

To reduce our time to detection and mitigation of such issues in the future, we are working to improve observability of GitHub Connect failures. We are also making efforts to prevent future misconfiguration of GitHub Connect.
Sep 05, 5:24 PM UTC
minor
We continue to work on making GitHub Connect setup available for customers that experienced errors since Sept 2. We will share another update as we make progress.
Sep 05, 4:49 PM UTC
minor
We've reverted a related change and customers can now set up GitHub Connect again. However, customers who attempted to set up GitHub Connect after September 2nd and saw a 404 will see continued failures until we complete an additional repair step. We will provide additional updates as this work progresses.
Sep 05, 4:15 PM UTC
minor
Enterprise administrators are seeing failures in the form of 404s when attempting to set up GitHub Connect for the first time. Existing Connect setups are not impacted. We are working on a fix for this issue.
Sep 05, 3:56 PM UTC
minor
We are currently investigating this issue.
Sep 05, 3:55 PM UTC
good
On August 29th, 2024, from 16:56 UTC to 21:42 UTC, we observed an elevated rate of traffic on our public edge, which triggered GitHub’s rate limiting protections. This resulted in <0.1% of users being identified as false-positives, which they experienced as intermittent connection timeouts. At 20:59 UTC the engineering team improved the system to remediate the false-positive identification of user traffic, and return to normal traffic operations.
Aug 29, 9:54 PM UTC
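
As a hedged illustration of how edge rate limiting can flag legitimate users as false positives, the sketch below uses a fixed-window limiter keyed only by client IP; the window, limit, and keying are assumptions, not GitHub's actual protections:

    import time
    from collections import defaultdict

    WINDOW_SECONDS = 60
    LIMIT = 100  # assumed requests allowed per key per window

    _counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(client_ip: str, now: float | None = None) -> bool:
        """Fixed-window limiter keyed by client IP only."""
        now = time.time() if now is None else now
        window = int(now // WINDOW_SECONDS)
        _counts[(client_ip, window)] += 1
        return _counts[(client_ip, window)] <= LIMIT

    # 150 distinct users behind one corporate NAT share a single egress IP,
    # so their combined traffic trips the limit and 50 requests are rejected.
    rejected = sum(not allow("203.0.113.7") for _ in range(150))
    print(f"{rejected} legitimate requests rejected as false positives")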
minor
The connectivity issues have been resolved and we are back to normal.
Aug 29, 9:54 PM UTC
minor
We have implemented a potential fix and are continuing to monitor for success.
Aug 29, 9:45 PM UTC
minor
We have isolated the symptom of the connectivity issues and are working to track down the cause.
Aug 29, 9:25 PM UTC
minor
While we have seen a reduction in reports of users having connectivity issues to GitHub.com, we are still investigating the issue.
Aug 29, 8:43 PM UTC
minor
We are continuing to investigate customer reports of temporary issues accessing GitHub.com
Aug 29, 8:07 PM UTC
minor
We are getting reports of users who aren't able to access GitHub.com and are investigating.
Aug 29, 7:29 PM UTC
minor
We are currently investigating this issue.
Aug 29, 7:29 PM UTC
good
On August 28, 2024, from 21:40 to 23:43 UTC, up to 25% of unauthenticated dotcom traffic in SE Asia (representing <1% of global traffic) encountered HTTP 500 errors. We observed elevated error rates at one of our global points of presence, where geo-DNS health checks were failing. We identified unhealthy cloud hardware in the region, indicated by abnormal CPU utilization patterns. As a result, we drained the site at 23:26 UTC, which promptly restored normal traffic operations.
Aug 28, 11:43 PM UTC
minor
Our mitigation has taken effect and impact is resolved.
Aug 28, 11:43 PM UTC
minor
Our mitigation is in place and we are seeing a reduction in overall impact. We are continuing to monitor until we are confident that the issue has been resolved.
Aug 28, 11:38 PM UTC
minor
We have identified a potential mitigation and are currently testing.
Aug 28, 11:33 PM UTC
minor
We are seeing reduced overall impact, but continue to see a small impact to unauthenticated requests. We are continuing to investigate.
Aug 28, 11:14 PM UTC
minor
We are continuing to see customer impact and are still investigating.
Aug 28, 10:53 PM UTC
minor
We are continuing to investigate.
Aug 28, 10:37 PM UTC
minor
We are seeing cases of user impact in some locations and are continuing to investigate.
Aug 28, 10:19 PM UTC
minor
We are seeing elevated traffic in some of our regions and are in the process of investigating.
Aug 28, 10:06 PM UTC
minor
We are currently investigating this issue.
Aug 28, 10:02 PM UTC
good
On August 28th, 2024, starting at 20:43 UTC, some customers accessing GitHub from North America experienced degraded access to GitHub services. The error was intermittent and manifested as timeouts when requests tried to reach endpoints. This was due to a degraded route internal to one of our transit providers. We identified the unhealthy provider path and drained it at 23:26 UTC, rerouting traffic through other providers and promptly restoring normal traffic operations.
Aug 27, 11:26 PM UTC
minor
We are no longer seeing any traffic going through the affected routes and this issue should be resolved.
Aug 27, 11:25 PM UTC
minor
All traffic should be rerouted by now, and we have seen a complete drain from the affected provider. We are performing final validations that networking traffic is back to normal.
Aug 27, 11:20 PM UTC
minor
We have drained connections that would be running through the affected routes and are waiting for caches to expire in order to validate that issues are resolved.
Aug 27, 11:03 PM UTC
minor
We have identified some potential connectivity issues among public Internet transit routes and are attempting to reroute.
Aug 27, 10:50 PM UTC
minor
Git Operations is experiencing degraded performance. We are continuing to investigate.
Aug 27, 10:46 PM UTC
minor
Issues is experiencing degraded performance. We are continuing to investigate.
Aug 27, 10:45 PM UTC
minor
We are experiencing some issues related to TLS/SSL encrypted connections and are currently investigating.
Aug 27, 10:38 PM UTC
minor
We are currently investigating this issue.
Aug 27, 10:37 PM UTC
good
On August 22, 2024, between 16:10 UTC and 17:28 UTC, Actions experienced degraded performance leading to failed workflow runs. On average, 2.5% of workflow runs failed to start, with the failure rate peaking at 6%. In addition, we saw a 1% error rate for Actions API endpoints. This was due to an Actions service being deployed to faulty hardware with an incorrect memory configuration, leading to significant performance degradation of those pods from insufficient memory.

The impact was mitigated when the pods were evicted automatically and moved to healthy hosts. The faulty hardware was disabled to prevent a recurrence. We are improving our health checks to ensure that unhealthy hardware is consistently marked offline automatically. We are also improving our monitoring and deployment practices to reduce our time to detection and automated mitigation at the service layer for issues like this in the future.
Aug 22, 5:28 PM UTC
minor
We are investigating issues with failed workflow runs due to internal errors. We are seeing signs of recovery and continuing to monitor the situation.
Aug 22, 5:21 PM UTC
minor
We are investigating reports of degraded performance for Actions
Aug 22, 4:49 PM UTC
good
On August 21, 2024, between 13:48 UTC and 15:00 UTC, Actions experienced degraded performance, leading to delays in workflow runs. On average, 25% of workflow runs were delayed by 8 minutes. Less than 1% of workflow runs exhausted retries and failed to start. The issue stemmed from a backlog of Pull Request events which caused delays in Actions processing the event queues that trigger workflow runs.

We mitigated the incident by disabling the process that led to the sudden spike in Pull Request events. We are working to improve our monitoring and deployment practices to reduce our time to detection and mitigation of issues like this one in the future. We are also identifying appropriate changes to rate limits and reserved capacity to reduce the breadth of impact.
Aug 21, 3:11 PM UTC
minor
We have seen recovery and Actions workflow runs are now running as expected.
Aug 21, 3:10 PM UTC
minor
We are seeing reduced delays for Actions workflow runs to get triggered. We are continuing to investigate how to further reduce impact on customers and recover more quickly.
Aug 21, 2:51 PM UTC
minor
We are investigating reports of users seeing delays for Actions workflow runs to get triggered.
Aug 21, 2:15 PM UTC
minor
We are investigating reports of degraded performance for Actions
Aug 21, 2:09 PM UTC
good
On August 15, 2024, between 13:14 UTC and 13:43 UTC, the Actions service was degraded and resulted in failures to start new workflow runs for customers of github.com. On average, 10% of Actions workflow runs failed to start with the failure rate peaking at 15%. This was due to an infrastructure change that enabled a network proxy for requests between the Actions service and an internal API which caused requests to fail.

We mitigated the incident by rolling back the change. We are working to improve our monitoring and deployment practices to reduce our time to detection and mitigation of issues like this one in the future.
Aug 15, 1:59 PM UTC
minor
Approximately 10-15% of customers may be experiencing problems executing new GitHub Actions runs. The problem is currently being investigated by our teams.
Aug 15, 1:45 PM UTC
minor
We are investigating reports of degraded performance for Actions
Aug 15, 1:35 PM UTC
good
On August 14, 2024 between 23:02 UTC and 23:38 UTC, all GitHub services on GitHub.com were inaccessible for all users.

This was due to a configuration change that impacted traffic routing within our database infrastructure, resulting in critical services unexpectedly losing database connectivity. There was no data loss or corruption during this incident.

At 22:59 UTC an erroneous configuration change rolled out to all GitHub.com databases that impacted the ability of the database to respond to health check pings from the routing service. As a result, the routing service could not detect healthy databases to route application traffic to. This led to widespread impact on GitHub.com starting at 23:02 UTC.

We mitigated the incident by reverting the change and confirming restored connectivity to our databases. At 23:38 UTC, traffic resumed and all services recovered to full health. Out of an abundance of caution, we continued to monitor before resolving the incident at 00:30 UTC on August 15th, 2024.

To prevent recurrence we are implementing additional guardrails in our database change management process. We are also prioritizing several repair items such as faster rollback functionality and more resilience to dependency failures.

Given the severity of this incident, follow-up items are the highest priority work for teams at this time.
Aug 15, 12:30 AM UTC
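
A minimal sketch of the health-check-driven routing failure described above: when a change breaks the health check itself on every database, the routing layer sees no healthy candidates and all traffic fails even though the databases are still intact. The host names and probe are hypothetical:

    DATABASES = ["db-primary", "db-replica-1", "db-replica-2"]

    def ping(host: str, health_endpoint_broken: bool) -> bool:
        """Stand-in for the routing service's health probe of one database host."""
        return not health_endpoint_broken  # a real probe would issue a network check

    def healthy_targets(health_endpoint_broken: bool = False) -> list[str]:
        """Databases the routing service considers eligible for application traffic."""
        return [db for db in DATABASES if ping(db, health_endpoint_broken)]

    print("before config change:", healthy_targets())                             # all hosts
    print("after config change: ", healthy_targets(health_endpoint_broken=True))  # [] -> outage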
major
We have fully rolled-back the changes to database infrastructure and mitigated the impact. All services are now fully operational.
Aug 15, 12:26 AM UTC
major
Git Operations is operating normally.
Aug 15, 12:25 AM UTC
major
Copilot is operating normally.
Aug 15, 12:21 AM UTC
major
Codespaces, Packages and Pages are operating normally.
Aug 15, 12:20 AM UTC
major
Webhooks is operating normally.
Aug 15, 12:19 AM UTC
major
Actions is operating normally.
Aug 15, 12:19 AM UTC
major
Actions is experiencing degraded performance. We are continuing to investigate.
Aug 15, 12:19 AM UTC
major
Actions is operating normally.
Aug 15, 12:19 AM UTC
major
Pull Requests is operating normally.
Aug 15, 12:19 AM UTC
major
Issues is operating normally.
Aug 15, 12:18 AM UTC
major
Copilot is experiencing degraded performance. We are continuing to investigate.
Aug 15, 12:12 AM UTC
major
Git Operations is experiencing degraded performance. We are continuing to investigate.
Aug 15, 12:11 AM UTC
major
Codespaces is experiencing degraded performance. We are continuing to investigate.
Aug 15, 12:10 AM UTC
major
Webhooks is experiencing degraded performance. We are continuing to investigate.
Aug 15, 12:10 AM UTC
major
Actions is experiencing degraded performance. We are continuing to investigate.
Aug 15, 12:10 AM UTC
major
Pages is experiencing degraded performance. We are continuing to investigate.
Aug 15, 12:09 AM UTC
major
Service health continues to improve, and we are working to stabilize all services. Some services may experience delays in updates and notifications as we work through a backlog of events.
Aug 15, 12:09 AM UTC
major
Packages is experiencing degraded performance. We are continuing to investigate.
Aug 15, 12:09 AM UTC
major
Issues is experiencing degraded performance. We are continuing to investigate.
Aug 15, 12:09 AM UTC
major
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Aug 14, 11:50 PM UTC
major
The database infrastructure change is being rolled back. We are seeing improvements in service health and are monitoring for full recovery.
Aug 14, 11:45 PM UTC
major
We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back.
Aug 14, 11:29 PM UTC
major
Codespaces is experiencing degraded availability. We are continuing to investigate.
Aug 14, 11:22 PM UTC
major
Webhooks is experiencing degraded availability. We are continuing to investigate.
Aug 14, 11:20 PM UTC
major
Issues is experiencing degraded availability. We are continuing to investigate.
Aug 14, 11:19 PM UTC
major
Git Operations is experiencing degraded availability. We are continuing to investigate.
Aug 14, 11:19 PM UTC
major
Packages is experiencing degraded availability. We are continuing to investigate.
Aug 14, 11:18 PM UTC
major
We are investigating reports of issues with GitHub.com and GitHub API. We will continue to keep users updated on progress towards mitigation.
Aug 14, 11:16 PM UTC
major
Copilot is experiencing degraded availability. We are continuing to investigate.
Aug 14, 11:13 PM UTC
major
Pages is experiencing degraded availability. We are continuing to investigate.
Aug 14, 11:12 PM UTC
major
We are investigating reports of degraded availability for Actions, Pages and Pull Requests
Aug 14, 11:11 PM UTC