This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
Jun 06, 6:49 PM UTC
All work has been completed and we are hands off.
Jun 06, 6:48 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Jun 06, 3:36 PM UTC
We are conducting routine maintenance on our network infrastructure in the EU. This will not impact production traffic, but may result in slightly increased latency for the remainder of our work. We expect this to last until 17:00 UTC.
Jun 06, 3:31 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Jun 06, 3:31 PM UTC
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
Jun 06, 5:07 PM UTC
Packages is experiencing degraded performance. We are continuing to investigate.
Jun 06, 4:56 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Jun 06, 4:53 PM UTC
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
Jun 05, 10:21 PM UTC
Affected Slack and Teams subscriptions have been restored. Please contact support if you encounter any additional issues.
Jun 05, 10:21 PM UTC
Additional detail on the scope of impact during the 14:49 UTC to 16:45 UTC window: a small but elevated percentage of authenticated requests to GitHub.com received incorrect authorization failures. We saw a 1 to 2% increase in 4xx responses for a small number of endpoints (/repos/{owner}/{repo}/pulls/{pull_number}, /repos/{owner}/{repo}, /repos/{owner}/{repo}/contents/{path}). The vast majority of requests completed normally; customers who saw errors during the window can retry now and should see them succeed.
Jun 05, 8:34 PM UTC
We are still exploring options to restore the deleted subscriptions, and we will provide another update soon. In the meantime, customers can manually re-subscribe their Slack and Teams channels to repositories.
Jun 05, 6:43 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Jun 05, 6:05 PM UTC
During 14:49 UTC to 16:45 UTC, customers may have experienced authorization failures for legitimate requests. This was caused by a recently enabled feature flag, which has now been turned off as a mitigation. Customers should now see normal authorization behavior. This is also the cause of the chat integration issue, and we are exploring options to restore it. In the meantime, customers can manually re-subscribe their repo.
Jun 05, 6:04 PM UTC
Customers may see unexpected repo unsubscription events in their Slack or Teams channels.
Jun 05, 5:25 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Jun 05, 5:20 PM UTC
Everything is operating normally.
Jun 04, 8:32 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Jun 04, 8:20 PM UTC
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
Jun 04, 7:59 PM UTC
This issue is now fully resolved and Copilot Code Review is working as expected.
Jun 04, 7:59 PM UTC
The mitigation for Copilot Code Review is now fully deployed, and new reviews are working as expected. We are continuing to monitor for full resolution.
Customers may need to re-request Copilot Code Review. Copilot Code Review Actions runs that have been running for longer than 20 minutes may be safely cancelled.
Jun 04, 7:41 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Jun 04, 7:07 PM UTC
The mitigation for Copilot Code Review is now fully deployed, and new reviews are working as expected.
Customers may need to re-request Copilot Code Review. Copilot Code Review Actions runs that have been running for longer than 20 minutes may be safely cancelled.
Jun 04, 7:07 PM UTC
The mitigation for Copilot Code Review is rolling out and we are seeing early signs of recovery.
Jun 04, 6:52 PM UTC
We have identified that Copilot Code Review users may see "Copilot ran into an error" on Pull Requests that requested Copilot Code Review.
A mitigation is in progress, and we expect it to take effect in approximately 30 minutes.
GitHub Enterprise Cloud with Data Residency is not impacted.
Jun 04, 6:22 PM UTC
We have identified that Copilot Code Review users may see "Copilot ran into an error" on Pull Requests that requested Copilot Code Review.
A mitigation is in progress.
Jun 04, 6:03 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Jun 04, 6:02 PM UTC
Between June 1, 2026, 23:00 UTC and June 4, 2026 04:11 UTC, customers experienced delays in Dependabot scheduled version updates.
Pull request creation for version updates was delayed, with delays increasing over time and reaching up to two days. Approximately 1.5 million repositories with active Dependabot version update configurations were affected. Dependabot security updates were not affected. The primary cause was changes to an internal platform service that routes requests for Dependabot and other services.
We mitigated the incident by deploying a fix that enables batch enqueuing of update jobs, which significantly increased processing throughput. Once the backlog was drained, Dependabot returned to normal processing times.
To reduce the risk of recurrence, we are working on tuning batch size and concurrency limits for Dependabot update job processing. We are also adding monitoring for job processing lag to enable earlier detection and faster mitigation of similar issues.
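The batch enqueuing idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Dependabot's actual implementation: `InMemoryQueue`, `enqueue_many`, and `enqueue_in_batches` are hypothetical names, and the queue simply counts round trips to stand in for a real backing store.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(jobs: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size jobs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(jobs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


class InMemoryQueue:
    """Hypothetical queue that counts round trips to its backing store."""

    def __init__(self) -> None:
        self.items: List[str] = []
        self.round_trips = 0

    def enqueue_many(self, jobs: List[str]) -> None:
        # One call stands in for one network round trip, however many jobs it carries.
        self.round_trips += 1
        self.items.extend(jobs)


def enqueue_in_batches(queue: InMemoryQueue, jobs: Iterable[str], batch_size: int) -> int:
    """Enqueue all jobs in batches and return the number of round trips used."""
    for batch in batched(jobs, batch_size):
        queue.enqueue_many(batch)
    return queue.round_trips
```

With 10 jobs and a batch size of 4, this sketch makes 3 round trips instead of 10. The batch size is the kind of tuning parameter the summary mentions.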
Jun 04, 4:11 AM UTC
Job lag has recovered to within normal operating thresholds. We are declaring this incident closed and will follow up with a summary soon.
Jun 04, 4:11 AM UTC
Job lag has recovered from a peak of 1.71 days to 9h 9m at 19:29 UTC and continues to decrease. The backlog is draining at a healthy rate with no signs of reversal. New jobs are processing on schedule. Remaining lag will continue to drain over the next few hours as queued work completes; this is expected post-incident catch-up, not active impact. We will continue monitoring and re-engage if the lag trend reverses.
Jun 04, 2:37 AM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Jun 04, 2:35 AM UTC
We have applied mitigations and are continuing to see improvements in the Dependabot scheduled version updates.
Next update in 12 hours.
Jun 03, 11:38 PM UTC
We are preparing a mitigation for the delayed Dependabot scheduled version updates.
Next update in 2 hours.
Jun 03, 9:10 PM UTC
Customers may see delays of up to two days in Dependabot version updates.
Dependabot Security updates are not delayed.
The team is investigating mitigations for the backlog.
Next update in 1 hour.
Jun 03, 8:18 PM UTC
We're seeing delays in Dependabot scheduled version update runs. Our team is actively working on a fix and will share updates as the situation develops.
Jun 03, 7:43 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Jun 03, 7:42 PM UTC
Between June 2, 2026, 21:54 UTC and June 3, 2026, 06:45 UTC, the Spark service was degraded and users were unable to store or retrieve data for their Spark apps in one of our hosting regions. Users could still make changes to their app configuration during this time. The error rate peaked at 25% of affected requests to the service. Impact was limited to users whose requests were served through a single affected region; 43 users experienced errors during this window.
The root cause was a configuration that referenced a service component by a fixed address rather than a dynamic service endpoint. When the component was replaced, requests could no longer reach the fixed address and began to fail. We resolved the incident by updating the configuration to use our standard service endpoints, which are resilient to component replacement. Recovery time was extended because replacing the component required overrides to a temporary deployment safeguard.
We are working to add validation that prevents fixed infrastructure addresses from being used in application configuration outside of test environments and to improve our monitoring to reduce our time to detect.
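One way such a validation could look is sketched below. It is an assumption-laden illustration, not the actual check: the function names, the config shape (a flat mapping of keys to endpoint strings), the `"test"` environment label, and the example hostnames are all hypothetical. It flags any endpoint whose host is a literal IP address.

```python
import ipaddress
from typing import Dict, List
from urllib.parse import urlsplit


def uses_fixed_address(endpoint: str) -> bool:
    """Return True if the endpoint's host is a literal IPv4 or IPv6 address."""
    host = urlsplit(endpoint).hostname
    if host is None:
        # No scheme present: re-parse the value as a bare host[:port].
        host = urlsplit("//" + endpoint).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False  # A DNS name, which can follow component replacement.
    return True


def find_fixed_addresses(config: Dict[str, str], environment: str) -> List[str]:
    """Return config keys that point at fixed addresses outside test environments."""
    if environment == "test":  # Hypothetical label for test environments.
        return []
    return sorted(key for key, value in config.items() if uses_fixed_address(value))
```

A deployment pipeline could reject a configuration when `find_fixed_addresses` returns a non-empty list.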
Jun 03, 6:46 AM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Jun 03, 6:45 AM UTC
We are investigating reports of issues with service(s): Spark. We will continue to keep users updated on progress towards mitigation.
Jun 03, 6:02 AM UTC
We are investigating reports of impacted performance for some GitHub services.
---
Relevant stamps: dotcom
Jun 03, 3:45 AM UTC
We are investigating reports of impacted performance for some GitHub services.
Jun 03, 3:13 AM UTC
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
Jun 02, 12:17 AM UTC
We have identified the root cause and applied mitigations to address delays in billing updates, and we are continuing to see improvement in the processing rate. We will continue to monitor the progress and will provide an update in a few hours.
GitHub Enterprise Cloud with Data Residency is not impacted.
Jun 01, 9:59 PM UTC
We are continuing to investigate delayed billing updates on GitHub.com. We have applied additional mitigations, are continuing to see signs of improvement, and are working to improve the processing rate. We will continue to keep users updated on progress towards mitigation.
GitHub Enterprise Cloud with Data Residency is not impacted.
Next update in 2 hours.
Jun 01, 7:27 PM UTC
We are continuing to investigate delayed billing updates on GitHub.com. We have applied additional mitigations, are seeing more signs of improvement, and are working to improve the processing rate. We will continue to keep users updated on progress towards mitigation.
GitHub Enterprise Cloud with Data Residency is not impacted.
Jun 01, 6:48 PM UTC
We are continuing to investigate delayed billing updates on GitHub.com. We have applied multiple mitigations, are seeing some signs of improvement, and are working to improve the processing rate. We will continue to keep users updated on progress towards mitigation.
GitHub Enterprise Cloud with Data Residency is not impacted.
Jun 01, 5:28 PM UTC
We are investigating reports of delayed billing updates on GitHub.com. We are continuing to investigate delays in our job processing architecture and are attempting to mitigate at the infrastructure level. Code scanning runs and notifications have recovered. We will continue to keep users updated on progress towards mitigation.
GitHub Enterprise Cloud with Data Residency is not impacted.
Jun 01, 4:42 PM UTC
We are investigating reports of delayed code scanning runs, billing updates, email and mobile push notifications. We are investigating delays in our job processing architecture. We will continue to keep users updated on progress towards mitigation.
Jun 01, 3:43 PM UTC
We are investigating reports of delayed code scanning runs and delayed billing updates. We will continue to keep users updated on progress towards mitigation.
Jun 01, 3:17 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Jun 01, 3:17 PM UTC
On May 28th, 2026, between approximately 18:27 and 20:41 UTC, the GitHub Copilot service was degraded due to an issue with the Responses API of an upstream provider affecting the GPT-5.2, GPT-5.3-Codex, GPT-5.4, and GPT-5.5 models. Requests routed to these models via the Responses API returned elevated error rates, which also affected Copilot coding agent and Copilot code review. No other models were impacted.
We mitigated the incident by shifting traffic away from the affected models while the upstream provider deployed a fix.
GitHub is working to improve automated failover for the affected models and strengthen monitoring to prevent similar incidents in the future.
May 28, 8:41 PM UTC
OpenAI models are currently unavailable. We are shifting requests to other models to reduce impact.
May 28, 8:06 PM UTC
We are investigating errors with Copilot requests using OpenAI models
May 28, 7:40 PM UTC
Copilot is experiencing degraded performance. We are continuing to investigate.
May 28, 7:20 PM UTC
We are investigating reports of impacted performance for some GitHub services.
May 28, 7:01 PM UTC
On May 28, 2026, between 19:07 UTC and 19:16 UTC, multiple GitHub services experienced elevated error rates. This was due to a change that was partially deployed to an authentication service, causing errors for dependent services including the web experience, REST API, Git operations, and GitHub Actions. At peak impact, 10% of GitHub Actions runs failed to queue or encountered errors while downloading actions. We mitigated the incident by rolling back the change.
We are expanding test coverage and improving our deployment validation process to prevent recurrence of this issue in the future.
May 28, 11:10 PM UTC
On May 28, 2026, between 00:54 UTC and 01:19 UTC, some users experienced errors when interacting with the Webhooks API, including webhook delivery history and configuration endpoints. On average, the error rate was 0.28% and peaked at 0.45%. This was due to a bug that caused a single Kubernetes pod to enter a CrashLoopBackOff after receiving a 500 with an empty response body from Cosmos DB.
We mitigated the incident by restarting the service. To prevent future incidents, we are pushing a change to handle this response scenario from Cosmos DB appropriately.
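The failure mode above, a server error arriving with an empty body, can be handled defensively along the lines of the sketch below. This is an illustration under assumptions, not the actual fix: `UpstreamError`, `parse_error_response`, and the `"message"` field are hypothetical and do not describe the Cosmos DB error format.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpstreamError:
    status: int
    message: str


def parse_error_response(status: int, body: Optional[bytes]) -> UpstreamError:
    """Build an error value from an upstream response without assuming a JSON body."""
    if not body or not body.strip():
        # The case from the incident: an error status with nothing in the body.
        return UpstreamError(status, "upstream returned an empty error body")
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return UpstreamError(status, "upstream returned a non-JSON error body")
    # "message" is a hypothetical field name used only for this sketch.
    if isinstance(payload, dict) and isinstance(payload.get("message"), str):
        return UpstreamError(status, payload["message"])
    return UpstreamError(status, "upstream error body had an unexpected shape")
```

Returning a value for every input shape means the caller can log and retry rather than crash on an unexpected response.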
May 28, 1:32 AM UTC
The degradation affecting Webhooks has been mitigated. We are monitoring to ensure stability.
May 28, 1:27 AM UTC
We are investigating reports of degraded performance for Webhooks
May 28, 1:13 AM UTC
On May 27, 2026, between 12:07 UTC and 13:16 UTC, users experienced degraded performance for Git operations, Pull Requests, Issues, GraphQL API, and related services on github.com. During this time, operations that depended on Git file servers experienced elevated error rates (3.5% of pushes via HTTPS and 0.2% of pushes via SSH failed; no fetches/clones failed). An internal analytics component generated unexpectedly high load, which caused CPU saturation on the underlying infrastructure. This led to cascading slowdowns and errors across services that depend on Git operations. The issue was mitigated by stopping the offending component. Services began recovering shortly after mitigation and were fully restored by 13:16 UTC. We are taking steps to add resource limits and kill switches for internal analytics components to prevent similar issues in the future.
May 27, 1:16 PM UTC
We're continuing to investigate degraded performance of Git operations, Issues and Pull requests.
May 27, 12:54 PM UTC
We are investigating reports of degraded performance for API Requests, Git Operations, Issues and Pull Requests
May 27, 12:10 PM UTC
On May 26, 2026, between 15:10 UTC and 16:35 UTC the Copilot service was degraded and many models were no longer available for use. On average, the error rate was ~5% and peaked at 11% of requests to the service. This was due to a change that introduced a configuration mismatch in HMAC signing credentials which caused the list of available models to be truncated. This was mitigated by rolling back the change. This rollback was complete by 15:34 UTC though users continued to see impact until cache TTLs expired.
We are working to improve our monitoring and error handling to reduce time to detection and provide a better experience for issues like this in the future.
May 26, 4:35 PM UTC
The degradation affecting Copilot has been mitigated. We are monitoring to ensure stability.
May 26, 4:24 PM UTC
Copilot is experiencing degraded performance. We are continuing to investigate.
May 26, 3:48 PM UTC
We are investigating reports of impacted performance for some GitHub services.
May 26, 3:44 PM UTC
On May 26, 2026, between 10:40 UTC and 12:56 UTC, GitHub Actions jobs were degraded. From 10:40 to 12:16 UTC, all newly queued Actions runs failed to start. From 12:16 to 12:56 UTC, Actions runs that required downloading actions for their workflows continued to fail. GitHub Pages, Copilot Code Review, Copilot coding agent, Octoshift, and GitHub Enterprise Importer were also impacted due to their dependency on Actions.
This was caused by our automated account review system incorrectly suspending the service account used by GitHub Actions to authenticate workflow runs and download actions.
We mitigated by restoring the account at 12:16 UTC, marking it exempt from further automated review at 12:20 UTC, and redeploying a related service at 12:48 UTC to flush cached account state. Full recovery was confirmed at 12:56 UTC.
During this incident, a small number of Issues, PRs, Comments, and Discussions were marked as hidden when the service account was disabled. No data was lost. All content hidden because of this incident has been restored and full search index restoration is in progress.
To prevent a recurrence, we have added an allowlist of all service accounts that cannot be suspended by automated systems, and we are ensuring these protections are enforced consistently across all account management tooling. We are also improving diagnostic tooling for accounts and reducing cache propagation delays to shorten time to mitigate similar incidents in the future.
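The allowlist protection described above might be sketched as follows. The account login, the `actor_is_automated` flag, and the function names are hypothetical and chosen only for illustration. The real tooling is not described in this summary.

```python
from typing import Set


class ProtectedAccountError(Exception):
    """Raised when an automated system tries to suspend a protected account."""


# Hypothetical allowlist entry; real service account names are not given in the summary.
PROTECTED_SERVICE_ACCOUNTS = frozenset({"actions-runner-auth"})


def suspend_account(login: str, actor_is_automated: bool, suspended: Set[str]) -> None:
    """Suspend an account unless automation is targeting an allowlisted service account."""
    normalized = login.lower()
    if actor_is_automated and normalized in PROTECTED_SERVICE_ACCOUNTS:
        raise ProtectedAccountError(f"{normalized} is exempt from automated suspension")
    suspended.add(normalized)
```

Placing the check inside the single suspension entry point is one way to keep the rule consistent across every tool that calls it.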
May 26, 1:18 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
May 26, 1:01 PM UTC
The degradation affecting Actions and Pages has been mitigated. We are monitoring to ensure stability.
May 26, 1:00 PM UTC
We have identified the cause of the authentication issues affecting GitHub Actions and are actively working on mitigation
May 26, 12:37 PM UTC
Actions is experiencing degraded performance. We are continuing to investigate.
May 26, 12:17 PM UTC
We are investigating authentication issues leading to failures in starting Actions runs and downloading actions. At this time, the majority of Actions runs are impacted.
May 26, 11:53 AM UTC
Actions is experiencing degraded availability. We are continuing to investigate.
May 26, 11:19 AM UTC
We are investigating reports of degraded performance for Actions and Pages
May 26, 10:57 AM UTC
On May 23, 2026, between 06:00 UTC and 19:12 UTC, GitHub experienced intermittent errors authenticating GitHub App installation tokens.
During this time, between 1% and 5% of app installation token authentication requests failed, with an average of 2.3% and the error rate peaking at approximately 5.4% around 14:00 UTC. Users may have experienced authentication failures when using GitHub Apps, including failures in Git operations and API calls using app installation tokens.
The issue was caused by a problem in a caching proxy component and was remediated by rolling back that component to a previous version. We are taking steps to improve monitoring for cache miss anomalies to ensure that token authentication remains functional during infrastructure changes, and we are reviewing our protocols for testing and for upgrading third-party dependencies.
May 23, 7:32 PM UTC
This is fully mitigated. We will continue to monitor to ensure it does not reoccur.
May 23, 7:32 PM UTC
We have identified and are applying additional mitigation and will continue to monitor for complete mitigation.
May 23, 7:06 PM UTC
We see significant signs of mitigation and are monitoring for full mitigation.
May 23, 6:42 PM UTC
We are seeing signs of mitigation and are continuing to monitor for complete mitigation.
Next update in one hour.
May 23, 5:41 PM UTC
We are continuing to investigate an elevated error rate of authentication failures for app installation tokens. Next update in one hour.
May 23, 4:35 PM UTC
We are seeing an increased rate of authentication failures for app installation tokens, affecting approximately 1% of tokens. We are continuing to investigate.
May 23, 4:01 PM UTC
We are investigating reports of impacted performance for some GitHub services.
May 23, 4:00 PM UTC
On May 20, 2026, between 16:00 UTC and 17:45 UTC, GitHub Actions customers experienced run start delays exceeding 5 minutes. Approximately 4.5% of all runs were delayed during the impact window, with scale set jobs disproportionately affected. 30% of scale set jobs were delayed and 4% failed to start entirely.
The incident was caused by a misconfigured health check on an internal service that assigns jobs to runners. A brief latency spike in an upstream dependency triggered health check failures across several pods, removing them from service and concentrating load on the remaining capacity. The added load drove memory pressure that escalated into a cascading failure in one regional cluster, leaving it unable to self-recover.
Responders mitigated the incident by scaling capacity in the healthy regional clusters and draining traffic away from the impaired one, after which run start latency recovered. To prevent recurrence, we are strengthening our health check configuration to avoid cascading failure scenarios and evaluating automated mitigations to rebalance traffic when a region is degraded.
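One common guard against the cascade described above is to cap how many pods health checks may remove at once, so that a shared upstream latency spike cannot concentrate all load on a few survivors. The sketch below illustrates that general technique only. The summary does not say this is the change being made, and the function name and `max_ejected_fraction` parameter are hypothetical.

```python
from typing import Dict, List


def pods_to_keep_in_service(health: Dict[str, bool], max_ejected_fraction: float) -> List[str]:
    """Return the pods that should keep receiving traffic.

    Unhealthy pods are ejected only until the cap is reached; any further
    unhealthy pods stay in rotation so the remaining capacity is not overloaded.
    """
    if not 0.0 <= max_ejected_fraction <= 1.0:
        raise ValueError("max_ejected_fraction must be between 0 and 1")
    pods = sorted(health)
    max_ejections = int(len(pods) * max_ejected_fraction)
    ejected = 0
    kept: List[str] = []
    for pod in pods:
        if not health[pod] and ejected < max_ejections:
            ejected += 1
            continue
        kept.append(pod)
    return kept
```

With four pods, three of them failing checks, and a cap of 0.25, only one pod is ejected and three continue to share the load.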
May 20, 8:14 PM UTC
Customer impact has fully subsided. We are maintaining yellow status while we deploy a permanent fix to prevent recurrence.
May 20, 7:41 PM UTC
We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
May 20, 6:17 PM UTC
The degradation affecting Actions has been mitigated. We are monitoring to ensure stability.
May 20, 5:52 PM UTC
A subset of runners are taking longer than expected to connect, which may delay some jobs from beginning execution. We are actively working to mitigate the issue.
May 20, 5:46 PM UTC
We are investigating reports of degraded performance for Actions
May 20, 4:58 PM UTC
On May 19, 2026, between 05:30 UTC and 14:50 UTC, some Copilot users experienced failures when using code completions, chat sessions, and cloud agent sessions. At peak impact, approximately 13% of Copilot API requests failed, and approximately 24% of remote sessions failed to initialize. A partial mitigation at 08:16 UTC reduced the Copilot API error rate to approximately 0.3%, but intermittent failures persisted until a full fix was deployed at 14:15 UTC and recovery was verified by 14:50 UTC.
The incident was caused by rate limits being exceeded on a shared infrastructure component. A recently enabled feature increased call volume to this component, and the combined load exceeded capacity limits as traffic increased during business hours.
We mitigated the incident by deploying a caching layer to reduce load on shared infrastructure. To prevent recurrence, we are separating rate limit scopes between services, adding monitoring for internal dependency rate limiting, and reducing redundant calls.
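Separating rate limit scopes, as mentioned above, can be illustrated with one token bucket per caller scope, so that a new feature exhausting its own budget cannot consume the budget of other callers. This is a minimal sketch under assumptions: the class names, scope names, and limits are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```python
from typing import Dict


class TokenBucket:
    """Token bucket with an explicit clock, in seconds."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.updated_at = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ScopedRateLimiter:
    """Keeps an independent bucket per scope instead of one shared bucket."""

    def __init__(self, buckets: Dict[str, TokenBucket]) -> None:
        self._buckets = buckets

    def allow(self, scope: str, now: float) -> bool:
        return self._buckets[scope].allow(now)
```

In this sketch, a burst from one scope is rejected once its own bucket is empty, while requests in the other scopes are still admitted.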
May 28, 12:40 PM UTC
On May 15, 2026, from approximately 07:43 UTC to 08:48 UTC, GitHub Actions experienced a degradation that caused workflow runs to fail or experience delayed starts for a subset of customers. The incident was triggered by a planned failover of supporting infrastructure used by GitHub Actions. During that operation, an automated service discovery update did not propagate correctly, which caused traffic to be routed incorrectly and increased request timeouts in a core dependency for workflow orchestration.
At peak impact, 42% of Actions runs failed. Downstream services that depend on Actions workflow execution were also impacted, including GitHub Pages and Copilot cloud services. At 08:12 UTC, responders manually corrected the service discovery routing issue. Timeout and failure rates recovered shortly after, and we continued monitoring until full stabilization was confirmed across all affected services. The incident was marked resolved at 08:48 UTC.
To prevent recurrence, we are implementing failover guardrails that validate service discovery state before completing failover operations, strengthening pre-flight and post-flight verification checks, and improving dependency resilience to reduce timeout cascades during infrastructure events.
May 15, 8:48 AM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
May 15, 8:41 AM UTC
We are monitoring an issue that was affecting GitHub Actions and causing downstream issues in GitHub Coding Agent and GitHub Code Review Agent. The issue is now resolved, but we are closely monitoring our systems for full recovery.
May 15, 8:29 AM UTC
The degradation affecting Pages has been mitigated. We are monitoring to ensure stability.
May 15, 8:27 AM UTC
The degradation affecting Actions has been mitigated. We are monitoring to ensure stability.
May 15, 8:26 AM UTC
Pages is experiencing degraded availability. We are continuing to investigate.
May 15, 8:14 AM UTC
We are investigating reports of degraded availability for Actions
May 15, 8:13 AM UTC
Beginning at 02:49 UTC on May 15, 2026, and lasting until 03:04 UTC, GitHub.com was unavailable for a subset of customers. This impact has been mitigated and normal service has resumed.
The issue was rooted in a sudden spike in traffic, with intermittent impact. We've identified the source of the traffic and prevented further disruption.
May 15, 3:57 AM UTC
On May 13, 2026, between 14:31 and 16:03 UTC, the Code Scanning service experienced processing delays and 12% of check runs took over 15 minutes to complete. The delays were caused by replication lag due to an internal database migration, resulting in insufficient worker capacity for our high rate of job enqueues.
We mitigated the impact by scaling our processing workers by 34%. Code Scanning results returned to normal processing times after the mitigation was applied.
The capacity increases are permanent, and we are looking into more ways to decrease the load on our workers to help prevent this in the future.
May 13, 4:03 PM UTC
CodeQL impact has been mitigated. We are continuing to monitor for durable recovery.
May 13, 3:30 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
May 13, 3:26 PM UTC
We have applied a mitigation to increase processing capacity. We are continuing to monitor to confirm full recovery. We will provide another update by 15:30 UTC.
May 13, 2:58 PM UTC
We are investigating delays affecting CodeQL, the code analysis engine used by Code Scanning. Some users may experience delayed or incomplete code scanning results. Our engineering team is investigating. We will provide another update by 15:15 UTC.
May 13, 2:43 PM UTC
We are investigating reports of impacted performance for some GitHub services.
May 13, 2:41 PM UTC
On May 12, 2026, between 13:41 and 17:43 UTC, some services experienced delays in processing. For the Code Scanning service, 53% of check runs took over 15 minutes to complete. Additionally, notifications took an average of 22 minutes to be delivered and Slack integration webhooks took an average of 20 minutes to be delivered. The delays were caused by replication lag due to an internal database migration, resulting in insufficient worker capacity for our high rate of job enqueues.
We mitigated the impact by scaling our processing workers to handle the increased load. All services returned to normal processing times after the mitigation was applied.
We are working to create dedicated worker pools for some of our high usage shared queues to help prevent this in the future.
May 12, 5:43 PM UTC
All services have fully recovered.
May 12, 5:43 PM UTC
CodeQL has fully recovered. We're continuing to work on recovery for the remaining impacted services.
May 12, 4:59 PM UTC
Webhooks have fully recovered. Continuing to work on recovery for the other services.
May 12, 4:29 PM UTC
Webhooks is operating normally.
May 12, 4:28 PM UTC
We've established that most delays are related to a queuing service and are working to scale out. Early signals from the scale-out are showing signs of recovery for some services. We'll provide an update when services are fully recovered.
May 12, 4:18 PM UTC
Webhooks is experiencing degraded performance. We are continuing to investigate.
May 12, 3:44 PM UTC
We're continuing to investigate issues with CodeQL actions workflows. We're additionally seeing delays for notifications, webhooks, and the Slack integration.
May 12, 3:42 PM UTC
CodeQL actions are currently experiencing delays, which may result in those actions being stuck in a pending state or having failed due to a timeout.
May 12, 3:13 PM UTC
We are investigating reports of degraded performance for CodeQL
May 12, 2:38 PM UTC
On May 11th, 2026, between 14:00 UTC and 14:33 UTC, HTTP-based Git read operations were degraded. On average, the error rate was 2.8% and peaked at 7.5% of requests to the service. This was due to resource exhaustion in a networking gateway between GitHub.com’s frontend service for Git operations and a dependency service that performs authentication and authorization. Following the initial spike, the frontend service became stuck in a degraded state in one of our data centers, increasing time to mitigation.
We mitigated the incident by scaling the networking gateway and re-deploying the frontend service.
To reduce our time to detection and mitigation in the future, we are adding auto-scaling to the networking gateway, and resolving a bug which caused the frontend service to remain degraded.
May 11, 2:33 PM UTC
We are investigating reports of degraded performance for Git Operations
May 11, 2:25 PM UTC
On May 7, 2026, between 04:12 UTC and 06:13 UTC, Copilot Cloud Agent and Copilot Code Review Agent sessions for pull requests were delayed or failed to start.
The issue was caused by follow-up recovery work from a separate Pull Requests incident (https://www.githubstatus.com/incidents/f5pb5d5mr9yh). As part of that recovery, we ran a large database migration, which caused replication delays on several replica hosts.
Although those replicas were not serving user traffic, our safeguards correctly treated the elevated replication lag as a signal to slow down writes to the affected database cluster. As a result, some pull request background processing was temporarily delayed. That processing is responsible for sending the internal events that Copilot agents use to begin work, so affected agents did not start until the database replicas caught up.
The system recovered once replication lag returned to normal and pull request processing resumed. We are reviewing how this safeguard interacts with recovery migrations so we can reduce the chance of similar secondary impact during future incident recovery work.
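The interaction described above can be made concrete with a small sketch of a lag-based write throttle. It is a hypothetical illustration, not GitHub's safeguard: the `Replica` type, the threshold, and the `serving_only` flag are assumptions introduced to show how the set of replicas consulted changes the outcome.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Replica:
    name: str
    lag_seconds: float
    serves_traffic: bool


def should_throttle_writes(
    replicas: Iterable[Replica],
    max_lag_seconds: float,
    serving_only: bool = False,
) -> bool:
    """Return True if any considered replica exceeds the lag threshold."""
    considered = [r for r in replicas if r.serves_traffic or not serving_only]
    return any(r.lag_seconds > max_lag_seconds for r in considered)
```

With one serving replica at 0.5 seconds of lag and one non-serving replica at 120 seconds, the default behavior throttles writes, while `serving_only=True` does not. Ignoring non-serving replicas is not automatically safe, since they may be failover targets; the sketch only shows where the trade-off sits.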
May 07, 6:56 AM UTC
Copilot code review and cloud agents are starting again for pull requests. We are monitoring for full recovery.
May 07, 6:14 AM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
May 07, 6:13 AM UTC
We are investigating reports of impacted performance for some GitHub services.
May 07, 5:02 AM UTC
On May 6, 2026, between 15:12 and 19:02 UTC, creation of new pull request review threads on GitHub.com failed. This included new line comments and file comments on pull requests. Existing PRs and previously created comments were unaffected.
This incident was caused by a 32-bit integer key reaching its maximum value in a Vitess lookup table used during PR thread creation. The primary table had been migrated to a 64-bit integer key, but the Vitess lookup table remained 32-bit. Once the values in the primary table passed the available 32-bit ID space in the lookup table, attempts to create new review threads began failing, resulting in a near-100% failure rate for new thread creation requests. We mitigated the issue by updating the impacted lookup table definitions across all shards to use 64-bit integer column types, increasing the available ID range and restoring normal operation. Service was fully restored once the schema changes completed globally.
To help prevent similar incidents, we are expanding existing monitoring of database columns to include Vitess lookup tables to notify in advance of any tables that is approaching a column size limit. This work is intended to provide earlier detection of columns approaching size limits before customer impact occurs.
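As a rough illustration of the column-limit monitoring described above, the sketch below flags integer columns whose largest stored value is close to the maximum for its type. The table and column names, the 80% threshold, and the assumption that the keys are signed MySQL-style integers are all hypothetical; the summary does not state whether the affected column was signed or unsigned.

```python
# Maximum storable values for common MySQL-style integer column types.
INT_MAX = {
    "int": 2**31 - 1,            # signed 32-bit: 2,147,483,647
    "int unsigned": 2**32 - 1,
    "bigint": 2**63 - 1,         # signed 64-bit
    "bigint unsigned": 2**64 - 1,
}


def fraction_used(column_type, current_max_id):
    """Fraction of the column's ID space consumed by the largest stored value."""
    return current_max_id / INT_MAX[column_type]


def columns_near_limit(columns, threshold=0.8):
    """columns: iterable of (table, column, column_type, current_max_id) tuples."""
    return [
        (table, column)
        for table, column, column_type, current_max_id in columns
        if fraction_used(column_type, current_max_id) >= threshold
    ]


# Hypothetical inventory: the primary table was widened, the lookup table was not.
inventory = [
    ("review_threads", "id", "bigint", 2_100_000_000),
    ("review_threads_lookup", "id", "int", 2_100_000_000),
]
at_risk = columns_near_limit(inventory)
```

The point of the sketch is that the check has to cover every table that stores the key, including lookup tables, not only the primary table.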
May 06, 7:04 PM UTC
Mitigations have been fully applied and we are seeing full recovery of functionality on Pull Request threads. We are continuing to monitor to ensure sustained recovery.
May 06, 7:04 PM UTC
Creation of new Pull Request threads (including line and file comments) continues to be affected although we are seeing partial recovery.
A mitigation is being applied to continue to accelerate recovery with complete recovery expected by 8:00pm UTC.
Top-level comments on pull requests still function and should remain usable during recovery. Opening and merging pull requests, actions, and other pull request operations remain functional.
May 06, 5:52 PM UTC
Creation of new Pull Request threads (including line and file comments) continues to be affected.
Top-level comments on pull requests still function and should remain usable during recovery. Opening and merging pull requests, actions, and other pull request operations remain functional.
A mitigation is being applied. Recovery is expected to be gradual, with complete recovery expected by 8:00pm UTC.
May 06, 4:20 PM UTC
Pull Requests is experiencing degraded availability. We are continuing to investigate.
May 06, 4:07 PM UTC
Creation of new Pull Request threads (including line and file comments) continues to be affected. We have identified the cause of the issue and have started taking steps to mitigate this issue.
May 06, 3:55 PM UTC
We are investigating failures for new thread creation on Pull Requests. Responses to existing pull request threads are unaffected.
May 06, 3:28 PM UTC
We are investigating reports of degraded performance for Pull Requests
May 06, 3:25 PM UTC
On May 6, 2026 between 11:02 UTC and 11:13 UTC, users were unable to start or view Copilot Cloud Agent or remote sessions. During this time, requests to the session API returned errors, preventing users from creating new sessions or viewing existing ones. The issue was caused by a configuration change to the service's network routing that inadvertently removed the ingress path for the service. The team reverted the change at 11:13 UTC which restored service. The incident remained open until 11:59 UTC while the team verified full recovery. We are taking steps to improve our deployment validation process to prevent similar configuration changes from impacting production traffic in the future.
May 06, 11:59 AM UTC
We have applied a mitigation and Copilot services have recovered.
May 06, 11:59 AM UTC
We are investigating issues with the ability to start Copilot Cloud Agent sessions and view them.
May 06, 11:25 AM UTC
We are investigating reports of impacted performance for some GitHub services.
May 06, 11:21 AM UTC
On May 6, 2026, from approximately 06:45 UTC to 09:15 UTC, GitHub Actions Standard Ubuntu hosted runners were degraded. 17.1% of jobs requesting a standard runner failed.
This was caused by an unexpected data shape in the allocation configuration data for standard runners. That data was introduced as part of post-incident remediation work for an incident the previous day and caused new allocations to be blocked as load ramped up for the day. Removing that data at 08:51 allowed allocations to proceed and hosted runner pools to scale up and recover.
We are updating the filter logic for this allocation data to be resilient to abnormal data shapes and improving monitoring to alert when allocations are blocked, allowing the team to respond before customer impact starts.
May 06, 9:44 AM UTC
Actions wait times have fully recovered.
May 06, 9:44 AM UTC
The degradation affecting Actions has been mitigated. We are monitoring to ensure stability.
May 06, 9:19 AM UTC
We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
May 06, 9:08 AM UTC
Actions is experiencing issues with ubuntu standard hosted runners leading to high wait times. We are actively investigating the issue
May 06, 8:00 AM UTC
We are investigating reports of degraded availability for Actions
May 06, 7:19 AM UTC
Between approximately 14:00 and 16:10 UTC on May 5, 2026, SSH-based Git operations experienced elevated latency and intermittent failures. On average, the error rate was 0.46% and peaked at 0.6% of SSH write requests. HTTP-based Git operations, including web UI and HTTPS clones, were not affected.
The impact was caused by reduced SSH capacity at one of our data center sites. During a period of high traffic, the remaining hosts became overloaded, leading to connection exhaustion and some failures for SSH-based operations.
Additional capacity was provisioned to expand SSH capacity and resolve the incident. The expanded capacity was fully online by 18:18 UTC.
To reduce the likelihood of similar incidents, we will implement faster scaling solutions for SSH infrastructure and improved alerting for host availability and capacity thresholds.
May 05, 6:35 PM UTC
We've completed our mitigation to prevent further impact. At this time the incident is considered resolved.
May 05, 6:35 PM UTC
The degradation affecting Git Operations has been mitigated. We are monitoring to ensure stability.
May 05, 6:25 PM UTC
We're continuing to work on preventing further impact from the earlier issue. No SSH-based impact is expected at this time. We'll post new updates if impact recurs or once our mitigation is in place.
May 05, 5:26 PM UTC
Git Operations is experiencing degraded performance. We are continuing to investigate.
May 05, 5:23 PM UTC
Between approximately 14:00 and 16:10 UTC, customers using SSH-based Git operations may have experienced elevated latency and failures. HTTP-based operations were not impacted. We've identified a suspected root cause and are working to implement a mitigation to prevent further impact.
May 05, 4:54 PM UTC
We are investigating reports of impacted performance for some GitHub services.
May 05, 4:49 PM UTC
On May 5, 2026, from approximately 13:22 UTC to 17:05 UTC, GitHub Actions hosted runners in the East US region were degraded. 13.5% of jobs requesting a standard runner failed and ~16% of requested Larger Runners with private networking pinned to East US failed or were delayed by more than 5 minutes. Copilot Code Review requests were also impacted. Approximately 8,500 code review requests timed out during this window. Affected users saw an error comment on their pull requests and were able to retry by re-requesting a review. Most runner requests were picked up by other regions automatically, but a portion of requests still routing to East US were impacted.
This was triggered by a scale-up operation for hosted runner VMs in the East US region. This is a regular operation, but the VM create load hit an internal rate limit when VM creates pull images from storage. Existing backoff logic was not triggered because of the response code returned in this case. The rate limiting and VM creation failures were mitigated by reducing load to allow for recovery and allowing queued work to be processed. By 15:34 UTC, queued and failed job assignments were mostly mitigated, with less than 0.5% of runner assignments impacted between 15:34 and full recovery at 17:05.
We are improving our system’s throttling behavior when limits occur, improving our controls to more quickly mitigate similar situations in the future, and reviewing all limits end-to-end for similar operations. We also immediately paused all scale and similar operations until these changes are in place and validated.
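The backoff gap described above can be sketched as follows. This is an illustration under stated assumptions, not the hosted runner implementation: the status codes, function name, and parameters are hypothetical, and the summary does not say which response code was returned.

```python
import random
import time

# Assumption: which statuses count as "slow down" signals. If the service
# signals throttling with a status missing from this set, backoff never starts.
THROTTLE_STATUSES = frozenset({429, 503})


def call_with_backoff(operation, throttle_statuses=THROTTLE_STATUSES,
                      max_attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """Call operation() -> (status, body), backing off exponentially while throttled."""
    for attempt in range(max_attempts):
        status, body = operation()
        if status not in throttle_statuses or attempt == max_attempts - 1:
            return status, body
        delay = base_delay_s * (2 ** attempt)
        # Random jitter spreads retries out so callers do not retry in lockstep.
        sleep(delay + random.uniform(0, delay))
```

With `throttle_statuses={429}`, a throttled response that arrives as a 503 is returned to the caller immediately and load is never reduced, which is the shape of failure the summary describes.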
May 05, 5:26 PM UTC
Actions is experiencing degraded performance. We are continuing to investigate.
May 05, 5:11 PM UTC
Standard hosted runners have now reached full recovery. Hosted Runners with Private Networking in the East US region remain degraded as we continue working with our compute provider to restore capacity. Hosted Runners with private networking can fail over to a different Region to mitigate the issue.
May 05, 5:11 PM UTC
We've seen signs of recovery for Standard Hosted Runners and are continuing to monitor for full recovery. Hosted Runners with Private Networking in the East US region remain affected as we continue working with our compute provider to restore capacity.
May 05, 4:33 PM UTC
We've applied a mitigation for long queue times and failures on Standard Hosted Runners and are monitoring for full recovery. Hosted Runners with Private Networking in the East US region remain affected as we continue working with our compute provider to restore capacity.
May 05, 3:54 PM UTC
We are working with our compute provider to alleviate elevated queue times and failures for Actions Jobs running on Hosted Runners in the East US region affecting 10% of runs. Hosted Runners with private networking can fail over to a different Region to mitigate the issue.
May 05, 3:12 PM UTC
We are investigating elevated queue times and failures on Actions Jobs running on Hosted Runners in East US affecting 8% of runs. Hosted Runners with private networking can fail over to a different Azure region to mitigate the issue.
May 05, 2:14 PM UTC
We are investigating elevated queue times on Actions Jobs running on Standard Hosted Runners in East US affecting 10% of runs
May 05, 1:48 PM UTC
We are investigating reports of degraded availability for Actions
May 05, 1:37 PM UTC
On 2026-05-04 at 3:37:17 PM UTC we detected increased latency on issues resulting in timeouts, and elevated 500 errors on webhooks. A scheduled workload drove high utilization on the primary host of a critical datastore, saturating the connection pool. We paused the job to mitigate the problem at 4:40:05 PM UTC and have implemented measures to prevent recurrence.
May 04, 4:40 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
May 04, 4:36 PM UTC
Webhooks is operating normally.
May 04, 4:35 PM UTC
The degradation affecting Codespaces has been mitigated. We are monitoring to ensure stability.
May 04, 4:35 PM UTC
The degradation affecting Issues has been mitigated. We are monitoring to ensure stability.
May 04, 4:34 PM UTC
Pull Requests is operating normally.
May 04, 4:32 PM UTC
Pages is operating normally.
May 04, 4:29 PM UTC
Latency across services has normalized. We are continuing to investigate the root cause and prevent reoccurrence.
May 04, 4:29 PM UTC
Actions and Packages are operating normally.
May 04, 4:28 PM UTC
Git Operations is operating normally.
May 04, 4:25 PM UTC
Pages is experiencing degraded performance. We are continuing to investigate.
May 04, 4:06 PM UTC
Codespaces is experiencing degraded performance. We are continuing to investigate.
May 04, 4:05 PM UTC
Pull Requests is experiencing degraded performance. We are continuing to investigate.
May 04, 3:56 PM UTC
Actions is experiencing degraded performance. We are continuing to investigate.
May 04, 3:51 PM UTC
Pull Requests is experiencing degraded availability. We are continuing to investigate.
May 04, 3:51 PM UTC
Packages is experiencing degraded performance. We are continuing to investigate.
May 04, 3:50 PM UTC
Git Operations is experiencing degraded performance. We are continuing to investigate.
May 04, 3:48 PM UTC
We are investigating increased latency and timeouts across multiple GitHub services.
May 04, 3:48 PM UTC
We are investigating reports of degraded performance for Issues and Webhooks
May 04, 3:45 PM UTC
On April 28, 2026, at approximately 14:07 UTC, GitHub received reports that pull requests were missing from search results across global and repository /pulls pages.
The issue was caused by a manually invoked repair job intended for a single repository, which was executed without the required safety flags. During execution of the repair job, the database query remained correctly scoped to the repo’s PR IDs. However, the Elasticsearch reconciliation logic did not apply the same scope. It interpreted the min and max PR IDs as a continuous range, causing unrelated PR documents across other repos to be marked for deletion. This resulted in the removal of 1,789,756,838 PR documents from the search index, approximately 49% of indexed PR documents.
Customer impact was limited to PR search and list discoverability. Primary storage was unaffected, and there was no impact to opening, updating, or merging PRs.
The issue was identified ~10 minutes after initial customer reports. Because it affected search index completeness rather than service availability, it was not caught by existing monitoring.
The root cause was a flaw in the search document repair framework: it allowed a scoped reconciliation to run without enforcing a matching Elasticsearch query scope. This created a destructive mismatch between the source-of-truth and the index. The issue was compounded by the ability to trigger the job from the production console without safety defaults. Prior testing focused only on safe backfill scenarios and did not cover this reconciliation path. Additionally, there was no automated detection for large-volume deletions in Elasticsearch.
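The scope mismatch can be illustrated with a small sketch. It assumes PR IDs are allocated globally, so one repository's IDs are interleaved with those of other repositories; the function names and data are hypothetical and do not reflect the actual repair framework.

```python
def stale_ids_by_range(index_docs, repo_pr_ids):
    """Unsafe: treats every indexed ID between min and max as belonging to the repo."""
    lo, hi = min(repo_pr_ids), max(repo_pr_ids)
    live = set(repo_pr_ids)
    return {doc_id for doc_id in index_docs if lo <= doc_id <= hi and doc_id not in live}


def stale_ids_scoped(index_docs, repo_id, repo_pr_ids):
    """Safer: only considers index documents that belong to the same repo."""
    live = set(repo_pr_ids)
    return {
        doc_id
        for doc_id, doc_repo in index_docs.items()
        if doc_repo == repo_id and doc_id not in live
    }


def check_deletion_budget(to_delete, index_size, max_fraction=0.01):
    """Refuse a repair that would delete more than a small share of the index."""
    if index_size and len(to_delete) / index_size > max_fraction:
        raise RuntimeError("refusing to delete more than the allowed share of the index")
    return to_delete


# Hypothetical index: document ID -> owning repo. Repo "a" owns 10, 13 and 15.
index_docs = {10: "a", 11: "b", 12: "b", 13: "a", 14: "c", 15: "a"}
repo_a_ids = [10, 13, 15]

wrongly_marked = stale_ids_by_range(index_docs, repo_a_ids)   # IDs owned by "b" and "c"
correctly_marked = stale_ids_scoped(index_docs, "a", repo_a_ids)  # nothing to delete
```

The range-based version marks documents from other repositories for deletion even though the database query was scoped correctly, and a deletion budget like the one sketched would stop such a run before it removed a large share of the index.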
We mitigated the incident through three parallel actions: (1) deployed a MySQL-backed search fallback for the most active repos by traffic to restore PR visibility for highly impacted users; (2) initiated a snapshot restore and reindex process to repopulate missing pull request documents in Elasticsearch; and (3) added a degradation notice on PR pages to inform users of incomplete search results while recovery was in progress. The incident was resolved on May 1, 2026 at 4:15 UTC, following completion and validation of the reindex process.
To prevent recurrence, we are prioritizing improvements to the repair framework and safeguards. These include enforcing scoped query alignment between primary storage and Elasticsearch, preventing destructive operations without explicit opt-in, strengthening guardrails for manual repair jobs, and evaluating restrictions on production console access.
In parallel, we are expanding automated test coverage for reconciliation safety invariants and introducing detection for anomalous deletion patterns in Elasticsearch so similar issues can be identified or blocked earlier.
We are committed to improving the safety and reliability of our repair systems and ensuring that operational workflows are resilient to both software defects and manual invocation risks.
May 01, 4:15 AM UTC
This incident has been resolved. Search and indexing functionality for pull requests are now fully restored. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
May 01, 4:11 AM UTC
We have repaired the missing search records for affected Pull Requests and are working to identify and repair records left in a stale state after the recovery.
Apr 30, 3:49 AM UTC
We have restored search/indexing functionality for over 99% of impacted pull requests. We are continuing to address the remaining affected pull requests and are reviewing outstanding gaps as part of the restoration process.
Apr 29, 10:22 PM UTC
Mitigation is in progress, with full recovery of impacted pull request listings expected within approximately 24 hours.
Apr 29, 12:40 AM UTC
We have made an interim mitigation to improve availability for some impacted repositories while reindexing continues, and we are actively monitoring the indexing progress.
Apr 28, 10:46 PM UTC
Elasticsearch reindexing of pull requests is continuing. All data is preserved, but may not be available on pages relying on Elasticsearch until the reindex is complete.
Pages and APIs that do not rely on Elasticsearch, including the GitHub CLI (gh pr list) and API (/repos/{owner}/{repo}/pulls), are not impacted and can be used to retrieve pull request data in the interim.
Apr 28, 9:43 PM UTC
We are actively reindexing the remaining ElasticSearch indexes. Our priority is ensuring correctness and avoiding further impact. We are taking a measured approach to safely backfill data and will share additional updates as progress continues.
Apr 28, 3:58 PM UTC
After yesterday’s incident, we are investigating cases where /pulls and /repo/pulls pages are not showing all indexed pull requests. This is because our Elasticsearch cluster does not currently contain all indexed documents.
No pull request data has been lost. As pull requests are updated, they will be reindexed. We are also working on accelerating a full reindex so these pages return complete results again.
Apr 28, 2:51 PM UTC
We are investigating reports of degraded performance for Pull Requests
Apr 28, 2:17 PM UTC
On April 28, 2026, from approximately 12:41 UTC to 17:09 UTC, GitHub Actions jobs using Standard Ubuntu 22 and Ubuntu 24 hosted runners experienced run start delays. Approximately 8% of hosted runner jobs using Ubuntu 22 and Ubuntu 24 experienced delays greater than 5 minutes or failures. Larger and self-hosted runners were not impacted.
This was caused by a performance regression introduced in the VM reimage process. That reimage delay lowered the overall capacity of runners available to pick up new jobs. This was mitigated with a rollback to a known good image version.
We are addressing the core issue with reimage performance and improving the granularity of reimage telemetry across our services and our compute provider to more quickly diagnose similar issues in the future. Finally, we are evaluating other rollout changes to automatically detect similar regressions.
Apr 28, 5:09 PM UTC
Actions is operating normally.
Apr 28, 5:08 PM UTC
Less than 1% of hosted ubuntu-latest runs are delayed. We’re working through remaining steps to restore runner capacity.
Apr 28, 4:36 PM UTC
Currently less than 2% of hosted ubuntu-latest and ubuntu-24.04 runs are delayed or failing. We are continuing to monitor for full recovery.
Apr 28, 3:41 PM UTC
We've applied a mitigation to unblock running Actions. We're continuing to monitor.
Apr 28, 3:20 PM UTC
We're still investigating the root cause for run start delays and failures for Actions hosted Ubuntu jobs, around 5% of jobs are impacted as of now.
Apr 28, 2:49 PM UTC
Actions is experiencing degraded performance. We are continuing to investigate.
Apr 28, 2:02 PM UTC
Actions is experiencing capacity constraints with hosted ubuntu-latest and ubuntu-24.04, leading to high wait times. Other hosted labels and self-hosted runners are not impacted.
Apr 28, 1:59 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 28, 1:59 PM UTC
On April 27, 2026 between 16:15 UTC and 22:46 UTC, GitHub search services experienced degraded connectivity due to saturation of the load balancing tier deployed in front of our search infrastructure. This resulted in intermittent failures for services relying on our search data including Issues, Pull Requests, Projects, Repositories, Actions, Package Registry and Dependabot Alerts. Impact varied by search target, with services seeing up to 65% of searches timing out or returning an error between 16:15 UTC and 18:00 UTC.
We detected the drop in search results through our ongoing monitoring and declared an incident at 16:21 UTC when we determined the issues would not self-heal. We tracked the incident as mitigated as of 21:33 UTC and monitored the systems until 22:46 UTC when we declared the incident resolved. Our existing monitoring did not classify the increased scraping as a risk and this dimension of the incident was only discovered while working to mitigate.
The saturation was caused by a large influx of anonymous distributed scraping traffic that was crafted to avoid our public API rate limits. This scraping traffic made up 30% of the day’s total search traffic, but it was concentrated within a four-hour period. The traffic originated from over 600,000 unique IP addresses, with matching actor information across the board.
To mitigate, we immediately focused on relieving pressure from the load balancers while simultaneously working on scaling the load balancing tier, blocking the anomalous traffic and applying tuning to the balancers to fully resolve the incident.
Looking ahead, we’ve not only scaled the load balancer tier, but applied optimizations to improve our connection handling and re-use to reduce the possibility that a saturation event like this can re-occur. We’ve also added new monitors and controls within the platform to allow us to restrict anonymous traffic to mitigate the impact to our registered users.
Apr 27, 10:46 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Apr 27, 10:44 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Apr 27, 10:35 PM UTC
The degradation affecting Actions, Issues, Packages and Pull Requests has been mitigated. We are monitoring to ensure stability.
Apr 27, 9:33 PM UTC
We've applied a mitigation and are continuing to monitor.
Apr 27, 9:32 PM UTC
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Apr 27, 8:06 PM UTC
We have identified the source of the additional load causing stress on our ElasticSearch clusters. We have disabled the source of that load and are seeing signs of recovery
Apr 27, 7:50 PM UTC
Pull Requests is experiencing degraded availability. We are continuing to investigate.
Apr 27, 6:19 PM UTC
We're continuing to see connectivity issues reaching elasticsearch. Impact on downstream services will be intermittent as we find the root cause
Apr 27, 6:17 PM UTC
Users are experiencing intermittent failures to view issues, pull requests, projects and Actions workflow runs.
We are still investigating and attempting mitigations. We will provide further updates.
Apr 27, 5:35 PM UTC
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Apr 27, 4:53 PM UTC
Packages is experiencing degraded performance. We are continuing to investigate.
Apr 27, 4:39 PM UTC
Issues is experiencing degraded performance. We are continuing to investigate.
Apr 27, 4:36 PM UTC
Customers across GitHub are experiencing failures with searches. Examples include: workflow run failures, projects failing to load, and timed out search requests. This is due to an ongoing infrastructure issue that we have been investigating.
Apr 27, 4:33 PM UTC
We are investigating reports of degraded performance for Actions
Apr 27, 4:31 PM UTC
On April 22, 2026 from 18:49 to 19:32 UTC, the Copilot Cloud Agent service began failing during session execution for users running the Agent HQ Codex agent. Codex agent sessions failed to start for all entry points (issue assignment, @copilot comment mentions). 0.5% of total Copilot Cloud Agent jobs were impacted (~2,000 failed jobs). Copilot and other agent sessions were unaffected.
This was caused by a model resolution mismatch in Codex agent sessions, resulting in an incompatible model being used at runtime. A mitigation was deployed to select a stable default model for Codex agent sessions.
We are working to harden the underlying model-resolution path so it correctly scopes to the requesting agent's supported models to prevent similar failure modes in the future.
Apr 27, 7:02 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Apr 27, 7:01 PM UTC
We've found the issue and are working on deploying a solution to get Codex agent runs working again.
Apr 27, 5:26 PM UTC
Copilot Cloud Agent (CCA) jobs using the Codex agent are failing after starting. To avoid this issue, please choose a different agent. We are investigating the cause and working towards remediation
Apr 27, 5:01 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 27, 4:48 PM UTC
On April 24, 2026, from approximately 11:39 UTC to April 25, 2026 at 00:15 UTC, GitHub Actions experienced delays and timeouts for Larger Hosted Runner jobs using VNet injection in the East US region without a failover region configured. Standard and Self-hosted runners were not impacted. This was caused by backend failures in our compute provider’s provisioning, scaling, and update operations for VMs in the East US region and mitigated by a rollback across all affected Availability Zones. More detail is available at https://azure.status.microsoft/en-us/status/history/?trackingId=5GP8-W0G.
We are working to improve the reliability of our annotations for jobs impacted by regional issues and are adding system log notifications as an additional customer communication channel alongside annotations.
VNet Failover is also now in public preview, allowing customers to evacuate Larger Hosted Runners using VNet injection in cases like this.
Apr 25, 12:36 AM UTC
This is related to the public impact, "Multiservice impact for Azure Workloads in East US" shared at https://azure.status.microsoft/
Apr 24, 7:14 PM UTC
We are investigating reports of degraded performance for Larger Runners with vnet injection in East US and we are working with our service provider on mitigation.
Apr 24, 7:09 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 24, 7:02 PM UTC
On April 23, 2026, between 16:05 UTC and 20:43 UTC, the Pull Requests service experienced a regression affecting merge queue operations. PRs merged via merge queue using the squash merge method produced incorrect merge commits when the merge group contained more than one PR. In affected cases, changes from previously merged PRs and prior commits were inadvertently reverted by subsequent merges.
During the impact window 2,092 pull requests were affected. The issue did not affect pull requests merged outside of merge queue, nor merge queue groups using the merge or rebase methods.
It took approximately 3 hours and 33 minutes to identify the issue. The change completed deployment at approximately 16:05 UTC, and we became aware at 19:38 UTC following an increase in customer support inquiries. Because the issue affected merge commit correctness rather than availability, it was not detected by existing automated monitoring and was identified through customer reports.
The regression was introduced by a new code path that adjusted merge base computation for merge queue ref updates. This code path was intended to be gated behind a feature flag for an unreleased feature, but the gating was incomplete.
As a result, the new behavior was inadvertently applied to squash merge groups, producing an incorrect three-way merge. This caused subsequent squash merges to revert changes from earlier pull requests and, in some cases, changes between their starting points.
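To show how a wrong merge base can revert earlier work, here is a minimal sketch of a file-level three-way merge. It is a toy model, not Git's algorithm or the merge queue code, and the summary does not say which commit was actually used as the base; the scenario below simply picks one incorrect base to demonstrate the mechanism.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots (path -> content) relative to a common base snapshot."""
    merged = {}
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            value = o          # both sides agree
        elif o == b:
            value = t          # only "theirs" changed relative to base
        elif t == b:
            value = o          # only "ours" changed relative to base
        else:
            raise ValueError(f"conflict in {path}")
        if value is not None:
            merged[path] = value
    return merged


# Hypothetical merge group: PR 1 edits a.txt, PR 2 edits b.txt, both branched from main.
main = {"a.txt": "v1", "b.txt": "v1"}
queue_tip_after_pr1 = {"a.txt": "v2", "b.txt": "v1"}
pr2_head = {"a.txt": "v1", "b.txt": "v2"}

# Correct base (the commit PR 2 branched from): both changes survive.
good = three_way_merge(main, queue_tip_after_pr1, pr2_head)

# Wrong base (here, the queue tip itself): PR 2's stale a.txt looks like a
# deliberate change, so PR 1's edit is silently reverted.
bad = three_way_merge(queue_tip_after_pr1, queue_tip_after_pr1, pr2_head)
```

With a single-PR group there is no earlier merge in the group to revert, which is consistent with the defect appearing only when the merge group contained more than one PR.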
We mitigated the incident by reverting the code change and force-deploying the fix across all environments. After resolution, we identified affected repositories and sent targeted remediation instructions to repository administrators with step-by-step recovery guidance.
The regression was not identified during internal validation. Existing test coverage primarily exercised single-PR merge queue groups, which did not exhibit the faulty base-reference calculation. Because automated checks did not validate merge correctness for multi-PR squash groups, the defect surfaced only in production.
To prevent recurrence, GitHub is expanding test coverage for merge correctness validation. We are broadening automated coverage for merge queue operations, including regression checks that validate resulting Git contents across supported configurations, so issues affecting merge correctness are caught before reaching production.
We are committed to ensuring the correctness and reliability of merge queue operations. These actions will reduce the risk of similar regressions and improve confidence in future changes to the Pull Requests service.
Apr 23, 9:43 PM UTC
We have resolved a regression present when using merge queue with either squash merges or rebases. If you use merge queue in this configuration, some pull requests may have been merged incorrectly between 2026-04-23 16:05-20:43 UTC.
This behavior is still present in GitHub Enterprise Cloud with Data Residency, and we are rolling out the same fix.
Apr 23, 9:18 PM UTC
Pull Requests is operating normally.
Apr 23, 8:47 PM UTC
We have identified a regression in merge queue behavior present when squash merging or rebasing. We have identified the root-cause and are in the process of reverting the change.
Apr 23, 7:58 PM UTC
We are investigating reports of degraded performance for Pull Requests
Apr 23, 7:50 PM UTC
Between 18:45 and 19:42 UTC on April 23, users were unable to start new agent tasks using either the Claude or Codex agent on github.com. This was caused by a code change to how Copilot mission control routes task creation requests. Ongoing agent tasks and other Copilot agent features were not affected. We mitigated the impact by reverting the breaking change. We are adding extra monitoring and integration test coverage for the task creation path to prevent future recurrence.
Apr 23, 7:42 PM UTC
We have identified the root cause of the issue and are working on mitigation.
Apr 23, 7:33 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 23, 7:28 PM UTC
On April 23, 2026, between 16:03 UTC and 17:27 UTC, multiple GitHub services experienced elevated error rates and degraded performance due to DNS resolution failures originating from our DNS infrastructure in our VA3 datacenter. Approximately 5–7% of overall traffic was affected during the impact window:
- Webhooks: ~0.35% of API requests returned 5xx (peak ~0.39%). ~0.88% of requests exceeded 3s latency; at peak, >3s responses represented ~10% of Webhooks API traffic.
- Copilot Metrics: ~9% of Copilot Insights dashboard requests returned 5xx.
- Copilot cloud agents: ~10% of cloud agent sessions were affected and failing.
- Octoshift: 0.88% of active repo migrations failed and 79% saw elevated durations (avg. 5.2 min) during this period.
- Git Operations: averaged 1.25% errors over the duration of the incident, with a peak of 2.07% errors.
- Actions: Workflow run status updates experienced delays of up to ~8s over the duration of the incident window.
Our DNS infrastructure in VA3 entered a degraded state and began intermittently returning NXDOMAIN responses and timing out on lookups for both internal service discovery and external endpoints. This caused a cascading impact across the dependent services listed above.
We identified a specific load pattern under which our DNS resolvers began failing. The evidence points to a recently introduced traffic-balancing mechanism, rolled out progressively to support our growth, as the root cause. We have since reverted this change.
We are immediately prioritizing investments in a more controlled rollout and validation process, including a dedicated environment to safely shadow production DNS traffic and detect these failure modes before they can affect production.
Apr 23, 5:30 PM UTC
Webhooks is operating normally.
Apr 23, 5:10 PM UTC
Many services are mitigated and we are validating the remaining services.
Apr 23, 5:04 PM UTC
The degradation affecting Actions and Copilot has been mitigated. We are monitoring to ensure stability.
Apr 23, 5:03 PM UTC
We have identified the root problem and are working on mitigation.
Apr 23, 4:52 PM UTC
Actions is experiencing degraded performance. We are continuing to investigate.
Apr 23, 4:34 PM UTC
We are investigating multiple unavailable services.
Apr 23, 4:19 PM UTC
We are investigating reports of degraded availability for Copilot and Webhooks
Apr 23, 4:12 PM UTC
On April 23, 2026 between 14:30 UTC and 15:18 UTC multiple services were degraded on github.com. During this time approximately 1.5% of all web requests resulted in a 5xx status and unicorn pages for github.com users. We also saw elevated error rates across Actions workflow runs, Copilot, Codespaces and Packages, leading to degraded experiences during this timeframe. Codespaces impact peaked at 45% failures for create requests and 65% failures for resume requests. Packages impact was mainly Maven related with 50% failure rates in downloads and 70% failure rates in uploads. Actions experienced a peak of 8% of failed jobs and up to 85% of jobs impacted by run start delays of more than 5 minutes.
This was due to a configuration change to an internal billing service that led to a cache being overwhelmed and causing requests to time out. These timeouts cascaded across multiple services and eventually caused requests to queue up and exhaust web request workers.
This configuration change was reverted at 14:42 UTC and following this, all services began to see recovery immediately.
To prevent this situation in the future, we are taking steps to ensure that failures and timeouts in the billing service don’t cascade to other services causing impact. This includes implementing more aggressive timeouts on callers of these billing services, adding circuit breaker configurations for cache timeouts and using more resilient cache options. We have also decreased max request timeouts within the billing service that caused impact and added more capacity to our cache to prevent traffic spikes from having the same impact.
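As an illustration of the circuit breaker idea mentioned above, here is a minimal sketch. It is not the implementation used in production; the class name, the thresholds, and the injectable `clock` parameter are all assumptions made for the example. After a run of timeouts the breaker stops calling the dependency for a cool-down period, so callers fail fast instead of queueing behind a slow cache.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    # max_failures and reset_after are illustrative values, not real settings.
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast: do not touch the struggling dependency.
                raise CircuitOpenError("skipping call; breaker is open")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except TimeoutError:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        return result
```

A caller would wrap each cache or billing lookup in `breaker.call(...)` and treat `CircuitOpenError` as a signal to serve a default or degraded response.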
Apr 23, 3:18 PM UTC
The degradation affecting Actions, Codespaces, Copilot and Packages has been mitigated. We are monitoring to ensure stability.
Apr 23, 3:02 PM UTC
A mitigation was applied and services have recovered. Actions is working through queued work before fully recovering.
Apr 23, 3:02 PM UTC
Users are experiencing errors loading various web pages on github.com. Actions and Copilot Cloud Agent runs will be delayed.
Apr 23, 2:51 PM UTC
Copilot is experiencing degraded performance. We are continuing to investigate.
Apr 23, 2:44 PM UTC
Codespaces is experiencing degraded performance. We are continuing to investigate.
Apr 23, 2:42 PM UTC
Packages is experiencing degraded performance. We are continuing to investigate.
Apr 23, 2:41 PM UTC
We are investigating reports of degraded performance for Actions.
Apr 23, 2:40 PM UTC
On April 22, 2026, between 09:00 UTC and 22:05 UTC, the Copilot coding agent and pull request comment event processing were degraded. During this period, approximately 0.5% of all pull request and issue comments (~23,000 invocations) mentioned @copilot and explicitly requested work from the Copilot coding agent but were not acted upon.
Creating, viewing, and replying to pull request comments was unaffected, and other Copilot functionality continued to operate normally. The impact was limited to @copilot mentions on pull request comments not triggering Copilot coding agent runs, and to some downstream systems not receiving new pull request comment events during the impact window.
The cause was a serialization error that prevented pull request comment events from being published to downstream consumers, including the Copilot coding agent. This was the same class of issue as incident #4295 on April 20, affecting another event type.
We mitigated the incident by deploying a fix that restored event publishing, after which the Copilot coding agent and other downstream consumers resumed processing pull request comment events normally.
We are working to complete our audit of related event schemas, migrate remaining consumers to use the updated identifier fields, and improve monitoring to detect drops in publishing on critical event topics, to reduce our time to detection and mitigation of issues like this one in the future.
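To make the idea of monitoring for drops in publishing concrete, here is a rough sketch that compares each interval's publish count against a rolling baseline. The class name, window size, and drop ratio are hypothetical choices for illustration and do not describe the monitoring that is actually deployed.

```python
from collections import deque


class PublishDropDetector:
    # window: number of recent intervals used as the baseline (assumed value).
    # drop_ratio: alert when a count falls below this fraction of the baseline.
    def __init__(self, window=6, drop_ratio=0.5):
        self.history = deque(maxlen=window)
        self.drop_ratio = drop_ratio

    def observe(self, count):
        """Record one interval's publish count; return True if it looks like a drop."""
        alert = False
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            alert = baseline > 0 and count < baseline * self.drop_ratio
        if not alert:
            # Keep anomalous intervals out of the baseline so a long
            # outage does not become the new normal.
            self.history.append(count)
        return alert
```

One detector per event topic would be fed the number of events published in each fixed interval. A sustained run of `True` results is the kind of signal that could page an on-call engineer well before downstream consumers report missing events.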
Apr 22, 10:43 PM UTC
We have identified the root cause of the disruption affecting Copilot Coding Agent and Issues. A fix is being deployed.
Apr 22, 10:09 PM UTC
We have identified the root cause of the disruption affecting Copilot Coding Agent and Issues. Copilot @-mentions on pull requests are not being processed, and some issue-related functionality may be degraded. A fix has been developed and is being applied.
Apr 22, 8:37 PM UTC
Copilot @-mentions on pull requests are currently not being processed by Copilot Cloud Agent. We have found the issue and are investigating remediations.
Apr 22, 8:02 PM UTC
Issues is experiencing degraded performance. We are continuing to investigate.
Apr 22, 7:55 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 22, 7:53 PM UTC
On April 22, 2026, between 15:16 UTC and 19:18 UTC, users experienced errors when interacting with Copilot Chat on github.com and Copilot Cloud Agent. During this time, users were unable to use Copilot Chat or Copilot Cloud Agent. Copilot Memory (in preview) was not available to Copilot agent sessions during this time. The issue was caused by an infrastructure configuration change that resulted in connectivity issues with our databases. The team identified the cause and restored connectivity to the database. Copilot Chat and Cloud Agent for github.com were restored by 18:16 UTC. Remaining regional deployments were restored incrementally, with full resolution at 19:18 UTC. We have taken steps to prevent similar infrastructure changes from causing these kinds of database connectivity issues in the future.
Apr 22, 7:18 PM UTC
Copilot cloud agent and chat are mitigated for github.com.
Apr 22, 6:05 PM UTC
We are now seeing recovery for Copilot cloud agent.
Apr 22, 5:49 PM UTC
Mitigation is progressing for Copilot chat and cloud agent recovery.
Apr 22, 5:40 PM UTC
Mitigation is progressing for Copilot chat and cloud agent.
Apr 22, 4:58 PM UTC
We continue to work on mitigation for Copilot chat and cloud agent.
Apr 22, 4:24 PM UTC
We are aware of users seeing errors interacting with Copilot chat on github.com and Copilot cloud agent. We have identified the cause and are investigating remediations.
Apr 22, 3:43 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 22, 3:35 PM UTC
On April 21, 2026, between 13:35 UTC and 01:24 UTC the following day the projects service was degraded. During this time period, projects may have been out of sync and users may have experienced delays in changes to projects and their items. Delays in reflected changes peaked at approximately 45 minutes. The delays were caused by serialization errors that failed events and triggered a flood of resyncs, overloading our event processing layers.
We mitigated the incident by speeding up processing time for incoming changes and otherwise waiting for all changes to be processed.
We are working to increase our capacity for processing updates to projects to reduce our time to mitigation of issues like this one in the future.
Apr 22, 1:24 AM UTC
The issue remains mitigated. Users may still experience small delays in changes to projects while we process the backlog of events. We expect a full recovery in approximately two hours.
Apr 22, 12:00 AM UTC
The issue remains mitigated. Users may still experience delays in changes to projects while we process the backlog of events. We expect a full recovery in approximately three hours.
Apr 21, 10:49 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Apr 21, 9:15 PM UTC
Recovery from the delays affecting GitHub Projects continues to progress. We have deployed additional mitigations that are accelerating processing of the backlog. Users may still experience delays where changes to projects are not reflected immediately. We expect full recovery within approximately six hours.
Apr 21, 7:41 PM UTC
The queues are continuing to decrease and we are working to accelerate the rate of processing through the queues.
Apr 21, 5:21 PM UTC
The mitigation is deployed and we are seeing recovery in the queues. We will provide an update on when full recovery is expected.
Apr 21, 4:45 PM UTC
We are deploying a fix to relieve the queue of delayed data. Some users may still experience delays with GitHub Projects where changes are not reflected immediately as remaining backlogs are processed.
Apr 21, 4:18 PM UTC
We continue to investigate delays with GitHub Projects where changes may not be reflected immediately. Our team has identified the cause and applied mitigations to address the issue. We are seeing initial signs of recovery, though some delays may persist as the system works through a backlog of pending updates.
Apr 21, 3:42 PM UTC
We are investigating reports of delays with GitHub Projects. Users may notice that changes made to projects are not reflected immediately. Our team has identified the source of the delays and is actively working to resolve the issue.
Apr 21, 3:07 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 21, 3:03 PM UTC
On April 20, 2026, between 10:28 UTC and 15:04 UTC, GitHub experienced degraded service for code scanning default setup, code quality, and project boards. Repair of affected project boards additionally lasted until April 21, 05:04 UTC.
During this time, code scanning default setup and code quality analyses were not triggered on newly opened pull requests. Additionally, newly created issues were not appearing on project boards.
The cause was a serialization error that prevented proper triggering of code scanning, code quality analyses, and project board updates.
We mitigated the issue by deploying a fix, restoring event publishing for code scanning and code quality. For project boards, an additional code change was deployed to update event consumers, followed by a reindex of affected project items.
We are working to prevent recurrence by strengthening our schema validations and improving monitoring for drops in publishing on critical hydro topics.
Apr 21, 5:04 AM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Apr 21, 4:18 AM UTC
The issue remains mitigated. Issues that were linked to projects during the incident may take approximately three more hours to render correctly while we complete a re-index.
Apr 21, 3:10 AM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Apr 20, 9:36 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Apr 20, 6:21 PM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Apr 20, 6:20 PM UTC
The issue has been mitigated. Newly created issues linked to projects should now function as expected. Issues that were linked to projects during the incident may take approximately five hours to render correctly while we complete a re-index.
Apr 20, 6:20 PM UTC
A deployment to fix this issue of new issues not showing up in projects is underway.
Apr 20, 6:08 PM UTC
We continue to work on mitigation regarding new issues not showing on project boards.
Apr 20, 5:32 PM UTC
We continue to work on mitigation regarding new issues not showing on project boards.
Apr 20, 4:48 PM UTC
Code scanning default setup and Code Quality triggers are back up and running. PRs not processed before or during this incident will require a new push to trigger code scanning or code quality analysis.
We are seeing problems with new issues not showing on project boards and are working on mitigation.
Apr 20, 4:16 PM UTC
We are continuing to work on a mitigation to unblock code scanning default setup and code quality features on pull requests.
Apr 20, 3:20 PM UTC
We are currently deploying mitigations that should unblock code scanning default setup and code quality features on pull requests.
Apr 20, 2:38 PM UTC
We are actively working to mitigate an issue affecting code scanning default setup and code quality features on pull requests. Users may experience pull request code scanning and code quality analyses not being triggered on new pull requests. Our engineering team has identified the root cause and is working on mitigating the issue.
Apr 20, 1:57 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 20, 1:28 PM UTC
On April 17, 2026, between 14:46 UTC and 15:12 UTC, users experienced a degraded web experience on GitHub.com. During this time, approximately 1.5% of web requests resulted in errors, with some users encountering slow page loads or failed requests. The issue was caused by capacity saturation of a caching component in one of our data center regions. We mitigated the issue by redirecting traffic to an unaffected region and rolling back a recent deployment. The incident was fully resolved at 15:18 UTC. We are taking steps to provide appropriate capacity for this caching path to prevent recurrence.
Apr 17, 3:18 PM UTC
The degradation affecting Issues has been mitigated. We are monitoring to ensure stability.
Apr 17, 3:18 PM UTC
We have isolated a problematic component in our infrastructure and are working to mitigate. We will continue to post updates as we work toward resolution.
Apr 17, 3:08 PM UTC
We are experiencing an issue that impacts approximately 10% of traffic to the web, resulting in slow and failed calls. We are investigating and will continue to post updates as we work toward mitigation.
Apr 17, 2:57 PM UTC
Issues is experiencing degraded performance. We are continuing to investigate.
Apr 17, 2:56 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 17, 2:56 PM UTC
On April 16, 2026 between 09:30 UTC and 17:15 UTC, users experienced failures when attempting to connect to GitHub Codespaces via the VS Code editor. During this time, approximately 40% of codespace start operations failed. Users connecting via SSH were not impacted.
The issue was caused by a failure in an upstream download service that prevented the VS Code Server from being retrieved during codespace startup. The impact was mitigated by implementing a workaround to use an alternative download path when the primary endpoint is degraded.
We are working with the upstream dependency to address the root cause of the download service failure, and we are improving our fallback mechanisms to reduce the impact of similar upstream failures in the future.
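The fallback behavior described above can be sketched roughly as follows. This is an illustrative example only: `fetch_with_fallback`, the `fetch` callable, and any URLs passed to it are hypothetical and do not reflect the actual Codespaces download logic.

```python
def fetch_with_fallback(sources, fetch):
    """Try each download source in order and return (source, payload).

    sources: ordered list of URLs, primary endpoint first (assumed input).
    fetch:   callable that downloads one URL and raises OSError on failure.
             Network errors such as urllib.error.URLError subclass OSError.
    """
    errors = []
    for url in sources:
        try:
            return url, fetch(url)
        except OSError as exc:
            # Remember why this source failed, then try the next one.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all download sources failed: " + "; ".join(errors))
```

Returning the source that succeeded alongside the payload makes it easy to record how often the alternative path is used, which is itself a useful signal that the primary endpoint is degraded.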
Apr 16, 6:28 PM UTC
The degradation affecting Codespaces has been mitigated. We are monitoring to ensure stability.
Apr 16, 6:22 PM UTC
Our provider is implementing a mitigation and we are seeing signs of recovery.
Apr 16, 4:37 PM UTC
We found an issue that impacts 70% of Codespaces. We are engaged with the provider and working towards mitigation.
Apr 16, 3:49 PM UTC
Codespaces is experiencing degraded availability. We are continuing to investigate.
Apr 16, 3:41 PM UTC
We are experiencing degraded performance in Codespaces related to creating a new Codespace or starting an existing Codespace from the VS Code editor. SSH connections to Codespaces are not impacted. We are working toward mitigation and will continue to keep you updated on progress.
Apr 16, 3:08 PM UTC
We are investigating reports of degraded performance for Codespaces.
Apr 16, 3:06 PM UTC
On April 14, between 00:58 UTC and 06:08 UTC, GitHub Enterprise Cloud customers experienced 500 errors when attempting to access Copilot Insights pages. This was caused by an authentication failure in our metrics pipeline. We fully mitigated the issue and validated the fix in production. Approximately 709 users were impacted. The total impact duration was approximately 5 hours and 10 minutes.
Our investigation determined the incident was caused by a change in a tenant credential, which caused authentication errors when retrieving the data required by our Copilot Insights pages.
We understand this disruption impacted customers' ability to access the Copilot Insights page. To prevent similar issues and reduce resolution time in the future, we are investing in improved diagnostics tooling to quickly identify the root cause of failures, enhanced monitoring, and alerting to detect issues at a more granular level.
GitHub is critical infrastructure for your work, your teams, and your businesses. We are focused on these remediations and continued reliability improvements for Copilot Insights and related metrics experiences.
Apr 14, 6:08 AM UTC
This incident has been resolved. We will continue to monitor to ensure stability. Thank you for your patience and understanding as we addressed this issue.
Apr 14, 6:07 AM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Apr 14, 6:07 AM UTC
We identified an issue that impacts the Copilot Dashboard on the Insights tab and are working on mitigation. We will continue to keep you updated on progress.
Apr 14, 4:40 AM UTC
The team continues to investigate issues with accessing the Copilot Dashboard on the Insights tab. We will continue providing updates on the progress towards mitigation.
Apr 14, 3:47 AM UTC
The Copilot Dashboard on the Insights tab is not accessible and we are continuing to investigate.
Apr 14, 2:40 AM UTC
Degradation of Service - Insights Page
Apr 14, 2:37 AM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 14, 1:57 AM UTC
On April 13, 2026, between 18:53 UTC and 20:30 UTC, the GitHub Pages service experienced elevated error rates. On average, the error rate was 10.58% and peaked at 12.77% of requests to the service, resulting in approximately 17.5 million failed requests returning HTTP 500 errors. This was due to an automated DNS management tool (octodns) erroneously deleting a DNS record for a Pages backend storage host after its upstream data source intermittently failed to return the record, causing the tool to treat it as stale and remove it.
We mitigated the incident by re-creating the deleted DNS record. To prevent future incidents, we are implementing availability-zone-tolerant routing in the Pages frontend so that an unresolvable backend host triggers failover to healthy hosts rather than returning errors, adding safeguards to prevent automated deletion of DNS records owned by other systems, and improving logging and alerting for DNS resolution failures in the Pages serving path.
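One way to picture a deletion safeguard of this kind is the sketch below. It is a generic illustration, not octodns configuration or code from the Pages fix. The `DeletionGuard` class, the ownership labels, and the rule of requiring several consecutive misses before deleting are assumptions introduced for the example.

```python
class DeletionGuard:
    # owner: label of the system this tool manages records for (assumed).
    # misses_required: consecutive syncs a record must be absent from the
    # source before it may be deleted (assumed value).
    def __init__(self, owner, misses_required=3):
        self.owner = owner
        self.misses_required = misses_required
        self.misses = {}

    def plan(self, existing, desired):
        """Return the record names that are safe to delete on this sync.

        existing: dict mapping record name -> owning system.
        desired:  set of record names returned by the upstream source.
        """
        deletions = []
        for name, record_owner in existing.items():
            if name in desired:
                # The source returned the record again: reset its miss count.
                self.misses.pop(name, None)
                continue
            if record_owner != self.owner:
                # Never delete a record that belongs to another system.
                continue
            self.misses[name] = self.misses.get(name, 0) + 1
            if self.misses[name] >= self.misses_required:
                deletions.append(name)
        return sorted(deletions)
```

With a guard like this, a single intermittent failure of the upstream source would not by itself remove a record, and records owned by other systems would be skipped entirely.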
Apr 13, 8:35 PM UTC
We have mitigated the issue with Pages.
Apr 13, 8:32 PM UTC
The degradation affecting Pages has been mitigated. We are monitoring to ensure stability.
Apr 13, 8:30 PM UTC
We are investigating reports of issues with Pages. We will continue to keep users updated on progress towards mitigation.
Apr 13, 7:57 PM UTC
We are investigating reports of degraded availability for Pages.
Apr 13, 7:56 PM UTC
On April 13, 2026, between 14:41 UTC and 17:29 UTC, the Copilot service experienced degraded performance. All Copilot users were impacted by increased latency, and approximately 20% experienced request failures when interacting with Copilot Cloud Agent (CCA). On average, request latency increased to approximately 950ms. The GitHub User Dashboard also displayed intermittent errors loading Copilot quota information. CCA and the User Dashboard were impacted for approximately 2 hours and 56 minutes.
This was due to an infrastructure change that reduced the available compute capacity for a backend service responsible for Copilot rate limiting and quota management. The reduced capacity caused resource exhaustion under normal traffic load, leading to cascading failures in downstream request processing.
We mitigated the incident by increasing compute resources allocated to the affected service and scaling out the number of service instances to distribute load more effectively.
We are working to improve proactive capacity monitoring to detect resource degradation before it impacts users, reviewing retry and timeout configurations across dependent services to reduce amplification during degraded states, and evaluating connection management strategies to improve resilience under constrained resources.
Apr 13, 5:40 PM UTC
We have identified the root cause and are rolling out a fix for Copilot. The services should now be in recovery, with expected full recovery in 5 to 10 minutes.
Apr 13, 4:59 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 13, 4:41 PM UTC
On April 9, 2026, between 22:59 UTC and April 10, 2026, 13:24 UTC, the Copilot Mission Control service was degraded and did not display Claude and Codex Cloud Agent sessions in the agents tab dashboard. Customers were unable to see, list, or manage their third party agent sessions during this period. The underlying agent sessions continued to function normally. This was a visibility and management issue only, and no HTTP errors were generated. The API returned successful responses with incomplete results, with an average error rate of 0% and a maximum error rate of 0%. This was due to a code change that introduced a filter which inadvertently excluded third party agent sessions.
We mitigated the incident by reverting the problematic code change and deploying the fix to production.
We are working to add automated monitoring for dashboard content visibility and improve integration test coverage for third party agent session listing to reduce our time to detection and mitigation of issues like this one in the future.
Apr 10, 1:28 PM UTC
We are investigating third party Claude and Codex Cloud Agent sessions not being listed in the agents tab dashboard.
Apr 10, 1:08 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 10, 1:07 PM UTC
On April 9, 2026, between 16:05 UTC and 20:36 UTC, the Copilot cloud agent service was degraded, causing new agent sessions to be delayed or fail to start. Users who attempted to start Copilot cloud agent sessions during this period experienced jobs getting stuck in the queue, with wait times peaking at 54 minutes compared to the normal 15–40 seconds. On average, approximately 84% of requests to start agent sessions failed, peaking at 97.5% during the worst period.
This was due to an internal service exceeding API rate limits, compounded by a caching bug that persisted the rate-limited state beyond the actual rate limit window, causing recurring outage waves rather than a single recovery.
We mitigated the incident by deploying a configuration change to bypass the affected cache and shifting API traffic to an alternative authentication path that reduced rate limit exposure. We have since added automated monitoring and alerting for this failure mode, deployed per-endpoint rate limit controls, and added caching for high-traffic API calls to reduce overall load. We are also working on longer-term improvements to rate limit isolation and traffic management to prevent similar issues in the future.
This incident shared the same underlying root causes as an incident declared in the same time frame: https://www.githubstatus.com/incidents/zn1t56bfxdzg
Apr 09, 8:36 PM UTC
We continue to investigate periodic delays in Copilot Cloud Agent job processing.
Apr 09, 7:52 PM UTC
We are continuing to investigate Copilot Cloud Agent job delays.
Apr 09, 6:57 PM UTC
Copilot Cloud Agent jobs are being processed and we are monitoring recovery.
Apr 09, 5:48 PM UTC
We are investigating delays processing Copilot Cloud Agent jobs.
Apr 09, 4:57 PM UTC
We are experiencing an issue where Copilot coding agent jobs are delayed in starting.
Apr 09, 4:20 PM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 09, 4:20 PM UTC
On April 9, 2026, between 09:05 UTC and 19:05 UTC, the Copilot coding agent service was degraded and users experienced significant delays starting new agent sessions. Approximately 84% of new agent session requests were delayed across four separate outage waves, with queue wait times peaking at 54 minutes compared to a normal baseline of 15–40 seconds. On average, the error rate was 83.9% and peaked at 97.5% of requests to the service. Approximately 22,700 workflow creations were delayed or failed during the incident.
This was due to a bug in our rate limiting logic that incorrectly applied a rate limit globally across all users, rather than scoping it to the individual installation that triggered the limit. A contributing factor was a surge in API traffic from a client update that increased requests to an internal endpoint by 3–4x, which accelerated rate limit exhaustion.
We mitigated the incident by disabling the faulty rate limit caching mechanism via feature flag and updating our service to use per-installation credentials for API calls, ensuring rate limits are correctly scoped to individual installations.
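To illustrate the difference between a globally applied limit and one scoped per installation, here is a minimal sketch. The `RateLimitState` class, its method names, and the `reset_at` timestamp are hypothetical and are not taken from the service itself. The cached "limited" state is keyed by installation and is discarded as soon as the rate limit window has passed.

```python
import time


class RateLimitState:
    def __init__(self, clock=time.time):
        self.clock = clock
        # installation id -> timestamp at which its rate limit resets
        self.blocked_until = {}

    def mark_limited(self, installation_id, reset_at):
        """Record that one installation hit its limit until reset_at."""
        self.blocked_until[installation_id] = reset_at

    def is_limited(self, installation_id):
        """Return True only for the installation that hit the limit,
        and only until its reset time has passed."""
        reset_at = self.blocked_until.get(installation_id)
        if reset_at is None:
            return False
        if self.clock() >= reset_at:
            # The window has passed: drop the stale entry so the
            # limited state cannot outlive the actual rate limit.
            del self.blocked_until[installation_id]
            return False
        return True
```

Keying the state by installation means one busy installation cannot block requests for everyone else, and checking the reset time on every read keeps a cached entry from persisting past the window it describes.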
We have since added automated monitoring and alerting to detect this failure mode proactively, deployed fixes to reduce unnecessary API traffic through caching improvements, and are continuing work to further isolate rate limit scoping across client types to prevent similar issues in the future.
This incident shared the same underlying root causes as an incident declared in the same time frame: https://www.githubstatus.com/incidents/2rqwxl8y7m0j
Apr 09, 10:15 AM UTC
The degradation has been mitigated. We are monitoring to ensure stability.
Apr 09, 10:15 AM UTC
We are investigating an issue affecting GitHub Copilot coding agent. Users may experience significant delays when starting new agent sessions, with jobs remaining queued longer than expected. Our team has identified increased load as a contributing factor and is actively working to restore normal performance.
Apr 09, 9:57 AM UTC
We are investigating reports of impacted performance for some GitHub services.
Apr 09, 9:50 AM UTC