This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
Jul 07, 10:34 PM UTC
We are investigating reports of degraded performance for the Copilot Coding Agent service
Jul 07, 10:31 PM UTC
We are currently investigating this issue.
Jul 07, 10:29 PM UTC
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
Jul 07, 10:22 PM UTC
We are investigating reports of degraded performance for larger runners.
Jul 07, 10:07 PM UTC
We are currently investigating this issue.
Jul 07, 10:03 PM UTC
On 7/3/2025, between 3:22 AM and 7:12 AM UTC, customers were prevented from SSO authorizing Personal Access Tokens and SSH keys via the GitHub UI. Approximately 1300 users were impacted.
A code change modified the content type of the response returned by the server, causing a lazily-loaded dropdown to fail to render and preventing users from proceeding with authorization. No authorization systems were impacted during the incident, only the UI component. We mitigated the incident by reverting the code change that introduced the problem.
We are making improvements to our release process and test coverage to catch this class of error earlier in our deployment pipeline. Further, we are improving monitoring to reduce our time to detection and mitigation of issues like this one in the future.
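For illustration only, a minimal sketch of the kind of regression test that could catch this class of error: the render function, response fixtures, and expected content type below are hypothetical, not GitHub's actual code.

    # Hypothetical regression test for the failure mode above: the function,
    # fixtures, and content-type expectation are illustrative assumptions.
    import unittest
    from unittest.mock import Mock

    def render_authorization_dropdown(response):
        """Render the lazily-loaded SSO authorization dropdown.

        The client only renders the fragment when the server replies with an
        HTML content type; anything else is treated as a failed load.
        """
        if response.headers.get("Content-Type", "").startswith("text/html"):
            return response.body
        raise ValueError("unexpected content type; dropdown not rendered")

    class DropdownContentTypeTest(unittest.TestCase):
        def test_html_fragment_renders(self):
            resp = Mock(headers={"Content-Type": "text/html; charset=utf-8"},
                        body="<ul>...</ul>")
            self.assertEqual(render_authorization_dropdown(resp), "<ul>...</ul>")

        def test_changed_content_type_is_caught(self):
            # The incident's failure mode: a code change flipped the content
            # type, so the dropdown silently failed to render in production.
            resp = Mock(headers={"Content-Type": "application/json"}, body="{}")
            with self.assertRaises(ValueError):
                render_authorization_dropdown(resp)

    if __name__ == "__main__":
        unittest.main()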
Jul 03, 7:12 AM UTC
The rollback has been deployed successfully on all environments. Customers should now be able to SSO authorize their Classic Personal Access Tokens and SSH keys on their GitHub organizations.
Jul 03, 7:11 AM UTC
The rollback of the change that caused the rendering bug preventing customers from SSO authorizing Personal Access Tokens and SSH keys has started rolling out. We are monitoring the rollback as it deploys.
Jul 03, 6:46 AM UTC
We have identified the root cause for the rendering bug that prevented customers from SSO authorizing Personal Access Tokens and SSH keys. The changes that caused the issue are being rolled back.
Jul 03, 6:07 AM UTC
We are investigating an issue with SSO authorizing Classic Personal Access Tokens and SSH keys.
Jul 03, 5:45 AM UTC
We are currently investigating this issue.
Jul 03, 5:39 AM UTC
On July 2, 2025, between 01:35 UTC and 16:23 UTC, the GitHub Enterprise Importer (GEI) migration service experienced degraded performance and slower-than-normal migration queue processing times. The incident was triggered by a migration that included an abnormally large number of repositories, which overwhelmed the queue and slowed processing for all migrations.
We mitigated the incident by removing the problematic migrations from the queue. Service was restored to normal operation as the queue volume was reduced.
To ensure system stability, we have introduced additional concurrency controls that limit the number of queued repositories per organization migration, helping to prevent similar incidents in the future.
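As an illustrative sketch of the concurrency control described above (the queue implementation and the per-organization limit value are assumptions, not GEI internals):

    # Illustrative sketch only: GEI's internal queueing is not public, and the
    # MigrationQueue class and per-organization limit below are assumptions.
    from collections import defaultdict, deque

    MAX_QUEUED_REPOS_PER_ORG = 100  # assumed limit, not GitHub's actual value

    class MigrationQueue:
        def __init__(self):
            self._queue = deque()
            self._queued_per_org = defaultdict(int)

        def enqueue(self, org, repo):
            """Queue a repository migration unless the org is at its cap."""
            if self._queued_per_org[org] >= MAX_QUEUED_REPOS_PER_ORG:
                return False  # caller retries later instead of flooding the queue
            self._queued_per_org[org] += 1
            self._queue.append((org, repo))
            return True

        def dequeue(self):
            """Pop the next migration and release its organization's slot."""
            org, repo = self._queue.popleft()
            self._queued_per_org[org] -= 1
            return org, repo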
Jul 02, 4:23 PM UTC
We're down to a healthy level of queued migrations, and the system is processing migrations at normal concurrency levels.
Jul 02, 4:23 PM UTC
Repository migrations are experiencing delayed processing times. Mitigation has been implemented and migration times are recovering.
Jul 02, 4:14 PM UTC
We are currently investigating this issue.
Jul 02, 4:09 PM UTC
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
Jul 02, 10:16 AM UTC
We are no longer experiencing degradation—Claude Sonnet 4 is once again available in Copilot Chat and across IDE integrations.
We will continue monitoring to ensure stability, but mitigation is complete.
Jul 02, 10:16 AM UTC
We are experiencing degraded availability for the Claude Sonnet 4 model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue. Other models are available and working as expected.
Jul 02, 9:55 AM UTC
We are currently investigating this issue.
Jul 02, 9:54 AM UTC
On June 30th, 2025, between approximately 18:20 and 19:55 UTC, the Copilot service experienced a degradation of the Claude Sonnet 3.7 model due to an issue with our upstream provider. Users encountered elevated error rates when using Claude Sonnet 3.7. No other models were impacted.
The issue was resolved by a mitigation put in place by our provider. GitHub is working with our provider to further improve the resiliency of the service to prevent similar incidents in the future.
Jun 30, 7:55 PM UTC
The issues with our upstream model provider have been resolved, and Claude Sonnet 3.7 is once again available in Copilot Chat and across IDE integrations (VS Code, Visual Studio, JetBrains).
We will continue monitoring to ensure stability, but mitigation is complete.
Jun 30, 7:55 PM UTC
We are experiencing degraded availability for the Claude 3.7 Sonnet model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue.
Other models are available and working as expected.
Jun 30, 7:14 PM UTC
We are currently investigating this issue.
Jun 30, 7:13 PM UTC
Due to a degradation of one instance of our internal message delivery service, a percentage of jobs started between 19:18 UTC and 19:50 UTC on June 30, 2025 failed and are no longer retryable. Runners assigned to these jobs will automatically recover within 24 hours, but deleting and recreating the runner will free up the runner immediately.
Jul 01, 12:21 AM UTC
On June 26, 2025, between 17:10 UTC and 23:30 UTC, around 40% of attempts to create a repository from a template repository failed. The failures were an unexpected result of a gap in testing and observability.
We mitigated the incident by rolling back the deployment.
We are working to improve our testing and automatic detection of errors associated with failed template repository creation.
Jun 26, 11:33 PM UTC
We identified an internal change that was causing errors when creating a repository from a template. This change has now been rolled back, and customers should no longer encounter errors when creating repositories from templates.
Jun 26, 11:32 PM UTC
Users may experience errors when creating a repository from a template. The error message may prompt the user to delete the repository, however this deletion attempt will not be successful. We are investigating the cause of these errors.
Jun 26, 11:05 PM UTC
We are currently investigating this issue.
Jun 26, 11:05 PM UTC
On June 26th, between 14:42 UTC and 18:05 UTC, the GitHub Enterprise Importer (GEI) service was in a degraded state, during which customers of the service experienced extended repository migration durations.
Our investigation found that the combined effect of several database updates resulted in the severe throttling of GEI to preserve overall database health.
We have taken steps to prevent additional impact and are working to implement additional safeguards to prevent similar incidents from occurring in the future.
Jun 26, 6:05 PM UTC
The earlier delays affecting GitHub Enterprise Importer queries and jobs have been resolved, and the service is operating normally.
Thank you for your patience while we investigated and addressed the issue.
Jun 26, 6:04 PM UTC
We're continuing to investigate delays with GitHub Enterprise Importer, including potential delays with queries and jobs.
Next update in 60 minutes.
Jun 26, 4:51 PM UTC
We're continuing to investigate delays with GitHub Enterprise Importer, including potential delays with infrastructure.
Next update in 60 minutes.
Jun 26, 3:19 PM UTC
GitHub Enterprise Importer is experiencing degraded throughput, resulting in significant slowdowns in migration processes and extended wait times for customers.
Jun 26, 2:43 PM UTC
We are currently investigating this issue.
Jun 26, 2:42 PM UTC
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
Jun 24, 12:26 PM UTC
We have identified that the navigation bar is missing on repository-related pages for GitHub Enterprise Cloud with data residency instances, and we are currently attempting a mitigation.
Jun 24, 11:00 AM UTC
We are currently investigating this issue.
Jun 24, 10:55 AM UTC
Between June 19th, 2025 11:35 UTC and June 20th, 2025 11:20 UTC, the GitHub Mobile Android application was unable to log in new users. The iOS app was unaffected.
This was due to a new GitHub App feature being tested internally, which was inadvertently enforced for all GitHub-owned applications, including GitHub Mobile.
A mismatch in client and server expectations due to this feature caused logins to fail. We mitigated the incident by disabling the feature flag controlling the feature.
We are working to improve our time to detection and put in place stronger guardrails that reduce impact from internal testing on applications used by all customers.
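For illustration, a minimal sketch of the mitigation shape: a server-side flag scoped so an internal experiment cannot be enforced for first-party production apps. The flag and app identifiers below are hypothetical, not GitHub's internals.

    # Minimal sketch of the mitigation shape; names are hypothetical.
    INTERNAL_TEST_APP_IDS = {"internal-test-app"}   # assumed allow-list
    new_auth_feature_enabled = False                # the kill switch used to mitigate

    def should_enforce_new_auth_feature(app_id: str) -> bool:
        # Enforce only when the flag is on AND the app is on the internal test
        # allow-list, so disabling the flag immediately restores normal logins
        # for applications such as GitHub Mobile.
        return new_auth_feature_enabled and app_id in INTERNAL_TEST_APP_IDS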
Jun 20, 11:20 AM UTC
We are investigating reports that some users are unable to sign in to the GitHub app on Android. Normal functionality is otherwise available. Our team is actively working to identify the cause.
Jun 20, 10:53 AM UTC
We are currently investigating this issue.
Jun 20, 10:49 AM UTC
On June 18, 2025 between 22:20 UTC and 23:00 UTC the Claude Sonnet 3.7 and Claude Sonnet 4 models for GitHub Copilot Chat experienced degraded performance. During the impact, some users would receive an immediate error when making a request to a Claude model. This was due to upstream errors with one of our model providers, which have since been resolved. We mitigated the impact by disabling the affected provider endpoints to reduce user impact, redirecting Claude Sonnet requests to additional partners.
We are working to update our incident response playbooks for infrastructure provider outages and improve our monitoring and alerting systems to reduce our time to detection and mitigation of issues like this one in the future.
Jun 18, 11:13 PM UTC
We are experiencing degraded availability for the Claude 4 model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue.
Other models are available and working as expected. We recommend using Claude 3.7 as an alternative.
Jun 18, 10:42 PM UTC
Copilot is experiencing degraded performance. We are continuing to investigate.
Jun 18, 10:40 PM UTC
We are currently investigating this issue.
Jun 18, 10:39 PM UTC
On June 18, 2025, between 08:21 UTC and 18:47 UTC, some Actions jobs experienced intermittent failures downloading from the Actions Cache service. During the incident, 17% of workflow runs experienced cache download failures, resulting in a warning message in the logs and performance degradation. The disruption was caused by a network issue in our database systems that led to a database replica getting out of sync with the primary. We mitigated the incident by routing cache download URL requests to bypass the out-of-sync replica until it was fully restored.
To prevent this class of incidents, we are developing capability in our database system to more robustly bypass out-of-sync replicas. We are also implementing improved monitoring to help us detect similar issues more quickly going forward.
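A rough sketch of the bypass pattern described above, routing reads away from a replica whose replication lag exceeds a threshold; the threshold value and the lag-signal interface are assumptions, not GitHub's database layer.

    # Rough sketch of the replica-bypass pattern; names and threshold are illustrative.
    MAX_REPLICATION_LAG_SECONDS = 5

    def pick_read_host(primary, replicas, lag_seconds_by_host):
        """Serve cache-download URL reads from a healthy replica, else the primary."""
        healthy = [host for host in replicas
                   if lag_seconds_by_host.get(host, float("inf")) <= MAX_REPLICATION_LAG_SECONDS]
        return healthy[0] if healthy else primary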
Jun 18, 6:47 PM UTC
We are continuing to roll out a mitigation and are progressing towards having it rolled out for all customers.
Jun 18, 6:11 PM UTC
We are currently deploying a mitigation for this issue and will be rolling it out shortly. We will update our progress as we monitor the deployment.
Jun 18, 5:22 PM UTC
We are actively investigating and working on a mitigation for database instability leading to replication lag in the Actions Cache service. We will continue to post updates on progress towards mitigation.
Jun 18, 5:03 PM UTC
The Actions Cache service is experiencing degradation in a number of regions, causing cache misses when attempting to download cache entries. This is not causing workflow failures, but workflow runtime might be elevated for certain runs.
Jun 18, 4:46 PM UTC
We are currently investigating this issue.
Jun 18, 4:46 PM UTC
On June 18, 2025, between 15:15 UTC and 19:29 UTC, the Issues service was degraded, and certain GraphQL queries accessing the `ReactionGroup.reactors` field returned errors. Our query routing infrastructure was impacted by exceptions from a particular database migration, resulting in errors for an average of 0.0097% of overall GraphQL requests (peaking at 0.02%).
We mitigated the incident by reverting the migration.
We continue to investigate the cause of the exceptions and are holding off on similar migrations until the underlying issue is understood and resolved.
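For reference, a query of roughly the affected shape, reading ReactionGroup.reactors for an issue via the GraphQL API; the repository, issue number, and token handling here are placeholders supplied by the caller.

    # Example of the query shape that was affected: ReactionGroup.reactors.
    import requests

    QUERY = """
    query($owner: String!, $name: String!, $number: Int!) {
      repository(owner: $owner, name: $name) {
        issue(number: $number) {
          reactionGroups {
            content
            reactors(first: 10) { totalCount }
          }
        }
      }
    }
    """

    def fetch_reaction_groups(token, owner, name, number):
        resp = requests.post(
            "https://api.github.com/graphql",
            json={"query": QUERY,
                  "variables": {"owner": owner, "name": name, "number": number}},
            headers={"Authorization": f"Bearer {token}"},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()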
Jun 18, 5:42 PM UTC
We have confirmed that we are currently within SLA for the Issues experience. Remaining cleanup will complete over the next few hours to fully restore the ability to search Issues by reaction as well as related GraphQL API queries.
Jun 18, 5:41 PM UTC
We have confirmed that impact is restricted to failing to display reactions on some issues and searching issues by reaction. Mitigation is in progress to restore these features and should be fully rolled out to all customers in the next few hours.
Jun 18, 5:07 PM UTC
Some users are seeing errors when accessing issues on GitHub. We have identified the problem and are working on a revert to restore full functionality.
Jun 18, 4:25 PM UTC
We are investigating reports of degraded performance for Issues
Jun 18, 4:21 PM UTC
On June 17, 2025, between 19:32 UTC and 20:03 UTC, an internal routing policy deployment to a subset of network devices caused reachability issues for certain network address blocks within our datacenters.
Authenticated users of the github.com UI experienced 3-4% error rates for the duration. Authenticated callers of the API experienced 40% error rates. Unauthenticated requests to the UI and API experienced nearly 100% error rates for the duration. Actions service experienced 2.5% of runs being delayed for an average of 8 minutes and 3% of runs failing. Large File Storage (LFS) requests experienced 0.978% errors.
At 19:54 UTC, the deployment was rolled back, and network availability for the affected systems was restored. At 20:03 UTC, we fully restored normal operations.
To prevent similar issues, we are expanding our validation process for routing policy changes.
Jun 17, 8:22 PM UTC
Actions is operating normally.
Jun 17, 8:15 PM UTC
Codespaces is experiencing degraded performance. We are continuing to investigate.
Jun 17, 8:14 PM UTC
Webhooks is operating normally.
Jun 17, 8:13 PM UTC
Pull Requests is operating normally.
Jun 17, 8:12 PM UTC
API Requests is operating normally.
Jun 17, 8:10 PM UTC
Issues is operating normally.
Jun 17, 8:06 PM UTC
API Requests is experiencing degraded performance. We are continuing to investigate.
Jun 17, 8:05 PM UTC
We experienced problems with multiple services, causing disruptions for some users. We have identified the cause and are rolling out changes to restore normal service. Many services are recovering, but full recovery is ongoing.
Jun 17, 8:04 PM UTC
Copilot is operating normally.
Jun 17, 8:04 PM UTC
Pages is operating normally.
Jun 17, 8:03 PM UTC
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Jun 17, 8:01 PM UTC
Pull Requests is experiencing degraded availability. We are continuing to investigate.
Jun 17, 7:55 PM UTC
Copilot is experiencing degraded performance. We are continuing to investigate.
Jun 17, 7:55 PM UTC
Actions is experiencing degraded performance. We are continuing to investigate.
Jun 17, 7:54 PM UTC
Webhooks is experiencing degraded performance. We are continuing to investigate.
Jun 17, 7:54 PM UTC
API Requests is experiencing degraded availability. We are continuing to investigate.
Jun 17, 7:53 PM UTC
We are investigating reports of issues with many services impacting segments of customers. We will continue to keep users updated on progress towards mitigation.
Jun 17, 7:53 PM UTC
API Requests is experiencing degraded performance. We are continuing to investigate.
Jun 17, 7:51 PM UTC
Pages is experiencing degraded performance. We are continuing to investigate.
Jun 17, 7:49 PM UTC
Copilot is experiencing degraded availability. We are continuing to investigate.
Jun 17, 7:49 PM UTC
Issues is experiencing degraded performance. We are continuing to investigate.
Jun 17, 7:47 PM UTC
We are investigating reports of degraded performance for Copilot
Jun 17, 7:42 PM UTC
On June 12, 2025, between 17:55 UTC and 21:07 UTC the GitHub Copilot service was degraded and experienced unavailability for Gemini models and reduced availability for Claude models. Users experienced significantly elevated error rates for code completions, slow response times, timeouts, and chat functionality interruptions across VS Code, JetBrains IDEs, and GitHub Copilot Chat. This was due to an outage affecting one of our model providers.
We mitigated the incident by temporarily disabling the affected provider endpoints to reduce user impact.
We are working to update our incident response playbooks for infrastructure provider outages and improve our monitoring and alerting systems to reduce our time to detection and mitigation of issues like this one in the future.
Jun 12, 9:07 PM UTC
All impacted chat models have recovered, and users should no longer experience reduced availability.
Jun 12, 9:07 PM UTC
We are seeing recovery in success rates for impacted Claude models (Sonnet 4 and Opus 4), and limited recovery in Gemini models (2.5 Pro and 2.0 Flash). We will continue to monitor and provide updates until full recovery.
Jun 12, 8:39 PM UTC
Copilot is experiencing degraded performance. We are continuing to investigate.
Jun 12, 8:21 PM UTC
Claude Sonnet 4 and Opus 4 models continue to have degraded availability in Copilot Chat, VS Code, and other Copilot products. Gemini Pro 2.5 and 2.0 Flash are currently unavailable. Our upstream model provider has indicated that they have identified the problem and are applying mitigations.
Jun 12, 8:05 PM UTC
Gemini (2.5 Pro and 2.0 Flash) and Claude (Sonnet 4 and Opus 4) chat models in Copilot are still experiencing reduced availability. We are actively communicating with our upstream model provider to resolve the issue and restore full service. We will provide another update by 20:15 UTC.
Jun 12, 7:14 PM UTC
We redirected requests for Claude 3.7 Sonnet to additional partners and users should see recovery when using that model. We still are experiencing degraded availability for the Gemini (2.5 Pro, 2.0 Flash) and Claude (Sonnet 4, Opus 4) models in Copilot Chat, VS Code and other Copilot products.
Jun 12, 6:37 PM UTC
We are experiencing degraded availability for the Gemini (2.5 Pro, 2.0 Flash) and Claude (Sonnet 3.7, Sonnet 4, Opus 4) models in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue.
Other models are available and working as expected.
Jun 12, 6:23 PM UTC
We are currently investigating this issue.
Jun 12, 6:19 PM UTC
Multiple services critical to GitHub's attestation infrastructure experienced an outage which prevented Fulcio from issuing signing certificates. During the outage, GitHub customers who use the "actions/attest-build-provenance" action from public repositories were not able to generate attestations.
Jun 12, 8:26 PM UTC
Customers are currently unable to generate attestations from public repositories due to a broader outage with our partners.
Jun 12, 6:56 PM UTC
We are investigating reports of degraded performance for Actions
Jun 12, 6:50 PM UTC
Between 2025-06-10 12:25 UTC and 2025-06-11 01:51 UTC, GitHub Enterprise Cloud (GHEC) customers with approximately 10,000 or more users saw performance degradation and 5xx errors when loading the Enterprise Settings’ People management page. Less than 2% of page requests resulted in an error. The issue was caused by a database change that replaced an index required for the page load. The issue was resolved by reverting the database change.
To prevent similar incidents, we are improving the testing and validation process for replacing database indexes.
Jun 11, 1:51 AM UTC
The fix is currently rolling out to production. We will update here once we have verified recovery.
Jun 11, 1:08 AM UTC
We are working to deploy the fix for this issue. We will update again once it is deployed and as we monitor recovery.
Jun 10, 11:32 PM UTC
We have the fix ready. Once it is deployed, we will provide another update confirming that it has resolved the issue.
Jun 10, 10:42 PM UTC
We have identified the solution to the performance issue and are working on the mitigation. Impact continues to be limited to very large enterprise customers when viewing the People page.
Jun 10, 9:04 PM UTC
The mitigation to add a supporting index to improve the performance of the People page did not resolve the issue, and we are continuing to investigate a solution.
Jun 10, 8:09 PM UTC
We are working on the mitigation and anticipate recovery within an hour.
Jun 10, 6:57 PM UTC
Large enterprise customers may encounter issues loading the People page
Jun 10, 6:35 PM UTC
We are currently investigating this issue.
Jun 10, 6:17 PM UTC
On June 10, 2025, between 12:15 UTC and 19:04 UTC, Codespaces billing data processing experienced delays due to capacity issues in our worker pool. Approximately 57% of codespaces were affected. During this period, some customers may have observed incomplete or delayed billing usage information in their dashboards and usage reports, and may not have received timely notifications about approaching usage or spending limits.
The incident was caused by an increase in the number of jobs in our worker pool without a corresponding increase in capacity, resulting in a backlog of unprocessed Codespaces billing jobs.
We mitigated the issue by scaling up worker capacity, allowing the backlog to clear and billing data to catch up. We started seeing recovery immediately at 17:40 UTC and were fully caught up by 19:04 UTC.
To prevent recurrence, we are moving critical billing jobs into a dedicated worker pool monitored by the Codespaces team, and are reviewing alerting thresholds to ensure more rapid detection and mitigation of delays in the future.
Jun 10, 7:08 PM UTC
We've increased capacity to process the Codespaces billing jobs and are seeing recovery. We expect full mitigation within the hour.
Jun 10, 6:21 PM UTC
We are currently investigating this issue.
Jun 10, 5:47 PM UTC
On June 10, 2025, between 14:28 UTC and 14:45 UTC the pull request service experienced a period of degraded performance, resulting in merge error rates exceeding 1%. The root cause was an overloaded host in our Git infrastructure.
We mitigated the incident by removing this host from the set of valid replicas until it was healthy again.
We are working to improve the existing safeguards in our infrastructure that are meant to protect against problems like this, and we are revisiting why they did not protect us as expected in this particular scenario.
Jun 10, 2:46 PM UTC
We are investigating reports of degraded performance for Pull Requests
Jun 10, 2:28 PM UTC
On June 6, 2025, an update to mitigate a previous incident led to automated scaling of the database infrastructure used by Copilot Coding Agent. Clients of the service were not implemented to handle the additional partition automatically, so they were unable to retrieve data across partitions, resulting in unexpected 404 errors.
As a result, approximately 17% of coding sessions displayed an incorrect final state - such as sessions appearing in-progress when they were actually completed. Additionally, some Copilot-authored pull requests were missing timeline events indicating task completion. Importantly, this did not affect Copilot Coding Agent’s ability to finish code tasks and submit pull requests.
To prevent similar issues in the future we are taking steps to improve our systems and monitoring.
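As an illustration of the failure mode (not the actual service code; the partition names and store interface below are hypothetical), a client that only consults the original partition returns spurious 404s for sessions stored on a newly added one, while a partition-aware lookup does not.

    # Illustration of the failure mode; partition names and store API are assumed.
    PARTITIONS = ["sessions-0", "sessions-1"]  # automated scaling added "sessions-1"

    def find_session(store, session_id):
        """Look up a coding session across all partitions instead of only the first."""
        for partition in PARTITIONS:
            record = store.get(partition, session_id)  # assumed to return None if absent
            if record is not None:
                return record
        return None  # genuinely missing, rather than a spurious 404 for recent sessions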
Jun 10, 6:23 PM UTC
On June 6, 2025, between 00:21 UTC and 12:40 UTC, the Copilot service was degraded and a subset of Copilot Free users were unable to sign up for or use the Copilot Free service on github.com. This was due to a change in licensing code that resulted in some users losing access despite being eligible for Copilot Free.
We mitigated this by rolling back the offending change at 11:39 UTC, after which users were once again able to use their Copilot Free access.
As a result of this incident, we have improved monitoring of Copilot changes during rollout. We are also working to reduce our time to detect and mitigate issues like this one in the future.
Jun 06, 12:40 PM UTC
Copilot is operating normally.
Jun 06, 12:40 PM UTC
We are continuing to monitor recovery and expect a complete resolution very shortly.
Jun 06, 12:18 PM UTC
The changes have been reverted and we are seeing signs of recovery. We expect impact to be largely mitigated, but are continuing to monitor and will update further as progress continues.
Jun 06, 11:31 AM UTC
We have identified changes that may be causing the issue and are working to revert the offending changes. We will continue to keep users updated as we work toward mitigation.
Jun 06, 10:39 AM UTC
We are investigating reports of users being unable to use Copilot Free after a Copilot Pro trial subscription has ended. We will continue to keep users updated on progress towards mitigation.
Jun 06, 10:04 AM UTC
We are investigating reports of degraded performance for Copilot
Jun 06, 9:58 AM UTC
On June 5th, 2025, between 17:47 UTC and 19:20 UTC the Actions service was degraded, leading to run start delays and intermittent job failures. During this period, 47.2% of runs had delayed starts, and 21.0% of runs failed. The impact extended beyond Actions itself - 60% of Copilot Coding Agent sessions were cancelled, and all Pages sites using branch-based builds failed to deploy (though Pages serving remained unaffected). The issue was caused by a spike in load between internal Actions services exposing a misconfiguration that caused throttling of requests in the critical path of run starts. We mitigated the incident by correcting the service configuration to prevent throttling and have updated our deployment process to ensure the correct configuration is preserved moving forward.
Jun 05, 7:29 PM UTC
We have applied a mitigation and we are beginning to see recovery. We are continuing to monitor for recovery.
Jun 05, 7:02 PM UTC
Actions is experiencing degraded availability. We are continuing to investigate.
Jun 05, 6:35 PM UTC
Users of Actions will see delays in jobs starting or job failures. Users of Pages will see slow or failed deployments
Jun 05, 6:30 PM UTC
Pages is experiencing degraded performance. We are continuing to investigate.
Jun 05, 6:01 PM UTC
We are investigating reports of degraded performance for Actions
Jun 05, 6:00 PM UTC
On June 4, 2025, between 14:35 UTC and 15:50 UTC, the Actions service experienced degradation, leading to run start delays. During the incident, about 15.4% of all workflow runs were delayed by an average of 16 minutes. An unexpected load pattern revealed a scaling issue in our backend infrastructure. We mitigated the incident by blocking the requests that triggered this pattern.
We are improving our rate limiting mechanisms to better handle unexpected load patterns while maintaining service availability. We are also strengthening our incident response procedures to reduce the time to mitigate for similar issues in the future.
Jun 04, 3:55 PM UTC
We have applied mitigations and are monitoring for recovery.
Jun 04, 3:39 PM UTC
We are currently investigating delays with Actions triggering for some users.
Jun 04, 3:19 PM UTC
We are investigating reports of degraded performance for Actions
Jun 04, 3:15 PM UTC
On May 30, 2025, between 08:10 UTC and 16:00 UTC, the Microsoft Teams GitHub integration service experienced a complete service outage.
During this period, the service was unable to deliver notifications or process user requests, resulting in a 100% error rate for all integration functionality except link previews.
This outage was due to an authentication issue with our downstream provider. We mitigated the incident by working with our provider to restore service functionality and are working to migrate to more durable authentication methods to reduce the risk of similar issues in the future.
May 30, 3:57 PM UTC
Our team is continuing to work to mitigate the source of the disruption affecting a small set of customers using the GitHub Microsoft Teams integration.
May 30, 2:47 PM UTC
We are experiencing a disruption with our Microsoft Teams integration. Investigations are underway and we will provide further updates as we progress.
May 30, 12:29 PM UTC
We are currently investigating this issue.
May 30, 11:20 AM UTC
On May 28, 2025, from approximately 09:45 UTC to 14:45 UTC, GitHub Actions experienced delayed job starts for workflows in public repos using Ubuntu-24 standard hosted runners. This was caused by a misconfiguration in backend caching behavior after a failover, which led to duplicate job assignments and reduced available capacity. Approximately 19.7% of Ubuntu-24 hosted runner jobs on public repos were delayed. Other hosted runners, self-hosted runners, and private repo workflows were unaffected.
By 12:45 UTC, we mitigated the issue by redeploying backend components to reset state and scaling up available resources to more quickly work through the backlog of queued jobs. We are working to improve our deployment and failover resiliency and validation to reduce the likelihood of similar issues in the future.
May 28, 2:43 PM UTC
We are continuing to monitor the affected Actions runners to ensure a smooth recovery.
May 28, 2:35 PM UTC
We are observing indications of recovery with the affected Actions runners.
The team will continue monitoring systems to ensure a return to normal service.
May 28, 1:42 PM UTC
We're continuing to investigate delays in Actions runners for hosted Ubuntu 24.
We will provide further updates as more information becomes available.
May 28, 12:41 PM UTC
Actions is experiencing degraded performance. We are continuing to investigate.
May 28, 11:49 AM UTC
Actions is experiencing high wait times for obtaining standard hosted runners for Ubuntu 24. Other hosted labels and self-hosted runners are not impacted.
May 28, 11:42 AM UTC
We are currently investigating this issue.
May 28, 11:11 AM UTC
On May 27, 2025, between 09:31 UTC and 13:31 UTC, some Actions jobs experienced failures uploading to and downloading from the Actions Cache service. During the incident, 6% of all workflow runs couldn’t upload or download cache entries from the service, resulting in a non-blocking warning message in the logs and performance degradation. The disruption was caused by an infrastructure update related to the retirement of a legacy service, which unintentionally impacted Cache service availability. We resolved the incident by reverting the change and have since implemented a permanent fix to prevent recurrence.
We are improving our configuration change processes by introducing additional end-to-end tests to cover the identified gaps, and implementing deployment pipeline improvements to reduce mitigation time for similar issues in the future.
May 27, 1:31 PM UTC
Mitigation is applied and we’re seeing signs of recovery. We’re monitoring the situation until the mitigation is applied to all affected repositories.
May 27, 1:03 PM UTC
We are experiencing degradation with the GitHub Actions cache service and are working on applying the appropriate mitigations.
May 27, 12:27 PM UTC
We are investigating reports of degraded performance for Actions
May 27, 12:26 PM UTC
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
May 27, 12:41 PM UTC
We are currently investigating this issue.
May 27, 12:20 PM UTC
Between 10:00 and 20:00 UTC on May 27, a change to our git proxy service resulted in some git client implementations not being able to consistently push to GitHub. Reverting the change resulted in an immediate resolution of the problem for all customers. The extended time to detect this failure was due to the relatively small number of impacted clients. We are re-evaluating the proposed change to understand how we can prevent and detect such failures in the future.
May 27, 9:53 PM UTC
On May 26, 2025, between 06:20 UTC and 09:45 UTC, GitHub experienced broad failures across a variety of services (API, Issues, Git, etc.). Services were degraded intermittently, with some operations peaking at 100% failure rates during this window.
On May 23, a new feature was added to Copilot APIs and monitored during rollout, but it was not tested at peak load. At 06:20 UTC on May 26, load increased on the code path in question and began to degrade a Copilot API, because the caching for this endpoint and the circuit breakers for high load were misconfigured.
In addition, the traffic limiting meant to protect wider swaths of the GitHub API from queuing did not yet cover this endpoint, so the endpoint was able to overwhelm the capacity to serve traffic and cause request queuing.
We were able to mitigate the incident by turning off the endpoint until the behavior could be reverted.
We are already working on a quality of service strategy for API endpoints like this that will limit the impact of a broad incident and are rolling it out. We are also addressing the specific caching and circuit breaker misconfigurations for this endpoint, which would have reduced the time to mitigate this particular incident and the blast radius.
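For illustration, a minimal circuit-breaker sketch of the kind of protection that was misconfigured here: after enough consecutive failures the caller fails fast instead of queuing more requests behind a degraded endpoint. The thresholds and class below are illustrative, not GitHub's implementation.

    # Minimal circuit-breaker sketch; thresholds and names are illustrative.
    import time

    class CircuitBreaker:
        """Fail fast once an endpoint has produced enough consecutive errors."""

        def __init__(self, failure_threshold=5, reset_after_seconds=30.0):
            self.failure_threshold = failure_threshold
            self.reset_after_seconds = reset_after_seconds
            self.consecutive_failures = 0
            self.opened_at = None  # None means the circuit is closed

        def call(self, fn, *args, **kwargs):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_after_seconds:
                    raise RuntimeError("circuit open: not calling the degraded endpoint")
                # Half-open: allow one trial request through to probe for recovery.
            try:
                result = fn(*args, **kwargs)
            except Exception:
                self.consecutive_failures += 1
                if self.opened_at is not None or self.consecutive_failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()  # open, or re-open after a failed probe
                raise
            self.consecutive_failures = 0
            self.opened_at = None  # close the circuit on success
            return result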
May 26, 10:17 AM UTC
We continue to see signs of recovery.
May 26, 10:09 AM UTC
Issues is operating normally.
May 26, 9:51 AM UTC
Git Operations is operating normally.
May 26, 9:46 AM UTC
API Requests is operating normally.
May 26, 9:44 AM UTC
Copilot is operating normally.
May 26, 9:43 AM UTC
Packages is operating normally.
May 26, 9:43 AM UTC
Actions is operating normally.
May 26, 9:42 AM UTC
Packages is experiencing degraded performance. We are continuing to investigate.
May 26, 8:39 AM UTC
Copilot is experiencing degraded performance. We are continuing to investigate.
May 26, 8:26 AM UTC
Actions is experiencing degraded performance. We are continuing to investigate.
May 26, 8:25 AM UTC
We are continuing to investigate degraded performance.
May 26, 7:53 AM UTC
Issues is experiencing degraded performance. We are continuing to investigate.
May 26, 7:35 AM UTC
We are investigating reports of degraded performance for API Requests and Git Operations
May 26, 7:21 AM UTC
On May 23, 2025, between 17:40 UTC and 18:30 UTC public API and UI requests to read and write Git repository content were degraded and triggered user-facing 500 responses. On average, the error rate was 61% and peaked at 88% of requests to the service. This was due to the introduction of an uncaught fatal error in an internal service. A manual rollback was required which increased the time to remediate the incident.
We are working to automatically detect and revert changes like this based on alerting, to reduce our time to detection and mitigation. In addition, we are adding relevant test coverage to prevent errors of this type from reaching production.
May 23, 6:33 PM UTC
API Requests is operating normally.
May 23, 6:33 PM UTC
API Requests is experiencing degraded performance. We are continuing to investigate.
May 23, 6:26 PM UTC
We are currently investigating this issue.
May 23, 6:21 PM UTC
On May 22, 2025, between 07:06 UTC and 09:10 UTC, the Actions service experienced degradation, leading to run start delays. During the incident, about 11% of all workflow runs were delayed by an average of 44 minutes. A recently deployed change contained a defect that caused improper request routing between internal services, resulting in security rejections at the receiving endpoint. We resolved this by reverting the problematic change and are implementing enhanced testing procedures to catch similar issues before they reach production environments.
May 22, 9:17 AM UTC
We've applied a mitigation which has resolved these delays.
May 22, 9:17 AM UTC
Our investigation continues. At this stage, GitHub Actions jobs are being executed, albeit with delays to the start of execution in some cases.
May 22, 8:47 AM UTC
We are continuing to investigate these delays.
May 22, 8:14 AM UTC
We're investigating delays with the execution of queued GitHub Actions jobs.
May 22, 7:43 AM UTC
We are investigating reports of degraded performance for Actions
May 22, 7:42 AM UTC
A change to the webhooks UI removed the ability to add webhooks. The timeframe of this impact was between May 20th, 2025 20:40 UTC and May 21st, 2025 12:55 UTC. Existing webhooks, as well as adding webhooks via the API were unaffected. The issue has been fixed.
May 21, 2:34 PM UTC
On May 20, 2025, between 18:18 UTC and 19:53 UTC, Copilot Code Completions were degraded in the Americas. On average the error rate was 50% of requests to the service in the affected region. This was due to a misconfiguration in load distribution parameters after a scale down operation.
We mitigated the incident by addressing the misconfiguration.
We are working to improve our automated failover and load balancing mechanisms to reduce our time to detection and mitigation of issues like this one in the future.
May 20, 8:02 PM UTC
Copilot is operating normally.
May 20, 8:01 PM UTC
We are experiencing degraded availability for Copilot Code Completions in the Americas.
We are working on resolving the issue.
May 20, 7:43 PM UTC
We are investigating reports of degraded performance for Copilot
May 20, 7:37 PM UTC
On May 20, 2025, between 12:09 UTC and 16:07 UTC, the GitHub Copilot service experienced degraded availability, specifically for the Claude Sonnet 3.7 model. During this period, the success rate for Claude Sonnet 3.7 requests was highly variable, dropping to approximately 94% during the most severe spikes. Other models remained available and worked as expected throughout the incident.
The issue was caused by capacity constraints in our model processing infrastructure that affected our ability to handle the large volume of Claude Sonnet 3.7 requests.
We mitigated the incident by rebalancing traffic across our infrastructure, adjusting rate limits, and working with our infrastructure teams to resolve the underlying capacity issues. We are working to improve our infrastructure redundancy and implementing more robust monitoring to reduce detection and mitigation time for similar incidents in the future.
May 20, 4:08 PM UTC
Copilot is operating normally.
May 20, 4:08 PM UTC
The issues with our upstream model provider have been resolved, and Claude Sonnet 3.7 is once again available in Copilot Chat, VS Code and other Copilot products.
We will continue monitoring to ensure stability, but mitigation is complete.
May 20, 4:07 PM UTC
We are continuing to work with our model providers on mitigations to increase the success rate of Sonnet 3.7 requests made via Copilot.
May 20, 2:59 PM UTC
We’re still working with our model providers on mitigations to increase the success rate of Sonnet 3.7 requests made via Copilot.
May 20, 2:15 PM UTC
We are experiencing degraded availability for the Claude Sonnet 3.7 model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue.
Other models are available and working as expected.
May 20, 1:33 PM UTC
We are investigating reports of degraded performance for Copilot
May 20, 1:10 PM UTC
Between May 16, 2025, 1:21 PM UTC and May 17, 2025, 2:26 AM UTC, the GitHub Enterprise Importer service was degraded and experienced slow processing of customer migrations. Customers may have seen extended wait times for migrations to start or complete.
This incident was initially observed as a slowdown in migration processing. During our investigation, we identified that a recent change aimed at improving API query performance caused an increase in load signals, which triggered migration throttling. As a result, the performance of migrations was negatively impacted, and overall migration duration increased. In parallel, we identified a race condition that caused a specific migration to be repeatedly re-queued, further straining system resources and contributing to a backlog of migration jobs, resulting in accumulated delays. No data was lost, and all migrations were ultimately processed successfully.
We have reverted the feature flag associated with a query change and are working to improve system safeguards to help prevent similar race condition issues from occurring in the future.
May 17, 2:27 AM UTC
We continue to see signs of recovery for GitHub Enterprise Importer migrations. Queue depth is decreasing and migration duration is trending toward normal levels. We will continue to monitor improvements.
May 17, 2:26 AM UTC
We have identified the source of increased load and have started mitigation. Customers using the GitHub Enterprise Importer may still see extended wait times until recovery completes.
May 16, 10:33 PM UTC
Investigations on the incident impacting GitHub Enterprise Importer continue. An additional contributing cause has been identified, and we are working to ship additional mitigating measures.
May 16, 8:36 PM UTC
We have taken several steps to mitigate the incident impacting GitHub Enterprise Importer (GEI). We are seeing early indications of system recovery. However, customers may continue to experience longer migrations and extended queue times. The team is continuing to work on further mitigating efforts to speed up recovery.
May 16, 6:19 PM UTC
We are continuing to investigate issues with the GitHub Enterprise Importer. Customers may experience slower migration processes and extended wait times.
May 16, 3:32 PM UTC
We are investigating issues with the GitHub Enterprise Importer. Customers may experience slower migration processes and extended wait times.
May 16, 2:06 PM UTC
We are currently investigating this issue.
May 16, 1:46 PM UTC
On May 16th, 2025, between 08:42:00 UTC and 12:26:00 UTC, the data store powering the Audit Log API service experienced elevated latency resulting in higher error rates due to timeouts. About 3.8% of Audit Log API queries for Git events experienced timeouts. The data store team deployed mitigating actions which resulted in a full recovery of the data store’s availability.
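For reference, a request of the kind affected: querying an organization's Git events from the Audit Log REST API. The organization name is a placeholder, and the token must have appropriate access on GitHub Enterprise Cloud.

    # Example Audit Log API call for Git events; "my-org" is a placeholder.
    import requests

    def git_audit_events(token, org="my-org"):
        resp = requests.get(
            f"https://api.github.com/orgs/{org}/audit-log",
            params={"include": "git", "per_page": 100},
            headers={"Authorization": f"Bearer {token}",
                     "Accept": "application/vnd.github+json"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()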
May 16, 3:24 PM UTC
We are investigating issues with the audit log. Users querying Git audit log data may observe increased latencies and occasional timeouts.
May 16, 10:22 AM UTC
We are currently investigating this issue.
May 16, 9:22 AM UTC
Between May 15, 2025 10:10 UTC and May 15, 2025 22:58 UTC the Copilot service was degraded and returned a high volume of internal server errors for requests targeting Gemini 2.5 Pro, a public preview model. This was due to a high volume of rate limiting by the upstream model provider, similar in volume to the internal server errors during the previous day.
We mitigated the incident by temporarily disabling Gemini 2.5 Pro for all Copilot Chat experiences, and then worked with the model provider to ensure model health was sufficiently improved before re-enabling.
We are working with the model provider to move to more resilient infrastructure to mitigate issues like this one in the future.
May 15, 10:58 PM UTC
The issues with our upstream model provider have been resolved, and Gemini 2.5 Pro is available again in Copilot Chat, VS Code, and other Copilot products.
We will continue monitoring to ensure stability, but mitigation is complete.
May 15, 10:57 PM UTC
We have started to gradually re-enable the Gemini 2.5 Pro model in Copilot Chat, VS Code, and other Copilot products.
May 15, 10:20 PM UTC
We have disabled the Gemini 2.5 Pro model in Copilot Chat, VS Code and other Copilot products due to an issue with an upstream model provider.
Users may still see this model as available for a brief period, but we recommend switching to a different model. Other models are not impacted and remain available.
Once our model provider has resolved the issues impacting Gemini 2.5 Pro, we will re-enable it.
May 15, 1:16 PM UTC
We are currently investigating this issue.
May 15, 12:41 PM UTC
On May 15, 2025, between 00:08 UTC and 10:21 UTC, customers were unable to create fine-grained Personal Access Tokens (PATs) on github.com. This incident was triggered by a recent code change to our front end that unintentionally affected the way certain pages loaded and prevented the PAT creation process from completing.
We mitigated the incident by reverting the problematic change. To reduce the likelihood of similar issues in the future, we are improving our monitoring for page load anomalies and PAT creation failures and improving our safe deployment practices.
May 15, 10:38 AM UTC
The issue preventing users from creating Personal Access Tokens (PATs) has been resolved. The root cause was identified and a change was reverted to restore functionality. PAT generation is now working as expected.
May 15, 10:38 AM UTC
We have identified the cause, and have a working fix. We will continue to update users.
May 15, 9:56 AM UTC
We are exploring the best path forward, but have no new update at this stage.
May 15, 9:20 AM UTC
While we have found a possible cause, we have no update on mitigation steps at this stage. We will continue to keep users updated.
May 15, 8:35 AM UTC
We are investigating fine-grained PAT creation failures. We will continue to keep users updated on progress towards mitigation. Existing fine-grained PATs are unaffected.
May 15, 7:44 AM UTC
We are currently investigating this issue.
May 15, 7:00 AM UTC
Between May 14, 2025 14:16 UTC and May 15, 2025 01:02 UTC the Copilot service was degraded and returned a high volume of internal server errors for requests targeting Gemini 2.5 Pro, a public preview model. On average, the error rate for Gemini 2.5 Pro was 19.6% and peaked at 41%. This was due to a high volume of internal server errors and rate limiting by the upstream model provider.
We mitigated the incident by temporarily disabling Gemini 2.5 Pro for all Copilot Chat experiences, and then worked with the model provider to ensure model health was sufficiently improved before re-enabling.
We are working with partners to improve communication speed and are planning to move to more resilient infrastructure to mitigate issues like this one in the future.
May 15, 1:02 AM UTC
We have received confirmation from our upstream provider that the issue has been resolved. We are seeing significant recovery. The Gemini 2.5 Pro model is now fully available in Copilot Chat, VS Code, and other Copilot products.
May 15, 1:02 AM UTC
We are continuing to experience degraded availability for the Gemini 2.5 Pro model in Copilot Chat, VS Code and other Copilot products. We are working closely with our upstream provider to resolve this issue.
May 14, 9:18 PM UTC
We are continuing to experience degraded availability for the Gemini 2.5 Pro model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue.
May 14, 4:46 PM UTC
We are experiencing degraded availability for the Gemini 2.5 Pro model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue.
Other models are available and working as expected.
May 14, 4:01 PM UTC
We are continuing to investigate issues with the Gemini 2.5 Pro model, which is in public preview. Users may see intermittent errors with this model.
May 14, 3:14 PM UTC
We are currently investigating this issue.
May 14, 2:39 PM UTC
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
May 12, 3:06 PM UTC
We are currently investigating this issue.
May 12, 2:53 PM UTC