I scanned all of GitHub's "oops commits" for leaked secrets

TL;DR
GitHub Archive logs every public commit, even the ones developers try to delete. Force pushes often cover up mistakes like leaked credentials by rewriting Git history. GitHub keeps these dangling commits, from what we can tell, forever. In the archive, they show up as "zero-commit" PushEvents. I scanned every force push event since 2020 and uncovered secrets worth $25k in bug bounties.
Together with Truffle Security, we're open-sourcing a new tool to scan your own GitHub organization for these hidden commits (try it here).

The new open-source Force Push Scanner tool identifies secrets in dangling commits.
This guest post by Sharon Brizinov, a white-hat hacker, was developed through Truffle Security’s Research CFP program. We first connected with Sharon after his widely shared write-up, How I Made 64k From Deleted Files, where he used TruffleHog to uncover high-value secrets in public GitHub repositories. In this follow-up, Sharon expanded his research to access 100% of deleted commits on GitHub. He takes a deeper dive into one of our favorite areas: secrets hidden in deleted GitHub commits.
Overview
What Does it Mean to Delete a Commit?
GitHub Event API
Finding all Deleted Commits
Building the Automation
Hunting for Impactful Secrets
Case Study - Preventing a Massive Supply-Chain Compromise
Summary
Background
My name is Sharon Brizinov, and while I usually focus on low-level vulnerability and exploitation research in OT/IoT devices, I occasionally dive into bug bounty hunting.
I recently published a blog post about uncovering secrets hidden in dangling blobs within GitHub repositories, which sparked quite a lively discussion. After the post, I had several conversations with various people including Dylan, the CEO of Truffle Security, who gave me some intriguing ideas for continuing to explore new methods for large-scale secret hunting. I decided to create a mind map with everything I know related to this topic and try to come up with a new idea.
I’ll spare you my messy sketch, but here’s a roundup of the projects, blogs, ideas, and resources I zeroed in on (highly recommended):
Eventually, I came up with a simple idea - I would use the GitHub Event API alongside the GitHub Archive project to scan all zero-commit PushEvents (deleted commits) for secrets. Nothing here was new on its own; I just glued it together and built automation at scale that hunted for secrets.
In this blog, I will describe my journey from understanding why you can never really delete a commit in GitHub to how to find all of them and build automation around it.
What Does it Mean to Delete a Commit?
In my previous blog post, I discussed how I discovered supposedly deleted files within GitHub repositories. Specifically, I was able to reconstruct dangling blobs - objects that had been deleted and were no longer referenced by any commit or tree… Or so I thought. After chatting with the Truffle folks, it turns out these orphaned blobs actually had orphaned commits that went along with them. And with a little research, I was able to uncover 100% of those orphaned commits at scale, across all of GitHub.
Suppose you’ve accidentally committed and pushed a secret to your repository. What’s the next step? Typically, you’d want to reset the HEAD to the previous commit and force-push the changes, effectively removing the current commit and making it unreferenced - essentially deleting it. Here’s how you do it:
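git reset --hard HEAD~1   # move the branch pointer back to the previous commit
git push --force          # rewrite remote history; the bad commit is now unreferenced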
But as Neodyme and TruffleHog discovered, even when a commit is deleted from a repository, GitHub never forgets. If you know the full commit hash, you can access the supposedly deleted content. Moreover, as TruffleHog discovered, you don't even need the full commit hash - it's enough to brute-force just the first four hex digits.
Force Pushing: A Tutorial
Let’s see this in action using my own repository: test-oops-commit. Try to locate the deleted commit - 9eedfa00983b7269a75d76ec5e008565c2eff2ef.
To help visualize our commits, I prepared a simple bash script, get_commits.sh, that shows the commit-tree-blob objects:
# Walk every commit reachable from any ref
git rev-list --all | while read -r commit; do
    echo "Commit: $commit"
    # Print the tree and parent pointers of the commit object
    git cat-file -p "$commit" | grep '^tree\|^parent'
    # Recursively list every blob (file) reachable from the commit
    git ls-tree -r "$commit"
    echo
done
We start by creating a simple repository with a single commit (a README.md file):
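# Reconstruction of the setup steps (the remote URL is illustrative):
git init test-oops-commit && cd test-oops-commit
echo "# test-oops-commit" > README.md
git add README.md
git commit -m "initial commit"
git remote add origin https://github.com/<user>/test-oops-commit.git
git push -u origin main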
Next, we create a new file named secret.txt containing our secret, my-password-is-iloveu. We accidentally commit and push our secret to GitHub:
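# Reconstruction of the accidental commit:
echo "my-password-is-iloveu" > secret.txt
git add secret.txt
git commit -m "add secret"   # oops
git push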
We look at the commit tree and see that we have a new commit, 9eedfa…, which is associated with a new tree and a new blob for the file secret.txt. We see the same when we run git rev-list --all, git log, or when we access it from the web on GitHub.
Oops! We discover our mistake and delete the commit by moving the HEAD of the branch to the previous commit and force-pushing, using the same git reset --hard HEAD~1 and git push --force sequence shown above.
Let's remove our local version of the repo, clone the repository again, and check the commit tree. Phew, no secrets; the commit was really deleted!
But we remember the commit hash, so we check online on GitHub, and the commit can still be accessed - 9eedfa00983b7269a75d76ec5e008565c2eff2ef (even accessing it with just the first four hex digits, 9eed, is enough). However, this time we get a message saying that the commit is deleted or doesn't belong to any branch on this repository.
Why Does it Happen?
When you force-push after resetting (i.e., git reset --hard HEAD~1 followed by git push --force), you remove Git's reference to that commit from your branch, effectively making it unreachable through normal Git navigation (like git log). However, the commit is still accessible on GitHub because GitHub stores these reflogs.
Why? I don't know for sure, but GitHub does give some hints. As I see it, GitHub is a much more complex beast than just a Git server. It has many layers, including pull requests, forks, private/public settings, and more.
My guess is that to support all of these features, GitHub stores all commits and never deletes them. Here are some cases to consider:
What are pull requests? These are just temporary branches, as Aqua Security wrote about, and can be retrieved by fetching all refs using:
git -c "remote.origin.fetch=+refs/*:refs/remotes/origin/*" fetch origin
How does the GitHub fork network work? What happens when you “fork” a repository? All the data is replicated, including commits you might delete.
For these cases, and probably many others too (auditing? monitoring?), GitHub stores all the commits and won't delete them, even if you force-push the HEAD and "delete" the commit.
GitHub Event API
OK, so commits are not really deleted. Fine. But you'd still need to know the full commit hash, or at least the first four hex digits, ignoring collisions (16^4 = 65,536 possibilities). As it turns out, TruffleHog has a tool to do just that, but as you can imagine, going through all those prefixes is very slow - it can take a day or two on a single repo, so it doesn't scale.
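To see why, here is what a naive four-hex-digit brute force looks like conceptually (a sketch only; <org>/<repo> are placeholders, and in practice GitHub throttles this long before it finishes):

for i in $(seq 0 65535); do
  prefix=$(printf '%04x' "$i")
  # An HTTP 200 suggests the short hash resolves to a commit page (dangling or not)
  code=$(curl -s -o /dev/null -w '%{http_code}' "https://github.com/<org>/<repo>/commit/$prefix")
  [ "$code" = "200" ] && echo "candidate commit prefix: $prefix"
done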
But there's another, faster way - one I'm happy to share with you now. The GitHub Event API is part of GitHub's REST API, which allows users to retrieve information about events that occur within GitHub. Events represent various activities in GitHub, such as:
Pushing code
Opening or closing issues or pull requests
Creating repositories
Adding comments
Forking a repo
Starring a repo
Try it:
curl https://api.github.com/events
A few notes:
No API token or auth is needed!
You can see all the events that GitHub supports here.
Events are recorded in near-real-time, but may be delayed by a few seconds.
It’s only for public repositories.
So, we could monitor commit data for all GitHub public repositories and store all the hashes. No more guessing commit hashes! Yeah, but it’s way too much. We are talking about millions of events per hour, and what about past events? Are they lost?
Luckily for us, a great developer named Ilya Grigorik decided many years ago to start a project that listens to GitHub's event stream and systematically archives it. The project, GH Archive, is open source, and the website is gharchive.org. So if we want, for example, the entire GitHub public activity from January 1st, 2015 at 3 PM UTC, we just download it from here: https://data.gharchive.org/2015-01-01-15.json.gz.
Here is a random sample of a PushEvent from that 2015-01-01-15 archive (abridged below, with placeholder values standing in for the real ones):
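{
  "id": "<event-id>",
  "type": "PushEvent",
  "actor": { "login": "<user>" },
  "repo": { "id": 123456, "name": "<user>/<repo>" },
  "payload": {
    "push_id": 123456789,
    "size": 1,
    "distinct_size": 1,
    "ref": "refs/heads/master",
    "head": "<sha-of-new-HEAD>",
    "before": "<sha-of-previous-HEAD>",
    "commits": [ { "sha": "<sha>", "author": { "email": "<email>" }, "message": "<message>" } ]
  },
  "public": true,
  "created_at": "2015-01-01T15:00:01Z"
}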
Finding Force Push Deleted Commits
To identify only the deleted commits from force push events, we can look for push events that contain zero commits. Why would a Git push event have no commits? It indicates a force push that resets the branch - essentially just moving the HEAD without adding any new commits! I call this an Oops commit or a Push-Event Zero-Commit.
Let’s see a quick example. We will download a random archive and search for such an event.
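A minimal sketch with jq (the archive hour is arbitrary):

wget https://data.gharchive.org/2015-01-01-15.json.gz
# Keep only PushEvents whose commits array is empty - the zero-commit force pushes
zcat 2015-01-01-15.json.gz \
  | jq -c 'select(.type == "PushEvent" and (.payload.commits | length) == 0)' \
  | head -n 5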
If we randomly select one of the target event types, we will see that the commits array is empty (zero commits). And if we look at the before commit - the one that was "deleted" (the HEAD before moving to HEAD^1, which is the "after") - we see that GitHub still holds a record of it 10 years later!
Here it is - https://github.com/grapefruit623/gcloud-python/commit/e9c3d31212847723aec86ef96aba0a77f9387493
And it's not necessarily just the before commit that was deleted. Sometimes a force push overwrites many commits at once.
Given a GitHub organization (or user), repo name, and commit hash, it's quite easy to scan the content of the "deleted" commit(s) for secrets using Git access. A minimal sketch (the script and variable names are illustrative):
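#!/usr/bin/env bash
# Usage: ./scan_oops_commit.sh <org> <repo> <before_sha>
ORG="$1"; REPO="$2"; COMMIT="$3"

# Minimal clone: no blobs, no checkout
git clone --filter=blob:none --no-checkout "https://github.com/$ORG/$REPO.git"
cd "$REPO"

# Fetch the "deleted" commit by hash; GitHub serves it even though no branch references it
git fetch origin "$COMMIT"

# Point TruffleHog at the fetched commit; it lazily pulls down only the blobs it scans
trufflehog git "file://$PWD" --branch "$COMMIT"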
This script:
Clones the repo in a minimal way:
--filter=blob:none: omits file contents (blobs), fetching only history/trees/commits.
--no-checkout: doesn't check out the working directory (no files appear yet).
Fetches the specific "deleted" commit (the before hash).
Scans for secrets using TruffleHog.
TruffleHog will automatically pull down the file contents (blobs) that need to be scanned.
This command will search for secrets in all commits, starting with the before commit and working backward until the start of that branch. This ensures that all data from a force push overwriting more than one commit gets scanned; however, it will also scan some non-dangling commits. The open-source tool we've released is a bit more efficient and only scans the actual dangling (unreferenced) commits.
GitHub doesn't specify an exact rate limit for Git operations, but excessive cloning or fetching of repositories may trigger throttling or rate limiting (see here).
In addition, we can use other methods to query a specific deleted/dangling commit, via the GitHub API or simply via the GitHub web UI.
GitHub API
Query for the commit patch using GitHub’s REST API:
https://api.github.com/repos/<org>/<repo>/commits/<sha>
Note: There's a strict rate limit of 5,000 queries per hour for registered users and merely 60 for unregistered users. The server response header x-ratelimit-remaining indicates how many API calls you have left.
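For example, an authenticated query (the token header is optional, but raises the limit from 60 to 5,000 per hour):

curl -sI -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/<org>/<repo>/commits/<sha>" \
  | grep -i '^x-ratelimit-remaining'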
Direct Web Access via GitHub.com
You can also access the commit details directly from GitHub.com.
Here are three different examples of how to access any commit via the GitHub website:
https://github.com/<org>/<repo>/commit/<sha>
https://github.com/<org>/<repo>/commit/<sha>.patch
https://github.com/<org>/<repo>/commit/<sha>.diff
Although there is no documented rate limit, access is not guaranteed under heavy usage, and their WAF may block requests at any time without notice.
Building the Automation
So we have all the ingredients - we can get all GitHub event data, search for all zero-commit PushEvents, fetch the "deleted" commit (the before hash), and then scan for active secrets using TruffleHog. Let's do this.
You know what? No need to build it, because together with Truffle Security's Research team, we're open-sourcing a new tool to search the entire GH Archive for "Oops Commits" made by your GitHub organization or user account. Since the entire GH Archive is available as a Google BigQuery public dataset, this tool scans GH Archive PushEvent data for zero-commit events, fetches the corresponding commits, and scans them for secrets using TruffleHog.
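For a sense of what the BigQuery side of such a pipeline looks like, here is a rough equivalent of the query it runs (my sketch, not necessarily the tool's exact SQL; GH Archive shards its public tables by day):

bq query --use_legacy_sql=false '
  SELECT repo.name, JSON_EXTRACT_SCALAR(payload, "$.before") AS before_sha, created_at
  FROM `githubarchive.day.20200101`
  WHERE type = "PushEvent"
    AND JSON_EXTRACT_SCALAR(payload, "$.size") = "0"
    AND STARTS_WITH(repo.name, "your-org/")
'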
Disclaimer: We are releasing this tool to help blue teamers assess their potential exposure. Please use it responsibly.
Here’s a command to get started:
python force_push_scanner.py --db-file /path/to/force_push_commits.sqlite3 --scan <github_org/user>
For this research, I used a custom version of our open-source tool to scan all of GitHub's Oops commits since 2020. And wow. There were lots of secrets!
Hunting for Impactful Secrets
After running the automation, I found thousands of active secrets. But how can I identify the most interesting secrets tied to the most impactful organizations? My three-step formula for success: manual search, a vibe-coded triage tool, and AI.
Manual Search
First, I manually explored and manipulated the data - essentially, got my hands dirty. The automation I built stores each newly discovered secret in a well-structured JSON file. Here's an example of what one of those files looks like:
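A reconstructed illustration (the field names and values below are placeholders, not my pipeline's exact schema):

{
  "detector": "AWS",
  "verified": true,
  "repo": "<org>/<repo>",
  "commit": "<before_sha>",
  "author_email": "<author@company.com>",
  "file": ".env",
  "secret": "<redacted>"
}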
During this stage, I manually looked over the files for interesting secrets. For example, I filtered out all commits made by authors with generic email addresses (e.g. gmail.com, outlook.com, mail.ru, etc) and focused on commits pushed by authors with a corporate email. While not perfect, it was a good start, and I found some really impactful keys.
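Assuming the illustrative schema above, that first filter is a one-liner with jq (the findings/ path and domain list are placeholders):

# Drop findings whose commit author email is from a generic provider; keep corporate ones
jq -s 'map(select(.author_email | test("@(gmail|outlook|yahoo|hotmail|mail)\\.") | not))' findings/*.json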
To understand the impact of specific tokens, I tried to figure out who owns the key and what access it has using open-source tools (e.g. secrets-ninja) and a few custom scripts. During my research, I learned that the Truffle Security team launched an open-source tool to do just that - TruffleHog Analyze. It's built into TruffleHog; you just have to run trufflehog analyze.
Note: I only did this additional secret enumeration when it was in-scope for specific Bug Bounty or Vulnerability Disclosure programs.
Once I found something relevant or interesting, I reported it via a bug-bounty program or directly via email.
Vibe Coding for Secret Triage
After a couple hundred manual checks, I had enough and decided to scale up my secrets review. I used Vercel's v0 to vibe-code a whole platform for triaging these "Oops Commit" secrets.
The platform was very simple. It was a front-end-only interface (no backend at all) that received a .zip file with JSON files created by the scanner. It then presented them in a very easy-to-use table so I could quickly review them and mark what I had already reviewed. This method proved very efficient, and I used a combination of filters to quickly find the hidden gems!
I also added some graphs and pie charts because why not? Looking at these graphs immediately revealed a few insights.
First, if you look at the time-series graph below, there's a clear correlation between how recent a leak is and the number of still-active secrets - most likely because older secrets have already been revoked or expired, as they should be!
Second, MongoDB secrets leaked the most. Based on my review of the data, this is because a lot of junior developers and CS students leaked mostly non-interesting side-project MongoDB credentials. The most interesting leaked secrets were GitHub PAT tokens and AWS credentials. These also generated the highest bounties!
Finally, I plotted the frequency of files leaking valid credentials, and the results are clear - your .env file needs extra protection!
The filenames that most often leaked valid credentials were: .env, index.js, application.properties, app.js, server.js, .env.example, docker-compose.yml, Unknown, README.md, main.py, appsettings.json, db.js, .env.local, settings.py, config.py, app.py, config.env, application.yml, config.json, config.js, WeatherManager.swift, .env.production, database.js, hardhat.config.js, script.js, App.js, .env.development, hardhat.config.ts, index.ts, config.ts, secrets.txt, main.js, index.html, docusaurus.config.js, default.json, Dockerfile, vercel.json, application-dev.yml, api-client.ts, docker-compose.yaml, and api_keys.py.
Everything AI
I was quite satisfied with my vibe-coded secrets review platform. However, reviewing secrets is still a manual task. Ideally, the process should automatically resolve every secret to extract basic information about the associated account wherever possible. This data could then be passed to a Llama-based agent that analyzes and identifies potentially valuable secrets. In essence, the goal is to build an offline agent capable of determining which secrets hold significance from a bug bounty or impact-driven perspective.
With the help of my friend Moti Harmats, I started working on it, but there’s still a lot more work to do, so I won’t release it at this time. But here’s a preview of what I started building:
Case Study - Preventing a Massive Supply-Chain Compromise
One of the secrets I found in a deleted commit was a GitHub Personal Access Token (PAT) belonging to a developer. The developer accidentally leaked this secret when they committed their hidden configuration files (dotfiles). I analyzed this token and found it had admin access to ALL of Istio's repositories.
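For classic PATs, one quick first check is the x-oauth-scopes response header (a sketch; this only works for classic tokens, and scopes alone don't prove org-level admin rights - tools like TruffleHog Analyze go deeper):

curl -sI -H "Authorization: token $LEAKED_PAT" https://api.github.com/user \
  | grep -i '^x-oauth-scopes'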
Istio is an open-source service mesh that provides a transparent and language-independent way to flexibly and easily automate application network functions. It is designed to manage the communication between microservices in a distributed application, offering features such as traffic management, security, and observability without requiring changes to the application code.
The main Istio project has 36k stars and 8k forks. Istio is used by a wide range of organizations and teams that run complex, distributed applications, especially those adopting microservices architectures. This includes giant corporations like Google, IBM, Red Hat and many others.
And I had ADMIN-level access to ALL of Istio's repositories (there are many of them). I could have read environment variables, changed pipelines, pushed code, created new releases, or even deleted the entire project. The potential for a massive supply-chain attack here was scary.
Fortunately, Istio has a well-maintained report page, and the team acted quickly to revoke the GitHub PATs as soon as the issue was reported. Thank you!
Summary
This was a really fun project. I glued together some known discoveries and was able to create reliable automation that scanned and found thousands of active secrets, even some that had been buried for years. I also got the chance to vibe-code a secret-hunting platform with some nice features that let me find needles in the haystack and earn approximately $25k in bounties, plus deep thanks, along the way.
The common assumption that deleting a commit is secure must change - once a secret is committed it should be considered compromised and must be revoked ASAP. It’s true for git blobs, git commits, and anything else that goes online.