Rewriting Commit History ~ or ~ How to Clean Up Your Git Past 👀

Git seems like a terrifying void when you're first introduced to it ⚫️. How does everything I've saved end up in this seemingly endless space, where I can see all the changes from wherever I choose to log on!? And it doesn't end there! I can actually 'rewind' my steps and go to an earlier save too! (Geez, wish I'd had that option when I lost that one 3000 word essay to a faulty uni computer and a cloud account which had run out of space ☁️). It's truly a magical (but as I said, terrifying) void.

The latest trick I've learnt is to 'rewrite commit history'. So say you're working on a new feature in a program: you start by doing it one way, and commit that (as 001). Then you add some extra functionality (002), then you add another five pieces of functionality, go through a moment at 2am where you clear everything and start over, backtrack a bit, and start again (003 - 015). Your commit history is now looking like a winding adventure with quite a few dead-ends. You go to bed, wake up in the morning and realise you've made a terrible mistake. How do you both rewind to the last good save (let's say 002), and make all the bad choices ✨magically✨ disappear?

Rewriting commit history to the rescue!

I found myself in a similar situation during my Outreachy internship. I had ended up with a PR that contained work which was outside the current scope. My mentor suggested I look into rewriting Git history to reset the branch to the last clean commit, move the out-of-scope work to another branch, and make the final updates to the current branch. This was possible because it was a PR for a particular issue, and we didn't need a history of the process to get to the result.

History Rewriting Options

reset

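git reset is like a rollback: it moves the current branch (and HEAD) back to a previous commit. Three areas can be affected: the commit history in the local repository, the staging area (index) and the working directory. The --soft, --mixed and --hard options control how far the reset reaches. --soft moves only the branch pointer, --mixed (the default) also resets the staging area, and --hard also resets the tracked files in the working directory. This option is good if you're the only one working on the branch and the commits haven't been pushed to the remote repository yet. It's also the option for removing commits from a branch's history altogether, eg. if sensitive data was committed, in which case you'd use --hard. Bear in mind that if that data has already been pushed, a reset on its own isn't enough: copies can remain on the remote and in other people's clones, so any exposed credentials should be treated as compromised.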
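
To see the difference between the three options, here's a minimal sketch that can be run in a throwaway repository. The file name, commit messages and demo identity are made up for illustration; nothing here touches a real project.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temp directory (all names are hypothetical)
reset_repo=$(mktemp -d)
cd "$reset_repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo one > file.txt
git add file.txt
git commit -q -m "001"
good=$(git rev-parse HEAD)

echo two >> file.txt
git commit -q -am "002"
second=$(git rev-parse HEAD)

# --soft: only the branch pointer moves; the change from 002 stays staged
git reset -q --soft "$good"
soft_staged=$(git diff --cached --name-only)

# Put the branch back on 002 before trying the next option
git reset -q --hard "$second"

# --mixed (the default): the index is reset too; the change is left unstaged
git reset -q --mixed "$good"
mixed_unstaged=$(git diff --name-only)

git reset -q --hard "$second"

# --hard: index and tracked files are reset; the change from 002 is gone
git reset -q --hard "$good"
hard_status=$(git status --porcelain)

echo "soft leaves staged:    $soft_staged"
echo "mixed leaves unstaged: $mixed_unstaged"
echo "hard leaves:           ${hard_status:-nothing}"
```

In all three cases the branch ends up pointing at 001. The options only differ in what happens to the changes that 002 introduced.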

revert

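git revert ends up with similar results to reset, but it takes a different approach. A new commit is added which 'cancels out' the changes of an earlier commit. Git suggests a default commit message (which you can edit) and you'll still have the full commit history. This is a better option if the commits have already been pushed and others are also working on the code, because nobody's existing history gets rewritten. Conflicts can still happen if later commits touched the same lines, but they're resolved in the new commit rather than by changing the past.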
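
As a rough sketch of how that looks in practice, the script below reverts the latest commit in a throwaway repository. The file name and commit messages are hypothetical, and --no-edit simply accepts Git's default revert message instead of opening an editor.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temp directory (all names are hypothetical)
revert_repo=$(mktemp -d)
cd "$revert_repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "good line" > notes.txt
git add notes.txt
git commit -q -m "002: last good state"

echo "2am idea" >> notes.txt
git commit -q -am "003: late-night change"

# Undo 003 by adding a new commit on top; nothing is removed from history
git revert --no-edit HEAD > /dev/null

revert_count=$(git rev-list --count HEAD)
revert_subject=$(git log -1 --format=%s)
revert_content=$(cat notes.txt)

# The log now shows 002, 003 and the revert commit
git log --oneline
```

The file is back to its 002 state, but 003 and the commit that undoes it are both still visible in the log.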

rebase

git rebase helps to integrate changes from one branch into another by rewriting history (in comparison to git merge, which also integrates but keeps the existing history and moves forward from it). A group of commits is replayed on top of a new base commit, and in interactive mode (git rebase -i) commits can also be reordered, combined (squashed) or dropped. It makes it appear as though the current branch was created from a different commit. The replayed commits are completely new commits with new hashes, which replace the originals in the branch's history.
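
Here's a small sketch of a plain (non-interactive) rebase in a throwaway repository, to show that the replayed commit really is a new commit. The branch names main and feature, the file names and the commit messages are all assumptions for the example.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temp directory (all names are hypothetical)
rebase_repo=$(mktemp -d)
cd "$rebase_repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Branch off and do some work
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"
old_tip=$(git rev-parse HEAD)

# Meanwhile, main gets a new commit
git checkout -q main
echo more > main.txt
git add main.txt
git commit -q -m "main moves on"

# Replay the feature commit on top of the new main
git checkout -q feature
git rebase -q main
new_tip=$(git rev-parse HEAD)

echo "before rebase: $old_tip"
echo "after rebase:  $new_tip"
```

The commit message and changes are the same, but the hash is different because the commit now has a different parent.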

The Full Clean Option

Here's what I did in the end to get back to the last clean commit and erase all the commits I'd made after it from the branch's history.

  1. Identify the last good commit by looking at git log or the commit history on GitHub. Get the short hash (an abbreviated SHA) for the commit, which is a string of hexadecimal characters (0-9 and a-f), usually 7 or more characters long (like 1a2b3c4d)
  2. Run the following to get back to that commit
    git reset --hard 1a2b3c4d
    
    git reset sets the current HEAD of the branch to the specified commit 1a2b3c4d (HEAD refers to the commit that is currently checked out). Using the --hard option ensures the index and the working tree are reset too, so any changes made to tracked files are discarded. Untracked files are left alone, unless they are in the way of writing a tracked file. If you want to keep the changes made, and mark them to be committed, you can use --soft instead. The Git docs have more information on other options.
  3. Then force push the changes
    git push --force
    
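    This overwrites the branch on the remote with the reset history from the previous step, so the later commits no longer appear on the branch or in the PR. A plain git push would be rejected here, because the local branch is now behind the remote one. Note that a force push doesn't get around server-side rules such as branch protection.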
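
The steps above, plus keeping the out-of-scope work on its own branch first, can be sketched end to end as below. This is a simulation under stated assumptions rather than the exact commands from my PR: the 'remote' is a local bare repository in a temp directory, and the branch names my-feature and out-of-scope-work, the file name and the commit messages are all made up.

```shell
#!/bin/sh
set -e

# A local bare repository stands in for the remote (all names are hypothetical)
clean_root=$(mktemp -d)
git init -q --bare "$clean_root/remote.git"
git init -q "$clean_root/work"
cd "$clean_root/work"
git symbolic-ref HEAD refs/heads/my-feature   # name the initial branch
git config user.email "demo@example.com"
git config user.name "Demo"
git remote add origin "$clean_root/remote.git"

# Four commits: 001 and 002 are good, 003 and 004 are out of scope
for n in 001 002 003 004; do
  echo "$n" >> work.txt
  git add work.txt
  git commit -q -m "$n"
done
git push -q -u origin my-feature

old_tip=$(git rev-parse HEAD)
last_good=$(git rev-parse HEAD~2)   # commit 002

# Keep the out-of-scope commits reachable from a separate branch
git branch out-of-scope-work

# Step 2: rewind the feature branch to the last good commit
git reset -q --hard "$last_good"

# Step 3: overwrite the remote branch with the rewound history
git push -q --force origin my-feature

remote_tip=$(git ls-remote origin refs/heads/my-feature | cut -f1)
echo "remote my-feature now points at: $(git log -1 --format=%s "$remote_tip")"
```

If others might have pushed to the same branch in the meantime, git push --force-with-lease is a more cautious variant: it refuses to overwrite the remote branch if it has moved since you last fetched it.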

And voilà!

No more messy PR with pointless commits, just a nice clean branch with the correct scope of work, and everyone's happy!

The Git docs definitely help make things a little less scary...