Salesforce, Python, SQL, & other ways to put your data where you need it

Deleting history of a file in AWS CodeCommit

29 Nov 2021 🔖 git python aws windows

Has anyone here ever tried to completely erase a file that contained a password from AWS CodeCommit? I got stuck using git-filter-repo – things seemed to work on my computer, but not in the cloud.

(In my case, a completely new file was created that did the same job without a password, so I simply needed to make it look like the file had never existed.)

I followed these directions, but while I can no longer see any trace of the file’s existence when using SourceTree to peruse my local repo’s commit history, it’s still in the commit history displayed at console.aws.amazon.com after force-pushing my local repo to CodeCommit.

(In the web console, the “most recent” version of the file shows as not found, but the file is still present in the web console’s commit history, unlike in SourceTree’s view of my local repo, where it appears to be gone.)

Below are the steps I took.

Installed git-filter-repo

  • From c:\miscsoftware\git-filter-repo\, ran:
      git clone https://github.com/newren/git-filter-repo .
    
  • From a random directory, ran:
      git filter-repo --version
    
    • Got a reply of:
        git: 'filter-repo' is not a git command. See 'git --help'.
      
  • Added c:\miscsoftware\git-filter-repo\ to my username’s PATH environment variable.
  • From a random directory, ran:
      git filter-repo --version
    
    • Got a number along the lines of 1234abcd5678 as a reply.

Looks like I successfully installed Elijah Newren’s git-filter-repo.

Back up GitConfig files

  • Backed up the contents of my repository’s “local” Git Config file, c:\my_repo\.git\config, into a plaintext file on my desktop.
  • Just to be safe, also backed up the contents of my system’s “global” Git Config file, c:\users\my_username\.gitconfig, into a plaintext file on my desktop.
  • Just to be safe, also backed up the contents of my system’s “portable” Git Config file, c:\ProgramData\Git\config.
  • There was no file named anything like gitconfig at C:\Program Files\Git\mingw64\etc, so I guess my system doesn’t have a “system” Git Config file.

Clean up my repository

  • Ran a fresh git pull to download a copy of my Git repository from AWS CodeCommit into my local computer’s copy of my repository.
  • Got rid of all files I was working on in c:\my_repo\, moving them over to my desktop, so that SourceTree didn’t show anything in the “Unstaged files” panel.

Unsuccessfully ran git filter-repo

  • In my Git Bash software, from MINGW64:/c/my_repo, ran:
      git filter-repo --invert-paths --path /c/my_repo/sub_folder/bad_script.java
    

    Or maybe it was:

      git filter-repo --invert-paths --path sub_folder/bad_script.java
    

    I forgot.

Sadly, this errored out and I didn’t copy the error message down. To summarize, git-filter-repo complained that my repository wasn’t “freshly cloned” enough.

Deleted and re-cloned my repository from AWS

  • Deleted the contents of c:\my_repo\
  • From MINGW64:/c/my_repo, ran:
      git clone https://git-codecommit.my-region.amazonaws.com/v1/repos/my_repo .
    
  • Replaced the contents of c:\my_repo\.git\config with the backup I saved earlier.

Unsuccessfully reran git filter-repo

  • In my Git Bash software, from MINGW64:/c/my_repo, ran:
      git filter-repo --invert-paths --path /c/my_repo/sub_folder/bad_script.java
    
    • Results something along the lines of:
        Parsed 5 commits
        HEAD is now at 1234ab testing git
        Enumerating objects: 780, done.
        Counting objects: 100% (780/780), done.
        Delta compression using up to 8 threads
        Compressing objects: 100% (195/195), done.
        Writing objects: 100% (780/780), done.
        Total 780 (delta 560), reused 780 (delta 560), pack-reused 0
      
        New history written in 2.8 seconds; now repacking/cleaning...
        Repacking your repo and cleaning out old unneeded objects
        Completely finished after 8.4 seconds.
      

Gitignore

Not yet realizing things had failed…

  • Created a c:\my_repo\.gitignore file (there wasn’t one) and put sub_folder/bad_script.java into it.
  • Opened my repo in SourceTree. The .gitignore file is in “Unstaged files” and bad_script.java isn’t despite still existing at c:\my_repo\sub_folder\bad_script.java, which seems fine.
    • Except that if I change the contents of the gitignore to add an x to the end of the ignored filename, bad_script.java still doesn’t appear in “unstaged files.” Weird.
  • Went to SourceTree history view, and in the commit where bad_script.java was first added to the repository, sadly, sub_folder/bad_script.java was still in the “added files” list, and I could see the password in the preview of the file’s contents.

Reran git-filter-repo

  • In my Git Bash software, from MINGW64:/c/my_repo, ran:
      git filter-repo --invert-paths --path sub_folder/bad_script.java
    
    • Results something along the lines of:
        Parsed 80 commits
        HEAD is now at f62b9d5 testing git
        Enumerating objects: 784, done.
        Counting objects: 100% (784/784), done.
        Delta compression using up to 8 threads
        Compressing objects: 100% (196/196), done.
        Writing objects: 100% (784/784), done.
        Total 784 (delta 572), reused 782 (delta 571), pack-reused 0
      
        New history written in 0.2 seconds; now repacking/cleaning...
        Repacking your repo and cleaning out old unneeded objects
        Completely finished after 1.5 seconds.
      
  • Back in SourceTree, I clicked to another commit in this history, then clicked to the one I was hoping to see my file disappear from. I scrolled down the “added files” list to the appropriate place in the alphabet and noticed sub_folder/bad_script.java was no longer in the list. Great!
  • Furthermore, there was no longer a c:\my_repo\sub_folder\bad_script.java file on my hard drive.
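
Comparing the two runs, the likely difference is the --path value: git-filter-repo matches --path against paths relative to the repository root, so the absolute Git Bash path from the earlier attempt would have matched nothing in history. Here is a minimal Python sketch of turning one form into the other. The function name repo_relative_path is made up for illustration and is not part of git-filter-repo, and it only handles forward-slash paths like the ones Git Bash shows.

```python
from pathlib import PurePosixPath


def repo_relative_path(path: str, repo_root: str) -> str:
    """Return `path` relative to `repo_root`, for use as a --path value.

    Hypothetical helper: assumes Git Bash style paths (/c/my_repo/...).
    """
    p = PurePosixPath(path)
    if not p.is_absolute():
        # Already relative, so assume it is relative to the repo root.
        return p.as_posix()
    # Raises ValueError if `path` is not inside `repo_root`.
    return p.relative_to(PurePosixPath(repo_root)).as_posix()
```

For example, repo_relative_path("/c/my_repo/sub_folder/bad_script.java", "/c/my_repo") gives the sub_folder/bad_script.java form that worked above.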

Took notes about CodeCommit web console before pushing

  • I visited https://console.aws.amazon.com/codesuite/codecommit/repositories/my_repo/browse/refs/heads/main/--/sub_folder/bad_script.java?# and validated that, in the cloud, not having pushed anything from my desktop to the cloud yet, bad_script.java still existed, password and all in its body.
  • I also found bad_script.java at https://console.aws.amazon.com/codesuite/codecommit/repositories/my_repo/commit/1234567890987654321?region=my-region in the “go to file” suggestions, and I could see it in the file list, and see its contents, password and all, if I clicked on that suggestion.
  • Clicking that commit’s “browse” took me to https://console.aws.amazon.com/codesuite/codecommit/repositories/my_repo/browse/1234567890987654321/--/sub_folder?region=my-region, where I could see the contents of the file, password and all.
  • Clicking “browse” through the file tree for a later commit, I was able to see the contents of the file at https://console.aws.amazon.com/codesuite/codecommit/repositories/my_repo/browse/0987654321234567890/--/sub_folder?region=my-region, passwords and all.

Force-pushed from my desktop to CodeCommit

  • With my “gitignore” still languishing in “unstaged files,” I skipped over the part of the directions that said to commit it to the repo. Maybe that was a mistake, but I didn’t figure I needed it, since the file had been deleted from my hard drive locally. I hit “push” in SourceTree.
  • From MINGW64:/c/my_repo, ran:
      git push origin --force --all
    
    • Results:
        fatal: 'origin' does not appear to be a git repository
        fatal: Could not read from remote repository.
      
        Please make sure you have the correct access rights
        and the repository exists.
      
    • Oops. That’s right, the directions for using git-filter-repo warned me that my “remote” was going to disappear from c:\my_repo\.git\config.
  • Replaced the contents of c:\my_repo\.git\config with the backup I saved earlier.
  • From MINGW64:/c/my_repo, ran:
      git push origin --force --all
    
    • Results something along the lines of:
        Enumerating objects: 784, done.
        Counting objects: 100% (784/784), done.
        Delta compression using up to 8 threads
        Compressing objects: 100% (196/196), done.
        Writing objects: 100% (784/784), 3.9 MiB | 1.3 MiB/s, done.
        Total 784 (delta 572), reused 784 (delta 572), pack-reused 0
        remote: processing ...
        To https://git-codecommit.my-region.amazonaws.com/v1/repos/my_repo
         + 9876fe...9876fe main -> main (forced update)
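
Restoring the whole config file worked, but the only thing the push needed was the origin remote that git-filter-repo had removed. Here is a sketch of recovering just that piece, assuming the backup is ordinary Git config text containing a [remote "origin"] section. Both function names are hypothetical helpers, not part of Git or git-filter-repo.

```python
import re
from typing import List, Optional


def remote_url_from_config(config_text: str, remote: str = "origin") -> Optional[str]:
    """Find the `url = ...` entry inside the [remote "<remote>"] section."""
    in_section = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new section header starts; check whether it is the one we want.
            in_section = line == f'[remote "{remote}"]'
        elif in_section:
            match = re.match(r"url\s*=\s*(.+)", line)
            if match:
                return match.group(1).strip()
    return None


def add_remote_command(url: str, remote: str = "origin") -> List[str]:
    """Build the argv for `git remote add <remote> <url>`."""
    return ["git", "remote", "add", remote, url]
```

The list returned by add_remote_command can be passed to subprocess.run from inside the repository, or simply typed out by hand as git remote add origin followed by the URL.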
      

Inspected CodeCommit web console after pushing

  • Inspected https://console.aws.amazon.com/codesuite/codecommit/repositories/my_repo/browse/refs/heads/main/--/sub_folder/bad_script.java?#.
    • Results promising: I got a big red error message with a header of PathDoesNotExistException and a body of Could not find path sub_folder/bad_script.java;

But then it’s bad news:

  • Not only was bad_script.java still in the picklist suggester at https://console.aws.amazon.com/codesuite/codecommit/repositories/my_repo/commit/1234567890987654321?region=my-region, but I could indeed jump to it and see it highlighted.
  • “Browse File Contents” still let me see it at https://console.aws.amazon.com/codesuite/codecommit/repositories/my_repo/browse/1234567890987654321/--/sub_folder/bad_script.java?region=my-region

I wonder what happened.

  • Did I mess up not committing the .gitignore before pushing?
  • Did I mess up running git filter-repo with a .gitignore in place?
  • Are there things cached in AWS that I need to request that they fix?
    • If so, would setting up a new greenfield AWS remote and force-pushing to it do the trick?
    • If that works, would nuking & re-setting-up the remote work? Or would it still fail if the remote had the same name due to caching?

Re-cloned and examined SourceTree

  • Deleted the contents of c:\my_repo\ again.
  • From MINGW64:/c/my_repo, again ran:
      git clone https://git-codecommit.my-region.amazonaws.com/v1/repos/my_repo .
    
  • Replaced the contents of c:\my_repo\.git\config with the backup I saved earlier again.
  • Clicked away from history and back into history in SourceTree. Clicked to the commit where the file had been added. I scrolled down the “added files” list to the appropriate place in the alphabet, and sub_folder/bad_script.java is still gone from that commit on my hard drive. There still isn’t a c:\my_repo\sub_folder\bad_script.java file on my hard drive.
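
The SourceTree check on the fresh clone can also be scripted. Here is a sketch, assuming the output of git rev-list --all --objects has been captured as text: each line is an object ID, followed by a path for trees and blobs. The helper name paths_in_rev_list is hypothetical.

```python
from typing import Set


def paths_in_rev_list(output: str) -> Set[str]:
    """Collect every path named in `git rev-list --all --objects` output."""
    paths = set()
    for line in output.splitlines():
        # Lines look like "<object id> <path>"; commits have no path.
        _object_id, _, path = line.partition(" ")
        if path:
            paths.add(path)
    return paths
```

One way to feed it is subprocess.run(["git", "rev-list", "--all", "--objects"], capture_output=True, text=True, check=True).stdout, run from inside the clone. If sub_folder/bad_script.java is not in the result, the file is gone from everything a clone can reach. The old commit-ID URLs still working in the web console would then be consistent with the server still holding the old, no-longer-referenced commits, though that is an assumption about CodeCommit and not something I have confirmed.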

WEIRD.

I wish AWS had provided proper documentation about thoroughly cleaning secrets out of repositories in the cloud the way GitHub did.

I’m stumped.
