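Background
If a large, useless file is accidentally committed, it stays in the Git history and can make the repository much larger. GitHub and Bitbucket each limit repository size (see https://help.github.com/articles/what-is-my-disk-quota/ and https://confluence.atlassian.com/pages/viewpage.action;jsessionid=735235A3CE151FB6D4C518F3971FD524.node2?pageId=273877699). So how do you find the large files in the Git history and delete them?
Solution
1. In the root directory of the repository, run the following shell script. It lists the ten largest objects by size.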
#!/bin/bash
#set -x
# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see https://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs
# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';
# list all objects including their size, sort by size, take top 10
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`
echo "All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file."
output="size,pack,SHA,location"
for y in $objects
do
    # extract the size in bytes
    size=$((`echo $y | cut -f 5 -d ' '`/1024))
    # extract the compressed size in bytes
    compressedSize=$((`echo $y | cut -f 6 -d ' '`/1024))
    # extract the SHA
    sha=`echo $y | cut -f 1 -d ' '`
    # find the objects location in the repository tree
    other=`git rev-list --all --objects | grep $sha`
    #lineBreak=`echo -e "\n"`
    output="${output}\n${size},${compressedSize},${other}"
done
echo -e $output | column -t -s ', '
Output:
Note that the last column lists the location of each file (its path in the repository). The location is used in step 2.
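The parsing part of the script above can also be sketched with awk, which splits on runs of whitespace and so does not rely on the fixed `cut` field positions. This is only an illustrative alternative, not the original author's method; the function name `largest_objects` is hypothetical. It reads `git verify-pack -v` output on stdin and prints the size in kB, the packed size in kB, and the SHA.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script).
# Reads `git verify-pack -v` output on stdin and prints
# "size_kB pack_kB SHA" for the N largest objects (default: 10).
largest_objects() {
  local count="${1:-10}"
  # Keep only object lines, whose second field is the object type;
  # summary lines such as "chain length = ..." are skipped.
  awk '$2 ~ /^(blob|tree|commit|tag)$/ {
         printf "%d %d %s\n", $3 / 1024, $4 / 1024, $1
       }' |
    sort -k1,1nr |
    head -n "$count"
}
```

A possible invocation from the repository root: `git verify-pack -v .git/objects/pack/pack-*.idx | largest_objects 10`. The SHA can then be looked up with `git rev-list --all --objects | grep <SHA>`, as the script above does, to get the location.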
2. Remove the files with git filter-branch:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch *.gz' HEAD
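Replace `*.gz` with the location obtained in step 1.
If the file you want to delete still exists in the current working directory and you want to keep it, replace HEAD with HEAD^.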
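As a concrete illustration, the command from this step could be wrapped as in the sketch below. The function name `purge_path` and the example path `data/dump.tar.gz` are assumptions, not taken from the original. The path is quoted so that the shell does not expand it before Git sees it, and `FILTER_BRANCH_SQUELCH_WARNING=1` only skips the warning pause that recent Git versions print before filter-branch starts.

```shell
#!/bin/bash
# Hypothetical wrapper around the filter-branch command from step 2.
# $1 is the location reported in step 1, e.g. data/dump.tar.gz (assumed path).
# Paths containing a single quote are not handled by this sketch.
purge_path() {
  local path="$1"
  FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch \
      --index-filter "git rm --cached --ignore-unmatch -- '$path'" \
      HEAD
}
```

For example, `purge_path data/dump.tar.gz` rewrites the history of the current branch so that no commit contains that path.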
3. Push the rewritten history back to the remote repository:
git push origin master --force
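4. Clean up the repository. This step is required as well; until it is done, the removed objects still take up space in .git.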
git reflog expire --expire=now --all
git gc --aggressive --prune=now
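One detail the two commands above do not cover: git filter-branch keeps a backup of the pre-rewrite refs under refs/original/, and objects reachable from those refs are not pruned by git gc. A minimal sketch of deleting the backups, to be run before the cleanup commands in this step, follows. The function name is hypothetical; the git subcommands are standard.

```shell
#!/bin/bash
# Hypothetical helper: delete the backup refs that git filter-branch
# stores under refs/original/, so that gc can prune the old objects.
drop_filter_branch_backups() {
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git update-ref -d "$ref"
    done
}
```

Once the backups are gone, the old history can no longer be restored from this clone, so it is worth checking the rewritten result first.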
5. Check the size of the repository with the following command:
du -sh .git