The project I’m working on is very large and has a repository to match, in part because the database and some large binary files were added to the repo early in the project’s life and later removed. At 800MB compressed, the repo had become far too large: cloning into new environments was a problem, re-indexing in IDEs was time-consuming, and the repo needed to be reduced in size.
We needed to keep much of the repository history intact. The project has had and continues to have several developers working on it, and we decided that exporting the project and creating a new repository with no history was not the best option for us. We therefore embarked on a plan that we thought would help to get the repository down to a manageable size.
The bulk of the repo is a Drupal installation with hundreds of custom modules, contrib modules and features. In addition, we had some folders, vendor libraries and documentation on the same level as the Drupal docroot that needed to be retained.
MyProject
|--docroot
|----.htaccess
|----sites <-- we needed this primarily
|----wireframes
|----robots.txt <-- we also needed these
|--hooks <-- we also needed these
|--library <-- we also needed these
|--vendor <-- we also needed these
|--utils <-- we also needed these
We knew that the large file(s) had originally been added in the docroot or somewhere below it. At the beginning of the process, the repository with its branches, tags, backup reflogs, hooks, etc. was 803M compressed.
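Before rewriting anything, it can help to confirm where the weight in the history actually sits. The following is a minimal sketch, not part of our original workflow: the function name `largest_blobs` is made up for illustration, and it assumes it is run from inside the cloned repository. It lists the largest blobs found anywhere in history, including files that have since been deleted.

```shell
# Hypothetical helper: list the largest blobs anywhere in history, biggest first.
# Usage: largest_blobs [count]   (run inside the repository; count defaults to 10)
largest_blobs() {
  count="${1:-10}"
  # rev-list prints "<sha> <path>" for every object reachable from any ref;
  # cat-file then reports each object's type and size, and %(rest) carries the path.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -rn |
    head -n "$count"
}
```

Each output line is a size in bytes followed by the path the blob was stored under, so something like `largest_blobs 20` should make it fairly obvious which directories are worth filtering.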
git clone firstname.lastname@example.org:githubusername/examplerepo.git mynewrepo
git remote remove origin   # or, equivalently: git remote rm origin
git tag -l | xargs git tag -d
git filter-branch --prune-empty --subdirectory-filter docroot/sites HEAD
If you wish to keep the branches, use -- --all instead of HEAD.
git reset --hard
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --aggressive --prune=now
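To see what a cleanup like this achieved, the packed size of the repository can be read back from git itself. This is a small sketch under the assumption that it runs in the repository's top-level directory; `packed_size` is an illustrative name, not something from our scripts.

```shell
# Hypothetical helper: print the human-readable size of the packed objects
# for the repository in the current directory.
packed_size() {
  # count-objects -v prints "key: value" lines; -H makes the sizes human-readable.
  git count-objects -vH | awk -F': ' '$1 == "size-pack" { print $2 }'
}
```

Running `packed_size` before the filter-branch step and again after the final gc gives a like-for-like comparison of the two states.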
At the end of this process, the sites folder was whittled down to 113M, a vast improvement, but we were not done there.