Splitting a Git repository in two

Until now I have managed with only a handful of git commands and Git Extensions. I know how incredibly flexible Git is and I ‘get’ the DVCS paradigm, but I had not needed that flexibility until I had to split a large repository with lots of branches in two. The new repository must have a subset of the branches from the source repository, and the 4-5 developers working on those branches must be able to continue to work seamlessly after the transition.

With Git it seems there are always at least a dozen ways to achieve the same result. Initially I spent time looking at the filter-branch command, thinking that I would re-write history at the same time as splitting the repos to save space. I decided against this as space isn’t an issue (ok, the repo is nearly 700 MB so I do need to consider the size, but more on that later) and I’d rather not re-write history without a very good reason. I’m being conservative, so I’ll take the lightest-touch approach. What I settled on is quite obvious really, as are most things once you have worked them out:

  • Create the new GitHub repo but do not initialize it
  • Clone the source repo locally
  • Track the origin branches I want to migrate
  • Check out those branches and pull
  • Add the new GitHub repo as a remote
  • Push each of the branches to the new remote
  • Check out the Integration branch
  • Remove the code not required in the new repo from the Integration branch, commit and push

The script looks like this:

#!/bin/bash

sourceRepoPath="{git hub repo url}"
sourceRepoName="{repo name}"
sinkRepoPath="{git hub repo url}"
sinkRemoteName="{anyname}"

branches="
	master
	Integration
	featureBranch1
	featureBranch2
	featureBranch3
	"

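# Clone the source repo and work inside the clone
git clone "$sourceRepoPath"
cd "$sourceRepoName"

# Register the new (empty) repo as a second remote
git remote add "$sinkRemoteName" "$sinkRepoPath"

for branch in $branches
do
	# Create a local branch tracking origin unless it already exists
	# (the default branch is created by the clone itself)
	git show-ref --verify --quiet "refs/heads/$branch" || git branch -t "$branch" "origin/$branch"
	git checkout "$branch"
	git pull origin "$branch"
	# Push to the new remote and make it the branch's upstream
	git push --set-upstream "$sinkRemoteName" "$branch"
done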
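
Before pointing anyone at the new repo it is worth checking that every branch actually arrived. The following is only a minimal sketch of that check: it uses two throwaway bare repositories in a temp directory as stand-ins for the GitHub repos, and the remote name `sink`, the `demo` identity and the two-branch list are assumptions made for illustration.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway local repos standing in for the two GitHub repos (assumption)
work=$(mktemp -d)
git init -q --bare "$work/source.git"
git init -q --bare "$work/sink.git"
git --git-dir="$work/source.git" symbolic-ref HEAD refs/heads/master

# Seed the source repo with two branches
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
git -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m "initial commit"
git branch Integration
git push -q "$work/source.git" master Integration

# Migrate the branches the same way as the script above
git clone -q "$work/source.git" "$work/clone"
cd "$work/clone"
git remote add sink "$work/sink.git"
for branch in master Integration
do
	git show-ref --verify --quiet "refs/heads/$branch" || git branch -q -t "$branch" "origin/$branch"
	git push -q --set-upstream sink "$branch"
done

# List what the new remote now holds and count the branch heads
git ls-remote --heads sink
migrated=$(git ls-remote --heads sink | grep -c "refs/heads/")
echo "branches on new remote: $migrated"
```

Against the real repos only the last three lines matter: run `git ls-remote --heads` with your own remote name and compare the result with the branch list in the script.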

Now the devs on the new repo can continue on their feature branches and merge from the Integration branch at their leisure to remove the redundant code. With regard to the size of the repo, I’m assuming that at some stage in the future I can truncate the history of the new repo to a point after every branch has had the code-removal commit merged and before the inception of every current branch, thus enabling Git to determine all common ancestors.
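
To illustrate how a feature branch picks up the removal, here is a small sketch in a throwaway repository. The file names (`keep.txt`, `redundant.txt`, `feature.txt`) and the `demo` identity are hypothetical; only the branch names come from the script above.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway repository; file contents and names are hypothetical
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/Integration
git config user.name "demo"
git config user.email "demo@example.com"

# Shared history: code that stays and code that belongs to the other repo
echo "keep" > keep.txt
echo "redundant" > redundant.txt
git add .
git commit -q -m "initial import"

# A feature branch that started before the clean-up
git checkout -q -b featureBranch1
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"

# The clean-up commit on Integration
git checkout -q Integration
git rm -q redundant.txt
git commit -q -m "remove code not required in the new repo"

# The developer merges Integration whenever it suits them
git checkout -q featureBranch1
git merge -q --no-edit Integration

# Show what is left in the working tree
ls
```

After the merge the feature branch keeps its own work and loses the removed file. If a feature branch has itself edited a file that Integration deleted, Git will report a modify/delete conflict that has to be resolved by hand.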
