The git pull command seems convenient, but it actually does a
bit much at once. It fetches from remote, attempts to merge into
your local stuff, and, if successful, commits. If you’ve
committed your work locally prior to the git pull, as
recommended by the git documentation, the resulting commits are
not as simple and digestible as one would hope. This piece,
addressing novice and intermediate level git users, discusses
the situation in some detail and recommends a combination of git
fetch and git rebase.
Let’s assume you are an industrious developer who has just
prepared a contribution to some project. You’re done, all unit
tests are happy, and now you want your work to become available
on some branch on origin, say master, so other people can see
it.
Now what?
Other members of your team may have changed master. At this
point in time, you don’t yet know. To find out, you can, of
course, run git pull. If your teammates have not been working
on master, or done only work that’s easily merge-able, you’re
fine.
In the present scenario, we assume you have completed your changes locally, but have not committed them yet. Which means: You have no backup of your own work yet.
Git is afraid that it and you with combined forces might mess up
your workspace during conflict resolution. So, if there is a
conflict with uncommitted work, the “pre-merge checks” of the
git merge part of git pull fail and nothing happens. No harm
has been done, but no progress has been made, either.
Let’s take a step back and survey the situation. What’s up?
You certainly don’t want to risk a conflict resolution without a
backup. So you want a commit first. This is also the approach
recommended by the git merge documentation: It discourages
merging on top of uncommitted changes.
So, the route to take is: Commit your local stuff first. After all, this is a version system. We should be using it to our advantage.
I see no reason not to heed that advice always, as a habit. You’re done with a piece of work? Commit locally first thing, before even looking at what the other people have done in the meantime. I do it that way, and, in my experience and judgment, this is a good habit to have.
So, with that in our mind, let’s start our little story one more time. Again, you’ve completed a job and the tests are happy.
This time, you commit your work first thing, on top of whatever
dated material your local master happens to hold. You first
want to safely tuck your own stuff away. If you do that, you have
a commit, and that commit is there to stay. You can always come
back to it, no matter what. From this point on, your work is
safe. After that’s accomplished, you have a sound basis for
facing the merge work that’s ahead of you.
So, how do you go about that merge work? Maybe a git pull now?
Yes, you can use git pull now. But I argue that plain git pull
is not what you want, once you’ve adopted the recommended “check
in first” habit.
To see why, let’s assume independent changes come in from other
members of your team. These are diligently merged by git pull.
In many cases, the merge is done fully automatically. All is
fine. – Or is it?
That merge which git pull has produced for you is – well, a merge.
It’s a commit with two parents.
There is nothing inherently wrong with merge commits, commits with two parents (or, though rarely seen, even more than two). But those do add a cognitive burden. For yourself and your fellow workers, it just isn’t as easy to see what your commit is up to.
When you look at it via a git UI tool, that tool has two diffs it could show you. Depending on the tool, it might decide to show you none. Again: Unless you know how to specifically ask and use that knowledge[1], your git UI is likely to not show you anything about what happened in that merge commit.
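To make the two-parent commit and its two possible diffs tangible, here is a small sketch under assumed names (repositories seed, origin.git, me and mate, files mine.txt and theirs.txt, identity "Dev"; none of these come from the original scenario). Recent Git releases may refuse a plain git pull on diverged branches until a strategy is chosen, so the sketch passes --no-rebase to request the merge behaviour discussed here.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
# Isolate the demo from personal Git configuration (assumption for the sketch).
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "base" > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git me
git clone -q origin.git mate

# I commit locally first, as recommended.
echo "my work" > me/mine.txt
git -C me add mine.txt
git -C me commit -q -m "My contribution"

# Meanwhile a teammate pushes an independent change.
echo "their work" > mate/theirs.txt
git -C mate add theirs.txt
git -C mate commit -q -m "Teammate's contribution"
git -C mate push -q origin master

# Pull as fetch + merge; --no-edit accepts the generated merge message.
git -C me pull --no-rebase --no-edit origin master >/dev/null 2>&1

# rev-list prints the commit followed by its parents, so subtract one.
parent_count=$(( $(git -C me rev-list --parents -n 1 HEAD | wc -w) - 1 ))
# The two diffs a tool could show: against my commit, or against theirs.
vs_first_parent=$(git -C me diff --name-only HEAD^1 HEAD)
vs_second_parent=$(git -C me diff --name-only HEAD^2 HEAD)
echo "parents: $parent_count"
echo "changed relative to my commit:    $vs_first_parent"
echo "changed relative to their commit: $vs_second_parent"
```

Neither diff on its own tells the full story of the merge, which is one way to read the cognitive burden mentioned above.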
In my experience, you do yourself and your team no favor if you use merge commits for trivial cases like this one.
Concrete example: This blog post was drafted the night after we cleaned up a botched merge, which needn’t have been a merge in the first place. Incidentally, should you have experienced problems with this innoQ homepage some time between Jan 25 in the evening through Jan 26, 2016, that botched merge commit most likely was the underlying cause. Initial repair attempts didn’t cut through to the root of the problem. In the end, some five innoQ developers teamed up to fully rectify the situation. Our repair work would have started earlier and progressed faster, had the botched two-parent merge commit been a plain normal single-parent commit instead.
Assuming you happen to be the kind of person wanting to learn from other people’s troubles: Use those merge commits only for more serious branch work.[2]
So, if not git pull, what else? Let me come back to our
original scenario one last time. Work is done, tests are happy,
now: What would I do? What do I actually do, in such a
situation?
I initially add a commit with my work to whatever dated version
of master I’ve been working on. This is a version system, I
want to take advantage of that and make sure my work is safe. So
far, so obvious (by now).
Next thing, I might simply try git push. If I’m lucky, nobody
else has touched master and I’m finished. If I’m not, no harm
is done, either.
Nice try, but most of the time, that push doesn’t work. In that
case, I’d now run git fetch.[3]
That done, I now have my own commit on my personal master, as
well as my team-mates' results in origin/master.
Now, I want to do merge work, but without actually producing a
merge commit: git rebase origin/master does that trick for me.
What does that do? It grafts a copy of my commit on top of the
new stuff on origin/master. That new commit copy has only one
single parent, namely, the previous latest commit of
origin/master. Locally, this commit also becomes the new
checked-out HEAD of my local master which I’m on, with all
the work from the other team members integrated in its history.
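The sequence described so far (commit, failed push, fetch, rebase) can be walked through in a scratch setup. This is only a sketch under assumed names: the repositories seed, origin.git, me and mate, the files mine.txt and theirs.txt and the identity "Dev" are hypothetical.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
# Isolate the demo from personal Git configuration (assumption for the sketch).
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "base" > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git me
git clone -q origin.git mate

# Step 1: commit my work on top of my dated local master.
echo "my work" > me/mine.txt
git -C me add mine.txt
git -C me commit -q -m "My contribution"
original=$(git -C me rev-parse HEAD)

# A teammate has pushed in the meantime.
echo "their work" > mate/theirs.txt
git -C mate add theirs.txt
git -C mate commit -q -m "Teammate's contribution"
git -C mate push -q origin master

# Step 2: try a plain push; it is rejected because origin/master moved on.
if git -C me push -q origin master 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi

# Step 3: fetch, then replay my commit on top of origin/master.
git -C me fetch -q origin
git -C me rebase -q origin/master >/dev/null 2>&1

rebased=$(git -C me rev-parse HEAD)
parent_count=$(( $(git -C me rev-list --parents -n 1 HEAD | wc -w) - 1 ))
new_parent=$(git -C me rev-parse HEAD^)
upstream_tip=$(git -C me rev-parse origin/master)

# Step 4: the push now goes through as a fast-forward.
git -C me push -q origin master
echo "first push: $first_push, parents after rebase: $parent_count"
```

Note that the rebased commit has a different SHA than the original one; the original commit object is still in the repository, merely no longer referenced by master.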
Admittedly, this emits a certain odor, as git rebase entails
some amount of “rewriting of history”. The new commit pretends
I’ve started my work on the basis of that previous latest commit
from origin/master, while in fact I’ve started based on earlier
commits.
But this is only a minor amount of “rewriting of history”. As presented here, I only manipulate my own local commits, which I have not yet shared with anyone. Such limited “rewriting of history” I consider quite tolerable.[4]
In the trivial situation, when the merge work can be done
automatically, git rebase will leave me with a version of
master that I can test one more time and then git push.
Should I face a merge conflict, I’ll have to resolve it manually
and let the rebase continue (git rebase --continue). If I manage
to do that, fine, final test and git push again and all is well.
Should I get the merge work wrong on first try, I can back out
and try again. My original commit is still patiently waiting to
see whether it’s still needed. I just have to dig up its SHA.
To do so, I’ll just scroll up my terminal window, or else use
gitk’s “view all refs”.
For my second attempt at the merge work, I want to reestablish my
old local master to point to my commit. A straightforward
way is to delete it and create it anew:
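The commands that originally followed here are not preserved in this text. The sketch below shows one plausible way to delete and re-create the branch; it is an assumption, not necessarily the exact sequence the author used. The scratch repositories (seed, origin.git, me, mate), file names and the identity "Dev" are hypothetical, and $original stands for the SHA dug up from the terminal or gitk.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
# Isolate the demo from personal Git configuration (assumption for the sketch).
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "base" > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git me
git clone -q origin.git mate

echo "their work" > mate/theirs.txt
git -C mate add theirs.txt
git -C mate commit -q -m "Teammate's contribution"
git -C mate push -q origin master

cd me
echo "my work" > mine.txt
git add mine.txt
git commit -q -m "My contribution"
original=$(git rev-parse HEAD)      # the SHA I would otherwise have to dig up

# First attempt at the merge work; suppose I am unhappy with the result.
git fetch -q origin
git rebase -q origin/master >/dev/null 2>&1

# Delete master and create it anew at my original commit.
git checkout -q "$original"         # step off master (detached HEAD)
git branch -q -D master             # drop the rebased master
git checkout -q -b master           # new master at the original commit
restored=$(git rev-parse master)

# The re-created branch has lost its link to origin/master.
if git config --get branch.master.remote >/dev/null; then
  tracking=kept
else
  tracking=lost
fi

# Second attempt, followed by the explicit push that restores the link.
git rebase -q origin/master >/dev/null 2>&1
git push -q -u origin master >/dev/null 2>&1
tracking_after=$(git config --get branch.master.remote)
echo "tracking after re-creation: $tracking, after push -u: $tracking_after"
```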
Admittedly, that’s more robust than elegant. As a consequence of
the re-creation, git does not yet know where to push the new
local master, so I shall need the explicit
git push -u origin master.
Fortunately, this problem is self-healing: The connection to
the remote origin is reestablished by this explicit push
command.
But elegant or not, it works. I can retry my git rebase
origin/master merge work as often as I require to get it right.
After the push, my colleagues will get to see a sole nice commit with a single parent, easily comprehensible. No uncalled-for cognitive burden here.
In conclusion, I want to emphasize that git is a version system
which I, the developer, feel free to use locally as I please. I
can produce as many commits on my local master or any other
local branches as is convenient for me. I might use frowned-upon
zero-information check-in comments such as “work in progress”.
If I so desire, I might even do a commit just to keep (via the
commit timestamp) a record of my departure time for lunch
break.[5]
None of these commits need (or should) ever become visible via
origin. When a particular piece of work is complete, I squash
all related commits into one, using git rebase -i origin/master
(either before or after git fetch, as I please). During that
process, I also come up with an informative check-in comment for
the whole thing. The end result is another sole shiny
well-commented single-parent commit.
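In day-to-day use, git rebase -i opens an editor in which all but the first "pick" are changed to "squash". So that the sketch below runs unattended, it scripts that edit through the GIT_SEQUENCE_EDITOR environment variable and sets the final message with git commit --amend afterwards. This is an illustration only: the repositories seed, origin.git and me, the file names, the commit message and the identity "Dev" are made up.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
# Isolate the demo from personal Git configuration (assumption for the sketch).
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "base" > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git me

# Three local commits with zero-information messages.
for step in one two three; do
  echo "$step" > "me/$step.txt"
  git -C me add "$step.txt"
  git -C me commit -q -m "work in progress"
done

# Stand-in for the interactive edit: keep the first "pick" and turn every
# later one into "squash". GIT_EDITOR=true accepts the combined message.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
  git -C me rebase -i origin/master >/dev/null 2>&1

# Replace the combined message with an informative one.
git -C me commit -q --amend -m "Add files one, two and three"

commits_ahead=$(git -C me rev-list --count origin/master..master)
subject=$(git -C me log -1 --format=%s)
echo "commits ahead of origin/master: $commits_ahead ($subject)"
```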
So, now you have my reasons for my reservations regarding git
pull. In contrast, I consistently find git rebase to be my
friend.
Those of you not wanting to resist the convenience lure of git
pull, consider taming it with one of the --rebase options.
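As a rough illustration of the tamed variant, the sketch below runs git pull --rebase on a diverged history and checks that the result is a single-parent commit. Repository names (seed, origin.git, me, mate), file names and the identity "Dev" are assumptions made for the example.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
# Isolate the demo from personal Git configuration (assumption for the sketch).
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "base" > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git me
git clone -q origin.git mate

echo "my work" > me/mine.txt
git -C me add mine.txt
git -C me commit -q -m "My contribution"

echo "their work" > mate/theirs.txt
git -C mate add theirs.txt
git -C mate commit -q -m "Teammate's contribution"
git -C mate push -q origin master

# Fetch and rebase in one step instead of fetch and merge.
git -C me pull --rebase origin master >/dev/null 2>&1

parent_count=$(( $(git -C me rev-list --parents -n 1 HEAD | wc -w) - 1 ))
parent=$(git -C me rev-parse HEAD^)
teammate_tip=$(git -C mate rev-parse HEAD)
echo "parents after pull --rebase: $parent_count"
```

Setting the pull.rebase configuration option to true should make plain git pull behave this way by default.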
[1] Basic command line git diff SHA1 SHA2 works well in this situation, and I highly recommend you have it in your toolbox, including the --summary version.
[2] When accepting a pull request, I like to document my review work by forcing a two-parent commit, even where a fast-forward would be possible.
[3] In case you care, my actual habit is to use git fetch -p. But the -p is irrelevant in the present context.
[4] If either you or your project disagrees, you'll probably have to live with double-parent merge commits and suffer the consequences. You may be able to reduce the number of such commits by taking the “feature branch” route. This helps somewhat, as long as only one person actively commits into each feature branch.
[5] No, even I haven't actually done that. But you see my point.