Archive for October, 2009

GIT-op 3: Specifying revisions

October 17th, 2009

Now for the great time-saver of this series of tutes.  You’re going to spend a lot of time specifying revisions when working with GIT, but it does a great job of easing this for you.

The way most people get revision specifiers is by using git-log to find the commit they’re looking for and cut-and-pasting the SHA1 hash associated with it.  This hash is the basis of all git specification methods.

Note that the examples I give here relate to a particular command but will work with pretty much any of them.  Most GIT commands pipe their given specifiers through git-rev-list to get the commit IDs out and the docco for that command is a good place to get more info on what I’m going to say.

Tags

The most common form of easy revision specification is tagging.  Tag a version simply by

git tag -a tag_name [SHA1 hash of target commit]

or just

git tag -a tag_name

if you’ve currently got the target commit checked out.

You can use -u <key-id> instead of -a to sign the tag and -v to verify a signed tag.  Once you’ve got a revision tagged, that tag name simply acts as a synonym for the SHA1 hash it’s linked to.
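Here is a minimal sketch of tagging in a throwaway repository.  The repository, the tag name v0.1 and the commit messages are made up for illustration.  One wrinkle worth knowing: an annotated tag is an object of its own, so its raw hash differs from the commit's until you peel it with ^{commit}.

```shell
# Throwaway repository with two empty commits (all names are illustrative).
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Annotate the older commit; -m supplies the tag message so no editor opens.
git tag -a v0.1 -m "first tagged version" HEAD^

# The tag name now stands in for the hash in any command.
git log --format=%s v0.1

# Peel the tag object down to the commit it points at before comparing hashes.
tagged=$(git rev-parse "v0.1^{commit}")
parent=$(git rev-parse HEAD^)
```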

Relative

Relative specification is useful if you’re working with commits very close to HEAD.  Drop back 1 or 2 revisions with

git log HEAD^

or

git log HEAD^^

When going further back this can of course be a pain so instead of

git log HEAD^^^^^^^

you can use

git log HEAD~7

to go back 7 commits at once.
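A quick way to convince yourself the two spellings agree is to resolve both in a scratch repository (the commit messages below are placeholders):

```shell
# Scratch repository with four empty commits.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
for n in 1 2 3 4; do
  git commit -q --allow-empty -m "commit $n"
done

# Three carets and ~3 resolve to the same ancestor.
carets=$(git rev-parse HEAD^^^)
tilde=$(git rev-parse HEAD~3)

# History as seen from three commits back.
oldest=$(git log --format=%s HEAD~3)
echo "$oldest"
```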

Time-based

Now GIT starts to get cool.  It keeps track of where your branches pointed at different points in time (this is the reflog, and it is local to your repository) and you can access this information in a jiffy.  Say you’ve just pulled in a bunch of changes and want to look at everything which wasn’t there when you left work; git does a pretty good job of parsing English-like time specifications so you can simply do

git log @{6pm.yesterday}..HEAD

How about what was on some other branch the day before yesterday but not in your branch now

gitk other_branch@{day.before.yesterday} ^HEAD

Note that a caret before a revision name is logical not; a caret after it means that commit’s parent.

As you can see, the @{} specification is very powerful, allowing you to get the state of a branch at a particular time either in normal date specifications or more English-like phrasing.  This is not, however, the same thing as the commits since a particular time.  For example, if you pull in a big old bunch of commits, @{5.minutes.ago} will point to the tree before the pull, not a tree with all but the commits timestamped in the last 5 minutes.  For this you need the --since specifier

gitk --since="5th October"
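The difference is easier to see with fixed dates.  This sketch assumes that forcing GIT_COMMITTER_DATE also sets the time recorded in the reflog, which is how git's own test suite fakes reflog times; the branch name "incoming", the dates and the commit_at helper are invented for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# commit_at <date> <message>: empty commit with both timestamps forced.
commit_at() {
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q --allow-empty -m "$2"
}

commit_at "2009-10-01T12:00:00" "base"
git checkout -q -b incoming
commit_at "2009-10-03T12:00:00" "pulled"   # written on the 3rd ...
git checkout -q master
# ... but it only lands in master on the 10th (a stand-in for a pull).
GIT_COMMITTER_DATE="2009-10-10T12:00:00" git merge -q --ff-only incoming

# What has arrived in master since the 5th, according to the reflog.
arrived=$(git log --format=%s 'master@{2009-10-05}..master')
# What is timestamped after the 5th, according to the commits themselves.
stamped=$(git log --format=%s --since="2009-10-05" master)
echo "arrived: $arrived / stamped: $stamped"
```

The commit shows up in the first list because it reached master after the 5th, and is missing from the second because it was written before it.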

Other neat specifiers

There are other things you can use to select commits too.

gitk --author="Ben Nizette" --committer="Haavard Skinnemoen"

will show all the patches I wrote which went through Haavard.

gitk --since="yesterday" --grep="gpio"

Will show all patches in the last day which claimed to have something to do with gpio.  We can limit this further

gitk --since="yesterday" drivers/gpio/

will simply show the last day’s commits which touched something in the drivers/gpio/ directory.

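gitk --extended-regexp --grep="^foo.*[0-9]*$"

Well that does pretty much as you’d expect really: it shows the commits whose log message has a line matching the pattern.  Note that --extended-regexp takes no pattern of its own; it is a switch which makes --grep (and --author and --committer) use extended regular expressions.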

gitk --author="Linus Torvalds" --no-merges

Shows you how much actual coding (non-merge) work Linus actually does.
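The same limiting options work with plain git log, which makes them easy to try from a script.  A small sketch, with invented authors, files and commit messages:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Two commits by different (made-up) authors touching different paths.
echo a > core.c
mkdir -p drivers/gpio && echo b > drivers/gpio/pin.c
git add core.c
git commit -q --author="Alice Example <alice@example.com>" -m "core: initial import"
git add drivers/gpio/pin.c
git commit -q --author="Bob Example <bob@example.com>" -m "gpio: add pin driver 42"

by_alice=$(git log --format=%s --author="Alice Example")   # filter on author
about_gpio=$(git log --format=%s --grep="gpio")            # filter on message
touching=$(git log --format=%s -- drivers/gpio/)           # filter on path
# --extended-regexp is a switch; the pattern itself still goes to --grep.
by_regex=$(git log --format=%s --extended-regexp --grep="^gpio.*[0-9]+$")
printf '%s\n' "$by_alice" "$about_gpio" "$touching" "$by_regex"
```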

The dots

There can be some confusion when using GIT as to what a series of dots actually does.  2 dots (“..”) is a range specifier

git log HEAD~5..HEAD~2

shows all commits between the 5th and 2nd newest, that is, everything reachable from HEAD~2 but not from HEAD~5.  It’s a shorthand for

git log HEAD~2 ^HEAD~5

3 dots (“...”) is a symmetric difference, that is

git log HEAD~5...HEAD~2

will show all commits that are in HEAD~5 or HEAD~2 but not both.  In the case where they’re just different points on the same branch 2 and 3 dots do the same thing, but suppose you’re about to merge a pair of branches and you want to see what might conflict,

gitk merge_1...merge_2

shows the commits which are in one branch or the other but not both – i.e. it just lists the commits which could possibly conflict.

You can use this during a merge as well to narrow down which commits introduced the conflict, read their changelog and hopefully get a better understanding of the correct conflict resolution.  Say you’re out at sea with a conflict in complex/subsystem.c; simply

gitk merge_1...merge_2 complex/subsystem.c

To view all patches which could possibly cause that conflict.
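To see the two kinds of dots side by side, here is a sketch with two branches that have diverged from a common base.  The branch names follow the example above; the commit messages are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

git commit -q --allow-empty -m "base"
git checkout -q -b merge_1
git commit -q --allow-empty -m "one"
git checkout -q -b merge_2 master
git commit -q --allow-empty -m "two"

# Two dots: commits in merge_2 which are not in merge_1.
two_dots=$(git log --format=%s merge_1..merge_2)
# Three dots: commits in either branch but not both (sorted for a stable order).
three_dots=$(git log --format=%s merge_1...merge_2 | sort)

# --left-right marks which side each commit came from ("<" or ">").
git log --left-right --format='%m %s' merge_1...merge_2
```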

Anyway, that’s a quick overview of the specifiers that make my life easier.  You can refer to “man git-rev-list” for other cool options to try out too.

Posted in Uncategorized | Comments (0)

GIT-op 2: Rebase

October 15th, 2009

Rebase is generally used to move a set of changes started against one base to another. 99% of the time, you’re developing something against an upstream branch, that branch moves and you want to get the latest and greatest. You can either merge upstream into your branch and lose your clean history, or rebase. Rebase has grown to be a fairly general way to change history though; you can use it to reorder, change and drop commits and move your development not just further up the same branch but across to somewhere else entirely.

Before I get going, please have a read of the last entry which has some tips about working with rebase; the biggest of these is simply Keep Rebased Trees Private.

GIT tree storage

A lot of rebase semantics make more sense if you realize that a GIT commit ID doesn’t really refer to a commit but rather to the state of your source tree immediately after that commit.  Many people expect

git diff abcdef1234

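to give a unified diff of the changes introduced by that commit, but that commit ID, as above, doesn’t represent a commit as such but a tree.  Given a single tree, git diff compares it against your working directory, which is rarely what you were after.  To see what a commit actually changed, compare it with its parent (git diff abcdef1234^ abcdef1234) or just use git show abcdef1234.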

Branch names really just represent a specific commit ID, the ID of the most recent commit on that branch.  A branch name has no internal concept of where it’s currently based; this is one of the most common errors when rebasing – telling GIT the branch to operate on is based on the wrong commit.
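The “commit IDs name trees” idea can be checked in a scratch repository.  The file name and messages here are invented:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo one > notes.txt && git add notes.txt && git commit -q -m "add notes"
echo two >> notes.txt && git commit -q -am "extend notes"

# One revision: that tree versus the working directory (clean here, so empty).
against_worktree=$(git diff HEAD)
# Two revisions: parent tree versus commit tree, i.e. what the commit changed.
introduced=$(git diff --name-only HEAD^ HEAD)

# git show does the parent comparison for you.
git show --stat HEAD
```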

Rebase

As I mentioned above, the most common use of rebase is simply to shift your commits further up an upstream branch.  If you’re following my advice about tracking upstream in the “master” branch and doing work somewhere else (call it “my_branch”) then this is accomplished by

git rebase master my_branch

or just

git rebase master

if you’ve got my_branch checked out.  Because GIT only deals with commits, you have to have committed current work before you run.

But remember branch names are really just commit IDs, so this is read by GIT as “take all the commits in my_branch which aren’t in master and move them to the current HEAD of master”; or else “squish all the master history in before any my_branch history”.
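As a rough sketch of that reading, the following builds a tiny history where master has moved on, rebases my_branch and then checks that the branch now forks from the tip of master.  File names and messages are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b my_branch
echo work > work.txt && git add work.txt && git commit -q -m "my work"
git checkout -q master
echo up > upstream.txt && git add upstream.txt && git commit -q -m "upstream change"

# Replay the commits in my_branch which aren't in master on top of master.
git rebase -q master my_branch

fork_point=$(git merge-base master my_branch)
master_tip=$(git rev-parse master)
mine=$(git log --format=%s master..my_branch)
echo "$mine"
```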

Rebase interactive

This was described in the previous entry.  By appending the --interactive flag to the rebase (but otherwise using the same syntax) you get an opportunity to change the commits in my_branch as they get moved to the destination.  Note that the destination head can actually be exactly where my_branch is already based.  While this makes the rebase a no-op usually, in interactive mode you can use it to change commit history at any time.

onto

The --onto switch allows you to move a set of commits from one branch of your repo to somewhere completely different.  Say you’re developing feature_2 which is based on feature_1 but feature_1 is not yet in the upstream branch.  After a bit of a brainwave you realize feature_2 can be rearranged to not have that dependency and is ready to move upstream by itself.  Do

git rebase --onto master feature_1 feature_2

This is read by GIT as “take all the history from feature_2 which isn’t in feature_1 and stick it on the end of master”.  Because the --onto switch has to come before the other 2 branch names the syntax seems backwards; just remember that the last 2 branch IDs are the same no matter what version of the rebase command you’re using (just “base” and “head” branches respectively).
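A sketch of that situation, using the branch names from the example and made-up files.  Each feature touches its own file so the move is conflict-free.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo m > m.txt && git add m.txt && git commit -q -m "upstream"
git checkout -q -b feature_1
echo 1 > f1.txt && git add f1.txt && git commit -q -m "feature 1"
git checkout -q -b feature_2
echo 2 > f2.txt && git add f2.txt && git commit -q -m "feature 2"

# Take what is in feature_2 but not in feature_1 and replay it onto master.
git rebase -q --onto master feature_1 feature_2

# feature_2 now carries only its own commit on top of master.
moved=$(git log --format=%s master..feature_2)
echo "$moved"
```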

Another feature of --onto rebasing is an alternative way to drop commits from the middle of history.  Suppose you’re on branch “my_branch” and the last 5 commits have names “commit1” to “commit5” (I’ll be writing an entry soon on the best ways to specify actual revisions).  Then

git rebase --onto commit2 commit3 my_branch

will take the stretch of commits after commit3 up to the branch head (that is, commit4 and commit5) and move them to be based at commit2 – i.e. you’ve just dropped commit3 from your tree.  The second argument names the last commit to leave behind, so passing commit4 there would drop both commit3 and commit4.  (note the sloppy naming here, as I’ve been at pains to point out commit names identify trees, not commits, so actually calling them commitN isn’t ideal :-) )
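Here is a sketch of dropping only the third of five commits.  The range handed to rebase starts after the commit named as the old base, so naming commit3 there leaves it (and nothing else) behind.  The lightweight tags commit1 to commit5 and the file names are only there to make the example readable.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/my_branch
git config user.name "Example" && git config user.email "example@example.com"

# Five commits, each adding its own file and tagged commit1..commit5.
for n in 1 2 3 4 5; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "change $n"
  git tag "commit$n"
done

# Replay everything after commit3 (commit4 and commit5) on top of commit2.
git rebase -q --onto commit2 commit3 my_branch

remaining=$(git rev-list --count my_branch)
git log --format=%s my_branch
```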

When all goes wrong

During a rebase, you may of course get a number of conflicts between your code and the tree on to which you’re trying to move it.  All such conflicts will be marked with standard merge markers (“<<<<<<<”, “=======” and “>>>>>>>”) which you can grep for, use git diff to see or just read the error log.  Once you’ve fixed all of these sites, mark them as resolved by

git add my-fixed-file.c

but instead of committing, run

git rebase --continue

If you’re stuck and want to get out of there, run

git rebase --abort

to undo all rebasing actions.  Finally, if a commit is causing conflicts because it’s no longer needed,

git rebase --skip

will skip that particular commit.  Note that this will lose that commit, so be careful!

Finally, git rebase is smart enough to recognize commits which introduce the same changes as each other but have different descriptions, and to skip them.  That is, if a commit in the branch you’re rebasing has already been accepted in to the new base, git will automatically skip that patch.
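The automatic skipping can be reproduced in a scratch repository.  In this sketch upstream takes one of your patches under a different description (simulated here with cherry-pick -n followed by a fresh commit); the file names and messages are invented.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b my_branch
echo fix > fix.txt && git add fix.txt && git commit -q -m "fix typo"
echo new > feature.txt && git add feature.txt && git commit -q -m "add feature"

# Upstream accepts the first patch but rewords the description.
git checkout -q master
git cherry-pick -n my_branch~1
git commit -q -m "Fix typo (applied upstream)"

# The rebase notices the identical change and leaves that commit out.
git rebase -q master my_branch

left=$(git log --format=%s master..my_branch)
echo "$left"
```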

Cool, huh?!  That’s it for now, next time I’ll show you the fastest and coolest ways to specify the commits you care about.


GIT-op 1; basics

October 13th, 2009

Version control seems to be one of my favorite subjects for blogging; here’s a slightly new thing – a tutorial style thing showing off the coolest bits of GIT I’ve come across.  This is part 1, offering tips for easy GIT usage especially for people coming from a dissimilar VCS.

Tip 1: Commit Often.  Really Often.

Git offers a great number of features, but only for commits.  Unlike other VCSs though, a commit isn’t final; you can still play with the ordering, combine commits, drop them, move them however you see fit.  There is no down-side to committing often; if you end up with a big commit mess, use interactive rebasing to organise it all again.

Interactive Rebase

Suppose you’re developing your stuff on a branch called my_branch which is based on the master branch shared with others.  You’ve taken tip 1 and you’ve ended up with a big mess of commits with no logical ordering or cohesion.  Interactive rebasing allows you to reorder, combine and drop commits.

git rebase -i master my_branch

will bring up $EDITOR with a list of all commits between master and my_branch.  If you change the order of commits in this list, it will change the order of the commits in your branch.  Replace the “pick” keyword with “s” (short for “squash”) to squish a commit in to the previous one.  Delete a commit from that list and it will be deleted from your branch.  Save and exit and GIT takes care of the rest.
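You would normally edit that list by hand, but the same edit can be scripted, which is handy for trying it out.  This sketch uses GIT_SEQUENCE_EDITOR with sed as a stand-in for your editor and GIT_EDITOR=true to accept the combined commit message; the sed expression assumes the second line of the list is the commit to squash, and the -i flag as written assumes GNU sed.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b my_branch
echo a > a.txt && git add a.txt && git commit -q -m "add a"
echo more >> a.txt && git commit -q -am "oops, finish a"

# Turn the second "pick" in the list into "squash", then let the rebase run.
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i master my_branch

# The two commits have been folded into one.
count=$(git rev-list --count master..my_branch)
echo "$count"
```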

In this way you can organise, say, a new feature in to a series of logically separated actions for easy review by your peers.

Tip 2: Pro: Rebase changes history.  Con: Rebase changes history.

OK so rebasing is great.  You can move your changes on top of a newer base or, like above, you can reorganize your commits to be more useful.  Be very very careful though as you must never rebase something that someone else has access to.

Rebasing changes history but it doesn’t leave a marker anywhere saying what it has done.  If someone has pulled your branch and you rebase it, their version won’t be updated to reflect this.  They will have to remove their copy of your branch and pull it again from scratch.

Tip 3: Branches are cheap; use them for everything

Unlike some other VCSs, branching is very easy to do and merging back is just as easy.  Separating your working tree in to different branches for each feature, for example, allows you to test each feature on a stable base.  You can leave the “master” branch for tracking upstream and work in private branches where nobody is going to be annoyed if you break something.

In particular, if you’re starting out with GIT and aren’t sure how to do something, branch off and try it there.  If it works, merge it back; if it fails, kill the branch and start again.
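The try-it-on-a-branch workflow looks roughly like this (the branch name “experiment” and the file are made up):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "base"

# Branch off and try the idea there.
git checkout -q -b experiment
echo idea > idea.txt && git add idea.txt && git commit -q -m "try an idea"

# It worked: merge it back and tidy up the branch.
git checkout -q master
git merge -q experiment
git branch -q -d experiment

# Had it failed, you would skip the merge and kill the branch instead:
#   git branch -D experiment
leftover=$(git branch --list experiment)
```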

Like a lot of things though, use this tip with care.  If you end up with a massive branch, getting it merged back in to the mainline gets harder and harder, both because there’s more likely to be conflict and because the final version is going to be hard to review.  In short: Big branches eventually have the same problems as any patch bombs; be careful.
