Tuesday 13 September 2011

A tarsnap client script wrapper

I've been using tarsnap as my personal “cloud” backup solution for a while now and can warmly recommend it.

The tarsnap client tool (called tarsnap) feels and behaves just like you would expect of any respectable UNIX command: it has a well-written manpage; it has command-line arguments that you cannot remember, but which sort of agree with the conventions set by the elder UNIX commands. In fact, the tarsnap syntax is almost a superset of the tar command.

In other words, it's practically crying out to be used as a building block in some script; and that's exactly what I've done here.

backup.py is my convenient tarsnap wrapper (github).

The idea is that since you generally back up the same set of archives regularly, you want to define the contents of each archive somewhere.

backup.py makes this easy by looking into a single (configurable) directory and making an archive from each directory entry. If the entry is a directory, it uses that. If it's a symbolic link, it uses that. If it's a directory containing symlinks, it follows them. On top of that, you can define exclusions (archive everything in firefox-profile/ except for Cache) via the config file.
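To illustrate the idea (this is not the actual backup.py code), here is a minimal sketch that turns each entry of a backup directory into one planned archive. The function name plan_archives and the shape of the exclusions mapping are assumptions made up for this example; the real config file format is not shown here.

```python
import os


def plan_archives(root, exclusions=None):
    """Return {archive name: (path, exclude patterns)} for each entry in root.

    `exclusions` is a hypothetical mapping such as
    {"firefox-profile": ["Cache"]}; it stands in for the config file.
    """
    exclusions = exclusions or {}
    plan = {}
    for name in sorted(os.listdir(root)):
        # Every directory entry (directory or symlink) becomes one archive.
        path = os.path.join(root, name)
        plan[name] = (path, list(exclusions.get(name, [])))
    return plan
```

Calling plan_archives("/path/to/backup-dir", {"firefox-profile": ["Cache"]}) would give one entry per archive, with the Cache pattern attached only to firefox-profile.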

Since tarsnap has no concept of different versions of a backup archive, backup.py appends the current date (yyyy-mm-dd) to each archive name. (This works well with tarsnap since it does deduplication, so you automatically only pay for the diffs between archives.)
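As a rough sketch of the naming scheme, the following builds a dated archive name and the matching tarsnap command line. The hyphen between name and date, and both function names, are assumptions for illustration; only the tarsnap options -c, -f and --exclude are taken from the tarsnap manpage.

```python
import datetime


def dated_archive_name(name, day=None):
    """Append the date in yyyy-mm-dd form (assumed separator: a hyphen)."""
    day = day or datetime.date.today()
    return "%s-%s" % (name, day.isoformat())


def tarsnap_command(name, path, excludes=(), day=None):
    """Build (but do not run) a tarsnap invocation as an argument list."""
    cmd = ["tarsnap", "-c", "-f", dated_archive_name(name, day)]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    cmd.append(path)
    return cmd
```

The resulting list can be handed to subprocess.call as-is, which avoids any shell quoting problems with odd path names.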

Sunday 23 January 2011

Throw out the trash: clean up a Subversion working copy

Oftentimes, one acquires a plethora of debris, detritus, flotsam and jetsam in one's SVN working copy. Mostly these are the standard build artefacts like .o files, but depending on the project it could be a variety of different things, like perhaps the forgotten remnants of silly little experiments.

One is really not very interested in being reminded of the existence of this digital rubble in the output of svn status. Therefore, there are various ways of telling Subversion to “ignore” certain files based on the pattern of their name.

That takes care of the output of svn status, but not of ls (which is still cluttered), nor your disk space. So what's a quick way of getting rid of those files?

It's easy:
svn status | grep '^?' | awk '{print $2}' | xargs rm -rf
or, if you wish to also delete those files which are ignored by Subversion,
svn status --no-ignore | grep '^[?I]' | awk '{print $2}' | xargs rm -rf
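Note that this second version could be useful if make clean (or equivalent) is not quite as thorough as you'd like it to be. Also note that both one-liners split on whitespace, so they will mishandle file names that contain spaces.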
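For comparison, here is a small sketch of the same filtering step in Python, which keeps file names with embedded spaces intact. It assumes that for unversioned (?) and ignored (I) items the remaining status columns are blank, so everything after the first character, minus leading whitespace, is the path; names that themselves start with whitespace would be mangled. The function name is made up for this example.

```python
def unversioned_paths(status_output, include_ignored=False):
    """Pick the paths of '?' (and optionally 'I') items from `svn status` text."""
    wanted = "?I" if include_ignored else "?"
    paths = []
    for line in status_output.splitlines():
        if not line or line[0] not in wanted:
            continue
        # Assumption: the other status columns are blank for these items.
        path = line[1:].lstrip()
        if path:
            paths.append(path)
    return paths
```

Feed it the output of svn status (or svn status --no-ignore together with include_ignored=True) and delete the returned paths with whatever level of confirmation you are comfortable with.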

And that's really all there is to it. But just because I like to make things complicated, I've wrapped up this idea in a little script – svnclean:

$ svnclean --help
usage: /home/carlo/bin/svnclean [-i] [-f] [-q] [-h|-?|--help]

Recursively cleans the current svn working copy starting from the current
directory.  It does this by inspecting the output of `svn status'.  Options:

    -i  Also remove files ignored by SVN (e.g. via svn:ignore properties)
    -f  Do not confirm deletion
    -q  Do not list every file before deletion, only print number of affected
        files

    -h, -?, --help  This help

$ svnclean -i
Files/directories to be deleted:
  awesomez.pyo
  tmp_experiment/
  ai.o
  ex
Continue? [yn] 

Saturday 15 January 2011

Subversion and vimdiff: a little improvement

Vim is commonly used as a diff visualisation tool for Subversion. Often, this is accomplished by putting a line like
diff-cmd = svndiff_helper
into the [helpers] section of your ~/.subversion/config file, where svndiff_helper would contain
gvimdiff -f "$6" "$7"
We pass the sixth and seventh parameters to gvimdiff because that's where svn diff passes us the names of the files to diff. Incidentally, the -f parameter is needed to prevent Vim from forking on startup, since svn would then delete the temporary files too quickly.

So what do the other parameters contain? Let's find out:

$ echo 'for a in "$@"; do echo "$a"; done' >pargs
$ chmod +x pargs
$ ls -A
aa  rand  readme  .svn
$ svn diff -r1 --diff-cmd=./pargs readme 
Index: readme
===================================================================
-u
-L
readme (.../readme) (revision 1)
-L
readme (.../branches/mybranch/readme) (working copy)
.svn/tmp/tempfile.tmp
readme

Interesting – svn gives us a nice textual description of both files in arguments three and five respectively. How about we use those to name the buffers in gvimdiff?
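To make the argument layout explicit, here is a sketch that gives each of the seven positions a name. It assumes svn always passes exactly the seven arguments seen in the listing above, which may not hold if extra diff options are configured; DiffArgs and parse_diff_args are names invented for this example.

```python
from collections import namedtuple

DiffArgs = namedtuple(
    "DiffArgs", "options left_label right_label left_file right_file"
)


def parse_diff_args(argv):
    """Name the arguments svn passes to a diff-cmd (without the program name)."""
    if len(argv) != 7:
        raise ValueError("expected 7 arguments, got %d" % len(argv))
    # Positions 2 and 4 (one-based) are the literal "-L" flags.
    return DiffArgs(
        options=argv[0],
        left_label=argv[2],
        right_label=argv[4],
        left_file=argv[5],
        right_file=argv[6],
    )
```

In a helper script this would be called as parse_diff_args(sys.argv[1:]).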

With this script, you can. There are a few tricks to it:
  • I use the --cmd option to Vim to make it run a command on startup.
  • Specifically, I set up autocommands (autocmd, abbreviated to au) to change both buffer names as appropriate when the BufReadPost event is triggered, which happens when the files are read.
  • I use the :file command (abbreviated to just f) to change the buffer name. Unfortunately, Vim doesn't handle tabs in the name very well, so I replace those by spaces. Those in turn I have to escape in order for the :file command to accept them.
  • Vim also gets confused if the buffer name contains a slash, so I replace those by backslashes.
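The name clean-up described in the last two points can be sketched as a plain string transformation. This is an illustration rather than the script itself: the function name is hypothetical, and the order of the three steps is an assumption.

```python
def vim_buffer_name(label):
    """Turn an svn diff label into something :file is said to accept."""
    name = label.replace("\t", " ")   # tabs become spaces
    name = name.replace("/", "\\")    # slashes become backslashes
    return name.replace(" ", "\\ ")   # escape each space with a backslash
```

The result would then be spliced into the autocommand that runs :file for the corresponding buffer.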
One other issue you may find with the "naïve" wrapper script shown at the start is that it makes Vim not apply syntax highlighting, because the temporary filenames like tempfile.tmp do not have the appropriate extension. My script addresses that too, by temporarily creating a symlink with the right extension to the temp file and passing that to Vim.
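The symlink trick might look roughly like this in Python. It is a sketch under assumptions: the helper names and the "base" link name are invented, and the link is placed in a fresh temporary directory so that it cannot clash with an existing file.

```python
import os
import tempfile


def link_with_extension(temp_file, real_name):
    """Create a symlink to temp_file that carries real_name's extension."""
    ext = os.path.splitext(real_name)[1]  # "" if real_name has no extension
    link_dir = tempfile.mkdtemp(prefix="svndiff-")
    link = os.path.join(link_dir, "base" + ext)
    os.symlink(os.path.abspath(temp_file), link)
    return link


def remove_link(link):
    """Remove the symlink and the temporary directory that held it."""
    os.remove(link)
    os.rmdir(os.path.dirname(link))
```

The link is passed to the editor in place of the temp file and removed again once the editor exits.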

How to see all changes on a branch in Subversion

Ever wanted to see a diff containing all the changes you made on a branch since the branch was created from trunk? This is one of those things that should be easy in Subversion but isn't (or I couldn't find an easy way).

Helpfully, "svn help log" has a hint:

Logs follow copy history by default. Use --stop-on-copy to disable this behavior, which can be useful for determining branchpoints.

I've turned that idea into a handy wrapper script around "svn diff", which adds a revision specifier for "the revision just before this branch was created". I call it BRANCHPT or BRPT for short. Otherwise, the script just passes all arguments through unchanged to "svn diff".
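A sketch of how such a wrapper could work, assuming it is fed the output of svn log -q --stop-on-copy for the branch. Taking "the revision just before this branch was created" to mean the oldest logged revision minus one is my reading of the description, not something confirmed by the script; the function names are made up, and the substitution is deliberately naive (it would also rewrite a file name containing BRPT).

```python
import re


def branchpoint(log_output):
    """Oldest revision in the log, minus one (assumed meaning of BRANCHPT)."""
    revs = [int(m.group(1))
            for m in re.finditer(r"^r(\d+) \|", log_output, re.MULTILINE)]
    if not revs:
        raise ValueError("no revisions found in log output")
    return min(revs) - 1


def expand_branchpoint(args, branch_rev):
    """Replace BRANCHPT/BRPT in every argument; pass the rest through."""
    pattern = re.compile("BRANCHPT|BRPT")
    return [pattern.sub(str(branch_rev), arg) for arg in args]
```

The expanded argument list would then be appended to ["svn", "diff"] and executed.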