A computer lockup that ate a document I'd been working on got me thinking about using version control for anything you write that takes 'document' form, whether plain text files, word processor documents, or program source files.
While I was in high school, the computers I used would eat my documents with alarming regularity. I spent many a night trying to re-create a two-thousand-word essay from my outlines, and often the second version wasn't as good, partly because the first version had been very well done and partly because it was late and I was tired. If I'd had version control, I might have gotten at least a few more nights of sleep.
Version control, also called revision control, is a system for storing the history of changes to a particular set of files. The collection of files and their history is commonly called the 'repository'. Old versions can be retrieved, and each time you record a set of changes (called 'committing'), the system tracks it with a revision number and a timestamp. The right version control system can also let multiple users edit a set of files, but this article does not consider that practice, focusing instead on the single user.
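To make that vocabulary concrete, here is a toy in-memory model in Python of what a repository records: each commit gets the next revision number and a timestamp, and any earlier revision can be checked out again. It is a minimal sketch, not how Subversion or any real system stores data; the class and method names (`ToyRepository`, `commit`, `checkout`, `log`) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Revision:
    number: int          # increases by one on every commit
    timestamp: datetime  # when the commit was made
    message: str         # log message describing the change
    files: dict          # snapshot: file name -> contents at this revision


@dataclass
class ToyRepository:
    """In-memory stand-in for a single-user repository (illustration only)."""
    history: list = field(default_factory=list)

    def commit(self, files, message, when=None):
        # Store a full snapshot for simplicity; real systems are far more compact.
        rev = Revision(
            number=len(self.history) + 1,
            timestamp=when or datetime.now(timezone.utc),
            message=message,
            files=dict(files),
        )
        self.history.append(rev)
        return rev.number

    def checkout(self, number=None):
        # Return a copy of the files as of revision `number` (default: latest).
        if not self.history:
            return {}
        if number is None:
            number = len(self.history)
        if not 1 <= number <= len(self.history):
            raise ValueError(f"no revision r{number}")
        return dict(self.history[number - 1].files)

    def log(self):
        # One line per revision: number, timestamp and message.
        return [f"r{r.number} {r.timestamp:%Y-%m-%d %H:%M} {r.message}"
                for r in self.history]
```

After two commits, `repo.checkout(1)` still returns the first draft even though the latest revision holds the second, which is exactly the safety net a lost essay needs.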
So, why version control? Let's look at the 'alternative' methods people use:
- No versioning whatsoever
Lose the single copy of the file and you're done; there's no way to recover earlier versions. Obviously not the best choice.
- Successively-dated files
This strategy means keeping a bunch of files with the same base name, each with some string appended to make it unique and give some sense of order. You'll see it done poorly on some websites and web applications: the file foo.html will sit next to foo.html.old and foo.html.backup. Which is which? People who have been doing it for a while are usually more clever and consistent, turning foo.doc into foo-1.doc or foo-20070321.doc.
This method is poor for two reasons: it leaves many files around (most of which end up being useless), and you will eventually run into problems with the naming. If you aren't dating the files and need the version that was modified on a specific date, you'll have to open many, many files to find it. If you're only putting a datestamp on the files (no time), what happens when you save two versions on the same day? You'll need to expand your naming scheme.
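As an illustration of how that expansion creeps in, here is a hypothetical Python helper (`dated_backup_name` is not from any tool) that stamps a backup with the date and then has to fall back to a counter when a same-day name is already taken. It is a sketch of the manual scheme described above, assuming `existing` holds the names already on disk.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_backup_name(filename: str, day: date, existing: set) -> str:
    """Pick a backup name such as foo-20070321.doc that is not already taken.

    A date-only stamp collides the second time you save on the same day, so
    the scheme has to grow a counter: foo-20070321-2.doc, foo-20070321-3.doc.
    """
    path = PurePosixPath(filename)
    base = f"{path.stem}-{day:%Y%m%d}"
    candidate = str(path.with_name(f"{base}{path.suffix}"))
    counter = 2
    while candidate in existing:
        # Same-day collision: append a counter and try again.
        candidate = str(path.with_name(f"{base}-{counter}{path.suffix}"))
        counter += 1
    return candidate
```

Saving twice on 21 March 2007 yields foo-20070321.doc and then foo-20070321-2.doc; every such rule is one more thing you have to remember and apply consistently by hand.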
It also breaks down with multiple files. If you have five distinct files being maintained over the long term, each will have multiple 'backup copies' lying around. If you've made a hundred edits on each of those files, how many files do you have lying around the disk?
This solution also leaves the entire collection of files vulnerable to a disk crash or to theft of the computer storing them. It's especially bad for laptops, which are easier to steal and more attractive targets for thieves.
- Email it to yourself
Email is handy for moving files around, and I've seen people keep versions of their files as attachments to messages they've sent to themselves. Handled appropriately (with filters and so on), it's easy to keep these messages separate from your regular mail. But it still has drawbacks:
It doesn't happen often, but email does get lost, possibly more often than data on many other kinds of remote service, though this varies with the provider. If it's a POP3 mailbox, the files end up on your local machine, and odds are that's your only copy. Your ISP's email service is probably more reliable than most, but then you risk losing access to the files when you change ISPs. Providers like Hotmail, which delete your email after 30 days of inactivity, are a bad choice. Other providers do this too, so read the terms of service carefully.
Your ability to search for a particular file will also vary. Of the free email providers, Gmail has the best searching and sorting tools.
Both of these approaches share a common problem: they take up mind space, because keeping the files organized and dated requires mental effort. With version control, the only things you need to remember are to commit your files after you've finished a session of edits, and to update them before editing if you've edited them somewhere else. Not too bad. Everything else, including dating and timestamping, is handled for you, and you can get at old versions without having them clutter up the disk. If your repository is on a separate machine somewhere on the web, your files also benefit from that machine's backup routines. If not, be sure to back up the repository regularly. You do regular backups, right?
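The edit-session routine described above amounts to two Subversion commands, plus a periodic backup if the repository lives on your own machine. The Python sketch below only builds the argument lists for the standard `svn update`, `svn commit -m` and `svnadmin hotcopy` commands and runs them with `subprocess`; the helper names and any working-copy or repository paths you pass in are hypothetical.

```python
import subprocess


def before_editing(working_copy):
    # Bring the working copy up to date with edits committed elsewhere.
    return ["svn", "update", working_copy]


def after_editing(working_copy, message):
    # Record this session's edits as a new revision with a log message.
    return ["svn", "commit", "-m", message, working_copy]


def backup_repository(repo_path, backup_path):
    # hotcopy makes a consistent copy even while the repository is in use;
    # backup_path should be a fresh directory.
    return ["svnadmin", "hotcopy", repo_path, backup_path]


def run(cmd):
    # Execute one command list; raises CalledProcessError if it fails.
    subprocess.run(cmd, check=True)
```

A session would call `run(before_editing(wc))`, edit the files, then `run(after_editing(wc, "Finish second draft"))`, where `wc` is the path to your working copy.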
Other resources that will help you on your way:
- Single-User Subversion: an OnLAMP guide to using Subversion by yourself. Several of its points apply to other version control software as well
- Version control for non-programmers
- Making the Jump to Subversion: Mac-centric, but still useful.
- History of software configuration management: an interesting read for those so interested
This is for those of you who haven't upgraded to WordPress 2.1 yet and are comfortable following instructions on the command line.
It also assumes you have command-line access to your server. But that's a story for another day.
Rather than go through the traditional rigmarole of updating WordPress (2.0.9 was released very recently), I decided to try a different approach. What made this possible was that the release had no database changes, so no upgrade script was needed.
When working with web-based systems, it is frequently a pain to know which version of the files is deployed on a production server. Here is my method of deploying, managing, and updating web-based systems on a production box. It's better than using svn export to update directly from a repository because, used correctly, it doesn't require manually copying unversioned system data into a fresh copy: svn calculates the differences between the tags you switch between and applies only those.
First, the repository must be laid out correctly, with these top-level folders:
- /trunk
- /tags (or /releases)
- /branches
For reference, see the section of the Subversion book on choosing a repository layout.
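As a sketch of setting up that layout and cutting a release, the Python helpers below build standard `svn mkdir` and `svn copy` commands (in Subversion a tag is simply a cheap copy of trunk). They assume the /releases naming used in this post; the helper names are hypothetical, and any repository URL you pass, such as https://svn.example.com/project, is a placeholder.

```python
import subprocess


def layout_command(repos_url, message="Create standard layout"):
    # One svn mkdir call with several URLs creates all three top-level folders.
    return ["svn", "mkdir", "-m", message,
            f"{repos_url}/trunk", f"{repos_url}/releases", f"{repos_url}/branches"]


def tag_release_command(repos_url, release, message=None):
    # A tag is a cheap server-side copy of trunk under /releases.
    return ["svn", "copy", "-m", message or f"Tag {release}",
            f"{repos_url}/trunk", f"{repos_url}/releases/{release}"]


def run(cmd):
    # Execute one command list with the real svn client.
    subprocess.run(cmd, check=True)
```

Tagging every deployable state this way is what makes the switch-based deployment possible: each release is a named, fixed point in the repository.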
Suppose the directory /path/to/project on your web server contains a checkout of the repository URL $REPOS_URL/releases/release-a, and you want to switch to the files in $REPOS_URL/releases/release-b. All you have to do is run svn switch $REPOS_URL/releases/release-b /path/to/project. Subversion calculates the differences between the two release tags and applies them to your working copy. Unversioned files are left intact, so the deployed system keeps its local data. If you're working on an app that uses MySQL, you might want to make the appropriate changes to preserve its database settings.
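The switch step, plus a way to answer "which release is deployed?", can be sketched in Python as follows. `svn switch` and `svn info` are the real commands; the helper names, the example repository URL and /path/to/project are hypothetical, and `deployed_release` assumes the tag name is the last component of the `URL:` line that `svn info` prints.

```python
import subprocess


def switch_command(repos_url, release, deploy_path):
    # svn switch applies only the differences between the current release
    # and the target one, leaving unversioned files alone.
    return ["svn", "switch", f"{repos_url}/releases/{release}", deploy_path]


def deployed_release(svn_info_output):
    # `svn info <path>` prints a "URL: ..." line; its last component names
    # the release the working copy currently tracks.
    for line in svn_info_output.splitlines():
        if line.startswith("URL: "):
            return line[len("URL: "):].rstrip("/").rsplit("/", 1)[-1]
    return None


def switch_release(repos_url, release, deploy_path):
    # Run the real svn client, then report which release is now deployed.
    subprocess.run(switch_command(repos_url, release, deploy_path), check=True)
    info = subprocess.run(["svn", "info", deploy_path],
                          check=True, capture_output=True, text=True).stdout
    return deployed_release(info)
```

Because the deployed directory is a working copy, `svn info` always tells you which tag is live, which addresses the "what version is on production?" problem directly.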
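You may wish to configure the web server to deny access to '.svn' directories, if only for paranoia's sake, since they can contain pristine copies of your source files. With Apache, you can use:
<DirectoryMatch "/\.svn(/|$)">
    Order allow,deny
    Deny from all
</DirectoryMatch>
in your main server or virtual host configuration (DirectoryMatch is not allowed in .htaccess files). On Apache 2.4, replace the Order and Deny lines with Require all denied.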
When switching, be sure to check for conflicts in case someone has made an unexpected change to one of the deployed files (not an uncommon occurrence, in my experience).
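A minimal Python sketch of that check: run `svn status` on the deployed directory and pick out entries flagged with `C`. It assumes the standard `svn status` column layout (text conflicts in column 1, property conflicts in column 2, tree conflicts in column 7 on Subversion 1.6 and later); `conflicted_paths` and `check_deploy` are hypothetical names.

```python
import subprocess


def conflicted_paths(status_output):
    """Return the paths that `svn status` output marks as conflicted.

    Column 1 'C' = text conflict, column 2 'C' = property conflict,
    column 7 'C' = tree conflict (Subversion 1.6 and later).
    """
    conflicts = []
    for line in status_output.splitlines():
        if len(line) < 8:
            continue  # too short to be a status line with a path
        if "C" in (line[0], line[1], line[6]):
            conflicts.append(line[7:].strip())
    return conflicts


def check_deploy(deploy_path):
    # Run the real svn client against the deployed working copy.
    out = subprocess.run(["svn", "status", deploy_path],
                         check=True, capture_output=True, text=True).stdout
    return conflicted_paths(out)
```

An empty list means the switch applied cleanly; anything else is a file someone edited on the server that needs to be reconciled by hand.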