[This is a preliminary description of the idea based on notes I made on
1999-06-24. If anyone knows of other work along these lines, or wants to
participate in the development of this, let me know.]
CVFS is a redirecting Linux file system that does not manage physical storage
itself. Instead, it is an indirection layer that reinterprets the (specially
organized) contents of another directory as a repository that is hierarchical
both in structure (directories and files) and in lineage/history/derivation
(branches and versions).
Once CVFS exists, it may be possible to build an emulation layer that appears
to users as traditional CVS, although I think the CVFS interface itself would
be superior.
Since CVFS allows users to have limbo files at any leaf node, a user might
start a branch, then take another tack and leave the limbo files to go stale.
Facilities for detecting stale limbo files, and for producing a report or
(semi-)automatically reaping/purging them, would be very useful.
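As a rough illustration of stale-limbo detection, the sketch below walks a repository laid out like the example repository in these notes and reports every file under a LIMBO/ directory whose modification time is older than a threshold. The function name find_stale_limbo, the 30-day default, and the use of mtime as the staleness measure are all assumptions made for illustration; a real policy might also consider whether the leaf has since been branched or pruned.

```python
import os
import time


def find_stale_limbo(repo_root, max_age_days=30, now=None):
    """Return sorted (path, age_in_days) pairs for files beneath any LIMBO/
    directory whose mtime is older than max_age_days (hypothetical policy)."""
    now = time.time() if now is None else now
    cutoff = max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        # Only files somewhere below a LIMBO directory count as limbo files.
        parts = os.path.relpath(dirpath, repo_root).split(os.sep)
        if "LIMBO" not in parts:
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            age = now - os.path.getmtime(path)
            if age > cutoff:
                stale.append((path, age / 86400))
    return sorted(stale)
```

The report could be printed for an administrator, or fed to a (semi-)automatic purge step that asks each owner before deleting anything.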
Analogies with file system structure-oriented commands:

Structural and historical analogies.

| Structural command | Description | Historical command | Description |
|---|---|---|---|
| cd | Change Directory (default = home) | cv | Change Version (default = leaf or next fork) |
| ls | List (Files) | lv | List Versions |
| pwd | Print Working Directory | pwv | Print Working Version |
| mkdir | Make Directory | mkver | Make Version (branch) |
| rmdir | Remove Directory | rmver | Remove Version (prune branch) |
CVFS handles the standard file operations specially. For example, touch, cp,
rm, and others are all intercepted so that they modify the user's limbo area
rather than operating directly on the versioned files. This is transparent to
the user: rm still appears to remove the file, in that ls will no longer show
it, but the change does not reach the repository until it is committed. Once a
file is in the repository, it cannot be removed without an explicit pruning
operation.
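One way to picture the interception is as an overlay: the committed entries of the leaf version sit underneath, the user's limbo changes sit on top, and an intercepted rm records a "whiteout" that hides a committed name without touching the repository. The class below is a hypothetical in-memory model (the names LimboView, whiteouts, and commit are illustrative, not part of any existing CVFS code); keeping one instance per user ID would also give each user a private view of the same shared directory.

```python
class LimboView:
    """One user's view of a versioned directory: the committed entries at
    the current leaf, overlaid with that user's uncommitted (limbo) changes."""

    def __init__(self, committed):
        self.committed = dict(committed)  # name -> contents at the leaf version
        self.limbo = {}                   # name -> uncommitted contents
        self.whiteouts = set()            # committed names hidden by rm

    def write(self, name, data):
        """What an intercepted touch/cp/editor save would do."""
        self.whiteouts.discard(name)
        self.limbo[name] = data

    def rm(self, name):
        """Hide the file from this view without touching the repository."""
        if name not in self.listdir():
            raise FileNotFoundError(name)
        self.limbo.pop(name, None)
        if name in self.committed:
            self.whiteouts.add(name)

    def listdir(self):
        """What ls shows to this user."""
        return sorted((set(self.committed) | set(self.limbo)) - self.whiteouts)

    def commit(self, names):
        """Commit the named limbo changes (edits or removals); leave the rest."""
        for name in names:
            if name in self.whiteouts:
                self.whiteouts.discard(name)
                del self.committed[name]
            elif name in self.limbo:
                self.committed[name] = self.limbo.pop(name)
            else:
                raise KeyError(f"{name} has no uncommitted change")
        return dict(self.committed)
```

For example, after view.rm("README") the file disappears from view.listdir() but is still present in view.committed until view.commit(["README"]) is run.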
CVFS allows directories and files to be modified only at leaf versions, so a
branch must be created before any changes can be made at an interior node.
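The leaf-only rule amounts to a simple check on the version tree before any write is accepted. The sketch below models versions as nodes with a parent and children; the Version class, the require_leaf helper, and the branch label 1.1.2.1 are hypothetical, and mkver here only mirrors the command name from the table of analogies.

```python
class Version:
    """A node in the historical (version) dimension."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def is_leaf(self):
        return not self.children


def require_leaf(version):
    """Refuse modifications at interior versions (hypothetical check)."""
    if not version.is_leaf():
        raise PermissionError(
            f"version {version.label} is interior; create a branch with mkver first")


def mkver(version, label):
    """Start a new branch below `version`; the new branch is a leaf."""
    return Version(label, parent=version)
```

Writing at 1.1 after 1.2 exists would fail, while mkver(v11, "1.1.2.1") yields a leaf where changes are allowed.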
In software development, the derived files (object files and others) are
usually not managed by the version management system. Instead, each user has a
sandbox which contains these files, and they are not committed to the
repository. In CVFS these are limbo files. As other changes are made to the
parent directory and committed (such as the addition of another source code
file), the limbo files are promoted to the new version, since limbo files are
not permitted at any version other than a leaf. This retains the usual
semantics of a sandbox. In fact, CVFS limbo files are precisely equivalent to
giving each user his own sandbox for any branch of interest. CVFS does permit
only one sandbox per user per branch, but in practice this is probably
acceptable: if one were to create such a situation in CVS or another version
control system, it would be either in anticipation of branching (handled
differently in CVFS) or just plain redundant (dangerous).
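Promotion can be sketched as moving every user's remaining limbo entries from the old leaf to the new one when a directory commit creates a child version. This assumes, hypothetically, that limbo is indexed by (version label, user) and that the files being committed have already been removed from the committer's limbo; commit_directory is an illustrative name.

```python
def commit_directory(limbo, old_leaf, new_leaf):
    """Carry every user's remaining limbo files from old_leaf to new_leaf.

    limbo maps (version_label, user) -> {filename: contents}. Because limbo
    files may only live at leaves, a commit that turns old_leaf into an
    interior node must move the uncommitted files (object files and so on)
    forward to the new leaf."""
    for version, user in list(limbo):
        if version == old_leaf:
            moved = limbo.pop((version, user))
            limbo.setdefault((new_leaf, user), {}).update(moved)
    return limbo
```

After a commit of server/ from 1.1 to 1.2, gregor's main.o and server.o would simply reappear under 1.2, just as they would stay put in a conventional sandbox.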
A project repository can be mounted at a single well-known location that all
users access by the same name. So, instead of having sandboxes in home
directories or elsewhere, everyone uses the same place. CVFS ensures that each
user's view of the structure follows that user's current working version and
shows the limbo files associated with that user's ID. This again simplifies
the user's view to the point where it works almost exactly like any other
shared directory, except that users cannot see one another's limbo files.
The mount command for CVFS specifies the repository location instead of a
device. The repository can reside on (almost) any type of underlying file
system, and this is transparent to CVFS, because CVFS accesses its base storage
like a user program.
[[ TODO: Talk more about branching and merging and versioning and labeling. ]]
lv -lR lists recursively along the version hierarchy, showing where limbo
files are present.
lv shows files, noting which are not in the repository, which have changed
relative to the repository, and which are up to date with the repository.
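The three states can be computed by comparing the user's limbo area with the committed contents at the current version. The sketch below assumes both are available as simple name-to-bytes mappings; file_status, status_report, and the status strings are illustrative choices, not a defined lv output format.

```python
def file_status(name, committed, limbo):
    """Classify one file for a listing.

    committed: name -> bytes at the user's current version
    limbo:     name -> bytes in this user's limbo area"""
    if name not in committed and name not in limbo:
        raise FileNotFoundError(name)
    if name not in committed:
        return "not in repository"       # exists only in limbo
    if name in limbo and limbo[name] != committed[name]:
        return "changed"                 # limbo copy differs from the repository
    return "up to date"                  # no limbo copy, or an identical one


def status_report(committed, limbo):
    """Status of every name visible to the user, sorted by name."""
    names = sorted(set(committed) | set(limbo))
    return {n: file_status(n, committed, limbo) for n in names}
```

A real implementation would presumably compare stored checksums rather than whole file contents.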
The gv (Get Version) command updates the user's view to the repository's
contents. It is only relevant for limbo files, since other files are always
seen with their current contents. This is finer-grained than CVS, where an
explicit cvs upd must be run to pull copies of the latest modifications into
the user's sandbox. With CVFS, the latest (committed) modifications to any file
are immediately visible to the user, unless the user has a limbo copy of that
file, in which case the changes must be merged explicitly.
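The decision gv has to make for a single file can be sketched as follows, under the assumption (not stated above) that each limbo file records the committed version it was started from. If that base is still the latest version, nothing changes; if the limbo copy was never actually modified, it can simply be dropped so the latest committed contents show through; otherwise an explicit merge is required. LimboFile, base_version, and MergeRequired are hypothetical names, and the merge itself is left out.

```python
from dataclasses import dataclass


class MergeRequired(Exception):
    """Raised when a limbo copy and newer committed work both changed."""


@dataclass
class LimboFile:
    data: bytes
    base_version: str  # hypothetical: version the limbo copy was started from


def gv(limbo_file, history, head):
    """Return the limbo file that should remain after Get Version, or None.

    history maps version label -> committed bytes; head is the latest label."""
    if limbo_file is None:
        return None                      # user already sees history[head]
    if limbo_file.base_version == head:
        return limbo_file                # nothing new has been committed
    if limbo_file.data == history[limbo_file.base_version]:
        return None                      # unmodified copy: drop it, show head
    raise MergeRequired(
        f"limbo copy based on {limbo_file.base_version}, latest is {head}")
```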
[[ TODO: This may not always be a good thing. Another way to look at it: if
one person is working at the leaf and another person commits work, the second
person has now created a new leaf and turned the first person's current version
into an interior node. In this case, perhaps it should be treated as an
automatic (probably temporary) branch, so that the rule allowing limbo files
only at leaves can be maintained. Then, we would want a quick and easy way for
the merge to happen. For example, when the branch is automatic, the next Get
Version (gv) command automatically merges with the leaf version of the parent
line and automatically prunes the temporary branch. Again, this is probably (at
least mostly) hidden from the user, although the current version may reflect
the change to a temporary branch... ]]
[[ TODO: gv is already taken by the Ghostscript front end, which brings up an
X Window System viewer. Pick better names/mnemonics for Get Version and Put
Version. ]]
Structural changes (adding a file or directory, moving or renaming, etc.) cause
the parent directory to be out of date, and so require a commit on the parent
directory to put them into effect.
cp is intercepted and implemented as copy-on-write: at first, we just record
the copy as a link to a particular version of the source. The source can change
many times, but the copied (pseudo-linked) file still points at the version it
came from and still takes up no extra room. When the copy is changed, the
changes are logged relative to the original.
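A minimal sketch of the pseudo-link, assuming a repository lookup history(source, version) that returns the lines of an immutable committed version: the copy stores only (source, version) until its first write, and afterwards stores a line delta computed with Python's difflib.SequenceMatcher, where unchanged runs are references into the original rather than copies. CowCopy and history are hypothetical names, and a line-based delta is just one possible representation.

```python
import difflib


class CowCopy:
    """A cp'd file that starts as a pseudo-link to (source, version) and
    stores only a line delta against that version once it is modified."""

    def __init__(self, source, version):
        self.source = source
        self.version = version
        self.delta = None   # None means "still an exact pseudo-link"

    def read(self, history):
        original = history(self.source, self.version)
        if self.delta is None:
            return list(original)
        out = []
        for tag, i1, i2, new_lines in self.delta:
            # "equal" runs come from the original; everything else is stored text.
            out.extend(original[i1:i2] if tag == "equal" else new_lines)
        return out

    def write(self, history, lines):
        original = history(self.source, self.version)
        matcher = difflib.SequenceMatcher(a=original, b=lines, autojunk=False)
        self.delta = [
            (tag, i1, i2, [] if tag == "equal" else lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        ]
```

Because committed versions never change, later commits to the source (say foo.c,1.8) do not affect what the copy reads.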
By having the repository be a specially structured regular filesystem entity,
traditional backup and restore techniques can be applied to it.
Below is an example project structure:
    /cvfs/project/
        README
        client/
            Makefile
            client.c
            main.c
            protocol.h
        server/
            Makefile
            main.c
            server.c
        test/
            Makefile
Below is how it could appear in the repository:
    /var/cvfs/repository/
      project/
        1.1/
          README/
            1.1
            LIMBO/        # Non-committed user modifications to file
              gregor
              scott
          client/
            1.1/
              Makefile/
                1.1
              client.c/
                1.1
              main.c/
                1.1
              LIMBO/      # Non-committed user modifications to directory
                client
                client.o
                main.o
              protocol.h/
                1.1
          server/
            1.1/
              Makefile/
                1.1
              main.c/
                1.1
              server.c/
                1.1
              LIMBO/
                gregor/
                  main.o
                  server
                  server.o
          test/
            1.1/
              Makefile/
                1.1
          LIMBO/
            gregor/
              INSTALL
              doc/
                design.txt
[[ TODO: Do we need to have IDs for the files and directories, so that if one
is removed and a node is added with the same name later, they will be distinct?
How to handle tags and comments? ]]
An enhanced ls could recognize that it is looking at a CVFS area and show flags
for update status, etc., along the lines of CVS's status command.
chown, chmod, and friends cause limbification (they place the file in limbo).
Read-only access does not create ghosts in limbo.
CVS has its own diff command. It would be nice if CVFS allowed syntax like
diff foo.c,1.7 foo.c,1.6 when there is a file foo.c in the current directory
having versions 1.7 and 1.6. The idea is that foo.c would show up in ls
output, but the versions would not. However, if a request is made to access a
file name containing a comma, CVFS would allow read access to the particular
version named after the comma (can this be done?). Another angle would be
diff foo.c,= foo.c,-1, which would compare the current version (not the limbo
file, if any, since that can be accessed with the unqualified name) with the
one before it. To compare the limbo file with the most recent version:
diff foo.c foo.c,=.
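Name handling for this syntax could look like the sketch below: split one path component at its last comma, then resolve = and -N against the file's version list. It assumes, for simplicity, a linear list of committed versions along the user's current line of descent and treats each path component separately (so .,1.6.2/foo.c would be parsed component by component). parse_name and resolve are hypothetical helpers; an empty version (as in foo.c,) is returned as "" so a caller could treat it as "list the versions".

```python
import re


def parse_name(component):
    """Split 'foo.c,1.7' into ('foo.c', '1.7').

    A component without a comma names the working (possibly limbo) file and
    yields version None; a trailing comma yields version ''."""
    base, sep, version = component.rpartition(",")
    if not sep:
        return component, None
    return base, version


def resolve(version, history, current):
    """Turn '=', '-N', or an explicit number into a concrete version.

    history: committed versions on the current line, oldest first.
    current: the user's current committed version."""
    if version == "=":
        return current
    if re.fullmatch(r"-\d+", version):
        idx = history.index(current) - int(version[1:])
        if idx < 0:
            raise ValueError(f"{version} goes back past the first version")
        return history[idx]
    if version not in history:
        raise ValueError(f"no such version: {version}")
    return version
```

With history ["1.5", "1.6", "1.7"] and current version 1.7, foo.c,-1 resolves to 1.6 and foo.c,= to 1.7.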
Consider ls foo.c, (with a trailing comma). Perhaps this could list the
file's versions.
[[ TODO: Should we use a different character than the comma? How about the
caret? Whatever we choose, CVFS should refuse to create files having that
character in their name. ]]
More examples of the syntax:
    ls .,1.6.2
    more .,1.6.2/foo.c
    more foo.c,1.6.2
Can we do something appropriate for the $Id: $ facility of CVS and RCS? Can we
also show the presence of a limbo file and its user?
We will probably want to cache the results of applying diffs to files, with an
expiry policy that factors in size, cost to produce, and access history. If a
cached result has recently been accessed only to construct a further result,
discard the earlier result and cache the derived one.
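A possible shape for such a cache is sketched below. Each entry records its size, the cost of producing it, when it was last accessed, and how often it was requested directly (as opposed to being read only to build another result). When a derived result is stored and its source was never requested directly, the source is discarded. Eviction removes the entry with the lowest cost times hits divided by size times age until the cache fits. DiffCache, the scoring formula, and the parameters are illustrative assumptions, not a tuned policy.

```python
import time


class DiffCache:
    """Cache of reconstructed file versions with a cost/size/recency score."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.entries = {}  # key -> dict(value, size, cost, last_access, direct_hits)

    def get(self, key, for_derivation=False):
        e = self.entries.get(key)
        if e is None:
            return None
        e["last_access"] = time.monotonic()
        if not for_derivation:
            e["direct_hits"] += 1
        return e["value"]

    def put(self, key, value, cost, derived_from=None):
        src = self.entries.get(derived_from)
        if src is not None and src["direct_hits"] == 0:
            # The earlier result was only used to build this one: keep the derived.
            del self.entries[derived_from]
        self.entries[key] = {"value": value, "size": len(value), "cost": cost,
                             "last_access": time.monotonic(), "direct_hits": 0}
        self._evict()

    def _score(self, e, now):
        # Expensive, small, frequently and recently used entries score highest.
        age = now - e["last_access"] + 1.0
        return e["cost"] * (1 + e["direct_hits"]) / (max(e["size"], 1) * age)

    def _evict(self):
        now = time.monotonic()
        while sum(e["size"] for e in self.entries.values()) > self.capacity:
            victim = min(self.entries, key=lambda k: self._score(self.entries[k], now))
            del self.entries[victim]
```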
[[ TODO: File vs. dir vs. project versions? Numbers vs. labels? Binary vs.
text files -- better not screw with them. Binary diff. If diffs are too big,
just store a copy of the new version (we can always re-diff if someone wants to
know the diffs). ]]
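The "store a full copy if the diff is too big" idea can be sketched as a small decision function. It assumes a NUL byte marks a file as binary (a common heuristic, not a rule), treats undecodable text the same way, and uses a hypothetical max_ratio of 0.5 for "too big"; the diff itself comes from Python's difflib.unified_diff. Applying a stored delta is outside this sketch.

```python
import difflib


def encode_revision(old: bytes, new: bytes, max_ratio: float = 0.5):
    """Return ('full', new) or ('delta', diff_text) for storing a new revision."""
    if b"\0" in old or b"\0" in new:
        return ("full", new)          # looks binary: do not diff it
    try:
        old_lines = old.decode("utf-8").splitlines(keepends=True)
        new_lines = new.decode("utf-8").splitlines(keepends=True)
    except UnicodeDecodeError:
        return ("full", new)          # not text we understand
    diff = "".join(difflib.unified_diff(old_lines, new_lines, "old", "new"))
    if len(diff.encode("utf-8")) > max_ratio * len(new):
        return ("full", new)          # delta would not save enough space
    return ("delta", diff)
```

A one-line edit to a long source file yields a small delta, while a rewrite of a short file, or any binary file, is stored whole.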
Dimensional File System: Structure dimension (traditional), Historical
dimension (versions).
Given the version root and a leaf, we have defined a time line, so we can view
things as of a particular time.
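Viewing "as of" a time can be sketched by walking from the leaf back to the root to obtain the time line, then binary-searching commit times along it. This assumes each version records a commit timestamp and that timestamps never decrease from parent to child; the Version dataclass, timeline, and as_of are hypothetical names.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    label: str
    committed_at: float                 # assumed commit timestamp
    parent: Optional["Version"] = None


def timeline(leaf):
    """Versions from the root down to `leaf`, oldest first."""
    path = []
    v = leaf
    while v is not None:
        path.append(v)
        v = v.parent
    path.reverse()
    return path


def as_of(leaf, when):
    """Latest version on the root-to-leaf line committed at or before `when`."""
    line = timeline(leaf)
    times = [v.committed_at for v in line]   # non-decreasing along the line
    i = bisect.bisect_right(times, when)
    if i == 0:
        raise LookupError("nothing had been committed on this line yet")
    return line[i - 1]
```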
When you do an rm -rf on a directory that is in CVFS and not in limbo, you
create a new version of the parent directory and must do a Put Version (pv) on
it to commit it. Put Version should warn about some actions. We should also be
able to resurrect the directory later if we need to:
    cd foo                  # my directory
    rm -r bar               # kill bar
    pv .                    # I mean it
    cp -r .,1.7.3/bar .     # resurrect it (remember: copy-on-write)
    pv .                    # I mean it
    gv -f .                 # remove all limbo files and revert to versioned files
    rv -r ,.3               # remove versions past subversion 3 of this (prune the branch)
    rv -r ,                 # remove this version
More on how to revert a file: gv normally merges changes made by others, but
leaves the file in limbo if it was in limbo before. gv -f removes the limbo
file(s) so that only the versioned file is visible; this is unrecoverable.
rm of a file or directory outdates its parent directory; gv of the directory
will restore it, and looking at ,= still lets you see it.