This is the mail archive of the
mailing list for the eCos project.
hg conversion notes and summary
- From: Alex Schuilenburg <alexs at ecoscentric dot com>
- To: ecos-maintainers at ecos dot sourceware dot org
- Date: Mon, 12 Oct 2009 13:33:53 +0100
- Subject: hg conversion notes and summary
As per my message on ecos-discuss, here is a brief set of notes of my
conversion of anoncvs to hg.
First, since some of you may not be familiar with DRCS systems, a note
on tags. Tags in DRCS systems reflect the state of the repository or
branch at a particular point in time. They cover the *whole*
repository, not individual files as CVS tags do. This can and does lead
to differences between which "revisions" of files are tagged in CVS and
which are tagged in hg.
As a simple example, since CVS tags are simply textual bookmarks of
specific file revisions, there is nothing stopping somebody in CVS
tagging the revisions of one set of files as <xxx_release>, making a
bunch of changes with commits to some of those files so changing the
revisions, and then tagging a second set of files as <xxx_release>.
While you can do this in CVS, you cannot do this in hg because hg tags
apply to a single changeset only.
Of course this is just a simple example. In actual fact there are
another two complicated instances of how our current usage of CVS
results in similar mismatches, but one is enough to prove my point. This
also happened in real life, as the ecos and ecos-net modules were
tagged at different times, with updates to previously tagged files
occurring before the second tag was made. So since tags in DRCS systems
are made against a single changeset, there is no clean correspondence
between CVS tags and DRCS tags, and in fact the normal conversion
processes to git/hg/bzr do not even attempt to preserve tags. I
have made a pretty reasonable guess, however.
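To make the mismatch concrete, here is a minimal Python sketch. It models a CVS tag as a per-file mapping to revision numbers and asks whether any single point in the commit history has exactly those revisions. The data model, file names and function names are all hypothetical and invented for illustration; they are not part of the conversion scripts.

```python
def snapshots(commits):
    """Yield the repository state (file -> revision) after each commit.

    `commits` is an ordered list of dicts mapping a file name to the new
    revision introduced by that commit (a simplified, hypothetical model).
    """
    state = {}
    for changes in commits:
        state.update(changes)
        yield dict(state)


def tag_matches_a_snapshot(tag, commits):
    """Return True if the per-file tag equals the state at some single commit."""
    for state in snapshots(commits):
        if all(state.get(name) == rev for name, rev in tag.items()):
            return True
    return False


# Both files start at 1.1; a.c is tagged, both files change, then b.c is tagged.
history = [
    {"a.c": "1.1", "b.c": "1.1"},
    {"a.c": "1.2", "b.c": "1.2"},
]
split_tag = {"a.c": "1.1", "b.c": "1.2"}  # tagged at two different times
clean_tag = {"a.c": "1.2", "b.c": "1.2"}  # tagged in one go

print(tag_matches_a_snapshot(split_tag, history))  # False
print(tag_matches_a_snapshot(clean_tag, history))  # True
```

The split tag has no single repository state to point at, which is why a converted hg tag can only approximate it.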
Conversion Process Summary
1. Create a copy of the CVS repository, remove modules file and munge
directory structure in CVS repository to match that given by
/Note: This step was necessary to bring the ecos network
packages into the main tree rather than as a separate checkout.
Also, the naming of a module 'ecos' after an existing directory
'ecos' confused the heck out of cvsps, as did some files
having a revision tagged "ecos"./
2. Use a modified cvsps to create a summary set of changes (i.e.
without the actual patch changes). This effectively creates sets
of "atomic" checkins which can be used to match hg changesets.
1. /patchsets are created by cvsps/
2. /changesets are created by hg/
3. /CVS checkins of a group of files or directories are not
atomic - the times of the changes to files recorded by RCS
for a single CVS commit often differ, sometimes by as much as
12 minutes for large commits./
4. /All of CVS diff, cvsps diff and the standard diff are
unable to cope with certain changes to binary files and
either crash or create patches that cause patch to crash/
5. /Some CVS log entries were not UTF8 and came from different
character sets. These needed special attention (e.g. Roland
Caßebohm, Daniel Néri, and the best one... soft spaces)./
6. /CVS locking was broken at some point, as I found an instance
of a tag being performed midway through a checkin by another
user. The CVS history file confirms this. The tag was made by
jifl, the checkin by gthomas./
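For readers unfamiliar with how per-file RCS commits can be gathered into "atomic" patchsets, the following Python sketch shows the general idea: commits with the same author and log message are grouped when their times fall within a window. This is an illustration under assumptions, not a description of cvsps internals or of the modifications made to it. The FileCommit record, the function name and the 900 second default (chosen only because it exceeds the 12 minutes mentioned above) are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one file revision; time is in seconds.
FileCommit = namedtuple("FileCommit", "path author log time")


def group_into_patchsets(file_commits, fuzz_seconds=900):
    """Group per-file commits sharing author and log message.

    A commit joins the most recent group with the same (author, log) key
    if it is within `fuzz_seconds` of the last commit in that group;
    otherwise it starts a new group.
    """
    patchsets = []
    open_sets = {}  # (author, log) -> most recent patchset for that key
    for fc in sorted(file_commits, key=lambda c: c.time):
        key = (fc.author, fc.log)
        current = open_sets.get(key)
        if current is not None and fc.time - current[-1].time <= fuzz_seconds:
            current.append(fc)
        else:
            current = [fc]
            open_sets[key] = current
            patchsets.append(current)
    return patchsets


commits = [
    FileCommit("a.c", "alice", "fix bug", 0),
    FileCommit("b.c", "alice", "fix bug", 700),   # same checkin, 700 s later
    FileCommit("c.c", "bob", "add docs", 100),
    FileCommit("d.c", "alice", "fix bug", 5000),  # too late, new patchset
]
for patchset in group_into_patchsets(commits):
    print([fc.path for fc in patchset])
```

A real tool also has to split a group when the same file appears in it twice; that check is left out here to keep the sketch short.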
3. Loop through all the patchsets using RCS to create updated or new
files within the corresponding repository, or delete files from
the repository, and commit according to the patchset log.
4. Some patchsets were applicable to multiple branches. That is,
parts or all of some changes within the trunk or a branch
propagated to other branches or the trunk. Such propagations were
restricted to direct descendants or ancestors. Thus, for every
checkin, checks were made to ensure that changes made in one
branch/trunk were propagated when necessary to the trunk/branch.
There were 198 such changes between the trunk and the branches.
Date: 2003/02/24 14:04:35
* sgml/doclist: Reorder in a slightly more logical order with
related bits grouped together.
Add docs for power management, USB (slave, eth slave, and
and NEC uPD8985xx drivers), and synthetic target HAL, eth and
* sgml/.cvsignore: Add gifs and rename ecos.* to ecos-ref.*
1. /The CVS history file and CVS checkouts against a date
confirmed that revisions checked in after a branch was made also
appeared in directly related branches./
2. /changesets could not be transferred between branches
because only *some* of the changes in a fair number of them
propagated to other branches. Hence individual commits were
made to propagate only those changes that CVS reported./
3. /When propagating changes, some changes appear within files
within branches *before* their actual commit on another
branch, while other changes magically appear on other
branches sometime *after* the checkin. That is, CVS appears
to invent time travel in both directions. This is fixed in
the conversion by only propagating the revision to the other
branches at the same time as the original commit./
5. Create new branches as clones of the ancestor.
Ignore the "Branches" tag from cvsps as it is too unreliable. Not
cvsps's fault - CVS is just broken. Branches were manually cloned
instead to match the code more closely. This was done to ensure that
the actual changes of the first commit were preserved.
1. /The "Branches" and "Tag" labels within cvsps patchsets were
only used as indicators of when a branch or tag was made./
2. /The CVS history file was used as the first reference to
determine the time of the branch./
3. /When the history file was not forthcoming (yes, it did not
store every tag/branch, and occasionally even gave totally
bogus information, looking like timezone bugs), the time of
the branch was calculated to be one second before the first
commit to the branch./
4. /Some files only appear in a CVS checkout of a branch at
their first change in the branch. These files existed when
the branch was made, so should appear when the branch is
checked out against a time prior to the change, but they did
not./
5. /Some files suddenly appear on a branch with the same
revision as at the branchpoint on their parent at some
arbitrary checkout time./
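The rule for choosing a branch time described in the notes above can be written down in a few lines of Python. This is a sketch under assumptions rather than the script actually used: the function name is hypothetical, and treating a history-file time that is not earlier than the first commit as "bogus" is my own simplification of the sanity checking that was needed.

```python
from datetime import datetime, timedelta


def branch_time(history_time, first_commit_time):
    """Pick the time at which a branch is taken to have been created.

    history_time: datetime from the CVS history file, or None when the
        history file has no entry for the branch.
    first_commit_time: datetime of the first commit made on the branch.
    """
    # Prefer the history file when it gives a plausible answer
    # (assumption: plausible means earlier than the first commit).
    if history_time is not None and history_time < first_commit_time:
        return history_time
    # Otherwise fall back to one second before the first commit.
    return first_commit_time - timedelta(seconds=1)


first = datetime(2003, 1, 1, 12, 0, 0)
print(branch_time(None, first))                             # 2003-01-01 11:59:59
print(branch_time(datetime(2003, 1, 1, 11, 0, 0), first))   # 2003-01-01 11:00:00
```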
6. hg tags are made after the time of the last RCS commit of all the
files that have the tag, just before the next commit. In our simple
example, which tags two sets of files at different times, the tag
is made after the "newest" (greatest) time of all files containing
the tag, one second before the next CVS commit.
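As an illustration of the tag placement rule in step 6, here is a small Python sketch. The function name and the two edge-case choices (what to do when no later commit exists, and when the next commit follows within a second) are assumptions of mine and are not taken from the conversion.

```python
from datetime import datetime, timedelta

ONE_SECOND = timedelta(seconds=1)


def tag_time(tagged_revision_times, all_commit_times):
    """Return the time at which to place the hg tag.

    tagged_revision_times: commit times of every file revision that
        carries the CVS tag.
    all_commit_times: commit times of every CVS commit on the branch.
    """
    newest = max(tagged_revision_times)
    later = [t for t in all_commit_times if t > newest]
    if not later:
        # Assumption: with no following commit, tag just after the newest.
        return newest + ONE_SECOND
    # One second before the next commit, but never before the newest
    # tagged revision (assumption for commits less than a second apart).
    return max(newest, min(later) - ONE_SECOND)


# Two sets of files tagged at different times: revisions from 10:00 and 10:30.
tagged = [datetime(2003, 1, 1, 10, 0, 0), datetime(2003, 1, 1, 10, 30, 0)]
commits = tagged + [datetime(2003, 1, 1, 11, 0, 0)]
print(tag_time(tagged, commits))  # 2003-01-01 10:59:59
```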
7. Some revisions of files or patchsets are orphaned. They do not
have a branch and do not belong on the trunk. These are summarised below.
8. At periodic intervals, do a full CVS checkout of all active
branches and compare these files against a hg "checkout" at the
same time. If all the comparisons of CVS against hg of the
branches matched, create a checkpoint (a clone of the hg
If there are any differences (ignoring files that do not appear in
cvs but appear in hg), revert to a previous checkpoint and do a
binary chop style search to find which CVS checkin resulted in the
change and make the corresponding hg commit to bring the hg files
in line with CVS. This was termed "Normalising" and the hg commit
message reflected this process. More than occasionally (like
merges with anoncvs) such changes occurred at every small commit.
In these instances, with many CVS checkins occurring as part of
the merge, a single "Normalisation" was done after the multiple
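The "binary chop" search in step 8 is an ordinary bisection over the checkins made since the last good checkpoint. A minimal Python sketch follows. The function and parameter names are hypothetical, the comparison itself is passed in as a callable, and the sketch assumes that once hg and CVS diverge they stay diverged until normalised.

```python
def first_divergent_checkin(checkins, matches_cvs):
    """Return the index of the first checkin after which hg differs from CVS.

    checkins: ordered list of CVS checkins between a good checkpoint and
        a point where a mismatch was detected (so the last one is bad).
    matches_cvs: callable taking an index i and returning True when the
        hg tree, replayed up to and including checkins[i], matches CVS.
    """
    lo, hi = 0, len(checkins) - 1  # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if matches_cvs(mid):
            lo = mid + 1  # still good at mid, so the culprit is later
        else:
            hi = mid      # already bad at mid, so the culprit is mid or earlier
    return lo


# Toy example: ten checkins, with the trees diverging from checkin 6 onwards.
checkins = list(range(10))
print(first_divergent_checkin(checkins, lambda i: i < 6))  # 6
```

Each probe costs a full checkout and compare, so halving the range keeps the number of comparisons logarithmic in the number of checkins since the checkpoint.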
9. At the end of the conversion process, do a full current checkout
of both CVS and hg repos and compare. All files matched, except on
the flash_v2 branch, which had additional files in hg due to
normalisation. These were removed in one final commit:
Date: 2003/02/21 09:09:49
Merge from trunk - tweak CDL testcase definitions to refer to the
executables rather than the source
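The full-tree comparisons in steps 8 and 9 amount to comparing two checked-out directory trees while tolerating files that exist only on the hg side. The Python sketch below shows one way to do that. It is an illustration only: the function names are hypothetical, and skipping the CVS and .hg metadata directories is an assumption about what a sensible comparison would ignore.

```python
import hashlib
import os


def tree_digest(root, skip_dirs=("CVS", ".hg")):
    """Map each file path (relative to root) to the SHA-1 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata directories in place.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha1(fh.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def compare_trees(cvs_root, hg_root):
    """Return (differing, missing_in_hg, only_in_hg) as sorted path lists."""
    cvs, hg = tree_digest(cvs_root), tree_digest(hg_root)
    differing = sorted(p for p in cvs if p in hg and cvs[p] != hg[p])
    missing_in_hg = sorted(p for p in cvs if p not in hg)
    only_in_hg = sorted(p for p in hg if p not in cvs)
    return differing, missing_in_hg, only_in_hg
```

With this shape, a checkpoint would be taken when the first two lists are empty, and the third list corresponds to the extra hg files that were removed in the final commit.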
Odd usage notes
Finally, I just imported the hg ecos repository into git, for the hell
of it, just to see, and it worked very smoothly. However, I was
surprised to see that the repository was around 10% bigger than hg, upon
which I was informed about git-pack. It seems busy git repositories do
need regular manual maintenance to stay efficient and small. And as a
sub-note, when manually messing with changesets between hg and git, I
was disappointed to find that git requires the SHA-1 to refer to a
changeset while hg allows you to use either the SHA-1 or a local id,
meaning I ended up with a lot fewer cut n pastes with hg than with git.
My preference is obvious: hg. Hence my continual lobbying :-) However,
more practically, my choice is mainly from usage and usability
experience. I would encourage you to use both hg and git on different
host platforms before making a decision so you can make your own mind
up. Regular git users should also try to be a bit more open minded and
remember that not every eCos developer uses Linux and is a command-line
expert. Neither eCosCentric nor I have any commercial interest in
any of the DRCS options, nor do we stand to gain or lose commercially if
you choose one over the other. In particular I do not want hg to be
disadvantaged just because eCosCentric and I have recommended it. I
am also sure the community would appreciate seeing a summary of how and
why you reached your decision if you decide to hold the decision process
behind closed doors.
Finally, if you have any questions, concerns or would like some help
setting up your own hg repository, including setting up push privileges
over https for maintainers (assuming you don't want to be exclusively
ssh), automated email sent when changes are pushed, as well as automatic
updates of checked-out versions (e.g. web pages), I have done all of
these and will happily share my experiences or set it up for you on
sourceware. It is pretty much a no brainer - all you need is a web
server (though not even that) with suitable local privileges. OTOH all
of this is well documented and each option only involves a couple of
lines in your hgrc file, so you should be easily able to set it up yourself.
-- Alex Schuilenburg
Managing Director/CEO eCosCentric Limited