This is the mail archive of the mailing list for the eCos project.


hg conversion notes and summary


As per my message on ecos-discuss, here is a brief set of notes of my
conversion of anoncvs to hg.

First, since some of you may not be familiar with DRCS systems, a note
on tags.  Tags in DRCS systems reflect the state of the repository or
branch at a particular point in time.  They cover the *whole*
repository, not individual files as CVS tags do.  This can and does
lead to differences between the "revisions" of files that are tagged in
CVS and those tagged in hg.

As a simple example, since CVS tags are simply textual bookmarks of
specific file revisions, there is nothing stopping somebody in CVS
tagging the revisions of one set of files as <xxx_release>, committing
a bunch of changes to some of those files (so changing their
revisions), and then tagging a second set of files as <xxx_release>.
While you can do this in CVS, you cannot do this in hg because hg tags
apply to a single changeset only.
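
To make the mismatch concrete, here is a minimal sketch in Python.  The
file names, revision numbers and changesets are all invented for
illustration; nothing here comes from the real eCos history.  It models
a CVS tag as a per-file bookmark and asks whether any single changeset
has exactly those revisions.

```python
# Hypothetical history: each changeset records the revision that every
# file has *after* the changeset is applied.
changesets = [
    {"a.c": "1.1", "b.c": "1.1"},  # changeset 0
    {"a.c": "1.2", "b.c": "1.1"},  # changeset 1: a.c modified
    {"a.c": "1.2", "b.c": "1.2"},  # changeset 2: b.c modified
]

# A CVS tag is a set of per-file bookmarks, so it can mix points in time:
# a.c was tagged before its change, b.c was tagged after its change.
cvs_tag = {"a.c": "1.1", "b.c": "1.2"}


def matching_changesets(tag, history):
    """Return the indices of changesets whose revisions all match the tag."""
    return [
        index
        for index, snapshot in enumerate(history)
        if all(snapshot.get(path) == rev for path, rev in tag.items())
    ]


# No single changeset corresponds to the mixed CVS tag.
print(matching_changesets(cvs_tag, changesets))                          # []
# A tag applied to all files at one point in time maps cleanly.
print(matching_changesets({"a.c": "1.2", "b.c": "1.1"}, changesets))     # [1]
```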

Of course this is just a simple example. In actual fact there are two
more complicated instances of how our current usage of CVS results in
similar mismatches, but one is enough to prove my point. This also
happened in real life: both the ecos and ecos-net modules were tagged
at different times, with updates to previously tagged files occurring
before the second tag was made. So since tags in DRCS systems are made
against a single changeset, there is no clean correspondence between
CVS tags and DRCS tags, and in fact the normal conversion processes to
git/hg/bzr do not even attempt to preserve tags.  I have made a pretty
reasonable guess, however.
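
The guess is the rule described in step 6 of the summary below: take
the newest commit time among the file revisions carrying the CVS tag,
and place the hg tag one second before the next CVS commit after that.
A minimal Python sketch of that rule follows.  The function name, the
timestamps and the fallback when there is no later commit are my
assumptions for illustration, not the actual conversion script.

```python
from datetime import datetime, timedelta


def hg_tag_time(tagged_revision_times, commit_times):
    """Choose a time at which to place the hg tag.

    tagged_revision_times: commit times of the file revisions that carry
        the CVS tag.
    commit_times: times of all CVS commits, in any order.
    """
    newest_tagged = max(tagged_revision_times)
    later = [t for t in commit_times if t > newest_tagged]
    if not later:
        # Assumption: with no later commit, tag just after the newest one.
        return newest_tagged + timedelta(seconds=1)
    # One second before the next commit, but never before the newest
    # tagged revision itself.
    return max(newest_tagged, min(later) - timedelta(seconds=1))


# Hypothetical timestamps: two sets of files tagged at different times.
tagged = [datetime(2003, 2, 20, 10, 0, 0), datetime(2003, 2, 24, 14, 4, 35)]
commits = tagged + [datetime(2003, 2, 25, 9, 0, 0)]
print(hg_tag_time(tagged, commits))  # 2003-02-25 08:59:59
```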

Conversion Process Summary

   1. Create a copy of the CVS repository, remove the modules file and
      munge the directory structure in the CVS repository to match that
      given by the modules file.
      /Note: This step was necessary to bring the ecos network packages
      into the main tree rather than as a separate checkout.  Also, the
      naming of a module 'ecos' after an existing directory 'ecos'
      confused the heck out of cvsps, as did some files having a
      revision tagged "ecos"./
   2. Use a modified cvsps to create a summary set of changes (i.e.
      without the actual patch changes).  This effectively creates sets
      of "atomic" checkins which can be used to match hg changesets.
         1. /Patchsets are created by cvsps./
         2. /Changesets are created by hg./
         3. /CVS checkins of a group of files or directories are not
            atomic - the times of the changes to files recorded by RCS
            for a single CVS commit often differ, sometimes by as much
            as 12 minutes for large commits./
         4. /CVS diff, cvsps diff and the standard diff are all unable
            to cope with certain changes to binary files and either
            crash or create patches that cause patch to crash./
         5. /Some CVS log entries were not UTF-8 and came from different
            character sets.  These needed special attention (e.g. Roland
            Caßebohm, Daniel Néri, and the best one... soft spaces)./
         6. /CVS locking was broken at some point, as I found an
            instance of a tag being performed midway through a checkin
            by another user.  The CVS history file confirms this.  The
            tag was made by jifl, the checkin by gthomas./

   3. Loop through all the patchsets, using RCS to create updated or new
      files within the corresponding repository, or delete files from
      the repository, and commit according to the patchset log.

   4. Some patchsets were applicable to multiple branches.  That is,
      parts or all of some changes within the trunk or a branch
      propagated to other branches or the trunk.  Such propagations were
      restricted to direct descendants or ancestors. Thus, for every
      checkin, checks were made to ensure that changes made in one
      branch/trunk were propagated when necessary to the trunk/branch.
      There were 198 such changes between the trunk and the branches.
      Simple example:

      PatchSet 593
      Date: 2003/02/24 14:04:35
      Author: jlarmour
      Branch: HEAD
      Tag: (none)
              * sgml/doclist: Reorder in a slightly more logical order with
              related bits grouped together.
              Add docs for power management, USB (slave, eth slave, and
              and NEC uPD8985xx drivers), and synthetic target HAL, eth and
              watchdog drivers.

              * sgml/.cvsignore: Add gifs and rename ecos.* to ecos-ref.*


         1. /The CVS history file and CVS checkouts against a date
            confirmed that revisions checked in after a branch was made
            also appeared in directly related branches./
         2. /Changesets could not be transferred between branches
            because only *some* of the changes in a fair number of them
            propagated to other branches.  Hence individual commits were
            made to propagate only those changes that CVS reported./
         3. /When propagating changes, some changes appear within files
            within branches *before* their actual commit on another
            branch, while other changes magically appear on other
            branches sometime *after* the checkin. That is, CVS appears
            to invent time travel in both directions.  This is fixed in
            the conversion by only propagating the revision to the other
            branches at the same time as the original commit./
   5. Create new branches as a clone of the ancestor.  Ignore the
      "Branches" tag from cvsps as it is too unreliable.  Not cvsps'
      fault - CVS is just broken. Branches were manually cloned instead
      to match the code more closely. This was done to ensure that the
      actual changes of the first commit were preserved.
         1. /The "Branches" and "Tag" labels within cvsps patchsets were
            only used as indicators of when a branch or tag was made./
         2. /The CVS history file was used as the first reference to
            determine the time of the branch./
         3. /When the history file was not forthcoming (yes, it did not
            store every tag/branch, and occasionally even gave totally
            bogus information, looking like timezone bugs), the time of
            the branch was calculated to be one second before the first
            commit to the branch./
         4. /Some files only appear in a CVS checkout of a branch at
            their first change in the branch. These files existed when
            the branch was made, so should appear when the branch is
            checked out against a time prior to the change, but they did
            not./
         5. /Some files suddenly appear on a branch, with the same
            revision as at the branchpoint on their parent, at some
            arbitrary checkout time./

   6. hg tags are made after the time of the last RCS commit of all the
      files that have the tag, just before the next commit.  In our
      simple example, which tags two sets of files at different times,
      the tag is made after the "newest" (greatest) time of all files
      containing the tag, one second before the next CVS commit.
   7. Some revisions of files or patchsets are orphaned.  They do not
      have a branch and do not belong on the trunk.  These are
      summarised below.

   8. At periodic intervals, do a full CVS checkout of all active
      branches and compare these files against a hg "checkout" at the
      same time.  If all the comparisons of CVS against hg of the
      branches match, create a checkpoint (a clone of the hg
      repository).  If there are any differences (ignoring files that
      do not appear in CVS but appear in hg), revert to a previous
      checkpoint and do a binary chop style search to find which CVS
      checkin resulted in the change, then make the corresponding hg
      commit to bring the hg files into line with CVS. This was termed
      "Normalising" and the hg commit message reflected this process.
      More than occasionally (as with merges with anoncvs) such changes
      occurred at every small commit.  In these instances, with many
      CVS checkins occurring as part of the merge, a single
      "Normalisation" was done after the multiple CVS commits.

   9. At the end of the conversion process, do a full current checkout
      of both CVS and hg repos and compare.  All files matched, except
      on the flash_v2 branch, which had additional files in hg due to
      normalisation.  These were removed in one final commit.
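
For those unfamiliar with what step 2 involves, the grouping of
per-file RCS commits into patchsets can be sketched roughly as follows.
This is only an illustration of the idea in Python, not cvsps's actual
algorithm: the class names, the sample data and the 15 minute window
(chosen to cover the 12 minute spread mentioned above) are my
assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FileCommit:
    path: str
    author: str
    log: str
    time: int  # seconds since some epoch


@dataclass
class PatchSet:
    author: str
    log: str
    commits: list = field(default_factory=list)


def group_into_patchsets(file_commits, fuzz_seconds=15 * 60):
    """Group per-file commits sharing author and log into patchsets."""
    patchsets = []
    latest = {}  # (author, log) -> most recent PatchSet with that key
    for commit in sorted(file_commits, key=lambda c: c.time):
        key = (commit.author, commit.log)
        current = latest.get(key)
        if (
            current is None
            # Too long since the last file in this patchset was committed.
            or commit.time - current.commits[-1].time > fuzz_seconds
            # A file cannot change twice within one patchset.
            or any(c.path == commit.path for c in current.commits)
        ):
            current = PatchSet(commit.author, commit.log)
            patchsets.append(current)
            latest[key] = current
        current.commits.append(commit)
    return patchsets


# Hypothetical per-file commits as RCS might record them.
sample = [
    FileCommit("a.c", "alice", "fix timer", 0),
    FileCommit("b.c", "alice", "fix timer", 700),   # same checkin, 11m40s later
    FileCommit("c.c", "bob", "update docs", 100),
    FileCommit("a.c", "alice", "fix timer", 5000),  # same log, much later
]
for patchset in group_into_patchsets(sample):
    print(patchset.author, [c.path for c in patchset.commits])
```

Each resulting patchset then corresponds to one hg changeset in step 3.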

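The binary chop in step 8 can be sketched as below.  This is a minimal
Python illustration, not the script I actually used: trees_match is a
hypothetical callback standing in for "check out CVS and hg as of this
checkin and compare", and the sketch assumes that once the trees
diverge they stay diverged until normalised.

```python
def first_divergent_checkin(checkins, trees_match):
    """Find the first checkin after which the hg and CVS trees differ.

    checkins: CVS checkins since the last good checkpoint, oldest first.
    trees_match: hypothetical callback returning True if the hg and CVS
        checkouts are identical after the given checkin.
    """
    if not checkins or trees_match(checkins[-1]):
        raise ValueError("no divergence within the given checkins")
    low, high = 0, len(checkins) - 1  # the answer lies in [low, high]
    while low < high:
        mid = (low + high) // 2
        if trees_match(checkins[mid]):
            low = mid + 1   # still identical here, so look later
        else:
            high = mid      # already diverged, so look here or earlier
    return checkins[low]


# Hypothetical example: checkins 100..119, with divergence starting at 113.
checkins = list(range(100, 120))
print(first_divergent_checkin(checkins, lambda c: c < 113))  # 113
```

With 20 checkins this needs only a handful of full checkouts rather than
one per checkin, which matters when each comparison is a complete CVS
checkout of every active branch.
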
Orphaned changes
PatchSet 582
Date: 2003/02/21 09:09:49
Author: bartv
Merge from trunk - tweak CDL testcase definitions to refer to the
executables rather than the source


Odd usage notes

Finally, I just imported the hg ecos repository into git, for the hell
of it, just to see, and it worked very smoothly.  However, I was
surprised to see that the git repository was around 10% bigger than
the hg one, upon which I was informed about git-pack.  It seems busy
git repositories do need regular manual maintenance to stay efficient
and small.  And as a sub-note, when manually messing with changesets
between hg and git, I was disappointed to find that git requires the
SHA-1 to refer to a changeset, while hg allows you to use both the
SHA-1 and a local id, meaning I ended up doing far fewer cut-and-pastes
with hg than with git.

My preference is obvious: hg.  Hence my continual lobbying :-) However,
more practically, my choice comes mainly from usage and usability
experience. I would encourage you to use both hg and git on different
host platforms before making a decision so you can make your own mind
up. Regular git users should also try to be a bit more open minded and
remember that not every eCos developer uses Linux and is a command-line
expert.  Neither eCosCentric nor I have any commercial interest in any
of the DRCS options, nor do we stand to gain or lose commercially if
you choose one over the other. In particular I do not want hg to be
disadvantaged just because eCosCentric and I have recommended it. I am
also sure the community would appreciate seeing a summary of how and
why you reached your decision if you decide to hold the decision
process behind closed doors.

Finally, if you have any questions or concerns, or would like some help
setting up your own hg repository, including setting up push privileges
over https for maintainers (assuming you don't want to be exclusively
ssh), automated email sent when changes are pushed, and automatic
updates of checked-out versions (e.g. web pages), I have done all of
these and will happily share my experiences or set it up for you on
sourceware.  It is pretty much a no-brainer - all you need is a web
server (though not even that) with suitable local privileges.  OTOH all
of this is well documented and each option only involves a couple of
lines in your hgrc file, so you should easily be able to set it up
yourself.

-- Alex Schuilenburg

Managing Director/CEO                                eCosCentric Limited
