I am the blog of Hal Fire, and I bring you…

… interesting tidbits of release engineering.

Inter Repository Operations

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

Mozilla, like most operations, configures its Repositories of Record (RoR) to allow only “fast forward” updates when new code is landed. For a fast-forward merge, the tip of the destination repository (RoR) must be an ancestor of the commit being pushed from the source repository. In the discussion below, it will be useful to say whether a repository is “ahead” of, “behind”, or “equal” to another. These states are defined as:

  • If the tips of the two repositories are the same reference, then the two repositories are said to be equal (‘e’ in the table below)
  • Else, if the tip of the upstream repository is an ancestor of the tip of the source repository, the upstream is defined to be behind (‘B’ in the table below) the source repository
  • Otherwise, the upstream repository is ahead (‘A’ in the table below) of the source repository.
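To make the three definitions concrete, here is a minimal Python sketch that classifies an upstream repository relative to a source repository. It assumes the commit graph is available as a plain dictionary mapping each (hypothetical) commit id to its parent ids; a real tool would ask the DVCS instead (for example `git merge-base --is-ancestor`). Note that, exactly as in the definitions, a pair of diverged histories falls into the “ahead” bucket.

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` via parent links.

    A commit counts as its own ancestor (as with `git merge-base --is-ancestor`).
    """
    stack, seen = [descendant], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return False


def relation(parents: Dict[str, List[str]], upstream_tip: str, source_tip: str) -> str:
    """Classify the upstream repository relative to the source repository."""
    if upstream_tip == source_tip:
        return "e"  # equal: same tip
    if is_ancestor(parents, upstream_tip, source_tip):
        return "B"  # upstream is behind: a fast-forward push would succeed
    return "A"  # upstream is ahead (this bucket also covers diverged histories)
```

For example, with `parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}`, `relation(parents, "c2", "c3")` returns `"B"`: the upstream tip `c2` is an ancestor of the source tip `c3`.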

When landing a change in the normal case (two repositories: the RoR and the lander’s repository), the process is logically (assuming no network issues):

  1. Make sure the lander’s repository is equal to the RoR (start with equality)

  2. Apply the changes (RoR is now “Behind” the local repository)

  3. Push the changes to the RoR
    • if the push succeeds, stop (equality restored).

    • if the push fails, simultaneous landings were being attempted, and you lost the race.

      When simultaneous landings are attempted, only one will succeed, and the others will need to repeat the landing attempt. The RoR is now “Ahead” of the local repository, and the new upstream changes will need to be incorporated, logically as:

      a. Remove the local changes (“patch -R”, “git stash”, “hg qpop”, etc.).
      b. Pull the changes from the RoR (they will apply cleanly; equality restored)
      c. Continue from step 2 above
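The landing loop, including the retry after a lost race, can be sketched in Python as below. This is an illustration only: `Ror` and `land` are hypothetical names, history is modeled as a linear list of commit ids, and `before_push` is a test hook used to simulate another lander winning the race.

```python
from typing import Callable, List


class Ror:
    """Hypothetical in-memory RoR that accepts only fast-forward pushes.

    History is modeled as a linear list of commit ids, oldest first.
    """

    def __init__(self, history: List[str]) -> None:
        self.history = list(history)

    def push(self, pushed: List[str]) -> bool:
        # Fast forward only: the current RoR history must be a prefix of
        # the pushed history (i.e. the RoR tip is an ancestor of ours).
        if pushed[: len(self.history)] != self.history:
            return False
        self.history = list(pushed)
        return True


def land(ror: Ror, changes: List[str],
         before_push: Callable[[], None] = lambda: None,
         max_attempts: int = 5) -> List[str]:
    """Land `changes` on `ror`, repeating the attempt whenever we lose a race."""
    for _ in range(max_attempts):
        # Step 1 (or steps 3a-3b on a retry): drop any local changes and
        # start from a copy of the RoR, so the two are equal.
        local = list(ror.history)
        # Step 2: apply the changes; the RoR is now "Behind" us.
        local += changes
        before_push()  # test hook: lets a competing landing sneak in
        # Step 3: push; success restores equality.
        if ror.push(local):
            return local
        # Push refused: the RoR is "Ahead", so loop back (step 3c).
    raise RuntimeError(f"still losing the race after {max_attempts} attempts")
```

For instance, starting from `Ror(["c1"])` with a hook that lands a commit `"other"` exactly once, `land(ror, ["mine"], before_push=hook)` loses the first race and returns `["c1", "other", "mine"]` on the second attempt.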

When an authorized committer wants to land a change set on an hg RoR from git, three repositories are involved: the RoR, the git repository the lander is working in, and the internal hggit repository used for translation. The sections below describe how this affects the normal case above.

Land from git – Happy Path

On the happy path (no commit collisions, no network issues), the steps are identical to the normal path above. The tool chain configures the git commands the lander runs so that they perform any additional operations needed.

Land from git – Commit Collision

Occasionally, multiple people will try to land commits simultaneously, and a commit collision will occur (steps 3a, 3b, & 3c above). As long as the collision is noticed and dealt with before additional changes are committed to the git repository, the tooling will remove the change from the internal hggit repository.

Land from git – Sad Path

In real life, network connections fail, power outages occur, and other gremlins create the need to deal with “sad paths”. The following sections are only needed when we’re neither on the happy path nor experiencing a normal commit collision.

Because these sections cover every possible disaster-recovery case, the process can appear more complex than it is. Although there are six different sad paths, only one will be in play for a given repository, and at most three operations are needed to recover. The relationship between each pair of repositories determines the correct actions to take to restore the repositories to a known, consistent state. The static case is simply:

[Figure: Simplistic Recovery State Diagram]

Note

  1. The simplistic diagram assumes no changes to the RoR for the duration of the recovery (not a valid assumption in real life). See the text for how to deal with such changes.
  2. States “BB” & “BA” are not shown, as they represent invalid states that may require restoring portions of the system from backup before proceeding.

In reality, it is impractical to guarantee that the RoR stays static during the recovery steps. This is handled by following the flowchart to restore equality and using the tables below to look up each action.

The primary goal is to ensure correctness based on the RoR. The secondary goal is to make the interim repository as invisible as possible.

Key | RoR <-> hggit | hggit <-> git | Interpretation | Next Step to Equality
--- | --- | --- | --- | ---
Ae | Ahead | equal | someone else landed | pull from RoR
AA | Ahead | Ahead | someone else landed [1] | pull from RoR
AB | Ahead | Behind | someone else landed [1] | back out local changes (3a above)
ee | equal | equal | equal | nothing to do
eA | equal | Ahead | someone else landed [2] | pull to git
eB | equal | Behind | ready to land | push from git
Be | Behind | equal | ready to land [2] | push to RoR
BA | Behind | Ahead | prior landing not finished, lost from git [3] | corrupted setup, see note
BB | Behind | Behind | prior landing not finished, next started [4] | back out local changes (3a above) from 2nd landing

Table Notes

[1] This is the common situation of needing to update (and possibly re-merge local changes) before landing the change.
[2] If the automation is working correctly, this is only a transitory state, and no manual action is needed. IRL, stuff happens, so an explicit recovery path is needed.
[3] This “shouldn’t happen”, as it implies the git repository has been restored from a backup and the “pending landing” in the hggit repository is no longer part of the git history. If there isn’t a clear understanding of why this occurred, the client-side repository setup should be considered suspect and replaced.
[4] The lander shot themselves in the foot: they have two incomplete landings in progress. If they are extremely lucky, they can recover by completing the first landing (“hg push RoR” -> “eB”) and proceeding from there.

The deterministic approach, which must also be used if landing the first change set fails, is to back out the second landing from hggit and git, then back out the first landing from hggit and git. Equality can then be restored, and each landing redone separately.
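The state table lends itself to a simple lookup. The Python sketch below (all names hypothetical) joins the RoR <-> hggit and hggit <-> git letters into the two-letter key and returns the next step. State “BA” raises instead of returning a step, since note [3] calls for a human to inspect the client-side setup.

```python
# Next step for each two-letter state: first letter is RoR vs hggit,
# second is hggit vs git ('A' = ahead, 'e' = equal, 'B' = behind).
NEXT_STEP = {
    "Ae": "pull from RoR",
    "AA": "pull from RoR",
    "AB": "back out local changes",
    "ee": "nothing to do",
    "eA": "pull to git",
    "eB": "push from git",
    "Be": "push to RoR",
    "BB": "back out local changes from 2nd landing",
}


class CorruptedSetup(Exception):
    """State 'BA': the pending hggit landing is gone from the git history."""


def next_step(ror_vs_hggit: str, hggit_vs_git: str) -> str:
    """Look up the next step to equality for the given pair of relationships."""
    key = ror_vs_hggit + hggit_vs_git
    if key == "BA":
        # Note [3]: treat the client-side setup as suspect; no automatic step.
        raise CorruptedSetup("state 'BA': client-side setup is suspect")
    if key not in NEXT_STEP:
        raise ValueError(f"unknown state {key!r}")
    return NEXT_STEP[key]
```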

DVCS Commands
Next Step | Active Repository | Command
--- | --- | ---
pull from RoR | hggit | hg pull
pull to git | git | git pull RoR
push from git | git | git push RoR
push to RoR | hggit | hg push
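Tooling that wants to automate these steps could keep the commands table as data. In this Python sketch, `repo_dirs` (mapping "git" and "hggit" to working directories) is an assumption, and the remote name `RoR` is copied verbatim from the table; in a real setup it would be whatever remote the tool chain configures.

```python
import shlex
import subprocess
from typing import Dict, List, Tuple

# Next step -> (active repository, command), from the DVCS Commands table.
COMMANDS: Dict[str, Tuple[str, str]] = {
    "pull from RoR": ("hggit", "hg pull"),
    "pull to git": ("git", "git pull RoR"),
    "push from git": ("git", "git push RoR"),
    "push to RoR": ("hggit", "hg push"),
}


def command_for(step: str, repo_dirs: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (working directory, argv) for one of the four steps.

    repo_dirs is a hypothetical mapping such as
    {"git": "/path/to/git-clone", "hggit": "/path/to/hggit"}.
    """
    repo, command = COMMANDS[step]
    return repo_dirs[repo], shlex.split(command)


def run_step(step: str, repo_dirs: Dict[str, str]) -> bool:
    """Run the step; False most likely means another landing won the race."""
    cwd, argv = command_for(step, repo_dirs)
    return subprocess.run(argv, cwd=cwd).returncode == 0
```

Keeping the commands as data means the recovery logic never needs to know which DVCS a given step uses; it only needs the step name from the state table.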

Note that if any of the above actions fails, it simply means we’ve lost another race with someone else’s commit. The recovery path is to re-evaluate the current state and proceed as indicated (as shown in diagram 1).

Flowchart to Restore Equality

[Figure: Flowchart to Restore Equality]
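The flowchart amounts to a loop: observe the state, take the indicated step, and re-evaluate, treating a failed step as a lost race rather than an error. The Python sketch below assumes three caller-supplied, hypothetical functions: `observe()` returns the two-letter state key, `lookup(state)` returns the next step from the state table (or `None` for a state such as “BA” that needs a human), and `perform(step)` runs the step and reports success. `max_rounds` is an arbitrary safety limit, not something the process specifies.

```python
from typing import Callable, List, Optional, Tuple


def restore_equality(observe: Callable[[], str],
                     lookup: Callable[[str], Optional[str]],
                     perform: Callable[[str], bool],
                     max_rounds: int = 20) -> List[Tuple[str, bool]]:
    """Drive the repositories back to state 'ee' (all three equal).

    Returns the (step, succeeded) pairs in the order they were tried.
    """
    history: List[Tuple[str, bool]] = []
    for _ in range(max_rounds):
        state = observe()  # always re-evaluate: the RoR may have moved
        if state == "ee":
            return history
        step = lookup(state)
        if step is None:
            raise RuntimeError(f"state {state!r} needs manual recovery")
        # A failed step most likely means we lost a race; just loop and re-check.
        history.append((step, perform(step)))
    raise RuntimeError(f"equality not reached after {max_rounds} rounds")
```

Because every round starts by observing afresh, a change to the RoR between rounds simply shows up as a new state, which is what makes the non-static case manageable.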