I am the blog of Hal Fire, and I bring you… interesting tidbits of release engineering.
http://dtor.com/halfire/
Fri, 02 Oct 2015 00:00:00 -0700
http://dtor.com/halfire/2015/10/02/duo_mfa___viscosity_no_cell_setup.html

duo MFA & viscosity no-cell setup

The Duo application is nice if you have a supported mobile device, and it’s usable via TOTP even when you have no cell connection. However, getting Viscosity to allow both choices took some work for me.

For various reasons, I don’t want to always use the Duo application, so I would like Viscosity to always prompt for a password. (I had already saved a password - a fresh install likely would not have that issue.) That took a bit of work, and some web searches.

  1. Disable any saved passwords for Viscosity. On a Mac, this means opening the “Keychain Access” application, searching for “Viscosity”, and deleting any associated entries.

  2. Ask Viscosity to save the “user name” field (optional). I really don’t need this, as my setup uses a certificate to identify me. So it doesn’t matter what I type in the field. But, I like hints, so I told Viscosity to save just the user name field:

    defaults write com.viscosityvpn.Viscosity RememberUsername -bool true

With the above, you’ll be prompted every time. You have to put “something” in the user name field, so I chose to put “push or TOTP” to remind me of the valid values. You can put anything there, just do not check the “Remember details in my Keychain” toggle.

Fri, 02 Oct 2015 00:00:00 -0700
http://dtor.com/halfire/2015/09/22/using_password_store.html

Using Password Store

Password Store (aka “pass”) is a very handy wrapper for dealing with pgp encrypted secrets. It greatly simplifies securely working with multiple secrets. This is still true even if you happen to keep your encrypted secrets in non-password-store managed repositories, although that setup isn’t covered in the docs. I’ll show my setup here. (See the Password Store page for usage: “pass show -c <spam>” & “pass search <eggs>” are among my favorites.)

Short version:
  1. Have gpg installed on your machine.

  2. Install Password Store on your machine. There are OS specific instructions. Be sure to enable tab completion for your shell!

  3. Set up a local password store. Scroll down in the usage section to “Setting it up” for instructions.

  4. Clone your secrets repositories to your normal location. Do not clone inside of ~/.password-store/.

  5. Set up symlinks inside of ~/.password-store/ to directories inside your clone of the secrets repository. I did:

    ln -s ~/path/to/secrets-git/passwords rePasswords
    ln -s ~/path/to/secrets-git/keys reKeys

  6. Enjoy command line search and retrieval of all your secrets. (Use the regular method for your separate secrets repository to add and update secrets.)


  • By using symlinks, pass will not allow me to create or update secrets in the other repositories. That prevents mistakes, as the process is different for each of those alternate stores.
  • I prefer to have just one tree of secrets to search, rather than the “multiple configuration” approach documented on the Password Store site.
  • By using symlinks, I can control the global namespace, and use names that make sense to me.
  • I’ve migrated from using KeePassX to using pass for my personal secret management. That is my “main” password-store setup (backed by a git repo).


  • If you’d prefer a GUI, there’s qtpass which also works with the above setup.
Tue, 22 Sep 2015 00:00:00 -0700
http://dtor.com/halfire/2015/07/30/decoding_hashed_known_hosts_files.html

Decoding Hashed known_hosts Files

tl;dr: You might find this gist handy if you enable HashKnownHosts

Modern ssh comes with the option to obfuscate the hosts it can connect to, by enabling the HashKnownHosts option. Modern server installs have that as a default. This is a good thing.

The obfuscation occurs by hashing the first field of the known_hosts file - this field contains the hostname, port, and IP address used to connect to a host. Presumably, there is a private ssh key on the host used to make the connection, so this process makes it harder for an attacker to utilize those private keys if the server is ever compromised.
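A minimal sketch of how such a hashed first field can be produced and checked, assuming the OpenSSH layout `|1|salt|hash`, where the hash is an HMAC-SHA1 of the host name keyed by the salt and both parts are base64 encoded. The function names `hash_host` and `matches` are made up for illustration; this is not code from ssh itself. It shows why an entry can't be decoded, yet a guessed host name can still be tested against it.

```python
import base64
import hashlib
import hmac
import os


def hash_host(host, salt=None):
    """Return a hashed host field in the assumed '|1|salt|hash' layout."""
    if salt is None:
        salt = os.urandom(20)  # a fresh random salt per entry
    digest = hmac.new(salt, host.encode("utf-8"), hashlib.sha1).digest()
    return "|1|{}|{}".format(
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    )


def matches(hashed_field, candidate):
    """True if `candidate` hashes to `hashed_field` under the embedded salt."""
    _, scheme, salt_b64, digest_b64 = hashed_field.split("|")
    if scheme != "1":
        raise ValueError("unknown hash scheme: %r" % scheme)
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    actual = hmac.new(salt, candidate.encode("utf-8"), hashlib.sha1).digest()
    return hmac.compare_digest(actual, expected)
```

Checking a guess only works when you already suspect the name, which is why an unhashed reference file is still needed for a real audit.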

Super! Nifty! Now how do I audit those files? Some services have multiple IP addresses that serve a host, so some updates and changes are legitimate. But which ones? It’s a one way hash, so you can’t decode.

Well, if you had an unhashed copy of the file, you could match host keys and determine the host name & IP. [1] You might just have such a file on your laptop (at least I don’t hash keys locally). [2] (Or build a special file by connecting to the hosts you expect with the options “-o HashKnownHosts=no -o UserKnownHostsFile=/path/to/new_master”.)

I threw together a quick python script to do the matching, and it’s at this gist. I hope it’s useful - as I find bugs, I’ll keep it updated.
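The matching idea can be sketched as follows. This is not the code from the gist, just an illustration under simple assumptions: each line is "hosts keytype keydata [comment]", and marker lines (those starting with "@") are skipped. The function names are hypothetical.

```python
def index_by_key(plain_lines):
    """Map (keytype, keydata) -> set of host names from an unhashed file."""
    index = {}
    for line in plain_lines:
        parts = line.split()
        # Skip blanks, comments, marker lines, and anything malformed.
        if len(parts) < 3 or parts[0].startswith(("#", "@")):
            continue
        hosts, keytype, keydata = parts[0], parts[1], parts[2]
        index.setdefault((keytype, keydata), set()).update(hosts.split(","))
    return index


def label_hashed(hashed_lines, index):
    """Return (hashed_field, sorted known names) for each hashed entry."""
    results = []
    for line in hashed_lines:
        parts = line.split()
        if len(parts) < 3 or not parts[0].startswith("|1|"):
            continue
        names = sorted(index.get((parts[1], parts[2]), ()))
        results.append((parts[0], names))
    return results
```

Entries that come back with an empty list are the ones worth a closer look: their host key never appeared in the reference file.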

Bonus Tip: https://github.com/defunkt/gist is a very nice way to manage gists from the command line.


[1] A lie - you’ll only get the host names and IPs that you have connected to while building your reference known_hosts file.
[2] I use other measures to keep my local private keys unusable.
Thu, 30 Jul 2015 00:00:00 -0700
http://dtor.com/halfire/2015/04/01/gmail_multi_inbox.html

GMail multi-inbox

As much as GMail’s search syntax makes me long for PCRE, there are some non-obvious gems lying around.

For example, I get tons of mail about releases. Occasionally, I need to monitor a given release, paying attention to not only the automated progress, but also human generated emails as well. Here’s my current setup:

  • Automated email is marked as read & skips inbox (unless it’s a failure)
  • Any release oriented email is given a special label using a filter similar to “subject:((38.0b1) OR (38 Beta) OR (31. AND "esr"))”.

That’s pretty standard. The productivity add is when I use the “multi-inbox” feature in the web UI. I set the top one to be just the unread ones with the special label from today:

newer_than:1d label:SPECIAL_LABEL is:unread

With positioning of “extra panels” to the right side, I get a very focussed look at any issues I need to look at!



No Messages:


I love seeing that “(no messages)” text!

Wed, 01 Apr 2015 00:00:00 -0700
http://dtor.com/halfire/2015/03/10/docker_at_vungle.html

Docker at Vungle

Tonight I attended the San Francisco Dev Ops meetup at Vungle. The topic was one we often discuss at Mozilla - how to simplify a developer’s life. In this case, the solution they have migrated to is one based on Docker, although I guess the title already gave that away.

Long (but interesting - I’ll update with a link to the video when it becomes available) story short, they are having much more success using DevOps managed Docker containers for development than their previous setup of Virtualbox images built & maintained with Vagrant and Chef.

Vungle’s new hire setup:
  • install Boot2Docker (they are an all Mac dev shop)
  • clone the repository. [1]
  • run docker.sh script which pulls all the base images from DockerHub. This one time image pull gives the new hire time to fill out HR paperwork ;)
  • launch the app in the container and start coding.

Sigh. That’s nice. When you come back from PTO, just re-run the script to get the latest updates - it won’t take nearly as long as only the container deltas need to come down. Presto - back to work!

A couple of other highlights – I hope to do a more detailed post later.

  • They follow the ‘each container has a single purpose’ approach.
  • They use “helper containers” to hold recent (production) data.
  • Devs have a choice in front end development: inside the container (limited tooling) or in the local filesystem (dev’s choice of IDE, etc.). [2]
  • Currently, Docker containers are only being used in development. They are looking down the road to deploying containers in production, but it’s not a major focus at this time.


[1] Thanks to BFG for clarifying that docker-foo is kept in a separate repository from source code. The docker.sh script is in the main source code repository. [Updated 2015-03-11]
[2] More on this later. There are some definite tradeoffs.
Tue, 10 Mar 2015 00:00:00 -0700
http://dtor.com/halfire/2015/02/06/kaizen_the_low_tech_way.html

Kaizen the low tech way

On Jan 29, I treated myself to a seminar on Successful Lean Teams, with an emphasis on Kanban & Kaizen techniques. I’d read about both, but found the presentation useful. Many of the other attendees were from the Health Care industry and their perspectives were very enlightening!

Hearing how successful they were in such a high risk, multi-disciplinary, bureaucratic, and highly regulated environment is inspiring. I’m inclined to believe that it would also be achievable in the simple-by-comparison, low risk environment of software development. ;)

What these hospitals are using is a light weight, self managed process which:

  • ensures visibility of changes to all impacted folks
  • outlines the expected benefits
  • includes a “trial” to ensure the change has the desired impact
  • has a built in feedback system

That sounds achievable. In several of the settings, the traditional paper and bulletin board approach was used, with 4 columns labeled “New Ideas”, “To Do”, “Doing”, and “Done”. (Not a true Kanban board for several reasons, but Trello would be a reasonable visual approximation; CAB uses spreadsheets.)

Cards move left to right, and could cycle back to “New Ideas” if iteration is needed. “New Ideas” is where things start, and they transition from there (I paraphrase a lot in the following):

  1. Everyone can mark up cards in New Ideas & add alternatives, etc.
  2. A standup is held to select cards to move from “New Ideas” to “To Do”
  3. The card stays in “To Do” for a while to allow concerns to be expressed by other stakeholders. Also, a team needs to sign up to move the change through the remaining steps. Before the card can move to “Doing”, a “test” (pilot or checkpoints) is agreed on to ensure the change can be evaluated for success.
  4. The team moves the card into “Doing”, and performs PDSA cycles (Plan, Do, Study, Adjust) as needed.
  5. Assuming the change yields the projected results, the change is implemented and the card is moved to “Done”. If the results aren’t as anticipated, the card gets annotated with the lessons learned, and either goes to “Done” (abandon) or back to “New Ideas” (try again) as appropriate.

For me, I’m drawn to the 2nd and 3rd steps. That seems to be the change from current practice in teams I work on. We already have a gazillion bugs filed (1st step). We also can test changes in staging (4th step) and update production (5th step). Well, okay, sometimes we skip the staging run. Occasionally that *really* bites us. (Foot guns, foot guns – get your foot guns here!)

The 2nd and 3rd steps help focus on changes. And make the set of changes happening “nowish” more visible. Other stakeholders then have a small set of items to comment upon. Net result - more changes “stick” with less overall friction.

Painting with a broad brush, this Kaizen approach is essentially the CAB process that Mozilla IT implemented successfully. I have seen the CAB reduce stress, surprises, and self-inflicted damage both inside and outside of IT. Over time, the velocity of changes has increased and backlogs have been reduced. In short, it is a “Good Thing(tm)”.

So, I’m going to see if there is a way to “right size” this process for the smaller teams I’m on now. Stay tuned....

Fri, 06 Feb 2015 00:00:00 -0800
http://dtor.com/halfire/2015/01/23/pyenv_tox_can_get_along.html

Pyenv & Tox Can Get Along

I fought this for quite a few days on a background project. I finally found the answer, and want to ensure I don’t forget it.


Activate all the python versions you need before running tox.

After I upgraded my laptop to OSX 10.10, I also switched to using pyenv for installing non-system python versions. Things went well (afaict) until they didn’t. All of a sudden, I could not get both my code tests to pass, and my doc build to succeed.

The error message was especially confusing:

pyenv: python2.7: command not found
The `python2.7' command exists in these Python versions:

Searching the web didn’t shed much light. I’d find other folks who had the problem, so I wasn’t alone. But they all disappeared from the bug traffic over a year ago (example), with no sign of resolution.

Finally, I tried different search terms, and landed on this post. The secret – you can have multiple pyenv versions “active”. The first listed is the one that a bare python will invoke. The others are available as pythonMAJOR.MINOR (e.g. “python3.2”) and pythonMAJOR (e.g. “python3”).
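One way to catch the problem before tox does is to try each interpreter name up front. The sketch below assumes that a pyenv shim for an installed-but-inactive version exits non-zero (as the “command not found” message above suggests); the function name `unavailable` is made up for illustration.

```python
import subprocess


def unavailable(commands):
    """Return the commands that cannot be run with '--version'."""
    missing = []
    for cmd in commands:
        try:
            result = subprocess.run(
                [cmd, "--version"],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # No such executable on PATH at all.
            missing.append(cmd)
            continue
        if result.returncode != 0:
            # Found something (e.g. a shim), but it refused to run.
            missing.append(cmd)
    return missing
```

Calling it with the interpreter names your tox environments need (for example `unavailable(["python2.7", "python3"])`) gives the list of versions that still have to be activated.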

Fri, 23 Jan 2015 00:00:00 -0800
http://dtor.com/halfire/2015/01/10/chatops_meetup.html

ChatOps Meetup

This last Wednesday, I went to a meetup on ChatOps organized by SF DevOps, hosted by Geekdom (who also made recordings available), and sponsored by TrueAbility.

I had two primary goals in attending: I wanted to understand what made ChatOps special, and I wanted to see how much was applicable to my current work at Mozilla. The two presentations helped me accomplish the first. I’m still mulling over the second. (Ironically, I had to shift focus during the event to clean up a deployment-gone-wrong that was very close to one of the success stories mentioned by Dan Chuparkoff.)

My takeaway on why ChatOps works is that it is less about the tooling (although modern web services make it a lot easier), and more about the process. Like a number of techniques, it appears to be more successful when teams fully embrace their vision of ChatOps, and make implementation a top priority. Success is enhanced when the tooling supports the vision, and that appears to be what all the recent buzz is about – lots of new tools, examples, and lessons learned make it easier to follow the pioneers.

What are the key differentiators?

Heck, many teams use IRC for operational coordination. There are scripts which automate steps (some workflows can even be invoked from the web). We’ve got automated configuration, logging, dashboards, and wikis – are we doing ChatOps?

Well, no, we aren’t.

Here are the differences I noted:
  • ChatOps requires everyone to both agree and commit to a single interface to all operations. (The opsbot, like hubot, lita or Err.) Technical debt (non-conforming legacy systems) will be reworked to fit into ChatOps.
  • ChatOps requires focus and discipline. There are a small number of channels (chat rooms, MUC) that have very specific uses - and folks follow that. High signal to noise ratio. (No animated gifs in the deploy channel - that’s what the lolcat channel is for.)
  • A commitment to explicitly documenting all business rules as executable code.

What do you get for giving up all those options and flexibility? Here were the “ah ha!” concepts for me:

  1. Each ChatOps room is a “shared console” everyone can see and operate. No more screen sharing over video, or “refresh now” coordination!

  2. There is a bot which provides the “facts” about the world. One view accessible by all.

  3. The bot is also the primary way folks interact and modify the system. And it is consistent in usage across all commands. (The bot extensions perform the mapping to whatever the backend needs. The code adapts, not the human!)

  4. The bot knows all and does all:
    • Where’s the documentation?
    • How do I do X?
    • Do X!
    • What is the status of system Y?
  5. The bot is “fail safe” - you can’t bypass the rules. (If you code in a bypass, well, you loaded that foot gun!)

Thus everything is consistent and familiar for users, which helps during those 03:00 forays into a system you aren’t as familiar with. Nirvana ensues (remember, everyone did agree to drink the koolaid above).

Can you get there from here?

The speaker selection was great – Dan was able to speak to the benefits of committing to ChatOps early in a startup’s life. James Fryman (from StackStorm) showed a path for migrating existing operations to a ChatOps model. That pretty much brackets the range, so yeah, it’s doable.

The main hurdle, imo, would be getting the agreement to a total commitment! There are some tensions in deploying such a system at a highly open operation like Mozilla: ideally ChatOps is open to everyone, and business rules ensure you can’t do or see anything improper. That means the bot has (somewhere) the credentials to do some very powerful operations. (Dan hopes to get their company to the “no one uses ssh, ever” point.)

My next steps? Still thinking about it a bit – I may load Err onto my laptop and try doing all my local automation via that.

Sat, 10 Jan 2015 00:00:00 -0800
http://dtor.com/halfire/2014/10/02/bz_quick_search.html

bz Quick Search

With the new developer services components, I find myself once again updating my Bugzilla Quick Search search plugin. This time, I’ll document it. :)

Here are the steps:

  1. Determine the quick search parameters you want. Experimenting on the Bugzilla Quick Search page is useful.
  2. If this is your first time, install a search engine that you can copy and modify. The bugzilla one is an obvious good choice.
  3. Find the xml file for the search engine in the “searchplugins” directory of your profile. Modify the “template” attribute in the “os:Url” element based on your research in (1). I tend to put all my customization after the special token “{searchTerms}”, as that makes it easier to refine the search on the bugzilla search results page.
  4. Add a keyword to this search, for ease of use in the awesome bar.
  5. Enjoy!
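Step 3 can also be scripted. The sketch below is an illustration only: it assumes the “os” prefix maps to the OpenSearch 1.1 namespace, and both the sample XML and the suffix are hypothetical stand-ins, not my actual plugin file. Note that ElementTree may rename other namespace prefixes when it writes the file back out.

```python
import xml.etree.ElementTree as ET

# Assumption: the "os" prefix refers to the OpenSearch 1.1 namespace.
OS_NS = "http://a9.com/-/spec/opensearch/1.1/"


def extend_template(xml_text, suffix):
    """Append `suffix` to every os:Url template containing {searchTerms}."""
    ET.register_namespace("os", OS_NS)
    root = ET.fromstring(xml_text)
    for url in root.iter("{%s}Url" % OS_NS):
        template = url.get("template", "")
        if "{searchTerms}" in template and not template.endswith(suffix):
            url.set("template", template + suffix)
    return ET.tostring(root, encoding="unicode")


# Hypothetical plugin file, trimmed to the parts that matter here.
SAMPLE = """<SearchPlugin xmlns="http://www.mozilla.org/2006/browser/search/"
              xmlns:os="http://a9.com/-/spec/opensearch/1.1/">
  <os:ShortName>bz quick</os:ShortName>
  <os:Url type="text/html" method="GET"
          template="https://bugzilla.mozilla.org/buglist.cgi?quicksearch={searchTerms}"/>
</SearchPlugin>"""
```

Keeping the customization after “{searchTerms}” matches the habit described in step 3: the typed terms come first, so they stay easy to edit on the results page.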

[edit: here’s my current file as a sample]

Thu, 02 Oct 2014 00:00:00 -0700
http://dtor.com/halfire/2014/09/06/hg_server_update.html

New Hg Server Status Page

Just a quick note to let folks know that the Developer Services team continues to make improvements on Mozilla’s Mercurial server. We’ve set up a status page to make it easier to check on current status.

As we continue to improve monitoring and status displays, you’ll always find the “latest and greatest” on this page. And we’ll keep the page updated with recent improvements to the system. We hope this page will become your first stop whenever you have questions about our Mercurial server.

Sat, 06 Sep 2014 00:00:00 -0700
http://dtor.com/halfire/2014/07/02/2014_06_try_server_update.html

2014-06 try server update

Chatting with Aki the other day, I realized that word of all the wonderful improvements on the try server issue has not been publicized. A lot of folks have done a lot of work to make things better - here’s a brief summary of the good news.

Try server pushes could appear to take up to 4 hours, during which time others would be locked out.
The major time taker has been found and eliminated: ancestor processing. And we understand the remaining occasional slowdowns are related to caching. Fortunately, there are some steps that developers can take now to minimize delays.

What folks can do to help

The biggest remaining slowdown is caused by rebuilding the cache. The cache is only invalidated if the push is interrupted. If you can avoid causing a disconnect until your push is complete, that helps everyone! So, please, no Ctrl-C during the push! The other changes should address the long wait times you used to see.

What has been done to infrastructure

There has long been a belief that many of our hg problems, especially on try, came from the fact that we had r/w NFS mounts of the repositories across multiple machines (both hgssh servers & hgweb servers). For various historical reasons, a large part of this was due to the way pushlog was implemented.

Ben did a lot of work to get sqlite off NFS, and much of the work to synchronize the repositories without NFS has been completed.

What has been done to our hooks

All along, folks have been discussing our try server performance issues with the hg developers. A key confusing issue was that we saw processes “hang” for VERY long times (45 min or more) without making a system call. Kendall managed to observe an hg process in such an infinite-looking-loop-that-eventually-terminated a few times. A stack trace would show it was looking up an hg ancestor without making system calls or library accesses. In discussions, this confused the hg team, as they did not know of any reason that ancestor code should be invoked during a push.

Thanks to lots of debugging help from glandium one evening, we found and disabled a local hook that invoked the ancestor function on every commit to try. \o/ team work!

Caching – the remaining problem

With the ancestor-invoking-hook disabled, we still saw some longish periods of time where we couldn’t explain why pushes to try appeared hung. Granted it was a much shorter time, and always self corrected, but it was still puzzling.

A number of our old theories, such as “too many heads” were discounted by hg developers as both (a) we didn’t have that many heads, and (b) lots of heads shouldn’t be a significant issue – hg wants to support even more heads than we have on try.

Greg did a wonderful bit of sleuthing to find the impact of ^C during push. Our current belief is once the caching is fixed upstream, we’ll be in a pretty good spot. (Especially with the inclusion of some performance optimizations also possible with the new cache-fixed version.)

What is coming next

To take advantage of all the good stuff upstream Hg versions have, including the bug fixes we want, we’re going to be moving towards removing roadblocks to staying closer to the tip. Historically, we had some issues due to http header sizes and load balancers; ancient python or hg client versions; and similar. The client issues have been addressed, and a proper testing/staging environment is on the horizon.

There are a few competing priorities, so I’m not going to predict a completion date. But I’m positive the future is coming. I hope you have a glimpse into that as well.

Wed, 02 Jul 2014 00:00:00 -0700
http://dtor.com/halfire/2014/06/26/cvs_attic_in_dvcs.html

CVS Attic in DVCS

One handy feature of CVS was the presence of the Attic directory. The primary purpose of the Attic directory was to simplify trunk checkouts, while providing space for both removed and added-only-on-branch files.

As a consequence of this, it was relatively easy to browse all such file names. I often would use this as my “memory” of scripts I had written for specific purposes, but were no longer needed. Often these would form the basis for a future special purpose script.

This isn’t a very commonly needed use case, but I have found myself being a bit reluctant to delete files using DVCS systems, as I wasn’t quite sure how to find things easily in the future.

Well, I finally scratched the itch – here are the tricks I’ve added to my toolkit.

Hg version

A simplistic version, which just shows when file names were deleted, is to add this alias to the [alias] section of ~/.hgrc:

attic=log --template '{rev}:{file_dels}\n'

Git version

Very similar for git:

git config --global alias.attic 'log --diff-filter=D --summary'

(Not actually ideal, as not a one liner, but good enough for how often I use this.)
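If a one-line-per-file listing is wanted, the git output can be post-processed. This is a sketch, not part of my toolkit: it assumes the log is requested with `--name-only` and a `commit <hash>` header line per commit, and the function names are made up. A deleted file whose path starts with “commit ” would confuse it.

```python
import subprocess


def parse_attic(log_text):
    """Turn 'commit <hash>' headers plus file names into (hash, path) pairs."""
    deleted = []
    commit = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("commit "):
            commit = line.split(None, 1)[1]
        elif commit is not None:
            deleted.append((commit, line))
    return deleted


def git_attic(repo_dir="."):
    """List deleted files in the repository at `repo_dir`."""
    out = subprocess.check_output(
        ["git", "log", "--diff-filter=D", "--name-only",
         "--pretty=format:commit %h"],
        cwd=repo_dir,
        universal_newlines=True,
    )
    return parse_attic(out)
```

Each pair names the commit that removed the file, so `git show <hash>^:<path>` can bring the old contents back.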

Thu, 26 Jun 2014 00:00:00 -0700
http://dtor.com/halfire/2014/03/06/bluetooth_finder_for_fitbit.html

Bluetooth Finder for Fitbit

Pro tip - if you have a Fitbit or other small BLE device, go get a “bluetooth finder” app for your smartphone or tablet. NOW. No thanks needed.

I ended up spending far-too-long looking for my misplaced black fitbit One last weekend. Turned out the black fitbit was behind a black sock on a shelf in a dark closet. (Next time, I’ll get a fuchsia colored one – I don’t have too many pairs of fuchsia socks.)

After several trips through the house looking, I thought I’d turn to technology. By seeing where in the house I could still sync with my phone, I could confirm it was in the house. I tried setting alarms on the fitbit, but I couldn’t hear them go off. (Likely, the vibrations were completely muffled by the sock. Socks - I should just get rid of them.)

Then I had the bright idea of asking the interwebs for help. Surely, I couldn’t be the first person in this predicament. I was rewarded with this FAQ on the fitbit site, but I’d already followed those suggestions.

Finally, I just searched for “finding bluetooth”, and discovered the tv ads were right: there is an app for that! Since I was on my android tablet at the time, I ended up with Bluetooth Finder, and found my Fitbit within 5 minutes. (I also found a similar app for my iPhone, but I don’t find it as easy to use. Displaying the signal strength on a meter is more natural for me than watching dB numbers.)

Thu, 06 Mar 2014 00:00:00 -0800
http://dtor.com/halfire/2013/11/23/vim_fun_with_vundle.html

More VIM fun: Vundle

During this last RelEng workweek, I thought I’d try a new VIM plugin for reST: RIV. While that didn’t work out great (yet), it did get me to start using Vundle. Vundle is a quite nice vim plugin manager, and is easier for me to understand than Pathogen.

However, the Vundle docs didn’t cover two cases I care about:

  • converting Pathogen modules for Vundle usage
  • using with bundles not managed by either Pathogen or Vundle. (While Vundle running won’t interfere with unmanaged bundles, the :BundleClean command will claim they are unused and offer to delete them. That’s just too risky for me.)

The two cases appear to have the same solution:

  • ensure all directories in the bundle location (typically ~/.vim/bundle/) are managed by Vundle.
  • use a file:// URI for any bundle you don’t want Vundle to update.

For example, I installed the ctrlp bundle a while back, from the Bitbucket (hg) repository. (Yes, there (now?) is a github repository, but why spoil my fun.) Since the hg checkout already lived in ~/.vim/bundle, I only needed to add the following line to my vimrc file:

Bundle 'file:///~/.vim/bundle/ctrlp.vim/'

Vundle no longer offers to delete that repository when BundleClean is run.

I suspect I’ll get errors if I ever asked Vundle to update that repo, but that isn’t in my plans. I believe my major use case for Vundle will be to trial install plugins, and then BundleClean will clean things up safely.

Sat, 23 Nov 2013 00:00:00 -0800
http://dtor.com/halfire/2013/09/06/mysql_mac_venv_notes.html

MySQL & Python on Mac

I don’t have MySQL installed globally, so I need to do this dance every time I add it to a new virtualenv:

  1. Install the bindings in the virtual env. The package name is MySQL-python.

  2. Symlink libmysqlclient.18.dylib from the /usr/local/mysql/lib tree into site-packages of the virtualenv

  3. Add the following to the virtual env’s activate script::


  4. optionally add /usr/local/mysql/bin to PATH as well.

Fri, 06 Sep 2013 00:00:00 -0700
http://dtor.com/halfire/2013/06/19/inter_repo_actions.html

Inter Repository Operations

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

Mozilla, like most operations, has the Repositories of Record (RoR) set to only allow “fast forward” updates when new code is landed. In order to fast forward merge, the tip of the destination repository (RoR) must be an ancestor of the commit being pushed from the source repository. In the discussion below, it will be useful to say if a repository is “ahead”, “behind”, or “equal” to another. These states are defined as:

  • If the tips of the two repositories are the same reference, then the two repositories are said to be equal (‘e’ in table below)
  • Else if the tip of the upstream repository is an ancestor of the tip of the source repository, the upstream is defined to be behind (‘B’ in table below) the source repository
  • Otherwise, the upstream repository is ahead (‘A’ in table below) of the source repository.
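These three states can be sketched in code. The example assumes history is given as a mapping from each commit to its parents, and it reads “behind” as “the upstream tip is an ancestor of the source tip”; the names `reachable` and `relation` are hypothetical.

```python
def reachable(parents, tip):
    """Return every commit reachable from `tip`, including `tip` itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def relation(parents, upstream_tip, source_tip):
    """Classify upstream relative to source as 'e', 'B', or 'A'."""
    if upstream_tip == source_tip:
        return "e"  # equal
    if upstream_tip in reachable(parents, source_tip):
        return "B"  # upstream is behind: a fast forward push is possible
    return "A"      # upstream is ahead (this also covers diverged history)
```

Note that “ahead” is the catch-all case: any upstream tip that is not part of the source history counts, whether or not the two have diverged.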

When landing a change in the normal case (2 repositories: the RoR and the lander’s repository), the process is logically (assuming no network issues):

  1. Make sure lander’s repository is equivalent to RoR (start with equality)

  2. Apply the changes (RoR is now “Behind” the local repository)

  3. Push the changes to the RoR
    • if the push succeeds, then stop. (equality restored)

    • if the push fails, simultaneous landings were being attempted, and you lost the race.

      When simultaneous landings are attempted, only one will succeed, and the others will need to repeat the landing attempt. The RoR is now “Ahead” of the local repository, and the new upstream changes will need to be incorporated, logically as:

      a. Remove the local changes (“patch -R”, “git stash”, “hg import”, etc.).
      b. Pull the changes from RoR (will apply cleanly, equality restored)
      c. Continue from step 2 above
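The landing loop above can be modelled with a toy, in-memory sketch. It is an illustration of the fast forward rule only: history is a plain list, and the class, function, and `before_push` hook (used to simulate a rival landing) are all hypothetical.

```python
class RepositoryOfRecord:
    """Toy RoR that only accepts fast forward pushes."""

    def __init__(self):
        self.history = []

    def push(self, candidate):
        """Accept `candidate` only if it extends the current history."""
        if candidate[:len(self.history)] != self.history:
            return False  # someone else landed first
        self.history = list(candidate)
        return True


def land(ror, change, before_push=None, max_attempts=5):
    """Land `change`, retrying after lost races; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        local = list(ror.history)  # step 1: start equal to the RoR
        local.append(change)       # step 2: apply the change locally
        if before_push is not None:
            before_push(attempt)   # simulate other landers racing us
        if ror.push(local):        # step 3: push
            return attempt
        # Lost the race: the local copy is discarded and rebuilt from
        # the RoR on the next pass, which covers removing the local
        # change, pulling, and re-applying.
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

A lost race costs one extra pass through the loop, which is the “repeat the landing attempt” described above.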

When an authorized committer wants to land a change set on an hg RoR from git, there are three repositories involved. These are the RoR, the git repository the lander is working in, and the internal hggit repository used for translation. The sections below describe how this affects the normal case above.

Land from git – Happy Path

On the happy path (no commit collisions, no network issues), the steps are identical to the normal path above. The git commands executed by the lander are set by the tool chain to perform any additional operations needed.

Land from git – Commit Collision

Occasionally, multiple people will try to land commits simultaneously, and a commit collision will occur (steps 3a, 3b, & 3c above). As long as the collision is noticed and dealt with before additional changes are committed to the git repository, the tooling will unapply the change to the internal hggit repository.

Land from git – Sad Path

In real life, network connections fail, power outages occur, and other gremlins create the need to deal with “sad paths”. The following sections are only needed when we’re neither on the happy path nor experiencing a normal commit collision.

Because these cases cover every possible case of disaster recovery, they can appear more complex than they are. While there are multiple (6) different sad paths, only one will be in play for a given repository. And the maximum number of operations to recover is only three (3). The relationship between each pair of repositories determines the correct actions to take to restore the repositories to a known, consistent state. The static case is simply:

Simplistic Recovery State Diagram


  1. The simplistic diagram assumes no changes to RoR during the duration of the recovery (not a valid assumption for real life). See the text for information on dealing with the changes.
  2. States “BB” & “BA” are not shown, as they represent invalid states that may require restoring portions of the system from backup before proceeding.

In reality, it is impractical to guarantee the RoR is static during recovery steps. That can be dealt with by applying the process described in the flowchart to restore equality and using the tables below to locate the actions.

The primary goal is to ensure correctness based on the RoR. The secondary goal is to make the interim repository as invisible as possible.

Key  RoR <-> hggit  hggit <-> git  Interpretation                                 Next Step to Equality
Ae   Ahead          equal          someone else landed                            pull from RoR
AA   Ahead          Ahead          someone else landed [1]                        pull from RoR
AB   Ahead          Behind         someone else landed [1]                        back out local changes (3a above)
ee   equal          equal          equal                                          nothing to do
eA   equal          Ahead          someone else landed [2]                        pull to git
eB   equal          Behind         ready to land                                  push from git
Be   Behind         equal          ready to land [2]                              push to RoR
BA   Behind         Ahead          prior landing not finished, lost from git [3]  corrupted setup, see note
BB   Behind         Behind         prior landing not finished, next started [4]   back out local changes (3a above) from 2nd landing

Table Notes

[1] This is the common situation of needing to update (and possibly re-merge local changes) prior to landing the change.
[2] If the automation is working correctly, this is only a transitory stage, and no manual action is needed. IRL, stuff happens, so an explicit recovery path is needed.
[3] This “shouldn’t happen”, as it implies the git repository has been restored from a backup and the “pending landing” in the hggit repository is no longer a part of the git history. If there isn’t a clear understanding of why this occurred, client side repository setup should be considered suspect, and replaced.

[4] The lander shot themselves in the foot: they have 2 incomplete landings in progress. If they are extremely lucky, they can recover by completing the first landing (“hg push RoR” -> “eB”), and proceed from there.

The deterministic approach, which must also be used if landing the first change set fails, is to back out the second landing from hggit and git, then back out the first landing from hggit and git. Then equality can be restored, and each landing redone separately.
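
As a rough illustration, the state table above can be treated as a plain lookup from the two pairwise relationships to the next step. This is only a sketch: the names NEXT_STEP and next_step and the single-letter codes are my own shorthand for this example and are not part of the actual tooling.

```python
from typing import Optional

# Single-letter codes for how the first repository of a pair relates to the
# second one. These codes are shorthand for this sketch only.
AHEAD, EQUAL, BEHIND = "A", "e", "B"

# Key (RoR <-> hggit, then hggit <-> git) -> next step to equality,
# transcribed from the table above. None means "nothing to do".
NEXT_STEP = {
    "Ae": "pull from RoR",
    "AA": "pull from RoR",
    "AB": "back out local changes",
    "ee": None,
    "eA": "pull to git",
    "eB": "push from git",
    "Be": "push to RoR",
    "BA": "corrupted setup, see note",
    "BB": "back out local changes from 2nd landing",
}


def next_step(ror_vs_hggit: str, hggit_vs_git: str) -> Optional[str]:
    """Return the next step for a pair of relationships, or None at equality."""
    key = ror_vs_hggit + hggit_vs_git
    if key not in NEXT_STEP:
        raise ValueError(f"unknown state key: {key!r}")
    return NEXT_STEP[key]
```

For example, next_step(EQUAL, BEHIND) gives "push from git", the ordinary "ready to land" case.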

DVCS Commands
Next Step      Active Repository  Command
pull from RoR  hggit              hg pull
pull to git    git                git pull RoR
push from git  git                git push RoR
push to RoR    hggit              hg push


Note that if any of the above actions fail, it simply means that we’ve lost another race with someone else’s commit. The recovery path is simply to re-evaluate the current state and proceed as indicated (as shown in diagram 1).
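
The "re-evaluate and proceed" rule can be sketched as a small loop. This is a hypothetical outline, not the real tooling: restore_equality, evaluate, run, and max_rounds are names invented for the example, and the two callables stand in for whatever inspects the repositories and executes commands. Only the command strings come from the DVCS Commands table.

```python
from typing import Callable, Optional

# Next step -> (active repository, command), transcribed from the
# DVCS Commands table.
DVCS_COMMANDS = {
    "pull from RoR": ("hggit", "hg pull"),
    "pull to git": ("git", "git pull RoR"),
    "push from git": ("git", "git push RoR"),
    "push to RoR": ("hggit", "hg push"),
}


def restore_equality(
    evaluate: Callable[[], Optional[str]],
    run: Callable[[str, str], bool],
    max_rounds: int = 10,
) -> bool:
    """Repeat "evaluate state, run next step" until equality or we give up.

    evaluate() returns the next step name, or None once all repositories
    are equal. run(repo, command) executes one command; its result is
    ignored because a failure only means another commit won the race,
    and the next round re-evaluates the state anyway.
    """
    for _ in range(max_rounds):
        step = evaluate()
        if step is None:
            return True
        if step not in DVCS_COMMANDS:
            # Back-outs and corrupted setups need a human, not a retry.
            raise RuntimeError(f"manual recovery needed: {step}")
        repo, command = DVCS_COMMANDS[step]
        run(repo, command)
    return evaluate() is None
```

Capping the number of rounds is a choice made for this sketch, so that a persistently busy RoR cannot keep the loop running forever.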

Flowchart to Restore Equality


Wed, 19 Jun 2013 00:00:00 -0700
http://dtor.com/halfire/2013/05/20/2013_Releng_talk.html http://dtor.com/halfire/2013/05/20/2013_Releng_talk.html Using hg & git for the same codebase

Using hg & git for the same codebase

Speaker Notes

Following are the slides I presented at the RELENG 2013 workshop on May 20th, 2013. Paragraphs formatted like this were not part of the presented slides - they are very rough speaker notes.

If you prefer, you may view a PDF version.

Hal Wine hwine@mozilla.com
Release Engineering
Mozilla Corporation

Issues and solutions encountered in maintaining a single code base under active development in both hg & git formats.


  • Mozilla Corporation operates an extensive build farm that is mostly used to build binary products installed by the end user. Mozilla has been using Mercurial repositories for this since converting from CVS in 2007. We currently use a 6 week “Rapid Release” cycle for most products.

    Speaker Notes

    We currently have upwards of 4,000 hosts involved in the continuous integration and testing of Mozilla products. These hosts do approximately 140 hours of work on each commit.

  • Firefox Operating System is a new product that ships source to be incorporated by various partners in the mobile phone industry. These partners, experienced with the Android build process, require source be delivered via git repositories. This is close to a “Continuous Release” process.

    Speaker Notes

    A large part of the FxOS product is code used in the browser products. That is in Mercurial and needs to be converted to git. Most new code modules for FxOS are developed on github, and need to be converted to Mercurial for use in our CI & build systems.


  • What we initially set out to do:
    • Make it purely a developer choice which dvcs to use.

      Speaker Notes

      Ideal was to allow developers to make dvcs as personal a choice as editor.

    • Support multiple social coding sites.

      Speaker Notes

      These social coding sites, such as github and bitbucket, make it much easier for new community members to contribute.

  • That was much tougher than anticipated.
    In theory, git & hg are very close...
    ... In practice, “the devil is in the details”.
  • Where we are:
    • Changed direction to support FFOS release to partners.
    • Quickly mirror Repository of Record (RoR) between git & hg.
    • CI/build system remains Mercurial centric.

Challenge Areas

  • Changesets have different hashes in Mercurial and git.
    • We added tooling to support both in static documents such as manifest files.
    • All tools continue to use hg hash as primary value for indexing and linking.
  • Propagation delays of changesets to the “other” system.

    Speaker Notes

    For most use cases, the approximately 20 minute average we’re achieving is acceptable.

    • Compounded by hash differences between two systems.

      Speaker Notes

      A common use case here is a developer wanting to start a self serve build. If the commit was to git, the self serve build won’t be successful until that commit is converted to hg.

      We are continuing work on this. It is closely tied to determining which commit broke the build, when multiple repositories are involved.

  • Build details
    • Movable tags are not popular in git based workflows, but have been a common technique at Mozilla to mark “latest”.

Challenge Areas (Cont’d)

  • Mixed philosophies are often linked with mixed repositories.
    • Android never wants history to appear to change. Downstream servers allow only fast forward changesets and deny deletions.

    • Mozilla uses “RoR is authoritative”.

      Speaker Notes

      Either approach is self consistent. It is when the two need to interact that challenges arise.

  • Conversion failures
    • Occasional hg-git conversion failures, due to implementation details of hg & git.

      Speaker Notes

      • Dates in export patches (e.g. hg uses seconds, git uses minutes, in time zone offsets)
      • Email validation (git stricter than hg)
    • Since commit already accepted by hg, hg-git must be modified

      Speaker Notes

      This requires in-house resources to respond urgently to patch the conversion machinery. Without conversion, there are no builds.

Alternate Approaches

  • To support your own “use the DVCS you want” infrastructure requires:
    • production quality hg server
    • production quality git server
    • in house ability to address conversion issues (as already mentioned)
  • I’m aware of two commercial alternatives. Both of these use a centralized RoR which supports git and/or hg interfaces for developer interaction.

    Speaker Notes

    And at least one explicitly does not have a git back end.

  • You can leave it to developers to scratch their own itch independently. Given diversity of workflows, this may be more cost effective than obtaining consensus.

Future Research

Areas of particular interest for further study include:

  • What is the set of enforceable assertions which would ensure the tooling can maintain lossless conversion between DVCS?

  • What minimum conditions must be maintained in conversions to preclude downstream conflicts?

  • What workflows can be supported to minimize issues?

  • Are there best practice incident management protocols for addressing problem commits?

    Speaker Notes

    The common example is a commit that contains sensitive material it should not. There are cases where limiting the scope of distribution can have significant business value.

Mon, 20 May 2013 00:00:00 -0700
http://dtor.com/halfire/2012/04/10/the_dev_cycle.html http://dtor.com/halfire/2012/04/10/the_dev_cycle.html The canonical commit/push/land cycle at Mozilla

The canonical commit/push/land cycle at Mozilla

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

Untangling the terminology

In the old days, before DVCS, “commit” had only one real purpose. It was how you published your work to the rest of the world (or your project’s world at least). With DVCS, you are likely committing quite often, but still only occasionally publishing.

Tue, 10 Apr 2012 00:00:00 -0700
http://dtor.com/halfire/2012/03/08/new_commit_workflow.html http://dtor.com/halfire/2012/03/08/new_commit_workflow.html Changes to commit workflow

Changes to commit workflow

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

With all the changes to support git, how will that affect a committer’s workflow? (For developer impact, see this post.)

The primary goal is to work within the existing Mozilla commit policy [1]. Working within that constraint, the idea is “as little as possible”, and this post will try to describe how big “as little” is.

Remember: all existing ways of working with hg will continue to work! These are just going to be some additional options for folks who prefer to use github & bitbucket.

Thu, 08 Mar 2012 00:00:00 -0800
http://dtor.com/halfire/2012/03/07/new_workflows.html http://dtor.com/halfire/2012/03/07/new_workflows.html Changes to Developer workflow

Changes to Developer workflow

[Refer to the main page for additional context.]

With all the changes to support git, how will that affect a developer’s workflow? (The committer’s workflow will be covered in a future post.)

The idea is “not much at all”, and this post will try to define “not much”.

Remember: all existing ways of working with hg will continue to work! These are just going to be some additional options for folks who prefer to use github & bitbucket.

Wed, 07 Mar 2012 00:00:00 -0800
http://dtor.com/halfire/2012/03/02/survey_data_summary.html http://dtor.com/halfire/2012/03/02/survey_data_summary.html DVCS Survey Summary

DVCS Survey Summary


A long time ago (December of 2011), I sent out a brief survey on DVCS usage to Mozilla folks (and asked them to spread it more widely). While there were only 42 responses, there were some interesting patterns.


I am neither a statistician nor a psychometrician.

I believe you can see the raw summary via this link. What follows are the informal inferences I drew from the results (remember the disclaimer).

Commit to git is not the issue:

Fri, 02 Mar 2012 00:00:00 -0800
http://dtor.com/halfire/2012/02/21/wowzy_thats_a_feature.html http://dtor.com/halfire/2012/02/21/wowzy_thats_a_feature.html ... the Wowza feature of git

... the Wowza feature of git


Wowza! I found the killer feature in git - you can have your cake and eat it, too!

Every time I’ve had to move to a new VCS, there’s never been enough time available to move the complete history correctly. Linux had this problem in spades when they moved off BitKeeper onto git in a very short time.

The solution? Take your time to convert the history correctly (or not, you can correct later), then allow developers who want it to prepend it on their machines, without making their repo operate any differently from the latest one.

Read on for more about the replace/graft feature.

Tue, 21 Feb 2012 00:00:00 -0800
http://dtor.com/halfire/2012/02/01/releng_as-is_snapshot.html http://dtor.com/halfire/2012/02/01/releng_as-is_snapshot.html Releng As Is - January 2012

Releng As Is - January 2012

[Refer to the main page for additional context.]

Where we are in January 2012

The purpose of this post is to present a very high level picture of the current Firefox build & release process as a set of requirements. Some of these services are provided or supported by groups outside of releng (particularly IT & webdev). This diagram will be useful in understanding the impact of changes.

Wed, 01 Feb 2012 00:00:00 -0800
http://dtor.com/halfire/2012/01/26/releng_and_git_project.html http://dtor.com/halfire/2012/01/26/releng_and_git_project.html Releng & Git - Project Overview

Releng & Git - Project Overview

This is the first in a series of posts about the “support git in releng” project. The goal of this project, as stated in bug 713782, is:

... The idea here is to see if we can support git in Mozilla’s RelEng infrastructure, to at least the same standard (or better) as we already currently support hg.

My hope is that blog posts will be a better forum for discussion than the tracking bug 713782, or a wiki page, at this stage.

These posts will highlight the various issues, so that the vague definitions above become clear, as do the intermediate steps needed to achieve completion.

Thu, 26 Jan 2012 00:00:00 -0800
http://dtor.com/halfire/2012/01/26/dvcs_end_result.html http://dtor.com/halfire/2012/01/26/dvcs_end_result.html The Ideal Future

The Ideal Future

[Refer to the main page for additional context.]

Based on discussions to date, everyone seems to have similar ideas about what “supporting git for releng” means. Later posts will highlight the work needed to ensure the ideal can be achieved, and how to arrive there.

For this post, I intend to limit the viewpoint and scope to that of the developer impact. Release notions (such as “system of record”) and scaling issues won’t be mentioned here. (N.B. Those concerns will be a key part of the path to verifying feasibility, but do not change the goal.)

As a reminder, I’m just talking about repositories that are used to produce products. [1]

Thu, 26 Jan 2012 00:00:00 -0800
http://dtor.com/halfire/2012/01/22/changed_viewpoint.html http://dtor.com/halfire/2012/01/22/changed_viewpoint.html ... a View from Outside

... a View from Outside


One of the things that excited me about the opportunity to work at Mozilla was the chance to change perspectives. After working in many closed environments, I knew the open source world of Mozilla would be different. And that would lead to a re-examination of basic questions, such as:

Q: Are there any significant differences in the role a VCS plays at Mozilla than at j-random-private-enterprise?

A: At the scale of Mozilla Products [1], I don’t believe there are.

But the question is important to ask! (And I hope to ask more of them.)

Sun, 22 Jan 2012 00:00:00 -0800