I am the blog of Hal Fire, and I bring you… interesting tidbits of release engineering.
http://dtor.com/halfire/

http://dtor.com/halfire/2014/07/02/2014_06_try_server_update.html

2014-06 try server update

Chatting with Aki the other day, I realized that word of all the wonderful improvements to the try server has not been publicized. A lot of folks have done a lot of work to make things better - here’s a brief summary of the good news.

Before:
Try server pushes could appear to take up to 4 hours, during which time others would be locked out.
Now:
The major time taker has been found and eliminated: ancestor processing. And we understand the remaining occasional slowdowns are related to caching. Fortunately, there are some steps that developers can take now to minimize delays.

What folks can do to help

The biggest remaining slowdown is caused by rebuilding the cache. The cache is only invalidated if the push is interrupted. If you can avoid causing a disconnect until your push is complete, that helps everyone! So, please, no Ctrl-C during the push! The other changes should address the long wait times you used to see.

What has been done to infrastructure

There has long been a belief that many of our hg problems, especially on try, came from the fact that we had r/w NFS mounts of the repositories across multiple machines (both hgssh servers & hgweb servers). For various historical reasons, a large part of this was due to the way pushlog was implemented.

Ben did a lot of work to get sqlite off NFS, and much of the work to synchronize the repositories without NFS has been completed.

What has been done to our hooks

All along, folks have been discussing our try server performance issues with the hg developers. A key confusing issue was that we saw processes “hang” for VERY long times (45 min or more) without making a system call. Kendall managed to observe an hg process in such an infinite-looking-loop-that-eventually-terminated a few times. A stack trace would show it was looking up an hg ancestor without making system calls or library accesses. In discussions, this confused the hg team, as they did not know of any reason that ancestor code would be invoked during a push.

Thanks to lots of debugging help from glandium one evening, we found and disabled a local hook that invoked the ancestor function on every commit to try. \o/ team work!

Caching – the remaining problem

With the ancestor-invoking-hook disabled, we still saw some longish periods of time where we couldn’t explain why pushes to try appeared hung. Granted, it was a much shorter time, and always self-corrected, but it was still puzzling.

A number of our old theories, such as “too many heads” were discounted by hg developers as both (a) we didn’t have that many heads, and (b) lots of heads shouldn’t be a significant issue – hg wants to support even more heads than we have on try.

Greg did a wonderful bit of sleuthing to find the impact of ^C during push. Our current belief is once the caching is fixed upstream, we’ll be in a pretty good spot. (Especially with the inclusion of some performance optimizations also possible with the new cache-fixed version.)

What is coming next

To take advantage of all the good stuff upstream Hg versions have, including the bug fixes we want, we’re going to be moving towards removing roadblocks to staying closer to the tip. Historically, we had some issues due to http header sizes and load balancers; ancient python or hg client versions; and similar. The client issues have been addressed, and a proper testing/staging environment is on the horizon.

There are a few competing priorities, so I’m not going to predict a completion date. But I’m positive the future is coming. I hope you have a glimpse into that as well.

Wed, 02 Jul 2014 00:00:00 -0700

http://dtor.com/halfire/2014/06/26/cvs_attic_in_dvcs.html

CVS Attic in DVCS

One handy feature of CVS was the presence of the Attic directory. The primary purpose of the Attic directory was to simplify trunk checkouts, while providing space for both removed and added-only-on-branch files.

As a consequence of this, it was relatively easy to browse all such file names. I often would use this as my “memory” of scripts I had written for specific purposes, but which were no longer needed. Often these would form the basis for a future special purpose script.

This isn’t a very commonly needed use case, but I have found myself being a bit reluctant to delete files using DVCS systems, as I wasn’t quite sure how to find things easily in the future.

Well, I finally scratched the itch – here are the tricks I’ve added to my toolkit.

Hg version

A simplistic version, which just shows when file names were deleted, is to add the alias to ~/.hgrc:

[alias]
attic=log --template '{rev}:{file_dels}\n'

Git version

Very similar for git:

git config --global alias.attic 'log --diff-filter=D --summary'

(Not actually ideal, as not a one liner, but good enough for how often I use this.)
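If the "not a one liner" remark refers to the multi-line output of the git alias, a small filter can flatten it. The following is only a sketch: it assumes the default `git log --summary` layout, where each commit starts with a `commit <sha>` line and each removed file appears as ` delete mode <mode> <path>`. The function name `attic_lines` is made up for this example.

```python
import re

# " delete mode 100644 path/to/file" is how `git log --summary`
# reports a removed file (one leading space, unlike commit messages,
# which are indented by four spaces).
DELETE_LINE = re.compile(r"^ delete mode \d+ (?P<path>.+)$")
COMMIT_LINE = re.compile(r"^commit (?P<sha>[0-9a-f]{7,40})")


def attic_lines(log_text):
    """Return one '<short sha>:<deleted path>' string per deleted file."""
    results = []
    sha = None
    for line in log_text.splitlines():
        commit = COMMIT_LINE.match(line)
        if commit:
            sha = commit.group("sha")[:12]
            continue
        deleted = DELETE_LINE.match(line)
        if deleted and sha:
            results.append("%s:%s" % (sha, deleted.group("path")))
    return results
```

Feeding it the text produced by `git attic` (for instance, captured with `subprocess.run`) gives one line per deleted file, which is closer to what the hg alias prints.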

Thu, 26 Jun 2014 00:00:00 -0700

http://dtor.com/halfire/2014/03/06/bluetooth_finder_for_fitbit.html

Bluetooth Finder for Fitbit

Pro tip - if you have a Fitbit or other small BLE device, go get a “bluetooth finder” app for your smartphone or tablet. NOW. No thanks needed.

I ended up spending far-too-long looking for my misplaced black fitbit One last weekend. Turned out the black fitbit was behind a black sock on a shelf in a dark closet. (Next time, I’ll get a fuchsia colored one – I don’t have too many pairs of fuchsia socks.)

After several trips through the house looking, I thought I’d turn to technology. By seeing where in the house I could still sync with my phone, I could confirm it was in the house. I tried setting alarms on the fitbit, but I couldn’t hear them go off. (Likely, the vibrations were completely muffled by the sock. Socks - I should just get rid of them.)

Then I had the bright idea of asking the interwebs for help. Surely, I couldn’t be the first person in this predicament. I was rewarded with this FAQ on the fitbit site, but I’d already followed those suggestions.

Finally, I just searched for “finding bluetooth”, and discovered the tv ads were right: there is an app for that! Since I was on my android tablet at the time, I ended up with Bluetooth Finder, and found my Fitbit within 5 minutes. (I also found a similar app for my iPhone, but I don’t find it as easy to use. Displaying the signal strength on a meter is more natural for me than watching dB numbers.)

Thu, 06 Mar 2014 00:00:00 -0800

http://dtor.com/halfire/2013/11/23/vim_fun_with_vundle.html

More VIM fun: Vundle

During this last RelEng workweek, I thought I’d try a new VIM plugin for reST: RIV. While that didn’t work out great (yet), it did get me to start using Vundle. Vundle is a quite nice vim plugin manager, and is easier for me to understand than Pathogen.

However, the Vundle docs didn’t cover two cases I care about:

  • converting Pathogen modules for Vundle usage
  • using with bundles not managed by either Pathogen or Vundle. (While Vundle running won’t interfere with unmanaged bundles, the :BundleClean command will claim they are unused and offer to delete them. That’s just too risky for me.)

The two cases appear to have the same solution:

  • ensure all directories in the bundle location (typically ~/.vim/bundle/) are managed by Vundle.
  • use a file:// URI for any bundle you don’t want Vundle to update.

For example, I installed the ctrlp bundle a while back, from the Bitbucket (hg) repository. (Yes, there (now?) is a github repository, but why spoil my fun.) Since the hg checkout already lived in ~/.vim/bundle, I only needed to add the following line to my vimrc file:

Bundle 'file:///~/.vim/bundle/ctrlp.vim/'

Vundle no longer offers to delete that repository when BundleClean is run.

I suspect I’ll get errors if I ever asked Vundle to update that repo, but that isn’t in my plans. I believe my major use case for Vundle will be to trial install plugins, and then BundleClean will clean things up safely.

Sat, 23 Nov 2013 00:00:00 -0800

http://dtor.com/halfire/2013/09/06/mysql_mac_venv_notes.html

MySQL & Python on Mac

I don’t have MySQL installed globally, so I need to do this dance every time I add it to a new virtualenv:

  1. Install the bindings in the virtual env. The package name is MySQL-python.

  2. Symlink libmysqlclient.18.dylib from the /usr/local/mysql/lib tree into site-packages of the virtualenv

  3. Add the following to the virtual env’s activate script:

    export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/path/to/venv/site-packages

  4. optionally add /usr/local/mysql/bin to PATH as well.
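Steps 2 and 3 are easy to script. This is a rough sketch, not a tested installer: the helper names are invented here, and the site-packages path you pass in is whatever your virtualenv actually uses.

```python
import os


def link_mysql_client(site_packages,
                      mysql_lib_dir="/usr/local/mysql/lib",
                      dylib="libmysqlclient.18.dylib"):
    """Step 2: symlink the client library into the venv's site-packages."""
    src = os.path.join(mysql_lib_dir, dylib)
    dst = os.path.join(site_packages, dylib)
    if not os.path.lexists(dst):  # leave an existing link or file alone
        os.symlink(src, dst)
    return dst


def activate_snippet(site_packages):
    """Step 3: the line to append to the venv's activate script."""
    return 'export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:%s"\n' % site_packages
```

Calling `link_mysql_client` a second time is harmless, since an existing link is left in place.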

Fri, 06 Sep 2013 00:00:00 -0700

http://dtor.com/halfire/2013/06/19/inter_repo_actions.html

Inter Repository Operations

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

Mozilla, like most operations, has the Repositories of Record (RoR) set to only allow “fast forward” updates when new code is landed. In order to fast forward merge, the tip of the destination repository (RoR) must be an ancestor of the commit being pushed from the source repository. In the discussion below, it will be useful to say if a repository is “ahead”, “behind”, or “equal” to another. These states are defined as:

  • If the tips of the two repositories are the same reference, then the two repositories are said to be equal (‘e‘ in table below)
  • Else if the tip of the upstream repository is an ancestor of the tip of the destination repository, the upstream is defined to be behind (‘B‘ in table below) the source repository
  • Otherwise, the upstream repository is ahead (‘A‘ in table below) of the source repository.
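The three definitions amount to a short decision function. In this sketch, `ancestors_of` is a hypothetical helper (not a real hg or git API) that returns the set of commits reachable from a tip, excluding the tip itself.

```python
def relation(upstream_tip, other_tip, ancestors_of):
    """Classify the upstream repository as 'e', 'B', or 'A' relative to the other."""
    if upstream_tip == other_tip:
        return "e"   # same reference: equal
    if upstream_tip in ancestors_of(other_tip):
        return "B"   # upstream tip is an ancestor of the other tip: Behind
    return "A"       # everything else counts as Ahead
```

Note that, following the definitions as written, "Ahead" is the catch-all case: it also covers two repositories whose tips have diverged.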

In the normal two-repository case (RoR and lander’s repository), landing a change is logically (assuming no network issues):

  1. Make sure lander’s repository is equivalent to RoR (start with equality)

  2. Apply the changes (RoR is now “Behind” the local repository)

  3. Push the changes to the RoR
    • if the push succeeds, then stop. (equality restored)

    • if the push fails, simultaneous landings were being attempted, and you lost the race.

      When simultaneous landings are attempted, only one will succeed, and the others will need to repeat the landing attempt. The RoR is now “Ahead” of the local repository, and the new upstream changes will need to be incorporated, logically as:

      a. Remove the local changes (“patch -R”, “git stash”, “hg import”, etc.).
      b. Pull the changes from RoR (will apply cleanly, equality restored)
      c. Continue from step 2 above
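The landing cycle above can be simulated in a few lines. This is a toy model only: repositories are plain lists of commit names, the `Repo` class and `land` function are invented for illustration, and `before_push` exists purely so a competing landing can be injected.

```python
class Repo:
    """Toy repository: history is an ordered list of commit names."""

    def __init__(self, commits=()):
        self.commits = list(commits)

    def is_fast_forward_from(self, other):
        # True when other's history is a prefix of ours, i.e. other's
        # tip is an ancestor of (or equal to) our tip.
        return self.commits[:len(other.commits)] == other.commits


def land(local, ror, change, before_push=None, max_attempts=5):
    """Land `change` on `ror`, retrying after lost races; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        local.commits = list(ror.commits)    # 1. start from equality
        local.commits.append(change)         # 2. apply; RoR is now Behind
        if before_push:
            before_push(attempt)             # hook: someone else may land here
        if local.is_fast_forward_from(ror):  # 3. push must be a fast forward
            ror.commits = list(local.commits)
            return attempt
        # Lost the race: the next pass drops the local change,
        # re-syncs with RoR, and reapplies the change.
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

With one competing landing injected on the first attempt, the second attempt succeeds and the RoR ends up holding both changes, the rival's first.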

When an authorized committer wants to land a change set on an hg RoR from git, there are three repositories involved. These are the RoR, the git repository the lander is working in, and the internal hggit repository used for translation. The sections below describe how this affects the normal case above.

Land from git – Happy Path

On the happy path (no commit collisions, no network issues), the steps are identical to the normal path above. The git commands executed by the lander are set by the tool chain to perform any additional operations needed.

Land from git – Commit Collision

Occasionally, multiple people will try to land commits simultaneously, and a commit collision will occur (steps 3a, 3b, & 3c above). As long as the collision is noticed and dealt with before additional changes are committed to the git repository, the tooling will unapply the change to the internal hggit repository.

Land from git – Sad Path

In real life, network connections fail, power outages occur, and other gremlins create the need to deal with “sad paths”. The following sections are only needed when we’re neither on the happy path nor experiencing a normal commit collision.

Because these cases cover every possible case of disaster recovery, it can appear more complex than it is. While there are multiple (6) different sad paths, only one will be in play for a given repository. And the maximum number of operations to recover is only three (3). The relationship between each pair of repositories determines the correct actions to take to restore the repositories to a known, consistent state. The static case is simply:

[Figure: Simplistic Recovery State Diagram]

Note

  1. The simplistic diagram assumes no changes to RoR during the duration of the recovery (not a valid assumption for real life). See the text for information on dealing with the changes.
  2. States “BB” & “BA” are not shown, as they represent invalid states that may require restoring portions of the system from backup before proceeding.

In reality, it is impractical to guarantee the RoR is static during recovery steps. That can be dealt with by applying the process described in the flowchart to restore equality and using the tables below to locate the actions.

The primary goal is to ensure correctness based on the RoR. The secondary goal is to make the interim repository as invisible as possible.

Key  RoR <-> hggit  hggit <-> git  Interpretation                                 Next Step to Equality
Ae   Ahead          equal          someone else landed                            pull from RoR
AA   Ahead          Ahead          someone else landed [1]                        pull from RoR
AB   Ahead          Behind         someone else landed [1]                        back out local changes (3a above)
ee   equal          equal          equal                                          nothing to do
eA   equal          Ahead          someone else landed [2]                        pull to git
eB   equal          Behind         ready to land                                  push from git
Be   Behind         equal          ready to land [2]                              push to RoR
BA   Behind         Ahead          prior landing not finished, lost from git [3]  corrupted setup, see note
BB   Behind         Behind         prior landing not finished, next started [4]   back out local changes (3a above) from 2nd landing

Table Notes

[1] This is the common situation of needing to update (and possibly re-merge local changes) prior to landing the change.
[2] If the automation is working correctly, this is only a transitory stage, and no manual action is needed. IRL, stuff happens, so an explicit recovery path is needed.
[3] This “shouldn’t happen”, as it implies the git repository has been restored from a backup and the “pending landing” in the hggit repository is no longer a part of the git history. If there isn’t a clear understanding of why this occurred, client side repository setup should be considered suspect, and replaced.
[4] Lander shot themselves in the foot - they have 2 incomplete landings in progress. If they are extremely lucky, they can recover by completing the first landing (“hg push RoR” -> “eB”), and proceed from there.

    The deterministic approach, which also must be used if landing of the first change set fails, is to back out the second landing from hggit and git, then back out the first landing from hggit and git. Then equality can be restored, and each landing redone separately.

DVCS Commands

Next Step      Active Repository  Command
pull from RoR  hggit              hg pull
pull to git    git                git pull RoR
push from git  git                git push RoR
push to RoR    hggit              hg push

Note that if any of the above actions fail, it simply means that we’ve lost another race condition with someone else’s commit. The recovery path is simply to re-evaluate the current state and proceed as indicated (as shown in diagram 1).
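The two tables can be combined into a simple lookup. This is a sketch built only from the table contents; the dictionary and function names are invented, and steps with no entry in the DVCS Commands table are returned with `None` to signal that they need manual handling.

```python
# Key = (RoR <-> hggit state) + (hggit <-> git state), as in the table.
NEXT_STEP = {
    "Ae": "pull from RoR",
    "AA": "pull from RoR",
    "AB": "back out local changes",
    "ee": "nothing to do",
    "eA": "pull to git",
    "eB": "push from git",
    "Be": "push to RoR",
    "BA": "corrupted setup, see note",
    "BB": "back out local changes from 2nd landing",
}

# Next step -> (active repository, command), from the DVCS Commands table.
COMMAND = {
    "pull from RoR": ("hggit", "hg pull"),
    "pull to git": ("git", "git pull RoR"),
    "push from git": ("git", "git push RoR"),
    "push to RoR": ("hggit", "hg push"),
}


def next_action(ror_vs_hggit, hggit_vs_git):
    """Return (key, next step, (repository, command) or None)."""
    key = ror_vs_hggit + hggit_vs_git
    step = NEXT_STEP[key]
    return key, step, COMMAND.get(step)
```

Since any action can lose a race, the intended use is a loop: evaluate the two states, take the one action returned, then re-evaluate until the key is "ee".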

Flowchart to Restore Equality

[Figure: Flowchart to Restore Equality]

Wed, 19 Jun 2013 00:00:00 -0700

http://dtor.com/halfire/2013/05/20/2013_Releng_talk.html

Using hg & git for the same codebase

Speaker Notes

Following are the slides I presented at the RELENG 2013 workshop on May 20th, 2013. Paragraphs formatted like this were not part of the presented slides - they are very rough speaker notes.

If you prefer, you may view a PDF version.

Hal Wine hwine@mozilla.com
Release Engineering
Mozilla Corporation

Issues and solutions encountered in maintaining a single code base under active development in both hg & git formats.

Background

  • Mozilla Corporation operates an extensive build farm that is mostly used to build binary products installed by the end user. Mozilla has been using Mercurial repositories for this since converting from CVS in 2007. We currently use a 6 week “Rapid Release” cycle for most products.

    Speaker Notes

    We currently have upwards of 4,000 hosts involved in the continuous integration and testing of Mozilla products. These hosts do approximately 140 hours of work on each commit.

  • Firefox Operating System is a new product that ships source to be incorporated by various partners in the mobile phone industry. These partners, experienced with the Android build process, require source be delivered via git repositories. This is close to a “Continuous Release” process.

    Speaker Notes

    A large part of the FxOS product is code used in the browser products. That is in Mercurial and needs to be converted to git. Most new code modules for FxOS are developed on github, and need to be converted to Mercurial for use in our CI & build systems.

Summary

  • What we initially set out to do:
    • Make it purely a developer choice which dvcs to use.

      Speaker Notes

      Ideal was to allow developers to make dvcs as personal a choice as editor.

    • Support multiple social coding sites.

      Speaker Notes

      These social coding sites, such as github and bitbucket, make it much easier for new community members to contribute.

  • That was much tougher than anticipated.
    In theory, git & hg are very close...
    ... In practice, “the devil is in the details”.
  • Where we are:
    • Changed direction to support FFOS release to partners.
    • Quickly mirror Repository of Record (RoR) between git & hg.
    • CI/build system remains Mercurial centric.

Challenge Areas

  • Changesets have different hashes in Mercurial and git.
    • We added tooling to support both in static documents such as manifest files.
    • All tools continue to use hg hash as primary value for indexing and linking.
  • Propagation delays of changesets to the “other” system.

    Speaker Notes

    For most use cases, the approximately 20 minute average we’re achieving is acceptable.

    • Compounded by hash differences between two systems.

      Speaker Notes

      A common use case here is a developer wanting to start a self serve build. If the commit was to git, the self serve build won’t be successful until that commit is converted to hg.

      We are continuing work on this. It is closely tied to determining which commit broke the build, when multiple repositories are involved.

  • Build details
    • Movable tags are not popular in git based workflows, but have been a common technique at Mozilla to mark “latest”.

Challenge Areas (Cont’d)

  • Mixed philosophies are often linked with mixed repositories.
    • Android never wants history to appear to change. Downstream servers allow only fast forward changesets and deny deletions.

    • Mozilla uses “RoR is authoritative”.

      Speaker Notes

      Either approach is self consistent. It is when the two need to interact that challenges arise.

  • Conversion failures
    • Occasional hg-git conversion failures, due to implementation details of hg & git.

      Speaker Notes

      • Dates in export patches (e.g. hg uses seconds, git uses minutes, in time zone offsets)
      • Email validation (git stricter than hg)
    • Since commit already accepted by hg, hg-git must be modified

      Speaker Notes

      This requires inhouse resources to respond urgently to patch the conversion machinery. Without conversion, there are no builds.
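To make the date mismatch concrete, here is a sketch under a stated assumption: that the note refers to time zone offsets, which Mercurial records in seconds west of UTC and git writes as ±HHMM. The function name is invented, and this is not the hg-git implementation.

```python
def hg_offset_to_git(offset_seconds):
    """Convert an hg offset (seconds west of UTC) to git's +HHMM / -HHMM form.

    Raises ValueError when the offset is not a whole number of minutes,
    which is the kind of value git's format cannot carry.
    """
    if offset_seconds % 60:
        raise ValueError("offset %d is not a whole number of minutes"
                         % offset_seconds)
    east = -offset_seconds               # git counts east of UTC as positive
    sign = "+" if east >= 0 else "-"
    hours, minutes = divmod(abs(east) // 60, 60)
    return "%s%02d%02d" % (sign, hours, minutes)
```

An offset such as 25230 seconds is legal on the hg side but has no exact ±HHMM form, so a converter has to either reject it or round it.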

Alternate Approaches

  • To support your own “use the DVCS you want” infrastructure requires:
    • production quality hg server
    • production quality git server
    • in house ability to address conversion issues (as already mentioned)
  • I’m aware of two commercial alternatives. Both of these use a centralized RoR which supports git and/or hg interfaces for developer interaction.

    Speaker Notes

    And at least one explicitly does not have a git back end.

  • You can leave it to developers to scratch their own itch independently. Given diversity of workflows, this may be more cost effective than obtaining consensus.

Future Research

Areas of particular interest for further study include:

  • What is the set of enforceable assertions which would ensure the tooling can maintain lossless conversion between DVCS?

  • What minimum conditions must be maintained in conversions to preclude downstream conflicts?

  • What workflows can be supported to minimize issues?

  • Are there best practice incident management protocols for addressing problem commits?

    Speaker Notes

    The common example is a commit that contains sensitive material it should not. There are cases where limiting the scope of distribution can have significant business value.

Mon, 20 May 2013 00:00:00 -0700

http://dtor.com/halfire/2012/04/10/the_dev_cycle.html

The canonical commit/push/land cycle at Mozilla

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

Untangling the terminology

In the old days, before DVCS, “commit” had only one real purpose. It was how you published your work to the rest of the world (or your project’s world at least). With DVCS, you are likely committing quite often, but still only occasionally publishing.

Read more...
Tue, 10 Apr 2012 00:00:00 -0700

http://dtor.com/halfire/2012/03/08/new_commit_workflow.html

Changes to commit workflow

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

With all the changes to support git, how will that affect a committer’s workflow? (For developer impact, see this post.)

The primary goal is to work within the existing Mozilla commit policy [1]. Working within that constraint, the idea is “as little as possible”, and this post will try to describe how big “as little” is.

Remember: all existing ways of working with hg will continue to work! These are just going to be some additional options for folks who prefer to use github & bitbucket.

Read more...
Thu, 08 Mar 2012 00:00:00 -0800

http://dtor.com/halfire/2012/03/07/new_workflows.html

Changes to Developer workflow

[Refer to the main page for additional context.]

With all the changes to support git, how will that affect a developer’s workflow? (The committer’s workflow will be covered in a future post.)

The idea is “not much at all”, and this post will try to define “not much”.

Remember: all existing ways of working with hg will continue to work! These are just going to be some additional options for folks who prefer to use github & bitbucket.

Read more...
Wed, 07 Mar 2012 00:00:00 -0800

http://dtor.com/halfire/2012/03/02/survey_data_summary.html

DVCS Survey Summary

Summary

A long time ago (December of 2011), I sent out a brief survey on DVCS usage to Mozilla folks (and asked them to spread wider). While there were only 42 responses, there were some interesting patterns.

Disclaimer!

I am neither a statistician nor a psychometrician.

I believe you can see the raw summary via this link. What follows are the informal inferences I drew from the results (remember the disclaimer).

Commit to git is not the issue:

Read more...
Fri, 02 Mar 2012 00:00:00 -0800

http://dtor.com/halfire/2012/02/21/wowzy_thats_a_feature.html

... the Wowza feature of git

tl;dr

Wowza! I found the killer feature in git - you can have your cake and eat it, too!

Every time I’ve had to move to a new VCS, there’s never been enough time available to move the complete history correctly. Linux had this problem in spades when they moved off BitKeeper onto git in a very short time.

The solution? Take your time to convert the history correctly (or not, you can correct later), then allow developers who want it to prepend it on their machines, without making their repo operate any differently from the latest one.

Read on for more about the replace/graft feature.

Read more...
Tue, 21 Feb 2012 00:00:00 -0800

http://dtor.com/halfire/2012/02/01/releng_as-is_snapshot.html

Releng As Is - January 2012

[Refer to the main page for additional context.]

Where we are in January 2012

The purpose of this post is to present a very high level picture of the current Firefox build & release process as a set of requirements. Some of these services are provided or supported by groups outside of releng (particularly IT & webdev). This diagram will be useful in understanding the impact of changes.

Read more...
Wed, 01 Feb 2012 00:00:00 -0800

http://dtor.com/halfire/2012/01/26/releng_and_git_project.html

Releng & Git - Project Overview

This is the first in a series of posts about the “support git in releng” project. The goal of this project, as stated in bug 713782, is:

... The idea here is to see if we can support git in Mozilla’s RelEng infrastructure, to at least the same standard (or better) as we already currently support hg.

My hope is that blog posts will be a better forum for discussion than the tracking bug 713782, or a wiki page, at this stage.

These posts will highlight the various issues, so that the vague definitions above become clear, as do the intermediate steps needed to achieve completion.

Read more...
Thu, 26 Jan 2012 00:00:00 -0800

http://dtor.com/halfire/2012/01/26/dvcs_end_result.html

The Ideal Future

[Refer to the main page for additional context.]

Based on discussions to date, everyone seems to have similar ideas about what “supporting git for releng” means. Later posts will highlight the work needed to ensure the ideal can be achieved, and how to arrive there.

For this post, I intend to limit the viewpoint and scope to that of the developer impact. Release notions (such as “system of record”) and scaling issues won’t be mentioned here. (N.B. Those concerns will be a key part of the path to verifying feasibility, but do not change the goal.)

As a reminder, I’m just talking about repositories that are used to produce products. [1]

Read more...
Thu, 26 Jan 2012 00:00:00 -0800

http://dtor.com/halfire/2012/01/22/changed_viewpoint.html

... a View from Outside

tl;dr

One of the things that excited me about the opportunity to work at Mozilla was the chance to change perspectives. After working in many closed environments, I knew the open source world of Mozilla would be different. And that would lead to a re-examination of basic questions, such as:

Q: Are there any significant differences in the role a VCS plays at Mozilla than at j-random-private-enterprise?

A: At the scale of Mozilla Products [1], I don’t believe there are.

But the question is important to ask! (And I hope to ask more of them.)

Read more...
Sun, 22 Jan 2012 00:00:00 -0800