I am the blog of Hal Fire, and I bring you…

TIL: fixing folder access from bash

Thu, 29 Jun 2023 00:00:00 -0700

TIL: fixing folder access from bash

I had a few scary moments the other day – I went to ls ~/Desktop and received a permission denied error!!!!

Ultimately, this wasn’t a “real” problem, just that my terminal program had lost permission to access that directory. But before I figured that out, I went down the rabbit hole of weird Apple file system attributes. (I always forget that the command is xattr and not chattr.)

Things got really confusing when I noticed that the com.apple.macl permission on ~/Desktop had a value of “-1”! And that wasn’t changeable from the terminal! Not even via sudo. Shades of selinux!

Eventually, the rabbit warren led me to a blog post, which quoted a hacker news article about a side effect of pasting a Finder link into a terminal window. Ta da! Copy/paste and the problem was solved. (Of course, I have no idea how the corruption happened, which is a different issue.)

Backing up WSL

Sat, 07 Mar 2020 00:00:00 -0800

Backing up WSL

I love using WSL – most of my daily work is done there. Almost all the rest done with cloud based tools, so the only thing I need to backup is WSL.

The problem is, my company’s backup software of choice will only handle “real” windows files. It gets quite unhappy if you ask it to backup the WSL virtual drive.

My solution: bup. While not the “latest hotness”, it was trivial to install and run. I ended up writing a wrapper script to add a “--backup” option, and default my destination.

My approach:

Install bup
Designate a Windows directory as a destination. I chose “%HOMEPATH%\bup”
Write a wrapper script, to avoid having to remember the bup options and command sequence. The important parts are:

#!/usr/bin/env bash
# wrap bup with my default location
# support my default usage

# use the WSL location of the Windows directory
export BUP_DIR="${BUP_DIR:-/c/Users/hwine/bup}"
real_bup=$(type -ap bup | tail -n +2 | head -1)

if $do_backup; then
    time "${real_bup}" index "${HOME}"
    time "${real_bup}" save -n "${HOME##*/}" "${HOME}"
else
    "${real_bup}" "$@"
fi

Note that the “bup index” operation is the long pole on any backup. After a typical day’s work, the index takes about 5 minutes, and the actual backup is less than 10 seconds.

Installing Git pre-commit globally.

Sat, 28 Dec 2019 00:00:00 -0800

Installing Git pre-commit globally.

While there are a number of instructions about installing pre-commit globally on the web, I didn’t find one with all the extras to convince my colleagues. This is that:

Windows Insider Fastring update tips

Sat, 28 Dec 2019 00:00:00 -0800

Windows Insider Fastring update tips

Recently, I’ve been on the Windows Insider fast ring, where updates come twice a week. At times, updates would not succeed on first try, but would on second. This gradually got worse and worse, and I found a workaround.

WSL Tips

Thu, 25 Jul 2019 00:00:00 -0700

WSL Tips

Starting a new tag for various WSL (Windows Subsystem for Linux). These will likely get less relevant over time. (I am _so_ looking forward to the 2019 Fall update.)

These tips are what got me started, and consists of both WSL specific practices, but also maintaining a similar approach for working in other operating systems (Windows and MacOS being the top.

Using Auto Increment Fields to Your Advantage

Fri, 21 Oct 2016 00:00:00 -0700

Using Auto Increment Fields to Your Advantage

I just found, and read, Clément Delafargue’s post “Why Auto Increment Is A Terrible Idea” (via @CoreRamiro). I agree that an opaque primary key is very nice and clean from an information architecture viewpoint.

However, in practice, a serial (or monotonically increasing) key can be handy to have around. I was reminded of this during a recent situation where we (app developers & ops) needed to be highly confident that a replica was consistent before performing a failover. (None of us had access to the back end to see what the DB thought the replication lag was.)

Sphinx tip: mailhtml

Mon, 29 Aug 2016 00:00:00 -0700

Sphinx tip: mailhtml

I often find that I want to email around a doc I’ve put together with sphinx (I often use the *diag or graphviz extensions). Sadly, the world hasn’t embraced the obvious way of supporting this via ePub [1] readers everwhere. What I want is plain html output, with nothing fancy. There’s probably a style out there, but I just add the following target to the Makefile generated by sphinx-quickstart:

mailhtml:
        $(SPHINXBUILD) -b singlehtml -D html_theme=epub $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
        @echo
        @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

[1]	My favorite reader is a the Firefox extension “EPUBReader”

Py Bay 2016 - a First Report

Mon, 22 Aug 2016 00:00:00 -0700

Py Bay 2016 - a First Report

PyBay held their first local Python conference this last weekend (Friday, August 19 through Sunday, August 21). What a great event! I just wanted to get down some first impressions - I hope to do more after the slides and videos are up.

Legacy vcs-sync is dead! Long live vcs-sync!

Mon, 18 Jul 2016 00:00:00 -0700

Legacy vcs-sync is dead! Long live vcs-sync!

tl;dr: No need to panic - modern vcs-sync will continue to support the gecko-dev & gecko-projects repositories.

Today’s the day to celebrate! No more bash scripts running in screen sessions providing dvcs conversion experiences. Woot!!!

I’ll do a historical retrospective in a bit. Right now, it’s time to PARTY!!!!!

End of an Experiment

Tue, 12 Jul 2016 00:00:00 -0700

End of an Experiment

tl;dr: We’ll be shutting down the Firefox mirrors on Bitbucket.

A long time ago we started an experiment to see if there was any support for developing Mozilla products on social coding sites. Well, the community-at-large has spoken, with the results many predicted:

YES!!! when the social coding site is GitHub

No, when the social coding site is Bitbucket

Enterprise Software Writers R US

Sat, 23 Apr 2016 00:00:00 -0700

Enterprise Software Writers R US

Someone just accused me of writing Enterprise Software!!!!!

Well, the “someone” is Mahmoud Hashemi from PayPal, and I heard him on the Talk Python To Me podcast (episode 54). That whole episode is quite interesting - go listen to it.

pyenv & virtualenv can get along

Sat, 23 Apr 2016 00:00:00 -0700

pyenv & virtualenv can get along

In my (apparently) continuing list of tiny hassles with PyEnv, I finally figured out how to “fix” the PyEnv notion of a virtualenv. This may apply only to my setup: my main python version is managed by homebrew.

ipython & venvs

Tue, 19 Apr 2016 00:00:00 -0700

ipython & venvs

As of IPython 4, the procedure for generating kernels in venvs has changed a bit. After some research, the following works for me:

. path/to/venv/bin/activate # or whatever
pip install ipykernel
python -m ipykernel install --user \
    --name myenv --display-name "Python (myenv)"

If you’re running the jupyter notebook, do a full page reload to get the new kernel name displayed in the menu.

Tuning Legacy vcs-sync for 2x profit!

Thu, 03 Dec 2015 00:00:00 -0800

Tuning Legacy vcs-sync for 2x profit!

One of the challenges of maintaining a legacy system is deciding how much effort should be invested in improvements. Since modern vcs-sync is “right around the corner”, I have been avoiding looking at improvements to legacy (which is still the production version for all build farm use cases).

While adding another gaia branch, I noticed that the conversion path for active branches was both highly variable and frustratingly long. It usually took 40 minutes for a commit to an active branch to trigger a build farm build. And worse, that time could easily be 60 minutes if the stars didn’t align properly. (Actually, that’s the conversion time for git -> hg. There’s an additional 5-7 minutes, worst case, for b2g_bumper to generate the trigger.)

The full details are in bug 1226805, but a simple rearrangement of the jobs removed the 50% variability in the times and cut the average time by 50% as well. That’s a savings of 20-40 minutes per gaia push!

Moral: don’t take your eye off the legacy systems – there still can be some gold waiting to be found!

Complexity & * Practices

Fri, 13 Nov 2015 00:00:00 -0800

Complexity & * Practices

I was fortunate enough to be able to attend Dev Ops Days Silicon Valley this year. One of the main talks was given by Jason Hand, and he made some great points. I wanted to highlight two of them in this post:

Post Mortems are really learning events, so you should hold them when things go right, right? RIGHT!! (Seriously, why wouldn’t you want to spot your best ideas and repeat them?)

Systems are hard – if you’re pushing the envelope, you’re teetering on the line between complexity and chaos. And we’re all pushing the envelope these days - either by getting fancy or getting lean.

Post Mortems as Learning Events

Our industry has talked a lot about “Blameless Post Mortems”, and techniques for holding them. Well, we can call them “blameless” all we want, but if we only hold them when things go wrong, folks will get the message loud and clear.

If they are truly blameless learning events, then you would also hold them when things go right. And go meh. Radical idea? Not really - why else would sports teams study game films when they win? (This point was also made in a great Ignite by Katie Rose: GridIronOps - go read her slides.)

My $0.02 is - this would also give us a chance to celebrate success. That is something we do not do enough, and we all know the dedication and hard work it takes to not have things go sideways.

And, by the way, terminology matters during the learning event. The person who is accountable for an operation is just that: capable of giving an account of the operation. Accountability is not responsibility.

Terminology and Systems – Setting the right expectations

Part way through Jason’s talk, he has this awesome slide about how system complexity relates to monitoring which relates to problem resolution. Go look at slide 19 - here’s some of what I find amazing in that slide:

It is not a straight line with a destination. Your most stable system can suddenly display inexplicable behavior due to any number of environmental reasons. And you’re back in the chaotic world with all that implies.

Systems can progress out of chaos, but that is an uphill battle. Knowing which stage a system is in (roughly) informs the approach to problem resolution.

Note the wording choices: “known” vs “unknowable” – for all but the “obvious” case, it will be confusing. That is a property of the system, not a matter of staff competency.

While not in his slide, Jason spoke to how each level really has different expectations. Or should have, but often the appropriate expectation is not set. Here’s how he related each level to industry terms.

Best Practices:

The only level with enough certainty to be able to expect the “best” is the known and familiar one. This is the “obvious” one, because we’ve all done exactly this before over a long enough time period to fully characterize the system, its boundaries, and abnormal behavior.

Here, cause and effect are tightly linked. Automation (in real time) is possible.

Good Practices:

Once we back away from such certainty, it is only realistic to have less certainty in our responses. With the increased uncertainty, the linkage of cause and effect is more tenuous.

Even if we have all the event history and logs in front of us, more analysis is needed before appropriate corrective action can be determined. Even with automation, there is a latency to the response.

Emergent Practices:

Okay, now we are pushing the envelope. The system is complex, and we are still learning. We may not have all the data at hand, and may need to poke the system to see what parts are stuck.

Cause and effect should be related, but how will not be visible until afterwards. There is much to learn.

Novel Practices:

For chaotic systems, everything is new. A lot is truly unknowable because that situation has never occurred before. Many parts of the system are effectively black boxes. Thus resolution will often be a process of trying something, waiting to see the results, and responding to the new conditions.

Next Steps

There is so much more in that diagram I want to explore. The connecting of problem resolution behavior to complexity level feels very powerful.

My experience tells me that many of these subjective terms are highly context sensitive, and in no way absolute. Problem resolution at 0300 local with a bad case of the flu just has a way of making “obvious” systems appear quite complex or even chaotic.

By observing the behavior of someone trying to resolve a problem, you may be able to get a sense of how that person views that system at that time. If that isn’t the consensus view, then there is a gap. And gaps can be bridged with training or documentation or experience.

duo MFA & viscosity no-cell setup

Fri, 02 Oct 2015 00:00:00 -0700

duo MFA & viscosity no-cell setup

The Duo application is nice if you have a supported mobile device, and it’s usable even when you you have no cell connection via TOTP. However, getting Viscosity to allow both choices took some work for me.

For various reasons, I don’t want to always use the Duo application, so would like for Viscosity to alway prompt for password. (I had already saved a password - a fresh install likely would not have that issue.) That took a bit of work, and some web searches.

Disable any saved passwords for Viscosity. On a Mac, this means opening up “Keychain Access” application, searching for “Viscosity” and deleting any associated entries.
Ask Viscosity to save the “user name” field (optional). I really don’t need this, as my setup uses a certificate to identify me. So it doesn’t matter what I type in the field. But, I like hints, so I told Viscosity to save just the user name field:
```
defaults write com.viscosityvpn.Viscosity RememberUsername -bool true
```

With the above, you’ll be prompted every time. You have to put “something” in the user name field, so I chose to put “push or TOTP” to remind me of the valid values. You can put anything there, just do not check the “Remember details in my Keychain” toggle.

Using Password Store

Tue, 22 Sep 2015 00:00:00 -0700

Using Password Store

Password Store (aka “pass”) is a very handy wrapper for dealing with pgp encrypted secrets. It greatly simplifies securely working with multiple secrets. This is still true even if you happen to keep your encrypted secrets in non-password-store managed repositories, although that setup isn’t covered in the docs. I’ll show my setup here. (See the Password Store page for usage: “pass show -c ” & “pass search ” are among my favorites.)

Short version:

Have gpg installed on your machine.
Install Password Store on your machine. There are OS specific instructions. Be sure to enable tab completion for your shell!
Setup a local password store. Scroll down in the usage section to “Setting it up” for instructions.
Clone your secrets repositories to your normal location. Do not clone inside of ~/.password-store/.
Set up symlinks inside of ~/.password-store/ to directories inside your clone of the secrets repository. I did:
```
ln -s ~/path/to/secrets-git/passwords rePasswords
ln -s ~/path/to/secrets-git/keys reKeys
```
Enjoy command line search and retrieval of all your secrets. (Use the regular method for your separate secrets repository to add and update secrets.)

Rationale:

By using symlinks, pass will not allow me to create or update secrets in the other repositories. That prevents mistakes, as the process is different for each of those alternate stores.
I prefer to have just one tree of secrets to search, rather than the “multiple configuration” approach documented on the Password Store site.
By using symlinks, I can control the global namespace, and use names that make sense to me.
I’ve migrated from using KeePassX to using pass for my personal secret management. That is my “main” password-store setup (backed by a git repo).

Notes:

If you’d prefer a GUI, there’s qtpass which also works with the above setup.

Decoding Hashed known_hosts Files

Thu, 30 Jul 2015 00:00:00 -0700

Decoding Hashed known_hosts Files

tl;dr: You might find this gist handy if you enable HashKnownHosts

Modern ssh comes with the option to obfuscate the hosts it can connect to, by enabling the HashKnownHosts option. Modern server installs have that as a default. This is a good thing.

The obfuscation occurs by hashing the first field of the known_hosts file - this field contains the hostname,port and IP address used to connect to a host. Presumably, there is a private ssh key on the host used to make the connection, so this process makes it harder for an attacker to utilize those private keys if the server is ever compromised.

Super! Nifty! Now how do I audit those files? Some services have multiple IP addresses that serve a host, so some updates and changes are legitimate. But which ones? It’s a one way hash, so you can’t decode.

Well, if you had an unhashed copy of the file, you could match host keys and determine the host name & IP. [1] You might just have such a file on your laptop (at least I don’t hash keys locally). [2] (Or build a special file by connecting to the hosts you expect with the options “-o HashKnownHosts=no -o UserKnownHostsFile=/path/to/new_master”.)

I through together a quick python script to do the matching, and it’s at this gist. I hope it’s useful - as I find bugs, I’ll keep it updated.

Bonus Tip: https://github.com/defunkt/gist

Is a very nice way to manage gists from the command line.

Footnotes

[1]	A lie - you’ll only get the host name and IP’s that you have connected to while building your reference known_hosts file.

[2]	I use other measures to keep my local private keys unusable.

GMail multi-inbox

Wed, 01 Apr 2015 00:00:00 -0700

GMail multi-inbox

As much as GMail’s search syntax makes me long for PCRE, there are some unobvious gems laying around.

For example, I get tons of mail about releases. Occasionally, I need to monitor a given release, paying attention to not only the automated progress, but also human generated emails as well. Here’s my current setup:

Automated email is marked as read & skips inbox (unless it’s a failure)
Any release oriented email is given a special label using a filter similar to “subject:((38.0b1) OR (38 Beta) OR (31. AND "esr")”.

That’s pretty standard. The productivity add is when I use the “multi-inbox” feature in the web ui. I set the top one to be just the unread ones with the special label from today:

newer_than:1d label:SPECIAL_LABEL is:unread

With positioning of “extra panels” to the right side, I get a very focussed look at any issues I need to look at!

Messages:

No Messages:

I love seeing that “(no messages)” text!

Docker at Vungle

Tue, 10 Mar 2015 00:00:00 -0700

Docker at Vungle

Tonight I attended the San Francisco Dev Ops meetup at Vungle. The topic was one we often discuss at Mozilla - how to simplify a developer’s life. In this case, the solution they have migrated to is one based on Docker, although I guess the title already gave that away.

Long (but interesting - I’ll update with a link to the video when it becomes available) story short, they are having much more success using DevOps managed Docker containers for development than their previous setup of Virtualbox images built & maintained with Vagrant and Chef.

Vungle’s new hire setup:

install Boot2Docker (they are an all Mac dev shop)
clone the repository. [1]
run docker.sh script which pulls all the base images from DockerHub. This one time image pull gives the new hire time to fill out HR paperwork ;)
launch the app in the container and start coding.

Sigh. That’s nice. When you come back from PTO, just re-run the script to get the latest updates - it won’t take nearly as long as only the container deltas need to come down. Presto - back to work!

A couple of other highlights – I hope to do a more detailed post later.

They follow the ‘each container has a single purpose’ approach.

They use “helper containers” to hold recent (production) data.

Devs have a choice in front end development: inside the container (limited tooling) or in the local filesystem (dev’s choice of IDE, etc.). [2]

Currently, Docker containers are only being used in development. They are looking down the road to deploying containers in production, but it’s not a major focus at this time.

Footnotes

[1]	Thanks to BFG for clarifying that docker-foo is kept in a separate repository from source code. The docker.sh script is in the main source code repository. [Updated 2015-03-11]

[2]	More on this later. There are some definite tradeoffs.

Kaizen the low tech way

Fri, 06 Feb 2015 00:00:00 -0800

Kaizen the low tech way

On Jan 29, I treated myself to a seminar on Successful Lean Teams, with an emphasis on Kanban & Kaizen techniques. I’d read about both, but found the presentation useful. Many of the other attendees were from the Health Care industry and their perspectives were very enlightening!

Hearing how successful they were in such a high risk, multi-disciplinary, bureaucratic, and highly regulated environment is inspiring. I’m inclined to believe that it would also be achievable in a simple-by-comparison low risk environment of software development. ;)

What these hospitals are using is a light weight, self managed process which:

ensures visibility of changes to all impacted folks

outlines the expected benefits

includes a “trial” to ensure the change has the desired impact

has a built in feedback system

That sounds achievable. In several of the settings, the traditional paper and bulletin board approach was used, with 4 columns labeled “New Ideas”, “To Do”, “Doing”, and “Done”. (Not a true Kanban board for several reasons, but Trello would be a reasonable visual approximation; CAB uses spreadsheets.)

Cards move left to right, and could cycle back to “New Ideas” if iteration is needed. “New Ideas” is where things start, and they transition from there (I paraphrase a lot in the following):

Everyone can mark up cards in New Ideas & add alternatives, etc.
A standup is held to select cards to move from “New Ideas” to “To Do”
The card stays in “To Do” for a while to allow concerns to be expressed by other stake holders. Also a team needs to sign up to move the change through the remaining steps. Before the card can move to “Doing”, a “test” (pilot or checkpoints) is agreed on to ensure the change can be evaluated for success.
The team moves the card into “Doing”, and performs PSDA cycles (Plan, Do, Study, Adjust) as needed.
Assuming the change yields the projected results, the change is implemented and the card is moved to “Done”. If the results aren’t as anticipated, the card gets annotated with the lessons learned, and either goes to “Done” (abandon) or back to “New Ideas” (try again) as appropriate.

For me, I’m drawn to the 2nd and 3rd steps. That seems to be the change from current practice in teams I work on. We already have a gazillion bugs filed (1st step). We also can test changes in staging (4th step) and update production (5th step). Well, okay, sometimes we skip the staging run. Occasionally that *really* bites us. (Foot guns, foot guns – get your foot guns here!)

The 2nd and 3rd steps help focus on changes. And make the set of changes happening “nowish” more visible. Other stakeholders then have a small set of items to comment upon. Net result - more changes “stick” with less overall friction.

Painting with a broad brush, this Kaizen approach is essentially what the CAB process is that Mozilla IT implemented successfully. I have experienced the CAB reduce the amount of stress, surprises, and self inflicted damage amongst both inside and outside of IT. Over time, the velocity of changes has increased and backlogs have been reduced. In short, it is a “Good Thing(tm)”.

So, I’m going to see if there is a way to “right size” this process for the smaller teams I’m on now. Stay tuned….

Pyevn & Tox Can Get Along

Fri, 23 Jan 2015 00:00:00 -0800

Pyevn & Tox Can Get Along

I fought this for quite a few days on a background project. I finally found the answer, and want to ensure I don’t forget it.

tl;dr:

Activate all the python versions you need before running tox.

After I upgraded my laptop to OSX 10.10, I also switched to using pyenv for installing non-system python versions. Things went well (afaict) until they didn’t. All of a sudden, I could not get both my code tests to pass, and my doc build to succeed.

The error message was especially confusing:

pyenv: python2.7: command not found
The `python2.7' command exists in these Python versions:
2.7.5

Searching the web didn’t really shed any enlightenment. I’d find other folks who had the problem. I wasn’t alone. But they all disappeared from the bug traffic over a year ago (example). And with no sign of resolution.

Finally, I tried different search terms, and landed on this post. The secret – you can have multiple pyevn instances “active”. The first listed is the one that a bare python will invoke. The others are available as python*major*.*minor* (e.g. “python3.2”) and python*major* (e.g. “python3”)

ChatOps Meetup

Sat, 10 Jan 2015 00:00:00 -0800

ChatOps Meetup

This last Wednesday, I went to a meetup on ChatOps organized by SF DevOps, hosted by Geekdom (who also made recordings available), and sponsored by TrueAbility.

I had two primary goals in attending: I wanted to understand what made ChatOps special, and I wanted to see how much was applicable to my current work at Mozilla. The two presentations helped me accomplish the first. I’m still mulling over the second. (Ironically, I had to shift focus during the event to clean up a deployment-gone-wrong that was very close to one of the success stories mentioned by Dan Chuparkoff.)

My takeaway on why chatops works is that it is less about the tooling (although modern web services make it a lot easier), and more about the process. Like a number of techniques, it appears to be more successful when teams fully embrace their vision of ChatOps, and make implementation a top priority. Success is enhanced when the tooling supports the vision, and that appears to be what all the recent buzz is about – lots of new tools, examples, and lessons learned make it easier to follow the pioneers.

What are the key differentiators?

Heck, many teams use irc for operational coordination. There are scripts which automate steps (some workflows can be invoked from the web even). We’ve got automated configuration, logging, dashboards, and wikis – are we doing ChatOps?

Well, no, we aren’t.

Here are the differences I noted:

ChatOps requires everyone both agreeing and committing to a single interface to all operations. (The opsbot, like hubot, lita or Err.) Technical debt (non-conforming legacy systems) will be reworked to fit into ChatOps.
ChatOps requires focus and discipline. There are a small number of channels (chat rooms, MUC) that have very specific uses - and folks follow that. High signal to noise ratio. (No animated gifs in the deploy channel - that’s what the lolcat channel is for.)
A commitment to explicitly documenting all business rules as executable code.

What do you get for giving up all those options and flexibility? Here was the “ah ha!” concepts for me:

Each ChatOps room is a “shared console” everyone can see and operate. No more screen sharing over video, or “refresh now” coordination!

There is a bot which provides the “facts” about the world. One view accessible by all.

The bot is also the primary way folks interact and modify the system. And it is consistent in usage across all commands. (The bot extensions perform the mapping to whatever the backend needs. The code adapts, not the human!)

The bot knows all and does all:

Where’s the documentation?

How do I do X?

Do X!

What is the status of system Y?

The bot is “fail safe” - you can’t bypass the rules. (If you code in a bypass, well, you loaded that foot gun!)

Thus everything is consistent and familiar for users, which helps during those 03:00 forays into a system you aren’t as familiar with. Nirvana ensues (remember, everyone did agree to drink the koolaid above).

Can you get there from here?

The speaker selection was great – Dan was able to speak to the benefits of committing to ChatOps early in a startup’s life. James Fryman (from StackStorm) showed a path for migrating existing operations to a ChatOps model. That pretty much brackets the range, so yeah, it’s doable.

The main hurdle, imo, would be getting the agreement to a total commitment! There are some tensions in deploying such a system at a highly open operation like Mozilla: ideally chat ops is open to everyone, and business rules ensure you can’t do or see anything improper. That means the bot has (somewhere) the credentials to do some very powerful operations. (Dan hopes to get their company to the “no one uses ssh, ever” point.)

My next steps? Still thinking about it a bit – I may load Err onto my laptop and try doing all my local automation via that.

bz Quick Search

Thu, 02 Oct 2014 00:00:00 -0700

bz Quick Search

With the new developer services components, I find myself once again updating my Bugzilla Quick Search search plugin. This time, I’ll document it. :)

Here are the steps:

Determine the quick search parameters you want. Experimenting on the Bugzilla Quick Search page is useful.
If this is your first time, install a search engine that you can copy and modify. The bugzilla one is an obvious good choice.
Find the xml file for the search engine in the “searchplugins” directory of your profile. Modify the “template” attribute in the “os:Url” element based on your research in (1). I tend to put all my customization after the special token “{searchTerms}”, as that makes it easier to refine the search on the bugzilla search results page.
Add a keyword to this search, for ease of use in the awesome bar.
Enjoy!

[edit: here’s my current file as a sample]

New Hg Server Status Page

Sat, 06 Sep 2014 00:00:00 -0700

New Hg Server Status Page

Just a quick note to let folks know that the Developer Services team continues to make improvements on Mozilla’s Mercurial server. We’ve set up a status page to make it easier to check on current status.

As we continue to improve monitoring and status displays, you’ll always find the “latest and greatest” on this page. And we’ll keep the page updated with recent improvements to the system. We hope this page will become your first stop whenever you have questions about our Mercurial server.

2014-06 try server update

Wed, 02 Jul 2014 00:00:00 -0700

2014-06 try server update

Chatting with Aki the other day, I realized that word of all the wonderful improvements to the try server issue have not been publicized. A lot of folks have done a lot of work to make things better - here’s a brief summary of the good news.

Before:: Try server pushes could appear to take up to 4 hours, during which time others would be locked out.
Now:: The major time taker has been found and eliminated: ancestor processing. And we understand the remaining occasional slow downs are related to caching . Fortunately, there are some steps that developers can take now to minimize delays.

What folks can do to help

The biggest remaining slowdown is caused by rebuilding the cache. The cache is only invalidated if the push is interrupted. If you can avoid causing a disconnect until your push is complete, that helps everyone! So, please, no Ctrl-C during the push! The other changes should address the long wait times you used to see.

What has been done to infrastructure

There has long been a belief that many of our hg problems, especially on try, came from the fact that we had r/w NFS mounts of the repositories across multiple machines (both hgssh servers & hgweb servers). For various historical reasons, a large part of this was due to the way pushlog was implemented.

Ben did a lot of work to get sqlite off NFS, and much of the work to synchronize the repositories without NFS has been completed.

What has been done to our hooks

All along, folks have been discussing our try server performance issues with the hg developers. A key confusing issue was that we saw processes “hang” for VERY long times (45 min or more) without making a system call. Kendall managed to observe an hg process in such an infinite-looking-loop-that-eventually-terminated a few times. A stack trace would show it was looking up an hg ancestor without makes system calls or library accesses. In discussions, this confused the hg team as they did not know of any reason that ancestor code should be being invoked during a push.

Thanks to lots of debugging help from glandium one evening, we found and disabled a local hook that invoked the ancestor function on every commit to try. \o/ team work!

Caching – the remaining problem

With the ancestor-invoking-hook disabled, we still saw some longish periods of time where we couldn’t explain why pushes to try appeared hung. Granted it was a much shorter time, and always self corrected, but it was still puzzling.

A number of our old theories, such as “too many heads” were discounted by hg developers as both (a) we didn’t have that many heads, and (b) lots of heads shouldn’t be a significant issue – hg wants to support even more heads than we have on try.

Greg did a wonderful bit of sleuthing to find the impact of ^C during push. Our current belief is once the caching is fixed upstream, we’ll be in a pretty good spot. (Especially with the inclusion of some performance optimizations also possible with the new cache-fixed version.)

What is coming next

To take advantage of all the good stuff upstream Hg versions have, including the bug fixes we want, we’re going to be moving towards removing roadblocks to staying closer to the tip. Historically, we had some issues due to http header sizes and load balancers; ancient python or hg client versions; and similar. The client issues have been addressed, and a proper testing/staging environment is on the horizon.

There are a few competing priorities, so I’m not going to predict a completion date. But I’m positive the future is coming. I hope you have a glimpse into that as well.

CVS Attic in DVCS

Thu, 26 Jun 2014 00:00:00 -0700

CVS Attic in DVCS

One handy feature of CVS was the presence of the Attic directory. The primary purpose of the Attic directory was to simplify trunk checkouts, while providing space for both removed and added-only-on-branch files.

As a consequence of this, it was relatively easy to browse all such file names. I often would use this as my “memory” of scripts I had written for specific purposes, but were no longer needed. Often these would form the basis for a future special purpose script.

This isn’t a very commonly needed use case, but I have found myself being a bit reluctant to delete files using DVCS systems, as I wasn’t quite sure how to find things easily in the future.

Well, I finally scratched the itch – here are the tricks I’ve added to my toolkit.

Hg version

A simplistic version, which just shows when file names were deleted, is to add the alias to ~/.hgrc:

[alias]
attic=log --template '{rev}:{file_dels}\n'

Git version

Very similar for git:

git config --global alias.attic 'log --diff-filter=D --summary'

(Not actually ideal, as not a one liner, but good enough for how often I use this.)

Bluetooth Finder for Fitbit

Thu, 06 Mar 2014 00:00:00 -0800

Bluetooth Finder for Fitbit

Pro tip - if you have a Fitbit or other small BLE device, go get a “bluetooth finder” app for your smartphone or tablet. NOW. No thanks needed.

I ended up spending far-too-long looking for my misplaced black fitbit One last weekend. Turned out the black fitbit was behind a black sock on a shelf in a dark closet. (Next time, I’ll get a fuchsia colored on – I don’t have too many pairs of fuchsia socks.)

After several trips through the house looking, I thought I’d turn to technology. By seeing where in the house I could still sync with my phone, I could confirm it was in the house. I tried setting alarms on the fitbit, but I couldn’t hear them go off. (Likely, the vibrations were completely muffled by the sock. Socks - I should just get rid of them.)

Then I had the bright idea of asking the interwebs for help. Surely, I couldn’t be the first person in this predicament. I was rewarded with this FAQ on the fitbit site, but I’d already followed those suggestions.

Finally, I just searched for “finding bluetooth”, and discovered the tv ads were right: there is an app for that! Since I was on my android tablet at the time, I ended up with Bluetooth Finder, and found my Fitbit within 5 minutes. (I also found a similar app for my iPhone, but I don’t find it as easy to use. Displaying the signal strength on a meter is more natural for me than watching dB numbers.)

More VIM fun: Vundle

Sat, 23 Nov 2013 00:00:00 -0800

More VIM fun: Vundle

During this last RelEng workweek, I thought I’d try a new VIM plugin for reST: RIV. While that didn’t work out great (yet), it did get me to start using Vundle. Vundle is a quite nice vim plugin manager, and is easier for me to understand than Pathogen.

However, the Vundle docs didn’t cover two cases I care about:

converting Pathogen modules for Vundle usage

using with bundles not managed by either Pathogen or Vundle. (While Vundle running won’t interfere with unmanaged bundles, the :BundleClean command will claim they are unused and offer to delete them. That’s just too risky for me.)

The two cases appear to have the same solution:

ensure all directories in the bundle location (typically ~/.vim/bundles/) are managed by Vundle.

use a file:// URI for any bundle you don’t want Vundle to update.

For example, I installed the ctrlp bundle a while back, from the Bitbucket (hg) repository. (Yes, there (now?) is a github repository, but why spoil my fun.) Since the hg checkout already lived in ~/.vim/bundle, I only needed to add the following line to my vimrc file:

Bundle 'file:///~/.vim/bundle/ctrlp.vim/'

Vundle no longer offers to delete that repository when BundleClean is run.

I suspect I’ll get errors if I ever asked Vundle to update that repo, but that isn’t in my plans. I believe my major use case for Vundle will be to trial install plugins, and then BundleClean will clean things up safely.

MySQL & Python on Mac

Fri, 06 Sep 2013 00:00:00 -0700

MySQL & Python on Mac

I don’t have MySQL installed globally, so need to do this dance every time I add it to a new virtualenv:

Install the bindings in the virtual env. The package name is MySQL-python.

Symlink libmysqlclient.18.dylib from the /usr/local/mysql/lib tree into site-packages of the virtualenv

Add the following to the virtual env’s activate script::

DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/path/to/venv/site-package

optionally add /usr/local/mysql/bin to PATH as well.

Inter Repository Operations

Wed, 19 Jun 2013 00:00:00 -0700

Inter Repository Operations

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

Mozilla, like most operations, has the Repositories of Record (RoR) set to only allow “fast forward” updates when new code is landed. In order to fast forward merge, the tip of the destination repository (RoR) must be an ancestor of the commit being pushed from the source repository. In the discussion below, it will be useful to say if a repository is “ahead”, “behind”, or “equal” to another. These states are defined as:

If the tip of the two repositories are the same reference, then the two repositories are said to be equal (’e’ in table below)

Else if the tip of the upstream repository is a ancestor of the tip of the destination repository, the upstream is defined to be behind (’B’ in table below) the source repository

Otherwise, the upstream repository is ahead (’A’ in table below) of the source repository.

Landing a change in the normal (2 repository case: RoR and lander’s repository), the process is logically (assuming no network issues):

Make sure lander’s repository is equivalent to RoR (start with equality)

Apply the changes (RoR is now “Behind” the local repository)

Push the changes to the RoR

if the push succeeds, then stop. (equality restored)

if the push fails, simultaneous landings were being attempted, and you lost the race.

When simultaneous landings are attempted, only one will succeed, and the others will need to repeat the landing attempt. The RoR is now “Ahead” of the local repository, and the new upstream changes will need to be incorporated, logically as:

Remove the local changes (“patch -R”, “git stash”, “hg import”, etc.).

Pull the changes from RoR (will apply cleanly, equality restored)

Continue from step 2 above

When an authorized committer wants to land a change set on an hg RoR from git, there are three repositories involved. These are the RoR, the git repository the lander is working in, and internal hggit used for translation. The sections below describe how this affects the normal case above.

Land from git – Happy Path

On the happy path (no commit collisions, no network issues), the steps are identical to the normal path above. The git commands executed by the lander are set by the tool chain to perform any additional operations needed.

Land from git – Commit Collision

Occasionally, multiple people will try to land commits simultaneously, and a commit collision will occur (steps 3a, 3b, & 3c above). As long as the collision is noticed and dealt with before addition changes are committed to the git repository, the tooling will unapply the change to the internal hggit repository.

Land from git – Sad Path

In real life, network connections fail, power outages occur, and other gremlins create the need to deal with “sad paths”. The following sections are only needed when we’re neither on the happy path nor experiencing a normal commit collision.

Because these cases cover every possible case of disaster recovery, it can appear more complex than it is. While there are multiple (6) different sad paths, only one will be in play for a given repository. And the maximum number of operations to recover is only three (3). The relationship between each pair of repositories determines the correct actions to take to restore the repositories to a known, consistent state. The static case is simply:

Simplistic Recovery State Diagram

Note

The simplistic diagram assumes no changes to RoR during the duration of the recovery (not a valid assumption for real life). See the text for information on dealing with the changes.
States “BB” & “BA” are not shown, as they represent invalid states that may require restoring portions of the system from backup before proceeding.

In reality, it is impractical to guarantee the RoR is static during recovery steps. That can be dealt with by applying the process described in the flowchart to restore equality and using the tables below to locate the actions.

The primary goal is to ensure correctness based on the RoR. The secondary goal is to make the interim repository as invisible as possible.

Key	RoR <-> hggit	hggit <-> git	Interpretation	Next Step to Equality
Ae	Ahead	equal	someone else landed	pull from RoR
AA	Ahead	Ahead	someone else landed [1]	pull from RoR
AB	Ahead	Behind	someone else landed [1]	back out local changes (3a above)
ee	equal	equal	equal	nothing to do
eA	equal	Ahead	someone else landed [2]	pull to git
eB	equal	Behind	ready to land	push from git
Be	Behind	equal	ready to land [2]	push to RoR
BA	Behind	Ahead	prior landing not finished, lost from git [3]	corrupted setup, see note
BB	Behind	Behind	prior landing not finished, next started [4]	back out local changes (3a above) from 2nd landing

Table Notes

[1] (1, 2) This is the common situation of needing to update (and possibly re-merge local changes) prior to landing the change

[2] (1, 2) If the automation is working correctly, this is only a transitory stage, and no manual action is needed. IRL, stuff happens, so an explicit recovery path is needed.

[3] This “shouldn’t happen”, as it implies the git repository has been restored from a backup and the “pending landing” in the hggit repository is no longer a part of the git history. If there isn’t a clear understanding of why this occurred, client side repository setup should be considered suspect, and replaced.

[4]
Lander shot themselves in the foot - they have 2 incomplete landings in progress. If they are extremely lucky, they can recover by completing the first landing (“hg push RoR” -> “eB”), and proceed from there.

The deterministic approach, which also must be used if landing of first change set fails, is to back out second landing from hggit and git, then back out first landing from hggit and git.) Then equality can be restored, and each landing redone separately.

**DVCS Commands**
Next Step	Active Repository	Command
pull from RoR	hggit	hg pull
pull to git	git	git pull RoR
push from git	git	git push RoR
push to RoR	hggit	hg push

Note

that if any of the above actions fail, it simply means that we’ve lost another race condition with someone else’s commit. The recovery path is simply to re-evaluate the current state and proceed as indicated (as shown in diagram 1).

Flowchart to Restore Equality

Flowchart to Restore Equality

Using hg & git for the same codebase

Mon, 20 May 2013 00:00:00 -0700

Using hg & git for the same codebase

Speaker Notes

Following are the slides I presented at the RELENG 2013 workshop on May 20th, 2013. Paragraphs formatted like this were not part of the presented slides - they are very rough speaker notes.

If you prefer, you may view a PDF version.

Hal Wine hwine@mozilla.com
Release Engineering
Mozilla Corporation

Issues and solutions encountered in maintaining a single code base under active development in both hg & git formats.

Background

Mozilla Corporation operates an extensive build farm that is mostly used to build binary products installed by the end user. Mozilla has been using Mercurial repositories for this since converting from CVS in 2007. We currently use a 6 week “Rapid Release” cycle for most products.

Speaker Notes

We currently have upwards of 4,000 hosts involved in the continuous integration and testing of Mozilla products. These hosts do approximately 140 hours of work on each commit.
Firefox Operating System is a new product that ships source to be incoporated by various partners in the mobile phone industry. These partners, experienced with the Android build process, require source be delivered via git repositories. This is close to a “Continuous Release” process.

Speaker Notes

A large part of the FxOS product is code used in the browser products. That is in Mercurial and needs to be converted to git. Most new code modules for FxOS are developed on github, and need to be converted to Mercurial for use in our CI & build systems.

Summary

What we initially set out to do:
- Make it purely a developer choice which dvcs to use.
  
  Speaker Notes
  
  Ideal was to allow developers to make dvcs as personal a choice as editor.
- Support multiple social coding sites.
  
  Speaker Notes
  
  These social coding sites, such as github and bitbucket, make it much easier for new community members to contribute.
That was much tougher than anticipated.

In theory, git & hg are very close…

… In practice, “the devil is in the details”.
Where we are:
- Changed direction to support FFOS release to partners.
- Quickly mirror Repository of Record (RoR) between git & hg.
- CI/build system remains Mercurial centric.

Challenge Areas

Changesets have different hashes in Mercurial and git.
- We added tooling to support both in static documents such as manifest files.
- All tools continue to use hg hash as primary value for indexing and linking.
Propagation delays of changesets to the “other” system.
Speaker Notes

For most use cases, the approximately 20 minute average we’re achieving is acceptable.
- Compounded by hash differences between two systems.
  
  Speaker Notes
  
  A common use case here is a developer wanting to start a self serve build. If the commit was to git, the self serve build won’t be successful until that commit is converted to hg.
  
  We are continuing work on this. It is closely tied to determining which commit broke the build, when multiple repositories are involved.
Build details
- Movable tags are not popular in git based workflows, but have been a common technique at Mozilla to mark “latest”.

Challenge Areas (Con’t)

Mixed philosophies are often linked with mixed repositories.
- Android never wants history to appear to change. Downstream servers allow only fast forward changesets and deny deletions.
- Mozilla uses “RoR is authoritative”.
  
  Speaker Notes
  
  Either approach is self consistent. It is when the two need to interact that challenges arrise.

Conversion failures
- Occasional hg-git conversion failures, due to implementation details of hg & git.
  
  Speaker Notes
  
  Dates in export patches (e.g. hg uses seconds, git uses minutes, in time)
  
  Email validation (git stricter than hg)
- Since commit already accepted by hg, hg-git must be modified
  
  Speaker Notes
  
  This requires inhouse resources to respond urgently to patch the conversion machinery. Without conversion, there are no builds.

Alternate Approaches

To support your own “use the DVCS you want” infrastructure requires:
- production quality hg server
- production quality git server
- in house ability to address conversion issues (as already mentioned)
I’m aware of two commercial alternatives. Both of these use a centralized RoR which supports git and/or hg interfaces for developer interaction.

Speaker Notes

And at least one explicitly does not have a git back end.
You can leave it to developers to scratch their own itch independently. Given diversity of workflows, this may be more cost effective than obtaining consensus.

Future Research

Areas of particular interest for further study include:

What is the set of enforceable assertions which would ensure the tooling can maintain lossless conversion between DVCS?
What minimum conditions must be maintained in conversions to preclude downstream conflicts?
What workflows can be supported to minimize issues?
Are there best practice incident management protocols for addressing problem commits.

Speaker Notes

The common example is a commit contains sensitive material it should not. There are cases were limiting the scope of distribution can have significant business value.

Thank You

References:

Blog:

http://dtor.com/halfire/

The canonical commit/push/land cycle at Mozilla

Tue, 10 Apr 2012 00:00:00 -0700

The canonical commit/push/land cycle at Mozilla

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

Untangling the terminology

In the old days, before DVCS, “commit” only had only one real purpose. It was how you published your work to the rest of the world (or your project’s world at least). With DVCS, you are likely committing quite often, but still only occasionally publishing.

Changes to commit workflow

Thu, 08 Mar 2012 00:00:00 -0800

Changes to commit workflow

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

With all the changes to support git, how will that affect a committer’s workflow? (For developer impact, see this post.)

The primary goal is to work within the existing Mozilla commit policy [1]. Working within that constraint, the idea is “as little as possible”, and this post will try to describe how big “as little” is.

Remember: all existing ways of working with hg will continue to work! These are just going to be some additional options for folks who prefer to use github & bitbucket.

Changes to Developer workflow

Wed, 07 Mar 2012 00:00:00 -0800

Changes to Developer workflow

[Refer to the main page for additional context.]

With all the changes to support git, how will that affect a developer’s workflow? (The committer’s workflow will be covered in a future post.)

The idea is “not much at all”, and this post will try to define “not much”.

Remember: all existing ways of working with hg will continue to work! These are just going to be some additional options for folks who prefer to use github & bitbucket.

DVCS Survey Summary

Fri, 02 Mar 2012 00:00:00 -0800

DVCS Survey Summary

Summary

A long time ago (December of 2011), I sent out a brief survey on DVCS usage to Mozilla folks (and asked them to spread wider). While there were only 42 responses, there were some interesting patterns.

Disclaimer!

I am neither a statistician nor a psychometrician.

I believe you can see the raw summary at via this link. What follows are the informal inferences I drew from the results (remember the disclaimer).

Commit to git is not the issue:

… the Wowza feature of git

Tue, 21 Feb 2012 00:00:00 -0800

… the Wowza feature of git

tl;dr

Wowza! I found the killer feature in git - you can have your cake and eat it, too!

Every time I’ve had to move to a new VCS, there’s never been enough time available to move the complete history correctly. Linux had this problem in spades when they moved off BitKeeper onto git in a very-short-time.

The solution? Take your time to convert the history correctly (or not, you can correct later), then allow developers who want it to prepend it on their machines, without making their repo operate any differently from the latest one.

Read on for more about replace/graft feature.

Releng As Is - January 2012

Wed, 01 Feb 2012 00:00:00 -0800

Releng As Is - January 2012

[Refer to the main page for additional context.]

Where we are in January 2012

The purpose of this post is to present a very high level picture of the current Firefox build & release process as a set of requirements. Some of these services are provided or supported by groups outside of releng (particularly it & webdev). This diagram will be useful in understanding the impact of changes.

Releng & Git - Project Overview

Thu, 26 Jan 2012 00:00:00 -0800

Releng & Git - Project Overview

Contents

Releng & Git - Project Overview

This is the first in a series of posts about the “support git in releng” project. The goal of this project, as stated in bug 713782, is:

… The idea here is to see if we can support git in Mozilla’s RelEng infrastructure, to at least the same standard (or better) as we already currently support hg.

My hope is that blog posts will be a better forum for discussion than the tracking bug 713782, or a wiki page, at this stage.

These posts will highlight the various issues, so that the vague definitions above become clear, as do the intermediate steps needed to achieve completion.

The Ideal Future

Thu, 26 Jan 2012 00:00:00 -0800

The Ideal Future

[Refer to the main page for additional context.]

Based on discussions to date, everyone seems to have similar ideas about what “supporting git for releng” means. Later posts will highlight the work needed to ensure the ideal can be achieved, and how to arrive there.

For this post, I intend to limit the viewpoint and scope to that of the developer impact. Release notions (such as “system of record”) and scaling issues won’t be mentioned here. (N.B. Those concerns will be a key part of the path to verifying feasibility, but do not change the goal.)

As a reminder, I’m just talking about repositories that are used to produce products. [1]

… a View from Outside

Sun, 22 Jan 2012 00:00:00 -0800

… a View from Outside

tl;dr

One of the things that excited me about the opportunity to work at Mozilla was the chance to change perspectives. After working in many closed environments, I knew the open source world of Mozilla would be different. And that would lead to a re-examination of basic questions, such as:

Q: Are there any significant differences in the role a VCS plays at Mozilla than at j-random-private-enterprise?

A: At the scale of Mozilla Products [1], I don’t believe there are.

But the question is important to ask! (And I hope to ask more of them.)