ChatOps Meetup

This last Wednesday, I went to a meetup on ChatOps organized by SF DevOps, hosted by Geekdom (who also made recordings available), and sponsored by TrueAbility.

I had two primary goals in attending: I wanted to understand what made ChatOps special, and I wanted to see how much was applicable to my current work at Mozilla. The two presentations helped me accomplish the first. I’m still mulling over the second. (Ironically, I had to shift focus during the event to clean up a deployment-gone-wrong that was very close to one of the success stories mentioned by Dan Chuparkoff.)

My takeaway on why ChatOps works is that it is less about the tooling (although modern web services make it a lot easier) and more about the process. Like a number of techniques, it appears to be more successful when teams fully embrace their vision of ChatOps and make implementation a top priority. Success is enhanced when the tooling supports the vision, and that appears to be what all the recent buzz is about: lots of new tools, examples, and lessons learned make it easier to follow the pioneers.

What are the key differentiators?

Heck, many teams use IRC for operational coordination. There are scripts which automate steps (some workflows can even be invoked from the web). We’ve got automated configuration, logging, dashboards, and wikis. Are we doing ChatOps?

Well, no, we aren’t.

Here are the differences I noted:
  • ChatOps requires everyone to both agree on and commit to a single interface to all operations: the opsbot, such as Hubot, Lita, or Err. Technical debt (non-conforming legacy systems) gets reworked to fit into ChatOps.
  • ChatOps requires focus and discipline. There are a small number of channels (chat rooms, MUC) that have very specific uses, and folks stick to them. The result is a high signal-to-noise ratio. (No animated gifs in the deploy channel - that’s what the lolcat channel is for.)
  • ChatOps requires a commitment to explicitly documenting all business rules as executable code.

What do you get for giving up all those options and flexibility? Here were the “aha!” concepts for me:

  1. Each ChatOps room is a “shared console” everyone can see and operate. No more screen sharing over video, or “refresh now” coordination!
  2. There is a bot which provides the “facts” about the world. One view accessible by all.
  3. The bot is also the primary way folks interact and modify the system. And it is consistent in usage across all commands. (The bot extensions perform the mapping to whatever the backend needs. The code adapts, not the human!)
  4. The bot knows all and does all:
    • Where’s the documentation?
    • How do I do X?
    • Do X!
    • What is the status of system Y?
  5. The bot is “fail safe” - you can’t bypass the rules. (If you code in a bypass, well, you loaded that foot gun!)
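
To make that list a bit more concrete for myself, here is a minimal sketch in Python of the pattern as I understood it. This is not how Hubot, Lita, or Err are implemented, and nothing like it was shown at the meetup; the names `OpsBot`, `Command`, and `build_demo_bot`, the `dev`/`ops` roles, and the in-memory `services` dict standing in for a real backend are all made up for illustration. It only tries to show the ideas: one entry point for every command, a transcript everyone can read, and rules that live in code rather than in someone's head.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Command:
    """One bot command: what it does, its help text, and who may run it."""
    handler: Callable[[List[str]], str]
    help_text: str
    allowed_roles: Set[str]


@dataclass
class OpsBot:
    """Hypothetical bot: a single, uniform interface to every operation."""
    commands: Dict[str, Command] = field(default_factory=dict)
    transcript: List[str] = field(default_factory=list)  # the "shared console"

    def register(self, name, handler, help_text, allowed_roles):
        self.commands[name] = Command(handler, help_text, set(allowed_roles))

    def handle(self, user: str, role: str, message: str) -> str:
        # Every request and every reply lands in the shared transcript.
        self.transcript.append(f"{user}: {message}")
        words = message.split()
        if not words:
            reply = "Say 'help' to list commands."
        elif words[0] == "help":
            reply = "\n".join(
                f"{name}: {cmd.help_text}"
                for name, cmd in sorted(self.commands.items())
            )
        elif words[0] not in self.commands:
            reply = f"Unknown command '{words[0]}'. Say 'help' to list commands."
        elif role not in self.commands[words[0]].allowed_roles:
            # The business rule is enforced here, in code, for every caller.
            reply = f"{user} is not allowed to run '{words[0]}'."
        else:
            reply = self.commands[words[0]].handler(words[1:])
        self.transcript.append(f"bot: {reply}")
        return reply


def build_demo_bot() -> OpsBot:
    services = {"web": "v1"}  # assumption: stand-in for a real backend

    def status(args: List[str]) -> str:
        if len(args) != 1 or args[0] not in services:
            return "usage: status <service>"
        return f"{args[0]} is running {services[args[0]]}"

    def deploy(args: List[str]) -> str:
        if len(args) != 2 or args[0] not in services:
            return "usage: deploy <service> <version>"
        services[args[0]] = args[1]
        return f"deployed {args[0]} {args[1]}"

    bot = OpsBot()
    bot.register("status", status, "status <service>", {"dev", "ops"})
    bot.register("deploy", deploy, "deploy <service> <version>", {"ops"})
    return bot
```

With this sketch, `build_demo_bot().handle("alice", "dev", "deploy web v2")` is refused because the `dev` role is not in the allowed set for `deploy`, while the same message from someone with the `ops` role goes through. Both the request and the bot's answer end up in `transcript`, which is the toy version of everyone watching the same console.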

Thus everything is consistent and familiar for users, which helps during those 03:00 forays into a system you aren’t as familiar with. Nirvana ensues (remember, everyone did agree to drink the Kool-Aid above).

Can you get there from here?

The speaker selection was great – Dan was able to speak to the benefits of committing to ChatOps early in a startup’s life. James Fryman (from StackStorm) showed a path for migrating existing operations to a ChatOps model. That pretty much brackets the range, so yeah, it’s doable.

The main hurdle, imo, would be getting everyone to agree to a total commitment! There are some tensions in deploying such a system at a highly open operation like Mozilla: ideally ChatOps is open to everyone, and business rules ensure you can’t do or see anything improper. That means the bot has (somewhere) the credentials to do some very powerful operations. (Dan hopes to get their company to the “no one uses ssh, ever” point.)

My next steps? Still thinking about it a bit – I may load Err onto my laptop and try doing all my local automation via that.