Category: DevOps

Redunancy Planning, more work than adding one of everything

Since I started my career, redundancy has been featured in almost every deployment discussion. The general best practice is to add an additional element for each service tier, also know as N+1 redundancy. This approach is straight forward, but many people would actually be surprised by how often these schemes fail. At a very famous…

April 14, 2012
Three Monitoring Tenants

This week, I was seeing a drop in average back-end performance at work, we had an average drop in page load performance from ~250ms to around 500ms. This seemed to be an intermittent problem and we searched through out graphs at NewRelic with no clear culprit. Then we started looking at our internal MediaWiki profiling…

March 8, 2012
Groovy, A Reasonable JVM Language for DevOps

I’ve worked at several environments where most of our product was run through the JVM. I’ve always used the information available to me in Mbeans, but the overhead of exposing them to a monitoring system like Ganglia or Nagios has always been problematic. What I’ve been looking for is a simple JVM language that allows…

March 6, 2012
A few things you should know about EC2

Availability Zones are Randomized Between Accounts I had someone from Amazon tell me this, so I assume this to be true. In order to prevent people from gaming the system availability and over allocating instances in a singe az, zones ids are randomized across customers. So for any two accounts us-east-1a != us-east-1a. Amazon promises…

February 27, 2012
Configuration Management Tools Still Fall Short

I have a gripe with almost every configuration management tool I’ve used. I’m most familiar with chef, but I’ve used puppet a bit, so I apologize to the fine people at OpsCode in advance since my examples will be chef based. The Cake is a Lie Every time I run chef I tell my self…

February 23, 2012
SSH Do’s and Don’ts

Do Use SSH Keys When ever you can use a key for SSH. Once you create it, you can distribute the public side widely to enable access where ever you need it. Generating one is easy: ssh-keygen -t dsa Don’t Use a Blank Passphrase on Your Key This key is now your identity. Protect it.…

February 22, 2012
Techincal Debt Better Than Not Doing It

Its time to admit that sometimes it’s okay to incur technical debt, particularly when it comes to getting it done. So many times, I’ve run into to places that have constipated operations environments, or automation processes because something is hard to do automatically. If you can’t automated it, don’t block all other tasks because of…

February 16, 2012
User Acceptance Testing for Successful Failovers

Things fail, we all know that. What most people don’t take into account is that things fail in combination and unexpected ways. We spend time and effort planning redundancy and failover schemes to seamlessly continue operations, but often neglect to fully test these plans before rolling services and equipment into production. What inevitably happens is…

February 8, 2012
Solr Query Change Beats JVM Tuning

I’ve been spending the last few days at work trying to improve our search performance, and have been banging my head against the dismax query target and parser in Solr. For those not familiar with the Dismax, its a simplified parser for Solr that eliminates the complexity from the Standard query parser. Instead of search…

February 7, 2012
Dealing with Outages

No matter what service you’re building, at some point you can expect to have an outage. Even if your software is designed and scaled perfectly, one of your service providers may let you down, leading to a flurry of calls from customers. Plus the internet has many natural enemies in life (rodents, backhoes, and bullets),…

January 30, 2012