Category Archives: solr

DevOps java solr

Solr Upgrade Surprise and Using Kill To Debug It

At work, we’ve recently upgraded to the latest and greatest stable version of Solr (3.6), and moved from using the dismax parser to the edismax parser. The initial performance of Solr was very poor in our environment, and we removed the initial set of search features we had planned to deploy trying to get the CPU utilization in order.

Once we finally, rolled back a set of features Solr seemed to be behaving optimally. Below is what we were seeing as we looked at our search servers CPU:
Solr CPU Usage Pre and Post Fix
Throughout the day we had periods where we saw large CPU spikes, but they didn’t really seem to affect throughput or average latency of the server. None the less we suspected there was still an issue, and started looking for a root cause.

Kill -3 To The Rescue

If you’ve never used kill -3, its perhaps one of the most useful Java debugging utilities around. It tells the JVM to produce a full thread dump, which it will then print to the STDOUT of the process. I became familiar with this when trying to hunt down treads in a Tomcat container that were blocking the process from exiting. Issuing kill -3 would give you enough information to find the problematic thread, and work with development to fix it.

In this case, I was hunting for a hint as to what went wrong with our search. I issued kill -3 during a spike, and got something like this:


Looking at the the output, I noticed that we had a lot threads calling FuzzyTermEnum. I thought this was strange, and sounded like an expensive search method. I talked with the developer, and we expected that the tilde character was being ignored by edismax. At the very least being escaped by our library, since it was included in the characters to escape. I checked the request logs, and we had people looking for exact titles that contained ~. This turned a 300ms query into a query that timed out, due to the size of our index. Further inspection of the thread dump revealed that we were also allowing the * to be used in query terms as well. Terms like *s ended up being equally problematic.

A Solr Surprize

We hadn’t sufficiently tested edismax, and we’re surprised that it ran ~,+,^, and * when escaped. I didn’t find any documentation that stated this directly, but I didn’t really expect to. We double checked our Solr library to see if that it was properly escaping the special characters in the query, but they we’re still being processed by Solr. On a hunch we tried double escaping the characters, which resolved the issue.

I’m not sure if this is a well known problem with edismax, but if you’re seeing odd CPU spikes this is definitely worth checking for. In addition, when trying to get to a root of a tough problem kill -3 can be a great shortcut. It saved me a bunch of painful debugging, and really eliminated almost all my guess work.

DevOps solr

Groovy, A Reasonable JVM Language for DevOps

I’ve worked at several environments where most of our product was run through the JVM. I’ve always used the information available to me in Mbeans, but the overhead of exposing them to a monitoring system like Ganglia or Nagios has always been problematic. What I’ve been looking for is a simple JVM language that allows me to use any native object. In 2008 I found Groovy, and have used it as my JVM glue ever since.

Meet Groovy

I’ll skip from going into too much detail here, but Groovy is a simple dynamically typed language for the JVM. Its syntaxt and style is similar to that of Ruby. You can find more information here.

Solr JMX to Ganglia Example

This is a quick script to get data from JMX into ganglia fromSsolr. The syntax is simplfied, and missing the typical java boiler plate required to run from the command line. Not only that, but because of the simplified typing model, its very easy for me to build strings and then execute a command. This allows me to quickly construct a script like I would in Perl or Python, but also allows me easy access JMX, and any Java library of my choosing. The primary drawback will be the time it takes to instantiate the JVM, which you can work around if you can’t deal with the launch time.

So, if your working in a Java with a bunch of Java apps, think about giving Groovy a chance for writing some of your monitoring tests, and metric collectors. It is a simpler language than Java to put together those little applications that can tell you how your system is performing, and well within the reach of your average DevOps engineer.

DevOps solr

Solr Query Change Beats JVM Tuning

I’ve been spending the last few days at work trying to improve our search performance, and have been banging my head against the dismax query target and parser in Solr. For those not familiar with the Dismax, its a simplified parser for Solr that eliminates the complexity from the Standard query parser. Instead of search terms like “field_name:value” you can simple enter “value”, but you can no longer search for a specific term in a specific field.

Our search index has grown in the last few months by 20% and our JVM and Solr setups were beginning to groan under the weight of the data. I went through a few rounds of JVM tuning, which reduced garbage collection time to less than 2%, and with some Solr configuration options managed to bring our typical query back under 5 seconds. This felt like a major win, until I adjusted the query.

Looking at our query parameters on search I noticed we were using the “fq” parameter to specify the id of the particular site we were looking for. These queries were taking anywhere from 5-15 seconds across our 360GB index, and I suspected that we were pulling in data to the JVM only to filter it away. The garbage collection graphs seemed to indicate this as well, since we had a very slow growing heap, and our eden space was emptying very quickly even with 20G allocated to it. When I changed from dismax to the standard target and specified the site id, I noticed my search time went from 5 seconds to .06 seconds, so started reading, and came across an article on nested queries. My idea was that this would allow me to apply a constraint to the initial set of data returned, using the standard search target, and then perform a full text search using dismax and achieve the same results.

Original Query(grossly simplified):

Becomes the following nested query:

Original Query Time : 5 seconds
Nested Query Time : 87 milliseconds

Both return identical results. So, if performing a query against a large index and you want to use dismax, you should try using a nested search. You’re likely see much better performance, particularly if you’re filtering based on a facet. And this gives you a relatively easy way to specify the value of a field, and still want to use a dismax query.