elasticsearch Index Templates

I’ve been working on setting up an elasticsearch cluster for logstash. Because logstash has unusual write throughput and storage requirements, a few index settings are recommended for it (see this wiki page and this blog post).

By default, logstash creates a new index for each day’s logs, so these index settings have to be configured using an index template. If the settings were applied to an index directly, they would only cover the current day’s index, and tomorrow’s index would be created with the default settings again. An index template applies to all new indexes whose names match a pattern such as logstash-*, which will match logstash-2013.03.18, logstash-2013.03.19, etc.
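To make the pattern matching concrete, here is a small sketch of how a wildcard like logstash-* lines up with the daily index names. It uses Python's fnmatchcase as a stand-in for elasticsearch's own wildcard matching, and the helper names (daily_index_name, template_applies) are made up for illustration.

```python
from datetime import date, timedelta
from fnmatch import fnmatchcase

# Pattern taken from the text above.
TEMPLATE_PATTERN = "logstash-*"


def daily_index_name(day):
    """Build the per-day index name in the logstash-YYYY.MM.DD form."""
    return "logstash-" + day.strftime("%Y.%m.%d")


def template_applies(index_name, pattern=TEMPLATE_PATTERN):
    """Approximate the template match with a shell-style wildcard.

    Assumption: fnmatchcase only stands in for elasticsearch's matcher here.
    """
    return fnmatchcase(index_name, pattern)


start = date(2013, 3, 18)
names = [daily_index_name(start + timedelta(days=i)) for i in range(2)]

for name in names:
    print(name, template_applies(name))
```

Every new daily index name matches the pattern, which is why the settings carry over from one day to the next without further configuration.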

As with most settings in elasticsearch, there are two ways to configure index templates. They can be configured through the API, or they can be stored in a configuration file. The latter is helpful when configuring a cluster that is not up and running. In my case, I am using chef to configure the elasticsearch nodes, so it’s not guaranteed that the cluster is up when the recipe executes.
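As a rough sketch of the two routes, the snippet below builds one template body and shapes it either as an API request or as the text of a configuration file. It assumes the legacy template format of the elasticsearch releases of that era (a "template" key holding the pattern, and the _template endpoint). The wrapper key in the file form follows the documentation sample as I remember it, so treat it as an assumption. The replica setting is only a placeholder and is not one of the recommended logstash settings.

```python
import json

# Name reused from the file name mentioned later in the post.
TEMPLATE_NAME = "logstash_template"


def build_template(pattern="logstash-*"):
    """Return a template body in the assumed legacy format."""
    return {
        "template": pattern,
        "settings": {
            # Placeholder value for illustration only.
            "index.number_of_replicas": 1,
        },
    }


def as_api_request(name, body):
    """Describe the API call without sending it: method, path, JSON payload."""
    return "PUT", "/_template/" + name, json.dumps(body)


def as_config_file(name, body):
    """Render the file form. Assumption: the body is wrapped in the template name."""
    return json.dumps({name: body}, indent=2)


method, path, payload = as_api_request(TEMPLATE_NAME, build_template())
file_text = as_config_file(TEMPLATE_NAME, build_template())

print(method, path)
print(file_text)
```

Nothing here talks to a cluster, which mirrors the reason for wanting the file route in the first place: a configuration management run can write the file even when no node is answering requests yet.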

Unfortunately, it took me a long time to figure out how to get the configuration file method working. As this thread suggests, I put the file in the right place — #{config.path}/templates/logstash_template.json — and I made sure to configure each master-eligible node. I even read through the feature and the associated commit to make sure the documentation was in sync with the code. elasticsearch just wasn’t picking up the settings.

elasticsearch EC2 Discovery

On a private network, elasticsearch nodes will automatically discover peers using multicast. Nodes configured with a common cluster name will magically find each other when they boot up and form a cluster. It’s wonderful, magical, and a little scary — elasticsearch nodes will likely be the first to become sentient in a robot uprising.

On AWS and most other clouds, multicast is not allowed. (Rackspace supports broadcast and multicast.) This leaves two options: use unicast discovery and explicitly list out each node in discovery.zen.ping.unicast.hosts, or use the EC2 discovery method provided by the cloud-aws plugin. The former is fairly brittle due to the dynamic nature of the cloud. The latter uses the EC2 API to enumerate hosts, essentially populating discovery.zen.ping.unicast.hosts dynamically. This guide does a great job of covering the process, so I won’t go into the details here. Instead, I will try to offer a few tips on the setup process.
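The difference between the two options shows up clearly in the settings themselves. The sketch below renders both as flat "key: value" lines. The host addresses and the security group name are hypothetical. The discovery.type and discovery.ec2.groups keys are the ones I understand the cloud-aws plugin to read, so check them against the plugin's documentation for your version.

```python
def unicast_settings(hosts):
    """Static option: disable multicast and list every node explicitly."""
    return {
        "discovery.zen.ping.multicast.enabled": False,
        "discovery.zen.ping.unicast.hosts": list(hosts),
    }


def ec2_settings(security_group):
    """Dynamic option: let the plugin enumerate hosts through the EC2 API.

    Assumption: filtering by security group via discovery.ec2.groups.
    """
    return {
        "discovery.type": "ec2",
        "discovery.ec2.groups": security_group,
    }


def render(settings):
    """Render a settings dict as sorted 'key: value' lines."""
    lines = []
    for key in sorted(settings):
        value = settings[key]
        if isinstance(value, bool):
            text = "true" if value else "false"
        elif isinstance(value, list):
            text = "[" + ", ".join('"%s"' % v for v in value) + "]"
        else:
            text = str(value)
        lines.append("%s: %s" % (key, text))
    return "\n".join(lines)


# Hypothetical addresses and group name, for illustration only.
static = render(unicast_settings(["10.0.0.1:9300", "10.0.0.2:9300"]))
dynamic = render(ec2_settings("elasticsearch"))

print(static)
print(dynamic)
```

The static form has to be regenerated whenever an instance is replaced, while the dynamic form stays the same no matter how many nodes come and go.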
