org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream

One of the nodes in my elasticsearch cluster was acting kind of flaky. It would lose network connectivity just long enough to drop out of the cluster, then rejoin, then drop out again. This caused a lot of shard reallocation, which didn’t help things at all. After a few drop/rejoins, I decided to spin up a new cluster node and retire the flaky one.

This made the problem much, much worse. Over the next day or two, I experienced all kinds of strange behavior. Replica shards randomly stopped replicating from primaries, and attempts to allocate replicas to new nodes would sometimes fail. At times, even the cluster status command would fail, rendering elasticsearch-head unusable. These messages were very common in the logs:

[2013-04-24 00:59:53,539][WARN ][action.index ] [hostname] Failed to perform index on replica [logstash-2013.04.24][3]
org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
        at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(
Caused by: unexpected end of block data
[2013-04-24 00:59:53,544][WARN ][cluster.action.shard ] [hostname] sending failed shard for [logstash-2013.04.24][3], node[obkgtvEVS3q59PlJWfY03g], [R], s[INITIALIZING], reason
[Failed to perform [index] on replica, message [RemoteTransportException[Failed to deserialize exception response from stream]; nested: TransportSerializationException[Failed to deserialize exception response from stream]; nested: StreamCorruptedException[unexpected end of block data]; ]]
[2013-04-24 00:59:53,544][WARN ][transport.netty ] [hostname] Message not fully read (response) for [4541427] handler$AsyncShardOperationAction$4@7a8295ea, error [true], resetting

(The stack traces are truncated in the output above; see this gist for the full output.)

Unfortunately, googling for the exceptions didn’t really help. Many posts either showed different exceptions, or they were unanswered. The closest thing I could find was this post suggesting not to mix versions of elasticsearch. I knew I was running the same version of elasticsearch on all my nodes (0.20.2), so it appeared to be a dead end.

I finally realized what was causing the problem: when I spun up the new cluster node, it installed a newer JVM, version 1.7.0_21. The rest of the cluster had been spun up weeks earlier and had version 1.7.0_17 of the JVM installed. Even though the version of elasticsearch was the same across nodes, the JVM version was not. Upgrading the rest of the cluster to the same JVM version fixed the problem.
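In hindsight, the first thing to check is whether every node reports the same JVM. Here is a minimal sketch (plain Java using standard system properties, nothing Elasticsearch-specific; Elasticsearch's nodes info API also reports each node's JVM) that prints the relevant values:

```java
public class PrintJvmVersion {
    public static void main(String[] args) {
        // Standard JVM system properties; run this on each cluster node and
        // compare the results. Every node should print identical values.
        System.out.println("java.version    = " + System.getProperty("java.version"));
        System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));
        System.out.println("java.vm.vendor  = " + System.getProperty("java.vm.vendor"));
    }
}
```

In my case, the older nodes would have printed 1.7.0_17 for java.version while the new node printed 1.7.0_21.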


7 thoughts on “org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream”

  1. Thanks, that was very helpful. Amazing how sensitive it is: a Java 6.34 and 6.35 mismatch. Once they were the same, the error went away.

  2. This is sadly due to a bug in Java where they changed how InetAddress gets serialized over the wire. ES uses Java serialization only when it serializes exceptions over the wire, and unfortunately that is what triggers this.

  3. But it seems that with the Spring-data-Elasticsearch jar and Elasticsearch version 1.1.1, crossing major Java versions like 1.6 and 1.7 in the development environment causes no problems at all; yet when I use Elasticsearch's native syntax directly, I get exactly this error.

  4. But as with the Spring-data-elasticsearch jar, and with Elasticsearch version 1.1.1, a development environment spanning big versions like 1.6 or 1.7 is all right; when I directly use Elasticsearch's native syntax, though, it fails with this error.

  5. Doesn’t the Elasticsearch installation depend on the Elasticsearch settings and on the selected Java home directory and path, as well as the JAVA_HOME and JRE_HOME system variables – rather than on whichever Java versions happen to be installed?
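The Java-serialization point in comment 2 can be sketched with an illustrative round trip (this is not Elasticsearch's actual transport code). Within a single JVM the exception deserializes fine; the StreamCorruptedException in the logs appears only when the writing and reading nodes run JVMs whose serialized form of embedded objects differs:

```java
import java.io.*;

// Illustrative sketch: serialize an exception the way plain Java
// serialization would ship it over the wire, then read it back.
public class ExceptionWireDemo {
    public static void main(String[] args) throws Exception {
        // "Writer" node: serialize the exception to bytes
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(new IllegalStateException("Failed to perform index on replica"));
        out.close();

        // "Reader" node: deserialize. With a mismatched JVM on the reading
        // side, this step can fail with StreamCorruptedException.
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        Throwable t = (Throwable) in.readObject();
        System.out.println(t.getMessage()); // prints "Failed to perform index on replica"
    }
}
```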
