ElasticSearch on EC2 - runs into problem recovering when one of the nodes times out then recovers

Jun 1, 2011
Alex At Ikanow
This is on 0.15.2-1 (I can't easily move up to 0.16.* in the near
future, so just let me know if this has been fixed in a more recent
version and I'll shut up - from a quick scan I couldn't see any
candidate fixes in the change files though)

On a 2-node (test/reference) cluster running on EC2, one of the nodes
("node A") ran out of memory (due to one of the other processes) and
hung for a few minutes (e.g. no ssh connectivity) before the offending
process was killed and it returned to normal.

During that time "node B" detected a loss of connectivity to "node A"
and removed it. When "node A" recovered, it did not get added back to
"node B"'s list.

So at that point:
 - "node A" believed it was part of a 2-node cluster (and shared new
documents posted to it)
 - "node B" believed it was part of a 1-node cluster (and obviously did
not pass any new documents to "node A")
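
One way to spot this kind of disagreement is to ask each node separately how many nodes it thinks the cluster has and compare the answers. The following is only a rough sketch, not something from the original setup: it assumes each node answers HTTP on port 9200 and that the cluster health endpoint (/_cluster/health) reports a number_of_nodes field on the version in use. The function names and the host names in the note below are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_node_count(base_url, timeout=5.0):
    """Ask a single node how many nodes it believes are in the cluster.

    Assumes the cluster health endpoint returns JSON containing
    a "number_of_nodes" field.
    """
    with urlopen(base_url + "/_cluster/health", timeout=timeout) as resp:
        health = json.load(resp)
    return health["number_of_nodes"]


def views_disagree(counts):
    """Return True when the nodes report different cluster sizes."""
    return len(set(counts.values())) > 1


def check_cluster(node_urls, fetch=fetch_node_count):
    """Query every node on its own and compare the reported sizes.

    `fetch` can be swapped out (e.g. in tests) so that no network
    access is needed.
    """
    counts = {url: fetch(url) for url in node_urls}
    return counts, views_disagree(counts)
```

For example, check_cluster(["http://node-a:9200", "http://node-b:9200"]) (hypothetical addresses) would be expected to return the per-node counts plus True in the situation described above, where one node reports 2 and the other reports 1. Each node has to be queried directly, since asking only one of them would just return that node's own view.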

The log file entries from "node B", which might make the above clearer
(just pasted below since it's so short):



My YML configuration is very simple (just pasted below since
it's so short):



Once I restarted node B (first manually deleting all its documents to
make life simpler), everything returned to a working state (with "node
A" as the master and "node B" as the slave).

Again - apologies if this is already fixed. If not, shall I create an
issue? Not 100% sure what the desired functionality should be ("node A"
regains mastership? "node A" becomes a slave?), but presumably not the
above :)

Any thoughts appreciated - and great job on ElasticSearch!

Alex
apig### @ikanow.com


