Rebalance hangs

Aug 31, 2011
MadCamel
I'm using 2.53 and rebalance is hanging. I'm not exactly sure where to
poke to figure out why. Nothing indicative of a problem is showing up
in !watch or syslog. Any suggestions?

To reproduce the problem I run the following commands:
mogadm rebalance policy --options "from_percent_used=70 to_percent_free=20 limit_type=device limit_by=size limit=4g fid_age=old"
mogadm rebalance reset
mogadm rebalance start

After a short period (a few minutes) all the counters in rebalance
status become completely static. Nothing is moving.

Rebalance is running
Rebalance status:
             bytes_queued = 4294966845
           completed_devs = ,33
              fids_queued = 1883
                    limit = 0
             sdev_current = 32
             sdev_lastfid = 437110809
               sdev_limit = 4294967296
              source_devs = 90,21,102,7,26,99,18,16,55,27,95,57,61,108,20,109,92,103,89,10,31.......
            time_finished = 0
             time_started = 1314829070
             time_stopped = 0
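
To confirm that the counters really are static rather than just moving slowly, two captures of the status output taken some time apart can be compared mechanically. The following is a minimal sketch, not part of MogileFS: the function names are hypothetical, the `key = value` layout is assumed from the paste above, and the sample's source_devs list is abbreviated for illustration.

```python
# Sketch only: parse_rebalance_status and changed_counters are hypothetical
# helpers, and the "key = value" layout is assumed from the status paste.

SAMPLE = """\
Rebalance is running
Rebalance status:
             bytes_queued = 4294966845
           completed_devs = ,33
              fids_queued = 1883
                    limit = 0
             sdev_current = 32
             sdev_lastfid = 437110809
               sdev_limit = 4294967296
              source_devs =
90,21,102
            time_finished = 0
             time_started = 1314829070
             time_stopped = 0
"""  # source_devs abbreviated for illustration


def parse_rebalance_status(text):
    """Turn 'key = value' lines into a dict of strings."""
    status = {}
    last_key = None
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            last_key = key.strip()
            status[last_key] = value.strip()
        elif last_key is not None and line.strip() and not status[last_key]:
            # A long value may have been wrapped onto the following line.
            status[last_key] = line.strip()
    return status


def changed_counters(before, after,
                     keys=("bytes_queued", "fids_queued",
                           "sdev_current", "sdev_lastfid")):
    """Return {key: (old, new)} for every watched counter that moved."""
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


if __name__ == "__main__":
    first = parse_rebalance_status(SAMPLE)
    second = parse_rebalance_status(SAMPLE)  # replace with a later capture
    moved = changed_counters(first, second)
    print("moved:" if moved else "no watched counter changed", moved)
    # Gap between the per-device limit and what is already queued, in bytes.
    print(int(first["sdev_limit"]) - int(first["bytes_queued"]))
```

Saving the status output to two files a minute or so apart and feeding both through `parse_rebalance_status` gives an empty result from `changed_counters` when nothing is moving. The last line also shows that sdev_limit is exactly 4 GiB (matching limit=4g) and that bytes_queued sits just under it.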

Doing an strace of a replicate process shows a lot of this:
write(259, "worker_bored 100 replicate rebalance\r\n", 38) = 38

