Neo Rebalance!

Sep 19, 2010
Dormando
Yo,

I've finally uploaded a working version of the rebalance
implementation I've been thinking about for forever.

Its interface is very raw, and I'd like to hear feedback about ways to
make it easier to use. I'll be filling in a wiki page with usage
instructions. The basics:

*********NOTE*********** the old drain/rebalance code is now completely
gone. If you mark a device as "drain" it means it will no longer get new
files, but it will not actively remove files from the device. You need to
run a rebalance to do that. This handily gives us a state that means
"readonly but delete-able", which we needed anyway.

$ mogadm rebalance
Help for 'rebalance' command:
 (enter any command prefix, leaving off options, for further help)

  mogadm rebalance policy [opts]     Add or adjust the current policy
  mogadm rebalance settings          Display rebalance settings
  mogadm rebalance start             Start a rebalance job
  mogadm rebalance status            Show status of current rebalance job
  mogadm rebalance stop              Stop a rebalance job
  mogadm rebalance test              Show what devices the current policy would match


$ mogadm rebalance settings
             rebal_policy = from_percent_used=95 to_percent_free=50 limit_type=device limit_by=size limit=5g fid_age=old
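
The policy shown by `mogadm rebalance settings` is a space-separated list of key=value options. As a minimal sketch, assuming that format holds for every option, it can be split into a dictionary for inspection or diffing. The function name `parse_policy` is made up for this illustration and is not part of MogileFS.

```python
def parse_policy(policy: str) -> dict[str, str]:
    """Split a space-separated key=value policy string into a dict.

    Assumes every token has the form key=value, as in the settings
    output above; anything else raises ValueError.
    """
    options = {}
    for token in policy.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        options[key] = value
    return options


policy = parse_policy(
    "from_percent_used=95 to_percent_free=50 "
    "limit_type=device limit_by=size limit=5g fid_age=old"
)
for key, value in policy.items():
    print(f"{key:>18} = {value}")
```

Values are kept as strings on purpose: the meaning of entries such as `limit=5g` is defined by the tracker, not by this sketch.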

$ mogadm rebalance policy --options="from_hosts=3 to_percent_free=50"

$ mogadm rebalance test
Tested rebalance policy...
Policy: etc

Source devices:
 - 100
 - 102
 - 103
 - 104
Destination devices:
 - 156
 - 157
 - 158
 - 159

(This will get a lot of formatting work. Useful for seeing what your
drafted policy would match before executing anything).

$ mogadm rebalance status
Rebalance is running
Rebalance status:
             bytes_queued = 126008251219
           completed_devs = ,102,125,151,148,106,138,114,153,137,139,129,110,147,135,112,140,124,104,131,121,154,126,117
              fids_queued = 519021
             sdev_current = 119
             sdev_lastfid = 54646960511250969
               sdev_limit = 2840763873
              source_devs = 108,115,103,113,152,142,107,133,149,123,136,116,144,141,100,128,120,134,150,155,130,122,143,105,146,111,132
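
Until the status output gets its formatting work, a small script can summarise it. The sketch below parses the `key = value` lines as posted, including values that the mail client wrapped onto the next line, and reports how many devices are finished and how much data is queued. The helper names are hypothetical, and reading `completed_devs` and `source_devs` as "done" and "still to do" is an assumption based only on the field names.

```python
STATUS = """\
             bytes_queued = 126008251219
           completed_devs =
,102,125,151,148,106,138,114,153,137,139,129,110,147,135,112,140,124,104,131,121,154,126,117
              fids_queued = 519021
             sdev_current = 119
             sdev_lastfid = 54646960511250969
               sdev_limit = 2840763873
              source_devs =
108,115,103,113,152,142,107,133,149,123,136,116,144,141,100,128,120,134,150,155,130,122,143,105,146,111,132
"""


def parse_status(text: str) -> dict[str, str]:
    """Parse 'key = value' lines; a line without '=' continues the previous value."""
    status: dict[str, str] = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=" in line:
            key, _, value = line.partition("=")
            last_key = key.strip()
            status[last_key] = value.strip()
        elif last_key is not None:
            status[last_key] += line
    return status


def dev_list(value: str) -> list[int]:
    """Turn '102,125' (or ',102,125' with a leading comma) into [102, 125]."""
    return [int(d) for d in value.split(",") if d]


status = parse_status(STATUS)
completed = dev_list(status["completed_devs"])
remaining = dev_list(status["source_devs"])
queued_gib = int(status["bytes_queued"]) / 2**30

print(f"completed devices: {len(completed)}")
print(f"remaining source devices: {len(remaining)}")
print(f"queued: {queued_gib:.1f} GiB in {status['fids_queued']} fids")
```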


Similar Threads
Re: Does anyone actively use rebalance?
Rebalance has been immensely useful to us. When adding new storage
nodes, it lets us mitigate the damage from HD failures, add nodes more
sparingly, and generally balance out load across storage nodes.

(12 storage nodes, 6 drives each, if it helps.)

On Wed, Jul 14, 2010 at 3:12 AM, dormando <dorm### @rydia.net> wrote:
 Hey,

 Are there any of you out there who actively use the existing "rebalance"
 feature and have measurable benefits from it? Please confirm that you
 aren't just running it because it felt like a good idea, and that you
 actually get results from it?

 I have a larger plan for rewriting rebalance by wiring it over the new
 drain code, but I can also just get the new drain code out very quickly
 which will fix many problems for many people.

 However in the process I might disable/destroy the existing rebalance
 code, and it'll stay that way until we can finish writing the rebalance
 stuff on top of it.

 If there're enough complainers I'll try to not break the old code, or just
 wait until I can replace all of it at once...

 Thanks,
 -Dormando



Rebalance stuck?
Hi,

I have a problem with enable_rebalance. It seems that after a while it
stopped working. I read in Store.pm about List::Util::shuffle() not being
really random; is this still true?

I've also set up a test environment to analyse the rebalance problems, and
after three days I'm still not sure why it also seems to get stuck with only
14 files. The current device usage situation is as follows:

Checking devices...
  host device         size(G)    used(G)    free(G)   use%   ob state   I/O%
  ---- ------------ ---------- ---------- ---------- ------ ---------- -----
  [ 1] dev1             0.176      0.055      0.121  31.01%  writeable   0.0
  [ 1] dev2             0.176      0.074      0.102  42.16%  writeable   0.0
  [ 2] dev4             0.176      0.128      0.048  72.51%  writeable   0.0
  [ 3] dev3             0.176      0.123      0.053  70.03%  writeable   0.0
  ---- ------------ ---------- ---------- ---------- ------
             total:     0.703      0.379      0.324  53.93%

where the devices are spread across the three hosts like this:

virtualmedia1 [1]: alive
                   used(G) free(G) total(G)
  dev1: alive      0.054   0.122   0.176  
  dev2: alive      0.073   0.103   0.176  

virtualmedia2 [2]: alive
                   used(G) free(G) total(G)
  dev4: alive      0.127   0.049   0.176  

virtualmedia3 [3]: alive
                   used(G) free(G) total(G)
  dev3: alive      0.123   0.053   0.176  
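
To put a number on the imbalance, the per-device figures can be rolled up per host. This is only a sketch over the values in the device table above (hard-coded here, rounded as displayed); it shows how uneven the hosts are and does not diagnose why the tracker reports no suitable destination. The function name `host_usage` is hypothetical.

```python
from collections import defaultdict

# (host id, device, size in G, used in G), copied from the device table.
DEVICES = [
    (1, "dev1", 0.176, 0.055),
    (1, "dev2", 0.176, 0.074),
    (2, "dev4", 0.176, 0.128),
    (3, "dev3", 0.176, 0.123),
]


def host_usage(devices):
    """Return {host id: fraction of that host's capacity in use}."""
    totals = defaultdict(lambda: [0.0, 0.0])  # host -> [size, used]
    for host, _device, size, used in devices:
        totals[host][0] += size
        totals[host][1] += used
    return {host: used / size for host, (size, used) in sorted(totals.items())}


usage = host_usage(DEVICES)
for host, fraction in usage.items():
    print(f"host {host}: {fraction:.1%} used")
```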


I've set enable_rebalance=1 and it never gets reset. For every rebalance
attempt it says:

Rebalance for DevFID[d=3;f=36] (http://192.168.210.1:7500/dev3/0/000/000/0000000036.fid) failed: no suitable destination devices available

and it seems to iterate over all fids endlessly.

Could someone possibly shed some light on this? In our production
environment we have a new server on which each device is only 9% full,
while the other two are around 80% full, and rebalance doesn't work there
either. We really need this because we pay for bandwidth per server (if we
consume too much), so we need to get this balanced.


thanks in advance,

Martijn


source socket, rebalance issues


I'm running MogileFS 2.30 on 14 hosts with 4 devices each. When I run
!watch on one of my trackers, I see lots (several to many a minute) of
messages like these:

:: [replicate(9979)] Unable to create source socket to 10.2.128.90:7500 for /dev90128/0/670/784/0670784277.fid
:: [replicate(9979)] Failed copying fid 670784277 from devid 90128 to devid 96208 (error type: src_error)
:: [replicate(9979)] copy_error: error copying fid 670784277 from devid 90128 during replication
:: [replicate(9977)] Unable to create source socket to 10.2.131.210:7500 for /dev210321/0/670/783/0670783216.fid
:: [replicate(9977)] Failed copying fid 670783216 from devid 210321 to devid 96208 (error type: src_error)
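
With several of these a minute, it can help to tally which source hosts and devices the failures come from. The sketch below counts "Unable to create source socket" messages per (host, devid) from captured !watch output. The regular expression is written against the lines quoted above only, so treat it as an assumption about the message format rather than a stable interface; `count_socket_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

SOCKET_ERROR = re.compile(
    r"Unable to create source socket to (?P<host>[\d.]+):(?P<port>\d+)"
    r"\s+for\s+/dev(?P<devid>\d+)/"
)

# Two of the messages from the post, hard-wrapped as the mail client left them.
LOG = """\
:: [replicate(9979)] Unable to create source socket to 10.2.128.90:7500
for /dev90128/0/670/784/0670784277.fid
:: [replicate(9977)] Unable to create source socket to 10.2.131.210:7500
for /dev210321/0/670/783/0670783216.fid
"""


def count_socket_errors(log_text: str) -> Counter:
    """Count source-socket failures per (host, devid)."""
    # Collapse all whitespace so a message split across lines still matches.
    flat = " ".join(log_text.split())
    return Counter(
        (m["host"], int(m["devid"])) for m in SOCKET_ERROR.finditer(flat)
    )


errors = count_socket_errors(LOG)
for (host, devid), count in errors.most_common():
    print(f"{host} dev{devid}: {count}")
```

If one host dominates the tally, that points at the host or its network path; an even spread suggests looking at the tracker side instead.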

I ran a fsck a while ago (now long completed, according to status) and
occasionally see lines like this:

:: [fsck(9970)] node 10.2.128.90 seems to be down in get_file_size
:: [fsck(9970)] Connectivity problem reaching device 90228 on host 10.2.128.90

Very rarely do I actually see real monitor timeouts.

mogadm check shows the cluster is fairly bored with not much IO on the
hosts. The DB machine isn't overloaded, either. I'm using the zonelocal
and network plugins.

I'm also noticing that I have a significant number of files that are heavily
over-replicated.  My replication policy has a max of 4 for any class;
however, out of my 212 million files in mogilefs, about 2.5 million have 10
or more copies.  Many millions more are replicated 6 times or more.
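
One way to find the affected files is to count copies per fid and compare against the policy maximum. The sketch below assumes you can export (fid, devid) pairs, one per stored copy, from the tracker database; the sample pairs are invented for illustration, and `over_replicated` is a hypothetical helper, not a MogileFS function.

```python
from collections import Counter


def over_replicated(locations, max_copies):
    """Return {fid: copy count} for fids stored on more than max_copies devices.

    `locations` is an iterable of (fid, devid) pairs, one per stored copy.
    """
    copies = Counter(fid for fid, _devid in locations)
    return {fid: n for fid, n in copies.items() if n > max_copies}


# Invented sample: fid 1001 has 2 copies, fid 1002 has 6.
SAMPLE = [(1001, 1), (1001, 2)] + [(1002, dev) for dev in range(1, 7)]

print(over_replicated(SAMPLE, max_copies=4))
```

On a table with hundreds of millions of rows the same grouping is better done in the database itself; the Python version only illustrates the check.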

I added some new nodes and ran a rebalance.  After only a couple of
percent it would stop.  So I start it again, but it stops after a
couple more percent, and so on.

Any thoughts appreciated.

here's some more info:

!stats
uptime 8367694
pending_queries 0
processing_queries 0
bored_queryworkers 10
queries 2648773
work_queue_for_delete 70
work_queue_for_fsck 150
work_queue_for_replicate 10

!jobs
delete count 1
delete desired 1
delete pids 21633
fsck count 1
fsck desired 1
fsck pids 9970
job_master count 1
job_master desired 1
job_master pids 9971
monitor count 1
monitor desired 1
monitor pids 9986
queryworker count 10
queryworker desired 10
queryworker pids 533 1976 2957 12534 22680 24926 27103 27710 28132 29976
reaper count 1
reaper desired 1
reaper pids 22503
replicate count 5
replicate desired 5
replicate pids 9965 10016 14167 14317 31029


mogilefsd.conf:
db_dsn = DBI:mysql:blah:blah
local_network = 10.0.128.0/22

db_user = ...
db_pass = ...
listen = 0.0.0.0:7001
conf_port = 7001
listener_jobs = 10
delete_jobs = 1
replicate_jobs = 5
mog_root = /var/lib/mogdata
reaper_jobs = 1
plugins = ZoneLocal

mogstored.conf:
httplisten=0.0.0.0:7500
mgmtlisten=0.0.0.0:7501
docroot=/var/lib/mogdata



Re: source socket, rebalance issues

On 8/23/10 12:24 PM, dormando wrote:
> It looks like the maxconns default is 10K and I don't set it explicitly.
> I assume that is connections that would show up in netstat?  I'm seeing
> less than 1K active connections there.
>
> I'm even seeing this occasionally:
>
> ro### @a0100:/etc/mogilefs# mogadm check
> Checking trackers...
>   127.0.0.1:7001 ... REQUEST FAILURE (is the tracker up?)
> Unable to retrieve host information from tracker(s).
>
 
 Is it hovering around 1k as in suspiciously close to 1k? or well below 1k?
 There's a chance that it would have failed to increase the maxconns if not
 started from root or from a user with adjusted maxconns.

Actually, I just changed my methodology slightly to weed out the
TIME_WAITS and other stuff and now run:

netstat -anp | grep mogstored | wc -l

and now see only 300-400 sockets open and still see timeouts when I !watch.

 
 when you run mogadm check, is it failing immediately or does it feel like
 a timeout?

When it fails, which is probably only 5% of the time, it seems to do so
pretty quickly, in about 2s.  When I normally do a "mogadm check" there's
a bit of a pause before it returns "ok", and when it fails the pause is
about the same length.






