Capacity planning and Re: Handling disk-full scenarios

Jun 2, 2010
Ian Soboroff
Reading some more (someone break in when I lose my clue ;-)

Reading the streams page in the wiki about anticompaction, I think the
best approach to take when a node's disks get overfull is to set the
compaction thresholds to 0 on all nodes, decommission the overfull node,
wait for the data to be redistributed, and then clean off the
decommissioned node and bootstrap it.  Since the disks are too full for an
anticompaction, you can't move the token on that node.

Given this, I wonder about the right approach to capacity planning.  If I
want to store, say, 500M rows, and I know from the current cfstats that
the compacted row mean size is 27k, how much overhead is there on top of
the 13.5 TB of raw data?
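
As a rough illustration of the arithmetic behind the 13.5 TB figure, here is a minimal Python sketch. The row count and mean row size come from the message; the replication factor of 3 is taken from the earlier message in this thread, and the 50% free-space headroom is purely an assumption for illustration, not a Cassandra rule. The function names are hypothetical.

```python
# Back-of-envelope capacity estimate. Inputs marked "assumed" are not from
# the original message and should be replaced with your own numbers.
ROWS = 500_000_000          # target row count from the message
MEAN_ROW_BYTES = 27_000     # "27k" read as 27,000 bytes
REPLICATION_FACTOR = 3      # mentioned earlier in the thread
FREE_HEADROOM = 0.5         # assumed: fraction of disk kept free for compaction

TB = 10**12


def raw_bytes(rows, mean_row_bytes):
    """Size of one copy of the data, ignoring all storage overhead."""
    return rows * mean_row_bytes


def cluster_disk_bytes(rows, mean_row_bytes, rf, headroom):
    """Total disk needed across the cluster for rf copies plus free headroom."""
    replicated = raw_bytes(rows, mean_row_bytes) * rf
    return replicated / (1.0 - headroom)


print(raw_bytes(ROWS, MEAN_ROW_BYTES) / TB)                # 13.5
print(cluster_disk_bytes(ROWS, MEAN_ROW_BYTES,
                         REPLICATION_FACTOR, FREE_HEADROOM) / TB)  # 81.0
```

This ignores index files, tombstones, and not-yet-compacted SSTables, so real usage can be higher than the estimate.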

Trying to compute from what I have: in cfstats I have a total "Space used
(total)" of around 1.6TB (this is only a subset of the data loaded so
far), but when I count the data directories using du(1) I get around 23TB
already used.


On Wed, Jun 2, 2010 at 11:29 AM, Ian Soboroff <isobo### @gmail.com> wrote:

 Ok, answered part of this myself.  You can stop a node and move files
 around on the data disks, as long as they stay in the right keyspace
 directories, and all is fine.

 Now, I have a single Data.db file which is 900GB and is compacted.  The
 drive it's on is only 1.5TB, so it can't anticompact at all.  Is there
 anything I can do?  The replication factor is 3, so one idea is to take
 down the node, blow away the huge file, adjust the token, and restart
 the node.  At that point I'm not sure what to tell the new node or the
 other nodes to do... do I need to run a repair, or a cleanup, or a
 loadbalance, or ... what?

 It would be great to be able to fix a storage quota on a
 per-data-directory basis, to ensure that enough capacity is retained for
 anticompaction.  Default 45% quota, adjustable for the brave.

 Ian


 On Tue, Jun 1, 2010 at 4:08 PM, Ian Soboroff <isobo### @gmail.com> wrote:

> My nodes have 5 disks and are using them separately as data disks.  The
> usage on the disks is not uniform, and one is nearly full.  Is there some
> way to manually balance the files across the disks?  Pretty much anything
> done via nodetool incurs an anticompaction, which obviously fails.
> system/ is not the problem, it's in my data's keyspace.
>
> Ian
>
>



Similar Threads
Handling disk-full scenarios
My nodes have 5 disks and are using them separately as data disks.  The
usage on the disks is not uniform, and one is nearly full.  Is there some
way to manually balance the files across the disks?  Pretty much anything
done via nodetool incurs an anticompaction, which obviously fails.  system/
is not the problem, it's in my data's keyspace.

Ian


full disk woes
Hey HDFS gurus -

I searched around the list archives and jira, but didn't see an existing
discussion about this.

I'm having issues where HDFS in general has free space, however, certain
machines -- and certain disks -- become full. For example, below is disk
usage for an average looking node for this cluster, meaning the balancer
won't want to move data off this machine.

Originally, I wanted to alert when HDFS in general was getting full, but
that doesn't work in practice because certain machines fill up. And I can't
look at the per-machine stats, because individual disks fill up. I really
don't want to care about individual disks in HDFS but it seems they can
cause actual problems.

Does anyone else run into machines with overfull disks? Any tips on how to
avoid getting into this situation?


Configured capacity: 7.72 TB
Used: 6.43 TB

Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c1d0p1      65G   15G   46G  25% /
tmpfs                  31G     0   31G   0% /dev/shm
/dev/cciss/c0d0       275G  217G   45G  83% /data/disk000
/dev/cciss/c0d1       275G  219G   43G  84% /data/disk001
/dev/cciss/c0d2       275G  216G   46G  83% /data/disk002
/dev/cciss/c0d3       275G  220G   42G  85% /data/disk003
/dev/cciss/c0d4       275G  248G   14G  95% /data/disk004
/dev/cciss/c0d5       275G  219G   43G  84% /data/disk005
/dev/cciss/c0d6       275G  219G   43G  84% /data/disk006
/dev/cciss/c0d7       275G  213G   49G  82% /data/disk007
/dev/cciss/c0d8       275G  220G   42G  85% /data/disk008
/dev/cciss/c0d9       275G  208G   54G  80% /data/disk009
/dev/cciss/c0d10      275G  216G   46G  83% /data/disk010
/dev/cciss/c0d11      275G  218G   44G  84% /data/disk011
/dev/cciss/c0d12      275G  223G   39G  86% /data/disk012
/dev/cciss/c0d13      275G  221G   41G  85% /data/disk013
/dev/cciss/c0d14      275G  248G   14G  95% /data/disk014
/dev/cciss/c0d15      275G  219G   43G  84% /data/disk015
/dev/cciss/c0d16      275G  216G   46G  83% /data/disk016
/dev/cciss/c0d17      275G  216G   46G  83% /data/disk017
/dev/cciss/c0d18      275G  219G   43G  84% /data/disk018
/dev/cciss/c0d19      275G  220G   42G  84% /data/disk019
/dev/cciss/c0d20      275G  213G   49G  82% /data/disk020
/dev/cciss/c0d21      275G  215G   47G  83% /data/disk021
/dev/cciss/c0d22      275G  247G   15G  95% /data/disk022
/dev/cciss/c0d23      275G  218G   44G  84% /data/disk023
/dev/cciss/c0d24      275G  222G   40G  86% /data/disk024
/dev/cciss/c1d1p1     275G  184G   78G  71% /data/disk025
/dev/cciss/c1d2p1     275G  176G   86G  68% /data/disk026
/dev/cciss/c1d3p1     275G  178G   84G  68% /data/disk027
/dev/cciss/c1d4p1     275G  177G   85G  68% /data/disk028
/dev/cciss/c1d5p1     275G  179G   83G  69% /data/disk029
/dev/cciss/c1d6p1     275G  181G   81G  70% /data/disk030
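
For alerting on individual disks rather than on the cluster or machine as a whole, a minimal Python sketch could parse df(1)-style output like the listing above and flag any data mount over a threshold. The function name, the 90% threshold, and the "/data/" prefix are assumptions chosen for illustration; this is not part of HDFS or its balancer.

```python
def overfull_mounts(df_text, threshold_pct=90, prefix="/data/"):
    """Return (mount, use%) pairs for data mounts at or above the threshold.

    Expects df-style rows with six whitespace-separated fields:
    filesystem, size, used, avail, use%, mount point.
    """
    flagged = []
    for line in df_text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 6-field data row.
        if len(fields) != 6 or not fields[4].endswith("%"):
            continue
        use_pct = int(fields[4].rstrip("%"))
        mount = fields[5]
        if mount.startswith(prefix) and use_pct >= threshold_pct:
            flagged.append((mount, use_pct))
    return flagged


SAMPLE = """\
Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d3       275G  220G   42G  85% /data/disk003
/dev/cciss/c0d4       275G  248G   14G  95% /data/disk004
/dev/cciss/c1d2p1     275G  176G   86G  68% /data/disk026
"""

print(overfull_mounts(SAMPLE))  # [('/data/disk004', 95)]
```

Run against the full listing above with the same threshold, this would flag the three disks shown at 95%.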

--travis


single node capacity
Hi,

How much data can a single typical Cassandra instance handle?
It seems like we are getting into trouble when the load on one of our nodes
grows beyond 200 GB. Both read latency and write latency are increasing,
varying from 10 to several thousand milliseconds.
Machine config is 16 CPUs, 32 GB RAM.
Heap size is 10 GB.
Any tuning suggestions?
Or should I start considering adding more nodes when the data grows this
big?

Thanks


Inconsistent HTTP response code returned for WS-RM scenarios
I noticed an inconsistent HTTP response code in the WS-RM server side
implementation that can lead to interoperability issues. This can be
observed in the sample demo scenario.

The WS-RM spec refers to WS-I Basic Profile for its HTTP binding
behavior and the profile states:

-  R1111 An INSTANCE SHOULD use a "200 OK" HTTP status code on a
response message that contains an envelope that is not a fault.

-  R1112 An INSTANCE SHOULD use either a "200 OK" or "202 Accepted"
HTTP status code for a response message that does not contain a SOAP
envelope but indicates the successful outcome of a HTTP Request.

The first problem is for a decoupled-endpoint case, where non-empty
content is returned with HTTP 202. Concretely, when using a
decoupled-endpoint, http://localhost:9990/decoupled_endpoint, the HTTP
response to the original request should be empty with HTTP 202
Accepted, as the concrete response is returned to this decoupled
endpoint. However, the current implementation returns non-empty
content with HTTP 202 Accepted; more precisely, the content is a
SOAP envelope with no Body child but with the WS-Addressing fields
filled.

In this case, the response content should be empty. I think the
partial response must be processed differently for this case so that
the response remains empty, as done in OneWayProcessorInterceptor.
This issue may not be critical, as any implementation that receives
HTTP 202 will probably ignore the content even if it is present.

The second problem is for an anonymous-endpoint case, where a valid
non-empty SOAP envelope is returned with HTTP 202. In this case, the
HTTP response to the first message, CreateSequenceRequest, is a SOAP
envelope containing CreateSequenceResponse with HTTP 200 OK. This is
fine. However, for the subsequent messages, the HTTP response is a
SOAP envelope with some valid content, like SequenceAcknowledgement,
but returned with status HTTP 202 Accepted.

In this case, the response code should be HTTP 200 OK. It seems the
HTTP code for oneway services is automatically set to HTTP 202 in
AbstractHTTPDestination. When the partial response content is not
empty, as in the WS-RM with anonymous-endpoint case, the HTTP status
code should be changed somewhere during the partial response handling
to HTTP 200 OK. This issue is more critical, as some implementations
may simply ignore the content when receiving HTTP 202.

Could you comment on this issue?

Thanks.
Best regards, Aki


using more than 50% of disk space
We're investigating Cassandra, and we are looking for a way to get
Cassandra to use more than 50% of its data disks.  Is this possible?

For major compactions, it looks like we can use more than 50% of the disk
if we use multiple similarly sized column families.  If we had 10 column
families of the same size, we could use 90% of the disk, since a major
compaction would only need as much free space as the largest column family
(in reality we would use less).  Is that right?
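
The reasoning about equally sized column families can be sketched in a few lines of Python. This assumes a simplified worst-case model that is not confirmed anywhere in this thread: column families are compacted one at a time, and a major compaction of one column family needs free space equal to that column family's full size. The function name is hypothetical.

```python
def usable_fraction(num_equal_cfs):
    """Largest fraction of a disk that n equally sized column families can
    occupy while leaving room to rewrite one of them in full.

    With n column families of size s each (as a fraction of the disk), we
    need n*s + s <= 1, so s = 1/(n+1) and the usable fraction is n/(n+1).
    """
    if num_equal_cfs < 1:
        raise ValueError("need at least one column family")
    return num_equal_cfs / (num_equal_cfs + 1)


for n in (1, 2, 10):
    print(n, round(usable_fraction(n), 3))
# 1 0.5
# 2 0.667
# 10 0.909
```

Under this model a single column family gives the familiar 50% limit, and 10 equal column families give roughly the 90% mentioned above.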

For bootstrapping new nodes, it looks like adding a new node will require
that an existing node does anti-compaction.  This anti-compaction could
take nearly 50% of the disk.  Is there a way around this?

Is there anything else that would prevent us from using more than 50% of
the data disk?

Thanks,

Sean


Buffering Output to Disk
Hi, everyone. I have a PHP script that is a huge content generator,
and it makes my server run out of memory very often, I guess because
the content is generated faster than the user can download it. So
I assume that the output is being buffered in memory. My question is
whether there's any way to have Apache buffer the script output on disk
rather than in RAM. I don't want to cache the output. It must be
generated at every user request. What can I do?

Thanks.


Problematic disk in a datanode
Hey,

A while ago we added a new disk (volume) to every datanode in our
cluster.
We configured the disks in "data.dfs.dir" in hdfs-site, both on the job
tracker and on each machine.
This went successfully for all of the machines except one, where the new
disk was not recognized by Hadoop.

We cannot find out what's wrong with it.

We know that the new disk is not recognized because
"http://namenode:50070/" shows a smaller capacity for that machine.
The mapred + hdfs directories on that drive exist, but they are not
identical to the structure of directories on the other disks:
on the problematic drive there is no "local" directory under "mapred", and
there are no "name" or "namesecondary" directories under "hdfs".

This problem was not so terrible until now, when the rest of the disks are
full:
The logs started containing errors such as "No space left on device" and
"DiskErrorException: Could not find any valid local directory for
taskTracker/jobcache/".
Some Hadoop jobs fail with the same errors, and the datanode + tasktracker
on that machine crash a lot.

How do we install this disk properly?

Thanks in advance.

Technical info: hadoop-0.20, centos, each machine is datanode and
tasktracker (another machine is jobtracker + namenode).




Concerning the capacity to create a table column in which lies another table
Hello,

I need to know if it is possible to create a column in which a table can
be stored. Here is an example table:

Games

Storing file on disk temporarily
Hi,

I am using Commons FileUpload to upload files to a DB. Can you please tell me
the pros and cons of storing the file on disk temporarily until request
processing completes?

Your help will be highly appreciated.

 

Thanks and Regards,

 

Nitin Anande | Associate Consultant | +91 20 6670 7616 (O) +91 99752 45341
(M)
Oracle Financial Services PrimeSourcing
Pune, India

 


 

 


Cassandra disk space utilization
Hi guys,
I have what may be a dumb question but I am confused by how much disk
space is being used by my Cassandra nodes.  I have 10 nodes in my cluster
with a replication factor of 3.  After I write 1,000,000 rows to the
database (100kB each), I see that they have been distributed very evenly,
about 100,000 rows per node, but because of the replication factor of 3,
each node contains about 300,000 rows.  This is all good.  Since my rows
are 100kB each, I expect each node to store about 30GB of data; however,
that is not what I am seeing.  Instead, I am seeing some nodes that do not
experience any compaction exceptions but report their space used as MUCH
more.  Here's one using 106 GB of disk.  My disks are only 160 GB so this
is at the bleeding edge and I thought my node would be able to store more
data.

I only use a single column family so here is the cfstats output from one
of my nodes (server5):

		Column Family: Standard1
		SSTable count: 12
		Space used (live): 113946099884
		Space used (total): 113946099884
		Memtable Columns Count: 0
		Memtable Data Size: 0
		Memtable Switch Count: 451
		Read Count: 31786
		Read Latency: 161.429 ms.
		Write Count: 300633
		Write Latency: 0.124 ms.
		Pending Tasks: 0
		Key cache: disabled
		Row cache capacity: 3000
		Row cache size: 3000
		Row cache hit rate: 0.38331340841880074
		Compacted row minimum size: 100220
		Compacted row maximum size: 100225
		Compacted row mean size: 100224
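
To make the gap between expectation and observation concrete, here is a small Python sketch that redoes the arithmetic using only numbers from this message and the cfstats output above. It only quantifies the discrepancy; it does not explain its cause.

```python
# All inputs are taken from the message and the cfstats output above.
ROWS_TOTAL = 1_000_000
NODES = 10
REPLICATION_FACTOR = 3
MEAN_ROW_BYTES = 100_224            # "Compacted row mean size"
SPACE_USED_BYTES = 113_946_099_884  # "Space used (total)"

GIB = 2**30

# Each row is stored on REPLICATION_FACTOR nodes, spread evenly over NODES.
rows_per_node = ROWS_TOTAL * REPLICATION_FACTOR // NODES
expected_bytes = rows_per_node * MEAN_ROW_BYTES
ratio = SPACE_USED_BYTES / expected_bytes

print(rows_per_node)                         # 300000
print(round(expected_bytes / GIB, 1))        # 28.0
print(round(SPACE_USED_BYTES / GIB, 1))      # 106.1
print(round(ratio, 2))                       # 3.79
```

So this node reports close to four times the space that 300,000 rows of about 100 kB would need.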

Note that I wrote these 1M rows of data yesterday and the system has had
24 hours to digest it. There are no exceptions in the system.log file.
Here's the tail end of it:

...
 INFO [SSTABLE-CLEANUP-TIMER] 2010-07-06 16:13:43,162 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-430-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-07-06 16:13:43,269 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-445-Data.db
 INFO [COMPACTION-POOL:1] 2010-07-06 16:35:21,718 CompactionManager.java (line 246) Compacting []
 INFO [Timer-1] 2010-07-06 17:01:01,907 Gossiper.java (line 179) InetAddress /10.248.107.19 is now dead.
 INFO [GMFD:1] 2010-07-06 17:01:42,039 Gossiper.java (line 568) InetAddress /10.248.107.19 is now UP
 INFO [COMPACTION-POOL:1] 2010-07-06 17:35:21,306 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-06 18:35:20,802 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-06 19:35:20,389 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-06 20:35:19,934 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-06 21:35:19,582 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-06 22:35:19,233 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-06 23:35:18,593 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-07 00:35:18,076 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-07 01:35:17,673 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-07 02:35:17,172 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-07 03:35:16,784 CompactionManager.java (line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-07-07 04:35:16,383 CompactionManager.java (line 246) Compacting []

Thank you for your help!!
Julie




When ActiveMQ does flush non persistent messages to disk
We have some administrative tasks from time to time, for example moving a
database to another physical server. This is very easy when writes to the
database are offloaded to ActiveMQ: we simply turn off the consumer that
updates the database, move the database, switch reads to the new database,
and finally turn the consumer back on against the new database.

This all works fine when the task is quite short. But if we turn off the
consumer for a long period of time (for example, a day), we run into
problems with non-persistent messages. ActiveMQ tries to hold them all in
memory, so sooner or later it hangs with an OutOfMemoryError.

This shouldn't be a big problem, since ActiveMQ has a special store (the
temp store) for flushing non-persistent messages to disk. I have played
around with some configuration options (memoryUsage, tempUsage, queue
memory limit policies), but cannot figure out how to deal with this
problem.

Thanks a lot.




Re: Key cache capacity: 1 when using KeysCached="50%"
That does look like a bug.  Can you create a ticket and upload a
(preferably small-ish) sstable that illustrates the problem?

On Mon, May 24, 2010 at 12:07 PM, Ran Tavory <ran### @gmail.com> wrote:
 I'd like to have 100% keys cached. Sorry if my example of Super2 wasn't
 correct, but I do think there's a problem. Here's with my own data:
 When using actual numbers (in this case for RowsCached) it works as
 expected, however when specifying KeysCached="100%" I get only 1.
       <ColumnFamily CompareWith="BytesType" Name="KvAds"
         KeysCached="100%"
         RowsCached="10000"
         />

                 Column Family: KvAds
                 SSTable count: 7
                 Space used (live): 797535964
                 Space used (total): 797535964
                 Memtable Columns Count: 42292
                 Memtable Data Size: 10514176
                 Memtable Switch Count: 24
                 Read Count: 2563704
                 Read Latency: 4.590 ms.
                 Write Count: 1963804
                 Write Latency: 0.025 ms.
                 Pending Tasks: 0
                 Key cache capacity: 1
                 Key cache size: 1
                 Key cache hit rate: 0.0
                 Row cache capacity: 10000
                 Row cache size: 10000
                 Row cache hit rate: 0.2206178354382234
                 Compacted row minimum size: 386
                 Compacted row maximum size: 9808
                 Compacted row mean size: 616

 On Mon, May 24, 2010 at 6:30 PM, Jonathan Ellis <jbel### @gmail.com> wrote:
>
> If you really want a cache capacity of 0 then you need to use 0
> explicitly, otherwise the % versions will give you at least 1.
>
> On Mon, May 24, 2010 at 12:34 AM, Ran Tavory <ran### @gmail.com> wrote:
> > I've noticed that when defining KeysCached="50%" (or KeysCached="100%",
> > and I didn't test other values with %) then cfstats reports Key cache
> > capacity: 1
> > This looks weird... is this expected? (version 0.6.1)
> > For example, in the default configuration:
> >       <ColumnFamily Name="Super2"
> >                     ColumnType="Super"
> >                     CompareWith="UTF8Type"
> >                     CompareSubcolumnsWith="UTF8Type"
> >                     RowsCached="10000"
> >                     KeysCached="50%"/>
> >
> > 
Does the Kerberos server reads/writes anything from/on disk during AS requests?
Greetings again.

I was performing latency tests on the ApacheDS Kerberos service, and
comparing it to my own Kerberos prototype, which uses state machine
replication to execute on more than one machine. Given this fact,
I was expecting that the response of my prototype would be much
slower than ApacheDS, but when requesting TGTs, it takes 25 to
30 milliseconds to obtain them in ApacheDS, and in my prototype it
takes only 3 to 4 milliseconds.

Given the fact that my prototype needs to perform many more
communication steps between replicas than ApacheDS does (due to the
replication), I was expecting these results to be reversed. So I was
wondering if ApacheDS reads or writes anything on disk while
handling TGT requests. If not, does anyone have any idea why there is
such overhead in ApacheDS?

Thanks in advance

Note: when requesting normal tickets, the results of both services are
quite similar.


total disk space used on a node for a CF is too large than expected
Row size is 10 KB and the write count on a node for a CF is 1054451,
so ideally the total disk space used on that node by that CF should be
around 10 GB, but it's showing 23 GB.
What else might be taking up so much space?

Thanks


Full Text Search jackrabbit 2.1.0
Hi everyone,
 
I use Jackrabbit 2.1.0, and I'd like to do full text search in nodes that
hold documents (Word, PDF, and so on).
I wrote the following code, and the problem is that it never returns
results, although the documents are there and the query string which I
enter does exist in those documents. I don't know what I missed or did
wrong!
Could it be because I didn't specify values for the columns and orderings?

Actually I don't know what these are!
When I use XPath (which is deprecated) it works fine.

Here is the JQOM code:
 
        // Obtain the JCR 2.0 query object model (JQOM) factory for this session.
        QueryManager queryManager = session.getWorkspace().getQueryManager();
        QueryObjectModelFactory qomf = queryManager.getQOMFactory();
        ValueFactory vf = session.getValueFactory();

        // Select nt:resource nodes under a named selector.
        String selectorName = "fullTextSearchSelector";
        Selector selector = qomf.selector("nt:resource", selectorName);

        // Full-text constraint on the jcr:data property of the selected nodes.
        Constraint constraint = qomf.fullTextSearch(selectorName, "jcr:data",
                qomf.literal(vf.createValue("someText")));

        // No orderings and no columns are specified (both null).
        QueryObjectModel queryObjectModel = qomf.createQuery(selector, constraint,
                null, null);

        QueryResult result = queryObjectModel.execute();
        RowIterator iter = result.getRows();
        System.out.println("size: " + iter.getSize());
        while (iter.hasNext()) {
            Row row = iter.nextRow();
            System.out.println("Row: " + row.toString());
        }
 

Please, can anyone tell me what could be wrong here? And if it's better
to use SQL, how would I do that?
 
Thank you in advance.

How to get Performing full renegotiation error log.
Hi All,



Can anyone tell me the steps to get this log message into the “error_log”
file of Apache 2.2?

“modules/ssl/ssl_engine_kernel.c”

/* do a full renegotiation */
            ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                         "Performing full renegotiation: "
                         "complete handshake protocol");



Do I need to modify a setting in the “httpd.conf” file?



Thanks,

matty


Getting full head error in GET (REST)
When I use POST I am able to send a message to the queue, but with GET I
get a "full head" error.
The messages which I am sending and consuming are XML messages.

Can anyone please help me out with this?

Thanks in advance..

-Mahesh





Problem with full text search on PDFs
I have got a problem with Jackrabbit 2.1.0 and full text search on PDFs.

I have created a repository containing several plain text and PDF 
documents using the Java APIs. I am able to use the Java API to perform 
full text search on the text documents, but not the PDFs.

When I use the CLI to the standalone server to execute this query

[/] > xpathquery "//element(*, nt:file)[jcr:contains(jcr:content, \'*Typographical*\')]"

the result is 11 file nodes, correctly. But with the Java API and code:

String sql = "SELECT * FROM [nt:resource] AS resource WHERE "
        + "CONTAINS(resource.*, '%Typographical%')";
Query query = queryManager.createQuery(sql, Query.JCR_SQL2);

the result is no nodes returned.  Thanks for any help on this.




Re: Hive-Hbase Key lookup w/o full scan
Hi Ray,

Apologies for my very slow response.

Here is a draft of a doc which explains how I think we can tackle this:

http://wiki.apache.org/hadoop/Hive/FilterPushdownDev

Maybe you can work on translation from ExprNodeDesc -> HBase scan
object?  If you can get that working in isolation in unit tests, I can help
with the rest of the parts for plumbing the filter through from Hive's
optimizer.

JVS

On Jul 1, 2010, at 2:57 PM, Ray Duong wrote:

Thanks John,

Can you provide me with some pointers?  My team can try to work on it.

Our workaround right now is to call the Thrift API from within Hive using
a UDF.

Thanks,
-ray


On Thu, Jul 1, 2010 at 1:19 PM, John Sichi <jsi### @facebook.com> wrote:
On Jul 1, 2010, at 10:36 AM, Ray Duong wrote:

 Is there a way to do an HBase key lookup using the Hive-HBase
 integration without doing a full scan?

 Since I'm specifying key='foo' in the where condition, shouldn't
 it be a fast lookup?

Hi Ray,

Pushing down filters to HBase is one of our roadmap items.

https://issues.apache.org/jira/browse/HIVE-1226

If you'd like to work on it, let me know and I'll give you some pointers.

JVS