Re: Cassandra Scaling Questions

Aug 5, 2010
Oleg Anastasjev
 
> 1.) What have you found to be the best ratio of Cassandra row cache to memory
> free on the system for filesystem cache?  Are you tuning it like an RDBMS so
> Cassandra has the vast majority of the RAM in the system or are you letting the
> filesystem cache do some of the work?

This depends on your exact case: how many rows are in the hot set. Giving too
much memory to the JVM cache results in slower garbage collection with no gain
in performance. There are cases (for example, large rows that are mostly read
partially using get_slice) in which the row cache makes things worse. I took a
try-and-watch approach: I changed the size of the row cache and watched the row
cache hit ratio and op/s. A hit ratio of 0.9 was enough for my case.
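
The try-and-watch loop can be rehearsed offline before touching a live node.
What follows is only a rough sketch under loud assumptions: the cache is
modelled as a plain LRU (not Cassandra's actual row cache), and the workload
generator and its parameters (hot_rows, hot_share and so on) are hypothetical
stand-ins for a real read trace.

```python
import random
from collections import OrderedDict


def hit_ratio(cache_size, requests):
    """Replay `requests` (row keys) through an LRU cache and return hits / reads."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used row
    return hits / len(requests)


def make_workload(n_reads=20000, n_rows=10000, hot_rows=500, hot_share=0.9, seed=1):
    """Hypothetical read pattern: `hot_share` of reads go to the first `hot_rows` keys."""
    rng = random.Random(seed)
    return [
        rng.randrange(hot_rows) if rng.random() < hot_share else rng.randrange(n_rows)
        for _ in range(n_reads)
    ]


if __name__ == "__main__":
    workload = make_workload()
    # Try several cache sizes and watch where the hit ratio stops improving.
    for size in (100, 500, 1000, 5000):
        print(size, round(hit_ratio(size, workload), 3))
```

Once the hit ratio flattens out, extra cache only adds heap for the garbage
collector to manage, which matches the observation above.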

 
> 2.) Is the Cassandra cache write-through (i.e. are new records held in the row
> cache as they're written to disk)?

Not exactly. Cassandra keeps recent writes (not rows) in memory, but after
flushing the memtable it will reread the whole row from disk (and reconstruct
it) into the row cache on the first read of the data.

 
> 3.) When using the random partitioner how much difference should be expected
> (or has been observed) between nodes?  2%? 10%?

This depends on the data. It will distribute keys almost equally between nodes,
but the sizes of row data could differ between keys. In my case the difference
was about 0.2%.
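
To get a feel for why the key counts come out almost equal, here is a small
sketch that hashes keys with MD5 and assigns them to evenly spaced tokens. It
is a simplification, not Cassandra's partitioner code: the 2**127 ring size,
the "userN" keys and all function names are assumptions made for illustration,
and it counts keys only, so differences in row size are not modelled.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

RING_SIZE = 2 ** 127  # token space assumed here for an MD5-based random partitioner


def token_for(key: str) -> int:
    """Map a row key to a ring position via MD5 (simplified)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def evenly_spaced_tokens(n_nodes: int) -> list:
    """One token per node, spread evenly around the ring."""
    return [i * RING_SIZE // n_nodes for i in range(n_nodes)]


def owner(token: int, node_tokens: list) -> int:
    """Index of the node owning `token`: the first node token >= it, wrapping around."""
    return bisect_left(node_tokens, token) % len(node_tokens)


def key_counts(keys, n_nodes: int) -> Counter:
    """Count how many of `keys` land on each node."""
    tokens = evenly_spaced_tokens(n_nodes)
    return Counter(owner(token_for(k), tokens) for k in keys)


if __name__ == "__main__":
    counts = key_counts((f"user{i}" for i in range(100_000)), n_nodes=4)
    mean = sum(counts.values()) / len(counts)
    for node, n in sorted(counts.items()):
        print(f"node {node}: {n} keys ({(n - mean) / mean:+.2%} vs. mean)")
```

With unevenly spaced tokens the same code shows a proportionally uneven key
count, which is why token placement matters more than the hash itself.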








Similar Threads
Cassandra Scaling Questions
Hi All,
I've got a couple questions that have come up about how Cassandra works and
what others are seeing in their environments.  Here goes:

1.) What have you found to be the best ratio of Cassandra row cache to
memory free on the system for filesystem cache?  Are you tuning it like an
RDBMS so Cassandra has the vast majority of the RAM in the system or are you
letting the filesystem cache do some of the work?

2.) Is the Cassandra cache write-through (i.e. are new records held in the row
cache as they're written to disk)?

3.) When using the random partitioner how much difference should be expected
(or has been observed) between nodes?  2%? 10%?

3.5) Can a load balance be expected to bring the data distribution pretty
close to even among all nodes in the ring?  Is the correct process for a
loadbalance to run the loadbalance operation on each node in the ring?


Thanks!  I'm curious to hear what others have observed.
-Aaron


Some questions about using Cassandra
We are currently looking at a distributed database option and so far
Cassandra ticks all the boxes. However, I still have some questions.

 

Is there any need for archiving of Cassandra and what backup options are
available? As it is a no-data-loss system I'm guessing archiving is not
exactly relevant.

 

Is there any concept of Listeners such that when data is added to
Cassandra we can fire off another process to do something with that
data? E.g. create a copy in a secondary database for Business
Intelligence reports? Send the data to an LDAP server?

 

 

Anthony Ikeda

Java Analyst/Programmer

Cardlink Services Limited

Level 4, 3 Rider Boulevard

Rhodes NSW 2138

 

Web: www.cardlink.com.au | Tel: + 61 2 9646 9221 | Fax: + 61 2 9646 9283

 

 



Cassandra questions
Hi,

Being fairly new to Cassandra I have a couple of questions:

1) Is there a way to remove multiple keys/rows in one operation (batch) or
must keys be removed one by one?
2) I see API references to version 0.7, but I couldn't find an alpha or
beta anywhere. Does it exist already and if so, where can I get it? Or
else, when is it planned to be public/released?

Thanks in advance, Hugo.



      


more questions on Cassandra ACID properties
Hi,

I have more questions on Cassandra ACID properties.
Say, I have a row that has 3 columns already: colA, colB and colC

And if two *concurrent* clients each perform a different insert(...) into the
same row, one insert for colD and the other insert for colE, then Cassandra
would guarantee that both columns will be added to the same row.

Is that correct?

That is, insert(...) of a column does NOT involve reading and rewriting
other existing columns of the same row?
That is, we do not face the following situation:
client X: read colA, colB and colC; then write: colA, colB, colC and colD
client Y: read colA, colB and colC; then write: colA, colB, colC and colE
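
The difference between the two write styles can be sketched with a toy
in-memory model. This is only an illustration, not Cassandra's storage code:
it assumes that each column carries its own timestamp and that the newest
timestamp wins per column, and the names rewrite_row and insert_column are
hypothetical.

```python
def rewrite_row(snapshot, column, value, timestamp):
    """Whole-row style: return a full copy of the row snapshot plus one new column."""
    new_row = dict(snapshot)
    new_row[column] = (value, timestamp)
    return new_row


def insert_column(row, column, value, timestamp):
    """Column-level style: touch only the named column; the newest timestamp wins."""
    current = row.get(column)
    if current is None or timestamp >= current[1]:
        row[column] = (value, timestamp)


def initial_row():
    return {"colA": ("a", 1), "colB": ("b", 1), "colC": ("c", 1)}


def lost_update_demo():
    """Clients X and Y both read before either writes, then write whole rows."""
    row = initial_row()
    snapshot_x = dict(row)                          # client X reads colA..colC
    snapshot_y = dict(row)                          # client Y reads colA..colC
    row = rewrite_row(snapshot_x, "colD", "d", 2)   # X writes colA..colD
    row = rewrite_row(snapshot_y, "colE", "e", 3)   # Y overwrites with colA..colC, colE
    return sorted(row)


def column_level_demo():
    """Clients X and Y each write only their own column."""
    row = initial_row()
    insert_column(row, "colD", "d", 2)
    insert_column(row, "colE", "e", 3)
    return sorted(row)


if __name__ == "__main__":
    print("whole-row rewrite:", lost_update_demo())
    print("column-level insert:", column_level_demo())
```

In the whole-row variant colD is lost, while in the column-level variant both
new columns survive. The same insert_column call also replaces an existing
column when the new timestamp is newer, i.e. it behaves as "insert or update".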


BTW, it seems to me that the insert() API as described in the wiki page:
http://wiki.apache.org/cassandra/API
should handle updating an existing column as well, by replacing the
existing column value.

If that is the case, I guess we should change the wording from "insert" to
"insert or update" in the wiki doc.
And, ideally, the insert(...) API operation name would be changed
to update_or_insert(...).


Looking forward to replies that may confirm my understanding.
Thanks!


Regards,
Alex Yiu


backlogging and scaling
I'm curious if there are any efforts ongoing to amortize the
background tasks in Cassandra over time?
Specifically, the cost of compaction and AE, rebalancing, etc seems to
be a problem for some users when they are expecting more steady-state
performance. While this may sometimes be the result of a cluster which
is at its marginal capacity, users are still surprised with the
performance hit or downtime required for common operations. Making the
cluster able to make finer-grained and measurable progress towards the
ideal state may help other users, too.

Is there a feasible design or enhancement which may allow these types
of background tasks to be broken apart into smaller pieces without
compromising overall consistency?
It would be excellent if the user could see the overall state of the
storage cluster, and choose the proportion of resources allocated
to recovering backlog vs servicing clients, etc.
Even better, if there were some basic heuristics which worked well for
the general case, and users would only have to see the scheduling plan
in special situations.

How would you go about doing that? Does the current architecture lend
itself to this type of optimization, or otherwise?


questions about cache?
Hi,

I've seen people ask before about the entire cache being flushed for a
single object commit, and that there is no evict. My question is why?
Is there some special reason why the whole cache has to be flushed and
we can't just evict a single object?

Secondly, under what circumstances does a SQL query go to the cache
and when does it bypass the cache and go straight to the database? I
ask this because we have lots of queries like "select * from myTable
where field1 = someValue and field2 = anotherValue". Is this sort of
query cacheable, or only select-by-primary-key queries?

Raj.


ZK recovery questions
Hi, I've been reading the docs and trying out some basic Zookeeper
examples.
I have a few simple questions related to recovery.

It would be good to have questions like these on the Wiki/docs to avoid
noobs like me asking the same thing over and over.


   - If 1 out of 3 servers crashes and the log files are unrecoverable, how
   do we provision a replacement server?


   - If the server log is recoverable but provisioning takes a long time,
   then what happens if the old log file is far behind the current state? The
   docs say that recovery is based on fuzzy check pointing and snapshots but I
   wasn't clear as to how long "catching up" would take


   - What happens at the client side code if a server quorum is lost? Does
   the ZK service freeze or continue to service just reads?
      - If there was a temporary glitch (n/w or GC) and the replica to which
      the client is connected breaks away from the quorum does the client get
      notified? Does it stop processing client requests? Does it rejoin the
      cluster without manual intervention?
      - Now if even the client cannot connect to other servers (split brain)
      .. ... well I suppose this question is moot


   - Do the servers really have to run with file based persistence? I saw
   that someone wanted this in-memory mode for unit testing (ZK
   694<https://issues.apache.org/jira/browse/ZOOKEEPER-694>)
   but there are cases where only a transient ZK service is needed. Most
   enterprise systems have replicated Databases anyway. So, the fear of data
   loss is minimal. If ZK logs are the only means of recovery, then this might
   be harder to implement


   - A client example with full fledged error handling would be very useful
   for starters. I'm not sure if http://github.com/sgroschupf/zkclient and
   http://code.google.com/p/cages/ have everything but they do look
   promising. Plain ZK API is a bit overwhelming :)


Thanks,
Ashwin.


New To Puppet - Two Questions
New to Puppet, heard about it for the first time at OSCON.

Two quick questions:

1.  Is there a web interface?   This is really key to our company
since we have some dev/ops people but also some customer service
people (not command-line savvy) who need to do things.

2.  Does it just manage server configuration or could I write custom
extensions or modules to do things like list all of our customers who
have accounts on a server, add/remove customers from our database,
enable/disable logins to our web app, etc?   These would be more like
"business operations" not "it/server management operations".

-K.R.






Questions in ForEachSupport
In bug 45197 https://issues.apache.org/bugzilla/show_bug.cgi?id=45197 Henri
wrote:
* Look at questions in the length method in ForEachSupport.java
* Look at commented out code in prepare in ForEachSupport.java

By "in the length method" did you mean line #241 where the length gets set
to 0? That's different to the non-deferred case which throws an exception
on line #402. I'd suggest we do the same for both (throwing the exception).

The commented code in prepare that sets the itemsName EL variable is
redundant as that is also done in LoopTagSupport line #532 (assuming the
Apache implementation of the JSTL API). Looks like this can just be
deleted.

If that sounds good, I'll contribute a patch for those changes.
Cheers
Jeremy

newbie questions
hi 
I have few questions on hive and its use case.

1. hive-on-hadoop-20 accessing/processing data stored on hadoop-18-dfs
    The actual files are on hadoop-18 dfs and then I will create an external
table on hive-on-hadoop-20 with files pointing to hadoop-18-dfs.
    I don't think this is possible, given the hadoop version incompatibility,
but it never hurts to ask.


2. We download tons of urls and massage the data. The massaging goes thru
various stages. We would like to monitor these stages 
   so I was thinking on doing a schema like following 
	One Table :
	
	url as STRING,
	
        massage_step1 is a STRUCT 
        massage_step2 is a STRUCT
	.
	.
	feature_set  is ARRAY<STRING>

      The STRUCT can have arrays of longs, ids, timestamps,
success/failure, reasons


     Assuming that I am on the correct track here:
	will I be able to run queries like:
		q1. where massage_step1.reasons like '%Failed on fetching%'
		q2. where feature_set like 'shopping'
			(feature set is an array, I think I have to implement a UDFLike for Arrays)
		q3. where massage_step2.ids < 10K 

		q4. where count(*)  as count  where timestamps < 'SOME_DATE'  group by massage_step1.success = true
		
	In short, can I query on data in complex types like Struct, Array, Map etc.?

3. Some of the queries will require data from 2 or more structs and some won't.

    In the above example, I am keeping it as one table (external table). The
other option is multiple tables: one for each massage_step.
		
    In case of multiple tables, I will have to fire JOIN queries and in
case of a single table, I will filter data using a where clause.

   Which is more expensive: JOIN queries or filtering data using a where clause?


Feedback is greatly appreciated

Thanks,
Sagar
	
   
	



Questions about WS-RM behavior in CXF
Hi CXF-experts,

Some days ago, I asked my question about the expected behavior of the
WS-RM client. I am confused that the network exception is passed back
to the client application, while the RetransmissionInterceptor on the
client keeps retransmitting the message. I expected that the exception
would be handled by one of the RM interceptors so that the client
would not get the network exception directly.
(my original question
http://mail-archives.apache.org/mod_m### @mail.gmail.com>)

Looking further into the code today, I see that the exception is
passed back to the client because it is set both in message's content
and in its exchange map at the PhaseInterceptorChain's doIntercept
method below:

    message.setContent(Exception.class, ex);
    boolean isOneWay = false;
    if (message.getExchange() != null) {
        message.getExchange().put(Exception.class, ex);
        isOneWay = message.getExchange().isOneWay();
    }
    unwind(message);

Later, the existence of the exception object is checked by the
ClientImpl class in its processResult method and if found, this throws
this exception to the client.

Here I have two questions.

1. Why does my oneway method, initiated by the port.greetMeOneWay method of
the demo sample, invoke the ClientImpl that calls the processResult
method? Isn't the processResult method only needed for
request-response processing?

2. If the processResult method is called for oneway calls, should the
RetransmissionInterceptor's handleFault method remove the exception
object from the message and its exchange map? If this is done, the
exception would not be removed during the unwind method of the
doIntercept method above and the original exception would not be
forwarded to the client application. This seems to work for me.

I would appreciate it if someone could answer my questions.

Thanks.

Regards, aki


Questions about data modeling
I'm currently trying to wrap my head around Cassandra which is definitely
not easy for a mind deeply entrenched in SQL :)

I see how blogs/tweets etc. can be modeled in Cassandra. However, I have a
slightly different problem.

Let's say we let the user see a random
item(article/picture/recipe/you-name-it) and vote for it. We should show
the most popular items, the last articles/pictures the user has voted for
etc.

1. How can I show the most popular items?
2. How can I present the user with a random item he hasn't seen yet?

For the first question I figured I could have a <ColumnFamily
CompareWith="LongType"  Name="Rating"/> and store lists of items per
each rating, updating them as necessary. Can't figure out a way to
correctly implement question number 2.

XMPP Component Questions
Greetings,

I'll explain my problem briefly. I'm currently working on a project that
uses Apache ServiceMix and Apache Camel for routing messages. I'm now
working on the XMPP-related parts and I have some problems (please forgive
my slight incompetence).

1) Configuring the route in the usual way, I am unable to specify
different destinations. In my project I should be able to retrieve
messages from an XMPP port, and that works well with the URI
from("xmpp://XMPPConsumer?password=xxxx"). Note that I'll use two
components (a consumer named systemrx and a producer named systemtx)
because I have a choice about this (it is an academic project).

The problem is when I want to send the message to the correct address,
because the route is built when the Camel context starts (for example,
setting a URI like this: //.to("xmpp://XMPPProducer/SomeReceiver?password=xxxx");).
Specifying the receiver works well, but I want to change the receiver. Now
I have two possible solutions:

1) Create a route each time I receive a message (I receive the message,
extract the destination from the message body, instantiate the component and
then send).

2) Have a dynamic route like this (is this possible?):
from("xmpp://XMPPConsumer?password=xxxx").process(SomeProcess).toF("xmpp://XMPPConsumer/%s?password=xxxx",destination);
(destination should be set in SomeProcess)
(Reading the manual I discovered that the XMPP component supports headers only
for In messages. I suggest that setting a "participant" header for the
outgoing message could do the trick, but I don't know if this is possible.)

Otherwise, are there any other possible solutions that I don't know of?
Thanks for the advice; any help would be greatly appreciated.

Alessandro



A couple questions on the RPC spec.
This will be my last question for at least the next day or two. :) I just
want to double check my interpretation of the message framing. Assuming that
the client and server have already gone through their handshake, does this
sound right for a request/response on the wire?

Request:

4 byte length
map of bytes - request metadata
4 byte length
string - message name
for each parameter {
    4 byte length
    parameter bytes
}
null-terminate

Response:

4 byte length
map of bytes - request metadata
1 byte - 0 or 1 for success
if(false){
    4 byte length
    bytes containing response
}
else
{
    4 byte length
    bytes containing error
}


SocketTransceiver seems to support this but it also has this comment stating
it's not standard.

/** A socket-based {@link Transceiver} implementation.  This uses a simple,
 * non-standard wire protocol and is not intended for production services.
 */



A *call* consists of a request message paired with its resulting response or
error message. Requests and responses contain extensible metadata, and both
kinds of messages are framed as described above.

The format of a call request is:

   - *request metadata*, a map with values of type bytes
   - the *message name*, an Avro string, followed by
   - the message *parameters*. Parameters are serialized according to the
   message's request declaration.

The format of a call response is:

   - *response metadata*, a map with values of type bytes
   - a one-byte *error flag* boolean, followed by either:
      - if the error flag is false, the message *response*, serialized per
      the message's response schema.
      - if the error flag is true, the *error*, serialized per the message's
      error union schema.
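
For comparison with the field-by-field lengths guessed earlier, here is a
minimal sketch of the framing layer as I read the "Message Framing" part of
the Avro spec: the already serialized message is cut into buffers, each
prefixed by a four-byte big-endian length, and a zero-length buffer ends the
message. The function names and the 8192-byte default buffer size are
assumptions for illustration, and the payload is treated as opaque bytes (the
metadata map, message name and parameters are not encoded here).

```python
import struct


def frame(payload: bytes, buffer_size: int = 8192) -> bytes:
    """Split `payload` into length-prefixed buffers and end with a zero-length buffer."""
    out = bytearray()
    for start in range(0, len(payload), buffer_size):
        chunk = payload[start:start + buffer_size]
        out += struct.pack(">I", len(chunk))   # four-byte, big-endian length
        out += chunk
    out += struct.pack(">I", 0)                # zero-length buffer terminates the message
    return bytes(out)


def unframe(data: bytes) -> bytes:
    """Reassemble one framed message; raises ValueError if the data is truncated."""
    payload = bytearray()
    pos = 0
    while True:
        if pos + 4 > len(data):
            raise ValueError("truncated frame: missing length or terminator")
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if length == 0:
            return bytes(payload)
        if pos + length > len(data):
            raise ValueError("truncated frame: buffer shorter than its length")
        payload += data[pos:pos + length]
        pos += length


if __name__ == "__main__":
    framed = frame(b"hello world", buffer_size=4)
    print(framed)
    print(unframe(framed))
```

Under this reading the lengths belong to transport buffers rather than to the
individual fields of a request or response, and the terminator is a four-byte
zero length rather than a single null byte.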


hbase evaluation questions
I am trying to evaluate hbase to be used as an analytical data store, and I
have a few questions I have not been able to answer from the wiki or
googling in general.

1) How can hbase be configured for a multi-tenancy model? What are the
options to create a solid separation of data? In a relational database
schemas would provide this and in cassandra the keyspace can provide the
same. Of course we can add the tenancy key to the row key and create tenant
specific tables/column families but that does not provide the same level of
confidence of separation. We could also create separate clusters for each
client, but then that defeats part of the point of going to a distributed
database cluster to improve overall throughput+utilization across all
clients. We currently run single MySQL databases for each of our clients
(1-3 TBs each).

2) I am trying to model data within hbase and I am unable to truly model it
as a column based data store due to the limitations of the API
(hbase.thrift) in terms of getting back data for certain columns. I see
information for defining a bloom filter which I believe could help speed up
the retrieval of certain columns within a large row but the API does not
seem to offer the ability to iterate through the columns. The API supports
the ability to request a list of columns but no way that I have seen to scan
columns for a given row key based on a start/stop column. This forces us to
create a tall data model vs. a wide data model which in the end we think
will hurt performance as more rows will be required.

The data model is a std star schema in relational terms with a time
dimension. Time is only down to the daily granularity and we would prefer to
have this be part of the column key instead of the row key. From all
examples I have seen time has always been added to the end of the row key to
be accessed via row scans. In Cassandra for example time is modeled as a
super column or column composite index and the API supports a range get
against a set of columns within a single row.

Any advice or pointers would be greatly appreciated. Thanks in advance!

Wayne


a questions about calling opensaml lib
Hi,
I am writing an Apache module mainly for validating SAML assertions. I have
already posted signed assertion data to my Apache server, and my handler
module has received the assertion data. But when I validate the assertion
with the opensaml lib, a problem appears: "Opensaml Library initialization
failed." and the program stays stuck at this statement all the time. Why does
this happen?
Thanks for your consideration.
Regards.
Jia

Failover and replication questions
Hello,

I am not sure what the failover functionality does in ActiveMQ. I know
that if you have machine A and machine B and machine A dies then anyone
wanting to use the service will transparently fail over to using B so there
is no loss of service. But I don't know if the data held in the queues is
replicated between A and B. Can anyone enlighten me please?

Regards,

Andrew Marlow


questions regarding bibliographic citations?
Hello:

I am employing apache commons math 2.2 in the course of my research,  
and I wonder if there is a format to cite my use of apache commons  
math? If anyone has a bibtex entry for citing apache commons projects,  
that would be most helpful.

Tanim Islam


[scxml-js] a few questions on the data module
Hi,

I'm currently adding support for the data module, and I have a quick
question about the specification that I was hoping you could answer for
me.

For the <data> element, the src attribute can reference a URI containing a
legal data value. I think that what constitutes legal data values is defined
in the profile. The specification of the ecmascript profile seems to imply
that legal data values should be formatted as JSON. But the browser runtime
also has good support for XML, and so I think it would be useful to be able
to specify whether the data referenced at the URI should be handled as JSON
or XML. The handleAs property of the Dojo toolkit's
dojo.xhrGet<http://api.dojotoolkit.org/jsdoc/dojo.xhrGet> API is a good
example of what I have in mind for this. In the scxml
specification, is there any way to specify the way a referenced URI should
be handled?

Please let me know what you think. Thanks,

Jake


questions on documentation for configuring AJP connector
We are currently using - 

 

Tomcat - 5.5.25

JDK 1.5

IIS 6

Windows XP 64bit and 32bit machines

 

We are trying to upgrade to the latest connector. While going through the
worker properties variables to set we have few questions regarding the
following -

 

1) connection_pool_size - 

 

> Usually this is the same as the number of threads per web server process.
(cut-paste from the description for connection_pool_size)

 

I am not familiar with IIS - so how do you determine the above?

 

> You should measure how many connections you need during peak activity
> without performance problems, and then add some percentage depending on your
> growth rate.

 

How do you determine what is a good percentage?

 

Also does this property have any correlation with the attribute MaxThreads
in the <Connector> tag of server.xml? How do you determine what value you
should put for MaxThreads?

 

2) connection_pool_timeout - In server.xml, the default value if not
specified explicitly is 60000 (60 secs). I see that in our server.xml AJP
connector tag it's not specified. Does that mean I need to specify the
property connection_pool_timeout in our worker.properties as 60? The
documentation says the default for connection_pool_timeout is 0; shouldn't
it be 60 if this has to be in sync with server.xml?

 

3) The worker.loadbalancer.method property - currently not set - but we are
thinking of setting it to B instead of the default R. What do you use in
general? Is there a disadvantage to switching from Request to Busyness?

 

4) Question on server.xml -

 

maxSpareThreads 

maxThreads 

minSpareThreads 

 

What are the criteria to select appropriate values? For production servers,
how do you determine the values to set?

Is there a correlation between the values for the above (maxSpareThreads,
maxThreads, minSpareThreads)?

 - for example - does the maxSpareThreads have to be a certain % of maxThreads?

 

Thank you for reading the question.

 

Regards,

Rumpa Giri