Best unofficial Apache Server developers community

Random errors in AIX-Zlinux clients

Sep 2, 2010
Héctor Rivas Gándara
>>> The errors appear once or twice, randomly, and if I execute the
>>> client again it works properly. They are usually these kinds of errors:
>> You will probably find life improved by deploying another mode of
>> operation for the puppetmaster daemon, and using a real database
>> (I use PostgreSQL) for storeconfigs.
>
> You are right, I am using both of them... but I thought that it could
> scale to 20 clients... I will set up MySQL+Mongrel.
> I will tell you if this solves the problem.

I just configured MySQL for storeconfigs and Apache+Mongrel.

Bad luck: it is still failing with random errors, but now they look more
platform-related.

On a SUSE zLinux (s390x) installation (and a couple of times on the AIX
servers) I get this MySQL-related error: "MySQL server has gone away". It is
always the same query, but the Linux x86 clients never fail :?. I have 1
zLinux, 2 AIX and 5 Linux x86 clients:

Thu Sep 02 07:49:06 +0200 2010 //zlinux.myhost.com/Puppet (err): Could not retrieve catalog from remote server: Error 400 on SERVER: Puppet::Parser::Compiler failed with error ActiveRecord::StatementInvalid: Mysql::Error: MySQL server has gone away: SELECT * FROM `hosts`     WHERE (`hosts`.`name` = 'zlinux.myhost.com')  LIMIT 1 on node zlinux.myhost.com
Thu Sep 02 07:49:06 +0200 2010 //zlinux.myhost.com/Puppet (err): Cached catalog for zlinux.myhost.com failed: Could not parse YAML data for catalog zlinux.myhost.com: allocator undefined for Proc
Thu Sep 02 07:49:06 +0200 2010 //zlinux.myhost.com/Puppet (err): Could not retrieve catalog; skipping run

And I still get these random errors on AIX. I think they might be caused
by a bug in Ruby, maybe related to the pthreads library :?


Thu Sep 02 07:27:12 +0200 2010 //aix1.myhost.com/Puppet (err): Could not retrieve catalog from remote server: method `include?' called on terminated object (0x2002e848)
Thu Sep 02 07:27:12 +0200 2010 //aix1.myhost.com/Puppet (err): Cached catalog for aix1.myhost.com failed: Could not parse YAML data for catalog aix1.myhost.com: allocator undefined for Proc
Thu Sep 02 07:27:12 +0200 2010 //aix1.myhost.com/Puppet (err): Could not retrieve catalog; skipping run

Thu Sep 02 08:18:16 +0200 2010 //aix2.myhost.com//Stage[main]/Cgx_unixserver::Ad_integration::Aix::Secldapclntd/File[/usr/lib/libibmldap.a] (err): Could not evaluate: undefined method `inject' for false:FalseClass

Thu Sep 02 04:57:11 +0200 2010 //aix1.myhost.com/Puppet (err): Could not retrieve catalog from remote server: undefined method `reference' for 0:Fixnum
Thu Sep 02 04:57:11 +0200 2010 //aix1.myhost.com/Puppet (err): Could not retrieve catalog; skipping run

Thu Sep 02 02:48:08 +0200 2010 //aix2.myhost.com/Puppet (err): Could not retrieve catalog from remote server: Could not intern from pson: expected ',' or ']' in array at '{"exported":false,"l'!
Thu Sep 02 02:48:08 +0200 2010 //aix2.myhost.com/Puppet (err): Cached catalog for aix2.myhost.com failed: Could not parse YAML data for catalog aix2.myhost.com: allocator undefined for Proc
Thu Sep 02 02:48:08 +0200 2010 //aix2.myhost.com/Puppet (err): Could not retrieve catalog; skipping run

Wed Sep 01 20:48:14 +0200 2010 //aix2.myhost.com/Puppet (err): Could not retrieve catalog from remote server: Could not intern from pson: Could not convert from pson: Could not find relationship target "Exec[reload-aliases]"
Wed Sep 01 20:48:14 +0200 2010 //aix2.myhost.com/Puppet (err): Cached catalog for aix2.myhost.com failed: Could not parse YAML data for catalog aix2.myhost.com: allocator undefined for Proc
Wed Sep 01 20:48:14 +0200 2010 //aix2.myhost.com/Puppet (err): Could not retrieve catalog; skipping run

Wed Sep 01 18:27:13 +0200 2010 //aix1.myhost.com/Puppet (err): Could not save yaml aix1.myhost.com: class or module required

Wed Sep 01 15:48:12 +0200 2010 //aix2.myhost.com/Puppet (err): Got an uncaught exception of type NoMethodError: undefined method `[]' for false:FalseClass

Wed Sep 01 15:18:12 +0200 2010 //aix2.myhost.com/Puppet (err): Got an uncaught exception of type NoMethodError: undefined method `merge' for false:FalseClass

Wed Sep 01 14:00:22 +0200 2010 //aix1.myhost.com//Stage[main]/Cgx_unixserver::Srv_tree/File[/srv] (err): Failed to generate additional resources using 'eval_generate': Invalid parameter 0(0) at /cgx1/puppet/data/test/modules/stow/manifests/package.pp:61









Similar Threads
How to report errors, and random errors in clients
Hello,

I am using Puppet 2.6.1rc3 in a test environment with AIX, SUSE and
Debian. Right now there are 5 clients... I am running Puppet from
cron every 30 minutes (using a random minute per host). I have prepared a
configuration ready to deploy Puppet across all our infrastructure.

If I deploy and use it everywhere, I need to know about every error. So
my first question is: what is the best way to report failures in the
Puppet configuration?

I was using the email approach, but I am getting lots of random errors
in clients that make the Puppet runs fail and send an error report. For
5 hosts I am receiving around 150 emails/day. I do not know if this
is normal.

The errors appear once or twice, randomly, and if I execute the
client again it works properly. They are usually these kinds of errors:


Thu Aug 26 18:05:10 +0200 2010 //puppetclient.myhost.com//Stage[main]/Cgx_unixserver::Ad_integration::Debian/Cgx_unixserver::Ad_integration::Debian::Pam_file[common-password]/File[/etc/pam.d/common-password] (err): Could not evaluate: SSL_connect SYSCALL returned=5 errno=0 state=SSLv2/v3 read server hello A Could not retrieve file metadata for puppet:///modules/cgx_unixserver/Linux.ad_integration/debian.pam.d/common-password: SSL_connect SYSCALL returned=5 errno=0 state=SSLv2/v3 read server hello A at /cgx1/puppet/data/development/services/cgx_unixserver/manifests/ad_integration/debian.pp:24

Thu Aug 26 17:49:12 +0200 2010 //puppetclient.myhost.com//Stage[main]/Monit::Base/File[/srv/monit/monit/monit.d] (err): Failed to generate additional resources using 'eval_generate': end of file reached

Mon Aug 30 10:36:45 +0200 2010 //puppetclient.myhost.com/Puppet (err): Could not retrieve catalog from remote server: execution expired
Mon Aug 30 10:36:45 +0200 2010 //puppetclient.myhost.com/Puppet (err): Cached catalog for puppetclient.myhost.com failed: Could not parse YAML data for catalog puppetclient.myhost.com: allocator undefined for Proc
Mon Aug 30 10:36:45 +0200 2010 //puppetclient.myhost.com/Puppet (err): Could not retrieve catalog; skipping run

Mon Aug 30 09:57:59 +0200 2010 //puppetclient.myhost.com/Puppet (err): Could not retrieve catalog from remote server: Error 400 on SERVER: SQLite3::BusyException: database is locked: DELETE FROM "fact_values" WHERE ("id" IN (205405,205406,205407,205408,205409,205410,205411,205412,205413,205414,205415,205416,205417,205418,205419,205420,205421,205422,205423,205424,205425,205426,205427,205428,205429,205430,205431,205432,205433,205434,205435,205436,205437,205438,205439,205440,205441))
Mon Aug 30 09:57:59 +0200 2010 //puppetclient.myhost.com/Puppet (err): Cached catalog for puppetclient.myhost.com failed: Could not parse YAML data for catalog puppetclient.myhost.com: allocator undefined for Proc
Mon Aug 30 09:57:59 +0200 2010 //puppetclient.myhost.com/Puppet (err): Could not retrieve catalog; skipping run

Mon Aug 30 12:57:18 +0200 2010 //puppetclient.myhost.com/Puppet (err): Could not save yaml puppetclient.myhost.com: class or module required


I also sometimes get errors like these in the cron output:


Could not run: method `directory?' called on terminated object (0x2005051c)

/usr/local/lib/ruby/site_ruby/1.8/puppet/util/zaml.rb:243: [BUG] Segmentation fault
ruby 1.8.7 (2009-06-12 patchlevel 174) [rs6000-aix]
/srv/scripts/puppet/puppet.ctl.sh: line 117: 319712 IOT/Abort trap     (core dumped) $PUPPETD ${PUPPET_OPTS} ${PUPPET_EXTRA_OPTS} --onetime --no-daemonize --verbose


Is this normal? What is the best way to monitor the whole Puppet network?
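As a rough, hypothetical sketch of one alternative to one email per failed run: collect the error lines and summarize them per host and error kind, so recurring failures show up as counts. It only parses the text format visible in the logs above (timestamp, `//host/source`, `(err):`, message) and does not use any Puppet reporting API; the function names and the rule "the text before the first `: ` is the error kind" are assumptions for illustration.

```javascript
// Hypothetical sketch: count Puppet error lines per host and error kind.
// Expected line shape (as in the logs above):
//   <Day Mon DD HH:MM:SS +ZZZZ YYYY> //<host>/<source> (err): <message>
const LINE_RE =
  /^(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4} \d{4}) \/\/([^\/]+)\/(\S*) \(err\): (.*)$/;

function parseLine(line) {
  const m = LINE_RE.exec(line);
  if (!m) return null; // not an error line in this format
  const [, time, host, source, message] = m;
  // Assumption: the text before the first ": " is a usable error "kind".
  const kind = message.split(": ")[0];
  return { time, host, source, kind, message };
}

function summarize(lines) {
  const counts = new Map(); // "host | kind" -> number of occurrences
  for (const line of lines) {
    const entry = parseLine(line);
    if (!entry) continue;
    const key = `${entry.host} | ${entry.kind}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Usage: feed it the collected lines, e.g. once a day, and send one summary.
const sample = [
  "Thu Sep 02 04:57:11 +0200 2010 //aix1.myhost.com/Puppet (err): Could not retrieve catalog; skipping run",
  "Thu Sep 02 07:27:12 +0200 2010 //aix1.myhost.com/Puppet (err): Could not retrieve catalog; skipping run",
];
for (const [key, n] of summarize(sample)) console.log(`${n}x ${key}`);
```

Grouping by the leading clause keeps messages like "Could not retrieve catalog from remote server" together even when the trailing detail (object address, failing query) differs between runs.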







Random file generation
I am running Tomcat 6.0.26 on a Solaris 10 system.  The Tomcat server is
configured to listen to HTTPS communications on port 8443.  When browsing
to the Tomcat server remotely using the following syntax everything works
as expected:
https://10.10.10.10:8443/

If however we accidentally leave out the "s" in https like this:
http://10.10.10.10:8443/

The Tomcat server responds with a 7-byte .exe file to download. Each time
we make the request again, it generates a new .exe file with a different
name (cd64dni2.exe or z0v8671g.exe, for example). The .exe file fails to
execute on a Windows system. The contents of all of the .exe files are
exactly the same (binary data).

If I run od on the file, I get the following:
$ od cd64dni2.exe
0000000 001425 000001 001002 000012
0000007


Can anyone explain what this file is and why it is getting generated?
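One possible reading of the od output, offered as a hedged sketch: by default od prints 2-byte words in octal, in the byte order of the machine it runs on. Assuming that machine was little-endian (e.g. x86), the seven bytes are `15 03 01 00 02 02 0a`, which matches the layout of a TLS alert record: content type 21 (alert), version 3.1 (TLS 1.0), length 2, level 2 (fatal), description 10 (unexpected_message). That would be consistent with the HTTPS connector rejecting a plain-text HTTP request and the browser saving the reply under a generated name, but nothing in the thread confirms it. The function names below are illustrative.

```javascript
// Words exactly as printed by `od` (octal, 2 bytes each); final offset 0000007 = 7 bytes.
const odWords = ["001425", "000001", "001002", "000012"];
const totalBytes = 7;

// Assumption: od ran on a little-endian host, so each word's low byte came first in the file.
function odWordsToBytes(words, length) {
  const bytes = [];
  for (const w of words) {
    const value = parseInt(w, 8);
    bytes.push(value & 0xff, (value >> 8) & 0xff);
  }
  return bytes.slice(0, length); // drop od's padding byte
}

// Partial table of TLS AlertDescription values.
const ALERT_DESCRIPTIONS = { 0: "close_notify", 10: "unexpected_message", 40: "handshake_failure" };

// Interpret 7 bytes as a TLS record header (5 bytes) followed by a 2-byte alert.
function describeTlsAlert(bytes) {
  const [type, major, minor, lenHi, lenLo, level, desc] = bytes;
  return {
    contentType: type,            // 21 = alert
    version: `${major}.${minor}`, // 3.1 = TLS 1.0
    length: (lenHi << 8) | lenLo, // payload length in bytes
    level: level === 2 ? "fatal" : level === 1 ? "warning" : `unknown(${level})`,
    description: ALERT_DESCRIPTIONS[desc] ?? `unknown(${desc})`,
  };
}

const bytes = odWordsToBytes(odWords, totalBytes);
console.log(bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ")); // 15 03 01 00 02 02 0a
console.log(describeTlsAlert(bytes));
```

If od had instead run on a big-endian SPARC host, the byte pairs would be swapped and this reading would not hold, so the host's byte order is the key assumption.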


Random testing feasible?
Hi,

I have a test plan with about 20 pages. I have set up 250 threads for
testing. I would like to know whether I can perform random testing in the
following manner:

1. put the URLs for the 20 pages in a file
2. have the 250 threads randomly go to one of the URLs in the file
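The idea itself (each of the 250 threads repeatedly picking one of the listed URLs uniformly at random) can be sketched outside JMeter as follows. This is not JMeter configuration; `loadUrls`, `pickUrl` and `simulate` are hypothetical helpers, and the file is assumed to hold one URL per line.

```javascript
// Hypothetical helpers, not JMeter configuration.
// Parse a file's text into URLs, one per line, skipping blank lines.
function loadUrls(fileText) {
  return fileText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Pick one URL uniformly at random; rng() must return a float in [0, 1).
function pickUrl(urls, rng = Math.random) {
  return urls[Math.floor(rng() * urls.length)];
}

// Simulate `threads` virtual users, each making `iterations` random requests,
// and count how often each URL was chosen.
function simulate(urls, threads, iterations, rng = Math.random) {
  const hits = new Map(urls.map((u) => [u, 0]));
  for (let t = 0; t < threads; t++) {
    for (let i = 0; i < iterations; i++) {
      const url = pickUrl(urls, rng);
      hits.set(url, hits.get(url) + 1);
    }
  }
  return hits;
}

// Usage with 20 placeholder URLs and 250 threads.
const pageUrls = loadUrls(
  Array.from({ length: 20 }, (_, i) => `http://example.com/page${i + 1}`).join("\n")
);
console.log(simulate(pageUrls, 250, 10));
```

Passing `rng` in explicitly keeps the picking logic testable; with the default `Math.random`, each page should receive roughly 1/20 of the requests over a long run.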

Thank you.


Patty

RE: error using get_range_slice with random partitioner
I took this approach: reject the first result of each subsequent
get_range_slice request. If you look back at the output I posted (below),
you'll notice that not all of the 30 keys [key1...key30] get listed! The
iteration dies and can't proceed past key2.

1) 1st batch gets 10 unique keys.
2) 2nd batch only gets 9 unique keys, with the 1st being a repeat.
3) 3rd batch only gets 2 unique keys, again with the 1st being a repeat.

That means the iteration didn't see 9 keys in the CF. key7 and key30 are
missing, for example.

[junit] Query w/ Range(,,10) result size: 10 
[junit] key18 
[junit] key23 
[junit] key26 
[junit] key27 
[junit] key12 
[junit] key28 
[junit] key4 
[junit] key3 
[junit] key1 
[junit] key24 
[junit] Query w/ Range(key24,,10) result size: 10 
[junit] key24 
[junit] key5 
[junit] key17 
[junit] key29 
[junit] key19 
[junit] key8 
[junit] key15 
[junit] key22 
[junit] key6 
[junit] key25 
[junit] Query w/ Range(key25,,10) result size: 3 
[junit] key25 
[junit] key14 
[junit] key2 
[junit] Query w/ Range(key2,,10), result size: 1 
[junit] key2

-Adam

-----Original Message-----
From: sco### @scode.org on behalf of Peter Schuller
Sent: Fri 8/6/2010 6:43 PM
To: us### @cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner
 
 I think this is actually the expected result: whenever you are using
 range_slices with start_key/end_key, you must increment the last key
 you received and then use that as the next slice's start_key. I also
 tried to use tokens because of exactly that behaviour and the docs
 talking about inclusive/exclusive.

Another way to do it is to filter results to exclude columns received
twice due to being on iteration end points.

This is useful because it is not always possible to increment or
decrement (depending on iteration order) a column name (for example,
in the case of byte strings, because there is no defined maximum
possible length so the lexicographically "previous" column name might
be infinitely long).
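Peter's filtering idea, applied to row keys, can be sketched against an in-memory stand-in for get_range_slices. Assumptions (not Cassandra's API): the hypothetical `makeFakeStore` returns rows in a fixed "token" order with an inclusive start key, and it neither wraps around the ring nor models the server-side bugs discussed in this thread. The loop restarts each page at the last key it received and drops any key it has already seen, instead of unconditionally discarding the first row of each page.

```javascript
// Hypothetical in-memory stand-in for get_range_slices: rows come back in a
// fixed "token" order (not key order) and the start key is INCLUSIVE.
// It does not wrap around the ring or reproduce any server-side bugs.
function makeFakeStore(keysInTokenOrder) {
  return function getRangeSlice(startKey, count) {
    const from = startKey === "" ? 0 : keysInTokenOrder.indexOf(startKey);
    if (from < 0) throw new Error(`unknown start key: ${startKey}`);
    return keysInTokenOrder.slice(from, from + count);
  };
}

// Visit every key once: restart each page at the last key received and
// filter out keys already seen (the shared end point), rather than blindly
// dropping the first row of every page.
function iterateAllKeys(getRangeSlice, batchSize) {
  // With an inclusive start key, a page of 1 could never make progress.
  if (batchSize < 2) throw new RangeError("batchSize must be at least 2");
  const seen = new Set();
  const ordered = [];
  let startKey = "";
  for (;;) {
    const page = getRangeSlice(startKey, batchSize);
    const fresh = page.filter((key) => !seen.has(key));
    if (fresh.length === 0) break; // only the end point (or nothing) came back
    for (const key of fresh) {
      seen.add(key);
      ordered.push(key);
    }
    startKey = page[page.length - 1];
  }
  return ordered;
}

// Usage: 30 keys in a made-up token order, fetched 10 at a time.
const tokenOrder = Array.from({ length: 30 }, (_, i) => `key${((i * 7) % 30) + 1}`);
const allKeys = iterateAllKeys(makeFakeStore(tokenOrder), 10);
console.log(allKeys.length); // 30
```

Filtering against a set of seen keys is also robust if a page repeats rows other than the first one, at the cost of keeping every seen key in memory.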








RE: error using get_range_slice with random partitioner
Hi Thomas,

Can you share your client code for the iteration? It would probably help
me find my problem. Does anyone know where in the Cassandra source the
integration tests for this functionality on the random partitioner are?

Note that I posted a specific example where the iteration failed and I was
not throwing out good keys, only duplicate ones. That means 1 of 2 things:

1) I'm somehow using the API incorrectly
2) I am the only one encountering a bug

My money is on 1), of course. I can check the Thrift API against what my
Scala client is calling under the hood.

-Adam


-----Original Message-----
From: th.he### @gmail.com on behalf of Thomas Heller
Sent: Fri 8/6/2010 7:17 PM
To: use### @cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner
 
On Sat, Aug 7, 2010 at 1:05 AM, Adam Crain
<adam.c### @greenenergycorp.com> wrote:
 I took this approach: reject the first result of each subsequent
get_range_slice request. If you look back at the output I posted (below),
you'll notice that not all of the 30 keys [key1...key30] get listed! The
iteration dies and can't proceed past key2.

 1) 1st batch gets 10 unique keys.
 2) 2nd batch only gets 9 unique keys, with the 1st being a repeat.
 3) 3rd batch only gets 2 unique keys, again with the 1st being a repeat.

 That means the iteration didn't see 9 keys in the CF. key7 and key30
are missing, for example.


Remember the returned results are NOT sorted, so whenever you drop the
first one by default, you might be dropping a good one. At least that
would be my guess here.

I have iteration implemented in my client and everything is working as
expected; so far I have never had duplicates (running 0.6.3). I'm using
tokens for range_slices, though, and increment/decrement for get_slice only.

/thomas






Gaussian Random Timers Additive?
Hello All,

I want to know for certain whether Gaussian timers are additive.

Example1:

Thread Group
 - Timer1 (Ave 10s, deviation 1s)
 - Loop1
    - Sampler1
 - Loop2
    - Timer2 (Ave 5s, deviation 4s)
    - Sampler2

So the Sampler1 requests will be an average of 10 seconds apart, plus or
minus 1 second (so within 9-11 seconds).

For Sampler2, I hope the requests will be 10+5 = 15 seconds apart, plus or
minus 1+4 = 5 seconds (so within 10-20 seconds).

Is this correct?

Example2:
Thread Group
 - Loop1
    - Timer1 (Ave 10s, deviation 1s)
    - Sampler1
 - Loop2
    - Timer2 (Ave 5s, deviation 4s)
    - Sampler2

In this case, the timers won't affect each other?

So Sampler1 should still be 10 +- 1 seconds, but Sampler2 should be 5 +- 4
seconds. Is this correct?
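A quick check of the Example1 arithmetic, as a sketch under stated assumptions: treat each Gaussian Random Timer's delay as an independent normal variable whose mean is the average delay and whose standard deviation is the "deviation" setting, and assume both Timer1 and Timer2 fire before each Sampler2 request, as the question expects. Under those assumptions the means add (10 + 5 = 15 s), but the deviations do not: the variances add, so the combined deviation is √(1² + 4²) = √17 ≈ 4.12 s rather than 5 s. Also, one deviation is not a hard bound; for a normal distribution only about 68% of samples fall within ±1 deviation of the mean. JMeter-specific details, such as how negative draws are handled, are ignored, and `combineIndependentNormals` is an illustrative name.

```javascript
// Assumption: each timer delay is an independent normal variable with
// mean = the timer's average and sd = its deviation (both in seconds).
// JMeter-specific details (e.g. how negative draws are handled) are ignored.
function combineIndependentNormals(timers) {
  // Means of independent normals add.
  const mean = timers.reduce((sum, t) => sum + t.mean, 0);
  // Variances (sd squared) add; standard deviations do not.
  const variance = timers.reduce((sum, t) => sum + t.sd * t.sd, 0);
  return { mean, sd: Math.sqrt(variance) };
}

const timer1 = { mean: 10, sd: 1 };
const timer2 = { mean: 5, sd: 4 };
const sampler2Delay = combineIndependentNormals([timer1, timer2]);
console.log(sampler2Delay.mean, sampler2Delay.sd.toFixed(2)); // 15 4.12
```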



Kind regards
Jörg Godau

SCHÜTZE Consulting Informationssysteme GmbH, Argentinische Allee 22b
14163 Berlin
Phone: 030/ 802 49 44
Fax: 030/ 8090 39 95
www.schuetze-berlin.de

Managing Director: Klaus-Dieter Schütze
Registry court: Amtsgericht Charlottenburg
Registration number: HRB 73618
VAT identification number under § 27a of the German VAT Act (Umsatzsteuergesetz): DE 813181239




improving random read performance
I am currently running HBase v0.20.3. I increased the block cache from 0.2
to 0.4. The heap size is 2 GB.
The default regionserver handler count is 25 in hbase-default.xml. I will
also try LZO compression.

What other performance tuning can I do?
In particular, will applying HBASE-2180 or upgrading to a newer version
help?

Another thing I notice is that the performance via Stargate and via the
Java API is comparable for random reads. I thought Stargate would add some
latency. Is this expected?
Thanks,
Avani



Re: selecting a random subset of a view
I think it's a really great idea, as long as it can be set per design doc.

Mickael

----- Original Message -----
From: "J Chris Anderson" <jch### @apache.org>
To: us### @couchdb.apache.org
Sent: Monday, June 28, 2010 21:34:16 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: selecting a random subset of a view


On Jun 28, 2010, at 12:28 PM, mickael### @free.fr wrote:

 If I read the docs correctly, it's an error to do this, because show
and list functions should be idempotent. It's specified in the wiki.
 

It would be fine to have a config option to suppress the ETags on show and
list. Then you'd have no risk of improperly caching the output here. For
the time being, you can probably configure an HTTP proxy to ignore and
strip the ETags.

Chris

 Mickael
 



selecting a random subset of a view
Hello couchers,

How would you select a random subset of a view result (a simple view
with map only)?

Example (I'm not writing out the full view response array, for clarity):

When called normally, my view returns:

{
...
rows: [
{id: aa1},
{id: aa2},
{id: aa3},
{id: aa4},
{id: aa5},
{id: aa6},
{id: aa7},
{id: aa8},
{id: aa9}
]
}

And I want only three of those rows, randomly chosen. So I launch the
magic "get three random rows" feature, and it gives me:

{
...
rows: [
{id: aa5},
{id: aa3},
{id: aa6}
]
}

The second time I launch the same magic "get three random rows" feature, I get:
{
...
rows: [
{id: aa7},
{id: aa1},
{id: aa5}
]
}

Thanks for your advice

Mickael


Re: selecting a random subset of a view
If I read the docs correctly, it's an error to do this, because show and
list functions should be idempotent. It's specified in the wiki.

Mickael

----- Original Message -----
From: "Jan Prieser" <j.pr### @hotornot.de>
To: use### @couchdb.apache.org
Sent: Monday, June 28, 2010 18:37:38 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: selecting a random subset of a view

Hi Mickael,

I've had the same problem and used CouchDB's list feature.

My list function looks like this:

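function(head, req) {
   // Fisher-Yates shuffle: swaps each element with a uniformly chosen
   // position at or before it, giving an unbiased order. Sorting with a
   // random comparator does not guarantee a uniform shuffle.
   function shuffle(ary) {
     for (var i = ary.length - 1; i > 0; i--) {
       var j = Math.floor(Math.random() * (i + 1));
       var tmp = ary[i];
       ary[i] = ary[j];
       ary[j] = tmp;
     }
   }
   // The request body is evaluated as JavaScript (e.g. {"rlimit": 3}),
   // so only send trusted input.
   var body = {};
   eval('body=' + req.body);
   var out = head;
   out.rows = [];
   if (out.total_rows > out.offset) {
     var row;
     while (row = getRow()) {
       out.rows.push(row);
     }
   }
   shuffle(out.rows);
   if (body && body.rlimit) {
     out.rows = out.rows.slice(-body.rlimit);
   }
   return toJSON(out) + '\n';
}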

Maybe you could use a range with startkey and endkey if the number of
rows is too big. I didn't test the performance with bigger datasets.
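If collecting every row in memory is a concern, a different technique, reservoir sampling (Algorithm R), keeps only k rows while reading the rows once. This is a minimal sketch in plain JavaScript, not a tested CouchDB list function: `nextRow` is a hypothetical callback standing in for something like `getRow()` that returns null or undefined when the rows run out, and `rowsFrom` is a hypothetical helper for trying it on an array. For n ≥ k rows, each row ends up in the sample with probability k/n.

```javascript
// Reservoir sampling (Algorithm R): consume rows one at a time from nextRow()
// (which returns null or undefined when exhausted) and keep a uniform random
// sample of at most k rows, using O(k) memory. rng() returns a float in [0, 1).
function reservoirSample(nextRow, k, rng = Math.random) {
  const sample = [];
  let seen = 0;
  let row;
  while ((row = nextRow()) != null) {
    seen++;
    if (sample.length < k) {
      sample.push(row); // fill the reservoir first
    } else {
      const j = Math.floor(rng() * seen); // uniform index in [0, seen)
      if (j < k) sample[j] = row;         // replace with probability k / seen
    }
  }
  return sample;
}

// Hypothetical helper: turn an array into a nextRow()-style callback.
function rowsFrom(array) {
  let i = 0;
  return () => (i < array.length ? array[i++] : undefined);
}

console.log(reservoirSample(rowsFrom(["aa1", "aa2", "aa3", "aa4", "aa5"]), 3));
```

The sample's internal order is not random (early rows tend to keep their slots), so shuffle the k kept rows afterwards if the output order matters.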




Re: selecting a random subset of a view
Thanks for all your answers.

I forgot to mention that I'm trying to avoid the "skip" parameter for
performance reasons.

The first method Ian suggests implies reading all my doc ids... In that
case I can easily use the client's random function (the client is PHP).
But reading all doc ids is not really performance-friendly.

The second method Ian describes is the one that others (Sebastian,
Robert) are proposing: a random value stored in the document. But as Ian
tells us, "If you use the MD5 or SHA function, then the order will be
repeatable if the data is not changed."... not so random :-)

My app is a music jukebox, and I want to provide a way to fill the
playlist with random songs. Each song document has an _id composed of
'aa' + a random SHA-1.

I have already scratched my head over this, and I think the random feature
should be a CouchDB server feature: it can't be implemented client-side
except by reading all document ids and using a client-side random
function, which is, once again, not really performance-friendly.

Any ideas welcome...

----- Original Message -----
From: "Ian Hobson" <i### @ianhobson.co.uk>
To: us### @couchdb.apache.org
Sent: Monday, June 28, 2010 16:04:09 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: selecting a random subset of a view

On 28/06/2010 14:29, mickael### @free.fr wrote:
 Hello couchers,

 How would you select a random subset of a view result (a simple
view with map only)?

    
Hi Mickael,

You want a random sample of predetermined size from a list that you can
(at best) only access randomly. To do this you must know the number of
records in the database.

Note - I know the stats side MUCH better than CouchDB, so this might
not be implementable.

Here are two methods.

Method 1.

Say, for example, you want 3 from 11.

Take the first record with a probability of 3/11 by computing a random
number in the range 0 to 1, and taking the record if rand < 3/11 (using
real math, not integer).

If you take that record, reduce the number required by 1.
Reduce the number remaining by 1.

Take the next record with a probability of 2/10 or 3/10 (depending on
whether the first record was taken).

Continue in like manner until you either

a) take the last record with a probability of 1/1, or
b) have all you want, and take the remaining records with a probability
of 0/n.

To do this with CouchDB, I would read all the IDs into the client,
filter them there, and then read each record separately.
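Ian's Method 1 is what is usually called selection sampling (Knuth's Algorithm S). A minimal sketch, assuming the ids have already been read into the client as an array; `selectionSample` and its injectable `rng` parameter are illustrative names, not part of CouchDB or PHP, and the sketch is in JavaScript rather than the PHP client mentioned in the thread.

```javascript
// Selection sampling (Ian's Method 1): walk the ids once and keep each one with
// probability (still needed) / (still remaining). Always returns exactly k ids,
// in their original order. rng() must return a float in [0, 1).
function selectionSample(ids, k, rng = Math.random) {
  if (k < 0 || k > ids.length) throw new RangeError("k must be between 0 and ids.length");
  const picked = [];
  let needed = k;
  for (let i = 0; i < ids.length && needed > 0; i++) {
    const remaining = ids.length - i;
    // When needed === remaining the probability is 1, so the sample always fills up.
    if (rng() < needed / remaining) {
      picked.push(ids[i]);
      needed--;
    }
  }
  return picked;
}

// Usage: 3 random song ids out of 9.
const songIds = ["aa1", "aa2", "aa3", "aa4", "aa5", "aa6", "aa7", "aa8", "aa9"];
console.log(selectionSample(songIds, 3));
```

Because the picks come out in the original id order, shuffle them afterwards if the playlist order should also be random.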

Method 2 -

Allocate a random number to each record (from a large range, since we
don't want duplicates). This could be a SHA or MD5 of the actual data.
Sort by the random number allocated.
Read the first N records that you need.

I think this sort of index could be set up on the server, so the client
only needs to create the index, read the first N records, and remove the
index. The work on the server will be much greater than for method 1, though.

If you use the MD5 or SHA function, then the order will be repeatable if
the data is not changed.
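Method 2 can be sketched in memory the same way: tag each id with a random sort key, sort, and keep the first N. Here the keys come from `rng` (default `Math.random`), so each call gives a fresh order; deriving the key from a hash of the document data instead, as Ian mentions, would make the order repeatable while the data does not change. `randomKeySample` is an illustrative name only.

```javascript
// Ian's Method 2, in memory: tag each id with a random sort key, sort by it,
// and keep the first n. rng() must return a float in [0, 1).
function randomKeySample(ids, n, rng = Math.random) {
  return ids
    .map((id) => ({ id, key: rng() })) // allocate a random number to each record
    .sort((a, b) => a.key - b.key)     // sort by the allocated number
    .slice(0, n)                       // read the first n
    .map((entry) => entry.id);
}

console.log(randomKeySample(["aa1", "aa2", "aa3", "aa4", "aa5"], 3));
```

Sorting costs O(n log n) for the whole list, which is the extra work Ian points out compared with Method 1's single pass.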

Regards

Ian



RE: error using get_range_slice with random partitioner
David,

This is much like the behavior I saw... I thought that I might be doing
something wrong, but I haven't had the time to check out other clients'
iteration implementations. What client are you using?

-Adam


-----Original Message-----
From: David McIntosh [mailto:dav### @radiotime.com]
Sent: Thu 8/12/2010 6:51 PM
To: us### @cassandra.apache.org
Subject: RE: error using get_range_slice with random partitioner
 
I'm also seeing an issue with not being able to iterate over all keys in
Cassandra 0.6.4. In my unit test I create 20 keys (0-19) and iterate with
a batch size of 6. This is what I get:

 

Cassandra 0.6.4

start key: ""

9, 14, 4, 15, 11, 18

start key: 18

18, 7, 17, 7, 17

start key:17

17

 

Cassandra 0.6.3

start key: ""

3, 6, 5, 19, 10, 0

start key: 0

0, 8, 2, 16, 13, 1

start key: 1

1, 12, 9, 14, 4, 15

start key: 15

15, 11, 15, 11, 18, 7

start key: 7

7, 17, 7, 17

 

In both versions I get duplicates, but in 0.6.4 I don't get the complete
set of keys back. The complete set is returned in 0.6.3.




RE: error using get_range_slice with random partitioner
I ran against the 0.6 branch and I still see similarly odd results. My
test cases prove that the set of keys has been successfully inserted, but
usually I never see the first key again, or I reach the first key before
having seen all of the keys.

-Adam



-----Original Message-----
From: Jeremy Hanna [mailto:jeremy.ha### @gmail.com]
Sent: Fri 8/6/2010 4:25 PM
To: use### @cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner
 
If you're willing to try it out, the easiest way to check whether it is
resolved by the patch for CASSANDRA-1145 is to check out the 0.6 branch:

svn checkout http://svn.apache.org/repos/asf/cassa...es/cassandra-0.6/ cassandra-0.6

Then run `ant` to build the binaries.

On Aug 6, 2010, at 2:57 PM, Adam Crain wrote:

 Hi Jeremy,
 
 So, I fixed my client so it preserves the ordering, and I get results
that may be related to the bug.
 
 If I insert 30 keys into the random partitioner with names [key1,
key2, ... key30] and then start the iteration (with a batch size of 10),
I get the following debug output during the iteration:
 
 [junit] Query w/ Range(,,10) result size: 10
 [junit] key18
 [junit] key23
 [junit] key26
 [junit] key27
 [junit] key12
 [junit] key28
 [junit] key4
 [junit] key3
 [junit] key1
 [junit] key24
 [junit] Query w/ Range(key24,,10) result size: 10
 [junit] key24
 [junit] key5
 [junit] key17
 [junit] key29
 [junit] key19
 [junit] key8
 [junit] key15
 [junit] key22
 [junit] key6
 [junit] key25
 [junit] Query w/ Range(key25,,10) result size: 3
 [junit] key25
 [junit] key14
 [junit] key2
 [junit] Query w/ Range(key2,,10), result size: 1
 [junit] key2
 
 I never make it back around to key18 as expected, and I never see
all of the keys.
 
 -Adam
 
 -----Original Message-----
 From: Jeremy Hanna [mailto:jeremy.ha### @gmail.com]
 Sent: Fri 8/6/2010 11:45 AM
 To: us### @cassandra.apache.org
 Subject: Re: error using get_range_slice with random partitioner
 
 Sounds like what you're seeing is in the client, but there was
another duplicate bug with get_range_slice that was recently fixed on the
cassandra-0.6 branch. It's slated for 0.6.5, which will probably be out
sometime this month, based on previous minor releases.
 
 https://issues.apache.org/jira/browse/CASSANDRA-1145
 
 On Aug 6, 2010, at 10:29 AM, Adam Crain wrote:
 
> Thanks Dave. I'm using 0.6.4 since I saw this issue in the JIRA, but I
> just discovered that the client I'm using mutates the order of keys after
> retrieving the result with the Thrift API... pretty much making key
> iteration impossible. So, time to fork and see if they'll fix it :(.
> 
> I'll review yours as soon as I get the client I'm using fixed.
> 
> Adam
> 
> 
> -----Original Message-----
> From: dave### @gmail.com on behalf of Dave Viner
> Sent: Fri 8/6/2010 11:28 AM
> To: use### @cassandra.apache.org
> Subject: Re: error using get_range_slice with random partitioner
> 
> Funny you should ask... I just went through the same exercise.
> 
> You must use Cassandra 0.6.4. Otherwise you will get duplicate keys.
> However, here is a snippet of Perl that you can use.
> 
> use Data::Dumper;    # needed for the Dumper() call in the error handler
> # Assumes the Thrift and generated Cassandra Perl modules are loaded, and
> # that $CASSANDRA_HOST, $CASSANDRA_PORT and QUORUM are defined elsewhere.
> 
> our $WANTED_COLUMN_NAME = 'mycol';
> my %map;
> get_key_to_one_column_map('myKeySpace', 'myColFamily', 'mySuperCol', QUORUM,
>     \%map);
> 
> sub get_key_to_one_column_map
> {
>   my ($keyspace, $column_family_name, $super_column_name,
>       $consistency_level, $returned_keys) = @_;
> 
>   my ($socket, $transport, $protocol, $client, $result, $predicate,
>       $column_parent, $keyrange);
> 
>   $column_parent = new Cassandra::ColumnParent();
>   $column_parent->{'column_family'} = $column_family_name;
>   $column_parent->{'super_column'} = $super_column_name;
> 
>   $keyrange = new Cassandra::KeyRange({
>           'start_key' => '', 'end_key' => '', 'count' => 10
>   });
> 
>   $predicate = new Cassandra::SlicePredicate();
>   $predicate->{'column_names'} = [$WANTED_COLUMN_NAME];
> 
>   eval
>   {
>       $socket = new Thrift::Socket($CASSANDRA_HOST, $CASSANDRA_PORT);
>       $transport = new Thrift::BufferedTransport($socket, 1024, 1024);
>       $protocol = new Thrift::BinaryProtocol($transport);
>       $client = new Cassandra::CassandraClient($protocol);
>       $transport->open();
> 
>       my ($next_start_key, $one_res, $iteration, $have_more, $value,
>           $local_count, $previous_start_key);
> 
>       $iteration = 0;
>       $have_more = 1;
>       while ($have_more == 1)
>       {
>           $iteration++;
>           $result = undef;
> 
>           $result = $client->get_range_slices($keyspace, $column_parent,
>               $predicate, $keyrange, $consistency_level);
> 
>           # on success, $result is a reference to an array of objects.
> 
>           if (scalar(@$result) == 1)
>           {
>               # we only got 1 result... check to see if it's the
>               # same key as the start key... if so, we're done.
>               if ($result->[0]->{'key'} eq $keyrange->{'start_key'})
>               {
>                   $have_more = 0;
>                   last;
>               }
>           }
> 
>           # check to see if we are starting with some value;
>           # if so, we throw away the first result (the start key is inclusive).
>           if ($keyrange->{'start_key'})
>           {
>               shift(@$result);
>           }
>           if (scalar(@$result) == 0)
>           {
>               $have_more = 0;
>               last;
>           }
> 
>           $previous_start_key = $keyrange->{'start_key'};
>           $local_count = 0;
> 
>           for (my $r = 0; $r < scalar(@$result); $r++)
>           {
>               $one_res = $result->[$r];
>               $next_start_key = $one_res->{'key'};
> 
>               $keyrange->{'start_key'} = $next_start_key;
> 
>               if (!exists($returned_keys->{$next_start_key}))
>               {
>                   $have_more = 1;
>                   $local_count++;
>               }
> 
>               next if (scalar(@{ $one_res->{'columns'} }) == 0);
> 
>               $value = undef;
> 
>               for (my $i = 0; $i < scalar(@{ $one_res->{'columns'} }); $i++)
>               {
>                   if ($one_res->{'columns'}->[$i]->{'column'}->{'name'} eq
>                       $WANTED_COLUMN_NAME)
>                   {
>                       $value = $one_res->{'columns'}->[$i]->{'column'}->{'value'};
>                       if (!exists($returned_keys->{$next_start_key}))
>                       {
>                           $returned_keys->{$next_start_key} = $value;
>                       }
>                       else
>                       {
>                           # NOTE: prior to Cassandra 0.6.4, get_range_slices
>                           # sometimes returns duplicates.
>                           #warn "Found second value for key [$next_start_key] was ["
>                           #    . $returned_keys->{$next_start_key} . "] now [$value]!";
>                       }
>                   }
>               }
>               $have_more = 1;
>           } # end results loop
> 
>           if ($keyrange->{'start_key'} eq $previous_start_key)
>           {
>               $have_more = 0;
>           }
> 
>       } # end while() loop
> 
>       $transport->close();
>   };
>   if ($@)
>   {
>       warn "Problem with Cassandra: " . Dumper($@);
>   }
> 
>   # cleanup
>   undef $client;
>   undef $protocol;
>   undef $transport;
>   undef $socket;
> }
> 
> 
> HTH
> Dave Viner







RE: error using get_range_slice with random partitioner
I can. I'm using the Debian distribution. I assume all that is required is
wiping the data and commitlog directories.

If I do that, I still get the same result.

Here's my CF:

<ColumnFamily Name ="Meas" CompareWith="LongType" />

I'm using this to store time-series measurement data, where the keys are
measurement names and the columns are Long Unix epoch timestamps in
milliseconds. My use case is a range slice that asks for the first X rows,
but only the most recent measurement in each, by using a descending-order
column predicate with a limit of 1.

I have no trouble using this predicate to retrieve columns within a
specified row, but the get_range_slice fails.

-Adam
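The per-row part of that query, a reversed column slice with a count of 1 (in the Thrift API, a SliceRange with reversed set to true and count = 1), simply asks for the newest timestamp column in each row. As an illustration only, the Perl sketch below models that on an invented in-memory copy of the Meas CF; the measurement names, timestamps, and values are hypothetical, and no Cassandra calls are made.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of the "Meas" CF (names and values invented):
# row key = measurement name, column name = Unix epoch millis (LongType),
# column value = the reading.
my %meas = (
    'pump.pressure' => { 1281024000000 => 4.1, 1281024060000 => 4.3 },
    'tank.level'    => { 1281024000000 => 72,  1281024030000 => 71  },
);

# What a reversed slice with count 1 asks for: the newest column of a row,
# comparing column names numerically as LongType does.
sub latest_column {
    my ($row) = @_;
    my ($newest) = sort { $b <=> $a } keys %$row;
    return defined $newest ? [$newest, $row->{$newest}] : undef;
}

my %latest = map { $_ => latest_column($meas{$_}) } keys %meas;
for my $name (sort keys %latest) {
    my ($ts, $value) = @{ $latest{$name} };
    print "$name: $value at $ts\n";
}
```

Against a real node the same question is asked once per row by the predicate, so the range slice returns at most one column per key.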

-----Original Message-----
From: Jonathan Ellis [mailto:jbe### @gmail.com] 
Sent: Thursday, August 05, 2010 12:22 PM
To: us### @cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner

can you reproduce starting with a fresh install, no existing data?

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com













RE: error using get_range_slice with random partitioner
Hi Jeremy,

I fixed my client so that it preserves the key ordering, and now I get
results that may be related to the bug.

If I insert 30 keys into the random partitioner with names [key1, key2,
... key30] and then start the iteration (with a batch size of 10) I get the
following debug output during the iteration:

[junit] Query w/ Range(,,10) result size: 10
[junit] key18
[junit] key23
[junit] key26
[junit] key27
[junit] key12
[junit] key28
[junit] key4
[junit] key3
[junit] key1
[junit] key24
[junit] Query w/ Range(key24,,10) result size: 10
[junit] key24
[junit] key5
[junit] key17
[junit] key29
[junit] key19
[junit] key8
[junit] key15
[junit] key22
[junit] key6
[junit] key25
[junit] Query w/ Range(key25,,10) result size: 3
[junit] key25
[junit] key14
[junit] key2
[junit] Query w/ Range(key2,,10), result size: 1
[junit] key2

I never wrap back around to key18 as I expected, and I never see all of
the keys: only 21 of the 30 come back.

-Adam
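A quick way to quantify the gap in the trace above: the Perl sketch below replays the four pages exactly as logged, counts the distinct keys, and lists which of key1 to key30 never came back. Only the page contents come from the junit output; the script itself is illustrative.

```perl
use strict;
use warnings;

# Pages copied from the junit output above, in the order they were returned.
my @pages = (
    [qw(key18 key23 key26 key27 key12 key28 key4 key3 key1 key24)],
    [qw(key24 key5 key17 key29 key19 key8 key15 key22 key6 key25)],
    [qw(key25 key14 key2)],
    [qw(key2)],
);

# Count each key once, however many pages it appeared on.
my %seen;
$seen{$_}++ for map { @$_ } @pages;

# Every inserted key that the iteration never returned.
my @missing = grep { !$seen{$_} } map { "key$_" } 1 .. 30;

printf "distinct keys returned: %d of 30\n", scalar keys %seen;
print "never returned: @missing\n";
```

The repeated first key on each later page (key24, key25, key2) is expected, because the start key is inclusive; the nine keys that never appear at all are the real symptom.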

-----Original Message-----
From: Jeremy Hanna [mailto:jeremy.h### @gmail.com]
Sent: Fri 8/6/2010 11:45 AM
To: us### @cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner
 
Sounds like what you're seeing is in the client, but there was another
duplicate bug with get_range_slice that was recently fixed on the
cassandra-0.6 branch. The fix is slated for 0.6.5, which will probably be
out sometime this month, based on previous minor releases.

https://issues.apache.org/jira/browse/CASSANDRA-1145








error using get_range_slice with random partitioner
Hi,

I'm on 0.6.4. While searching the web, I found previous JIRA tickets
indicating that iterating over the keys in a keyspace is possible, even with
the random partitioner. In my case this is mostly desirable for testing.

I get the following error:

[junit] Internal error processing get_range_slices
[junit] org.apache.thrift.TApplicationException: Internal error processing get_range_slices

and the following server traceback:

java.lang.NumberFormatException: Zero length BigInteger
	at java.math.BigInteger.<init>(BigInteger.java:295)
	at java.math.BigInteger.<init>(BigInteger.java:467)
	at org.apache.cassandra.dht.RandomPartitioner$1.fromString(RandomPartitioner.java:100)
	at org.apache.cassandra.thrift.CassandraServer.getRangeSlicesInternal(CassandraServer.java:575)

I am using the scala cascal client, but am sure that get_range_slice is
being called with start and stop set to "".

1) Is batch iteration possible with the random partitioner?

This isn't clear from the FAQ entry on the subject:

http://wiki.apache.org/cassandra/FAQ#iter_world

2) The FAQ states that start argument should be "". What should the end
argument be?

thanks!
Adam



RE: error using get_range_slice with random partitioner
I've never changed the partitioner from the default random. Other ideas?

I can insert and run column queries against a single key, but range queries on the CF fail.

-Adam

-----Original Message-----
From: Jonathan Ellis [mailto:jbe### @gmail.com] 
Sent: Thursday, August 05, 2010 11:33 AM
To: use### @cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner

Yes, you should be able to use get_range_slices with RP.

This stack trace looks like you changed your partitioner after the
node already had data in it.












RE: error using get_range_slice with random partitioner
Thanks Dave. I'm using 0.6.4 since I saw this issue in the JIRA, but I
just discovered that the client I'm using mutates the order of keys after
retrieving the result with the Thrift API... pretty much making key
iteration impossible. So time to fork and see if they'll fix it :(.

I'll review yours as soon as I've fixed the client I'm using.

Adam


-----Original Message-----
From: davev### @gmail.com on behalf of Dave Viner
Sent: Fri 8/6/2010 11:28 AM
To: use### @cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner
 
Funny you should ask... I just went through the same exercise.

You must use Cassandra 0.6.4; otherwise you will get duplicate keys.
Here is a snippet of Perl that you can use.

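use strict;
use warnings;
use Data::Dumper;   # Dumper() is used in the error handler below
use Thrift;
use Thrift::Socket;
use Thrift::BufferedTransport;
use Thrift::BinaryProtocol;
use Cassandra::Cassandra;
use Cassandra::Types;

# Assumed connection settings (not in the original snippet):
# a local node on the default Thrift port.
our $CASSANDRA_HOST = 'localhost';
our $CASSANDRA_PORT = 9160;
our $WANTED_COLUMN_NAME = 'mycol';

my %map;
get_key_to_one_column_map('myKeySpace', 'myColFamily', 'mySuperCol',
    Cassandra::ConsistencyLevel::QUORUM, \%map);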

sub get_key_to_one_column_map
{
    my ($keyspace, $column_family_name, $super_column_name,
        $consistency_level, $returned_keys) = @_;


    my ($socket, $transport, $protocol, $client, $result, $predicate,
        $column_parent, $keyrange);

    $column_parent = new Cassandra::ColumnParent();
    $column_parent->{'column_family'} = $column_family_name;
    $column_parent->{'super_column'} = $super_column_name;

    $keyrange = new Cassandra::KeyRange({
            'start_key' => '', 'end_key' => '', 'count' => 10
    });


    $predicate = new Cassandra::SlicePredicate();
    $predicate->{'column_names'} = [$WANTED_COLUMN_NAME];

    eval
    {
        $socket = new Thrift::Socket($CASSANDRA_HOST, $CASSANDRA_PORT);
        $transport = new Thrift::BufferedTransport($socket, 1024, 1024);
        $protocol = new Thrift::BinaryProtocol($transport);
        $client = new Cassandra::CassandraClient($protocol);
        $transport->open();


        my ($next_start_key, $one_res, $iteration, $have_more, $value,
            $local_count, $previous_start_key);

        $iteration = 0;
        $have_more = 1;
        while ($have_more == 1)
        {
            $iteration++;
            $result = undef;

            $result = $client->get_range_slices($keyspace, $column_parent,
                $predicate, $keyrange, $consistency_level);

            # on success, $result is an array ref of KeySlice objects.

            if (scalar(@$result) == 1)
            {
                # we only got 1 result... check to see if it's the
                # same key as the start key... if so, we're done.
                if ($result->[0]->{'key'} eq $keyrange->{'start_key'})
                {
                    $have_more = 0;
                    last;
                }
            }

            # check to see if we are starting with some value
            # if so, we throw away the first result.
            if ($keyrange->{'start_key'})
            {
                shift(@$result);
            }
            if (scalar(@$result) == 0)
            {
                $have_more = 0;
                last;
            }

            $previous_start_key = $keyrange->{'start_key'};
            $local_count = 0;

            for (my $r = 0; $r < scalar(@$result); $r++)
            {
                $one_res = $result->[$r];
                $next_start_key = $one_res->{'key'};

                $keyrange->{'start_key'} = $next_start_key;

                if (!exists($returned_keys->{$next_start_key}))
                {
                    $have_more = 1;
                    $local_count++;
                }


                next if (scalar(@{ $one_res->{'columns'} }) == 0);

                $value = undef;

                for (my $i = 0; $i < scalar(@{ $one_res->{'columns'} }); $i++)
                {
                    if ($one_res->{'columns'}->[$i]->{'column'}->{'name'} eq $WANTED_COLUMN_NAME)
                    {
                        $value = $one_res->{'columns'}->[$i]->{'column'}->{'value'};
                        if (!exists($returned_keys->{$next_start_key}))
                        {
                            $returned_keys->{$next_start_key} = $value;
                        }
                        else
                        {
                            # NOTE: prior to Cassandra 0.6.4, get_range_slices sometimes returns duplicates.
                            #warn "Found second value for key [$next_start_key] was ["
                            #    . $returned_keys->{$next_start_key} . "] now [$value]!";
                        }
                    }
                }
                $have_more = 1;
            } # end results loop

            if ($keyrange->{'start_key'} eq $previous_start_key)
            {
                $have_more = 0;
            }

        } # end while() loop

        $transport->close();
    };
    if ($@)
    {
        warn "Problem with Cassandra: " . Dumper($@);
    }

    # cleanup
    undef $client;
    undef $protocol;
    undef $transport;
    undef $socket;
}


HTH
Dave Viner
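The paging idea in that loop can be exercised without a running cluster. The Perl sketch below is an illustration, not the code above: it replaces the Thrift call with a hypothetical in-memory mock_get_range_slices that keeps rows in an MD5-based order (loosely imitating RandomPartitioner) and, like the real call, treats start_key as inclusive. The loop starts with an empty start_key, resumes from the last key of each page, drops the repeated first row, and stops when a page adds nothing new.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical server state: 30 row keys in an MD5-based order, loosely
# imitating RandomPartitioner (whose token is derived from an MD5 of the key).
my @ring = sort { md5_hex($a) cmp md5_hex($b) } map { "key$_" } 1 .. 30;

# Hypothetical stand-in for get_range_slices: up to $count keys starting AT
# $start_key (inclusive), or from the beginning of the ring if it is ''.
sub mock_get_range_slices {
    my ($start_key, $count) = @_;
    my $first = 0;
    if ($start_key ne '') {
        my $start_token = md5_hex($start_key);
        $first++ while $first < @ring && md5_hex($ring[$first]) lt $start_token;
    }
    my $last = $first + $count - 1;
    $last = $#ring if $last > $#ring;
    return @ring[$first .. $last];
}

# Page through every key exactly once.
sub fetch_all_keys {
    my ($count) = @_;
    die "page size must be at least 2\n" if $count < 2;
    my (%seen, @order);
    my $start_key = '';
    while (1) {
        my @page = mock_get_range_slices($start_key, $count);
        # start_key is inclusive, so later pages repeat the previous last row.
        shift @page if $start_key ne '' && @page && $page[0] eq $start_key;
        last unless @page;    # nothing new: we have reached the end
        for my $key (@page) {
            push @order, $key unless $seen{$key}++;
        }
        $start_key = $page[-1];    # resume from the last key returned
    }
    return @order;
}

my @keys = fetch_all_keys(10);
print scalar(@keys), " keys fetched\n";
```

Two small departures from the loop above are deliberate: the first-row check uses `ne ''` instead of Perl truthiness, so a key literally named "0" is not mistaken for an empty start, and the page size must be at least 2, because with a page of one every page after the first holds only the start key and the loop stops early.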









RE: error using get_range_slice with random partitioner
Thomas,

That was indeed the source of the problem. I naively assumed that the
token range would help me avoid retrieving duplicate rows.

If you iterate over the keys, how do you avoid retrieving duplicate keys?
I tried this morning and got odd results. Maybe this is just a consequence
of the random partitioner. I don't care about the order of the iteration;
what matters is that I see every key, and each key only once.

-Adam


-----Original Message-----
From: th.he### @gmail.com on behalf of Thomas Heller
Sent: Fri 8/6/2010 7:27 AM
To: us### @cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner
 
Wild guess, but are you using start_token/end_token when you should be
using start_key? It looks to me like you are passing end_token = ''.

HTH,
/thomas
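Thomas's guess matches the server trace: under RandomPartitioner a token string is parsed as a Java BigInteger (RandomPartitioner$1.fromString calls the BigInteger constructor), and new BigInteger("") is exactly what throws "NumberFormatException: Zero length BigInteger". Empty strings are fine as start_key/end_key but not as tokens. The Perl sketch below imitates that check; parse_rp_token and check_key_range are hypothetical helpers, not part of the Cassandra bindings.

```perl
use strict;
use warnings;

# Hypothetical helper imitating the server-side parse seen in the trace:
# a RandomPartitioner token must be a non-empty integer string.
sub parse_rp_token {
    my ($token) = @_;
    die "Zero length BigInteger\n" if !defined $token || $token eq '';
    die "Not an integer token: $token\n" unless $token =~ /\A[+-]?\d+\z/;
    return $token;
}

# Hypothetical check of a KeyRange-like hash: key bounds may be empty
# strings, but any token bound that is present must parse as an integer.
sub check_key_range {
    my (%range) = @_;
    for my $field (qw(start_token end_token)) {
        parse_rp_token($range{$field}) if exists $range{$field};
    }
    return 1;
}

# Iterating by key: empty bounds mean "from the start" and "to the end".
my $by_key_ok = eval { check_key_range(start_key => '', end_key => '', count => 10) };

# Iterating by token with an empty end_token: the failing case.
my $by_token_ok = eval { check_key_range(start_token => '0', end_token => '', count => 10) };
my $token_error = $@;
```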













DO NOT REPLY New: Apache service random outage
https://issues.apache.org/bugzilla/show_bug.cgi?id=49817

           Summary: Apache service random outage
           Product: Tomcat 5
           Version: 5.5.0
          Platform: PC
        OS/Version: Windows Server 2003
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: Servlet & JSP API
        AssignedTo: de### @tomcat.apache.org
        ReportedBy: whit### @avaya.com


Aug 19, 2010 12:03:21 PM org.apache.tomcat.util.threads.ThreadPool logFull
SEVERE: All threads (200) are currently busy, waiting. Increase maxThreads
(200) or check the servlet status
Aug 20, 2010 4:47:20 PM org.apache.coyote.http11.Http11Protocol pause
INFO: Pausing Coyote HTTP/1.1 on http-9602
Aug 20, 2010 4:47:21 PM org.apache.catalina.core.StandardService stop
INFO: Stopping service website
Aug 20, 2010 4:47:21 PM org.apache.coyote.http11.Http11Protocol destroy
INFO: Stopping Coyote HTTP/1.1 on http-9602


My maxThreads setting is 100, not 200.