Interpreting output from mahout clusterdumper

Hi,

I ran a clustering test on crawled pages (more than 25K docs; a personal data set), and then ran clusterdump on the result:

$MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-1/ --output clusteranalyze.txt

The cluster dumper output consists of 25 entries of the form "VL-xxxxx{...}":

VL-24130{n=1312 c=[0:0.017, 10:0.007, 11:0.005, 14:0.017, 31:0.016, 35:0.006, 41:0.010, 43:0.008, 52:0.005, 59:0.010, 68:0.037, 72:0.056, 87:0.028, ... ] r=[0:0.442, 10:0.271, 11:0.198, 14:0.369, 31:0.421, ... ]}
...
VL-24868{n=311 c=[0:0.042, 11:0.016, 17:0.046, 72:0.014, 96:0.044, 118:0.015, 135:0.016, 195:0.017, 318:0.040, 319:0.037, 320:0.036, 330:0.030, ... ] r=[0:0.740, 11:0.287, 17:0.576, 72:0.239, 96:0.549, 118:0.273, ... ]}

How should I interpret this output?

In short: I am looking for the document ids that belong to a particular cluster.

What is the meaning of:

  • VL-x ?
  • n=y c=[z:z', ...]
  • r=[z'':z''', ...]

Does 0:0.017 mean that "0" is the id of a document that belongs to this cluster?

I have already read on the Mahout wiki pages what CL, n, c and r mean. But can someone please explain them to me better, or point me to a resource where they are explained in a bit more detail?

Sorry if I am asking stupid questions, but I am a newbie with Apache Mahout and am using it for clustering as part of my course assignment.

asked April 27, 2011 8:52 am CDT
posted via StackOverflow

1 Answer

I think you need to read the source code, which you can download from http://mahout.apache.org. VL-24130 is just the identifier of a converged cluster.
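To make the fields easier to inspect, here is a minimal Python sketch that splits one dump line into its parts. It assumes only the line layout shown in the question (`ID{n=... c=[i:v, ...] r=[i:v, ...]}`); the helper names `parse_pairs` and `parse_cluster_line` are made up for this example and are not part of Mahout, and the elided `...` entries are simply skipped.

```python
import re

# Hypothetical helpers, not part of Mahout. They assume the line layout
# shown in the question: ID{n=<count> c=[i:v, ...] r=[i:v, ...]}
LINE_RE = re.compile(
    r'^(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+)\s+'
    r'c=\[(?P<c>[^\]]*)\]\s+r=\[(?P<r>[^\]]*)\]\}$'
)
PAIR_RE = re.compile(r'(\d+):([0-9.]+)')


def parse_pairs(text):
    """Turn 'i:v, i:v, ...' into {index: value}; anything else is ignored."""
    return {int(index): float(value) for index, value in PAIR_RE.findall(text)}


def parse_cluster_line(line):
    """Split one dump line into its identifier, n, c and r fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not match the assumed layout: %r" % line)
    return {
        "id": match.group("id"),
        "n": int(match.group("n")),
        "c": parse_pairs(match.group("c")),
        "r": parse_pairs(match.group("r")),
    }


if __name__ == "__main__":
    sample = "VL-24130{n=1312 c=[0:0.017, 10:0.007, ... ] r=[0:0.442, 10:0.271, ... ]}"
    record = parse_cluster_line(sample)
    print(record["id"], record["n"], sorted(record["c"]))
```

Parsed this way, c and r come out as index-to-value maps and n as a single count. If c is the cluster centre, as the wiki describes it, then the pairs are most plausibly sparse vector entries (a dimension index and a weight), so the 0 in 0:0.017 would be a feature index rather than a document id. That reading is an assumption drawn from the wiki description, not something the dump output itself confirms.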

answered April 28, 2011 5:37 pm CDT
