Best unofficial Apache Server developers community

Hadoop/HBase Capacity Planning

After some Hadoop hardware recommendations and using Amdahl's law for Hadoop provisioning, Cloudera shares its know-how on Hadoop/HBase capacity planning, covering aspects like network, memory, disk, and CPU:

Since we are talking about data, the first crucial parameter is how much disk space we need on all of the Hadoop nodes to store all of your data and what compression algorithm you are going to use to store the data. For the MapReduce components an important consideration is how much computational power you need to process the data and whether the jobs you are going to run on the cluster are CPU or I/O intensive. […] Finally, HBase is mainly memory driven and we need to consider the data access pattern in your application and how much memory you need so that the HBase nodes do not swap the data too often to the disk. Most of the written data ends up in memstores before it finally ends up on disk, so you should plan for more memory in write-intensive workloads like web crawling.
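As a rough illustration of the disk and memory reasoning in the quote, here is a minimal back-of-the-envelope sketch. It is not taken from the Cloudera article. The function names, the compression ratio, the free-space headroom, and the memstore fraction are all assumptions chosen for illustration. The replication factor of 3 matches the HDFS default, but your cluster may use a different value.

```python
import math


def raw_disk_needed_tb(logical_tb, compression_ratio=1.0, replication=3, headroom=0.25):
    """Estimate the raw disk (in TB) the whole cluster needs.

    logical_tb        -- uncompressed size of the data set
    compression_ratio -- uncompressed size / compressed size (assumed, codec dependent)
    replication       -- HDFS block replication factor (3 is the HDFS default)
    headroom          -- fraction of raw disk kept free for intermediate
                         MapReduce output and growth (assumed value)
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    stored_tb = logical_tb / compression_ratio * replication
    return stored_tb / (1 - headroom)


def nodes_needed(raw_tb, disk_per_node_tb):
    """Number of data nodes required to hold raw_tb, rounded up."""
    return math.ceil(raw_tb / disk_per_node_tb)


def memstore_budget_gb(heap_gb, memstore_fraction=0.4):
    """Heap available to memstores on one region server.

    memstore_fraction is an assumed share of the heap; write-intensive
    workloads may justify a larger heap or a larger share.
    """
    return heap_gb * memstore_fraction


if __name__ == "__main__":
    raw = raw_disk_needed_tb(100, compression_ratio=2.0)
    print(f"raw disk needed: {raw:.1f} TB")
    print(f"nodes at 12 TB each: {nodes_needed(raw, 12)}")
    print(f"memstore budget with 8 GB heap: {memstore_budget_gb(8):.1f} GB")
```

The numbers are only as good as the inputs: measure the real compression ratio on a sample of your data and check whether your jobs are CPU or I/O bound before settling on a node count.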

Original title and link for this post: Hadoop/HBase Capacity Planning (published on the NoSQL blog: myNoSQL)

Tags: data, disk, memory, planning, hadoop
Capacity planning and Re: Handling disk-full scenarios
Jun 2, 2010
Reading some more (someone break in when I lose my clue ;-) Reading the streams page in the wiki about anticompaction, I think the best approach to take when a node gets its disks overfull, is to set the compaction thresholds to 0 on all nodes,…

HBase on Hadoop 0.21
Jul 7, 2010
Hi, I've checked the Release Notes of HDFS 0.21 and saw two fixes from hadoop-append included, other two not, but still some more that have to do with sync stuff. Is Hadoop-append for HBase made obsolete with HDFS 0.21? Thank you, Thomas Koch,…

HBASE/HADOOP Examples
Jul 2, 2010
I've found examples using the older mapred interface but not the newer mapreduce interface. I want to write a mapper that is configured to only pull out specific rows (which are the mapper's keys) and a specific column's value (which is the mapper's…

Re: Hadoop support for hbase
Jun 2, 2010
Hello folks, I created a branch for doing the append/sync support for Hadoop 0.20. You can fetch the branch via http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append/ If you feel that there are some JIRAS that need to go into…

Re: Rolling out Hadoop/HBase updates
Jul 4, 2010
Hey, We're using stock CDH2 without any patches so I'm not sure if we have HDFS-630 or not. For HBase we're currently on 0.20.3 and will be testing and moving to 0.20.5 soon. What I did with this rollout of just config changes was take one region…

Hbase 0.89-hadoop version mismatch errors..
Aug 17, 2010
I seem to have gotten into some version mismatch issues. When I try to start HBase 0.89 along with Hadoop 0.20.100, HBase fails to start up with these errors in the namenode, which leads to exceptions in master and regionservers. Errors in…

Deployment architecture for Hadoop, HBase & Hive recommendations?
Aug 2, 2010
Hello, We're setting up a data warehouse environment that includes Hadoop, HBase, Hive and our own in-house MR jobs. I would like with your permission to discuss the architecture we should choose for this. Today we process ~10GB of data per day.…

NoClassDefFoundError: org/apache/hadoop/hbase/rest/Main
Jul 2, 2010
I am trying to start and stop stargate rest server. I get ClassNotFoundException intermittently. I did perform these steps: • Place the Stargate jar in either the HBase installation root directory or lib/ directories. • Copy the jars from…

How to specify HBase cluster end-points from HBase client code in HBase 0.20.0
Jul 7, 2010
Hello, In my current application environment, I need to have two HBase clusters running in two different racks, to form a fault-tolerant group to tolerate power failure. Then I have an HBase client, which is sitting outside of these two clusters, …

ERROR 2998: Unhandled internal error. org/apache/hadoop/hbase/mapreduce/TableInputFormat
Jul 22, 2010
Hi All, This is my first mail in the apache mailing list... please bear with me as I am absolutely new to Hadoop and its family. This is my question... I have some data on my hdfs in the following form. (number:int,word:chararray,…