
Hadoop MapReduce users Recent threads

Threads Replies First post Last post
cleanup of data when restarting Tasktracker of Hadoop
By: hadoop anis
Friends, when the Tasktracker exits, the data persists on the Linux filesystem (I am using Hadoop without HDFS), but when I restart the Tasktracker on that node it cleans all data in its directory. Is this normal…
Replies: 0 | First post: May 24 2012 09:34 | Last post: May 24 2012 09:34
Good learning resources for 0.23?
By: Keith Wiley
I have already preordered the third edition of Tom's book (obviously, I don't have it yet since it won't be published until the end of the month), but aside from that, I'm looking for good resources for learning how to program to the .23 API. I…
Replies: 2 | First post: May 23 2012 17:19 | Last post: May 24 2012 01:21
Would I call a DLL in MR by using JNI
By: jason Yang
Hi, All~ Currently, I'm trying to rewrite an algorithm in MapReduce form. Since the algorithm depends on some third-party DLLs which are written in C++, I was wondering whether I could call a DLL in Map() / Reduce() by using JNI. Thanks.
Replies: 2 | First post: May 23 2012 04:38 | Last post: May 23 2012 16:44
FileOutputCommiter behavior doubts
By: Subroto
Hi, I have an implementation, SampleFileOutputCommiter, which extends org.apache.hadoop.mapred.FileOutputCommitter. The implementation has specific code to be executed during cleanupJob() execution. When the framework (LocalJobRunner) makes a call…
Replies: 3 | First post: May 23 2012 04:24 | Last post: May 23 2012 06:44
API to get info for deprecated key
By: Subroto
Hi, though this question may relate to the Hadoop-Common project, I ran into the concern while working with MR. The current version of Hadoop deprecates many keys but takes care of adding the new keys to the configuration accordingly. For the end…
Replies: 2 | First post: May 22 2012 02:53 | Last post: May 23 2012 02:45
Is there a good benchmark to evaluate the CPU time/space tradeoff in the shuffle stage of hadoop?
By: Jonathan Coveney
I was referred here by Alan Gates (I'm a committer on the Pig project). I've been dealing with the intermediate serialization of Pig objects. When serializing, there is generally a tradeoff between time to serialize and space on disk (an extreme…
Replies: 0 | First post: May 22 2012 19:13 | Last post: May 22 2012 19:13
org.apache.hadoop.io.MapFile.Reader.Reader(FileSystem fs, String dirName, Configuration conf) constr
By: Subroto
Hi, the constructor of the Reader class ignores the FileSystem parameter passed to it. This results in creation of a Path on the basis of the default FileSystem mentioned in the configuration even though the user wants to pass…
Replies: 2 | First post: May 18 2012 09:13 | Last post: May 18 2012 09:34
Need help for writing map reduce functions in hadoop-1.0.1 java
By: Ravi Joshi
Yeah, finally I have found the right place for my question. Hi, I am a newbie in Hadoop. I have successfully installed Hadoop-1.0.1 on my Ubuntu 10.04 LTS and I am using Eclipse Indigo for designing a Hadoop MapReduce application. I am writing my own map and…
Replies: 0 | First post: May 18 2012 06:11 | Last post: May 18 2012 06:11
How HDFS divides Files into block
By: Utkarsh Gupta
Hi, I have a doubt about HDFS which may be a very trivial thing, but I am not able to understand it. Since HDFS keeps files in blocks of 64/128 MB, how does HDFS split files? The problem which I see is that suppose I have a long string in my…
Replies: 1 | First post: May 18 2012 04:11 | Last post: May 18 2012 05:23
TestDFSIO job hangs (0.23.1-cdh4b2)
By: Trevor Robinson
Would someone please give me some troubleshooting tips for TestDFSIO hanging on a new 0.23.1-cdh4b2 cluster? I've tried both a 5-machine cluster and just running everything on a single node. It's my first time configuring YARN, so maybe I've…
Replies: 2 | First post: May 14 2012 15:46 | Last post: May 14 2012 18:55
Re: max 1 mapper per node
By: Radim Kolar
On 10.5.2012 15:29, Robert Evans wrote: > Yes adding in more resources in the scheduling request would be the > ideal solution to the problem. But sadly that is not a trivial change. Best…
Replies: 0 | First post: May 14 2012 10:10 | Last post: May 14 2012 10:10
memory configuration
By: Keren Ouaknine
Hello, I keep on getting a memory error; these are my configurations and their respective errors. A few questions: why is physical memory set to 1.0 GB when I actually have 47 GB on these machines? Virtual memory is also limited to 2.1, even when I…
Replies: 0 | First post: May 14 2012 04:58 | Last post: May 14 2012 04:58
Hadoop 0.23.1 - cluster startup and job test
By: Keren Ouaknine
Hello, I configured 0.23 thanks to cloudavenue's post <http://www.thecloudavenue.com/2012/01/getting-started-with-nextgen-mapreduce.html>. My UI seems OK, but reports only one node out of the ten. My master is running: 18130 DataNode 18264…
Replies: 1 | First post: May 12 2012 18:25 | Last post: May 13 2012 00:04
Re: max 1 mapper per node
By: Radim Kolar
If a plugin system for the AM is overkill, something simpler can be made, like: maximum number of mappers per node, maximum number of reducers per node, maximum percentage of non-data-local tasks, maximum percentage of rack-local tasks, and set this in job…
Replies: 9 | First post: May 3 2012 07:59 | Last post: May 11 2012 10:12
SocketTimeOutException while running Hadoop job
By: ashish vyas
Hi All, I am running jobs on a cluster in my application. In one of my jobs I am getting a SocketTimeOutException and the job is failing. I have run the job outside of Hadoop and it runs fine. But even on a pseudo cluster it fails on Hadoop with the following…
Replies: 2 | First post: May 7 2012 11:11 | Last post: May 8 2012 14:20
Ant Colony Optimization for Travelling Salesman Problem in Hadoop
By: sharat attupurath
Hi, we are trying to parallelize the ant colony optimization algorithm for TSP over Hadoop and are facing some issues. We are using TSPLIB as input files. The input is a text file containing Euclidean coordinates of the cities - first column…
Replies: 17 | First post: May 4 2012 11:06 | Last post: May 8 2012 11:25
Getting filename in case of MultipleInputs
By: Kasi Subrahmanyam
Hi, could anyone suggest how to get the filename in the mapper? I have gone through the JIRA ticket saying that map.input.file doesn't work in case of multiple inputs; TaggedInputSplit also doesn't work in the 0.20.2 version, as it is not a public class.…
Replies: 5 | First post: May 3 2012 07:56 | Last post: May 6 2012 01:30
hanging context.write() with large arrays
By: Zuhair Khayyat
Hi, I am building a MapReduce application that constructs the adjacency list of a graph from an input edge list. I noticed that my Reduce phase always hangs (and eventually times out) as it calls the function context.write(Key_x,Value_x) when the…
Replies: 2 | First post: May 5 2012 09:06 | Last post: May 5 2012 09:57
kerberos security enabled and hadoop/hdfs/mapred users
By: Koert Kuipers
Do I understand correctly that with Kerberos enabled the mappers and reducers will be "run as" the actual user that started them, as opposed to the user that runs the tasktracker, which is mapred or hadoop or something like that?
Replies: 1 | First post: May 3 2012 18:09 | Last post: May 3 2012 18:14
MapReduce jobs remotely
By: Kevin
Hi, I have a cluster running YARN, and mapreduce jobs run as expected when they are executed from one of the nodes. However, when I run Pig scripts from a remote client, Pig connects to HDFS and HBase but runs its MapReduce job using the…
Replies: 2 | First post: May 2 2012 13:41 | Last post: May 3 2012 09:36