Bigdata Diary

For the first time, I feel truly determined and enthusiastic about learning Big Data, and I have realized that HADOOP plays a big role there. So I started digging for information across all the web resources I could find. It was not a cakewalk, because I faced multiple problems while installing HADOOP on Windows, so I am writing them down in the hope that they help someone.
Here are the problems I faced and how I resolved them. I have also mentioned the source of each piece of information to give its authors credit. I wanted to describe each and every problem, but I do not remember all the details.

Installing Apache Hadoop for Windows.
STEP 1: First install CYGWIN
I installed it, but when I tried ssh localhost (below), the connection failed. Here is the solution I followed.

$ ssh localhost
    ssh: connect to host localhost port 22: Connection refused

If you are facing this problem on Windows XP, follow these steps to open the port for SSH:

Open Windows Firewall from the Security section of Control Panel.

Exceptions -> Add Port

Enter ssh as the name and 22 as the port number.

Select TCP.

Click OK.

This allows you to open SSH connections from Cygwin.

For local application development such as HADOOP on Windows, also click Change scope, choose Custom list, and enter localhost or your IP address.
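To check whether the firewall change worked, it helps to test the port directly. Below is a minimal bash sketch, assuming bash with its /dev/tcp redirection enabled and the coreutils timeout command; the function name port_open is hypothetical. Note that "Connection refused" can also simply mean that no sshd service is running yet, in which case opening the firewall port alone will not help.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): test whether a TCP port accepts connections.
# Assumes bash with /dev/tcp redirection support and the coreutils `timeout` command.

# port_open HOST PORT
# Returns exit status 0 if a TCP connection to HOST:PORT succeeds within 3 seconds,
# non-zero if it is refused, blocked, or times out.
port_open() {
  local host="$1" port="$2"
  # Opening /dev/tcp/HOST/PORT on file descriptor 3 makes bash attempt a TCP connect.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

Usage: port_open localhost 22 && echo "port 22 is reachable" || echo "port 22 is closed or blocked"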

Related discussion found online (running the Cygwin sshd service as a domain user):

I've read about the restrictions on accessing shares while logged into a Windows system with the Cygwin ssh daemon. We are interested in this to do remote builds, and it would be nice to access network shares. We only really need one user to be able to log in, so I thought I'd change the CYGWIN sshd service to run as that user. However, when I changed the service and tried to start it, I got the following error message: "The CYGWIN sshd service on Local Computer started and then stopped." Any ideas what's going on?

I tried to revert to having the service started by the .\sshd user, but I can't get that to work now either! I think it's because I am using the wrong password. How can I change or reset the password on that account?

It's not a month since Larry posted this (thanks, BTW), and this issue has bubbled up to the top again. I have tried various ways to get the sshd service started as a domain user (instead of the local sshd_server user) and cannot get it to work. What is the correct syntax to specify a domain user with cygrunsrv? This is what I have tried:

cygrunsrv -I sshd -u "DOMAINNAME\USERNAME" -w PASSWORD -d "CYGWIN sshd" -p /usr/sbin/sshd -a -D -e "CYGWIN=bin tty smbntsec" -y tcpip

This successfully installs the service, and if I look at it in the Services panel it shows the correct username (DOMAIN\USERNAME), but if I try to start the service I always get the error "The Cygwin sshd service in Local Computer started and then stopped". If I substitute sshd_server for the user and supply the correct password, the sshd service starts correctly. But I want to start the service as a domain user so that I can access network shares and resolve some build issues with Visual Studio that are apparently caused by not being fully authenticated.
Useful links
http://www.petrikainulainen.net/programming/apache-hadoop/install-and-configure-apache-hadoop-to-run-in-a-pseudo-distributed-mode/

http://blog.sqltrainer.com/2012/01/installing-and-configuring-apache.html

After installing CYGWIN, make sure that all your HADOOP files are placed inside the CYGWIN directory itself (mine are in C:\cygwin\usr\local\hadoop-1.0.4); otherwise you will get errors.

STEP 2: I tried to run the command bin/hadoop but got the error below.

If you get an error like "./bin/hadoop: line 2: $'\r': command not found", the line endings in the HADOOP script files have become corrupted: they contain Windows-style CRLF line endings, and the trailing carriage return (\r) confuses bash. To repair them, run the following commands:

dos2unix bin/hadoop
dos2unix bin/*.sh
dos2unix conf/*.sh
http://www.infosci.cornell.edu/hadoop/windows.html
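If dos2unix is not installed, the same repair can be sketched with grep and sed, assuming the GNU versions are available (sed -i is a GNU extension). The helper names has_crlf and strip_crlf are hypothetical; they only remove a carriage return at the end of a line, which is what breaks the scripts here.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): detect and strip Windows CRLF line endings,
# as a fallback when dos2unix is not available. Assumes GNU grep and GNU sed.

# has_crlf FILE -> exit status 0 if FILE contains at least one carriage return
has_crlf() {
  grep -q $'\r' "$1"
}

# strip_crlf FILE... -> removes the trailing '\r' from every line, editing in place
strip_crlf() {
  local f
  for f in "$@"; do
    if has_crlf "$f"; then
      sed -i 's/\r$//' "$f"
      echo "converted to Unix line endings: $f"
    fi
  done
}
```

Usage, from the HADOOP directory: strip_crlf bin/hadoop bin/*.sh conf/*.sh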

STEP 3: After running the commands above, at least I could see that the system recognized HADOOP.

I felt very happy and patted myself on the back, but then the errors below smiled at me:

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ bin/hadoop
: No such file or directoryn
bin/hadoop: line 60: syntax error near unexpected token `$'in\r''
'in/hadoop: line 60: `case "`uname`" in

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ dos2unix bin/hadoop
dos2unix: converting file bin/hadoop to Unix format ...

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ dos2unix bin/*.sh
dos2unix: converting file bin/hadoop-config.sh to Unix format ...
dos2unix: converting file bin/hadoop-daemon.sh to Unix format ...
dos2unix: converting file bin/hadoop-daemons.sh to Unix format ...
dos2unix: converting file bin/slaves.sh to Unix format ...
dos2unix: converting file bin/start-all.sh to Unix format ...
dos2unix: converting file bin/start-balancer.sh to Unix format ...
dos2unix: converting file bin/start-dfs.sh to Unix format ...
dos2unix: converting file bin/start-jobhistoryserver.sh to Unix format ...
dos2unix: converting file bin/start-mapred.sh to Unix format ...
dos2unix: converting file bin/stop-all.sh to Unix format ...
dos2unix: converting file bin/stop-balancer.sh to Unix format ...
dos2unix: converting file bin/stop-dfs.sh to Unix format ...
dos2unix: converting file bin/stop-jobhistoryserver.sh to Unix format ...
dos2unix: converting file bin/stop-mapred.sh to Unix format ...

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ dos2unix conf/*.sh
dos2unix: converting file conf/hadoop-env.sh to Unix format ...

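STEP 4: The next error was:

bin/hadoop: line 325: /cygdrive/c/Program: No such file or directory

The cause is the space in the Java installation path (/cygdrive/c/Program Files/Java/...): the script splits the path at the space, so it only sees /cygdrive/c/Program. See:

http://cygwin.com/ml/cygwin/2007-09/msg00640.html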
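A quick way to spot this problem before running bin/hadoop is to check JAVA_HOME for spaces. The sketch below uses hypothetical helpers that are not part of HADOOP; the workaround mentioned in its comments (a space-free symlink such as /usr/local/java pointing at the JDK, then setting JAVA_HOME to it in conf/hadoop-env.sh) is an assumption to verify on your own system, not necessarily the fix from the linked thread.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): warn when JAVA_HOME contains a space,
# which makes unquoted uses split "/cygdrive/c/Program Files/..." at "Program".

# has_space STRING -> exit status 0 if STRING contains a space character
has_space() {
  case "$1" in
    *" "*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_java_home -> exit status 0 if JAVA_HOME is set and space-free, 1 otherwise
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if has_space "$JAVA_HOME"; then
    echo "JAVA_HOME contains a space: $JAVA_HOME" >&2
    # Assumed workaround (verify before relying on it): a space-free symlink, e.g.
    #   ln -s "$JAVA_HOME" /usr/local/java
    # and then in conf/hadoop-env.sh: export JAVA_HOME=/usr/local/java
    return 1
  fi
  return 0
}
```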

STEP 5: After that, I blindly ran the following command exactly as written in the "Hadoop in Action" book:

$ bin/hadoop jar hadoop-*-examples.jar

/cygdrive/c/Program Files/Java/jdk1.6.0_31
Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-*-examples.jar
	at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.io.FileNotFoundException: hadoop-*-examples.jar (The filename, directory name, or volume label syntax is incorrect)
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.<init>(ZipFile.java:127)
	at java.util.jar.JarFile.<init>(JarFile.java:135)
	at java.util.jar.JarFile.<init>(JarFile.java:72)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

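The wildcard in hadoop-*-examples.jar matched nothing, because in release 1.0.4 the jar is named hadoop-examples-1.0.4.jar; bash therefore passed the literal "*" on to Java, and "*" is not allowed in a Windows file name. So I tried the actual name:

$ bin/hadoop jar hadoop-examples-1.0.4.jar

Oops, still not resolved. Then I tried: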

$ bin/hadoop jar wordcount.jar

/cygdrive/c/Program Files/Java/jdk1.6.0_31
Exception in thread "main" java.io.IOException: Error opening job jar: wordcount.jar
	at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.io.FileNotFoundException: wordcount.jar (The system cannot find the file specified)
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.<init>(ZipFile.java:127)
	at java.util.jar.JarFile.<init>(JarFile.java:135)
	at java.util.jar.JarFile.<init>(JarFile.java:72)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
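In hindsight, both failures were about the jar path: hadoop-*-examples.jar does not match the jar's real name, and there is no wordcount.jar in the HADOOP directory (WordCount is run from inside the examples jar). A small bash helper can locate the examples jar instead of guessing. This is a sketch: find_examples_jar is a hypothetical name, and the two name patterns are simply the one from the book and the one used by release 1.0.4.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): locate the HADOOP examples jar in a directory.
# bash leaves an unmatched glob as literal text (unless nullglob is set), which is how
# "hadoop-*-examples.jar" reached Java unchanged; the -f test below skips such leftovers.

# find_examples_jar [DIR] -> prints the path of the first examples jar found in DIR (default: .)
find_examples_jar() {
  local dir="${1:-.}" jar
  for jar in "$dir"/hadoop-examples-*.jar "$dir"/hadoop-*-examples.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no examples jar found in $dir" >&2
  return 1
}
```

Usage, from the HADOOP directory: bin/hadoop jar "$(find_examples_jar .)" wordcount input output1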

Finally, I arrived at:

$ bin/hadoop jar hadoop-examples-1.0.4.jar wordcount

/cygdrive/c/Program Files/Java/jdk1.6.0_31

Usage: wordcount <in> <out>
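The usage message means wordcount needs an input directory that already exists and an output directory that does not exist yet (Hadoop refuses to overwrite an existing output directory). The sketch below prepares both for a local (standalone) run. The helper names are hypothetical, and using the conf/*.xml files as sample input mirrors the HADOOP 1.x single-node setup guide. In pseudo-distributed mode the input would instead have to be copied into HDFS, for example with bin/hadoop fs -put.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers) for a local (standalone) wordcount run.

# prepare_input DIR FILE... -> creates DIR if needed and copies the sample files into it
prepare_input() {
  local dir="$1"
  shift
  mkdir -p "$dir" && cp "$@" "$dir"/
}

# next_output_dir BASE -> prints BASE1, BASE2, ...: the first name that does not exist yet,
# because the job fails if its output directory is already present
next_output_dir() {
  local base="$1" n=1
  while [ -e "${base}${n}" ]; do
    n=$((n + 1))
  done
  printf '%s\n' "${base}${n}"
}
```

Usage, from the HADOOP directory: prepare_input input conf/*.xml, then bin/hadoop jar hadoop-examples-1.0.4.jar wordcount input "$(next_output_dir output)"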

So I modified it to give an input and an output directory:

$ bin/hadoop jar hadoop-examples-1.0.4.jar wordcount input output1

/cygdrive/c/Program Files/Java/jdk1.6.0_31
12/11/12 16:23:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/11/12 16:23:37 ERROR security.UserGroupInformation: PriviledgedActionException as:sb009239 cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-sb009239\mapred\staging\sb0092391774446851\.staging to 0700
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-sb009239\mapred\staging\sb0092391774446851\.staging to 0700
	at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
	at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
	at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
	at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

This is a known issue with HADOOP 1.0.x on Windows (HADOOP-7682). Solution: https://github.com/congainc/patch-hadoop_7682-1.0.x-win


Friday 24 May 2013

Big Data-Beyond Hadoop-In Response to the articles- 1

Friday 3 May 2013

Skills required for Big Data Experts

Wednesday 5 December 2012

Bigdata Baby Steps