
Friday 24 May 2013

Big Data-Beyond Hadoop-In Response to the articles- 1

This post is a response to articles I have read about where the Big Data market is heading beyond Hadoop, and why.

Barry Devlin mentioned that there are fundamentally three kinds of data sources for Big Data, i.e. for large sets of data to be processed:
1) Human-Sourced Information
2) Process-Mediated Data
3) Machine-Generated Data

Much of this data is poorly governed or managed in itself, so it needs to be enhanced with traditional process-mediated data to deliver useful and relevant business analytics. So the market may need to shift from Big Data startups and small vendors to more established vendors with enterprise-scale technologies for semantic and physical integration of multiple data types from multiple sources.

Now the interesting statement is here "Google and Facebook, whose needs have exceeded the capabilities of file-based tools like Hadoop. A new wave of tools—Dremel, Caffeine, Pregel, Spanner and Prism—may be upon us as the biggest big data proponents inexorably move the needle from a batch-oriented, eventually consistent paradigm toward a distributed but ACID-compliant database mind-set."

Big Data beginners are now asking questions like these:

Is Hadoop not really as promising as it looked?
If I go by the market hype, Hadoop can solve all Big Data problems, can't it?

Source link: http://ow.ly/lmlgf

Open for comments.

Friday 3 May 2013

Skills required for Big Data Experts

This post gathers the skills that various organizations look for in Big Data experts.

A solid understanding of algorithms and data structures
A solid understanding of operating system and networking fundamentals
Hands-on experience with Hadoop and its ecosystem, Cassandra, or any other member of the NoSQL family
An understanding of performance optimization at scale
Programming skills in Java and MapReduce

Wednesday 5 December 2012

Bigdata Baby Steps

This is the first time I have felt so determined and enthusiastic about learning Big Data, and I realized that Hadoop plays a big role there. So I started digging for information across web resources. It was not a cakewalk, because I ran into multiple problems while installing Hadoop on Windows. I am writing them down in case they help someone.
Here are the problems I faced and how I resolved them. I have credited the sources of the information as well. I meant to list every problem, but I do not remember all the details.




Installing Apache Hadoop on Windows
Step 1: First install Cygwin.
I installed it, but when I tried the ssh localhost command below, the connection was refused. Here is the solution that I followed.

$ ssh localhost
    ssh: connect to host localhost port 22: Connection refused
If you are facing this problem on Windows XP, follow these steps to open the port for ssh:
Go to Windows Firewall in the Security section of the Control Panel
Exceptions -> Add Port
Give the port name as ssh and the number as 22
Select the TCP option
Click OK
This will let you open ssh from Cygwin.
For local application development, like Hadoop on Windows, please change the scope to "localhost / IP address" in the custom list.

I've read about the restrictions on accessing shares while logged   
into a Windows system with the Cygwin ssh daemon.  We are interested   
in this to do remote builds, and it would be nice to access network   
shares.  We only really need one user to be able to log in, so I   
thought I'd change the CYGWIN sshd service to run as that user.   
However, when I changed the service and tried to start it, I got the
following error message: "The CYGWIN sshd service on Local Computer
started and then stopped."  Any ideas what's going on?

I tried to revert to having the service started by the .\sshd user,
but I can't get that to work now either!  I think it's because I am
using the wrong password.  How can I change or reset the password on
that account?

It's not a month since Larry posted this (thanks, BTW), and this
issue has bubbled up to the top again.  I have tried various ways to   
get the sshd service started as a domain user (instead of the local   
sshd_server user) and can not get it to work.  What is the correct   
syntax to specify a domain user with cygrunsrv?  This is what I have   
tried: 

   cygrunsrv -I sshd -u "DOMAINNAME\USERNAME" -w PASSWORD -d "CYGWIN sshd" -p /usr/sbin/sshd -a -D -e "CYGWIN=bin tty smbntsec" -y tcpip

This successfully installs the service, and if I look at it in the   
Services panel it shows the correct username (DOMAIN\USERNAME), but   
if I try to start the service I always get the error "The Cygwin sshd   
service in Local Computer started and then stopped".  If I substitute   
sshd_server for the user and supply the correct password, the sshd   
service starts correctly.  But I want to start the service as a   
domain user so that I can access network shares and resolve some   
build issues with Visual Studio that are apparently caused by not   
being fully authenticated. 
Useful links
http://www.petrikainulainen.net/programming/apache-hadoop/install-and-configure-apache-hadoop-to-run-in-a-pseudo-distributed-mode/

http://blog.sqltrainer.com/2012/01/installing-and-configuring-apache.html

After installing Cygwin, make sure that all your Hadoop files are placed inside the Cygwin directory itself; otherwise you will get errors.

Step 2: I tried to run the bin/hadoop command but got the error below.

Note: if you get an error like "./bin/hadoop: line 2: $'\r': command not found", the Hadoop script files most likely have Windows-style (CRLF) line endings, which the Cygwin shell does not accept. To repair them, run the following set of commands:

dos2unix bin/hadoop
dos2unix bin/*.sh
dos2unix conf/*.sh
 
http://www.infosci.cornell.edu/hadoop/windows.html

Step 3: After running the above commands, I could at least see that the system recognized Hadoop.

I felt very happy and patted myself on the back, but then the errors below smiled at me.

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ bin/hadoop
: No such file or directoryn
bin/hadoop: line 60: syntax error near unexpected token `$'in\r''
'in/hadoop: line 60: `case "`uname`" in

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ dos2unix bin/hadoop
dos2unix: converting file bin/hadoop to Unix format ...

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ dos2unix bin/*.sh
dos2unix: converting file bin/hadoop-config.sh to Unix format ...
dos2unix: converting file bin/hadoop-daemon.sh to Unix format ...
dos2unix: converting file bin/hadoop-daemons.sh to Unix format ...
dos2unix: converting file bin/slaves.sh to Unix format ...
dos2unix: converting file bin/start-all.sh to Unix format ...
dos2unix: converting file bin/start-balancer.sh to Unix format ...
dos2unix: converting file bin/start-dfs.sh to Unix format ...
dos2unix: converting file bin/start-jobhistoryserver.sh to Unix format ...
dos2unix: converting file bin/start-mapred.sh to Unix format ...
dos2unix: converting file bin/stop-all.sh to Unix format ...
dos2unix: converting file bin/stop-balancer.sh to Unix format ...
dos2unix: converting file bin/stop-dfs.sh to Unix format ...
dos2unix: converting file bin/stop-jobhistoryserver.sh to Unix format ...
dos2unix: converting file bin/stop-mapred.sh to Unix format ...

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ dos2unix conf/*.sh
dos2unix: converting file conf/hadoop-env.sh to Unix format ...
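
The $'\r' errors above are what a shell prints when a script contains carriage returns. As a minimal sketch (the helper names has_crlf and strip_cr are made up for illustration and are not part of Hadoop or Cygwin), the following checks a file for CRLF line endings and strips them with tr, which may help on a system where dos2unix is not installed:

```shell
# has_crlf FILE: succeeds if any line in FILE ends with a carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

# strip_cr FILE: rewrites FILE without carriage returns
# (roughly the effect dos2unix has on a CRLF file).
strip_cr() {
    tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo on a throwaway script with Windows line endings.
demo=$(mktemp)
printf 'echo one\r\necho two\r\n' > "$demo"

if has_crlf "$demo"; then before="CRLF"; else before="LF"; fi
strip_cr "$demo"
if has_crlf "$demo"; then after="CRLF"; else after="LF"; fi

echo "before: $before, after: $after"
rm -f "$demo"
```

Running has_crlf over bin/hadoop, bin/*.sh and conf/*.sh before starting Hadoop is a quick way to confirm whether the conversion is still needed.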

Step 4: The next error was:

bin/hadoop: line 325: /cygdrive/c/Program: No such file or directory
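
The path in this message stops at the first space of "Program Files". That suggests (this is an assumption, the post does not state the cause) that a variable such as JAVA_HOME is expanded without quotes somewhere in the script. The sketch below only demonstrates that shell behaviour; count_args is a hypothetical helper, and the JDK path is the one printed in the command output later in this post.

```shell
# JDK location as printed by bin/hadoop in the output later in the post.
JAVA_HOME="/cygdrive/c/Program Files/Java/jdk1.6.0_31"

# count_args prints how many separate words the shell passed to it.
count_args() {
    echo "$#"
}

# Unquoted: the shell splits the value at the space, so the command
# it would try to run is just /cygdrive/c/Program.
unquoted=$(count_args $JAVA_HOME/bin/java)

# Quoted: the whole path stays together as a single word.
quoted=$(count_args "$JAVA_HOME/bin/java")

echo "unquoted: $unquoted word(s), quoted: $quoted word(s)"
```

If that is indeed the cause, pointing JAVA_HOME in conf/hadoop-env.sh at a path without spaces is one workaround worth trying.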


Step 5: After that, I blindly followed the command below, as written in the "Hadoop in Action" guide.

$ bin/hadoop jar hadoop-*-examples.jar
/cygdrive/c/Program Files/Java/jdk1.6.0_31
Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-*-examples.jar
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.io.FileNotFoundException: hadoop-*-examples.jar (The filename, directory name, or volume label syntax is incorrect)
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:127)
        at java.util.jar.JarFile.<init>(JarFile.java:135)
        at java.util.jar.JarFile.<init>(JarFile.java:72)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
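
A likely reason for this failure, assuming the examples jar in this release is named hadoop-examples-1.0.4.jar as in the commands that follow: the pattern hadoop-*-examples.jar matches no file, so the shell passes it through unchanged and Java is asked to open a file literally called hadoop-*-examples.jar. A small sketch in a scratch directory (the empty file is only a stand-in for the real jar):

```shell
# Scratch directory holding an empty stand-in for the real jar.
workdir=$(mktemp -d)
touch "$workdir/hadoop-examples-1.0.4.jar"

# Pattern from the book: no file matches, so it stays exactly as typed.
book_pattern=$(cd "$workdir" && echo hadoop-*-examples.jar)

# Pattern with the version after "examples": expands to the real name.
fixed_pattern=$(cd "$workdir" && echo hadoop-examples-*.jar)

echo "book pattern  -> $book_pattern"
echo "fixed pattern -> $fixed_pattern"

rm -r "$workdir"
```

Listing the jars in the Hadoop directory (ls *.jar) before running the book's command shows which naming scheme the installed release uses.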

bin/hadoop jar hadoop-examples-1.0.4.jar

Oops, no, not resolved, so I tried:

$ bin/hadoop jar wordcount.jar
/cygdrive/c/Program Files/Java/jdk1.6.0_31
Exception in thread "main" java.io.IOException: Error opening job jar: wordcount.jar
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.io.FileNotFoundException: wordcount.jar (The system cannot find the file specified)
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:127)
        at java.util.jar.JarFile.<init>(JarFile.java:135)
        at java.util.jar.JarFile.<init>(JarFile.java:72)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

Finally arrived at

$ bin/hadoop jar hadoop-examples-1.0.4.jar wordcount
/cygdrive/c/Program Files/Java/jdk1.6.0_31
Usage: wordcount <in> <out>
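
The usage line says the wordcount example needs an input path and an output path. To get a feel for the result to expect, here is a rough local equivalent built from standard Unix tools. It is only an illustration (count_words is a hypothetical helper, and splitting on whitespace is an assumption about how the example tokenizes its input), not how Hadoop runs the job:

```shell
# count_words FILE: prints "word<TAB>count" for each distinct word in FILE.
count_words() {
    tr -s '[:space:]' '\n' < "$1" |   # one word per line
        grep -v '^$' |                # drop empty lines
        sort |                        # group identical words together
        uniq -c |                     # count each group
        awk '{ print $2 "\t" $1 }'    # reorder to word, then count
}

# Tiny sample input standing in for the files in the input directory.
sample=$(mktemp)
printf 'a b a\nb a\n' > "$sample"

result=$(count_words "$sample")
echo "$result"
rm -f "$sample"
```

Comparing such a local count on a small file with the job's output is a simple sanity check once the job runs.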

So modified to

$ bin/hadoop jar hadoop-examples-1.0.4.jar wordcount input output1
/cygdrive/c/Program Files/Java/jdk1.6.0_31
12/11/12 16:23:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/11/12 16:23:37 ERROR security.UserGroupInformation: PriviledgedActionException as:sb009239 cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-sb009239\mapred\staging\sb0092391774446851\.staging to 0700
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-sb009239\mapred\staging\sb0092391774446851\.staging to 0700
        at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
        at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)