Tuesday 19 November 2013

Installing Apache Spark on Ubuntu 12.04

A few easy steps

Installing Apache Spark involves only a few simple steps:

  1. Install Java
  2. Install Hadoop
  3. Install Scala
  4. Install Spark

Install Java on Ubuntu

Java can be installed as shown on this howto:
  • sudo add-apt-repository ppa:webupd8team/java 
  • sudo apt-get update
  • sudo apt-get install oracle-java7-installer
After installation, you can test whether it works by typing java -version at the command prompt. This should print the installed Java version.
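The check can also be scripted, which is handy when repeating the setup on several machines. This is a minimal sketch, assuming a POSIX shell; the helper name have_cmd is made up for this example and is not part of any tool mentioned here.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to a command on the PATH.
# (Hypothetical helper, named for this example only.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java -version writes to stderr, so merge it into stdout.
    java -version 2>&1
else
    echo "java not found on PATH; repeat the installation steps"
fi
```

The same helper can be reused later for hadoop and scala.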

    Install Hadoop on Ubuntu

    Hadoop can be installed simply by downloading and installing a .deb file.
    After installation, you can check whether it works by typing hadoop at the command prompt. It should print some information about using the hadoop command.

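    After installing Hadoop, open /etc/hadoop/hadoop-env.sh and comment out the line:
     export JAVA_HOME=/usr/lib/jvm/java-6-sun
    so that it reads:
     #export JAVA_HOME=/usr/lib/jvm/java-6-sun
    This stops Hadoop from pointing at a Java 6 location that does not match the Java installed above.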
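The edit to hadoop-env.sh can also be made non-interactively. The sketch below assumes a POSIX shell with sed and mktemp available; the function name comment_out_java_home is hypothetical, and the demonstration runs against a scratch file rather than the real /etc/hadoop/hadoop-env.sh.

```shell
#!/bin/sh
# comment_out_java_home FILE: put '#' in front of every line that starts
# with "export JAVA_HOME=". Lines that are already commented are untouched.
# (Hypothetical helper, named for this example only.)
comment_out_java_home() {
    cojh_file="$1"
    cojh_tmp="$(mktemp)" || return 1
    # Write to a temp file first, then copy back so the original file
    # keeps its owner and permissions.
    sed 's/^\([[:space:]]*\)export JAVA_HOME=/\1#export JAVA_HOME=/' \
        "$cojh_file" > "$cojh_tmp" && cat "$cojh_tmp" > "$cojh_file"
    cojh_status=$?
    rm -f "$cojh_tmp"
    return $cojh_status
}

# Demonstration on a scratch file with the line from this post.
demo="$(mktemp)"
printf '%s\n' '# Hadoop environment' \
    'export JAVA_HOME=/usr/lib/jvm/java-6-sun' > "$demo"
comment_out_java_home "$demo"
cat "$demo"
rm -f "$demo"
```

To apply it to the real file, the function would have to run with root privileges, and keeping a backup copy of hadoop-env.sh first is a sensible precaution.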

    Install Scala on Ubuntu

    Follow the steps as presented on this page:
    • Download Scala from http://scala-lang.org/ and save it somewhere you can find it (e.g. ~/)
    • At the command prompt, type:
      • cd /usr/share
      • sudo tar -zxf <location and name of the tgz file> (e.g. sudo tar -zxf ~/scala-2.10.3.tgz)
      • Link (ln -s) the executables into /usr/bin, e.g.:
        • sudo ln -s /usr/share/scala-2.10.3/bin/scala /usr/bin/scala
        • sudo ln -s /usr/share/scala-2.10.3/bin/scalac /usr/bin/scalac
        • sudo ln -s /usr/share/scala-2.10.3/bin/fsc /usr/bin/fsc
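The three ln commands can be collapsed into a loop. This is a sketch under a few assumptions: a POSIX shell, the scala-2.10.3 path used in this post, and a hypothetical helper named link_tools. Note that it uses ln -sf, which replaces an existing link of the same name, whereas the plain ln -s commands above would fail in that case.

```shell
#!/bin/sh
# link_tools SRC_BIN DEST_BIN NAME...: symlink each NAME from SRC_BIN into
# DEST_BIN, skipping names that are not executable files in SRC_BIN.
# (Hypothetical helper, named for this example only.)
link_tools() {
    lt_src="$1"
    lt_dest="$2"
    shift 2
    for lt_tool in "$@"; do
        if [ -x "$lt_src/$lt_tool" ]; then
            # -f replaces an existing link with the same name.
            ln -sf "$lt_src/$lt_tool" "$lt_dest/$lt_tool"
        else
            echo "skipping $lt_tool: not found in $lt_src" >&2
        fi
    done
}

# Only act when the Scala directory exists and /usr/bin is writable,
# i.e. when the script is run as root after unpacking Scala.
if [ -d /usr/share/scala-2.10.3/bin ] && [ -w /usr/bin ]; then
    link_tools /usr/share/scala-2.10.3/bin /usr/bin scala scalac fsc
fi
```

Saved as a script and run with sudo, this has the same effect as the three commands above. For a different Scala version only the source directory needs to change.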

    Install Spark on Ubuntu

    Getting Spark up and running is easy, as described on http://spark.incubator.apache.org/docs/latest/.
    You can start Spark by executing ./spark-shell in the Spark home directory. By default, however, log4j is configured so that all log messages are printed in your main window. You can redirect the messages to a log file by creating a file conf/log4j.properties with the following content:
    log4j.rootLogger = DEBUG, A1
    log4j.appender.A1=org.apache.log4j.RollingFileAppender
    log4j.appender.A1.File=SparkLog.log
    log4j.appender.A1.MaxFileSize = 100KB
    log4j.appender.A1.layout=org.apache.log4j.PatternLayout
    log4j.appender.A1.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    
    # Ignore messages below warning level from Jetty, because it's a bit verbose
    log4j.logger.org.eclipse.jetty=WARN

    Now you are ready to make some Sparks!!!