Tuesday, August 4, 2015

IPython Notebook in pySpark mode on Ubuntu

Lately I have been very interested in recommender systems. According to 


I have started courses on Coursera and edX on the various types of recommender systems. All of the courses make use of a big data framework (commonly Apache Spark over Hadoop).



This has necessitated learning about Apache Spark. While trying to work through a tutorial from codemento.io, I found that the prerequisite was an IPython notebook in pySpark mode.
That means installing Apache Spark together with IPython, which needs:
  • Python
  • IPython notebook
  • Scala
  • Java
  • Apache Spark
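
Before installing anything, it can help to see which of these are already present. The following is only a minimal sketch and not part of the tutorial: the function name `check_prereqs` is made up for illustration, and the command names it probes (`python3`, `ipython3`, `scala`, `java`) are assumptions about how the packages show up on an Ubuntu PATH. Spark itself is unpacked by hand further down, so it is not checked here.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are on PATH.
# The command names passed in are assumptions; adjust them to your system.
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}

check_prereqs python3 ipython3 scala java || echo "install the missing packages first"
```

Anything reported as missing is covered by the install commands that follow.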


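# Java (Spark runs on the JVM)
sudo apt-get install default-jdk

# IPython notebook, from the Ubuntu packages
sudo apt-get install ipython3 ipython3-notebook
# or, alternatively, from pip
pip install "ipython[notebook]"

# Scala 2.11.7 from the official .deb; the final install -f repairs any missing dependencies
wget http://www.scala-lang.org/files/archive/scala-2.11.7.deb
sudo dpkg -i scala-2.11.7.deb
sudo apt-get update
sudo apt-get install scala
sudo apt-get install -f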

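# prebuilt package of Spark 1.2.0, for Hadoop 2.4
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop2.4.tgz
tar -xzvf spark-1.2.0-bin-hadoop2.4.tgz

# copy the unpacked directory to /opt (-r because it is a directory), then try both shells
sudo cp -r spark-1.2.0-bin-hadoop2.4/ /opt/spark-1.2.0
cd /opt/spark-1.2.0
./bin/spark-shell
./bin/pyspark

# start the IPython notebook in pySpark mode; this assumes a standalone master is already listening on 127.0.0.1:7077
MASTER="spark://127.0.0.1:7077" SPARK_EXECUTOR_MEMORY="6G" IPYTHON_OPTS="notebook --pylab inline" /opt/spark-1.2.0/bin/pyspark
# the Scala shell against the same master (IPYTHON_OPTS only affects pyspark, so it is dropped here)
MASTER="spark://127.0.0.1:7077" /opt/spark-1.2.0/bin/spark-shell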
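
Since the Spark version appears in both the install path and the launch commands, it is easy for the two to drift apart after an upgrade. One way to guard against that, sketched here under the assumption that Spark lives in `/opt/spark-<version>`, is to derive every path from a single variable. The names `SPARK_VERSION`, `spark_home`, and `pyspark_notebook_cmd` are hypothetical and chosen only for illustration; the default version is likewise an assumption and should match whatever was actually unpacked.

```shell
#!/bin/sh
# Single place to set the Spark version (assumption: matches the unpacked release).
SPARK_VERSION="${SPARK_VERSION:-1.2.0}"

# Hypothetical helper: print the install directory for a given version.
spark_home() {
    echo "/opt/spark-$1"
}

# Hypothetical helper: print the notebook launch command rather than running it,
# so the resulting path can be inspected first.
pyspark_notebook_cmd() {
    echo "MASTER=\"spark://127.0.0.1:7077\" IPYTHON_OPTS=\"notebook --pylab inline\" $(spark_home "$1")/bin/pyspark"
}

pyspark_notebook_cmd "$SPARK_VERSION"
```

Printing the command instead of executing it is a deliberate choice in this sketch: it makes a wrong version visible before anything is launched.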
