I am trying to set up Apache Spark on a small standalone cluster (1 master node and 8 slave nodes). I have installed a "pre-built" version of Spark 1.1.0 built on top of Hadoop 2.4. I have set up passwordless SSH between the nodes and exported some required environment variables. One of these variables (which is probably the most relevant) is:
export SPARK_LOCAL_DIRS=/scratch/spark/
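As far as I understand, SPARK_LOCAL_DIRS is the directory Spark uses for scratch space (map output files, RDDs spilled to disk). It would normally go into conf/spark-env.sh on each node, roughly like the sketch below; the path itself is simply my scratch location, nothing special about it:

# conf/spark-env.sh -- sketch; /scratch/spark/ is just where I keep Spark's scratch space
export SPARK_LOCAL_DIRS=/scratch/spark/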
I have a small piece of Python code that uses Spark, and I can run it locally on my desktop (not on the cluster) with:
$SPARK_HOME/bin/spark-submit ~/my_code.py
After I have copied the code to the cluster, I start all processes from the head node:
$SPARK_HOME/sbin/start-all.sh
and each slave is listed as running as process xxxxx.
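As a sanity check (assuming a JDK with jps on the path), the daemons can also be verified on each node with something along these lines:

$ jps
xxxxx Worker   # expected on each slave
yyyyy Master   # expected on the head node

and the standalone master's web UI (port 8080 on the head node by default) lists the registered workers.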
If I try to run my code again with the above command:
$SPARK_HOME/bin/spark-submit ~/my_code.py
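(For a standalone cluster, spark-submit is usually also pointed explicitly at the master, along the lines of

$SPARK_HOME/bin/spark-submit --master spark://<head-node>:7077 ~/my_code.py

where <head-node> is a stand-in for the actual host name of the master and 7077 is the default standalone master port.)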
I get the following error:
14/10/27 14:19:02 ERROR util.Utils: Failed to create local root dir in /scratch/spark/. Ignoring this directory.
14/10/27 14:19:02 ERROR storage.DiskBlockManager: Failed to create any local dir.
I have permissions set to 777 on /scratch and /scratch/spark. Any help is greatly appreciated. The problem turned out to be that I did not know that the master node also needs a scratch directory.
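In case it helps anyone else hitting the same error: the fix was essentially just making sure the directory from SPARK_LOCAL_DIRS exists and is writable on every node, including the master, along the lines of:

$ mkdir -p /scratch/spark
$ chmod 777 /scratch/spark

run on the head node as well as on each slave.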