- PySpark Cookbook
- Denny Lee Tomasz Drabas
- 182字
- 2025-04-04 16:35:18
How to do it...
The installOnRemote.sh script for this recipe can be found in the Chapter01 folder in the GitHub repository: http://bit.ly/2ArlBck. Some portions of the script are very similar to the ones we have outlined in the previous recipes, so we will skip those; you can refer to previous recipes for more information (especially the Installing Spark requirements and the Installing Spark from binaries recipes).
The top-level structure of the script is as follows:
#!/bin/bash
# Shell script for installing Spark from binaries
# on remote servers
#
# PySpark Cookbook
# Author: Tomasz Drabas, Denny Lee
# Version: 0.1
# Date: 12/9/2017
_spark_binary="http://mirrors.ocf.berkeley.edu/apache/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz"
_spark_archive=$( echo "$_spark_binary" | awk -F '/' '{print $NF}' )
_spark_dir=$( echo "${_spark_archive%.*}" )
_spark_destination="/opt/spark"
_java_destination="/usr/lib/jvm/java-8-oracle"
_python_binary="https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh"
_python_archive=$( echo "$_python_binary" | awk -F '/' '{print $NF}' )
_python_destination="/opt/python"
_machine=$(cat /etc/hostname)
_today=$( date +%Y-%m-%d )
_current_dir=$(pwd) # store current working directory
...
printHeader
readIPs
checkJava
installScala
installPython
updateHosts
configureSSH
downloadThePackage
unpack
moveTheBinaries
setSparkEnvironmentVariables
updateSparkConfig
cleanUp
We have highlighted the portions of the script that are more relevant to this recipe in bold font.