There's more...

Now that we have everything in place, let's see what this can do. 

First, start Jupyter (note that we do not use the pyspark command):

jupyter notebook

When you create a new notebook, you should now see additional kernel options, including PySpark.

If you click on PySpark, it will open a new notebook and connect to the PySpark kernel.

There are a number of magics available for interacting with the notebook; type %%help in a cell to list them all. Among the most important are %%configure (to change the parameters of the Livy session), %%info (to display the current Livy sessions), %%sql (to run a SQL query against the session), %%local (to run the cell's code locally in the IPython kernel rather than on the cluster), and %%cleanup (to delete all sessions on the current Livy endpoint).
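For example, a cell that sets the session's resources before the session is created might look like the following sketch (the resource values shown here are illustrative assumptions, not requirements):

%%configure -f
{"executorMemory": "1g", "executorCores": 2, "numExecutors": 2}

The -f flag forces sparkmagic to drop and recreate the session if one already exists; afterwards, %%info displays the active Livy sessions.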

Once you have configured your session, Livy will send back information about the sessions that are currently active.

Let's try to create a simple DataFrame using the following code:

from pyspark.sql.types import *

# Generate our data
ListRDD = sc.parallelize([
    (123, 'Skye', 19, 'brown'),
    (223, 'Rachel', 22, 'green'),
    (333, 'Albert', 23, 'blue')
])

# The schema is encoded using StructType
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
    StructField("age", LongType(), True),
    StructField("eyeColor", StringType(), True)
])

# Apply the schema to the RDD and create DataFrame
drivers = spark.createDataFrame(ListRDD, schema)

# Creates a temporary view using the data frame
drivers.createOrReplaceTempView("drivers")
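
Before querying the view with SQL, you can run a quick sanity check; the following two lines are a minimal sketch that simply prints the schema and the rows of the DataFrame we just built:

# Inspect the schema and the data we just loaded
drivers.printSchema()
drivers.show()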

Note that the SparkSession will only be created once you execute the preceding code in a cell inside the notebook.

If you execute the %%sql magic, the query results are rendered directly in the notebook.
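
For example, a minimal query against the drivers temporary view registered earlier might look as follows (the query itself is only an illustrative sketch):

%%sql
SELECT id, name, age, eyeColor FROM drivers WHERE age > 20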