There's more...
Now that we have everything in place, let's see what this can do.
First, start Jupyter (note that we do not use the pyspark command):
jupyter notebook
When you add a new notebook, you should now see the Sparkmagic kernels (such as PySpark) listed as options alongside the regular Python kernels.
If you click on PySpark, it will open a new notebook connected to the PySpark kernel.
There are a number of magics available for interacting with the notebook; type %%help in a cell to list them all. Among the most important are %%configure (change the configuration of the Livy session), %%sql (execute a Spark SQL query against the current session), %%info (display information about the current Livy sessions), %%local (run the cell contents locally instead of on the cluster), and %%cleanup (delete all sessions for the current Livy endpoint).
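For example, you can use the %%configure magic to change the resources Livy requests for the session before it starts. A minimal sketch follows; the property values are illustrative assumptions, and the -f flag forces the session to be recreated if one is already running:
%%configure -f
{"executorMemory": "2g", "executorCores": 2}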

Once you have configured your session, Livy sends back information about the active sessions that are currently running.
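You can ask for this information again at any point with the %%info magic, which runs on its own in a cell and lists the current Livy endpoint and its sessions:
%%info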

Let's try to create a simple DataFrame using the following code:
from pyspark.sql.types import *

# Generate our data
ListRDD = sc.parallelize([
    (123, 'Skye', 19, 'brown'),
    (223, 'Rachel', 22, 'green'),
    (333, 'Albert', 23, 'blue')
])

# The schema is encoded using StructType
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
    StructField("age", LongType(), True),
    StructField("eyeColor", StringType(), True)
])

# Apply the schema to the RDD and create a DataFrame
drivers = spark.createDataFrame(ListRDD, schema)

# Create a temporary view using the DataFrame
drivers.createOrReplaceTempView("drivers")
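As a quick sanity check (a minimal verification using standard DataFrame methods), you can print the schema and preview the rows in a new cell:
drivers.printSchema()
drivers.show()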
Note that the SparkSession is created only once you execute the preceding code in a cell inside the notebook; Sparkmagic then reports the progress of starting the Spark application in the cell output.

If you execute the %%sql magic, the query runs against the session on the cluster and the results are rendered in the notebook.
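For example, the following cell (a minimal query against the drivers view we just registered) returns the rows as a table:
%%sql
SELECT id, name, age, eyeColor FROM drivers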
