- PySpark Cookbook
- Denny Lee Tomasz Drabas
- 2025-04-04 16:35:18
.distinct() transformation
The distinct() transformation returns a new RDD containing only the distinct elements of the source RDD; duplicates are dropped. Consider the following code snippet:
# Provide the distinct elements for the
# third column of airports representing
# countries
(
    airports
    .map(lambda c: c[2])
    .distinct()
    .take(5)
)
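This will return the following output. Although take(5) asks for up to five elements, only three come back, because the column holds just three distinct values ('Country' is most likely the header row of the file):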
# Output
[u'Canada', u'USA', u'Country']
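To make the behavior of distinct() concrete without a Spark cluster, here is a minimal plain-Python sketch. It is not Spark code: distinct_sketch and sample_rows are hypothetical names, and the rows are made-up stand-ins for the airports data (including an assumed header row). The three steps mirror the map / reduceByKey / map pattern that a distinct() on a pair-keyed RDD amounts to.

```python
# Hypothetical sample rows shaped like the airports data:
# (city, state, country, IATA). The first row imitates a header line,
# which is an assumption about why 'Country' shows up in the output.
sample_rows = [
    ("City", "State", "Country", "IATA"),
    ("Abbotsford", "BC", "Canada", "YXX"),
    ("Aberdeen", "SD", "USA", "ABR"),
    ("Abilene", "TX", "USA", "ABI"),
    ("Vancouver", "BC", "Canada", "YVR"),
]


def distinct_sketch(elements):
    """Plain-Python model of distinct(); elements must be hashable."""
    # Step 1: pair every element with a dummy value,
    # like .map(lambda x: (x, None)).
    keyed = [(element, None) for element in elements]

    # Step 2: keep a single entry per key,
    # like .reduceByKey(lambda x, _: x).
    reduced = {}
    for key, value in keyed:
        reduced.setdefault(key, value)

    # Step 3: drop the dummy values and return the keys,
    # like .map(lambda x: x[0]).
    return list(reduced.keys())


# Third column (index 2) holds the country, as in the snippet above.
countries = distinct_sketch(row[2] for row in sample_rows)

# Sort before printing: Spark does not guarantee the order of distinct results.
print(sorted(countries))  # ['Canada', 'Country', 'USA']
```

In Spark itself the deduplication requires a shuffle, so the order of the returned elements can differ between runs; sort the result if a stable order matters.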