Big Data Technologies Lab
CSC - 394
Name: Addhyan Pant DATE: 25th March 2021
UID : 18BCS3780 SEC : 18AITBDA1 ( Group 1 )
EXPERIMENT – 5
AIM: Write a word count program using Apache Spark.
ii) Using the filter and map functions with lambda functions.
Lambda Function: Lambda functions are anonymous functions in Python. A lambda expression
creates a function object without binding it to a name; the expression evaluates to the
function itself, which can be stored in a variable or passed directly to another function.
Lambdas are usually passed to higher-order functions such as map and filter, which call them
later on each element.
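The word count pipeline can be sketched in plain Python so that it runs without a Spark cluster. The input lines are hypothetical, and Python's built-in filter, map and functools.reduce stand in for the Spark operations. In PySpark the same chain would look roughly like `sc.textFile(path).filter(lambda line: line.strip()).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `sc` (a SparkContext) and `path` are assumed names.

```python
from functools import reduce

# Hypothetical input lines, standing in for the lines of a text file.
lines = [
    "spark makes big data simple",
    "",
    "big data needs spark",
]

# filter: a lambda drops blank lines.
non_empty = filter(lambda line: line.strip() != "", lines)

# flatMap: split every remaining line into words and flatten them into one list.
words = [word for line in non_empty for word in line.split()]

# map: a lambda turns each word into a (word, 1) pair.
pairs = map(lambda word: (word, 1), words)

# reduceByKey: add up the 1s for each distinct word.
word_counts = reduce(
    lambda counts, pair: {**counts, pair[0]: counts.get(pair[0], 0) + pair[1]},
    pairs,
    {},
)

print(word_counts)
```

Each lambda is created inline and handed to filter, map or reduce, which call it once per element. That is the same style in which lambdas are passed to Spark's RDD methods.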
The printSchema method prints the name and type of every column in the DataFrame and shows
whether each column is nullable, i.e. whether it is allowed to contain null values.
The select method returns a new DataFrame containing only the chosen columns. The show method
prints rows of a DataFrame; if we pass it an integer n, it prints the first n rows (20 by
default).
The count action returns the number of rows in the DataFrame.
The dropDuplicates method returns a new DataFrame with all duplicate rows removed.
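The effect of dropDuplicates on the row count can be sketched in plain Python. The rows below are hypothetical (Name, Sex, DOB) tuples, and the last row repeats the first. In PySpark this would correspond roughly to comparing `df.count()` with `df.dropDuplicates().count()`, where `df` is an assumed DataFrame name.

```python
# Hypothetical rows (Name, Sex, DOB); the last row repeats the first one exactly.
rows = [
    ("Asha", "F", "2000-05-14"),
    ("Ravi", "M", "1998-01-30"),
    ("Meera", "F", "1999-11-02"),
    ("Asha", "F", "2000-05-14"),
]

# count(): the number of rows, duplicates included.
print(len(rows))

# dropDuplicates(): keep one copy of each identical row.
# dict.fromkeys keeps the first occurrence and preserves order here;
# Spark itself does not guarantee which copy or which row order survives.
deduped = list(dict.fromkeys(rows))
print(len(deduped))
```

The first print shows 4 rows and the second shows 3, because the repeated row is kept only once.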
● Count the females in the DataFrame by filtering on the Sex column and then calling the count action.
● Group the DataFrame by Sex using the groupBy method.
● Sort the DataFrame by DOB using the orderBy method.
● Rename a column using the withColumnRenamed method.
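The four operations in the list can be mimicked on a small in-memory table to show what each one computes. The rows and the "Name" column are hypothetical, and so is the new column name "DateOfBirth". In PySpark they would correspond roughly to `df.filter(df.Sex == "F").count()`, `df.groupBy("Sex").count()`, `df.orderBy("DOB")` and `df.withColumnRenamed("DOB", "DateOfBirth")`, with `df` an assumed DataFrame name.

```python
from collections import Counter

# Hypothetical rows using the Sex and DOB columns; "Name" is illustrative.
rows = [
    {"Name": "Asha", "Sex": "F", "DOB": "2000-05-14"},
    {"Name": "Ravi", "Sex": "M", "DOB": "1998-01-30"},
    {"Name": "Meera", "Sex": "F", "DOB": "1999-11-02"},
]

# filter + count: number of female rows.
female_count = len(list(filter(lambda r: r["Sex"] == "F", rows)))

# groupBy("Sex").count(): number of rows for each value of Sex.
per_sex = Counter(r["Sex"] for r in rows)

# orderBy("DOB"): ISO-8601 date strings sort chronologically as plain strings.
by_dob = sorted(rows, key=lambda r: r["DOB"])

# withColumnRenamed("DOB", "DateOfBirth"): same rows with one key renamed.
renamed = [{("DateOfBirth" if k == "DOB" else k): v for k, v in r.items()} for r in rows]

print(female_count)
print(dict(per_sex))
print([r["Name"] for r in by_dob])
print(list(renamed[0].keys()))
```

Storing DOB as an ISO date string keeps this sketch simple. In a real DataFrame, a proper date type would make orderBy independent of the string format.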
3) Using SQL queries with DataFrames through the Spark SQL module
SQL queries can produce the same results as the DataFrame methods above. First, we register
the DataFrame as a temporary view with the createOrReplaceTempView method, passing the name
of the view as its argument. Then we pass any SQL query that refers to that view, as a
string, to the SparkSession's sql method, which returns the result as a new DataFrame.
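The Spark calls need a running SparkSession. To keep this sketch runnable without Spark, it swaps in Python's built-in sqlite3 module, which is a different SQL engine. It only illustrates the same pattern: register rows under a name, then query them by that name. In PySpark the pattern would be roughly `df.createOrReplaceTempView("people")` followed by `spark.sql("SELECT Sex, COUNT(*) FROM people GROUP BY Sex")`. The names `df`, `spark` and `people` and the rows below are all assumptions.

```python
import sqlite3

# An in-memory SQLite database stands in for the SparkSession here.
conn = sqlite3.connect(":memory:")

# Analogue of createOrReplaceTempView("people"): (re)create a temporary table
# under a name that later queries can refer to.
conn.execute("DROP TABLE IF EXISTS temp.people")
conn.execute("CREATE TEMP TABLE people (Name TEXT, Sex TEXT, DOB TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Asha", "F", "2000-05-14"), ("Ravi", "M", "1998-01-30"), ("Meera", "F", "1999-11-02")],
)

# Analogue of spark.sql(...): the same group-and-count as groupBy("Sex").count().
result = conn.execute(
    "SELECT Sex, COUNT(*) AS n FROM people GROUP BY Sex ORDER BY Sex"
).fetchall()
print(result)
conn.close()
```

Unlike spark.sql, which returns a DataFrame, sqlite3's fetchall returns a plain list of tuples.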
Result: We have successfully written and run an Apache Spark word count program.