Software Setup


Download Java JDK 11 –

https://www.oracle.com/in/java/technologies/javase/jdk11-archive-downloads.html

1. Open the above link and click the Java version matching your PC – (Java SE Development
Kit 11.0.19).

2. While downloading, it may ask for an Oracle account; create your own account and
proceed with the download.

3. Open the downloaded executable file and install.
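After installing, `java -version` on the command line shows a banner that starts with the major version. A small parser like the one below can sanity-check that banner text (a sketch; the banner string is an illustrative example, not captured from a real install):

```python
import re

def jdk_major_version(banner):
    """Extract the major version number from a `java -version` banner line."""
    m = re.search(r'version "(\d+)', banner)
    return int(m.group(1)) if m else None

# Example banner in the JDK 11 format:
print(jdk_major_version('java version "11.0.19" 2023-04-18 LTS'))  # -> 11
```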

Python download –
https://www.python.org/downloads/release/python-3115/

1. Open the above URL and download Python 3.11 (make sure you select the proper
32-bit or 64-bit installer depending on your machine).
2. While installing, make sure you tick the checkbox to add Python to the PATH environment
variable. If you are not able to find the checkbox, it is not an issue; we can add the path
manually later.

3. Once the file is downloaded, click on it and install.
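Once Python is installed, you can confirm the interpreter version and bitness from Python itself; a minimal check, assuming the 3.11 install above:

```python
import struct
import sys

# Interpreter version as a (major, minor) tuple, e.g. (3, 11)
print(sys.version_info[:2])

# 32-bit vs 64-bit build: pointer size in bits
print(struct.calcsize("P") * 8)
```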

PyCharm Community Edition –

https://www.jetbrains.com/pycharm/download/
1. Download and install from the above link.

Anaconda Navigator – If you don't want to install all the software separately and would
rather install everything together in a single tool, you can download Anaconda Navigator.
I won't recommend this for now; we will look at it later.

PySpark setup:

1. Download PySpark from the link below:

https://spark.apache.org/downloads.html
2. Unzip it and place it in one folder:
tar xvzf spark-3.3.1-bin-hadoop3.tgz -C C:\Users\Sreenivasulu_Kattuba\Desktop\Spark
(this Spark folder is the path used in the next step)
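If `tar` is not available on your machine, the same extraction can be done with Python's standard library (a sketch; the archive name matches the Spark 3.3.1 download above):

```python
import tarfile

def extract_spark(archive, dest):
    """Extract a spark-*-bin-*.tgz distribution into dest,
    mirroring: tar xvzf <archive> -C <dest>."""
    with tarfile.open(archive, "r:gz") as tf:
        tf.extractall(path=dest)

# Example:
# extract_spark("spark-3.3.1-bin-hadoop3.tgz", r"C:\Users\Sreenivasulu_Kattuba\Desktop")
```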
3. Update environment variables and set:
PYSPARK_PYTHON = python
SPARK_HOME = C:\Users\Sreenivasulu_Kattuba\Desktop\Spark
HADOOP_HOME = C:\Users\Sreenivasulu_Kattuba\Desktop\Spark
And add C:\Users\Sreenivasulu_Kattuba\Desktop\Spark\bin under the Path variable.
Verify the paths are set correctly:
echo %HADOOP_HOME%
echo %SPARK_HOME%
echo %JAVA_HOME%
echo %PYSPARK_PYTHON%
The above commands should return the correct paths.
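The same verification can be scripted from Python; this helper simply reports which of the variables from step 3 are unset (a sketch; the demo dictionary is illustrative, not your live environment):

```python
import os

# Variables the setup steps above expect to be defined
REQUIRED = ["JAVA_HOME", "SPARK_HOME", "HADOOP_HOME", "PYSPARK_PYTHON"]

def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

# Demo with an illustrative dictionary instead of the live environment:
demo = {"JAVA_HOME": r"C:\jdk11", "SPARK_HOME": r"C:\Spark"}
print(missing_vars(demo))  # -> ['HADOOP_HOME', 'PYSPARK_PYTHON']
```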

4. Open PyCharm and navigate to Settings, then Project, then Project Structure.
Click on "Add Content Root" and add these two paths:
the Spark/python folder
the Spark/python/lib/py4j….zip file
5. You are all set now.
6. Run the code below; it should complete without any issues:

# Import SparkSession
from pyspark.sql import SparkSession

# Create a local SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("SparkByExamples.com") \
    .getOrCreate()

# Build a small DataFrame and display it
dataList = [("Java", 20000), ("Python", 100000), ("Scala", 3000)]
df = spark.createDataFrame(dataList, schema=['Language', 'fee'])

df.show()
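If you ever run scripts outside PyCharm, the two content roots from step 4 can be added to `sys.path` programmatically instead; a sketch, assuming SPARK_HOME is set as in step 3 (the `py4j-*.zip` filename pattern is an assumption about the bundled zip's name):

```python
import glob
import os
import sys

def spark_python_paths(spark_home):
    """Return Spark's Python sources plus the bundled py4j zip,
    i.e. the same two entries added as PyCharm content roots."""
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip"))
    return [python_dir] + py4j_zips

# Example (requires SPARK_HOME to be set in the environment):
# sys.path.extend(spark_python_paths(os.environ["SPARK_HOME"]))
```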
