This is from my learning notes!!!
1.1 Setting up Spark on Google Colab
Google Colaboratory (Colab) is a convenient cloud platform for someone starting to learn Python. You can access what you have practiced from anywhere.
It can also be used to learn Spark. Follow the steps below. Check the file versions first and adjust the commands as needed (for example, look for the latest .tgz file).
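1.1.1 Install Java, Spark, and Findspark
# Run in a Colab cell. Older Spark releases may no longer be on this mirror;
# they are kept at https://archive.apache.org/dist/spark/.
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
!tar xf spark-2.4.3-bin-hadoop2.7.tgz
!pip install -q findspark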
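Since the version numbers appear in several places (the download URL, the archive name, and later the SPARK_HOME path), it can help to derive them all from one spot. The following is a minimal sketch, not part of the original notes: the helper name `spark_archive` is hypothetical, and it assumes other releases follow the same `spark-<version>-bin-hadoop<version>` naming as the 2.4.3 file above.

```python
# Hypothetical helper: builds the names used in the install step from version numbers.
# BASE_URL is the mirror from the commands above; other mirrors may differ.
BASE_URL = "http://apache.osuosl.org/spark"

def spark_archive(spark_version="2.4.3", hadoop_version="2.7"):
    """Return (archive file name, download URL, extracted directory name)."""
    dirname = f"spark-{spark_version}-bin-hadoop{hadoop_version}"
    filename = f"{dirname}.tgz"
    url = f"{BASE_URL}/spark-{spark_version}/{filename}"
    return filename, url, dirname

filename, url, dirname = spark_archive()
print(url)                    # URL to pass to wget
print(filename)               # file to pass to tar
print(f"/content/{dirname}")  # value to use for SPARK_HOME in Colab
```

If you switch to a different release, change the two version arguments and reuse the printed values in the wget, tar, and SPARK_HOME lines.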
1.1.2 Set Environment Variables
import os

# These paths must match the Java package and the Spark archive installed in step 1.1.1.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.3-bin-hadoop2.7"
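1.1.3 Start a SparkSession
import findspark
findspark.init()  # makes the Spark installation at SPARK_HOME importable as pyspark

from pyspark.sql import SparkSession

# local[*] runs Spark locally with as many worker threads as there are cores
spark = SparkSession.builder.master("local[*]").getOrCreate()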
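The string passed to master() controls how many local worker threads Spark uses: "local[*]" means one per logical core, while "local[K]" fixes the count at K. A small sketch of building that string follows; the helper name `local_master` is made up for illustration and is not a Spark API.

```python
import os

def local_master(threads=None):
    """Build a Spark master URL for local mode.

    threads=None gives "local[*]" (one worker thread per logical core);
    a positive integer K gives "local[K]".
    """
    if threads is None:
        return "local[*]"
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return f"local[{threads}]"

print(local_master())    # string used in the step above
print(local_master(2))   # limit Spark to two worker threads
print(os.cpu_count())    # logical cores visible to this Python process
```

The result can be passed straight to `SparkSession.builder.master(...)` in place of the literal string.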
1.1.4 Use Spark!
# Build a 100-row DataFrame with a single "winner" column, then print the first 2 rows
df = spark.createDataFrame([{"winner": "Humanity"} for _ in range(100)])
df.show(2)