This post shows how to execute a Scala file in Spark without creating a jar file. A Scala source file has the extension .scala, and an application written in Scala is normally compiled and packaged, together with all its dependencies, into an executable jar file before it is deployed to a production environment. For try-and-test purposes, however, Apache Spark provides the spark-shell console application, which can be run locally. This console is a REPL (Read-Evaluate-Print-Loop) environment and is very useful during Scala application development. It can be started with the spark-shell command from a terminal window on Linux or macOS, or from a command prompt on Windows. The console is especially helpful when we want to try out a piece of code without touching the Dev, Stage, or Prod environments, which can cost money.
The REPL window is fine for evaluating small pieces of code in chunks. Sometimes, however, we need to evaluate and test a larger portion of source code, and typing or pasting it line by line becomes tedious. Here we will discuss a few ways to execute Scala code in spark-shell without creating a new jar file. Below is the spark-shell REPL window for reference.
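As a minimal illustration (the exact startup logs will differ in your environment), launching spark-shell and evaluating a one-liner looks like this; spark is the SparkSession that spark-shell creates for us automatically:
spark-shell
scala> spark.range(5).count()
res0: Long = 5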
Using spark-shell -i file.scala – (Supports multiline code)
This method is handy because it supports multiline code (line continuation). With it, we can execute a Scala source file whose statements are spread across multiple lines. It is the preferred approach when we do not want to rewrite multiline code as single-line code.
Below is the sample Scala code that we want to execute in spark-shell without creating a jar file. It reads data from a Hive table named "test.sampleTable"; you can create a sample table or substitute any existing table in your environment. We have saved this file as "~/Downloads/executeMultiLine.scala". Notice that the file contains multiline code using the line-continuation style supported by the Scala language.
import org.apache.spark.sql.SparkSession

// Build (or reuse) a local SparkSession with Hive support enabled
val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("Scala Test")
  .enableHiveSupport()
  .getOrCreate()

// Read the Hive table into a DataFrame and print its rows
val df = spark.sql("select * from test.sampleTable")
df.show(truncate = false)
We can start spark-shell and execute the Scala file directly from the terminal. Open a terminal window and run the command below.
spark-shell -i ~/Downloads/executeMultiLine.scala
Output
Using spark-shell < file.scala – (Supports single-line code only)
This method does not support line continuation; it executes each line of the input file independently. If we run the earlier multiline sample with this method, we will get errors because of the multiline statements. The single-line version of the sample code used here is below.
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local[*]").appName("Scala Test").enableHiveSupport().getOrCreate()
val df = spark.sql("select * from test.sampleTable")
df.show(truncate = false)
To execute this Scala file, open a terminal window and run the command below.
spark-shell < ~/Downloads/executeSingleLine.scala
Using :load file.scala – (Supports single-line code only)
This method likewise does not support the line-continuation coding style; it executes the input file line by line. Below is the sample code we execute here.
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local[*]").appName("Scala Test").enableHiveSupport().getOrCreate()
val df = spark.sql("select * from test.sampleTable")
df.show(truncate = false)
We can use the :load command if we have already started the spark-shell console. In the spark-shell REPL window, type the command below to load the sample code from the Scala file and execute it in Spark.
:load /Users/admin/Downloads/executeSingleLine.scala
Using the :paste command in spark-shell – (Supports multiline code)
This method supports multiline code as well: the :paste command puts the REPL into paste mode, where everything entered is compiled and executed as a single block. If we have already started the spark-shell console, we can type :paste, copy and paste the Scala code into the shell, and then press Ctrl + D to execute it.
:paste
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local[*]").appName("Scala Test").enableHiveSupport().getOrCreate()
val df = spark.sql("select * from test.sampleTable")
df.show(truncate = false)
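Because paste mode compiles the pasted block as a whole, the multiline version of the sample from the first section works here as well. A sketch of the same session using the multiline file contents (press Ctrl + D after the last line):
:paste
import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("Scala Test")
  .enableHiveSupport()
  .getOrCreate()
val df = spark.sql("select * from test.sampleTable")
df.show(truncate = false)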
Single-line vs multiline code
Scala allows a single statement to be split across multiple lines (line continuation) for better readability. As mentioned above, the spark-shell -i file.scala method and the :paste command handle such multiline code, while the spark-shell < file.scala and :load methods require single-line code. Below is sample code that illustrates the difference between the two styles.
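Both snippets below build the same SparkSession and repeat the examples used earlier in this post. The first spreads a single statement across several lines; the second keeps each statement on one line.

Multiline style:
import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("Scala Test")
  .enableHiveSupport()
  .getOrCreate()

Single-line style:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local[*]").appName("Scala Test").enableHiveSupport().getOrCreate()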
Thanks for reading. Please share your inputs in the comments section.