2022

Fill null with the next not null value – Spark Dataframe

In the previous post, we discussed how to fill a null value with the previous not-null value in a Spark DataFrame. We have also discussed how to extract the non-null values per group from a Spark DataFrame. Now, in this post, we will learn how to fill a null value with the next available not-null value […]
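
The excerpt cuts off before the approach itself, but a minimal sketch of one common way to do this is a window that looks from the current row forward and takes the first non-null value it finds. The "id" (ordering) and "value" columns below are hypothetical, not the post's own example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first}

object NextNonNullFill extends App {
  val spark = SparkSession.builder().master("local[*]").appName("next-non-null-fill").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data: "id" gives the row order, "value" contains the nulls to fill.
  val df = Seq(
    (1, Some(10)), (2, None), (3, None), (4, Some(40)), (5, None), (6, Some(60))
  ).toDF("id", "value")

  // Look forward from the current row and take the first non-null value that appears.
  val w = Window.orderBy("id").rowsBetween(Window.currentRow, Window.unboundedFollowing)
  val filled = df.withColumn("value_filled", first(col("value"), ignoreNulls = true).over(w))

  filled.show()
}
```

A window ordered without partitionBy pulls all rows into a single partition, so in practice the window is usually also partitioned by a grouping column.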

Fill null with the previous not null value – Spark Dataframe

In the previous post, we discussed how to extract the non-null values per group from a Spark DataFrame. Now, in this post, we will learn how to fill the null values with the previous not-null value in a Spark DataFrame using a forward-fill (last observation carried forward) approach. To demonstrate this with the help of an example, we will […]
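
The excerpt stops before the example, but a minimal sketch of one common way to carry the previous non-null value forward is a window that looks backwards from the current row and keeps the last non-null value it has seen. Again, the "id" and "value" columns below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, last}

object PreviousNonNullFill extends App {
  val spark = SparkSession.builder().master("local[*]").appName("previous-non-null-fill").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data: "id" gives the row order, "value" contains the nulls to fill.
  val df = Seq(
    (1, Some(10)), (2, None), (3, None), (4, Some(40)), (5, None)
  ).toDF("id", "value")

  // Look backwards from the current row and carry the last non-null value seen so far.
  val w = Window.orderBy("id").rowsBetween(Window.unboundedPreceding, Window.currentRow)
  val filled = df.withColumn("value_filled", last(col("value"), ignoreNulls = true).over(w))

  filled.show()
}
```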

ERROR Utils: Aborting task java.io.IOException: Failed to connect to – Local Spark

In this post, we will discuss the error/warning message “java.io.IOException: Failed to connect to”. This error keeps appearing when we try to execute a Hive query from spark-shell using Spark SQL. It occurs when Spark tries to execute a task in local mode (pseudo-distributed mode) and is caused by a connection exception. […]
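
The excerpt is cut off before the resolution, so the following is only a hedged sketch of a workaround that is often suggested for this kind of local-mode connection failure: pinning the driver to the loopback address via spark.driver.bindAddress / spark.driver.host. The fix described in the original post may be different:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: binding the driver to localhost is one commonly suggested workaround
// for "Failed to connect to <host>" errors in local (pseudo-distributed) mode.
// This is an assumption here, not necessarily the fix the original post describes.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("local-spark")
  .config("spark.driver.bindAddress", "127.0.0.1")
  .config("spark.driver.host", "127.0.0.1")
  .getOrCreate()
```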

Get the first non-null value per group – Spark Dataframe

Suppose we need to get the first non-null value from each partition of a DataFrame. Specifically, we want only the first non-null value from each column, regardless of the row it comes from. That means a non-null value from column A in row 5 can be stitched together with another non-null value of column B from […]
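
The rest of the example is not shown here, but one common way to get this behaviour is first(column, ignoreNulls = true) used as an aggregate per group, which picks a non-null value for each column independently. The "group", "a", and "b" columns below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, first}

object FirstNonNullPerGroup extends App {
  val spark = SparkSession.builder().master("local[*]").appName("first-non-null-per-group").getOrCreate()
  import spark.implicits._

  // Hypothetical data: columns "a" and "b" contain nulls in different rows of each group.
  val df = Seq(
    ("g1", None, Some("x")),
    ("g1", Some(10), None),
    ("g2", None, None),
    ("g2", Some(20), Some("y"))
  ).toDF("group", "a", "b")

  // first(..., ignoreNulls = true) is evaluated per column, so the result row for a group
  // can stitch together non-null values taken from different input rows.
  val result = df.groupBy("group").agg(
    first(col("a"), ignoreNulls = true).as("first_a"),
    first(col("b"), ignoreNulls = true).as("first_b")
  )

  result.show()
}
```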

Download a file from DBFS – Databricks to the local machine

In this post, we will learn how to download a file from DBFS, i.e. the Databricks File System, to the local machine. DBFS is the file system that Databricks uses to store its files. It is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. To demonstrate how […]

Scala Option, Some, None – Exception and Null handling

In the previous post, we discussed the Try, Success, Failure exception-handling method. Now, in this post, we will discuss Scala’s Option, Some, None pattern and its usage. Scala is a high-level programming language combining object-oriented and functional programming in one place. It is a very powerful language that can be […]
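
As a rough illustration of the pattern (not the post's own example), an Option return type forces the caller to handle the missing case explicitly instead of checking for null. The findUser function below is hypothetical, and the snippet can be pasted into the Scala REPL:

```scala
// Hypothetical lookup that may or may not find a value.
def findUser(id: Int): Option[String] =
  if (id == 1) Some("alice") else None

// Pattern matching on the result instead of testing for null.
findUser(1) match {
  case Some(name) => println(s"Found user: $name")
  case None       => println("No user found")
}

// Or transform the value safely with map / getOrElse.
val greeting = findUser(2).map(name => s"Hello, $name").getOrElse("Hello, stranger")
println(greeting)
```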

Scala Try, Success, Failure – Functional error handling

In this post, we will discuss Scala’s functional error-handling method using Try, Success, Failure. We know that Scala is a high-level programming language that combines both object-oriented and functional programming in one place. It runs on the JVM, so it can be mixed seamlessly with Java. Scala’s static types help to identify bugs at […]
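
As a rough illustration (not the post's own example), Try wraps a computation and returns either Success with the result or Failure with the exception, so errors can be handled as ordinary values. The parseInt helper below is hypothetical, and the snippet can be pasted into the Scala REPL:

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical helper: wrap a computation that may throw.
def parseInt(s: String): Try[Int] = Try(s.toInt)

// Handle both outcomes with pattern matching instead of try/catch.
parseInt("42") match {
  case Success(n)  => println(s"Parsed: $n")
  case Failure(ex) => println(s"Could not parse: ${ex.getMessage}")
}

// Try composes in for-comprehensions, so the first failure short-circuits the chain.
val sum: Try[Int] = for {
  a <- parseInt("10")
  b <- parseInt("oops")   // fails here, so the overall result is a Failure
} yield a + b

println(sum)   // Failure(java.lang.NumberFormatException: For input string: "oops")
```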

Using Pandas on Spark

Pandas is one of the most popular Python libraries used by data scientists and data engineers for data wrangling and data analysis. Pandas provides DataFrames (a table-like structure that stores data in rows and columns) to deal with structured datasets. These DataFrames are very similar to Spark’s DataFrames. However, Pandas DataFrames are limited to a single […]
