Show full column content in Spark

This post explains how to display the full contents of data frame columns in Apache Spark. By default, Spark’s show method truncates any column value that is longer than 20 characters. Truncated output is not always useful, for example when we need to compare values from two different columns or data frames. In such cases, we need the full content of the column rather than the truncated value. To get it, we pass an additional argument to the data frame’s show method.

Sample CSV file

Let’s use a sample CSV file with the following columns:

  1. ProductName – Name of the product
  2. Descr – Product description column (a long string-valued column)

For simplicity, the CSV file has only two columns. It also has a header row and uses double quotes (") as a text qualifier. The sample file data is shown below.

Sample CSV file with long text
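To make the file layout concrete, here is a minimal sketch of a file in this shape, parsed with Python's standard csv module rather than Spark. The row values are invented stand-ins (the original file is only shown as an image), and the point is simply that the double-quote text qualifier lets a description contain commas without being split into extra columns.

```python
import csv
import io

# Invented stand-in data in the layout described above:
# a header row, two columns, double quotes as the text qualifier.
sample = io.StringIO(
    'ProductName,Descr\n'
    '"Sample product","A long description, with a comma inside the quoted text"\n'
)

# DictReader takes the column names from the header row.
rows = list(csv.DictReader(sample))

print(rows[0]["ProductName"])
print(rows[0]["Descr"])  # the embedded comma stays inside one field
```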

Spark dataframe show method default behavior

Now we will display the data frame column values using the show method. The code below displays the top N records of a Spark data frame (here, N = 5).

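# Read the CSV file; the first row holds the column names.
df = spark.read.\
        option("delimiter", ",").\
        option("header", "true").\
        csv("hdfs:///user/admin/CSV_with_long_characters.csv")
# Display the first 5 rows; long string values are truncated by default.
df.show(5)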

Output:

Show method with truncated data

In the above output, we can see that the “Descr” column values are truncated. This is the default setting in Spark.

Show full column content in Spark using “truncate = False” option

Now we will display the full content of the column without truncating the values. To do so, we pass truncate = False to the show method of the data frame, as shown below:

# Read the same CSV file as before.
df = spark.read.\
        option("delimiter", ",").\
        option("header", "true").\
        csv("hdfs:///user/admin/CSV_with_long_characters.csv")
# truncate=False prints every column value in full.
df.show(5, truncate=False)

Output:

Show method to display full content

Now the full content of the column is displayed, and the values are no longer truncated. So, to avoid truncation, we pass truncate = False as an additional parameter to the show method.
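Besides True and False, the truncate parameter of show also accepts an integer, in which case long strings are cut to that many characters. The plain-Python sketch below approximates the shortening rule so that the three cases can be compared without a cluster. The helper truncate_cell is hypothetical and not part of the Spark API, and the ellipsis handling reflects my understanding of Spark's behavior rather than its actual source.

```python
def truncate_cell(value: str, truncate: int = 20) -> str:
    """Approximate how show() shortens one cell (hypothetical helper, not a Spark API)."""
    if truncate <= 0 or len(value) <= truncate:
        return value  # truncation disabled, or the value already fits
    if truncate < 4:
        return value[:truncate]  # too narrow to leave room for an ellipsis
    return value[: truncate - 3] + "..."  # total width equals `truncate`


# Invented sample value, longer than the default limit of 20 characters.
descr = "A long product description that exceeds the limit"

print(truncate_cell(descr))      # default limit, like df.show(5)
print(truncate_cell(descr, 30))  # wider limit, like df.show(5, truncate=30)
print(truncate_cell(descr, 0))   # no limit, like df.show(5, truncate=False)
```

An integer limit can be a useful middle ground when the full text makes the table too wide to read but 20 characters is too short to tell values apart.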

Thanks for reading. Please share your inputs in the comment section.
