In this post, we are going to learn how we can leverage Python in SQL Server to generate comma-separated values.
If we want to combine all values of a single column, it is fairly easy, as we can use the COALESCE function to do that. Here is a reference to the existing post on that approach. But what if we need a comma-separated value in one column alongside other columns? In that scenario, the COALESCE approach does not work.
We can get comma-separated values in a column along with other columns by using a FOR XML PATH query wrapped inside a subquery, but there we also need to take care of special characters such as < and >, which FOR XML PATH returns as the XML entities &lt; and &gt;.
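To make the encoding issue concrete, here is a minimal sketch in plain Python. It is not the FOR XML PATH query itself; it only mimics the entity encoding that an XML-based concatenation applies to special characters. The sample names are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical subcategory names that contain XML special characters
names = ["Bikes & Frames", "Size <M>", "Helmets"]

# Join the values, then entity-encode the result the way an XML serializer would
joined = ", ".join(names)
entitized = escape(joined)
print(entitized)  # Bikes &amp; Frames, Size &lt;M&gt;, Helmets

# Decoding the entities recovers the original text
print(unescape(entitized) == joined)  # True
```

Any approach that builds the list through XML has to undo this encoding before the values are usable, which is the extra care mentioned above.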
Now, with Python's integration with SQL Server 2017, this can be achieved easily, as we do not have to rely on subqueries and XML PATH. Let us see how we can achieve this using Python in SQL Server.
Convert rows into comma separated values in a column using Python Script
We will be using the AdventureWorks2014 database to show how we can achieve this in SQL Server using Python. Let us have a look at the data. We have the tables Production.ProductCategory and Production.ProductSubcategory in the AdventureWorks database. This is the subset of data that they contain, respectively.
Now what we need from these two tables is output in this format.
The output should have two columns: CategoryName and CommaSepSubcategory. The CommaSepSubcategory column should hold a comma-separated list of all subcategories that belong to the product category in CategoryName.
This is how we will achieve this using Python in SQL Server.
EXEC sp_execute_external_script
  @language = N'Python',
  @script = N'
df = product
#Line 1
import pandas as pd
#Line 2
dfconcat = df.groupby("category", as_index = False).agg(lambda x: ", ".join(sorted(set(x))))
#Line 3
OutputDataSet = dfconcat
'
  ,@input_data_1 = N'SELECT C.Name AS category, SC.Name AS subcategory
                     FROM Production.ProductCategory C
                     INNER JOIN Production.ProductSubcategory SC
                       ON C.ProductCategoryID = SC.ProductCategoryID'
  ,@input_data_1_name = N'product'
WITH RESULT SETS(("CategoryName" varchar(512), "CommaSepSubcategory" nvarchar(max)))
We will use the pandas module, which gives us the power of the data frame (a two-dimensional data structure, just like a table). Let us now understand the Python code used above in detail.
Python Script – #Line 1
Here, we are importing pandas module and aliasing it as pd.
Python Script – #Line 2
Once we have the data from SQL Server in a data frame, we group it by category and apply a lambda function to each group. The lambda function takes all subcategory values for a given category and joins them with a comma and a space. Before joining, we convert the values to a set to eliminate any duplicates, and then sort the set, since sets are unordered data structures.
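To see this step in isolation, here is a minimal sketch that runs outside SQL Server. The sample rows are hypothetical stand-ins for the result of the join query, and the named helper join_unique_sorted is introduced only for readability; it does the same work as the lambda in the script.

```python
import pandas as pd

# Hypothetical sample rows standing in for the join query result (note the duplicates)
product = pd.DataFrame(
    {
        "category": ["Bikes", "Bikes", "Bikes", "Clothing", "Clothing", "Clothing"],
        "subcategory": ["Road Bikes", "Mountain Bikes", "Road Bikes", "Socks", "Caps", "Socks"],
    }
)

def join_unique_sorted(values):
    # set() drops duplicates, sorted() gives a stable order, join() builds the list
    return ", ".join(sorted(set(values)))

# as_index=False keeps "category" as a regular column in the result
dfconcat = product.groupby("category", as_index=False).agg(join_unique_sorted)
print(dfconcat)
```

Each category ends up on a single row, with "Mountain Bikes, Road Bikes" for Bikes and "Caps, Socks" for Clothing. Keeping as_index=False matters here because the result needs both columns to match the two columns declared in WITH RESULT SETS.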
Python Script – #Line 3
In this line, we assign the dfconcat data frame to OutputDataSet, the data frame that is returned to SQL Server as the result set.
Quick performance summary
Here is the performance summary of the Python script that we used to generate comma-separated values. The results are pretty impressive, as you can see. However, I recommend that you run it in your own environment and then share your findings, i.e. your performance stats.
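If you want a rough starting point for your own measurement, here is a minimal timing sketch. It runs outside SQL Server on synthetic data, so it measures only the pandas aggregation and not the data transfer to and from the Python runtime. The row counts and value names are assumptions chosen for illustration.

```python
import time
import pandas as pd

# Synthetic data (an assumption for illustration): 200 categories x 500 rows each,
# with 50 distinct subcategory values per category
n_categories, rows_per_category = 200, 500
total_rows = n_categories * rows_per_category
product = pd.DataFrame(
    {
        "category": [f"cat_{i % n_categories}" for i in range(total_rows)],
        "subcategory": [f"sub_{(i // n_categories) % 50}" for i in range(total_rows)],
    }
)

# Time only the aggregation step used in the script
start = time.perf_counter()
dfconcat = product.groupby("category", as_index=False).agg(
    lambda x: ", ".join(sorted(set(x)))
)
elapsed = time.perf_counter() - start

print(f"{len(product)} rows -> {len(dfconcat)} groups in {elapsed:.4f} s")
```

The timing will vary by machine, so treat the number as relative. Running the same script at several row counts shows how the aggregation scales on your hardware.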
Thanks for reading. Please share your inputs in the comments section.