Tidy Data in Python – First Step in Data Science and Machine Learning

Most Data Science / Machine Learning projects follow the Pareto principle: we spend almost 80% of the time on data preparation and the remaining 20% on choosing and training the appropriate ML model. The datasets we get for building Machine Learning models are mostly messy and cannot be fitted into the model […]

Python use case – Import data from Excel to SQL Server table – SQL Server 2017

If we need to import data from an Excel file into SQL Server, we can use these methods: the SQL Server Import and Export Wizard; an SSIS package that reads the Excel file and loads the data into a SQL Server table; a T-SQL OPENROWSET query; the read_excel method of Python’s pandas library (only available in SQL Server 2017

Python use case – Import zipped file without unzipping it in SSIS and SQL Server – SQL Server 2017

Import zipped CSV file without unzipping it in SSIS using SQL Server 2017. SQL Server Integration Services (SSIS) is one of the most popular ETL tools. It has many built-in components that can be used to automate enterprise ETL (Extract, Transform, and Load). Also, if we need a customized component which is not
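
As a rough illustration of the idea, here is a minimal sketch of reading a CSV member straight out of a ZIP archive without extracting it to disk. It uses only the Python standard library (zipfile, csv, io) and does not show the SSIS or SQL Server side of the post. The helper name read_zipped_csv, the file name sales.csv and the sample rows are made up for this example.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_source, member_name):
    """Return the rows of one CSV member of a ZIP archive as a list of dicts."""
    with zipfile.ZipFile(zip_source) as archive:
        # archive.open() yields a binary stream, so wrap it to give csv text.
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a small archive in memory so the sketch is self-contained.
# In practice zip_source would be a path to an existing .zip file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sales.csv", "id,amount\n1,10.5\n2,20.0\n")
buffer.seek(0)

rows = read_zipped_csv(buffer, "sales.csv")
print(rows)
```

The rows come back as strings because csv.DictReader does not infer types; any conversion to numbers or dates would be a separate step.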

Import CSV file into SQL Server using T-SQL query

Sometimes we need to read an external CSV file with a T-SQL query in SQL Server. Because of functional limitations, we cannot use the Import and Export Wizard in such scenarios, as we need the result set in the middle of the execution of other queries. There, we can use the BULK INSERT

Python use case – Convert rows into comma separated values in a column – SQL Server 2017

In this post, we are going to learn how we can leverage Python in SQL Server to generate comma-separated values. If we want to combine all values of a single column, it is fairly easy, as we can use the COALESCE function to do that. Here is a reference to the already existing post. But have
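
To give a feel for the general technique, here is a small sketch in plain Python that collapses rows into one comma-separated value per group. It is not the code from the post and does not run inside SQL Server; the function name rows_to_csv_column, the column names and the sample data are all assumptions made for this example.

```python
from collections import defaultdict


def rows_to_csv_column(rows, key, value, sep=", "):
    """Group rows by `key` and join each group's `value` entries with `sep`."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(str(row[value]))
    # Groups keep the order in which their key first appeared in the input.
    return {group: sep.join(values) for group, values in grouped.items()}


# Hypothetical sample rows, one employee per row.
rows = [
    {"department": "Sales", "employee": "Asha"},
    {"department": "Sales", "employee": "Ben"},
    {"department": "IT", "employee": "Chen"},
]

combined = rows_to_csv_column(rows, "department", "employee")
print(combined)
```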

Python use case – Dynamic UNPIVOT using pandas – SQL Server 2017

In this post, we are going to learn how we can leverage the power of Python’s pandas module in SQL Server 2017. pandas is an open source Python library that provides the data frame, a data structure similar to a SQL table, with support for vectorized operations for high performance. To know more about pandas, you can click
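
A minimal sketch of what a dynamic unpivot can look like in pandas, assuming the melt method is the tool used: the columns to unpivot are worked out at run time instead of being listed by hand. The table, its column names and the output names month and sales are invented for this example, and the SQL Server integration itself is not shown.

```python
import pandas as pd

# Hypothetical wide table: one row per product, one column per month.
wide = pd.DataFrame(
    {
        "product": ["P1", "P2"],
        "Jan": [10, 30],
        "Feb": [20, 40],
    }
)

# Identifier columns are fixed; every other column is unpivoted,
# so newly added month columns are picked up without code changes.
id_cols = ["product"]
value_cols = [col for col in wide.columns if col not in id_cols]

long = wide.melt(
    id_vars=id_cols,
    value_vars=value_cols,
    var_name="month",
    value_name="sales",
)
print(long)
```

The result has one row per product and month, with the former column headers stored in the month column.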

Handling special characters in Hive (using encoding properties)

If we read a text file containing non-English characters into a Hive table without using the appropriate text encoding, these characters might be loaded as junk symbols (like boxes – �). To get these characters in their original form, we need to use the correct character encoding. In this

Skip header and footer rows in Hive

In this post, “Skip header and footer rows in Hive”, we are going to learn how we can ignore a few header and footer records in Hive without temporarily loading or reading these records into another table or view. If you want to read more about Hive, visit my post “Preserve Hive metastore in
