step by step

Access git repository using SSH key in PyCharm on Windows and Mac machine

In this post, we are going to discuss how to set up Git Bash, SSH keys, and the PyCharm IDE to access a git repository from the command line on a Windows or Mac machine. First, we will set it up on a Windows machine, followed by a Mac machine. The setup process is very […]

Read and write data to SQL Server from Spark using pyspark

Apache Spark is a very powerful general-purpose distributed computing framework. It provides several data abstractions, such as RDDs, DataFrames, and Datasets, on top of distributed collections of data. Spark is a highly scalable big data processing engine that can run on anything from a single machine to a cluster of thousands of nodes. To follow this exercise, […]

Install Spark on Windows (Local machine) with PySpark – Step by Step

Apache Spark is a general-purpose big data processing engine. It is a very powerful cluster computing framework that can run on anything from a single machine to a cluster of thousands of nodes. It can run on clusters managed by Hadoop YARN, Apache Mesos, or Spark’s own standalone cluster manager. To read more on the Spark big data processing framework, […]

Python use case – Import zipped file without unzipping it in SSIS and SQL Server – SQL Server 2017

Import zipped CSV file without unzipping it in SSIS using SQL Server 2017. SQL Server Integration Services (SSIS) is one of the most popular ETL tools. It has many built-in components that can be used to automate enterprise ETL (Extract, Transform, and Load). Also, if we need a customized component which is not […]

Handling special characters in Hive (using encoding properties)

If we read a text file containing non-English characters into a Hive table without using the appropriate text encoding, those characters might be loaded as junk symbols (like boxes – �). To get these characters in their original form, we need to use the correct character encoding. In this […]
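The effect described above can be reproduced outside Hive with a few lines of Python. This is only a rough sketch of the general encoding mismatch, not the Hive configuration the post covers; the sample string and the choice of UTF-8, Latin-1, and ASCII are assumptions made for illustration.

```python
# Sample text with non-English characters (illustrative, not from the post).
original = "café – naïve"

# The bytes as they might sit in a file written with UTF-8 encoding.
raw_bytes = original.encode("utf-8")

# Reading the same bytes with a mismatched encoding produces junk characters.
wrong = raw_bytes.decode("latin-1")

# Reading them with the encoding the file was written in recovers the text.
right = raw_bytes.decode("utf-8")

# A strict ASCII reader cannot represent these bytes at all; with
# errors="replace" each bad byte becomes U+FFFD, the "box" symbol.
boxed = raw_bytes.decode("ascii", errors="replace")

print(repr(wrong))
print(repr(right))
print(repr(boxed))
```

The bytes never change in this sketch; only the encoding used to interpret them does, which is why declaring the correct encoding on the reading side is enough to restore the text.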

Skip header and footer rows in Hive

In this post, “Skip header and footer rows in Hive”, we are going to learn how to ignore a few header and footer records in Hive without temporarily loading or reading these records into another table or view. If you want to read more about Hive, visit my post “Preserve Hive metastore in […]

Preserve Hive metastore in Azure HDInsight

In this blog, “Preserve Hive metastore in Azure HDInsight”, we are going to learn how to preserve the Hive metadata while working with Azure HDInsight services. Microsoft Azure HDInsight is an on-demand, managed, open-source big data analytics service for enterprises. We can provision clusters on demand in a few minutes, […]

Get error column name in Data Flow Task in SSIS

How to get error column name and error description in Data Flow Task in SSIS. During execution of an SSIS package, when a bad row arrives in the data flow task, the task fails. However, most of the components (source, transformation, and destination) in the data flow task expose an error output path which can […]

Full Text Search on files in SQL Server

What is Full Text Search in SQL Server? Full Text Search in SQL Server enables us to perform complex queries against character-based data. Full Text Search supports char, varchar, nchar, nvarchar, text, ntext, image, varbinary, and xml data types. We can store document files in varbinary(max) format with their extensions and enable Full-Text search […]
