2021

Sort By, Order By, Distribute By, and Cluster By in Hive

This post briefly discusses the differences and similarities between Sort By, Order By, Distribute By, and Cluster By in Hive queries. This is one of the most important questions asked in Big Data/Hadoop interviews. The Sort By, Order By, Distribute By, and Cluster By clauses are available in the Hive query language and […]


Grant UPDATE and SELECT on specific columns in a table – SQL Server

This post explains how to grant UPDATE and SELECT permissions on specific columns of a table in SQL Server without using a view. This partial vertical access control strategy lets us manage permissions directly at the table level. It is always good to set the access permissions at the […]


Get consecutive available seats in a row using SQL query

This post explains how to find consecutive available seats in a row using a SQL query, for a multiplex cinema theatre that stores its data in a SQL Server database. In other words, we need to write a query that returns n consecutive available seats for the multiplex seat booking application. However, for this […]


Create pair plots using scatter_matrix method in pandas

Exploratory data analysis is a very important step in a data science project. It helps us visualize the data and identify hidden trends that might not be visible from summary statistics alone. We can use the matplotlib and seaborn libraries to create stunning visuals in Python. However, the pandas.plotting module of the […]


Plot ECDF in Python

EDA (Exploratory Data Analysis) is the process of organizing, plotting, and summarizing data to find trends, patterns, and outliers using statistical and visual methods. We have already discussed various methods of performing EDA on an underlying dataset, along with their pros and cons. The ECDF plot is another visual method of performing […]


Interactive Data Analysis with HANA using Jupyter Notebook/Jupyter Lab

We have already discussed how to use Jupyter Lab/Jupyter Notebook for interactive data analysis with SQL Server. Jupyter Notebook is a very powerful and useful tool for any data analyst or data scientist. Jupyter Lab is the next-generation tool for Jupyter Notebooks. It provides an interface where we can […]
