Analytics/ML

Introduction to k-fold Cross-Validation in Python

This post briefs how we can use the k-fold cross-validation to evaluate a Machine Learning model performance using the Scikit-learn library in Python. We know that the performance of a Machine Learning model depends on the training dataset. Also, if the training dataset has a peculiarity, the model created with that dataset will not work […]

Introduction to k-fold Cross-Validation in Python Read More »

Create pair plots using scatter_matrix method in pandas

The exploratory data analysis is a very important step in a Data Science project. It helps us to visualize the data and identify any hidden trends that might not be visible with summary statistics alone. So, we can use matplotlib and seaborn libraries to create stunning visuals in Python. However, the pandas.plotting module of the

Create pair plots using scatter_matrix method in pandas Read More »

Plot ECDF in Python

We know that EDA (Exploratory Data Analysis), is the process of organizing, plotting, and summarizing the data to find trends, patterns, and outliers using statistical and visual methods. Here, we have already discussed various methods of performing EDA with their pros and cons on an underlying dataset. ECDF plot is another visual method of performing

Plot ECDF in Python Read More »

Interactive Data Analysis with HANA using Jupyter Notebook/Jupyter Lab

We have discussed that how we can use Jupyter Lab/Jupyter Notebook to do Interactive Data Analysis with SQL Server using Jupyter Notebooks. Jupyter Notebook is a very powerful and useful tool for any Data Analyst/Data Scientist. The Jupyter Lab is the next generation tool for the Jupyter Notebooks. It provides an interface where we can

Interactive Data Analysis with HANA using Jupyter Notebook/Jupyter Lab Read More »

An Introduction to mssql-cli – Cross platform Interactive command line tool for SQL Server

In this post, we are going to discuss a new cross-platform and interactive command-line query tool that can be used to communicate with Microsoft SQL Server. Unlike sqlcmd command-line utility, mssql-cli supports cross-platform and can be used on Windows, macOS, Linux, Ubuntu, Debian, CentOS, Fedora, and etc. This is an open-source tool and it supports

An Introduction to mssql-cli – Cross platform Interactive command line tool for SQL Server Read More »

Relationship between Binomial and Poisson distributions

In this post, we are going to discuss the Relationship between Binomial and Poisson distributions. We know that Poisson distribution is a limit of Binomial distribution for a large n (number of trials) and small p (independent probability for each trial) values. A large number of trials n with very small probability p indicates a

Relationship between Binomial and Poisson distributions Read More »

Convert Jupyter notebooks to PDF

Jupyter lab is the next-generation web-based UI experience for Jupyter notebook users. It facilitates a tab-based programming interface that is highly extensible and interactive. It supports 40+ programming languages. We have already discussed how we can use Jupyter notebooks for interactive data analysis with SQL Server. With the help of Jupyter notebooks, we can keep

Convert Jupyter notebooks to PDF Read More »

Interactive Data Analysis with SQL Server using Jupyter Notebooks

In this post “Interactive Data Analysis with SQL Server using Jupyter Notebooks“, we will demonstrate how we can use Jupyter Notebooks for interactive data analysis with SQL Server. Jupyter notebooks are one of the most useful tools for any Data Scientist/Data Analyst. It supports 40+ programming languages and facilitates web-based interactive programming IDE. We can

Interactive Data Analysis with SQL Server using Jupyter Notebooks Read More »

Python use case – Export SQL table data to excel and CSV files – SQL Server 2017

In this post, we are going to discuss how we can export SQL Server table data to an Excel file or to a CSV file using Python’s pandas library. Prior to SQL Server 2017, we could use one of the below methods to export data from SQL Server to Excel or CSV file: Create an

Python use case – Export SQL table data to excel and CSV files – SQL Server 2017 Read More »

Read and write data to SQL Server from Spark using pyspark

Apache Spark is a very powerful general-purpose distributed computing framework. It provides a different kind of data abstractions like RDDs, DataFrames, and DataSets on top of the distributed collection of the data. Spark is highly scalable Big data processing engine which can run on a single cluster to thousands of clusters. To follow this exercise,

Read and write data to SQL Server from Spark using pyspark Read More »