Analytics/ML

Install Spark on Windows (Local machine) with PySpark – Step by Step

Apache Spark is a general-purpose big data processing engine. It is a powerful cluster computing framework that can run on anything from a single machine to a cluster of thousands of nodes. It can run on clusters managed by Hadoop YARN, Apache Mesos, or Spark’s own standalone cluster manager. To read more on Spark Big data processing framework, […]

Change Jupyter Notebook startup folder on Windows and Mac OS

Once we have installed Jupyter Notebook, we can start it by running the “jupyter notebook” command at the command prompt on a Windows machine or in the terminal on a Mac. Jupyter Notebook is a very useful web-based application that can be used to write programs in many programming languages, such as Python, R, Scala,

Python use case – Save each worksheet as a separate excel workbook

In this post, “Python use case – Save each worksheet as a separate excel workbook”, we are going to learn how to create a separate workbook for each worksheet of a given Excel file. We will copy the data, values, formatting, and all other settings of each sheet into the newly created workbook.
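
As a rough illustration of the idea, here is a minimal sketch that assumes the openpyxl library is installed; the post itself may use a different approach. The function name split_workbook and its parameters are hypothetical. Instead of copying cells one by one, it reloads the source file once per worksheet and removes every other sheet, so the formatting and settings that openpyxl can read are carried over to the new workbook.

```python
from pathlib import Path

from openpyxl import load_workbook


def split_workbook(source, out_dir):
    """Save each worksheet of `source` as its own .xlsx file in `out_dir`.

    Hypothetical helper for illustration; assumes sheet names are valid
    file names on the current operating system.
    """
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Read the sheet names once so we know which workbooks to create.
    sheet_names = load_workbook(source).sheetnames

    written = []
    for keep in sheet_names:
        # Reload the full workbook and drop every sheet except `keep`.
        wb = load_workbook(source)
        for name in wb.sheetnames:
            if name != keep:
                wb.remove(wb[name])
        # Point the active sheet at the one remaining worksheet.
        wb.active = wb[keep]

        target = out_dir / f"{keep}.xlsx"
        wb.save(target)
        written.append(target)
    return written
```

Calling split_workbook("report.xlsx", "output") with a hypothetical report.xlsx would write one file per sheet into the output folder. Features that openpyxl does not load, such as some charts and images, may not survive this round trip.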

Building Decision Tree model in python from scratch – Step by step

In the previous post, we created our first Machine Learning model using Logistic Regression to solve a classification problem. We used the “Wisconsin Breast Cancer dataset” for demonstration purposes. Now, in this post, “Building Decision Tree model in python from scratch – Step by step”, we will be using the Iris dataset, which is a standard dataset that

Building first Machine Learning model using Logistic Regression in Python – Step by Step

This post outlines how to create our first machine learning predictive model using Logistic Regression in Python. When we start working on a Machine Learning project, we first perform some data wrangling and transformation to get a tidy dataset. Then, we perform some EDA to find trends, patterns, and outliers in the given dataset. Once we have machine-interpretable data

Exploratory Data Analysis (EDA) using Python – Second step in Data Science and Machine Learning

In the previous post, “Tidy Data in Python – First Step in Data Science and Machine Learning”, we discussed the importance of tidy data and its principles. In a Machine Learning project, once we have a tidy dataset in place, it is always recommended to perform EDA (Exploratory Data Analysis) on the underlying data

Python use case – Resampling time series data (Upsampling and downsampling) – SQL Server 2017

Resampling time series data in SQL Server using Python’s pandas library. In this post, we are going to learn how we can use the power of Python in SQL Server 2017 to resample time series data using Python’s pandas library. Sometimes, we get sample data (observations) at a different frequency (higher or lower) than
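
To make the two directions concrete, here is a small sketch of the pandas side only, run as plain Python outside SQL Server. The sample readings and variable names are made up for illustration. Downsampling aggregates 15-minute observations into hourly means, and upsampling places the same observations on a 5-minute grid and fills the gaps by linear interpolation.

```python
import pandas as pd

# Hypothetical sample data: one reading every 15 minutes for two hours.
index = pd.date_range("2018-01-01 00:00", periods=8, freq="15min")
readings = pd.Series(
    [10.0, 12.0, 11.0, 13.0, 20.0, 22.0, 21.0, 23.0],
    index=index,
    name="value",
)

# Downsampling: lower frequency, so several observations are aggregated
# into each hourly bin (here, by taking the mean).
hourly = readings.resample("h").mean()

# Upsampling: higher frequency, so new timestamps appear between the
# original ones and their values are filled by linear interpolation.
five_minute = readings.resample("5min").interpolate(method="linear")

print(hourly)
print(five_minute.head())
```

The mean and linear interpolation are only one possible choice; sum, last, or forward fill may suit other datasets better.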

What is Machine learning and why is it gaining so much popularity?

Nowadays, everyone seems to be talking about machine learning and its applications, but have we ever wondered how ML became so popular all of a sudden? If I tell you that work on AI started way back in 1950 and Machine learning started to grow rapidly in 1990, what has suddenly

Tidy Data in Python – First Step in Data Science and Machine Learning

Most Data Science / Machine Learning projects follow the Pareto principle: we spend almost 80% of the time on data preparation and the remaining 20% on choosing and training the appropriate ML model. Most of the datasets we get for creating Machine Learning models are messy and cannot be fitted into the model

Python use case – Import data from excel to sql server table – SQL Server 2017

If we need to import data from an Excel file into SQL Server, we can use these methods: the SQL Server Import and Export Wizard; an SSIS package that reads the Excel file and loads the data into a SQL Server table; a T-SQL OPENROWSET query; or the read_excel method of Python’s pandas library (Only available in SQL Server 2017
