In this post “Interactive Data Analysis with SQL Server using Jupyter Notebooks“, we will demonstrate how we can use Jupyter Notebooks for interactive data analysis with SQL Server. Jupyter notebooks are one of the most useful tools for any Data Scientist/Data Analyst. It supports 40+ programming languages and facilitates web-based interactive programming IDE. We can put comments, headings, codes, and output in one single document. This document maintains the context to the original data source which means we can re-execute the code whenever we need it. This feature facilitates Data scientists/Data analysts to play with the code during the presentations. Also, these notebooks are very handy in sharing and can be shared easily across the teams.
What is Jupyter Lab
Jupyter Lab is the next-generation web-based tool for Jupyter notebooks. It enables tab based programming model which is highly extensible. We can arrange multiple windows side by side. The Jupyter Lab brings all the features of the Jupyter notebook along with some additional features. In this tutorial, we will be using the SQL magic commands in Jupyter Lab to interact with SQL Server directly from Jupyter notebooks. This can help you to execute SQL queries/procedures directly from the notebook.
Prerequisites for the setup
We need to install below packages to enable interactive data analysis with a relational database such as SQL Server using Jupyter lab:
Python – We need to download and install Python (consider the latest version of Python 3) from Python’s official website. Once installed, we can connect Python to SQL Server using pyodbc also. However, in this post, we will be using the ipython’s magic sql commands from Jupyter notebooks.
jupyterlab module – This module is required to setup Jupyter notebooks on our machine. This will automatically install the notebooks and other required dependencies. We can execute the below pip command to install jupyterlab from the command prompt.
pip install jupyterlab
pyodbc module – This module is required to create a connection from Python to SQL Server. To install this package, we can use below pip command at the command prompt:
pip install pyodbc
ipython-sql module – This module is required to enable SQL magic commands from Jupyter notebooks. We can use %sql followed by the actual query. We can use this pip command at the command prompt to install it:
pip install ipython-sql
Note: In case you want to set up the startup folder for Jupyter notebook on Windows or Mac machine, you can follow this link.
Querying SQL Server from Jupyter Notebook
To start the Jupyter lab in the web browser, we can simply open a command prompt window and type “Jupyter lab” and hit the Enter key. This will open the Jupyter lab web interface in the default browser.
Let’s use the Jupyter lab to query SQL Server tables directly from Jupyter notebook. To do that we need to follow these steps:
- Load the ipython’s sql module
- Create a connection
- Write queries prefixed with magic command %sql and execute it
Load the ipython’s sql module
To load the ipython’s sql module which enables the magic command for SQL Server, we need to use below code:
%load_ext sql
Create a connection
Now, we need to set the connection string so that the notebook magic commands can execute the SQL queries as like any other SQL Server IDEs. Below is the syntax for the connection string:
Connection string syntax for SQL Server authentication:
mssql+pyodbc://user:password@server:port/DatabaseName?driver=DriverName
Considering that we are using SQL Server 2017 with ODBC driver 2017, we can use a connection string as below:
%sql mssql+pyodbc://admin:test_12345@localhost:1433/TestDB?driver=ODBC+Driver+17+for+SQL+Server
Connection string syntax for Windows authentication:
mssql+pyodbc://user@server:port/DatabaseName?trusted_connection=yes&driver=DriverName
We need to set trusted_connection = yes if we are using windows authentication rather than the SQL Server authentication. Assuming that we are using SQL Server 2017 with ODBC driver 2017, we can use a connection string as below:
%sql mssql+pyodbc://admin@localhost:1433/TestDB?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server
When we execute the above command, it should return a message like “‘Connected: admin@TestDB‘”. Once, we get that message, we can execute the SQL queries from the Jupyter notebook directly.
Write queries prefixed with magic command %sql and execute it:
Now that we have loaded the SQL module and set the connection string, next, we can write our SQL queries and execute it like below:
%sql SELECT * FROM SYS.OBJECTS
The above command will return the list of all the objects from sys.objects system view.
Writing multiline SQL queries:
To write multiline SQL queries as we do with SQL Server Management Studio IDE, we can simply prefix our query with %%sql. Below is an example:
%%sql DROP TABLE IF EXISTS [dbo].[tbl_Sales]; CREATE TABLE [dbo].[tbl_Sales] ( [Seq] [bigint] NULL, [EmpName] [nvarchar](256) NULL, [Product] [nvarchar](256) NULL, [Amount] DECIMAL(18,2) NULL ); INSERT INTO [dbo].[tbl_Sales] ([Seq], [EmpName], [Product], [Amount]) VALUES (1, N'John', N'Cleaners', 62704.1), (2, N'John', N'Cleaners', 212.9), (3, N'Adam', N'Chains', 21814.2), (4, N'Adam', N'Chains', 33044.65), (5, N'Michal', N'Caps', 27582.42), (6, N'Michal', N'Caps', 40892.41), (7, N'Smith', N'Brakes', 44787.54), (8, N'Smith', N'Brakes', 12692.64), (9, N'David', N'Bottom Brackets', 49318.12), (10, N'David', N'Bottom Brackets', 74022.22); SELECT * FROM [dbo].[tbl_Sales];
Below is the output of the above SQL queries:
The %%sql magic command allows us to write the SQL query in multiple lines. This is very handy if we are working with a complex data analysis task.
Thanks for the reading. Please share your inputs in the comment section.