Handling exceptions: Rollback pandas dataframe’s to_sql operation

Pandas is one of the most popular Python libraries for data manipulation and analysis, and it provides very convenient methods for working with tabular data. One of the Pandas DataFrame’s essential functions is its to_sql method, which allows seamless integration with various databases. However, to truly harness the power of this operation, it is crucial to handle exceptions effectively. When working with databases, exception handling is a fundamental necessity: unexpected errors can compromise data integrity and lead to costly business consequences, so every data professional needs to handle them in an organized way. In this blog post, we delve into the details of handling exceptions and implementing a rollback strategy for the Pandas DataFrame’s to_sql operation.

Insert data into an SQL table with an explicit transaction using the to_sql() method of a DataFrame

Safeguard data integrity using explicit transactions with pandas dataframe’s to_sql operation

The to_sql() method writes a DataFrame’s data to a relational database such as MySQL, PostgreSQL, SQLite, Microsoft SQL Server, and others. It provides multiple options to control the data transfer from a pandas DataFrame to an SQL table. Sometimes, however, we need to manage the data transfer explicitly. For example, suppose we want to insert data from multiple DataFrames as one transaction: either all rows from every DataFrame are inserted, or no rows from any DataFrame are inserted. To achieve this, we need explicit control over the database transaction while performing the insert with the to_sql() method. To demonstrate the to_sql rollback operation, let’s create a dummy table named user_table with the following schema.

CREATE TABLE user_table (
	name varchar(255) DEFAULT NULL,
	email varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Sample table schema

Now that we have created the sample table, we can use the Python code below to insert data into it.

import pandas as pd
from sqlalchemy import create_engine

if __name__ == "__main__":
    # Create a sample dataframe
    df = pd.DataFrame({
        'Name': ['Adam', 'David'],
        'Email': ['adam@abc.com', 'david@abc.com']
    })

    df_err = pd.DataFrame({
        'Name': ['Adam', 'David'],
        'Email': ['adam@abc.com', 'david@abc.com' + "a" * 500 ] # Added 500 chars in email to force exception
    })

    # Start a Try/Except block to handle connection errors
    try:
        # Connect to the MySQL database using create_engine method
        engine = create_engine('mysql+pymysql://admin:password@localhost/testdb')

        with engine.connect() as connection:
            try:
                table_name = 'user_table'

                # Start an explicit control over transaction
                tran = connection.begin()
                # Insert the DataFrame data into the MySQL table using explicit transaction in append mode
                df.to_sql(name=table_name, con=connection, if_exists='append', index=False)
                df_err.to_sql(name=table_name, con=connection, if_exists='append', index=False) # This dataframe insert will raise an exception

                # Commit the transaction explicitly if above insert statements work fine
                tran.commit()
                print("Transaction committed successfully.")
            except Exception as e:
                # Roll back the transaction if any error occurs
                tran.rollback()
                print("Transaction rolled back due to an error:", e)

    except Exception as e:
        print("Error connecting to the database:", e)

How the explicit transaction management works

The above code creates two sample pandas DataFrames and then inserts their data into an SQL table. The data from the first DataFrame is inserted successfully, but the insert for the second DataFrame fails because the email value in its second record exceeds the column’s length limit.

Sample dataframes with data

Let’s discuss the above code in detail.

Step 1 – Import Required Libraries:

import pandas as pd: Imports the Pandas library for data manipulation and analysis.
from sqlalchemy import create_engine: Imports the create_engine function from the SQLAlchemy library for creating a database connection.

Step 2 – DataFrame Creation:

Create two Pandas DataFrames (df and df_err) with some sample data for demonstration purposes. Each DataFrame contains two columns: ‘Name’ and ‘Email’. df_err is intentionally designed to cause an error due to its excessive email length.

Step 3 – Database Connection and Transaction Handling:

The outer try block starts to handle potential connection errors.
engine = create_engine('mysql+pymysql://admin:password@localhost/testdb'): Creates a database connection using SQLAlchemy. Replace 'admin', 'password', and 'localhost/testdb' with your actual database credentials and connection details (see the sketch after this list if your password contains special characters such as '@').
with engine.connect() as connection:: Establishes a connection to the database within a context manager. The connection is automatically closed when leaving the context, without extra lines of code.
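A side note on credentials: if the password itself contains characters such as '@' or ':', a hand-written URL string is parsed incorrectly. Below is a minimal sketch, assuming SQLAlchemy 1.4 or later and placeholder credentials, that builds the URL safely with sqlalchemy.engine.URL.create.

from sqlalchemy import create_engine
from sqlalchemy.engine import URL

# Build the connection URL from individual parts so that special
# characters in the password (e.g. '@') are escaped automatically.
# All credentials below are placeholders; replace them with your own.
db_url = URL.create(
    drivername="mysql+pymysql",
    username="admin",
    password="password@12345",   # '@' would break a hand-written URL string
    host="localhost",
    database="testdb",
)
engine = create_engine(db_url)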

Step 4 – Transaction and DataFrame Insertion:

table_name = 'user_table' defines the name of the target table in the database.
tran = connection.begin(): This is an important line. It starts explicit control over the transaction. Any operations performed within this transaction can be committed or rolled back as a single unit of work.
df.to_sql(name=table_name, con=connection, if_exists='append', index=False): Inserts the data from the DataFrame df into the specified table in append mode.
df_err.to_sql(name=table_name, con=connection, if_exists='append', index=False): Attempts to insert the data from the DataFrame df_err into the same table. This operation is designed to raise an exception because of the excessively long email value. The same pattern extends to any number of DataFrames, as sketched below.
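As mentioned earlier, the same transaction can cover data from several DataFrames. Here is a minimal sketch, reusing connection, tran, df, and df_err from the example above; the list of DataFrames is only illustrative.

# Sketch: insert several DataFrames as a single unit of work.
# Assumes `connection` is an open SQLAlchemy connection and an explicit
# transaction has already been started with `tran = connection.begin()`.
dataframes = [df, df_err]          # any number of DataFrames to load together

try:
    for frame in dataframes:
        frame.to_sql(name='user_table', con=connection,
                     if_exists='append', index=False)
    tran.commit()                  # all frames inserted -> make changes permanent
except Exception as e:
    tran.rollback()                # any failure -> none of the frames are kept
    print("Transaction rolled back:", e)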

Step 5 – Transaction Handling and Exception Management:

Please note that, inside the try block, if both DataFrame insertions complete without errors, the tran.commit() statement explicitly commits the transaction, making the changes permanent. However, if an error occurs during any insertion, the except block is executed.
In the except block, the tran.rollback() statement undoes any changes made within the transaction.
The specific error message is printed to the console using the print statement.
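As a design note, if you do not need to call commit() and rollback() yourself, SQLAlchemy’s engine.begin() context manager can manage the transaction boundaries for you: it commits when the block exits normally and rolls back automatically when an exception escapes the block. A minimal alternative sketch using the same DataFrames and engine:

# Alternative sketch: let SQLAlchemy manage the transaction boundaries.
# engine.begin() commits when the block exits normally and rolls back
# automatically if an exception escapes the block.
try:
    with engine.begin() as connection:
        df.to_sql(name='user_table', con=connection, if_exists='append', index=False)
        df_err.to_sql(name='user_table', con=connection, if_exists='append', index=False)
    print("Transaction committed successfully.")
except Exception as e:
    # The rollback has already happened by the time we get here.
    print("Transaction rolled back due to an error:", e)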

Step 6 – Exception Handling for Connection:

The final except block is utilized to manage scenarios where an issue arises during the establishment of the database connection, such as incorrect credentials. In this case, the except block takes care of the connection error and displays a suitable error message.
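To confirm that the rollback really prevented a partial load, you can query the table after the script runs. A quick check, assuming the engine from the example above and that user_table was empty before the run:

import pandas as pd

# If user_table was empty beforehand, it should still be empty, because
# the rows from the first DataFrame were rolled back together with the
# failed insert of the second DataFrame.
count = pd.read_sql("SELECT COUNT(*) AS row_count FROM user_table", engine)
print(count)   # expected: row_count == 0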

Conclusion:

In summary, this code demonstrates how to handle exceptions and implement a rollback strategy when inserting data from Pandas DataFrames into a MySQL database using the to_sql method. It safeguards data integrity by ensuring that if any error occurs during the data insertion process, the changes are rolled back, preventing partial or incorrect data insertion into the database.

Thanks for reading. Please share your inputs in the comment section.
