Create a pandas DataFrame from a MongoDB collection

In this post, we will learn how to create a pandas DataFrame from a MongoDB collection. MongoDB is a popular NoSQL database that stores data in a JSON-like format and offers a flexible, scalable way to manage large volumes of data. Data stored in MongoDB often needs to be analyzed and manipulated, and Python's pandas library is widely used for exactly that.

Before we dive into the process, make sure you have the following prerequisites in place:

  1. Python installed on your machine.
  2. The pymongo library, which allows us to interact with MongoDB using Python.
  3. The pandas library, which provides data manipulation and analysis tools.

We will use the pymongo library to connect to MongoDB and access its collections from Python. This demo uses Python 3. To install the pymongo and pandas modules, run the following pip commands:

pip install pymongo
pip install pandas

The above commands install the pymongo and pandas libraries.

Create pandas DataFrame from a MongoDB Collection

Once the pymongo and pandas libraries are installed, we can use the code below to create a pandas DataFrame from a MongoDB collection.

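import pymongo
import pandas as pd

# Connect to the MongoDB instance running on the local machine
client = pymongo.MongoClient('mongodb://localhost:27017/')
# Select the database and the collection to read from
db = client['myMongoDb']
collection = db['myMongoCollection']

# find() with no arguments returns every document; list() loads them into memory
data = list(collection.find())

# Each document becomes a row and each field becomes a column
df = pd.DataFrame(data)
print(df.head())

# Close the connection once the data has been loaded
client.close()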

In the above code, we connected to a MongoDB instance running on the local machine. If your MongoDB instance is hosted remotely, replace 'mongodb://localhost:27017/' with the appropriate connection string. Once the connection is established, select the database and collection from which you want to retrieve data. In this example, we connect to the database named 'myMongoDb' and fetch the collection named 'myMongoCollection'.
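For a remotely hosted instance, one option is to keep the connection string out of the source code. The sketch below is only an illustration of that idea: the environment variable name MONGODB_URI and the helper names resolve_mongo_uri and fetch_documents are assumptions made for this example, not part of pymongo. It also uses MongoClient as a context manager, so the connection is closed automatically.

```python
import os

# Same local address as in the example above
DEFAULT_URI = "mongodb://localhost:27017/"


def resolve_mongo_uri(env_var="MONGODB_URI", default=DEFAULT_URI):
    """Return the connection string stored in env_var, or the local default.

    MONGODB_URI is a hypothetical variable name chosen for this sketch.
    """
    return os.environ.get(env_var) or default


def fetch_documents(db_name, collection_name, uri=None):
    """Return all documents of a collection as a list of dicts."""
    # Imported here so a missing driver only matters when a connection is made
    import pymongo

    # The with-block closes the client when it exits
    with pymongo.MongoClient(uri or resolve_mongo_uri()) as client:
        # Materialize the cursor while the connection is still open
        return list(client[db_name][collection_name].find())
```

The list returned by fetch_documents('myMongoDb', 'myMongoCollection') can be passed to pd.DataFrame in the same way as the 'data' object in the code above.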

If needed, we can also pass a query to find() to filter the data on specific criteria while fetching it from MongoDB. Called with no arguments, find() returns every document in the collection. Once the documents are stored in the 'data' object, we pass it to pd.DataFrame to create a DataFrame from the collection.
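As a rough sketch of such a filtered fetch, the helper below passes a query and an optional projection to find() before building the DataFrame. The helper name collection_to_frame and the field names used in the usage comment ('status', 'name', 'age') are hypothetical and only serve as an illustration; replace them with fields that exist in your own documents.

```python
import pandas as pd


def collection_to_frame(collection, query=None, projection=None):
    """Build a DataFrame from the documents that match query.

    collection : a pymongo collection (any object with a compatible find())
    query      : MongoDB filter document; None or {} matches every document
    projection : fields to include or exclude, e.g. {"_id": 0} drops the _id
    """
    cursor = collection.find(query or {}, projection)
    # list() pulls all matching documents into memory before conversion
    return pd.DataFrame(list(cursor))


# Example usage (hypothetical field names):
# df = collection_to_frame(
#     collection,
#     query={"status": "active", "age": {"$gte": 18}},
#     projection={"_id": 0, "name": 1, "age": 1},
# )
```

Filtering and projecting on the MongoDB side means only the needed documents and fields are transferred, which keeps the resulting DataFrame smaller than loading the whole collection and filtering in pandas.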

In this blog post, we explored how to create a pandas DataFrame from a MongoDB collection, which lets us combine MongoDB's flexible data storage and retrieval with the data manipulation and analysis tools offered by pandas. We covered establishing a connection to MongoDB, selecting the database and collection, retrieving data using pymongo, and converting the retrieved data into a pandas DataFrame.

Thanks for reading. Please share your feedback in the comment section.
