Writing pandas data frames to an Azure SQL database either requires you to use a cursor or the SQL Alchemy library(particularly, we use the to_sql function). However, these methods are extremely slow. We have written about to_sql this in this article: Spark Dataframe performance benefits.
Not everyone has access to a Spark Cluster, though. So isn’t there a way to write pandas data frames efficiently to a SQL db? Seems like there is! It’s by using a cursor and fast_executemany .
To fully appreciate fast_executemany, let’s visit the code without the same. In this example, we will use the California housing dataset of scikit-learn. We will load it as it is into an Azure SQL DB from a python pandas dataframe.
0. Create a destination table in Azure SQL DB
Before writing code to load data into Azure SQL Database. Following is the structure of the table:
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_california_housing]( [MedInc] [nvarchar](50) NULL, [HouseAge] [nvarchar](50) NULL, [AveRooms] [nvarchar](50) NULL, [AveBedrms] [nvarchar](50) NULL, [Population] [nvarchar](50) NULL, [AveOccup] [nvarchar](50) NULL, [Latitude] [nvarchar](50) NULL, [Longitude] [nvarchar](50) NULL, [target] [nvarchar](50) NULL ) ON [PRIMARY] GO
1. Load Data in a Pandas Dataframe
Load the California Housing Dataset into a Pandas Dataframe:
import pandas as pd from sklearn.datasets import fetch_california_housing california_housing = fetch_california_housing() pd_df_california_housing = pd.DataFrame(california_housing.data, columns = california_housing.feature_names) pd_df_california_housing['target'] = pd.Series(california_housing.target)
2. Create an ODBC connection using pyodbc
Programming languages form connections to SQL Databases using ODBC connection/standard. In python, this is implemented by pyodbc library. Here is the code to create an ODBC connection:
import pyodbc conn = pyodbc.connect(driver='{ODBC Driver 17 for SQL Server}', server='<mydatabaseserver.database.windows.net>', database='<mydatabase>', uid='<my database user>', pwd='<my database password>')
3. Load data into Azure SQL Database without fast execution
Next, here is the code to write the data to Azure SQL Database without fast execution.
import time insert_to_tmp_tbl_stmt = f"INSERT INTO tbl_california_housing \ #Form Insert Statement ([MedInc],[HouseAge],[AveRooms],[AveBedrms],[Population],[AveOccup],[Latitude],[Longitude],[target] ) \ values(?,?,?,?,?,?,?,?,?)" cursor = conn.cursor() #create a cursor start = time.time() #Note start time cursor.executemany(insert_to_tmp_tbl_stmt, pd_df_california_housing.values.tolist()) #load data into azure sql db end = time.time() #Note end time print(f'{len(pd_df_california_housing)} rows inserted in pd_df_california_housing table') print(f'{(end - start)/60} minutes elapsed') cursor.commit() #Close the cursor and connection cursor.close() conn.close()
The following message was displayed:
20640 rows inserted in pd_df_california_housing table 8.287326637903849 minutes elapsed
4. Load data into Azure SQL Database ‘with’ fast execution
Now, the fundamental difference in this snippet from the previous section is 1 line of code i.e fast_executemany :
import time insert_to_tmp_tbl_stmt = f"INSERT INTO tbl_california_housing \ #Form Insert Statement ([MedInc],[HouseAge],[AveRooms],[AveBedrms],[Population],[AveOccup],[Latitude],[Longitude],[target] ) \ values(?,?,?,?,?,?,?,?,?)" cursor = conn.cursor() #create a cursor cursor.fast_executemany = True #Set fast_executemany = True start = time.time() #Note start time cursor.executemany(insert_to_tmp_tbl_stmt, pd_df_california_housing.values.tolist()) #load data into azure sql db end = time.time() #Note end time print(f'{len(pd_df_california_housing)} rows inserted in pd_df_california_housing table') print(f'{(end - start)/60} minutes elapsed') cursor.commit() #Close the cursor and connection cursor.close() conn.close()
The following message was displayed:
20640 rows inserted in pd_df_california_housing table in 0.06369909445444742 minutes elapsed
Conclusion
This is not limited to Azure SQL DB. Hence, we encourage you to try other databases and techniques etc. Please note that this is for information. Finally, we don’t claim guarantees regarding its accuracy or completeness and encourage the readers to use this at their own discretion.