Running SQL queries in Azure Data Factory

4006

SQL is the backbone information science/technology. From a transactional database to data warehouse systems to modern big data analytics, none can escape SQL. Hence, even modern big data ecosystems provide SQL equivalents like Hive QL or Spark SQL. Thus , every ETL/ELT tool is bound to have provisions to run SQL statement.

In the SSIS, the Microsoft BI ETL offering, we had a task called as the execute SQL task. For more details on SQL options in SSIS, read this article: Introduction to SSIS and making it metadata independent

However, with the advancement in cloud technologies, the focus is shifting to Azure Data Factory from the traditional SQL server integration services (SSIS). Hence, we have to find an equivalent to SSIS Execute SQL task in ADF. Fortunately, there are two options for the same viz. ‘Execute SQL procedure’ and the ‘lookup activity’

Stored Procedure activity in Azure Data Factory

Stored procedures are an integral part of any ETL process. We need to run them for a variety of reasons; one of them being transforming and loading the data from the staging table to the dimension/fact table. To elaborate, in modern data warehousing systems, we extract data from the source, load it to staging. Furthermore, we perform operations like merge to the main table in order to maintain the slowly changing dimensions. These merge statements are executed as part of a Stored Procedure.

In Azure Data Factory, we extract data from source systems and load it into staging tables using Copy Data activity. Furthermore, we run a stored procedure to perform SCD operations using a Stored Procedure activity.

Here is the sample implementation of Stored procedure activity which is pretty simple!

Lookup activity in Azure Data Factory

This is similar to the Lookup activity of SSIS. A typical scenario in which this can be used is when you would like to have a result set to be passed on to the next activity in the Azure Data Factory. For instance, we may want to read a list of tables on which an operation needs to be performed. This list of tables lies in a metadata table. Using lookup activity, we can run a query on the metadata table and return the resultset to a for each activity. On a side note, to know more about for each activity, read this: Move Files with Azure Data Factory- End to End

Nonetheless, here is a sample implementation of a Lookup activity in the Azure Data Factory. Please note that Lookup activity needs a dataset to be created from a table or a view. To read more about lookup activity, refer to this link.

Conclusion

Hope that this article helps. This is only for information purposes and we claim no guarantees whatsoever.



I am a Data Scientist with 6+ years of experience.


Leave a Reply