Category: Big Data
Why Process management is the need of the day
Prasad Kulkarni, Jun 18, 2020
Every problem brings an opportunity with it, and the COVID-19 pandemic is no exception. However, as far as organizations are concerned, it is a...
Azure Data Lake Gen2 and Azure Databricks
Prasad Kulkarni, Jun 13, 2020
Before Azure Data Lake Gen2 and Azure Databricks, in our previous articles, we elaborated on two aspects of Azure Data Lake Gen2 migration, i.e. governance and...
Managing Azure Data Lake Gen2 with Powershell
Prasad Kulkarni, May 28, 2020
In the fast-moving world of data and technology in general, addressing tech debt is an integral part of running any organization. It is important not only to stay ahead in the...
Cumulative Distribution in Azure Databricks using Spark SQL
Prasad Kulkarni, May 24, 2020
We can solve every problem in multiple ways. In our previous article, we motivated the need to fit cumulative distributions. Moreover, we demonstrated the same in Azure...
Cumulative Distribution in Azure Databricks
Prasad Kulkarni, May 03, 2020
Imagine that you receive a requirement to calculate aggregations, such as the average, over a range of percentiles and quartiles for a given dataset. There are two ways to...
Azure Data Lake Gen2 Managed Identity using Access Control Lists
Prasad Kulkarni, Apr 19, 2020
Firstly, we urge you to read this article of ours: Managed Identity between Azure Data Factory and Azure storage. In that article, we have extensively elaborated on...
Challenges in Modern Data Processing
Prasad Kulkarni, Apr 06, 2020
Having spent 6 years in the space of Data Analytics, I have come across a few challenges that might hamper an organization’s efforts to mature as a Data-Driven...
Databricks Koalas: bridge between pandas and spark
Prasad Kulkarni, Mar 22, 2020
Imagine that you are an ML engineer. You have a massive task of operationalizing a model trained and tested by your Data Scientists. It is working perfectly well for the...
How to read mismatched schema in apache spark
Prasad Kulkarni, Dec 20, 2019
In today’s world, Python dominates the data analysis space, while Apache Spark rules the big data paradigm. The former contains a plethora of libraries like pandas...
A first look at Azure Synapse Analytics
Prasad Kulkarni, Dec 10, 2019
Strongly recommended reading: Azure Synapse Analytics: Azure SQL Data Warehouse revamped. In the aforementioned article, we gave an introduction to Azure Synapse...