In this five-part series I will be talking about the Synapse Minimal Test Framework: the why, the how, and the whatnot:
- Part 1: Let’s get testing – Introduction, considerations, and approach
- Part 2: Simple pipeline test – Testing a simple unzip pipeline using Storage
- Part 3: Advanced pipeline test – Testing a complex pipeline using SQL
- Part 4: CI with Github – Automate testing using Github Actions
- Part 5: CI with Azure DevOps – Automate testing using Azure DevOps Pipelines
Want to dive straight into the code?
Introduction
Coming from a development background, I've noticed how immature the data world can be when it comes to GitOps. Take Azure's data offerings like ADF or Synapse: there are no built-in testing capabilities, and even though tools like ADF have been around for years, not much can be found in the community. That's a shame, because the APIs and tools are available; all the building blocks seem to be in place.
So let’s take things to the next level and build a testing framework for Azure Synapse.
Considerations
A good testing framework should meet quality and practice requirements like:
- Keep it simple – I've seen too many frameworks become bloated and hard to maintain. It should be easy to include the framework in your own solution and to extend or tailor it to your needs.
- Easy to operate – It should be easy for data engineers to write, manage, and run tests. It also helps to have a DevOps team in which more code-focused engineers can get everyone else started with tests in code.
- CI – Running tests automatically, and being able to set them up as a quality gate, is necessary for the success of GitOps (shift-left).
Approach
For data engineers a test framework in Python would make sense, because that's most likely the language they prefer. However, I chose .NET with MSTest. .NET allows us to use Managed Identities out of the box, which is not only a very secure option but also makes it very easy to get started. It requires some understanding of C#, but the easy integration we get in return outweighs that. It also integrates easily with Azure DevOps and GitHub.
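To give an idea of what that looks like in practice, here is a minimal sketch of an MSTest class that authenticates with DefaultAzureCredential. The workspace endpoint and the test itself are placeholders for illustration, not part of the actual framework:

```csharp
using System;
using System.Linq;
using Azure.Analytics.Synapse.Artifacts;
using Azure.Identity;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class UnzipPipelineTests
{
    // DefaultAzureCredential picks up the logged-in developer locally
    // and a Managed Identity / service principal when running in CI.
    private static readonly DefaultAzureCredential Credential = new();

    // Placeholder workspace endpoint; replace with your own workspace.
    private static readonly Uri Endpoint =
        new("https://my-synapse-workspace.dev.azuresynapse.net");

    [TestMethod]
    public void Workspace_Is_Reachable()
    {
        var client = new PipelineClient(Endpoint, Credential);

        // Enumerating the pipelines forces an authenticated call to the workspace,
        // which makes this a cheap smoke test for credentials and permissions.
        var pipelines = client.GetPipelinesByWorkspace().ToList();

        Assert.IsNotNull(pipelines);
    }
}
```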
For this framework we need to start and manage Synapse Pipelines, so we will be using the Synapse client library for .NET. To prepare the tests and read results we also need to connect to Azure Blob Storage, since we're doing file-based processing; for this we can use the Storage client libraries. And we will also be connecting to the Serverless SQL (built-in) pool, which we can do using regular SQL client libraries.
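As a rough sketch of how those client libraries fit together, the helper below triggers a pipeline run, polls for its result, and checks for an output blob. The class name, workspace endpoint, and storage account are hypothetical and only meant to show the calls involved:

```csharp
using System;
using System.Threading.Tasks;
using Azure.Analytics.Synapse.Artifacts;
using Azure.Identity;
using Azure.Storage.Blobs;

// Sketch of the building blocks: trigger a pipeline, poll until it finishes,
// then inspect the output in Blob Storage. All names are placeholders.
public static class SynapseTestHelper
{
    private static readonly DefaultAzureCredential Credential = new();
    private static readonly Uri WorkspaceEndpoint =
        new("https://my-synapse-workspace.dev.azuresynapse.net");

    public static async Task<string> RunPipelineAsync(string pipelineName)
    {
        var pipelineClient = new PipelineClient(WorkspaceEndpoint, Credential);
        var runClient = new PipelineRunClient(WorkspaceEndpoint, Credential);

        // Start the pipeline and capture the run id.
        var run = await pipelineClient.CreatePipelineRunAsync(pipelineName);
        string runId = run.Value.RunId;

        // Poll until the run reaches a terminal state.
        string status;
        do
        {
            await Task.Delay(TimeSpan.FromSeconds(10));
            status = (await runClient.GetPipelineRunAsync(runId)).Value.Status;
        }
        while (status is "InProgress" or "Queued");

        return status; // e.g. "Succeeded" or "Failed"
    }

    public static async Task<bool> OutputBlobExistsAsync(string container, string blobPath)
    {
        var blobService = new BlobServiceClient(
            new Uri("https://mystorageaccount.blob.core.windows.net"), Credential);
        var blob = blobService.GetBlobContainerClient(container).GetBlobClient(blobPath);
        return (await blob.ExistsAsync()).Value;
    }
}
```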
For authentication we're using the Azure Identity client library, which allows us to authenticate as the currently logged-in user or, in the case of CI, as the identity of the service connection. So make sure the Azure AD user or service principal has the corresponding permissions.
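To illustrate how the same credential chain carries over to the SQL side, the snippet below queries the Serverless SQL pool with Microsoft.Data.SqlClient using the `Authentication=Active Directory Default` keyword. The server, database, and table names are placeholders:

```csharp
using Microsoft.Data.SqlClient;

// "Active Directory Default" makes the SQL client use the same credential chain
// as Azure.Identity: the logged-in user locally, a service principal or
// Managed Identity in CI. Server, database, and table names are placeholders.
var connectionString =
    "Server=my-synapse-workspace-ondemand.sql.azuresynapse.net;" +
    "Database=mydatabase;" +
    "Authentication=Active Directory Default;";

using var connection = new SqlConnection(connectionString);
connection.Open();

// A simple query against the Serverless SQL pool to verify pipeline output.
using var command = new SqlCommand("SELECT COUNT(*) FROM dbo.MyExternalTable", connection);
var rowCount = (int)command.ExecuteScalar();
Console.WriteLine($"Rows found: {rowCount}");
```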
Next
In part 2 we will demonstrate how a simple pipeline test works using storage and the Synapse connector.