Azure Machine Learning - the AI suite


The hallmark of any good tool or platform is its flexibility. With AI models all around and an explosion of tools to build AI workflows, things can get overwhelming. It is therefore convenient to have all the tools we need in one place.

Moreover, AI application development follows two methodologies, viz. pre-built AI and custom AI. With the explosion of Generative AI, the former has gathered a lot of steam, while the latter still accounts for the majority of AI/ML workloads. Hence, a platform that offers pre-built models along with all the requisite infrastructure to build custom models is most desirable. This requirement has given rise to platforms like AWS SageMaker, Vertex AI, Azure Machine Learning, etc.

Among these, Microsoft Azure Machine Learning is a key player, especially after the mainstreaming of Azure OpenAI. Earlier, Azure Machine Learning was meant only for custom AI, while pre-built AI was primarily offered by Azure AI services. Although Azure AI is still mainstream, most of its models are now integrated into Azure Machine Learning, along with a host of other open-source and proprietary models. We have explored quite a few aspects of Azure Machine Learning on this platform earlier, but this article will look at it in a new light.

Azure Machine Learning now has two major sections: Model as a Service, i.e. the Model Catalog, and custom Authoring:

Pre-built AI: Model as a Service – Model Catalog

Let’s first explore pre-built AI using Model as a Service, i.e. the Model Catalog in Azure Machine Learning. A simple glance at the pane will give you a glimpse of the expanse of model categories available to a user.
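
Besides browsing the pane, the catalog can also be inspected programmatically. Here is a small sketch using SDK v2 (which we also use later in this article); it assumes the "azureml" registry, and other registries such as the Hugging Face one have their own names:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to a model registry instead of a workspace
registry_client = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Print the names of a few models available in the registry
for i, model in enumerate(registry_client.models.list()):
    print(model.name)
    if i > 10:
        break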

If your subscription has sufficient authorization, you can use any of the models. As an example, we will deploy a T5 model and show how to use it. Search for t5-base and click on Deploy.

Once the deployment is completed, go to the endpoint and then to the Consume section.

Take the code, copy the key, and paste it into the placeholder:

import urllib.request
import json
import os
import ssl

def allowSelfSignedHttps(allowed):
    # bypass the server certificate verification on the client side
    if allowed and not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None):
        ssl._create_default_https_context = ssl._create_unverified_context

allowSelfSignedHttps(True)  # needed if you use a self-signed certificate in your scoring service

data = {
  "input_data": [
    "Hello. I am Prasad. I live in Bengaluru."
  ]
}

body = str.encode(json.dumps(data))

url = 'https://t5-epnt.eastus2.inference.ml.azure.com/score'

# Replace this with the primary/secondary key, AMLToken, or Microsoft Entra ID token for the endpoint
api_key = '<Api-Key>'

if not api_key:
    raise Exception("A key should be provided to invoke the endpoint")

headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}
req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)
    result = response.read()
    print(result)

except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", 'ignore'))

Running this script will give you this result: b'["Hallo, ich bin Prasad und lebe in Bengaluru."]' (notice that t5-base has translated the English input into German).

In a similar fashion, any of the models in the catalog, including OpenAI models, can be deployed with a few clicks and consumed with little effort.
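
If you prefer code over clicks, catalog models can also be deployed programmatically. The following is a rough sketch rather than the exact portal-generated flow; the registry name and model path here are assumptions, so copy the exact asset reference from the model's catalog entry:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-workspace>",
)

# Create a managed online endpoint for the catalog model
endpoint = ManagedOnlineEndpoint(name="t5-epnt", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy the model directly from its registry path
# (assumed path for t5-base; check the catalog entry for the correct one)
deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name="t5-epnt",
    model="azureml://registries/HuggingFace/models/t5-base/labels/latest",
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()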

Custom AI: Authoring

As mentioned earlier, the majority of AI/ML workloads are still driven by custom model training and deployment. Azure Machine Learning gives you the flexibility to train and deploy machine learning models end to end. In our article on Machine Learning System Design, we touched upon the two major workflows of machine learning systems, viz. training and serving. These are further divided into six workflows.

The first three comprise the training end of ML systems:

  • Data Management
  • Orchestrated Experimentation
  • Reliable and Repeatable Training

The next three pertain to deployment, serving, and monitoring:

  • Continuous Deployment
  • Reliable and Scalable Serving
  • Continuous Monitoring

Azure Machine Learning can build each of these workflows. In this article, we will cover a simple example using Azure ML SDK v2. Earlier, we used Azure ML SDK v1 to demonstrate the training pipeline, data drift monitoring, experimentation, etc. All of this can be managed via the Authoring pane in Azure Machine Learning.

Using these authoring tools, we will train and deploy a model in quickfire fashion using SDK v2.

Demo Time

We start the demo by creating a compute instance. A compute instance is the dev environment, which comes with tools like Jupyter notebooks. The instance can also be connected to and run from Visual Studio Code.
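
Although we create the instance through the Studio UI here, the same step can be scripted with SDK v2. Here is a minimal sketch, assuming a hypothetical instance name and VM size, and using the ml_client object we create in the next section:

from azure.ai.ml.entities import ComputeInstance

# Hypothetical name and VM size; pick ones available in your region and quota
ci = ComputeInstance(name="aml-dev-instance", size="STANDARD_DS11_V2")
ml_client.compute.begin_create_or_update(ci).result()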

In this demo, we use the VS Code (Desktop) option. Clicking on it will open a VS Code window on your machine. This is what it looks like:

Create a new notebook and run the following cells:

Create Credentials and Compute Cluster

First, let’s create a credential object:

# Handle to the workspace
from azure.ai.ml import MLClient

# Authentication package
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()

Next, create an MLClient object:

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="<your-subscription-id>",
    resource_group_name="ai-experiments",
    workspace_name="aml-ai-project",
)

Now, in order to run the training script, you need a compute cluster. You can run the training on the compute instance; however, for faster training, a compute cluster is recommended:

from azure.ai.ml.entities import AmlCompute

# Name assigned to the compute cluster
cpu_compute_target = "aml-compute-cluster"

try:
    # let's see if the compute target already exists
    cpu_cluster = ml_client.compute.get(cpu_compute_target)
    print(
        f"You already have a cluster named {cpu_compute_target}, we'll reuse it as is."
    )

except Exception:
    print("Creating a new cpu compute target...")

    # Let's create the Azure Machine Learning compute object with the intended parameters
    cpu_cluster = AmlCompute(
        name=cpu_compute_target,
        # Azure Machine Learning Compute is the on-demand VM service
        type="amlcompute",
        # VM family
        size="STANDARD_DS11_V2",
        # Minimum running nodes when there is no job running
        min_instances=0,
        # Maximum nodes in the cluster
        max_instances=1,
        # How many seconds the node will stay up after the job terminates
        idle_time_before_scale_down=180,
        # Dedicated or LowPriority. The latter is cheaper but there is a chance of job termination
        tier="Dedicated",
    )

    print(
        f"AMLCompute with name {cpu_cluster.name} will be created, with compute size {cpu_cluster.size}"
    )

    # Now, we pass the object to MLClient's create_or_update method; .result() waits for provisioning
    cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster).result()

Go to the portal and check the compute cluster:

Create an Environment

Now, create an environment that packages all the dependencies required to run the training and deployment. First, create a directory for the dependency file:

import os

dependencies_dir = "./dependencies"
os.makedirs(dependencies_dir, exist_ok=True)

Write the file containing the dependencies:

%%writefile {dependencies_dir}/conda.yaml
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=0.24.2
  - scipy=1.7.1
  - pandas>=1.1,<1.2
  - pip:
    - inference-schema[numpy-support]==1.3.0
    - xlrd==2.0.1
    - mlflow==2.6.0
    - azureml-mlflow==1.42.0
    - psutil>=5.8,<5.9
    - tqdm>=4.59,<4.60
    - ipykernel~=6.0
    - matplotlib

Next, create an environment:

from azure.ai.ml.entities import Environment

custom_env_name = "credit-card-scikit-38"

custom_job_env = Environment(
    name=custom_env_name,
    description="Custom environment for Credit Card Defaults job",
    tags={"scikit-learn": "0.24.2"},
    conda_file=os.path.join(dependencies_dir, "conda.yaml"),
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
)

custom_job_env = ml_client.environments.create_or_update(custom_job_env)
print(
    f"Environment with name {custom_job_env.name} is registered to workspace, the environment version is {custom_job_env.version}"
)

Check the portal under the Environments section.
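
You can also verify the registration from the SDK itself; here is a small sketch using the objects created above:

# Retrieve the registered environment by name and version
env = ml_client.environments.get(name=custom_env_name, version=custom_job_env.version)
print(env.name, env.version)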

Create a Training Job

We now create a training job. First, create a directory for it:

import os

train_src_dir = "./src"
os.makedirs(train_src_dir, exist_ok=True)

Write the training script:

%%writefile {train_src_dir}/main.py

import os
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def main():
    """Main function of the script."""

    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--test_train_ratio", type=float, required=False, default=0.25)
    parser.add_argument("--n_estimators", required=False, default=100, type=int)
    parser.add_argument("--learning_rate", required=False, default=0.1, type=float)
    parser.add_argument("--registered_model_name", type=str, help="model name")

    args = parser.parse_args()   

    # start Logging
    mlflow.start_run()

    # enable autologging
    mlflow.sklearn.autolog()

    ###################
    #<prepare the data>
    ###################

    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))
    print("input data:", args.data)   

    credit_df = pd.read_csv(args.data, header=1, index_col=0)
    mlflow.log_metric("num_samples", credit_df.shape[0])
    mlflow.log_metric("num_features", credit_df.shape[1] - 1)

    train_df, test_df = train_test_split(
        credit_df,
        test_size=args.test_train_ratio,
    )

    ###################
    #</prepare the data>
    ###################

    ##################
    #<train the model>
    ##################

    # extracting the label column
    y_train = train_df.pop("default payment next month")

    # convert the dataframe values to array
    X_train = train_df.values

    # extracting the label column
    y_test = test_df.pop("default payment next month")

    # convert the dataframe values to array
    X_test = test_df.values

    print(f"Training with data of shape {X_train.shape}")

    clf = GradientBoostingClassifier(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )

    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(classification_report(y_test, y_pred))

    ##################
    #</train the model>
    ##################

    ##########################
    #<save and register model>
    ##########################

    # registering the model to the workspace
    print("Registering the model via MLFlow")

    mlflow.sklearn.log_model(
        sk_model=clf,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=clf,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )
 

    # stop Logging
    mlflow.end_run()

if __name__ == "__main__":
    main()

Create the training job.

from azure.ai.ml import command
from azure.ai.ml import Input

registered_model_name = "credit_defaults_model"

job = command(
    inputs=dict(
        data=Input(
            type="uri_file",
            #path="https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default%20of%20credit%20card%20clients.csv",
            path="azureml:credit-card-default:1",
        ),

        test_train_ratio=0.2,
        learning_rate=0.25,
        registered_model_name=registered_model_name,
    ),

    code="./src/",  # location of source code
    command="python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    experiment_name="train_model_credit_default_prediction_cluster",
    compute=cpu_compute_target,
    display_name="credit_default_prediction",
)
ml_client.create_or_update(job)

Go to the portal and check under Jobs, and take a look at the metrics.
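
If you prefer to follow the run from the notebook instead of the portal, you can capture the job object returned by create_or_update and stream its logs; a minimal sketch:

# Capture the returned job and stream its logs until completion
returned_job = ml_client.create_or_update(job)
ml_client.jobs.stream(returned_job.name)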

Here is the registered model.

Create an Online Endpoint and Deployment

Now that the model is registered, we create an online endpoint and the deployment. Managed online endpoints are a special feature of Azure Machine Learning, where the endpoint's infrastructure is managed by Azure. For more details, here is our earlier article: Managed Online Endpoints in Azure Machine Learning.

Here is the online endpoint creation script:

import uuid

# Creating a unique name for the endpoint
online_endpoint_name = "credit-endpoint-" + str(uuid.uuid4())[:8]

from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
)

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is an online endpoint",
    auth_mode="key",
    tags={
        "training_dataset": "credit_defaults",
        "model_type": "sklearn.GradientBoostingClassifier",
    },
)

endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()
print(f"Endpoint {endpoint.name} provisioning state: {endpoint.provisioning_state}")

endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
print(
   f'Endpoint "{endpoint.name}" with provisioning state "{endpoint.provisioning_state}" is retrieved'
)
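
As an aside, since we chose key-based auth, the API key (like the one pasted into the consume script earlier) can be fetched programmatically; a small sketch:

# Fetch the authentication keys for the endpoint (key auth mode)
keys = ml_client.online_endpoints.get_keys(name=online_endpoint_name)
print(keys.primary_key)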

Next, let’s create a deployment. For that, first retrieve the registered model:

# Let's pick the latest version of the model
registered_model_name = "credit_defaults_model"
latest_model_version = max(
    [int(m.version) for m in ml_client.models.list(name=registered_model_name)]
)

Then, create the deployment:

# picking the model to deploy. Here we use the latest version of our registered model
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)

# create an online deployment.
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_E2s_v3",
    instance_count=1,
)
blue_deployment = ml_client.begin_create_or_update(blue_deployment).result()

Notice that the earlier T5 endpoint is also a managed online endpoint.
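
A freshly created endpoint has no traffic allocated to any deployment. We target the "blue" deployment by name when invoking below, so this is not strictly required, but if you want the endpoint to route traffic to it by default, you can allocate traffic like this:

# Route 100% of the endpoint's traffic to the "blue" deployment
endpoint.traffic = {"blue": 100}
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()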

Test the Endpoint

Once the deployment is complete, test the endpoint. Create a folder for the request JSON:

deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)

Create the request JSON:

%%writefile {deploy_dir}/sample-request.json

{
  "input_data": {
    "columns": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
    "index": [0, 1],
    "data": [
            [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
            [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]
        ]
  }
}

Lastly, test the endpoint:

# test the blue deployment with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="./deploy/sample-request.json",
    deployment_name="blue",
)
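
The invoke call returns the raw response as a string. If you capture the return value, you can parse it into Python objects; a small sketch (the exact payload shape depends on the scoring script):

import json

# Capture and parse the raw string response from the endpoint
response = ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="./deploy/sample-request.json",
    deployment_name="blue",
)
predictions = json.loads(response)
print(predictions)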

Finally, delete the endpoint to save on cost:

ml_client.online_endpoints.begin_delete(name=online_endpoint_name)

Conclusion – Is Azure ML the complete AI suite?

We have seen that Azure Machine Learning has both capabilities, i.e. pre-built AI and custom AI, making it a complete AI suite. We hope this article is useful. However, we do not guarantee completeness or accuracy.
