Package and deploy models faster with new tools and guided workflows in Amazon SageMaker | Amazon Web Services

ByContributor November 29, 2023

I’m happy to share that Amazon SageMaker now comes with an improved model deployment experience to help you deploy traditional machine learning (ML) models and foundation models (FMs) faster.

As a data scientist or ML practitioner, you can now use the new ModelBuilder class in the SageMaker Python SDK to package models, perform local inference to validate runtime errors, and deploy to SageMaker from your local IDE or SageMaker Studio notebooks.

In SageMaker Studio, new interactive model deployment workflows give you step-by-step guidance on which instance type to choose to find the most optimal endpoint configuration. SageMaker Studio also provides additional interfaces to add models, test inference, and enable auto scaling policies on the deployed endpoints.

New tools in SageMaker Python SDK
The SageMaker Python SDK has been updated with new tools, including ModelBuilder and SchemaBuilder classes that unify the experience of converting models into SageMaker deployable models across ML frameworks and model servers. Model builder automates the model deployment by selecting a compatible SageMaker container and capturing dependencies from your development environment. Schema builder helps to manage serialization and deserialization tasks of model inputs and outputs. You can use the tools to deploy the model in your local development environment to experiment with it, fix any runtime errors, and when ready, transition from local testing to deploy the model on SageMaker with a single line of code.

Amazon SageMaker ModelBuilder

Let me show you how this works. In the following example, I choose the Falcon-7B model from the Hugging Face model hub. I first deploy the model locally, run a sample inference, perform local benchmarking to find the optimal configuration, and finally deploy the model with the suggested configuration to SageMaker.

First, import the updated SageMaker Python SDK and define a sample model input and output that matches the prompt format for the selected model.

import sagemaker
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve import Mode

prompt = "Falcons are"
response = "Falcons are small to medium-sized birds of prey related to hawks and eagles."

sample_input = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 32}
}

sample_output = [{"generated_text": response}]

Then, create a ModelBuilder instance with the Hugging Face model ID, a SchemaBuilder instance with the sample model input and output, define a local model path, and set the mode to LOCAL_CONTAINER to deploy the model locally. The schema builder generates the required functions for serializing and deserializing the model inputs and outputs.

model_builder = ModelBuilder(
    model="tiiuae/falcon-7b",
    schema_builder=SchemaBuilder(sample_input, sample_output),
    model_path="/path/to/falcon-7b",
    mode=Mode.LOCAL_CONTAINER,
	env_vars={"HF_TRUST_REMOTE_CODE": "True"}
)

Next, call build() to convert the PyTorch model into a SageMaker deployable model. The build function generates the required artifacts for the model server, including the inferency.py and serving.properties files.

local_mode_model = model_builder.build()

For FMs, such as Falcon, you can optionally run tune() in local container mode that performs local benchmarking to find the optimal model serving configuration. This includes the tensor parallel degree that specifies the number of GPUs to use if your environment has multiple GPUs available. Once ready, call deploy() to deploy the model in your local development environment.

tuned_model = local_mode_model.tune()
tuned_model.deploy()

Let’s test the model.

updated_sample_input = model_builder.schema_builder.sample_input
print(updated_sample_input)

{'inputs': 'Falcons are',
 'parameters': {'max_new_tokens': 32}}
 
local_tuned_predictor.predict(updated_sample_input)[0]["generated_text"]

In my demo, the model returns the following response:

a type of bird that are known for their sharp talons and powerful beaks. They are also known for their ability to fly at high speeds […]

When you’re ready to deploy the model on SageMaker, call deploy() again, set the mode to SAGEMAKLER_ENDPOINT, and provide an AWS Identity and Access Management (IAM) role with appropriate permissions.

sm_predictor = tuned_model.deploy(
    mode=Mode.SAGEMAKER_ENDPOINT, 
	role="arn:aws:iam::012345678910:role/role_name"
)

This starts deploying your model on a SageMaker endpoint. Once the endpoint is ready, you can run predictions.

new_input = {'inputs': 'Eagles are','parameters': {'max_new_tokens': 32}}
sm_predictor.predict(new_input)[0]["generated_text"])

New SageMaker Studio model deployment experience
You can start the new interactive model deployment workflows by selecting one or more models to deploy from the models landing page or SageMaker JumpStart model details page or by creating a new endpoint from the endpoints details page.

Amazon SageMaker - New Model Deployment Experience

The new workflows help you quickly deploy the selected model(s) with minimal inputs. If you used SageMaker Inference Recommender to benchmark your model, the dropdown will show instance recommendations from that benchmarking.

Model deployment experience in SageMaker Studio

Without benchmarking your model, the dropdown will display prospective instances that SageMaker predicts could be a good fit based on its own heuristics. For some of the most popular SageMaker JumpStart models, you’ll see an AWS pretested optimal instance type. For other models, you’ll see generally recommended instance types. For example, if I select the Falcon 40B Instruct model in SageMaker JumpStart, I can see the recommended instance types.

Model deployment experience in SageMaker Studio

However, if I want to optimize the deployment for cost or performance to meet my specific use cases, I could open the Alternate configurations panel to view more options based on data from before benchmarking.

Model deployment experience in SageMaker Studio

Once deployed, you can test inference or manage auto scaling policies.

Model deployment experience in SageMaker Studio

Things to know
Here are a couple of important things to know:

Supported ML models and frameworks – At launch, the new SageMaker Python SDK tools support model deployment for XGBoost and PyTorch models. You can deploy FMs by specifying the Hugging Face model ID or SageMaker JumpStart model ID using the SageMaker LMI container or Hugging Face TGI-based container. You can also bring your own container (BYOC) or deploy models using the Triton model server in ONNX format.

Now available
The new set of tools is available today in all AWS Regions where Amazon SageMaker real-time inference is available. There is no cost to use the new set of tools; you pay only for any underlying SageMaker resources that get created.

Learn more

Get started
Explore the new SageMaker model deployment experience in the AWS Management Console today!

— Antje

Maximizing Your Forms with RSForm! Pro – The Ultimate Guide

Joomla plugins are vital tools that enhance the functionality of a website, with RSForm! Pro distinguished as a robust form-building solution. This overview aims to outline the key features, benefits, and straightforward installation process of RSForm! Pro. It will explore various customization options that enable users to tailor forms to their specific requirements, review common… […]

Boost Your Website Speed with Our Performance Optimization Plugin

In the rapidly evolving digital landscape, the performance of a Joomla website significantly influences user engagement. Performance optimization plugins serve as essential tools aimed at enhancing the speed and efficiency of a website by combining, minifying, and compressing assets such as CSS, JavaScript, and images. This article examines the advantages these plugins provide, ranging from… […]

How to Upgrade from Ubuntu 22.04 LTS to Ubuntu 24.04 LTS

The stable version of Ubuntu 24.04 LTS (code-named Noble Numbat) is released on April 25th 2024, if you are curious to know what is in it, you can now upgrade to the version of it… The post How to Upgrade from Ubuntu 22.04 LTS to Ubuntu 24.04 LTS appeared first on FAST DOMAINS.

16 Best Linux Distributions for Older Computers

Do you have an old laptop that has gathered layers of dust over time and you don’t exactly what to do with it? A good place to start would be to install a Linux distribution… The post 16 Best Linux Distributions for Older Computers appeared first on FAST DOMAINS.

Package and deploy models faster with new tools and guided workflows in Amazon SageMaker | Amazon Web Services

AWS Weekly Roundup – AWS Storage Day, AWS Israel (Tel Aviv) Region, and More – Aug 8, 2023 | Amazon Web Services

On setenv() and Explosions

Wayland by default in Fedora 25?

Episode 9: Taking a Deep Dive into WordPress for Small and Medium Business

The Symbiotic Orchestra of Linux and Blockchain

Linux Shell Scripting: A Pathway to Automated System Excellence

© Copyright

VMware ESXi Power Optimization Overview

WiredGorilla

Similar Posts

© Copyright