AWS yesterday unveiled a host of enhancements for Amazon SageMaker, its end-to-end machine learning offering. Among the most important features are a collection of new governance tools aimed at keeping ML projects on track, but there are many other new features designed to make getting AI applications into production easier.
As machine learning and AI become more widespread, companies are realizing they need better tools and processes to manage their new predictive capabilities and to prevent poor outcomes stemming from bias, ethical lapses, and privacy breaches.
AWS addressed some of these concerns with three new SageMaker tools: Role Manager, Model Cards, and Model Dashboard, which the cloud giant unveiled yesterday at its re:Invent conference in Las Vegas, Nevada.
Amazon SageMaker Role Manager is intended to provide more granular control over who has access to SageMaker resources, including machine learning models as well as the data used to train them. According to Amazon SageMaker Managing Director Ankur Mehrotra, Role Manager gives administrators the ability to onboard new users into SageMaker with just the right level of access.
“They want to make sure users have access to the tools they need, but they don’t want permissions to be overly permissive,” Mehrotra says. “They also want to reduce their exposure.”
Guided prompts and predefined policies can help administrators quickly set up new users in SageMaker with the right level of access, including the ability to access encrypted data and any network restrictions that may be required.
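To make the idea concrete, here is a minimal sketch of the kind of least-privilege IAM policy document a persona-based setup like Role Manager might generate behind the scenes. This is illustrative only, not Role Manager's actual output: the statement names, the `project` tag, and the chosen actions are all hypothetical.

```python
import json

def data_scientist_policy(project_tag: str) -> dict:
    """Illustrative least-privilege policy for a 'data scientist' persona.

    Resource scoping via tags and the encryption guard are hypothetical
    examples of the kinds of controls Role Manager's guided setup covers.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow training jobs, but only on resources tagged for this project
                "Sid": "AllowTrainingOnProjectResources",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreateTrainingJob",
                    "sagemaker:DescribeTrainingJob",
                ],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"aws:ResourceTag/project": project_tag}
                },
            },
            {
                # Refuse S3 writes that omit server-side encryption
                "Sid": "DenyUnencryptedS3Writes",
                "Effect": "Deny",
                "Action": "s3:PutObject",
                "Resource": "*",
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            },
        ],
    }

policy = data_scientist_policy("churn-model")
print(json.dumps(policy, indent=2))
```

The point of a persona-driven tool is that administrators pick a role and a few options rather than hand-writing documents like this for every user.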
Just a few years ago, SageMaker was used primarily by data scientists. But as ML and AI spread, more and more stakeholders are being brought into the mix, which complicates governance, Mehrotra says. “Visibility and control over how these models are validated and how these tools are governed are becoming increasingly difficult,” he says.
As more ML and AI applications enter production, tracking them also becomes harder. To that end, Amazon SageMaker Model Cards are designed to help data scientists and other stakeholders keep track of how a model was trained, how it behaves, when problems arose, and what changes were made in response.
“As part of training, there are all sorts of things, in terms of hyperparameters and so on, that need to be recorded,” says Mehrotra. “And recording these things is important because sometimes they may be needed for approvals. Suppose you have built a POC and you want to approve it for use in production. The right stakeholders might want to see this information.”
Today, much of this information about ML model behavior is tracked ad hoc in email and spreadsheets. The new Model Cards offering is designed to provide a “single source of truth” for ML model information. Data scientists can enter their observations into the Model Card templates, which can also automatically populate some fields, Mehrotra says.
“These [Model Cards] can be viewed at any time and can be used to refer to the history of models and the decisions they make,” he says.
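Model Cards can also be created programmatically through the SageMaker `CreateModelCard` API. The sketch below assembles a card request locally; the content field names reflect our reading of the published model-card JSON schema and should be checked against the current documentation, and the card name and descriptions are hypothetical.

```python
import json
# import boto3  # uncomment to actually submit the card to AWS

# Hedged sketch: a minimal Model Card body. Section and field names here
# (model_overview, intended_uses) are our best understanding of the schema;
# verify against the current SageMaker model-card documentation.
card_content = {
    "model_overview": {
        "model_description": "Gradient-boosted churn classifier (POC).",
        "algorithm_type": "XGBoost",
    },
    "intended_uses": {
        "purpose_of_model": "Flag accounts at risk of churn for outreach.",
    },
}

request = {
    "ModelCardName": "churn-model-card",  # hypothetical name
    "ModelCardStatus": "Draft",           # lifecycle: Draft -> PendingReview -> Approved
    "Content": json.dumps(card_content),  # content is passed as a JSON string
}
# boto3.client("sagemaker").create_model_card(**request)
print(request["ModelCardName"])
```

The status field is what supports the approval workflow Mehrotra describes: a card drafted during a POC can later be moved through review to approval for production use.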
Monitoring multiple ML models in production is the goal of Amazon SageMaker Model Dashboards, the third new governance tool launched this week. The company already offers model monitoring capability with SageMaker Clarify and SageMaker Model Monitor.
If users aren’t using either of those two tools (which AWS recommends as a best practice, Mehrotra says), Model Dashboard can still provide them with performance data. Model Dashboard also provides model lineage and performance history, which can be useful for tracking models over the long term.
AWS has tens of thousands of customers using SageMaker, which makes more than 1 trillion predictions per month, Mehrotra says. As companies scale their use of SageMaker and AI from the proof-of-concept (POC) stage to full production mode, they run into thorny issues of bias, fairness, and ethics.
“A lot of these problems are really difficult, and we will continue to invest in making sure our customers can implement ML safely and responsibly,” Mehrotra says.
The governance tools weren’t the only news: AWS unveiled a host of other SageMaker enhancements at re:Invent.
It launched next-generation SageMaker notebooks, augmenting its Jupyter-based notebook environment with built-in data preparation tools to improve data quality. Multiple users can also work in the same notebook, eliminating the need to manually share code and enhancing collaboration.
AWS also offers SageMaker users an “easy button” for deployment. Instead of dealing with dependencies, users can press a single button and their SageMaker model will automatically be deployed to an EC2 instance of their choice. Behind the scenes, SageMaker bundles the model into a Docker container, with all dependencies automatically taken care of.
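Under the hood, a SageMaker deployment ultimately comes down to three API calls: `CreateModel`, `CreateEndpointConfig`, and `CreateEndpoint`. The sketch below shows the endpoint-config payload that names the instance type; the model name and instance choice are hypothetical, and the live calls are left commented out.

```python
# Hedged sketch of what the one-click deploy automates. All names below
# are hypothetical placeholders.
MODEL_NAME = "churn-model"

ENDPOINT_CONFIG = {
    "EndpointConfigName": f"{MODEL_NAME}-config",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",          # single variant takes all traffic
            "ModelName": MODEL_NAME,
            "InstanceType": "ml.m5.xlarge",       # the instance type the user chooses
            "InitialInstanceCount": 1,
        }
    ],
}

# import boto3
# sm = boto3.client("sagemaker")
# sm.create_model(ModelName=MODEL_NAME, PrimaryContainer={...}, ExecutionRoleArn="...")
# sm.create_endpoint_config(**ENDPOINT_CONFIG)
# sm.create_endpoint(EndpointName=MODEL_NAME,
#                    EndpointConfigName=ENDPOINT_CONFIG["EndpointConfigName"])
print(ENDPOINT_CONFIG["EndpointConfigName"])
```

The “easy button” value is that the container packaging referenced in `PrimaryContainer`, including dependency resolution, is handled automatically rather than by the user.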
“To move from the world of notebooks to jobs running in full-scale production today requires multiple steps…and it can be a laborious process,” says Mehrotra. “So we’re launching a new feature that, with just a few clicks, lets you automatically convert a notebook into a job that can be run in full-scale production.”
A new shadow testing feature allows users to see how changes to a model will behave against production traffic, without actually serving responses from the new model to end users. “Shadow testing helps you build confidence in your model and catch potential misconfigurations and performance issues before they affect end users,” AWS’s Antje Barth writes in a blog post.
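Shadow tests are driven by the new `CreateInferenceExperiment` API. The payload below sketches how a candidate variant might mirror a fraction of live traffic; the field names reflect our reading of that API and should be verified against the current SageMaker API reference, and every resource name here is hypothetical.

```python
# Hedged sketch of a shadow-test request body. Field names are assumptions
# based on the CreateInferenceExperiment API; all names are placeholders.
shadow_experiment = {
    "Name": "churn-model-shadow-test",
    "Type": "ShadowMode",                       # shadow rather than A/B traffic split
    "EndpointName": "churn-endpoint",
    "ShadowModeConfig": {
        "SourceModelVariantName": "production",  # variant that keeps serving callers
        "ShadowModelVariants": [
            {
                "ShadowModelVariantName": "candidate",
                # Mirror half of live requests to the candidate; its responses
                # are logged for comparison, never returned to end users.
                "SamplingPercentage": 50,
            }
        ],
    },
}
# boto3.client("sagemaker").create_inference_experiment(**shadow_experiment, ...)
print(shadow_experiment["Name"])
```

The key design point is the sampling percentage: the shadow variant sees real traffic at a controlled rate, so misconfigurations surface before any cutover.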
AWS launched SageMaker Data Wrangler two years ago to help users clean and prepare data for machine learning. However, users found that the same data preparation steps need to be applied again at inference time to get consistent predictions. To address this, AWS announced this week that Data Wrangler data preparation flows can now be deployed to a real-time inference endpoint, so incoming requests pass through the same transformations the training data did. It works in both batch and real-time modes, AWS’s Donnie Prakoso writes in a blog post.
Finally, AWS is also introducing support for geospatial data in SageMaker. AWS provides pre-trained deep neural network (DNN) models and geospatial operators that make it easy to access and prepare large geospatial datasets, AWS’s Channy Yun writes in a blog post.