This content originally appeared on DEV Community and was authored by Salah Elhossiny
4 weeks ago, I was invited from Packt publishing team to review on the awesome books about AWS Sagemaker best practices as I am AWS ML hero.
Book chapters:
1. Sagemaker Overview
This chapter provides an overview of whole ML pipeline: preparing, building, training, tuning, deployment and monitoring. It also provides a data preparation capabilities, such as:
It also provides a feature tour about model building capabilities, such as:
It also provides an overview for training and tuning
capabilities, such as:
- SageMaker training jobs
- Autopilot
- Hyperparameter Optimization (HPO)
- SageMaker Debugger
- SageMaker Experiments
ML phases on Sagemaker:
For data preparation
For operations
- For model training
2. Data Science Environments
This chapter provides an overview of how to create managed data science environments to scale and create repeatable environments for your model-building activities using IaC or CaC .
It also provides an overview for Cloudformation importance and capabilities:
- Consistency
- Improved management
Reducing manual approvals
Reducing hand-offs in siloed teams
Providing centralized governance
By ensuring environments are provisioned across teams using only approved configurations.
AWS Service Catalog and its importance in such approach.
You will get a brief overview of the machine learning (ML) use case. ML use case mentioned is prediction of a value for a particular type of air quality measurement (for example, pm25) given location (weather station) and date, that's can be treated using regression XGBoost model.
3. Data Labeling with Amazon SageMaker Ground Truth
This chapter provides an review of labeling data using SageMaker Ground Truth and its challenges, such as:
Challenges with labeling data at scale
Addressing unique labeling requirements with custom labeling workflows
Using active learning to reduce labeling time
Security and permissions ( private workforces )
4. Data Preparation at Scale Using AWS SageMaker Data Wrangler and Processing
In this chapter, the following topics are covered:
- Visual data preparation with Data Wrangler
- Bias detection and explainability with Data Wrangler
- Data preparation at scale with SageMaker Processing
- Difference between Data Wrangler and Spark(in EMR) according to dataset size.
My personal feedback on this chapter is that it needs more clarification and examples.
5. Centralized Feature Repository with AWS SageMaker Feature Store
This chapter, we are going to cover the following main topics:
Basic concepts of SageMaker Feature Store.
Creating reusable features to reduce feature inconsistencies and inference latency.
Designing solutions for near real-time ML predictions.
6. Training and Tuning at Scale
This chapter covers the following topics:
- ML training at scale with SageMaker distributed libraries.
- Difference between data parallelism and model parallelism.
- Some considerations to be taken before choosing between data & model parallelism.
- Automated model tuning with SageMaker hyperparameter tuning.
Organizing and tracking training jobs with SageMaker Experiments.
Best practices to consider while configuring hyperparameter jobs on Amazon SageMaker.
7. Profile Training Jobs with Amazon SageMaker Debugger
This chapter covers the following main topics:
Amazon SageMaker Debugger essentials
Real-time monitoring of training jobs using built-in and custom rules
-
Gain insight into the training infrastructure and training framework by:
- Analyzing and visualizing the system and framework metrics generated by the profiler.
- Analyzing the profiler report generated by SageMaker Debugger.
8. Managing Models at Scale Using a Model Registry
A model registry is a central repository for metadata related to a certain model version. It contains details about how the model was created, how it performed, and where and how it was deployed.
Additional features, such as approval workflows and notifications, are frequently included in model registry services or solutions.
This chapter covers the following topics:
- Using a model registry
- Choosing a model registry solution
- Managing models using the Amazon SageMaker model registry
9. Updating Production Models Using SageMaker Endpoint Production Variants
This chapter covers the following main topics:
Basic concepts of Amazon SageMaker Endpoint Production Variants
Deployment strategies for updating ML models with Amazon SageMaker Endpoint Production Variants
Selecting an appropriate deployment strategy
10. Optimizing Model Hosting and Inference Costs
This chapter covers the following topics:
Real-time inference versus batch inference
Deploying multiple models behind a single inference endpoint
Scaling inference endpoints to meet inference traffic demands
Using Elastic Inference for deep learning models
Optimizing models with SageMaker Neo
11. Monitoring Production Models with Amazon SageMaker Model Monitor and Clarify
This chapter covers the following main topics:
Basic concepts of Amazon SageMaker Model Monitor and Amazon
SageMaker ClarifyEnd-to-end architectures for monitoring ML models
Best practices for monitoring ML models
12. Machine Learning Automated Workflows
This chapter covers the following topics:
Considerations for automating your SageMaker ML workflows
Building ML workflows with Amazon SageMaker Pipelines
Creating CI/CD ML pipelines using Amazon SageMaker projects
13. Well-Architected Machine Learning with Amazon SageMaker
This chapter covers the following main topics:
Best practices for operationalizing ML workloads
Best practices for securing ML workloads
Best practices for building reliable ML workloads
Best practices for building performant ML workloads
Best practices for building cost-optimized ML workloads
14. Managing SageMaker Features across Accounts
This chapter discusses the following topics as they relate to managing SageMaker features across multiple AWS accounts:
Examining an overview of the AWS multi-account environment
Understanding the benefits of using multiple AWS accounts with Amazon SageMaker
Examining multi-account considerations with Amazon SageMaker
Conclusion
I really enjoyed reading this article and my personal rate for it is 4.5 / 5, because it gives a very good overview for best practices for implementing ML on AWS Sagemaker.
Happy reading :)
This content originally appeared on DEV Community and was authored by Salah Elhossiny
Salah Elhossiny | Sciencx (2021-12-03T10:09:01+00:00) AWS Sagemaker Best Practices Packt Book Review. Retrieved from https://www.scien.cx/2021/12/03/aws-sagemaker-best-practices-packt-book-review/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.