This content originally appeared on Level Up Coding - Medium and was authored by Aashirwad Kashyap
Cutting Costs or Sacrificing Flexibility?
Amazon Prime Video recently moved its live video monitoring service from a micro-service architecture to a monolith, resulting in a 90% cost reduction.
Let’s take a deep dive into the architecture and try to understand the motivations behind the move beyond just saving cost, and determine if it’s really a micro-service or macro-service architecture.
Commonly used Terminology:
VMS (video monitoring service): This tool allows Amazon to automatically identify perceptual quality issues (for example, block corruption or audio/video sync problems) and trigger a process to fix them.
MCS (media converter service): This converts input audio/video streams to frames or decrypted audio buffers that are sent to the detectors.
Defect detectors: These execute algorithms that analyze frames and audio buffers in real time, looking for defects (such as video freeze, block corruption, or audio/video synchronization problems), and send real-time notifications whenever a defect is found.
Initial Distributed Server-less Architecture
The initial architecture for the live video monitoring service was server-less, built on AWS Step Functions and AWS Lambda, which together handle orchestration.
Let’s break it down step by step for better understanding. (FYI: some assumptions are made where the original article lacks information or clarity.)
- The first step is for the client application to upload an audio or video stream to the media conversion service and trigger a step function that starts the conversion synchronously.
- To achieve this, a tool must be running on the consumer side to decode incoming encrypted audio/video streams before sending them to the media conversion service (this is an assumption, as the blog does not mention it).
The blog also does not say whether the client-side tool sends the decoded streams to the media service at regular intervals or once per live stream. For now, we’ll assume it sends partial data at regular intervals.
- Once the conversion is complete, MCS stores the converted audio/video data in an S3 bucket and triggers a call to the defect detector services, which run in parallel. The aggregated results are stored in an S3 bucket again and also pushed to an SNS topic so that targeted consumers can act on them.
- The entire system combines a server-less architecture (AWS Step Functions and Lambda) with a micro-services architecture (MCS and defect detectors). The primary cost stems from the orchestration workflow and from data transfer between the distributed components. On top of that, the design has scaling bottlenecks for hot live streams.
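The data flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CountingStore` is a hypothetical in-memory stand-in for the S3 bucket, the 4-byte "frames" and the two toy detectors are invented for the example, and none of these names come from the Prime Video blog. The point is simply that every detector fetches every frame from the shared store on its own.

```python
from typing import Callable

Frame = bytes
Detector = Callable[[Frame], bool]  # returns True when a defect is found


class CountingStore:
    """Hypothetical in-memory stand-in for an S3 bucket that counts requests."""

    def __init__(self) -> None:
        self._objects: dict[str, Frame] = {}
        self.puts = 0
        self.gets = 0

    def put(self, key: str, value: Frame) -> None:
        self.puts += 1
        self._objects[key] = value

    def get(self, key: str) -> Frame:
        self.gets += 1
        return self._objects[key]


def convert(chunk: bytes, store: CountingStore, prefix: str) -> list[str]:
    """Toy MCS step: split a chunk into 4-byte 'frames' and store each one."""
    keys = []
    for i in range(0, len(chunk), 4):
        key = f"{prefix}/frame-{i // 4}"
        store.put(key, chunk[i:i + 4])
        keys.append(key)
    return keys


def detect(
    keys: list[str], store: CountingStore, detectors: dict[str, Detector]
) -> dict[str, list[str]]:
    """Each detector reads every frame back from the store independently."""
    return {
        name: [key for key in keys if fn(store.get(key))]
        for name, fn in detectors.items()
    }


store = CountingStore()
keys = convert(b"\x00\x00\x00\x00abcdabcd", store, "stream-1")
detectors: dict[str, Detector] = {
    "all_zero_frame": lambda frame: frame == b"\x00" * 4,
    "has_letter_a": lambda frame: b"a" in frame,
}
defects = detect(keys, store, detectors)
print(defects)
print(store.puts, store.gets)  # 3 6
```

With three frames and two detectors, the store sees three writes and six reads, so the read count grows with frames multiplied by detectors.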
According to the Prime Video blog, the main scaling bottleneck in the architecture was the orchestration management implemented using AWS Step Functions. The service performed multiple state transitions for every second of the stream, so the team quickly reached account limits. Besides that, AWS Step Functions charges users per state transition.
The second cost problem was due to the high number of Tier-1 calls to the S3 bucket made by MCS and the defect detectors.
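To get a feel for why per-transition and per-request charges add up, here is a rough back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration (transitions per second, S3 calls per second, and both prices); none of them are figures from the Prime Video blog, and current AWS pricing should be checked before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostAssumptions:
    # All values below are illustrative assumptions, not published figures.
    transitions_per_stream_second: int = 5
    usd_per_1000_transitions: float = 0.025
    s3_calls_per_stream_second: int = 4
    usd_per_1000_s3_calls: float = 0.005


def hourly_cost(streams: int, a: CostAssumptions) -> dict[str, float]:
    """Estimate one hour of orchestration and S3 request cost for `streams` live streams."""
    seconds = 3600
    transitions = streams * seconds * a.transitions_per_stream_second
    s3_calls = streams * seconds * a.s3_calls_per_stream_second
    return {
        "state_transitions": transitions,
        "orchestration_usd": transitions / 1000 * a.usd_per_1000_transitions,
        "s3_calls": s3_calls,
        "s3_usd": s3_calls / 1000 * a.usd_per_1000_s3_calls,
    }


estimate = hourly_cost(streams=1000, a=CostAssumptions())
for name, value in estimate.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumed inputs, both costs scale linearly with the number of streams and with stream duration, which is why a design that does work per second of video is sensitive to per-call pricing.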
Monolith Architecture
In the updated architecture, most of the components remain the same (MCS, detectors, orchestrator). The only significant change is that the components have been consolidated into a single ECS instance. But what does this mean, and how does it impact the system? Let’s break it down.
- Orchestration, which was previously costly and relied on AWS Step Functions and Lambda, has been replaced with a new orchestration layer. This allows better control of the components within a single instance and results in significant cost savings.
- The media converter (MCS) previously ran as a micro-service but has now been moved into the single ECS instance. This change allows converted data to be stored locally in memory (on the heap), resulting in faster processing and improved performance.
- Finally, the detector, which uses ML models to detect defects in audio/video streams, has also been moved into the single ECS instance. Although this means losing horizontal scalability, the cost savings make it a worthwhile tradeoff. It is worth noting, however, that the detector may break under high load, so there are still some potential challenges to overcome.
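A minimal sketch of the consolidated flow, assuming a toy converter and toy detectors (all names here are hypothetical and not taken from the Prime Video blog): frames are produced and consumed inside one process, so nothing is written to or read from an external store between the two steps.

```python
from typing import Callable, Iterator

Frame = bytes
Detector = Callable[[Frame], bool]  # returns True when a defect is found


def convert_in_memory(chunk: bytes, frame_size: int = 4) -> Iterator[Frame]:
    """Toy converter: yield fixed-size 'frames' without leaving the process."""
    for i in range(0, len(chunk), frame_size):
        yield chunk[i:i + frame_size]


def run_monolith(chunk: bytes, detectors: dict[str, Detector]) -> dict[str, list[int]]:
    """Run every detector on every frame, passing frames by reference in memory."""
    findings: dict[str, list[int]] = {name: [] for name in detectors}
    for index, frame in enumerate(convert_in_memory(chunk)):
        for name, fn in detectors.items():
            if fn(frame):
                findings[name].append(index)
    return findings


result = run_monolith(
    b"\x00\x00\x00\x00abcdabcd",
    {
        "all_zero_frame": lambda frame: frame == b"\x00" * 4,
        "has_letter_a": lambda frame: b"a" in frame,
    },
)
print(result)  # {'all_zero_frame': [0], 'has_letter_a': [1, 2]}
```

The tradeoff is visible in the loop: all detectors share one process and its resources, so adding detectors or load means a bigger instance rather than more instances.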
Overall, the updated architecture has pros and cons.
- Pros: Costs drop because the extra tier-1 read/write calls to the S3 bucket are no longer required, and the costly AWS Step Functions and Lambda orchestration has been replaced with a single component.
- Cons: Horizontal scalability is lost (for example, the detectors also run in the single ECS instance). Flexibility is lost too: the audio/video streams are currently tightly coupled to the detectors, but in future there may be multiple consumers for them.
Monolith + Macro service Architecture
The previous monolithic architecture had a major issue of losing horizontal scalability, particularly for detectors, within a single ECS instance. But the new macro + monolith design has tackled this problem. Let’s dive in deeper to see how.
- Within a single ECS instance, detectors can only be scaled vertically. To overcome this, the monolith service instance is duplicated across different ECS clusters, each parameterised with a subset of detectors, and a lightweight orchestration layer built on AWS Lambda distributes requests.
- This design still loses some flexibility. Currently the detector is the only consumer of the audio/video stream; if multiple consumers arrive in the future, the design approach may need to change.
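One way to picture the parameterised duplication is the sketch below. It is an assumption-heavy illustration: the round-robin split, the function names, and the request shape are all hypothetical, and the blog does not describe how detectors are actually assigned to instances.

```python
def partition_detectors(names: list[str], instances: int) -> list[list[str]]:
    """Split detector names round-robin into one subset per monolith instance."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return [names[i::instances] for i in range(instances)]


def fan_out(stream_id: str, subsets: list[list[str]]) -> list[dict]:
    """Toy orchestration layer: build one request per instance that has detectors."""
    return [
        {"stream_id": stream_id, "instance": index, "detectors": subset}
        for index, subset in enumerate(subsets)
        if subset
    ]


subsets = partition_detectors(["video_freeze", "block_corruption", "av_sync"], 2)
print(subsets)  # [['video_freeze', 'av_sync'], ['block_corruption']]
for request in fan_out("stream-1", subsets):
    print(request)
```

Each instance is still the same monolith, only started with a different detector list, so capacity grows by adding instances rather than by resizing one.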
Final Thoughts
In the world of software development, there’s no one-size-fits-all solution, and neither micro-services nor monolithic architecture is inherently superior. In the case of Prime Video, the move to a monolithic design resulted in a 90% cost reduction and suited their use case well.
However, it’s important to note that if multiple consumers for audio/video streams arise or the MCS service itself can’t be kept inside a single ECS instance due to the buffer size, then the current design may need to be re-evaluated. As software engineers, it’s important to continually assess and adjust our designs to meet changing requirements and optimise for cost and performance.
References
- Prime Video Tech blog: Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%
The Pros and Cons of Prime Video’s Move to Monolithic Architecture: Is It Worth the Risk? was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
Aashirwad Kashyap | Sciencx (2023-05-08T03:24:55+00:00) The Pros and Cons of Prime Video’s Move to Monolithic Architecture: Is It Worth the Risk?. Retrieved from https://www.scien.cx/2023/05/08/the-pros-and-cons-of-prime-videos-move-to-monolithic-architecture-is-it-worth-the-risk/