Editorial coverage, in-depth analysis, and developer guides — 2 articles.
In this post, we walk through building a scalable, event-driven transcription pipeline that automatically processes audio files uploaded to Amazon Simple Storage Service (Amazon S3). We also show how to use Amazon EC2 Spot Instances and buffered streaming inference to further reduce costs.
In this post, you will learn how speculative decoding works and why it helps reduce cost per generated token on AWS Trainium2.