Serverless real-time machine learning inference with AWS

Machine learning projects usually fall into two main categories: research and production. In a research ML project, the model is created and used locally on a researcher's machine. A production ML project involves deployment: the usual pattern is to create a service that loads a model, accepts input, and returns a prediction. Production ML is further divided into two main patterns: batch and real-time. For batch inference, a job is triggered on an interval to pre-calculate predictions, which are then stored somewhere. Real-time inference is trickier, since it involves web application architecture (at least the data and application tiers). ...
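
To make the real-time pattern concrete, here is a minimal sketch of such a service as an AWS Lambda handler. It assumes a scikit-learn model serialized with joblib and bundled with the function; the file name `model.joblib`, the request shape, and the API Gateway event format are illustrative assumptions, not details from the post.

```python
import json

import joblib

# Load the model once at module scope (cold start) so that warm
# invocations reuse it instead of reloading on every request.
# "model.joblib" is a placeholder path; assumes the serialized model
# is bundled with the deployment package or a Lambda layer.
model = joblib.load("model.joblib")


def handler(event, context):
    # Assumes the function sits behind API Gateway, which passes the
    # HTTP request body as a JSON string in event["body"].
    payload = json.loads(event["body"])
    features = [payload["features"]]  # single-row input

    prediction = model.predict(features)[0]

    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": float(prediction)}),
    }
```

Loading the model at module scope rather than inside the handler is what makes this viable for real-time use: only the first (cold) invocation pays the model-loading cost.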

November 28, 2023 · 3 min · Karn Wong