Reduce operational costs with Terraform

Background Think of websites you visit each day. Most likely they are hosted on a cloud provider such as AWS, GCP, or Azure. The good news is that it’s very easy to create a simple deployment with a virtual machine, but for scalable and high-availability workloads, the usual recommendation is to use a container-based runtime such as AWS ECS/EKS or GCP Cloud Run/GKE. These services also require more configuration than a simple VM deployment. ...

November 4, 2023 · 3 min · Karn Wong

Spark on Kubernetes

Background For data processing tasks, there are different ways you can go about it: using SQL to leverage a database engine to perform data transformation; dataframe-based frameworks such as pandas, ray, dask, or polars; or big data processing frameworks such as spark. Check out this article for more info on the polars vs spark benchmark. The problem At larger data scales, the other solutions (except spark) can work, but only with a lot of vertical scaling, and this can get very expensive. For comparison, our team had to scale a database to 4/16 GB and it still took the whole night, whereas spark on a single node can process the data in 2 minutes flat. ...
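The single-node comparison above is the kind of workload Spark's local mode covers. Below is a minimal PySpark sketch of that setup, assuming a hypothetical parquet input and a simple per-customer aggregation; the paths, column names, and memory setting are illustrative, not taken from the post:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Single-node ("local mode") Spark session: uses all cores on one machine,
# no cluster required. The driver memory value is illustrative.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("single-node-transform")
    .config("spark.driver.memory", "8g")
    .getOrCreate()
)

# Hypothetical input path for illustration only.
df = spark.read.parquet("data/transactions.parquet")

# Example transformation: aggregate spend per customer.
result = (
    df.groupBy("customer_id")
      .agg(
          F.sum("amount").alias("total_amount"),
          F.count("*").alias("n_transactions"),
      )
)

# Hypothetical output path for illustration only.
result.write.mode("overwrite").parquet("data/output/")
```

With `master("local[*]")`, Spark parallelizes across every core of a single machine without any cluster setup, which is what makes the single-node-versus-vertically-scaled-database comparison possible.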

September 12, 2023 · 4 min · Karn Wong

Cost optimization with Kubernetes

Correction 2023-07-02: fix homelab specs and corresponding AWS EC2 instance class (it’s actually 32GB RAM, not 64GB). Congratulations, you managed to successfully deploy a few services on kubernetes! But this is not the end 👀. Unfortunately money doesn’t grow on trees, and if you can’t justify your infra expenses, the finance department won’t be happy. If you’re using Terraform, you can use Infracost to create a cost report. Pretty nifty. But what about kubernetes? Given that cost reporting is a basic feature, kubernetes is no exception. ...

April 1, 2023 · 2 min · Karn Wong