Spark on Kubernetes

Background: For data processing tasks, there are several ways to go about it: using SQL to leverage a database engine for data transformation; dataframe-based frameworks such as pandas, Ray, Dask, and Polars; or big data processing frameworks such as Spark. Check out this article for more info on a Polars vs Spark benchmark. The problem: At larger data scale, solutions other than Spark can work, but only with a lot of vertical scaling, which can get very expensive. For comparison, our team had to scale a database to 4/16 GB and it still took the whole night, whereas Spark on a single node can process the data in 2 minutes flat. ...

September 12, 2023 · 4 min · Karn Wong

DevX starts at your local machine

Platform engineering is all the rage these days. You’ll often hear the term alongside the keyword DevX. How are they related? Imagine you are working on a microservice backend. You are just starting out, so you don’t have many features to work on yet. But as a PoC, you only need to [fetch data] and [return aggregated price]. You could run microservices on Kubernetes, but you are not familiar with DevOps, so you turn to a cloud provider: AWS. ...

April 22, 2023 · 4 min · Karn Wong