Using Apache Iceberg to reduce data lake operations overhead

Every business generates data: some very little, some ginormous amounts. If you are familiar with basic web application architecture, you know it has three tiers: data, application, and web. But it doesn’t end there, because the generated data has to be analyzed for reports. A lot of organizations have analysts working on the production database directly. This works fine until the data grows so large that a single query can take half a day to process!...

November 15, 2023 · 4 min · Karn Wong

Spark on Kubernetes

Background: For data processing tasks, there are several ways to go about it: using SQL to leverage a database engine for data transformation; dataframe-based frameworks such as pandas, Ray, Dask, and Polars; and big data processing frameworks such as Spark. Check out this article for a Polars vs. Spark benchmark. The problem: At larger data scales, the other solutions (everything except Spark) can work, but only with a lot of vertical scaling, which can get very expensive....

September 12, 2023 · 4 min · Karn Wong

Data Engineering Resources

Note: if you’ve seen this list elsewhere, it was probably me who posted it. I first shared it on the Data Engineering Discord and Data Engineer Cafe.

Books

Data fundamentals (good entry point)
- Fundamentals of Data Engineering - Joe Reis & Matt Housley
- Seven Databases in Seven Weeks - Luc Perkins, Eric Redmond & Jim Wilson
- Designing Data-Intensive Applications - Martin Kleppmann
- The Data Warehouse Toolkit - Ralph Kimball & Margy Ross
- Data Science for Business - Foster Provost & Tom Fawcett
- Practical Statistics for Data Scientists - Peter Gedeck, Peter Bruce & Andrew Bruce

Software engineering
- Python Crash Course - Eric Matthes
- The Pragmatic Programmer - Andrew Hunt & David Thomas

Platform
- Terraform: Up & Running - Yevgeniy Brikman

Management
- Team Topologies - Matthew Skelton & Manuel Pais
- Radical Candor - Kim Scott
- Data Teams - Jesse Anderson
- Practical DataOps - Harvinder Atwal

Resources
https://brendanthompson....

September 9, 2023 · 1 min · Karn Wong

A Networking God Tale: All I Want is to Run a Speedtest Behind a Firewall

Imagine going to your client’s site to deploy some software. During the deployment, you notice that the transfer speed is atrociously slow. You suspect that your client’s network bandwidth is the issue. To test this theory, you go to a speedtest website and run a test. Turns out you can’t, because it’s blocked at the firewall level. You try another speedtest website. Oops, still blocked. You try a few more: still no dice....

August 27, 2023 · 2 min · Karn Wong

Book Highlights - Build by Tony Fadell

Asshole assholes: They suck at work and everything else. These are the mean, jealous, insecure jerks who you’d avoid at a party, but who inevitably sit immediately next to you at the office. They cannot deliver, are deeply unproductive, so they do everything possible to deflect attention away from themselves. They will lie, craft gossip, and manipulate others to get people off their scent. The only good thing about these assholes is that they’re generally out the door pretty quickly—they can only deflect for so long before people start noticing that they bring zero value....

July 6, 2023 · 4 min · Karn Wong