GCP's service account credentials can be a security risk. Here's how to mitigate them.

If you look online, many sources will tell you to use a service account to authenticate to GCP services. While this is true, it doesn’t apply in every case. For local development, you should use Application Default Credentials. Imagine working in a team where you have to work with Cloud Run, so you ask your infra team for a service account. This looks good, but then your teammates also have to work with this service. They happen to be in a hurry, so you share your service account with them. This is a problem, because multiple users now have access to the same service account. It would be very tricky to trawl through the audit logs and identify which developer interacted with Cloud Run, because the system only sees a single identity. ...

July 14, 2024 · 2 min · Karn Wong

How to connect to Cloud SQL from Cloud Run (no, you don't need a VPC)

A minimal application architecture consists of a database and an application backend. Serverless databases are still in their infancy, but thankfully container-based runtimes are very much alive and doing well. On GCP, a serverless container-based runtime does exist, known as Cloud Run. Standard database access pattern: Per standard security practices, you should not expose your database to the public. This means you should use a proxy/tunnel or a private network to reach your database. ...

February 10, 2024 · 3 min · Karn Wong

Things to watch out for GCP SSL with Cloudflare DNS

For our production workload, we deploy on Kubernetes, where an ingress resource is created for each deployment. The resources behind an ingress are a GCP Load Balancer and an SSL certificate. As for DNS, we use Cloudflare, since it enables a CDN without extra configuration on our part. A few months after the deployment first went live, we were informed that the website couldn’t be accessed. It turned out that GCP couldn’t renew the SSL certificate (error FAILED_NOT_VISIBLE). According to the GCP docs, if the domain’s DNS doesn’t resolve to the Load Balancer IP, GCP can’t provision or renew a certificate. ...

December 18, 2023 · 1 min · Karn Wong

Spark on Kubernetes

Background: For data processing tasks, there are different ways you can go about it: using SQL to leverage a database engine to perform data transformation; dataframe-based frameworks such as pandas, Ray, Dask, and Polars; and big data processing frameworks such as Spark. Check out this article for more info on the Polars vs Spark benchmark. The problem: At larger data scales, solutions other than Spark can work, but only with a lot of vertical scaling, which can get very expensive. For comparison, our team had to scale a database to 4/16 GB and it still took the whole night, whereas Spark on a single node can process the data in 2 minutes flat. ...

September 12, 2023 · 4 min · Karn Wong

Bare metal works, until it doesn't. Hello, cloud.

Background: Ever wonder how websites (and everything in between) work? Chances are you can create a project running on your local machine. It works as you expect, but to let other people access it, you have to “deploy” it. For many years, supporting high request volumes meant running your applications in a data center. These days this setup is known as on-premises. ...

March 24, 2023 · 4 min · Karn Wong