Simplify self-hosting backups to S3 with docker

These days there are multiple ways to deploy a workload, be it cloud-based or bare-metal. For cloud, backup options vary depending on whether you are using PaaS or IaaS. Why do we need backups? Because your workloads can contain state, which can be stored as local files, inside a database, or as other assets outside the application itself. Take a database, for example: ideally you would have a daily backup, so you can revert the database to a state before its corruption without losing much data. Some workloads might store uploaded images; for simplicity, let’s say they are written to disk. ...
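For the uploaded-images case, a backup job comes down to archiving the directory and pushing the archive to S3 under a date-based key. This is a minimal sketch, not necessarily how the post does it. `make_backup_archive`, `backup_key` and `upload_backup` are hypothetical helpers, and the S3 client is assumed to be boto3's, whose `upload_file(Filename, Bucket, Key)` method is the only call used. The client is passed in so the logic can be exercised without AWS credentials.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def make_backup_archive(src_dir: Path, out_dir: Path, now: datetime) -> Path:
    """Pack src_dir into a timestamped .tar.gz inside out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    archive = out_dir / f"{src_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths relative to the backed-up directory's name
        tar.add(src_dir, arcname=src_dir.name)
    return archive


def backup_key(prefix: str, archive: Path, now: datetime) -> str:
    """Build an S3 object key grouped by day: <prefix>/<YYYY-MM-DD>/<archive name>."""
    return f"{prefix}/{now:%Y-%m-%d}/{archive.name}"


def upload_backup(client, archive: Path, bucket: str, key: str) -> None:
    """Upload with any client exposing boto3's upload_file(Filename, Bucket, Key)."""
    client.upload_file(str(archive), bucket, key)
```

With real credentials, the same flow would be `upload_backup(boto3.client("s3"), archive, "my-backup-bucket", backup_key("uploads", archive, now))`, where the bucket name and prefix are placeholders. Grouping keys by day makes it easy to find the snapshot from just before a corruption.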

September 7, 2024 · 4 min · Karn Wong

Slim down python docker image size with poetry and pip

Python package management is not straightforward, since the default package manager (pip) does not behave like node’s npm: it doesn’t track dependency versions in a lock file. This is why you should use poetry to manage python packages: it creates a lock file, so you can be sure that every re-install yields the same versions. However, this poses a challenge when you want to create a docker image with poetry, because you need an extra pip install poetry (unless you bake it into your base python image). Additionally, it turns out that using poetry to install packages results in a larger docker image. ...

April 7, 2024 · 2 min · Karn Wong

DevX starts at your local machine

Platform engineering is all the rage these days. You’ll often hear this term alongside the keyword DevX. How are they related? Imagine you are working on a microservice backend. You are just starting out, so you don’t have many features to work on yet. As a PoC, you only need to [fetch data] and [return aggregated price]. You could run microservices on Kubernetes, but you are not familiar with DevOps, so you turn to a cloud provider: AWS. ...
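To make the two PoC steps concrete, here is a minimal sketch of them as a local service. The data and the pricing rule (sum of price × quantity) are hypothetical, and this is not the post’s AWS setup. It uses only Python’s standard library (`wsgiref`), so it runs on a laptop without any cloud account.

```python
import json
from wsgiref.simple_server import make_server


def fetch_data():
    """Hypothetical stand-in for the [fetch data] step (a DB or API call in practice)."""
    return [
        {"item": "apple", "price": 1.25, "qty": 4},
        {"item": "pear", "price": 2.00, "qty": 1},
    ]


def aggregate_price(rows):
    """The [return aggregated price] step: total of price * qty, rounded to cents."""
    return round(sum(r["price"] * r["qty"] for r in rows), 2)


def app(environ, start_response):
    """Minimal WSGI app exposing the aggregate at /price."""
    if environ.get("PATH_INFO") != "/price":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = json.dumps({"total": aggregate_price(fetch_data())}).encode()
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(port: int = 8000) -> None:
    """Run the app locally until interrupted."""
    with make_server("127.0.0.1", port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` and opening http://127.0.0.1:8000/price returns `{"total": 7.0}` for the hypothetical rows. Keeping the logic in plain functions means it can be tested locally before deciding where it gets deployed.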

April 22, 2023 · 4 min · Karn Wong

Use SSH key during docker build without embedding the key via ssh-agent

Imagine working at a company that has a super cool internal module! The module works great, except that it is private, which means you need to install it by cloning the source repo and installing it from source. That shouldn’t be an issue on your local machine. But for production, this usually means you somehow need to bundle this awesome module into your docker image. You create a Dockerfile, and there’s one little problem: the build can’t clone the module repo because it doesn’t have an SSH key with access to the repo. ...
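BuildKit can forward the host’s ssh-agent into specific `RUN` steps, so the key is available while cloning but is never copied into an image layer. On the Dockerfile side, the clone step is declared as `RUN --mount=type=ssh ...`; on the host, the build is started with `--ssh default`. The Python wrapper below is only a sketch of the host side, assuming a Docker engine with BuildKit and a running ssh-agent with the key loaded. `build_command` and `build_image` are hypothetical names, and the post may simply run the docker command directly.

```python
import os
import subprocess


def build_command(tag: str, context: str = ".") -> list[str]:
    """docker build argv that forwards the host's ssh-agent via BuildKit."""
    # "--ssh default" exposes the agent behind $SSH_AUTH_SOCK to RUN steps
    # declared with "--mount=type=ssh"; the key itself never enters a layer.
    return ["docker", "build", "--ssh", "default", "-t", tag, context]


def build_image(tag: str, context: str = ".") -> None:
    """Run the build, failing early if no ssh-agent is available."""
    if not os.environ.get("SSH_AUTH_SOCK"):
        raise RuntimeError("no ssh-agent found: start one and run ssh-add first")
    # DOCKER_BUILDKIT=1 enables BuildKit on older engines; recent ones use it by default.
    env = {**os.environ, "DOCKER_BUILDKIT": "1"}
    subprocess.run(build_command(tag, context), check=True, env=env)
```

Checking `SSH_AUTH_SOCK` up front turns a confusing clone failure deep inside the build into a clear error on the host.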

February 6, 2022 · 2 min · Karn Wong

Use pyspark locally with docker

For data that doesn’t fit into memory, spark is often the recommended solution, since it can use map-reduce to work with data in a distributed manner. However, setting up local spark development from scratch involves multiple steps, and is definitely not for the faint of heart. Thankfully, using docker means you can skip a lot of steps 😃

Instructions

Install Docker Desktop.

Create docker-compose.yml in a directory somewhere:

```yaml
version: "3.3"
services:
  pyspark:
    container_name: pyspark
    image: jupyter/pyspark-notebook:latest
    ports:
      - "8888:8888"
    volumes:
      - ./:/home/jovyan/work
```

Run docker-compose up from the same folder where the above file is located. You should see something like this; it’s the same as running jupyter notebook locally. Click the link at the end to access jupyter notebook.

```
Creating pyspark ... done
Attaching to pyspark
pyspark | WARNING: Jupyter Notebook deprecation notice https://github.com/jupyter/docker-stacks#jupyter-notebook-deprecation-notice.
pyspark | Entered start.sh with args: jupyter notebook
pyspark | /usr/local/bin/start.sh: running hooks in /usr/local/bin/before-notebook.d as uid / gid: 1000 / 100
pyspark | /usr/local/bin/start.sh: running script /usr/local/bin/before-notebook.d/spark-config.sh
pyspark | /usr/local/bin/start.sh: done running hooks in /usr/local/bin/before-notebook.d
pyspark | Executing the command: jupyter notebook
pyspark | [I 12:36:04.395 NotebookApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
pyspark | [W 2021-12-21 12:36:05.487 LabApp] 'ip' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
pyspark | [W 2021-12-21 12:36:05.488 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
pyspark | [W 2021-12-21 12:36:05.488 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
pyspark | [W 2021-12-21 12:36:05.488 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
pyspark | [I 2021-12-21 12:36:05.497 LabApp] JupyterLab extension loaded from /opt/conda/lib/python3.9/site-packages/jupyterlab
pyspark | [I 2021-12-21 12:36:05.498 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
pyspark | [I 12:36:05.504 NotebookApp] Serving notebooks from local directory: /home/jovyan
pyspark | [I 12:36:05.504 NotebookApp] Jupyter Notebook 6.4.6 is running at:
pyspark | [I 12:36:05.504 NotebookApp] http://bd20652c22d3:8888/?token=408f2020435dfb489c8d2710736a83f7a3c7cd969b3a1629
pyspark | [I 12:36:05.504 NotebookApp] or http://127.0.0.1:8888/?token=408f2020435dfb489c8d2710736a83f7a3c7cd969b3a1629
pyspark | [I 12:36:05.504 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
pyspark | [C 12:36:05.509 NotebookApp]
pyspark |
pyspark | To access the notebook, open this file in a browser:
pyspark | file:///home/jovyan/.local/share/jupyter/runtime/nbserver-7-open.html
pyspark | Or copy and paste one of these URLs:
pyspark | http://bd20652c22d3:8888/?token=408f2020435dfb489c8d2710736a83f7a3c7cd969b3a1629
pyspark | or http://127.0.0.1:8888/?token=408f2020435dfb489c8d2710736a83f7a3c7cd969b3a1629
```

This snippet ...
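Rather than scrolling through the output for the link, the tokenized localhost URL can be pulled out of the logs. This is a hypothetical helper, assuming the log format shown above (a `http://127.0.0.1:<port>/?token=<hex>` line) and that the compose service is named `pyspark` as in the docker-compose.yml; `docker-compose logs <service>` replays the container’s output.

```python
import re
import subprocess
from typing import Optional

# Matches the loopback URL Jupyter prints, e.g. http://127.0.0.1:8888/?token=<hex>
URL_RE = re.compile(r"http://127\.0\.0\.1:\d+/\?token=[0-9a-f]+")


def notebook_url(log_text: str) -> Optional[str]:
    """Return the last localhost notebook URL in the log text, or None."""
    matches = URL_RE.findall(log_text)
    return matches[-1] if matches else None


def url_from_compose(service: str = "pyspark") -> Optional[str]:
    """Read the service's logs via docker-compose and extract the URL."""
    result = subprocess.run(
        ["docker-compose", "logs", service],
        capture_output=True, text=True, check=True,
    )
    return notebook_url(result.stdout)
```

Only the `127.0.0.1` URL is matched, because the `http://<container-id>:8888` variant resolves inside the container, not from the host browser.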

December 21, 2021 · 3 min · Karn Wong