Lead a team of three engineers; educate and support the data science team to achieve faster delivery
Collaborate with the pre-sales and data science teams to ensure the feasibility of incoming projects
Improve day-to-day operations within the consulting team to speed up processes
Act as technical lead for data science team
Guide teams on how to better collaborate to achieve faster feedback loops and delivery
Evangelize agile, software engineering and MLOps practices
Within 8 months of onboarding, moved the data science team, which previously developed exclusively on Google Colab / Vertex Notebook, to code versioning, pull requests, and Python virtual environments (via Poetry); more than 30 pull requests were opened during this period
Oversee technical direction and delivery for projects, covering model development (MLOps) and application deployment (DevOps), while utilizing common foundational building blocks to reduce implementation overhead (platform engineering)
Lead a team of three engineers building and maintaining infrastructure for five products, a centralized data platform, and an MLOps platform
Collaborate with engineering, product, and data teams to align platform direction in a way that satisfies all parties
Research and implement internal data platform v2, with CI/CD covering unit and integration tests, dependency updates, and code changes
Use Terraform with Infracost to assess infrastructure cost and prune unused/underutilized resources, reducing monthly cloud cost by 33% (FinOps)
Design and implement end-to-end cloud-native application deployment (Terraform for infra/secrets, Helm for application deployment, GitHub Actions for CI/CD, Kubernetes for container orchestration)
Lead a major refactoring of the core product across ops, engineering, and data (data engineering, data science, and machine learning engineering); affected areas include data pipelines, model training pipelines, real-time inference endpoints, database performance optimization, deployment pipelines, and the local development workflow
Lead a cloud migration from AWS to GCP (AWS ECS to GCP GKE)
Introduce H3 spatial indexing to improve spatial query performance; since the index is a plain column rather than a feature of a spatial execution engine, any system gains the speedup (sketch below)
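A minimal sketch of the idea using the h3-py package (v4 API); the resolution and the column wiring are illustrative, not the production values:

```python
# Sketch of H3-based proximity lookup (h3-py v4 API).
# Each row is tagged with its H3 cell, so "nearby" becomes a plain
# equality/IN filter on the cell column, which any execution engine
# can run without spatial extensions.
import h3

RES = 9  # illustrative resolution; chosen to match the query radius

def cell_of(lat: float, lng: float) -> str:
    return h3.latlng_to_cell(lat, lng, RES)

def nearby_cells(lat: float, lng: float, k: int = 1) -> set[str]:
    # The query cell plus k rings of neighboring cells around it.
    return set(h3.grid_disk(cell_of(lat, lng), k))

# Usage: precompute h3_cell = cell_of(lat, lng) for every row, then
# filter rows with h3_cell IN nearby_cells(query_lat, query_lng).
```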
Lead a team of two engineers building and maintaining the data and MLOps platforms
Collaborate with engineering, product, and data teams to ensure that the platform serves their needs
Enable backend team to perform auto-deployment to ECS with Terraform and GitHub Actions
Manage repo permissions / secrets / webhooks with Terraform
Research a BI solution to consolidate fragmented dashboard platforms into a single one
Set up Grafana for centralized metrics and logs monitoring
Introduce Sourcegraph to help with code search and refactoring
Set up GitOps for Terraform to enable collaboration between different teams, in turn reducing configuration drift
Architect an end-to-end machine learning project with reproducible data and model training pipelines, hyperparameter tuning, and CI/CD for the inference API endpoint (MLOps); a real-time model performance dashboard and request-ID tracing are implemented in Grafana
Reduce development time for ETL pipelines from a week to one day via workflow redesign and codebase refactoring
Mentor data engineers
Advise other teams as a platform engineer
Set up alerts and monitoring to automatically report task failures (ChatOps)
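As an illustration, a ChatOps failure alert of this kind can be as small as a webhook post; the environment variable and message format here are hypothetical:

```python
# Minimal sketch of a task-failure alert via a Slack incoming webhook.
# SLACK_WEBHOOK_URL is a hypothetical env var; in practice this hangs
# off the orchestrator's on-failure callback.
import os
import requests

def notify_failure(task_name: str, error: str) -> None:
    resp = requests.post(
        os.environ["SLACK_WEBHOOK_URL"],
        json={"text": f":rotating_light: task `{task_name}` failed: {error}"},
        timeout=10,
    )
    resp.raise_for_status()
```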
Create a script to automatically grant Postgres access permissions based on user groups, with optional extra permissions on a per-user basis
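A minimal sketch of the grant logic, assuming a hypothetical group-to-privilege mapping and psycopg2 for execution (the real script reads its config from a file):

```python
# Sketch of group-based Postgres grants with per-user overrides.
# GROUP_GRANTS, USER_OVERRIDES, and the DSN are illustrative.
import psycopg2
from psycopg2 import sql

GROUP_GRANTS = {
    "analysts": "SELECT",
    "engineers": "SELECT, INSERT, UPDATE",
}
USER_OVERRIDES = {
    "alice": "ALL PRIVILEGES",  # special permission on a per-user basis
}

with psycopg2.connect("postgresql://admin@localhost/app") as conn:
    with conn.cursor() as cur:
        for role, privs in {**GROUP_GRANTS, **USER_OVERRIDES}.items():
            cur.execute(
                sql.SQL("GRANT {} ON ALL TABLES IN SCHEMA public TO {}").format(
                    sql.SQL(privs), sql.Identifier(role)
                )
            )
```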
Optimize a large Spark pipeline that frequently failed with OOM errors by splitting the work into independent chunks (divide and conquer), letting it scale to arbitrarily large inputs; see the sketch below
Reduce runtime of PR code quality checks by 90% to shorten the feedback loop
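The shape of the fix, sketched with PySpark; paths, the partition column, and the transformation are placeholders:

```python
# Divide and conquer for an OOM-prone Spark job: process one partition
# of the input at a time instead of the whole dataset, so memory use
# stays bounded regardless of input size.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("chunked-pipeline").getOrCreate()

# Enumerate partitions (here: a date column), then process each alone.
dates = [
    row.ds
    for row in spark.read.parquet("s3://lake/events").select("ds").distinct().collect()
]
for ds in dates:
    chunk = spark.read.parquet("s3://lake/events").where(f"ds = '{ds}'")
    result = chunk.groupBy("user_id").count()  # the real transformation goes here
    result.write.mode("overwrite").parquet(f"s3://lake/output/ds={ds}")
```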
Set up secrets management using SOPS/Terraform/AWS SSM, for improved security and secrets rotation
Launch Baania Engineering Blog, a platform to showcase how Baania does things behind the scenes
Reduce onboarding time per employee from a few days to 30 minutes via a script that sets up the necessary tools, applications, and development environment
Optimize a nearby-POI lookup in PostGIS against 4 million rows with geohashing, cutting query runtime from 10 minutes to 2 seconds and reducing resource requirements (sketch below)
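A sketch of the technique, assuming a hypothetical poi table with a precomputed, indexed geohash column (pygeohash for encoding); in practice neighboring cells are also included to cover boundary cases:

```python
# Geohash trick: turn "POIs near (lat, lon)" into a cheap b-tree prefix
# scan, with an exact PostGIS distance check only on the candidate set.
# Table and column names are hypothetical; query uses psycopg2 params.
import pygeohash as pgh

def nearby_poi_query(lat: float, lon: float, precision: int = 5):
    prefix = pgh.encode(lat, lon, precision=precision)
    query = """
        SELECT id, name
        FROM poi
        WHERE geohash LIKE %(prefix)s || '%%'   -- index-friendly prefix scan
          AND ST_DWithin(geom::geography,
                         ST_MakePoint(%(lon)s, %(lat)s)::geography,
                         2000)                  -- exact check on candidates
    """
    return query, {"prefix": prefix, "lat": lat, "lon": lon}
```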
Create and maintain data-gathering infrastructure for daily ingestion and processing, stored in a data lake (S3)
Create and optimize machine learning models to achieve near-real-time performance
Create and maintain ETL pipelines via task orchestrator to reduce data update frequency from once a month to daily
Deploy and maintain ML models via cloud services, reducing deployment time from a day to minutes
Mentor data scientists
Automate infrastructure and governance using Terraform
Package cron services as AWS ECS tasks invoked on demand, cutting cost from 50 USD/year to 0.1 USD/year
Self-hosted alternatives to popular subscription services: Netflix, Spotify, LastPass, Trello, Dropbox, NordVPN, etc.
Managed via Docker Compose, Helm, and Terraform
Use Terraform to manage Cloudflare and Kubernetes
Use Caddy as a reverse proxy
SSO via Authentik
Personal documentation website on various topics
A cross-platform setup script that works on both Linux and macOS
pgcli wrapper that connects to the PostgreSQL database specified in db.yaml; a proxy/tunnel connection is created automatically and killed when pgcli exits (see the sketch below)
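A condensed sketch of the wrapper; the db.yaml schema and the SSH-based tunnel shown here are assumptions about one plausible setup:

```python
# Read connection details from db.yaml, open a tunnel, run pgcli through
# it, and tear the tunnel down when pgcli exits.
import subprocess
import sys
import yaml

def main(name: str) -> None:
    cfg = yaml.safe_load(open("db.yaml"))[name]
    # Assumed schema, e.g.:
    # cfg = {"bastion": "bastion.example.com", "host": "db.internal",
    #        "port": 5432, "user": "me", "dbname": "app"}
    tunnel = subprocess.Popen(
        ["ssh", "-N", "-L", f"15432:{cfg['host']}:{cfg['port']}", cfg["bastion"]]
    )
    try:
        subprocess.run(
            ["pgcli", "-h", "localhost", "-p", "15432",
             "-U", cfg["user"], "-d", cfg["dbname"]]
        )
    finally:
        tunnel.terminate()  # tunnel dies with the pgcli session

if __name__ == "__main__":
    main(sys.argv[1])
```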
Benchmark performance of DuckDB, Polars, and Spark, reporting RAM usage in addition to runtime
Measure latency, RPS, and CPU/memory utilization for a hello-world API in various languages
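The harness boils down to something like the following (DuckDB and Polars shown; Spark follows the same pattern). The query and file name are illustrative, and resident memory is sampled with psutil after each run:

```python
# Run the same aggregation in each engine and record wall time plus
# resident memory (RSS) after the run.
import time

import duckdb
import polars as pl
import psutil

def measure(fn):
    start = time.perf_counter()
    fn()
    runtime = time.perf_counter() - start
    rss_mb = psutil.Process().memory_info().rss / 2**20
    return runtime, rss_mb

bench = {
    "duckdb": lambda: duckdb.sql(
        "SELECT ds, count(*) FROM 'events.parquet' GROUP BY ds"
    ).fetchall(),
    "polars": lambda: pl.scan_parquet("events.parquet")
        .group_by("ds").agg(pl.len()).collect(),
}
for name, fn in bench.items():
    runtime, rss = measure(fn)
    print(f"{name}: {runtime:.2f}s, {rss:.0f} MiB RSS")
```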
Measure overheads incurred from using C FFI in various languages
FastAPI template with pydantic and pytest
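The core shape of the template, roughly; route and model names are illustrative:

```python
# A pydantic-validated endpoint plus a pytest test via FastAPI's TestClient.
from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    name: str
    price: float

@app.post("/items")
def create_item(item: Item) -> Item:
    return item

def test_create_item() -> None:
    resp = TestClient(app).post("/items", json={"name": "pen", "price": 1.5})
    assert resp.status_code == 200
    assert resp.json() == {"name": "pen", "price": 1.5}
```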
Run Spark jobs on Kubernetes, usable both locally and in production environments
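A minimal sketch of the session setup; the API server URL and container image are placeholders, and pointing the master at local[*] instead gives the local mode:

```python
# SparkSession targeting a Kubernetes cluster (client deploy mode).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://kube-apiserver.example.com:6443")  # placeholder URL
    .appName("spark-on-k8s")
    .config("spark.kubernetes.container.image", "registry.example.com/spark:3.5")
    .config("spark.executor.instances", "4")
    .getOrCreate()
)
```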
A working Dagster demo with medallion architecture, partitioned data, schedules, asset dependencies, job status alerts, and auto-materializing assets; sketched below
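A stripped-down sketch of the demo's asset layer; schedules, alerts, and auto-materialization attach to these definitions in the full version, and the asset bodies here are placeholders:

```python
# Daily-partitioned bronze/silver assets with a dependency between them
# (medallion architecture).
from dagster import AssetExecutionContext, DailyPartitionsDefinition, Definitions, asset

daily = DailyPartitionsDefinition(start_date="2024-01-01")

@asset(partitions_def=daily, group_name="bronze")
def raw_events(context: AssetExecutionContext) -> list[dict]:
    # Ingest one day of source data; the partition key is the day.
    return [{"ds": context.partition_key, "value": 42}]

@asset(partitions_def=daily, group_name="silver")
def clean_events(raw_events: list[dict]) -> list[dict]:
    # Depends on raw_events by parameter name: Dagster wires the edge.
    return [row for row in raw_events if row["value"] is not None]

defs = Definitions(assets=[raw_events, clean_events])
```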
Use machine learning to fill in missing data
Utilize hyperparameter tuning to find the optimal parameters (both steps are sketched together below)
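A sketch of the combination with scikit-learn; the toy data and parameter grid stand in for the real ones:

```python
# Model-based imputation of missing values, then a cross-validated grid
# search over model hyperparameters.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.model_selection import GridSearchCV

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [np.nan, 8.0]] * 10)
y = np.arange(len(X), dtype=float)

# Fill missing features by modeling each column from the others.
X_filled = IterativeImputer(random_state=0).fit_transform(X)

# Search for the optimal hyperparameters by cross-validation.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
search.fit(X_filled, y)
print(search.best_params_)
```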