Streamlit load test performance
Streamlit is well-loved by many people, especially among data folks due to the fact that it does not require prior web programming knowledge to get started. Popular use cases for streamlit can be anything from a quick machine learning application poc or internal dashboards. But what if you want to create a production deployment? Would streamlit still be a viable option? Experiment setup For a production deployment, let’s assume there are 500 concurrent users. For a single-page streamlit with a chat box taking 3 seconds to return a single-paragraph message: ...
Simplify self-hosting backups to S3 with docker
These days there are multiple ways to deploy a workload, be it cloud-based or bare-metal. For cloud, depending on whether you are using PaaS or IaaS, backup options can vary. Why do we need to backup? Because your workloads can contain a state, this can be stored as local files, inside a database, or as other assets outside the application itself. Take a database for example, ideally you would need a daily backup so you can revert a database to a state before its corruption without losing as much data. Some workloads might store uploaded images, for simplicity let’s say they are being written to disk. ...
Reasons why you shouldn't use programming languages for IaC
When it comes to IaC (infrastructure as code), most people might have heard of HashiCorp’s Terraform (it uses HCL as DSL. Interestingly enough, Terraform also has its own CDK to translate programming languages into HCL), Pulumi or AWS CDK. The latter two support programming languages as DSL. Mostly there are two camps: People who swear by HCL and think you shouldn’t use programming languages for IaC People who don’t see why you need to pick up a new language in order to use IaC, so they prefer using a programming language they already are familiar with instead Both camps are not wrong, they are both valid. However, I want to share my take on why you should use HCL for IaC. ...
GCP's service account credentials can be a security risk. Here's how to mitigate them.
If you look online, many sources would tell you that you should use service account to authenticate for GCP services. While this is true, it’s not for all the cases. For local development, you should use Application Default Credentials Imagine working in a team, and you have to work with Cloud Run, so you request your infra team for a service account. This looks good, but then your teammates also have to work with this service. They happen to be in a hurry, so you share your service account to your teammates. Now this can be a problem, because now there are multiple users who have access to this service account. It would be very tricky to trawl through the audit logs and identify which developer interact with cloud run, because the system only sees a single identity. ...
Thoughts on summarization service system design
For a summarization task, there should be an input, in which it’s reduced to a handful of paragraphs. This input is in text format. You don’t necessarily start from a text format though, since the source content can be audio or video files. But this means at the end, the source input has to be converted into a text format, and this involves a transcription task. Transcription means taking an audio, then convert it to text. Luckily these days there are APIs you can use to achieve this. Depending on each API provider, but it’s safe to assume most would support WAVE or FLAC encoding. ...