Use SQL against CSV (or other hard files) without CLI

CSV is a very versatile file format: almost any program can parse it. The only issue is that you can’t use SQL against CSV files directly. This is a major pain point, since using SQL is so much faster than firing up a Jupyter notebook and wrangling the data in Python, or opening Excel and applying transformations until you get the desired results. But the question is how to use SQL against CSV files in the first place. Many people want this to become a reality, so a few tools exist on GitHub: ...
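As a rough illustration of the idea, and not necessarily how any of the tools mentioned in the post work, the sketch below loads CSV text into an in-memory SQLite table using only Python's standard library and then queries it with SQL. The `query_csv` helper, the `data` table name, and the sample rows are all hypothetical and made up for this example.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="data"):
    """Load CSV text into an in-memory SQLite table, then run `sql` on it.

    Hypothetical helper for illustration; the first CSV row is treated
    as the header and every value is stored as text.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        # The reader yields the remaining rows as lists of strings.
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Made-up sample data, not real figures.
sample = "city,population\nBangkok,100\nChiang Mai,20\nPhuket,5\n"

rows = query_csv(
    sample,
    "SELECT city FROM data "
    "WHERE CAST(population AS INTEGER) > 10 ORDER BY city",
)
print(rows)
```

Because values are loaded as text, numeric comparisons need an explicit `CAST`, as in the query above. For a file on disk, the same approach should work by passing the file's contents in place of `sample`.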

April 25, 2023 · 2 min · Karn Wong

Visualizing map region prefix/suffix

import pandas as pd
import numpy as np
import geopandas as gpd
import geoplot as gplt
import matplotlib.pyplot as plt
from geoplot import polyplot
from pythainlp.tokenize import word_tokenize, syllable_tokenize

Data structure

name: target region name
geometry: spatial column
*: parent region name, e.g. a “district” dataset would have a “province” column

Dissolving dataset in case you have multiple region levels in the same file

## assuming you have a district dataset and want to dissolve to province only
district_filename = "FILE_PATH_HERE"
gdf = gpd.read_file(district_filename)
used_columns = ['province', 'district',]
gdf = gdf.rename(columns={'prov_namt'.upper(): 'province', # change to dummy
                          'amp_namt'.upper():'district', })
gdf = gdf[used_columns+['geometry']] ## desired data

🛎🛎🛎 please do create a dataset with the outermost region, so we can use it as a boundary for visualization

province = gdf.dissolve(by='province')
province = province.reset_index()\
    .rename(columns={'province': 'name'})\
    .drop(columns='district')
province

...

September 3, 2020 · 4 min · Karn Wong

Word-based analysis with song lyrics

2023 version: https://github.com/kahnwong/lyrics-analysis I listen to a lot of music, mostly symphonic heavy metal. What’s interesting is that in this genre, each album often has a different theme, and each band focuses on different topics in its lyrics. For instance, Nightwish focuses on nature, and their Imaginaerum album focuses on evolution. So I thought it would be interesting to apply various text analysis methods to the lyrics, which resulted in this article. GitHub link here! ...

April 15, 2020 · 8 min · Karn Wong

Resettled refugees in Sweden

One of my friends is a Syrian refugee who was granted asylum in Sweden last year. I also wanted to try data analysis, so it seemed fitting to analyze something relevant to my friend. This is my first ever analysis in pandas, so apologies in advance for the code abomination. In this analysis, I use pandas for dataframes, numpy for dealing with numbers (because I need to count and do some math with them), and matplotlib for plotting graphs. ...

July 30, 2018 · 5 min · Karn Wong