Spatial data to QGIS server playbook (yes, this is for prod)

Some of you might be familiar with GeoServer for serving spatial data as consumable WMS/WFS layers. The issue is that, as far as I know, you have to upload assets and specify styles manually. The tool is also a bit dated. One modern alternative is QGIS Server: you can find pre-made docker images online, and it also syncs with the desktop version. The nice thing about QGIS Server is that you can create a QGIS project via the desktop application, then upload it wholesale to a Postgres instance that serves as the QGIS Server backend. ...
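
To make that concrete, here is a minimal sketch (the URL, credentials and project name below are placeholders I made up, not details from the post) of how you might check that a Postgres-stored project is actually being served, by asking the QGIS Server WMS endpoint for its capabilities:

import requests

# Hypothetical endpoint of a QGIS Server container exposed on localhost.
QGIS_SERVER_URL = "http://localhost:8080/ows/"

# QGIS projects can be stored in Postgres; the project is then referenced by a
# postgresql:// URI. Every value below is a placeholder.
PROJECT_URI = (
    "postgresql://qgis:secret@db:5432"
    "?sslmode=disable&dbname=gis&schema=public&project=my_project"
)

# Ask for the WMS capabilities of that project to confirm it is being served.
response = requests.get(
    QGIS_SERVER_URL,
    params={
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetCapabilities",
        "MAP": PROJECT_URI,
    },
    timeout=30,
)
response.raise_for_status()
print(response.text[:500])  # layer names from the uploaded project should show up here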

August 10, 2023 · 2 min · Karn Wong

Shapefile to data lake

Background: we use Spark to read/write to the data lake. For spatial data & analysis, we use Sedona. Shapefiles are converted to TSV, then read by Spark for further processing & archival. Recently I had to archive shapefiles in our data lake. It wasn't rosy, for the following reasons: Invalid geometries: Sedona (and geopandas too) whines if it encounters an invalid geometry during geometry casting. Invalid geometries can arise for many reasons, one of them being unclean polygon clipping. ...
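
For illustration, a rough sketch of the TSV-through-Sedona flow with a validity fix could look something like this (the paths, the geometry column name, and the Sedona 1.x registration call are assumptions on my part, not details from the post):

from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator

# Assumes the Sedona jars are already on the Spark classpath.
spark = SparkSession.builder.appName("shapefile-archive").getOrCreate()
SedonaRegistrator.registerAll(spark)

# The shapefile has already been converted to TSV with a WKT geometry column.
df = (
    spark.read.option("delimiter", "\t")
    .option("header", "true")
    .csv("s3a://lake/raw/districts.tsv")  # placeholder path
)
df.createOrReplaceTempView("raw")

# Cast WKT to geometry and repair invalid polygons (e.g. from unclean clipping),
# then serialize back to WKT so the archived output stays plain tabular data.
clean = spark.sql("""
    SELECT *, ST_AsText(ST_MakeValid(ST_GeomFromWKT(geometry))) AS geometry_fixed
    FROM raw
    WHERE geometry IS NOT NULL
""")
clean.write.mode("overwrite").parquet("s3a://lake/archive/districts/")  # placeholder path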

April 23, 2021 · 2 min · Karn Wong

Workarounds for archiving large shapefile in data lake

If you work with spatial data, chances are you are familiar with shapefile, a file format for viewing / editing spatial data. Essentially, a shapefile is just tabular data like csv, but it does its thingamajig with the geometry data type, so any gis tool like qgis or arcgis can understand it right away. If you have a csv file with a geometry column in wkt format (something like POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))), you'll have to specify which column is to be used as geometry. ...
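
As a quick example of what "specify which column is to be used as geometry" can look like with geopandas (the file name, column name and CRS below are made up for illustration):

import pandas as pd
import geopandas as gpd

# Hypothetical csv with a WKT column named "wkt_geom".
df = pd.read_csv("regions.csv")

# Tell geopandas which column holds the geometry by parsing the WKT strings.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.GeoSeries.from_wkt(df["wkt_geom"]),
    crs="EPSG:4326",  # assuming lon/lat; adjust to the data's actual CRS
)

# Now any gis-aware tool can use the geometry column directly,
# e.g. exporting to a shapefile:
gdf.to_file("regions.shp")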

January 31, 2021 · 2 min · Karn Wong

Visualizing map region prefix/suffix

import pandas as pd
import numpy as np
import geopandas as gpd
import geoplot as gplt
import matplotlib.pyplot as plt
from geoplot import polyplot
from pythainlp.tokenize import word_tokenize, syllable_tokenize

Data structure
- name: target region name
- geometry: spatial column
- *: parent region name, e.g. in “district” dataset it would have a “province” column

Dissolving the dataset in case you have multiple region levels in the same file

## assuming you have a district dataset and want to dissolve to province only
district_filename = "FILE_PATH_HERE"
gdf = gpd.read_file(district_filename)

used_columns = ['province', 'district',]
gdf = gdf.rename(columns={'prov_namt'.upper(): 'province',  # change to dummy
                          'amp_namt'.upper(): 'district',
                          })
gdf = gdf[used_columns + ['geometry']]

## desired data 🛎🛎🛎 please do create a dataset with the outermost region, so we can use it as a boundary for visualization
province = gdf.dissolve(by='province')
province = province.reset_index()\
    .rename(columns={'province': 'name'})\
    .drop(columns='district')
province

...
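
Not part of the truncated excerpt, but as a quick usage note: once province has been built as above, a geoplot one-liner along these lines is a handy sanity check of the dissolved boundaries:

# Preview the dissolved province polygons to confirm the dissolve worked
# (uses polyplot, plt and province from the block above).
ax = polyplot(province, figsize=(8, 8))
ax.set_title("Dissolved province boundaries")
plt.show()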

September 3, 2020 · 4 min · Karn Wong