For most people, using SQL to transform data is a no-brainer, seeing it’s a very versatile language, and doesn’t have quite a steep learning curve compared to python. There are some cases where SQL is more suitable for a task, but the reverse can also happen as well.
For instance, given a string conversion problem:
- if a string occurs only one time, replace it with
#
- if a string occurs multiple times, replace it with
&
> one
###
> three
###&&
> Heartbreak hotel
&&&&&#&&&##&#&&#
A solution in python would be:
from collections import Counter
s = "three"
s_counter = Counter(s)
output_str = ''
for char in s:
if s_counter[char]>1:
output_str += '&'
elif s_counter[char]==1:
output_str += '#'
But a solution in SQL is…(thanks Emily @data-engineering-discord!):
CREATE TABLE data (
string_value TEXT
);
INSERT INTO data VALUES
('one'),
('three'),
('Heartbreak hotel');
select
string_value,
translate(
lower(string_value),
string_agg(chr, '' order by chr),
string_agg(subs, '' order by chr)
)
from (
select
string_value,
chr,
case when count = 1 then '#' else '&' end as subs
from (
select
p.*,
count(*)
from (
select
string_value,
regexp_split_to_table(lower(string_value), '') as chr
from data
) as p
group by 1, 2
) as q
) as r
group by 1
😱😱😱😱😱😱😱😱😱