Data transformation - Python vs SQL showdown
2023-03-18
For most people, using SQL to transform data is a no-brainer, seeing it's a very versatile language, and doesn't have quite a steep learning curve compared to python. There are some cases where SQL is more suitable for a task, but the reverse can also happen as well.
For instance, given a string conversion problem:
- if a string occurs only one time, replace it with
#
- if a string occurs multiple times, replace it with
&
> one
###
> three
###&&
> Heartbreak hotel
&&&&&&#&
A solution in python would be:
=
=
=
+=
+=
But a solution in SQL is...(thanks Emily @data-engineering-discord!):
TEXT);
(string_value
INSERT INTO
data
VALUES
('one'),
('three'),
('Heartbreak hotel');
select
string_value,
translate(
lower(string_value),
string_agg (
chr,
''
order by
chr
),
string_agg (
subs,
''
order by
chr
)
)
from
(
select
string_value,
chr,
case
when count = 1 then '#'
else '&'
end as subs
from
(
select
p.*,
count(*)
from
(
select
string_value,
regexp_split_to_table (lower(string_value), '') as chr
from
data
) as p
group by
1,
2
) as q
) as r
group by
1
😱😱😱😱😱😱😱😱😱