Faster Spark workloads with Comet
For big data processing, Spark is still king. Over the years, many changes have been made to improve its performance. Databricks itself created Photon, an engine that accelerates Spark queries, but it is proprietary to Databricks. Other alternatives exist (see here for more details), but they are not trivial to set up. Apache Arrow DataFusion Comet, by contrast, takes surprisingly little time to set up. Comet builds on Arrow, a columnar data format that is growing in popularity. ...