Components
DuckDB Now Powers Parquet2CSV Processor
We’ve replaced Pandas with DuckDB in the Parquet2CSV Processor, bringing a major performance boost and improved scalability for processing large Parquet files.
The latest version of the Parquet2CSV Processor adopts DuckDB as its processing engine. Previously built on Pandas, the component now uses DuckDB’s high-performance, in-process SQL engine, which makes it considerably faster and more efficient.
What’s new?
- DuckDB integration: Replaces the existing Pandas-based logic.
- Improved performance: Some scenarios now run more than 10x faster.
- Better scalability: Efficient handling of large Parquet datasets.
- Simplified memory management: Thanks to DuckDB's optimized query execution.
This change is particularly valuable for users processing larger files or looking for more stable and scalable data transformations.
For more details and to try the updated processor, visit the component page.
Let us know how it works for your use cases. We welcome your feedback!