[P] I trained an XGBoost model with DuckLake and ADBC
A developer bypassed scikit-learn and pandas to train XGBoost directly on Arrow tables from DuckDB.
The developer combined Apache Arrow ADBC (Arrow Database Connectivity) with DuckLake (a lakehouse format from the DuckDB team) to train an XGBoost model. Query results came back as Apache Arrow columnar tables and went directly into XGBoost, which kept memory overhead low and avoided the conversion steps of a traditional pandas and scikit-learn pipeline. ADBC's streaming results also made it possible to handle datasets larger than available memory, giving a more efficient end-to-end pipeline for machine learning on tabular data.
Why It Matters
This approach reduces memory bottlenecks and data conversion steps, streamlining ML workflows on large datasets.