dataaaaa !

One platform to bring them all together

Yesterday

How to find balance in data work (and prevent burnout before it finds you)

Discussion on work-life balance and preventing burnout in the data field.

Who is the Best CDO?

Chief Data Officer role-playing game.

dbt Projects on Snowflake (General availability)

General availability of dbt Projects on Snowflake, integration with Snowsight and CLI, performance improvements and extended support for dbt commands.

Thursday, November 6

We moved analytics into an IDE — and haven’t looked back

Faire moved its data analytics into an IDE with Cursor, enhancing SQL workflows and report automation. The team uses Snowflake and developed MCPs to connect various tools.

Graph-Powered AI Chat for dbt

Demonstrates using FalkorDB and Neo4j to create a dbt knowledge graph and interact with it via AI-powered chat, optimizing complex dependency management.

Wednesday, November 5

A Deep Dive into DuckDB for Data Scientists

DuckDB provides an in-process SQL engine that needs no server, integrating with pandas and Polars and optimizing memory use and performance for data scientists.

Interactive tables and interactive warehouses (Preview)

Snowflake introduces interactive tables and warehouses in preview on AWS, enhancing query performance and real-time data processing capabilities.

Tuesday, November 4

Sharing semantic views

Snowflake announces general availability of sharing semantic views, enabling providers to share them in private, public, and organizational listings.

Introducing pg_lake: Integrate Your Data Lakehouse with Postgres

Snowflake introduces pg_lake to integrate data lakehouses with Postgres, enabling federated SQL queries.

We're open-sourcing the successor of Jupyter notebook

Deepnote is open-sourcing its successor to Jupyter notebooks, offering reactive, collaborative, and AI-ready features with a human- and AI-readable project format.

ClickHouse welcomes LibreChat: Introducing the open-source Agentic Data Stack

ClickHouse announces the acquisition of LibreChat, an open-source AI chat platform, enhancing agentic analytics capabilities with large-scale data processing.

Real-Time Text-to-SQL Behind Snowflake Intelligence

Snowflake Intelligence leverages real-time text-to-SQL processing to enhance query efficiency.

Apache Polaris 1.2 (incubating): Enhanced Governance & Connectivity

Apache Polaris 1.2 enhances governance and connectivity, integrating better with Snowflake.

Meet the founders of dltHub: Matthaus Krzykowski, Marcin Rudolf, Adrian Brudaru, and Anna Hoffmann

dltHub develops a Python-native data platform to accelerate data pipelines, combining simplicity with enterprise-grade governance.

Chat with your Snowflake Data from Microsoft Teams

Integration of Snowflake Cortex with Microsoft Teams to interact with data via a bot, using AI agents and semantic views.

Why we're making nao Free

nao offers a free version of its data IDE with integrated AI features, letting users connect data warehouses and execute SQL.

Monday, November 3

BigQuery: The Data Engineering Agent is now in preview

BigQuery introduces a Data Engineering Agent in preview, automating complex tasks.

Sunday, November 2

"You Don't Need Kafka, Just Use Postgres" Considered Harmful

Technical comparison between Kafka and Postgres for real-time data processing, highlighting Kafka's advantages for event streaming.

Announcing Pg_duckdb Version 1.0

pg_duckdb 1.0, a PostgreSQL extension embedding DuckDB's vectorized analytical engine, brings performance improvements and better MotherDuck integration.

4 Senior Data Engineers Answer 10 Top Reddit Questions

Four senior data engineers address Reddit's top questions on fundamentals, data quality, and tech choices.

Building a Better Lakehouse: From Airflow to Dagster

Technical comparison between Airflow and Dagster for lakehouse orchestration, highlighting Dagster's improvements in smart partitioning, event-driven architecture, and data quality framework.

How we scaled raw GROUP BY to 100 B+ rows in under a second

ClickHouse Cloud introduces parallel replicas, enabling GROUP BY queries on over 100 billion rows in under a second, without pre-aggregation or data reshuffling.

Are open-table-formats + lakehouses the future of observability?

Lakehouses using open table formats like Apache Iceberg and Delta Lake are becoming viable for observability, pairing Parquet’s columnar compression and filtering with schema evolution.

How Netflix optimized its petabyte-scale logging system with ClickHouse

Netflix uses ClickHouse to process 5 petabytes of logs daily, with optimizations like generated lexers and native serialization.

Saturday, November 1

SQLite concurrency and why you should care about it

Explanation of SQLite's concurrency limitations and solutions implemented by Jellyfin, including optimistic and pessimistic locking strategies.

Thursday, October 30

Introducing dbc

Introduction to dbc, a command-line tool that manages connections and executes SQL queries.

Wednesday, October 29

Openflow vs. dlt for Snowflake users

Comparison between Openflow, a GUI-based tool built on Apache NiFi for data ingestion in Snowflake, and dlt, an open-source Python library for creating custom data pipelines.

The feature we were afraid to talk about

dltHub enhances scaffolds by combining deterministic parsing with an LLM for more reliable data pipeline generation.

Why moving from stored procedures to dbt drives trust, talent, and AI-readiness

Migrating from stored procedures to dbt enhances trust, attracts talent, and prepares for AI. dbt provides better documentation, integrated testing, and transparent data lineage.

The era of open data infrastructure

dbt Labs and Fivetran merge to create an open data infrastructure based on Apache Iceberg, enabling better data utilization and enhanced governance.

State-aware orchestration now in Preview for Fusion projects

dbt Labs announces state-aware orchestration for Fusion projects, reducing compute costs by 29%+ by avoiding unnecessary processing. This innovation uses intent-based configurations to optimize data freshness.

dbt Labs Open Sources MetricFlow: An Independent Schema for Data Interoperability

dbt Labs open sources MetricFlow, an independent schema for data interoperability, enhancing consistency and collaboration in data pipelines.

Tuesday, October 28

Relational Charades: Turning Movies into Tables

DuckDB can store and process videos by converting them into relational tables. The post demonstrates how to turn a movie into a table with 47 billion rows.

Thursday, October 2

Apache Iceberg™ vs Delta Lake vs Apache Hudi™ - Feature Comparison Deep Dive

In-depth technical comparison between Apache Iceberg, Delta Lake, and Apache Hudi, highlighting their features, performance, and integrations.