OpenData Blog

Engineering insights and technical deep dives from the OpenData team.

Why Your Charts Don't Get Shared (And Chartr's Do)

Chartr grew to 500K+ subscribers by making data visualization shareable. What they figured out about headline-first framing, minimal chrome, and social optimization applies to anyone making charts.

Riley Hilliard·Mar 26, 2026·8 min

Store Flat, Transform on Read

Why we store all data in long format and apply transforms at query time instead of pre-computing views. A technical deep dive into DuckDB, Parquet, and the architecture behind OpenData's query engine.

Riley Hilliard·Mar 19, 2026·9 min

70% of AI Training Datasets Have the Wrong License

A large-scale audit found that over 70% of popular AI datasets have missing or wrong license metadata. With the EU AI Act now enforcing training data transparency, this isn't just sloppy. It's a liability.

Riley Hilliard·Mar 12, 2026·10 min

Public Data Has a Discovery Problem

Government data is technically public but practically inaccessible. Here's what that actually costs researchers, journalists, and anyone trying to answer a question with data.

Riley Hilliard·Mar 5, 2026·7 min

Welcome to the OpenData Blog

Introducing the OpenData blog. We'll be sharing project updates, deep dives into open data infrastructure, and lessons learned building a platform for public datasets.

Riley Hilliard·Feb 25, 2026·3 min

The Hidden Mess Inside 'Clean' Government Data

Government data has a reputation for being clean and reliable. Anyone who's tried to ingest it programmatically knows that's not the full story. Here are the real encoding quirks, format traps, and silent failures hiding in data from FRED, BLS, Census, the World Bank, and the EPA.

Riley Hilliard·Feb 19, 2026·8 min

The State of Open Data Infrastructure in 2026

A survey of the open data landscape: what data.gov, Socrata, FRED, Kaggle, Hugging Face, and Datasette do well, what's still broken, and where the connective tissue between data sources is finally being built.

Riley Hilliard·Feb 12, 2026·9 min

Building a Headless Visualization Engine

How we separated chart computation from rendering by building a spec-driven visualization engine. The architecture behind @opendata/viz: four packages, a compilation pipeline, and zero DOM dependencies in the math layer.

Riley Hilliard·Feb 5, 2026·9 min

Bootstrapping a Data Platform on Two Mac Minis

OpenData runs in production on two Mac Minis at $0/month infrastructure cost. Here's the architecture, the tradeoffs, and the specific triggers that would move us to the cloud.

Riley Hilliard·Jan 29, 2026·10 min

What Happens When All the World's Open Data Lives in One Place

Open data has a discovery problem, not an access problem. When you centralize datasets from hundreds of portals, entirely new capabilities emerge: knowledge graphs that reveal hidden connections, bridge datasets that make cross-agency joins possible, and a compounding network where every new dataset makes every existing one more useful.

Riley Hilliard·Jan 22, 2026·11 min