The 7 Best Sample Databases for Learning SQL in 2026 (and Their Limitations)
An honest comparison of AdventureWorks, Northwind, Chinook, Sakila, and newer alternatives. What each is good for and where they fall short.
The Classic Problem
Every SQL tutorial starts with a sample database. But most sample databases were designed decades ago for a different era of software development. They're small, they lack realistic business complexity, and they don't reflect modern data engineering practices.
Here's an honest comparison of the most popular options — and where the field is heading.
1. AdventureWorks (Microsoft)
Released: 2005 (updated through 2019)
Tables: ~70 (many unused)
Rows: ~120K across all tables
Format: SQL Server backup (.bak)
Strengths
- Well-documented with extensive tutorials
- Covers manufacturing, sales, and HR domains
- Available as a free download from Microsoft (a separate install, not bundled with current SQL Server releases)
Limitations
- SQL Server only — not portable to PostgreSQL, MySQL, or SQLite without conversion
- Outdated schema — designed for SQL Server 2005 patterns
- No accounting — no double-entry bookkeeping, no journal entries, no trial balance
- No tax compliance — no GST, VAT, PAYG, or payroll tax calculations
- Complex but shallow — many tables exist but with minimal data
- Not deterministic — no way to regenerate the same data
"AdventureWorks is the default recommendation, but it's a 20-year-old database designed to showcase SQL Server features, not to teach realistic business data modeling." — Common criticism in data engineering communities
2. Northwind (Microsoft)
Released: 1997
Tables: 13
Rows: ~3K
Format: SQL Server / Access (.mdb)
Strengths
- Simple and easy to understand
- Great for absolute beginners
- Covers orders, products, customers, suppliers
Limitations
- Extremely small — too little data for meaningful analytics
- No financial data — no invoices, payments, or accounting
- Single currency, single country — no multi-jurisdiction complexity
- No payroll or HR — no employees beyond basic contact info
- Frozen in 1997 — product catalog feels dated
3. Chinook (open source)
Released: 2008
Tables: 11
Rows: ~15K
Format: Multiple (SQLite, PostgreSQL, MySQL, SQL Server)
Strengths
- Multi-platform — available for all major databases
- Media/music domain (tracks, albums, artists, playlists)
- Good for teaching joins and relationships
Limitations
- Niche domain — media store, not a general business
- Small — 11 tables is not enough for complex queries
- No financial depth — invoices exist but no accounting behind them
- No temporal complexity — no time-series patterns to analyze
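To make the "good for teaching joins" point concrete, here is a minimal sketch of the kind of query Chinook is typically used for. It assumes Chinook's `Artist` and `Album` table and column names, and builds a tiny in-memory stand-in with made-up rows rather than opening the real database file.

```python
import sqlite3

# In-memory stand-in for two Chinook tables. The real database ships as a
# file with far more rows; the rows below are illustrative placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Album (
        AlbumId  INTEGER PRIMARY KEY,
        Title    TEXT,
        ArtistId INTEGER REFERENCES Artist(ArtistId)
    );
    INSERT INTO Artist VALUES (1, 'Artist A'), (2, 'Artist B'), (3, 'Artist C');
    INSERT INTO Album VALUES
        (1, 'First Album', 1),
        (2, 'Second Album', 1),
        (3, 'Debut', 2);
""")

# LEFT JOIN keeps artists that have no albums, so they show a count of 0.
albums_per_artist = conn.execute("""
    SELECT ar.Name, COUNT(al.AlbumId) AS album_count
    FROM Artist AS ar
    LEFT JOIN Album AS al ON al.ArtistId = ar.ArtistId
    GROUP BY ar.ArtistId, ar.Name
    ORDER BY album_count DESC, ar.Name
""").fetchall()

for name, album_count in albums_per_artist:
    print(f"{name}: {album_count}")
```

Swapping `LEFT JOIN` for `JOIN` drops the artist with no albums, which is a handy first exercise in how join types change the result.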
4. Sakila (MySQL)
Released: 2005
Tables: 16
Rows: ~47K
Format: MySQL
Strengths
- DVD rental domain — intuitive and fun
- Good for practicing joins, subqueries, and aggregations
- Well-structured with clear relationships
Limitations
- MySQL only — designed for MySQL-specific features
- Obsolete domain — DVD rentals in 2026?
- No business operations — no purchasing, no payroll, no inventory management
- Limited scale — not suitable for data engineering or BI workloads
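As an illustration of the join, subquery, and aggregation practice Sakila is good for, the sketch below finds customers whose total payments exceed the average customer total. It assumes Sakila-style `customer` and `payment` tables, trimmed to a few columns, with made-up rows in an in-memory SQLite database rather than the real MySQL dataset.

```python
import sqlite3

# Trimmed, in-memory imitation of Sakila's customer and payment tables.
# Column subset and sample rows are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE payment (
        payment_id  INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        amount      INTEGER
    );
    INSERT INTO customer VALUES (1, 'SMITH'), (2, 'JONES'), (3, 'BROWN');
    INSERT INTO payment (customer_id, amount) VALUES
        (1, 4), (1, 6), (2, 2), (3, 3), (3, 3);
""")

# Aggregate per customer, then compare each total against the average
# of all customer totals, computed in a nested subquery.
above_average = conn.execute("""
    SELECT c.last_name, SUM(p.amount) AS total_paid
    FROM customer AS c
    JOIN payment AS p ON p.customer_id = c.customer_id
    GROUP BY c.customer_id, c.last_name
    HAVING SUM(p.amount) > (
        SELECT AVG(customer_total)
        FROM (
            SELECT SUM(amount) AS customer_total
            FROM payment
            GROUP BY customer_id
        )
    )
    ORDER BY total_paid DESC
""").fetchall()

print(above_average)
```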
5. TPC-H / TPC-DS (Transaction Processing Performance Council)
Released: 1999 / 2006
Tables: 8 (TPC-H) / 25 (TPC-DS)
Rows: Scalable (1GB to 100TB)
Format: Generator tool
Strengths
- Industry standard for benchmarking
- Scalable — generate any size dataset
- Well-defined queries — comes with standard benchmark queries
- Complex star/snowflake schemas (TPC-DS)
Limitations
- Designed for benchmarking, not learning — schema is abstract and unintuitive
- No business realism — table names like LINEITEM, PARTSUPP don't map to real business concepts
- No accounting or compliance — purely transactional
- Difficult to set up — requires compilation and configuration
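For a feel of what the standard benchmark queries look like, here is a simplified sketch in the spirit of TPC-H's pricing summary query (Q1). It is not the official query text: it uses a cut-down `lineitem` table with only a handful of the real columns, a fixed ship-date cutoff, and made-up rows in SQLite instead of generator output.

```python
import sqlite3

# Cut-down lineitem table using TPC-H column naming (l_ prefix).
# The real table has more columns and is populated by the generator tool.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lineitem (
        l_returnflag    TEXT,
        l_linestatus    TEXT,
        l_quantity      INTEGER,
        l_extendedprice REAL,
        l_discount      REAL,
        l_shipdate      TEXT
    );
    INSERT INTO lineitem VALUES
        ('A', 'F', 10, 100.0, 0.0,  '1994-01-01'),
        ('A', 'F', 5,  200.0, 0.5,  '1994-06-01'),
        ('N', 'O', 7,  400.0, 0.25, '1998-01-01'),
        ('N', 'O', 1,  50.0,  0.0,  '1998-12-31');
""")

# Group shipped line items by status flags and summarize quantity and
# discounted revenue. ISO-8601 date strings compare correctly as text.
cutoff = "1998-09-02"
pricing_summary = conn.execute("""
    SELECT l_returnflag,
           l_linestatus,
           SUM(l_quantity)                          AS sum_qty,
           SUM(l_extendedprice * (1 - l_discount))  AS sum_disc_price,
           COUNT(*)                                 AS count_order
    FROM lineitem
    WHERE l_shipdate <= ?
    GROUP BY l_returnflag, l_linestatus
    ORDER BY l_returnflag, l_linestatus
""", (cutoff,)).fetchall()

for row in pricing_summary:
    print(row)
```

The query shape is simple, but at benchmark scale it scans most of the largest table, which is the point of the workload.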
6. PostgreSQL Sample Databases (dvdrental, pagila)
Released: Various
Tables: 15-16
Rows: ~46K
Format: PostgreSQL
Strengths
- Native PostgreSQL format
- Active community maintenance
- Good documentation
Limitations
- Same domain limitations as Sakila (DVD rentals)
- Small scale
- No financial or operational depth
7. Mindweave SME-Sim Datasets (2026)
Released: 2026
Tables: 42
Rows: 39K - 259K per company (up to 825K in bundles)
Format: CSV, PostgreSQL SQL, Apache Parquet, SQLite
Strengths
- 42 tables, 44 foreign keys — full end-to-end business operations
- Double-entry accounting — debits always equal credits
- Real tax compliance with actual ATO (AU), IRS (US), and HMRC (UK) tax brackets
- 3 countries × 3 industries — genuinely different business patterns
- 4 formats — use with any database or analytics tool
- Deterministic — same seed = identical data every time
- Multi-company bundles — test group reporting and consolidation
- Time-series rich — 730+ days of day-by-day simulation
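The "debits always equal credits" property is easy to verify on any dataset that exposes journal lines. The sketch below shows one way to do it. The `journal_lines` table, its columns, and the sample rows are hypothetical and are not taken from the SME-Sim schema; amounts are stored as integer cents to avoid floating-point rounding.

```python
import sqlite3

# Hypothetical journal table: one row per debit or credit line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journal_lines (
        entry_id     INTEGER NOT NULL,
        account      TEXT    NOT NULL,
        debit_cents  INTEGER NOT NULL DEFAULT 0,
        credit_cents INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO journal_lines VALUES
        (1, 'Cash',          11000,  0),
        (1, 'Sales Revenue', 0,      10000),
        (1, 'GST Payable',   0,      1000),
        (2, 'Rent Expense',  250000, 0),
        (2, 'Cash',          0,      250000);
""")


def unbalanced_entries(conn):
    """Return (entry_id, debit minus credit) for entries that do not balance."""
    return conn.execute("""
        SELECT entry_id, SUM(debit_cents) - SUM(credit_cents) AS difference
        FROM journal_lines
        GROUP BY entry_id
        HAVING SUM(debit_cents) <> SUM(credit_cents)
        ORDER BY entry_id
    """).fetchall()


# An empty list means every journal entry balances.
print(unbalanced_entries(conn))
```

Running the same check after each load step is a cheap integrity test for an accounting pipeline.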
Limitations
- Not free (full datasets) — $19-$199 depending on product, but free samples available
- SME focused — small/medium business, not enterprise or manufacturing
- Three countries — AU, US, UK only (more planned)
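The "same seed = identical data" idea listed under Strengths can be illustrated with a toy generator. This is only a sketch of the general technique, a seeded pseudo-random generator plus a content hash to compare runs. The function names and fields are invented for the example and say nothing about how SME-Sim is actually implemented.

```python
import hashlib
import random


def generate_orders(seed, n=5):
    """Generate n toy order rows from a private, seeded random generator."""
    rng = random.Random(seed)  # isolated from the global random state
    return [
        {
            "order_id": i + 1,
            "customer_id": rng.randint(1, 100),
            "amount_cents": rng.randint(500, 50_000),
        }
        for i in range(n)
    ]


def fingerprint(rows):
    """Hash the rows so two runs can be compared with a single string."""
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


run_a = generate_orders(seed=42)
run_b = generate_orders(seed=42)

# Same seed, same rows, same fingerprint.
print(fingerprint(run_a) == fingerprint(run_b))
```

Publishing a fingerprint alongside a dataset lets others confirm they regenerated exactly the same rows.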
Comparison Table
| Database | Tables | Rows | Accounting | Tax | Formats | Deterministic | Free |
|---|---|---|---|---|---|---|---|
| AdventureWorks | ~70 | ~120K | No | No | SQL Server (.bak) | No | Yes |
| Northwind | 13 | ~3K | No | No | SQL Server, Access | No | Yes |
| Chinook | 11 | ~15K | No | No | SQLite, PostgreSQL, MySQL, SQL Server | No | Yes |
| Sakila | 16 | ~47K | No | No | MySQL | No | Yes |
| TPC-H | 8 | Scalable | No | No | Generator tool | Yes | Yes |
| TPC-DS | 25 | Scalable | No | No | Generator tool | Yes | Yes |
| SME-Sim | 42 | 39K-825K | Yes | Yes | CSV, PostgreSQL SQL, Parquet, SQLite | Yes | Samples |
Which Should You Use?
Learning basic SQL joins: Chinook or Sakila — small, simple, well-documented.
Learning business data modeling: SME-Sim — 42 tables with real business relationships and accounting.
Benchmarking database performance: TPC-H or TPC-DS — industry standard, scalable.
Testing ERP or accounting software: SME-Sim, the only option in this comparison with double-entry accounting and real tax compliance.
Building BI dashboards: SME-Sim — 730+ days of time-series data across 7 business domains.
Data engineering pipelines: SME-Sim (Parquet format) or TPC-DS — both offer structured, scalable data.
Try It
Free samples are available on multiple platforms:
- GitHub (AU) | GitHub (US) | GitHub (UK)
- Kaggle (AU) | Kaggle (US) | Kaggle (UK)
- Hugging Face (AU) | Hugging Face (US) | Hugging Face (UK)
*This comparison is based on publicly available documentation for each database as of April 2026. AdventureWorks and Northwind are trademarks of Microsoft Corporation. TPC-H and TPC-DS are trademarks of the Transaction Processing Performance Council.*