42 Tables · 44 Foreign Keys · Double-Entry Accounting

Synthetic business datasets that actually balance

Day-by-day simulated SME operations across Australia, the US, and the UK. Real tax compliance. Real payroll. Real inventory. 4 export formats. Stop building demos on AdventureWorks.

How It Works

Simulated, not randomly generated

Each dataset is produced by a deterministic simulation engine that models 730+ days of realistic business operations.

01

Configure

Choose country, industry, and seed. Set company size, staff count, product catalog.

02

Initialize

Engine creates company entity, chart of accounts, employees, opening balances.

03

Simulate

730+ days of operations: sales, purchases, payroll, tax, inventory, bank reconciliation.

04

Export

Output to CSV, PostgreSQL SQL, Apache Parquet, and SQLite. All formats included.

Why These Datasets

Built for realism, not just volume

Six properties that set these apart from typical synthetic data generators.

Simulated, not generated

Day-by-day business operations over 730+ days — not random data

Double-entry accounting

Debits always equal credits. Every transaction balances.

44 FK relationships

Customer → Order → Invoice → Payment → Bank → Journal

Real tax compliance

ATO (AU), IRS (US), HMRC (UK) — actual brackets and rates

Deterministic

Same seed = identical output. Fully reproducible results.

3 countries × 3 industries

Genuinely different tax, payroll, seasonal patterns, and COA
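One way to check the double-entry property yourself is to group journal lines by entry and look for any entry whose debits and credits differ. The sketch below uses Python's built-in sqlite3 module on a tiny in-memory stand-in. The table name journal_lines, its columns, the account codes, and the use of integer cents are all assumptions for illustration, so check the shipped schema for the real names and amount types.

```python
import sqlite3

# Stand-in data: table, column, and account names here are hypothetical,
# not the shipped schema. Amounts are stored as integer cents (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE journal_lines (
    journal_entry_id INTEGER NOT NULL,
    account_code     TEXT    NOT NULL,
    debit_cents      INTEGER NOT NULL DEFAULT 0,
    credit_cents     INTEGER NOT NULL DEFAULT 0
);
-- A 110.00 cash sale that includes 10% GST.
INSERT INTO journal_lines VALUES
    (1, '1000-BANK',        11000,     0),
    (1, '4000-SALES',           0, 10000),
    (1, '2100-GST-PAYABLE',     0,  1000);
""")


def unbalanced_entries(conn):
    """Return (entry_id, total_debits, total_credits) for entries that do not balance."""
    return conn.execute("""
        SELECT journal_entry_id, SUM(debit_cents), SUM(credit_cents)
        FROM journal_lines
        GROUP BY journal_entry_id
        HAVING SUM(debit_cents) <> SUM(credit_cents)
        ORDER BY journal_entry_id
    """).fetchall()


# An empty list means every entry balances.
print(unbalanced_entries(conn))
```

If the real export stores amounts as decimals rather than integer cents, compare rounded totals instead of testing exact equality.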

42-Table Schema

End-to-end business coverage

Every dataset includes 42 interconnected tables spanning 7 business domains with 44 foreign key relationships.

Core Business

Company, Customers, Suppliers, Products/Services, Employees

Sales & Orders

Sales Orders, Order Lines, Invoices, Invoice Lines, Credit Notes

Purchasing

Purchase Orders, PO Lines, Bills, Bill Lines, Goods Received

Inventory

Inventory, Stock Movements, Warehouses, Reorder Rules

Accounting

Chart of Accounts, Journal Entries, Journal Lines, GL, Bank Transactions

HR & Payroll

Payroll Runs, Pay Slips, Leave Balances, Tax Withholdings

Banking

Bank Accounts, Bank Reconciliations, Payment Allocations
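If the SQLite export declares its foreign keys as constraints (an assumption, not something stated here), referential integrity can be checked with SQLite's own PRAGMA foreign_key_check and PRAGMA foreign_key_list. The sketch below runs against a three-table in-memory stand-in. The table and column names are hypothetical and cover only the start of the Customer → Order → Invoice chain.

```python
import sqlite3

# Stand-in schema: names are hypothetical, not the shipped 42-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT
);
CREATE TABLE sales_orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
CREATE TABLE invoices (
    invoice_id INTEGER PRIMARY KEY,
    order_id   INTEGER NOT NULL REFERENCES sales_orders(order_id)
);
INSERT INTO customers    VALUES (1, 'Example Customer');
INSERT INTO sales_orders VALUES (10, 1);
INSERT INTO invoices     VALUES (100, 10);
""")


def fk_violations(conn):
    """One row per orphaned child row: (table, rowid, parent_table, fk_index)."""
    return conn.execute("PRAGMA foreign_key_check").fetchall()


def declared_fk_count(conn):
    """Count the foreign key constraints declared across all user tables."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    total = 0
    for table in tables:
        rows = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        # Column 0 is the constraint id; composite keys repeat it once per column.
        total += len({row[0] for row in rows})
    return total


print(fk_violations(conn), declared_fk_count(conn))
```

Run against the full database, declared_fk_count would be expected to return 44 if every relationship is declared as a constraint. foreign_key_check reports orphans whether or not enforcement is switched on for the connection.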

Formats

4 formats, every product

Each purchase includes all four formats. Use what fits your stack.

CSV

Universal — import anywhere

SQL

PostgreSQL schema + full dump

Parquet

Data engineering / analytics

SQLite

Single-file database, zero setup
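As a rough illustration of the zero-setup claim, the SQLite file can be opened with Python's standard library and its tables counted in a few lines. The sketch below builds a small in-memory stand-in so that it runs on its own. With a real download you would pass the path of the .sqlite file to sqlite3.connect instead. The table names are hypothetical.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name")]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Stand-in database; replace ":memory:" with the path to a downloaded file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoices  (invoice_id  INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'A'), (2, 'B');
INSERT INTO invoices  VALUES (100, 1), (101, 1), (102, 2);
""")

counts = table_row_counts(conn)
print(counts, sum(counts.values()))
```

Summing the counts gives a quick comparison against the row totals listed for each product.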

Retail — $49 each

Outdoor & retail companies

Complete retail operations: inventory, POS sales, supplier management, and GST, VAT, or sales-tax compliance.

Australia

Outback Outdoor Supplies Pty Ltd

ATO PAYG, GST 10%, BAS, Super 11.5%

$49 · 83K rows

United States

Summit Outdoor Gear LLC

IRS federal tax, FICA, ~7.5% sales tax, 401(k)

$49 · 78K rows

United Kingdom

Peak District Outdoor Supplies Ltd

HMRC PAYE, NI 10%+13.8%, VAT 20%, Pension 8%

$49 · 39K rows

Hospitality — $79 each

Pubs, restaurants & kitchens

3-year simulations with staff rosters, menu items, daily covers, weekly/fortnightly/monthly payroll.

Australia

The Golden Wattle Pub & Kitchen

ATO, GST, BAS, Super

3-yr sim, 22 staff, weekly payroll, 25 menu items

$79 · 226K rows

United States

Rocky Mountain Grill & Taphouse

IRS, FICA, sales tax, 401(k)

3-yr sim, 25 staff, fortnightly payroll

$79 · 259K rows

United Kingdom

The Chequers Inn & Kitchen

HMRC, PAYE, NI, VAT 20%, Pension

3-yr sim, 20 staff, monthly payroll

$79 · 180K rows

Professional Services — $79 each

Consulting & advisory firms

Service-based businesses with project billing, timesheets, monthly retainers, and multi-tier staff.

Australia

Meridian Advisory Group Pty Ltd

ATO, GST, BAS, Super

3-yr sim, 18 staff, monthly payroll, 25 service lines

$79 · 143K rows

United States

Blackstone Ridge Consulting LLC

IRS, FICA, sales tax, 401(k)

3-yr sim, 20 staff, fortnightly payroll

$79 · 156K rows

United Kingdom

Wharton & Clarke Advisory LLP

HMRC, PAYE, NI, VAT, Pension

3-yr sim, 16 staff, monthly payroll

$79 · 109K rows

Multi-Company Bundles

Consolidation-ready bundles

Multiple companies in the same industry and country. Perfect for group reporting, multi-entity ERP testing, and inter-company reconciliation.

Retail Bundles

Australia

3 AU Retail Companies

Seeds 42, 100, 200

$99 · 246K rows

Australia

5 AU Retail Companies (Enterprise)

Seeds 42–400

$199 · 400K rows

United States

3 US Retail Companies

Seeds 42, 100, 200

$99 · 230K rows

Hospitality Bundles

Australia

3 AU Pubs

$149 · 825K rows

United States

3 US Restaurants

$149 · 768K rows

United Kingdom

3 UK Pubs

$149 · 817K rows

Consulting Bundles

Australia

3 AU Consulting Firms

$149 · 425K rows

United States

3 US Consulting Firms

$149 · 467K rows

United Kingdom

3 UK Advisory Firms

$149 · 323K rows

Domain Packs — $19 each

Just the tables you need

Australian retail domain subsets. Ideal when you only need accounting, sales, HR, or inventory data.

Accounting & Finance

15 tables · ~30K rows

Sales & CRM

13 tables · ~34K rows

HR & Payroll

10 tables · ~1K rows

Inventory & Purchasing

11 tables · ~18K rows

FAQ

Common questions

Everything you need to know about the datasets.

What format are the datasets in?

Every dataset ships in 4 formats: CSV (universal), PostgreSQL SQL (schema + dump), Apache Parquet (analytics), and SQLite (single-file database). All formats are included in every purchase.

Are these real company datasets?

No. Every dataset is 100% synthetic — generated by a day-by-day business simulation engine. The companies are fictional but the data is realistic, with proper tax compliance, double-entry accounting, and 44 foreign key relationships across 42 tables.

What countries and industries are covered?

We cover 3 countries (Australia, United States, United Kingdom) and 3 industries (Retail, Hospitality, Professional Services). Each combination has genuine differences in tax rules, payroll cycles, seasonal patterns, and chart of accounts.

Can I use these datasets for training ML models?

Yes. The datasets suit training and testing ML models, ERP demos, BI dashboards, accounting-software tests, data engineering pipelines, and teaching.

Is there a free sample?

Yes. Free samples are available on GitHub, Kaggle, Hugging Face, and Gumroad for all three countries.

Are the datasets deterministic?

Yes. Same seed always produces identical output. This means you can reproduce results, compare runs, and use them reliably in CI/CD pipelines and automated tests.
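In a CI pipeline, a simple way to rely on this is to fingerprint an export and compare it with a stored value. The sketch below hashes every CSV file under a directory in sorted order using only the standard library. The function name export_fingerprint and the file names are hypothetical. It covers CSV only, since whether the Parquet and SQLite files are byte-identical between runs is not stated here.

```python
import hashlib
import tempfile
from pathlib import Path


def export_fingerprint(export_dir):
    """SHA-256 over the relative path and bytes of every CSV file, in sorted order."""
    root = Path(export_dir)
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*.csv")):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between file name and contents
        digest.update(path.read_bytes())
    return digest.hexdigest()


# Demo with two stand-in export directories (hypothetical file contents).
with tempfile.TemporaryDirectory() as run_a, tempfile.TemporaryDirectory() as run_b:
    for run_dir in (run_a, run_b):
        Path(run_dir, "customers.csv").write_text(
            "customer_id,name\n1,Example\n", encoding="utf-8")
    same = export_fingerprint(run_a) == export_fingerprint(run_b)

    # Changing a single value changes the fingerprint.
    Path(run_b, "customers.csv").write_text(
        "customer_id,name\n1,Changed\n", encoding="utf-8")
    differs = export_fingerprint(run_a) != export_fingerprint(run_b)

print(same, differs)
```

Storing the fingerprint next to the seed in a test fixture lets an automated test fail as soon as the underlying data changes.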

How many rows are in each dataset?

Single-company datasets range from 39K to 259K rows depending on industry and country. Three-company bundles range from 230K to 825K rows. The five-company Enterprise bundle has about 400K rows.

Ready to get started?

Stop building on toy data

Get a production-realistic dataset in minutes. Free samples available on GitHub, Kaggle, and Hugging Face.