Synthetic business datasets that actually balance
Day-by-day simulated SME operations across Australia, the US, and the UK. Real tax compliance. Real payroll. Real inventory. 4 export formats. Stop building demos on AdventureWorks.
How It Works
Simulated, not randomly generated
Each dataset is produced by a deterministic simulation engine that models 730+ days of business operations, one day at a time.
Configure
Choose country, industry, and seed. Set company size, staff count, product catalog.
Initialize
Engine creates company entity, chart of accounts, employees, opening balances.
Simulate
730+ days of operations: sales, purchases, payroll, tax, inventory, bank reconciliation.
Export
Output to CSV, PostgreSQL SQL, Apache Parquet, and SQLite. All formats included.
Why These Datasets
Built for realism, not just volume
Six properties that set these apart from typical synthetic data generators.
Simulated, not generated
Day-by-day business operations over 730+ days — not random data
Double-entry accounting
Debits always equal credits. Every transaction balances.
44 FK relationships
Customer → Order → Invoice → Payment → Bank → Journal
Real tax compliance
ATO (AU), IRS (US), HMRC (UK) — actual brackets and rates
Deterministic
Same seed = identical output. Fully reproducible results.
3 countries × 3 industries
Genuinely different tax, payroll, seasonal patterns, and COA
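The double-entry property listed above is easy to verify yourself once a dataset is loaded. The sketch below is one possible check, not the engine's own validation: it groups journal lines by entry and reports any entry whose debits and credits differ. The table name journal_lines and the columns journal_entry_id, debit_cents, and credit_cents are hypothetical, so adjust them to the actual schema. It builds a tiny in-memory SQLite table so it runs standalone.

```python
import sqlite3


def unbalanced_entries(conn):
    """Return (entry_id, total_debit, total_credit) for entries that do not balance.

    Table and column names are assumptions for illustration only.
    """
    query = """
        SELECT journal_entry_id,
               SUM(debit_cents)  AS total_debit,
               SUM(credit_cents) AS total_credit
        FROM journal_lines
        GROUP BY journal_entry_id
        HAVING SUM(debit_cents) <> SUM(credit_cents)
        ORDER BY journal_entry_id
    """
    return conn.execute(query).fetchall()


def demo_connection():
    """Build a throwaway database with one balanced and one unbalanced entry."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE journal_lines ("
        "journal_entry_id INTEGER, account_code TEXT, "
        "debit_cents INTEGER, credit_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO journal_lines VALUES (?, ?, ?, ?)",
        [
            (1, "1100", 11000, 0),   # receivable
            (1, "4000", 0, 10000),   # revenue
            (1, "2200", 0, 1000),    # tax collected
            (2, "5000", 5000, 0),    # deliberately unbalanced for the demo
            (2, "1000", 0, 4000),
        ],
    )
    return conn


if __name__ == "__main__":
    print(unbalanced_entries(demo_connection()))
```

Against a dataset that balances, the function should return an empty list. Amounts are held as integer cents in the demo to sidestep floating-point comparison issues.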
42-Table Schema
End-to-end business coverage
Every dataset includes 42 interconnected tables spanning 7 business domains with 44 foreign key relationships.
Core Business
Company, Customers, Suppliers, Products/Services, Employees
Sales & Orders
Sales Orders, Order Lines, Invoices, Invoice Lines, Credit Notes
Purchasing
Purchase Orders, PO Lines, Bills, Bill Lines, Goods Received
Inventory
Inventory, Stock Movements, Warehouses, Reorder Rules
Accounting
Chart of Accounts, Journal Entries, Journal Lines, GL, Bank Transactions
HR & Payroll
Payroll Runs, Pay Slips, Leave Balances, Tax Withholdings
Banking
Bank Accounts, Bank Reconciliations, Payment Allocations
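With this many foreign keys, an orphan check is a sensible first test after import. The sketch below uses SQLite's built-in PRAGMA foreign_key_check, which reports child rows whose parent is missing. It assumes the foreign keys are declared as constraints in the SQLite file, which is not confirmed here. The customers and invoices tables in the demo are hypothetical stand-ins, not the shipped schema.

```python
import sqlite3


def foreign_key_violations(conn):
    """Return one row per orphaned child row.

    Each row is (child_table, rowid, parent_table, fk_index), as reported
    by SQLite's PRAGMA foreign_key_check.
    """
    return conn.execute("PRAGMA foreign_key_check").fetchall()


def demo_connection():
    """Build a throwaway database containing one deliberately orphaned invoice."""
    conn = sqlite3.connect(":memory:")
    # Enforcement is switched off so the demo can insert a bad row;
    # foreign_key_check still inspects the declared constraints.
    conn.execute("PRAGMA foreign_keys = OFF")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE invoices (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            total_cents INTEGER
        );
        INSERT INTO customers VALUES (1, 'Example Customer');
        INSERT INTO invoices VALUES (1, 1, 11000);
        INSERT INTO invoices VALUES (2, 99, 5500);  -- customer 99 does not exist
        """
    )
    return conn


if __name__ == "__main__":
    for violation in foreign_key_violations(demo_connection()):
        print(violation)
```

An empty result means every declared foreign key resolves. If the constraints are not declared in the file, a LEFT JOIN per relationship would be needed instead.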
Formats
4 formats, every product
Each purchase includes all four formats. Use what fits your stack.
CSV
Universal: import anywhere
PostgreSQL SQL
PostgreSQL schema + full dump
Apache Parquet
Data engineering / analytics
SQLite
Single-file database, zero setup
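As a rough illustration of the single-file option, the sketch below lists every table in a SQLite database with its row count, using only Python's standard library. The demo builds a throwaway in-memory database with made-up tables. With a real dataset you would pass a connection opened on the downloaded file instead; its file name is not specified here.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # Table names come from the database itself, so quoting them is enough here.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


def demo_connection():
    """Hypothetical two-table database, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE warehouses (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO products VALUES (1, 'Tent'), (2, 'Stove');
        INSERT INTO warehouses VALUES (1, 'Main');
        """
    )
    return conn


if __name__ == "__main__":
    for table, count in table_row_counts(demo_connection()).items():
        print(f"{table}: {count}")
```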
Retail — $49 each
Outdoor & retail companies
Complete retail operations: inventory, POS sales, supplier management, GST/sales-tax compliance.
Outback Outdoor Supplies Pty Ltd
ATO PAYG, GST 10%, BAS, Super 11.5%
Summit Outdoor Gear LLC
IRS federal tax, FICA, ~7.5% sales tax, 401(k)
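To make the Australian rates quoted above concrete, here is a small worked calculation of GST at 10% on a net sale and superannuation at 11.5% on gross wages. It is a simplified sketch: the function names are hypothetical, half-up rounding to the cent is an assumption, and the engine's actual rounding and earnings-base rules are not described here.

```python
from decimal import Decimal, ROUND_HALF_UP

GST_RATE = Decimal("0.10")     # GST 10%, as quoted in the listing
SUPER_RATE = Decimal("0.115")  # Super 11.5%, as quoted in the listing


def to_cents(amount):
    """Round to two decimal places (half-up is an assumed convention)."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def gst_on_sale(net_amount):
    """Return (gst, gross) for a GST-exclusive net amount."""
    gst = to_cents(net_amount * GST_RATE)
    return gst, net_amount + gst


def super_contribution(gross_wages):
    """Employer super on gross wages (simplified earnings base)."""
    return to_cents(gross_wages * SUPER_RATE)


if __name__ == "__main__":
    print(gst_on_sale(Decimal("100.00")))
    print(super_contribution(Decimal("2000.00")))
```

Decimal is used rather than float so that cent-level totals reconcile exactly, which matters when checking figures against ledger balances.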
Hospitality — $79 each
Pubs, restaurants & kitchens
3-year simulations with staff rosters, menu items, daily covers, and weekly, fortnightly, or monthly payroll depending on the company.
The Golden Wattle Pub & Kitchen
ATO, GST, BAS, Super
3-yr sim, 22 staff, weekly payroll, 25 menu items
Rocky Mountain Grill & Taphouse
IRS, FICA, sales tax, 401(k)
3-yr sim, 25 staff, fortnightly payroll
The Chequers Inn & Kitchen
HMRC, PAYE, NI, VAT 20%, Pension
3-yr sim, 20 staff, monthly payroll
Professional Services — $79 each
Consulting & advisory firms
Service-based businesses with project billing, timesheets, monthly retainers, and multi-tier staff.
Meridian Advisory Group Pty Ltd
ATO, GST, BAS, Super
3-yr sim, 18 staff, monthly payroll, 25 service lines
Blackstone Ridge Consulting LLC
IRS, FICA, sales tax, 401(k)
3-yr sim, 20 staff, fortnightly payroll
Wharton & Clarke Advisory LLP
HMRC, PAYE, NI, VAT, Pension
3-yr sim, 16 staff, monthly payroll
Multi-Company Bundles
Consolidation-ready bundles
Multiple companies in the same industry and country. Perfect for group reporting, multi-entity ERP testing, and inter-company reconciliation.
Retail Bundles
Hospitality Bundles
Consulting Bundles
Domain Packs — $19 each
Just the tables you need
Australian retail domain subsets. Ideal when you only need accounting, sales, HR, or inventory data.
Free Samples
Try before you buy
Download free sample datasets on GitHub, Kaggle, Hugging Face, or Gumroad.
FAQ
Common questions
Everything you need to know about the datasets.
What format are the datasets in?
Every dataset ships in 4 formats: CSV (universal), PostgreSQL SQL (schema + dump), Apache Parquet (analytics), and SQLite (single-file database). All formats are included in every purchase.
Are these real company datasets?
No. Every dataset is 100% synthetic — generated by a day-by-day business simulation engine. The companies are fictional but the data is realistic, with proper tax compliance, double-entry accounting, and 44 foreign key relationships across 42 tables.
What countries and industries are covered?
We cover 3 countries (Australia, United States, United Kingdom) and 3 industries (Retail, Hospitality, Professional Services). Each combination has genuine differences in tax rules, payroll cycles, seasonal patterns, and chart of accounts.
Can I use these datasets for training ML models?
Absolutely. The datasets suit training and testing ML models, ERP demos, BI dashboards, accounting software tests, data engineering pipelines, and teaching.
Is there a free sample?
Yes. Free samples are available on GitHub, Kaggle, Hugging Face, and Gumroad for all three countries.
Are the datasets deterministic?
Yes. Same seed always produces identical output. This means you can reproduce results, compare runs, and use them reliably in CI/CD pipelines and automated tests.
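One way to lean on this in CI is to pin a checksum manifest of the exported files and fail the build if it ever drifts. The sketch below is one possible approach, assuming the CSV export is a flat directory of .csv files; the layout and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def csv_manifest(export_dir):
    """Map each CSV file name in export_dir to the SHA-256 digest of its bytes."""
    manifest = {}
    for path in sorted(Path(export_dir).glob("*.csv")):
        manifest[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(expected, actual):
    """Return names whose digest differs or that exist on only one side."""
    all_names = expected.keys() | actual.keys()
    return sorted(
        name for name in all_names if expected.get(name) != actual.get(name)
    )
```

A test could store the manifest from a known-good export as JSON, recompute it on each run, and assert that changed_files returns an empty list.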
How many rows are in each dataset?
Single-company datasets range from 39K to 259K rows depending on industry and country. Multi-company bundles go up to 825K rows. Enterprise bundles with 5 companies reach 400K+ rows.
Ready to get started?
Stop building on toy data
Get a production-realistic dataset in minutes. Free samples available on GitHub, Kaggle, Hugging Face, and Gumroad.