Synthetic business datasets that actually balance
Day-by-day simulated SME operations across Australia, the US, and the UK. Real tax compliance. Real payroll. Real inventory. 4 export formats. Stop building demos on AdventureWorks.
How It Works
Simulated, not randomly generated
Each dataset is produced by a deterministic simulation engine that models 730+ days of realistic business operations.
Configure
Choose country, industry, and seed. Set company size, staff count, product catalog.
Initialize
Engine creates company entity, chart of accounts, employees, opening balances.
Simulate
730+ days of operations: sales, purchases, payroll, tax, inventory, bank reconciliation.
Export
Output to CSV, PostgreSQL SQL, Apache Parquet, and SQLite. All formats included.
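The four steps above can be pictured as a seeded day-by-day loop. The sketch below is a minimal illustration and not the engine itself: the `simulate` function, the weekend uplift rule, the opening balance, and every field name are assumptions made for this example.

```python
import random
from datetime import date, timedelta


def simulate(seed: int, start: date, days: int = 730) -> list[dict]:
    """Toy day-by-day simulation: one sales figure per day (hypothetical model)."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    cash = 10_000.0            # assumed opening balance
    ledger = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        # Weekend uplift is an invented seasonal rule, for illustration only.
        base = 900.0 if day.weekday() >= 5 else 600.0
        sales = round(base * rng.uniform(0.8, 1.2), 2)
        cash = round(cash + sales, 2)
        ledger.append({"date": day.isoformat(), "sales": sales, "cash": cash})
    return ledger


run_a = simulate(seed=42, start=date(2023, 1, 1))
run_b = simulate(seed=42, start=date(2023, 1, 1))
print(len(run_a), run_a == run_b)
```

Because each day's state depends on the previous day's, the output behaves like a running ledger rather than a set of independent random rows, and a fixed seed makes the whole run repeatable.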
Why These Datasets
Built for realism, not just volume
Six properties that set these apart from typical synthetic data generators.
Simulated, not generated
Day-by-day business operations over 730+ days — not random data
Double-entry accounting
Debits always equal credits. Every transaction balances.
44 FK relationships
Customer → Order → Invoice → Payment → Bank → Journal
Real tax compliance
ATO (AU), IRS (US), HMRC (UK) — actual brackets and rates
Deterministic
Same seed = identical output. Fully reproducible results.
3 countries × 3 industries
Genuinely different tax, payroll, seasonal patterns, and COA
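The double-entry property listed above is easy to check once a dataset is loaded. The sketch below uses an in-memory SQLite table as a stand-in; the table name `journal_lines` and its columns are assumptions for illustration, so adjust them to the actual schema. The sample rows use the 10% GST rate mentioned for the Australian datasets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: amounts held as integer cents to avoid float rounding.
conn.execute(
    "CREATE TABLE journal_lines ("
    "entry_id INTEGER, account TEXT, debit_cents INTEGER, credit_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO journal_lines VALUES (?, ?, ?, ?)",
    [
        (1, "Bank", 11000, 0),            # $110.00 received
        (1, "Sales Revenue", 0, 10000),   # $100.00 net sale
        (1, "GST Collected", 0, 1000),    # 10% GST on the sale
        (2, "Inventory", 5000, 0),
        (2, "Accounts Payable", 0, 5000),
    ],
)


def unbalanced_entries(conn: sqlite3.Connection) -> list[tuple]:
    """Return (entry_id, debit minus credit) for entries that do not balance."""
    return conn.execute(
        """
        SELECT entry_id, SUM(debit_cents) - SUM(credit_cents) AS diff
        FROM journal_lines
        GROUP BY entry_id
        HAVING SUM(debit_cents) <> SUM(credit_cents)
        ORDER BY entry_id
        """
    ).fetchall()


print(unbalanced_entries(conn))  # an empty list means every entry balances
```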
42-Table Schema
End-to-end business coverage
Every simulated-business dataset includes 42 interconnected tables spanning 7 business domains with 44 foreign key relationships.
Core Business
Company, Customers, Suppliers, Products/Services, Employees
Sales & Orders
Sales Orders, Order Lines, Invoices, Invoice Lines, Credit Notes
Purchasing
Purchase Orders, PO Lines, Bills, Bill Lines, Goods Received
Inventory
Inventory, Stock Movements, Warehouses, Reorder Rules
Accounting
Chart of Accounts, Journal Entries, Journal Lines, GL, Bank Transactions
HR & Payroll
Payroll Runs, Pay Slips, Leave Balances, Tax Withholdings
Banking
Bank Accounts, Bank Reconciliations, Payment Allocations
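With this many related tables, a quick referential-integrity check is a useful first query. The sketch below looks for child rows whose foreign key has no matching parent row. The `customers` and `invoices` tables and their columns are hypothetical stand-ins, and `orphans` is a helper written for this example, not part of any shipped tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        total_cents INTEGER
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Birch');
    INSERT INTO invoices VALUES (100, 1, 11000), (101, 2, 5500);
    """
)


def orphans(conn, child, fk, parent, pk):
    """Return foreign key values in `child` that have no matching `parent` row.

    Table and column names are interpolated directly into the SQL,
    so pass trusted identifiers only.
    """
    sql = (
        f"SELECT c.{fk} FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    )
    return [row[0] for row in conn.execute(sql)]


print(orphans(conn, "invoices", "customer_id", "customers", "customer_id"))
```

The same anti-join can be repeated for each link in a chain such as Customer → Order → Invoice → Payment.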
Formats
4 formats, every product
Each purchase includes all four formats. Use what fits your stack.
CSV
Universal, import anywhere
PostgreSQL SQL
PostgreSQL schema + full dump
Apache Parquet
Data engineering / analytics
SQLite
Single-file database, zero setup
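As an illustration of moving between formats, the sketch below reads CSV rows with Python's standard library and loads them into SQLite. The in-memory string stands in for an exported file, and the `invoices` table and its columns are assumed for the example; real file and column names may differ.

```python
import csv
import io
import sqlite3

# Stand-in for an exported CSV file (hypothetical columns).
sample_csv = io.StringIO(
    "invoice_id,customer_id,total\n"
    "100,1,110.00\n"
    "101,2,55.00\n"
)
rows = list(csv.DictReader(sample_csv))

conn = sqlite3.connect(":memory:")
# Totals are kept as text here so no float rounding is introduced on load.
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER, customer_id INTEGER, total TEXT)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (:invoice_id, :customer_id, :total)", rows
)

(count,) = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()
print(count)
```

With a real export, replacing the `io.StringIO` object with `open(path, newline="")` would be the usual way to read the file.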
Retail — $49 each
Outdoor & retail companies
Complete retail operations: inventory, POS sales, supplier management, GST/sales-tax compliance.
Outback Outdoor Supplies Pty Ltd
ATO PAYG, GST 10%, BAS, Super 11.5%
Summit Outdoor Gear LLC
IRS federal tax, FICA, ~7.5% sales tax, 401(k)
Hospitality — $79 each
Pubs, restaurants & kitchens
3-year simulations with staff rosters, menu items, daily covers, weekly/fortnightly/monthly payroll.
The Golden Wattle Pub & Kitchen
ATO, GST, BAS, Super
3-yr sim, 22 staff, weekly payroll, 25 menu items
Rocky Mountain Grill & Taphouse
IRS, FICA, sales tax, 401(k)
3-yr sim, 25 staff, fortnightly payroll
The Chequers Inn & Kitchen
HMRC, PAYE, NI, VAT 20%, Pension
3-yr sim, 20 staff, monthly payroll
Professional Services — $79 each
Consulting & advisory firms
Service-based businesses with project billing, timesheets, monthly retainers, and multi-tier staff.
Meridian Advisory Group Pty Ltd
ATO, GST, BAS, Super
3-yr sim, 18 staff, monthly payroll, 25 service lines
Blackstone Ridge Consulting LLC
IRS, FICA, sales tax, 401(k)
3-yr sim, 20 staff, fortnightly payroll
Wharton & Clarke Advisory LLP
HMRC, PAYE, NI, VAT, Pension
3-yr sim, 16 staff, monthly payroll
Multi-Company Bundles
Consolidation-ready bundles
Multiple companies in the same industry and country. Perfect for group reporting, multi-entity ERP testing, and inter-company reconciliation.
Retail Bundles
Hospitality Bundles
Consulting Bundles
Domain Packs — $19 each
Just the tables you need
Australian retail domain subsets. Ideal when you only need accounting, sales, HR, or inventory data.
Domain-Specific Datasets — from $29
15 industries, ready to query
Standalone synthetic datasets for specific domains. Each includes injected anomalies for ML training, realistic distributions, and deterministic generation. Free samples on Hugging Face.
IoT Sensor Telemetry
3 tables · 50K rows
4 sensors, 6 months of readings, drift & failure anomalies
IT Help Desk Tickets
5 tables · 34K rows
10K tickets, SLA tracking, agent performance, outage anomaly
SaaS Subscription Billing
6 tables · 52K rows
5K customers, MRR growth, churn analytics, trial conversion
Food Delivery Platform
4 tables · 34K rows
25K orders, 120 restaurants, Super Bowl & snowstorm anomalies
Real Estate Listings
5 tables · 40K rows
8K properties, price history, seasonal patterns, market correction
Email Campaign Analytics
4 tables · 56K rows
15K subscribers, 157 campaigns, bounce & viral anomalies
Call Center Records
3 tables · 20K rows
20K calls, IVR paths, CSAT scores, VoIP outage anomaly
University Student Grades
5 tables · 39K rows
4,200 students, 6 semesters, COVID & cheating anomalies
Hotel Reservations
4 tables · 22K rows
13K bookings, 3 properties, seasonal pricing, conference surge
City Parking Violations
3 tables · 15K rows
15K citations, 50 zones, repeat offenders, meter failure anomaly
Web Server Access Logs
2 tables · 50K rows
3 servers, DDoS attack & database outage anomalies
Gym & Fitness Memberships
4 tables · 23K rows
2,500 members, New Year surge, 50% Jan churn, TikTok spike
Library Book Loans
4 tables · 37K rows
20K loans, 12K catalog, summer reading surge, branch closure
Vehicle Fleet Management
5 tables · 13K rows
60 vehicles, trips, maintenance, fuel price spike anomaly
Agricultural Crop Yields
4 tables · 906 rows
25 farms, 5 seasons, drought & pest outbreak anomalies
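To show how an injected anomaly can serve as a known target, the sketch below builds a small seeded series with one artificial spike and flags it with a simple z-score rule. The series, the spike position, and the threshold are all invented for this example and do not describe how the datasets above were generated.

```python
import random
import statistics

rng = random.Random(7)                       # fixed seed keeps the series repeatable
readings = [20.0 + rng.gauss(0, 0.5) for _ in range(200)]
readings[120] += 8.0                         # injected spike at a known position

mean = statistics.fmean(readings)
stdev = statistics.pstdev(readings)

# Flag any reading more than 4 standard deviations from the mean.
flagged = [i for i, value in enumerate(readings) if abs(value - mean) / stdev > 4.0]
print(flagged)
```

Because the anomaly's position is known in advance, a detector's output can be scored directly against it.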
Free Samples
Try before you buy
Download free sample datasets on GitHub, Kaggle, Hugging Face, or Gumroad.
FAQ
Common questions
Everything you need to know about the datasets.
What format are the datasets in?
Every dataset ships in 4 formats: CSV (universal), PostgreSQL SQL (schema + dump), Apache Parquet (analytics), and SQLite (single-file database). All formats are included in every purchase.
Are these real company datasets?
No. Every dataset is 100% synthetic — generated by a day-by-day business simulation engine. The companies are fictional but the data is realistic, with proper tax compliance, double-entry accounting, and 44 foreign key relationships across 42 tables.
What countries and industries are covered?
We cover 3 countries (Australia, United States, United Kingdom) and 3 industries (Retail, Hospitality, Professional Services). Each combination has genuine differences in tax rules, payroll cycles, seasonal patterns, and chart of accounts.
Can I use these datasets for training ML models?
Yes. The datasets suit training and testing ML models, building ERP demos and BI dashboards, testing accounting software and data engineering pipelines, and teaching.
Is there a free sample?
Yes. Free samples are available on GitHub, Kaggle, Hugging Face, and Gumroad for all three countries.
Are the datasets deterministic?
Yes. The same seed always produces identical output, so you can reproduce results, compare runs, and rely on the data in CI/CD pipelines and automated tests.
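One way to use this in CI is to fingerprint an export directory and compare it with a stored value. The sketch below hashes file names and contents in sorted order. The `fingerprint` helper and the sample file are hypothetical, and the approach assumes exports are byte-identical across runs, which is most likely to hold for plain-text formats such as CSV.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(directory: Path) -> str:
    """SHA-256 over relative file names and file contents, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(directory.rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(directory).as_posix().encode())
            digest.update(b"\0")  # separator between name and contents
            digest.update(path.read_bytes())
    return digest.hexdigest()


# Two directories with identical contents stand in for two runs with the same seed.
with tempfile.TemporaryDirectory() as run_a, tempfile.TemporaryDirectory() as run_b:
    for directory in (run_a, run_b):
        Path(directory, "customers.csv").write_text("customer_id,name\n1,Acme\n")
    same = fingerprint(Path(run_a)) == fingerprint(Path(run_b))

print(same)
```

A test can then assert that the fingerprint of a fresh export equals the one recorded when the pipeline was last verified.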
How many rows are in each dataset?
Single-company datasets range from 39K to 259K rows depending on industry and country. Multi-company bundles go up to 825K rows. Enterprise bundles with 5 companies reach 400K+ rows.
Ready to get started?
Stop building on toy data
Get a production-realistic dataset in minutes. Free samples available on GitHub, Kaggle, Hugging Face, and Gumroad.