Why Your Test Data Doesn't Balance — And Why It Matters for ERP Testing
Most synthetic datasets ignore accounting fundamentals. Here's why double-entry bookkeeping in test data catches bugs that random data never will.
The Problem
You're testing an ERP system, accounting software, or a BI dashboard. You import test data. Everything looks fine — until someone runs a trial balance and the numbers don't add up.
This happens because most synthetic data generators treat financial tables as independent entities. They generate random invoices, random payments, and random journal entries — with no guarantee that debits equal credits, that payments match invoices, or that bank reconciliations balance.
In real businesses, every financial transaction touches multiple tables simultaneously. A single sale creates records in at least 6 tables: Sales Order → Order Lines → Invoice → Invoice Lines → Journal Entry → Journal Lines → Bank Transaction.
If your test data doesn't preserve these relationships, you're testing with data that could never exist in production.
What Double-Entry Accounting Means for Data
Double-entry bookkeeping is a 500-year-old system with one iron rule:
Every debit must have a corresponding credit of equal value.
In database terms, this means:
SUM(debit_amount) = SUM(credit_amount) -- Always, for every journal entry
This constraint cascades through the entire financial schema:
| Relationship | Constraint |
|---|---|
| Invoice → Payment | Payment amount ≤ Invoice amount |
| Payment → Bank | Bank transaction matches payment |
| Bank → Journal | Journal entry auto-generated |
| Journal → GL | General ledger is sum of journals |
| GL → Trial Balance | Total debits = Total credits |
| Trial Balance → P&L / BS | Revenue - Expenses = Net Profit |
If any link in this chain breaks, the entire financial picture is wrong.
What Goes Wrong with Random Data
Here's what happens when you use randomly generated financial data:
1. Orphaned Records
Payments that reference invoices that don't exist. Journal entries without corresponding bank transactions. This triggers referential integrity errors in any properly designed system.
2. Impossible Balances
A payment of $5,000 against an invoice of $3,000. A credit note larger than the original sale. These create negative balances that real accounting systems would reject.
3. Missing Temporal Logic
An invoice dated before the sales order. A payment received before the invoice was issued. A bank reconciliation for a month with no transactions. Time-series analytics break immediately.
4. Tax Calculation Errors
GST/VAT/Sales tax calculated at wrong rates, or not calculated at all. PAYG/PAYE withholding that doesn't match the employee's tax bracket. These errors are invisible in testing but catastrophic in production.
How Simulation Fixes This
A simulation engine doesn't generate data — it operates a virtual business. Here's the sequence for a single sale:
Day 45: Customer PO received
→ Create Sales Order (sales_orders table)
→ Create Order Lines (order_lines table)
→ Reserve Inventory (inventory_movements table)
Day 47: Goods shipped
→ Update Sales Order status
→ Create Invoice (invoices table)
→ Create Invoice Lines (invoice_lines table)
→ Create Journal Entry: Debit Accounts Receivable, Credit Revenue
→ Create Journal Lines (2 rows, balanced)
→ Apply GST/VAT/Sales Tax per country rules
Day 62: Payment received (Net 15 terms)
→ Create Payment record (payments table)
→ Create Bank Transaction (bank_transactions table)
→ Create Journal Entry: Debit Bank, Credit Accounts Receivable
→ Update Invoice status to 'Paid'
→ Mark for Bank Reconciliation
Every step is causally linked. Every journal entry balances. Every payment matches an invoice. Every bank transaction has a corresponding journal entry.
The 42-Table Schema
Our simulation engine produces 42 interconnected tables across 7 business domains:
| Domain | Tables | Key Relationships |
|---|---|---|
| Core Business | Company, Customers, Suppliers, Products, Employees | Foundation entities |
| Sales & Orders | Sales Orders, Order Lines, Invoices, Invoice Lines, Credit Notes | Customer → Order → Invoice chain |
| Purchasing | Purchase Orders, PO Lines, Bills, Bill Lines, Goods Received | Supplier → PO → Bill chain |
| Inventory | Inventory, Stock Movements, Warehouses, Reorder Rules | Movement tracking with FIFO/LIFO |
| Accounting | Chart of Accounts, Journal Entries, Journal Lines, GL | Double-entry core |
| HR & Payroll | Payroll Runs, Pay Slips, Leave, Tax Withholdings | Country-specific tax brackets |
| Banking | Bank Accounts, Bank Transactions, Reconciliations, Allocations | Full reconciliation chain |
44 foreign key relationships connect these tables into a single, consistent business reality.
Real Tax Compliance
Each country simulation uses actual tax rules:
| Country | Tax Rules Applied |
|---|---|
| Australia | ATO PAYG brackets, GST 10%, BAS quarterly, Superannuation 11.5% |
| United States | IRS federal brackets, FICA 7.65%, state sales tax ~7.5%, 401(k) |
| United Kingdom | HMRC PAYE brackets, NI 10%+13.8%, VAT 20%, Pension 8% |
These aren't approximations — they're the actual bracket tables and rates that a real business would use.
Verification
Every dataset we ship passes these invariants:
- ✅ All journal entries balance (debits = credits)
- ✅ Trial balance balances (total debits = total credits)
- ✅ All foreign keys resolve (no orphaned records)
- ✅ Payment amounts ≤ Invoice amounts
- ✅ Inventory never goes negative
- ✅ Payroll withholdings match tax brackets
- ✅ Bank reconciliations balance against GL
You can verify this yourself with the free sample datasets.
Who Needs This
- ERP vendors testing multi-entity, multi-currency scenarios
- Accounting software teams validating trial balance and reporting
- BI/analytics teams building dashboards with realistic data
- Data engineers testing ETL pipelines with complex schemas
- Educators teaching accounting, databases, or data engineering
- ML teams training models on structured business data
Browse datasets → | Free samples on GitHub
*References: Double-entry bookkeeping principles per IFRS/GAAP standards. Tax rates from ATO, IRS, and HMRC official schedules (2024-2025).*
Ready to try production-realistic data?
42 tables, double-entry accounting, real tax compliance. Free samples available.