Back to Blog
Technical2026-04-0310 min

Why Your Test Data Doesn't Balance — And Why It Matters for ERP Testing

Most synthetic datasets ignore accounting fundamentals. Here's why double-entry bookkeeping in test data catches bugs that random data never will.

The Problem

You're testing an ERP system, accounting software, or a BI dashboard. You import test data. Everything looks fine — until someone runs a trial balance and the numbers don't add up.

This happens because most synthetic data generators treat financial tables as independent entities. They generate random invoices, random payments, and random journal entries — with no guarantee that debits equal credits, that payments match invoices, or that bank reconciliations balance.

In real businesses, every financial transaction touches multiple tables simultaneously. A single sale creates records in at least 6 tables: Sales Order → Order Lines → Invoice → Invoice Lines → Journal Entry → Journal Lines → Bank Transaction.

If your test data doesn't preserve these relationships, you're testing with data that could never exist in production.


What Double-Entry Accounting Means for Data

Double-entry bookkeeping is a 500-year-old system with one iron rule:

Every debit must have a corresponding credit of equal value.

In database terms, this means:

SUM(debit_amount) = SUM(credit_amount)  -- Always, for every journal entry

This constraint cascades through the entire financial schema:

Relationship Constraint
Invoice → Payment Payment amount ≤ Invoice amount
Payment → Bank Bank transaction matches payment
Bank → Journal Journal entry auto-generated
Journal → GL General ledger is sum of journals
GL → Trial Balance Total debits = Total credits
Trial Balance → P&L / BS Revenue - Expenses = Net Profit

If any link in this chain breaks, the entire financial picture is wrong.


What Goes Wrong with Random Data

Here's what happens when you use randomly generated financial data:

1. Orphaned Records

Payments that reference invoices that don't exist. Journal entries without corresponding bank transactions. This triggers referential integrity errors in any properly designed system.

2. Impossible Balances

A payment of $5,000 against an invoice of $3,000. A credit note larger than the original sale. These create negative balances that real accounting systems would reject.

3. Missing Temporal Logic

An invoice dated before the sales order. A payment received before the invoice was issued. A bank reconciliation for a month with no transactions. Time-series analytics break immediately.

4. Tax Calculation Errors

GST/VAT/Sales tax calculated at wrong rates, or not calculated at all. PAYG/PAYE withholding that doesn't match the employee's tax bracket. These errors are invisible in testing but catastrophic in production.


How Simulation Fixes This

A simulation engine doesn't generate data — it operates a virtual business. Here's the sequence for a single sale:

Day 45: Customer PO received

→ Create Sales Order (sales_orders table)

→ Create Order Lines (order_lines table)

→ Reserve Inventory (inventory_movements table)

Day 47: Goods shipped

→ Update Sales Order status

→ Create Invoice (invoices table)

→ Create Invoice Lines (invoice_lines table)

→ Create Journal Entry: Debit Accounts Receivable, Credit Revenue

→ Create Journal Lines (2 rows, balanced)

→ Apply GST/VAT/Sales Tax per country rules

Day 62: Payment received (Net 15 terms)

→ Create Payment record (payments table)

→ Create Bank Transaction (bank_transactions table)

→ Create Journal Entry: Debit Bank, Credit Accounts Receivable

→ Update Invoice status to 'Paid'

→ Mark for Bank Reconciliation

Every step is causally linked. Every journal entry balances. Every payment matches an invoice. Every bank transaction has a corresponding journal entry.


The 42-Table Schema

Our simulation engine produces 42 interconnected tables across 7 business domains:

Domain Tables Key Relationships
Core Business Company, Customers, Suppliers, Products, Employees Foundation entities
Sales & Orders Sales Orders, Order Lines, Invoices, Invoice Lines, Credit Notes Customer → Order → Invoice chain
Purchasing Purchase Orders, PO Lines, Bills, Bill Lines, Goods Received Supplier → PO → Bill chain
Inventory Inventory, Stock Movements, Warehouses, Reorder Rules Movement tracking with FIFO/LIFO
Accounting Chart of Accounts, Journal Entries, Journal Lines, GL Double-entry core
HR & Payroll Payroll Runs, Pay Slips, Leave, Tax Withholdings Country-specific tax brackets
Banking Bank Accounts, Bank Transactions, Reconciliations, Allocations Full reconciliation chain

44 foreign key relationships connect these tables into a single, consistent business reality.


Real Tax Compliance

Each country simulation uses actual tax rules:

Country Tax Rules Applied
Australia ATO PAYG brackets, GST 10%, BAS quarterly, Superannuation 11.5%
United States IRS federal brackets, FICA 7.65%, state sales tax ~7.5%, 401(k)
United Kingdom HMRC PAYE brackets, NI 10%+13.8%, VAT 20%, Pension 8%

These aren't approximations — they're the actual bracket tables and rates that a real business would use.


Verification

Every dataset we ship passes these invariants:

  • All journal entries balance (debits = credits)
  • Trial balance balances (total debits = total credits)
  • All foreign keys resolve (no orphaned records)
  • Payment amounts ≤ Invoice amounts
  • Inventory never goes negative
  • Payroll withholdings match tax brackets
  • Bank reconciliations balance against GL

You can verify this yourself with the free sample datasets.


Who Needs This

  • ERP vendors testing multi-entity, multi-currency scenarios
  • Accounting software teams validating trial balance and reporting
  • BI/analytics teams building dashboards with realistic data
  • Data engineers testing ETL pipelines with complex schemas
  • Educators teaching accounting, databases, or data engineering
  • ML teams training models on structured business data

Browse datasets → | Free samples on GitHub


*References: Double-entry bookkeeping principles per IFRS/GAAP standards. Tax rates from ATO, IRS, and HMRC official schedules (2024-2025).*

Ready to try production-realistic data?

42 tables, double-entry accounting, real tax compliance. Free samples available.