How do I combine multiple bank statement CSV files into one, remove duplicates, and export to Excel? — Blog

Closing the books shouldn’t mean copying and pasting. If you’re juggling monthly CSV downloads from a bunch of bank accounts, let’s fix that. I’ll show you how to combine multiple bank statement CSV files into one Excel workbook, remove duplicates from overlapping statements, and export a clean .xlsx you can actually trust.

You’ll see how to standardize messy inputs (different date formats, decimal separators, and turning Debit/Credit into one signed Amount), decide what counts as a duplicate (pending vs. posted, reversals, and those annoying same-day same-amount charges), and build a unique key that won’t nuke real transactions. Then we’ll run through practical options: Excel Power Query, Google Sheets + Apps Script, a simple Python/pandas script, and a faster path with BankXLSX for automated ingestion (CSV and PDF), smart merging, and an audit trail per file.

We’ll wrap with quick checks against statement balances and a simple way to automate your month-end so your Excel exports are accurate, consistent, and ready whenever you are.

What you’ll achieve and who this guide is for

By the end, you’ll have one clean ledger in Excel built from all your bank CSVs—duplicates removed, columns standardized, and ready for reporting or import. No more ten tabs and a bunch of VLOOKUPs that break the minute a new file shows up.

This is for controllers, bookkeepers, finance leads, and operators who care about accuracy and time. If you’re fine paying for tools that save hours and keep auditors calm, you’re in the right place. We’ll use Excel Power Query, Google Sheets, Python, and, when you’re ready to stop babysitting files, BankXLSX.

We’ll call out real-world gotchas like negative amounts in parentheses, day-first dates, stray headers in the middle of a file—plus a simple way to merge multi-account bank statements into a single ledger without mixing things up.

Quick answer: your options at a glance

Most teams start with Excel Power Query. Use “From Folder” to append a pile of CSVs, map columns into a consistent shape, build a de-dupe key, and export to .xlsx. It’s quick, repeatable, and refreshes when you drop new files in the folder.

Prefer the browser? Google Sheets + a tiny Apps Script can pull CSVs from Drive, dedupe, and let teammates comment before you download to Excel. If you’re comfortable with code, Python/pandas handles big volumes and runs on a schedule. Want automation and an audit trail? BankXLSX ingests CSVs and PDFs, applies banking-aware rules (like keeping posted over pending), validates balances, and exports one polished .xlsx—hands-off after setup.

What you need before you start

Organize your files. Create folders by bank and account (for example: 2024_Checking_1234). Write down your target schema: Date, Description, Amount (signed), Currency, Account, Balance, and Transaction ID/Reference if you have it. Decide now if you’ll exclude “Pending.”

Add a SourceFile column for an audit trail—just the filename is enough to answer “Where did this come from?” later. If you plan to merge multi-account bank statements into a single ledger, pick a clear Account label (IBAN, last 4, or a short name) and include it in every row.

Bonus prep: list common header synonyms (Posting Date, Book Date, Value Date; Details, Narrative) so you can map them once and reuse every month.

Inspect and standardize your CSVs (avoid schema traps)

Banks don’t agree on anything. Expect mixed date formats (DD/MM vs MM/DD), amounts with commas for decimals (1.234,56), and negatives in parentheses. Import with the right locale, convert parentheses to negatives, and turn separate Debit/Credit fields into one signed Amount.

Clean up descriptions: uppercase, trim, collapse double spaces, remove junk punctuation. That helps matching. Watch for repeated header rows in the middle of data (happens when you export multiple months). If Currency is missing, set a default per account during import.

Weird characters usually mean encoding issues—reload as UTF‑8. If Account isn’t in the CSV, pull it from the filename so every row carries its identity.

Define “duplicate” for banking data (and what not to remove)

Duplicates happen when statement exports overlap—same account, same currency, same date, same amount, same (or very similar) description. Those should go. But treat pending vs. posted carefully: keep the posted one and drop the earlier pending if the amount matches and the reference looks the same within a couple of days.

Reversals and chargebacks aren’t duplicates; they’re a debit and a credit pair. Same-day, same-amount payments can both be legit (two rideshares, two small purchases)—look at references.

If you mix banks or currencies, include Account and Currency in your de-dupe key so you never collapse unrelated rows.

Build a robust unique key for deduplication

Use a banking-aware key. When a Transaction ID exists and is stable, that’s gold: Account | Currency | TransactionID. If not, build a composite: Account + Currency + normalized Date + signed Amount in cents + Normalized Description.

Normalize the text: uppercase, trim, collapse multiple spaces. Convert Debit/Credit to one Amount. Round to the minor unit (cents) to avoid tiny rounding quirks.

Handle pending vs. posted with a small window rule: if two rows share Account, Currency, signed Amount, and similar Description within 1–3 days, keep the one marked posted or the later date. In Power Query that’s a sort-and-group; in pandas, sort and drop_duplicates with keep='last'.

Method 1 — Excel Power Query: combine, dedupe, and export

In Excel, go to Data > Get Data > From File > From Folder. Pick your folder, then “Combine & Transform.” In the sample query, map the bank’s headers to your schema, convert Debit/Credit to a single Amount, set Date with the right locale, and add Account (pull from filenames if needed).

Create a Normalized Description (Clean, Trim, Uppercase). Add a UniqueKey like Date|Currency|Account|AmountInCents|NormalizedDescription. Use Remove Duplicates on that column. If you have a status, filter out “Pending,” or group by the non-status fields and keep the row with the higher status or later date.

Close & Load to a table, save as .xlsx, and you’re set. Next month, drop new CSVs into the folder and click Refresh. For stability, lock the schema with Choose Columns, set data types explicitly, and always include a SourceFile column.

Method 2 — Google Sheets + Apps Script: collaborative pipeline

Put all CSVs in a Google Drive folder. In Sheets, add a short Apps Script to read each file and append rows to a “Master” sheet with consistent headers. Normalize text during import (uppercase and trim) to keep formulas light.

Add an ARRAYFORMULA to build a UniqueKey (Date|Currency|Account|AmountInCents|NormalizedDescription). Use QUERY or UNIQUE to spit out a clean, deduped “Clean” sheet. Then File > Download > Microsoft Excel to get your .xlsx.

This works great for team review and comments. Just mind the row limits—keep it to a few hundred thousand rows total. Include SourceFile and maybe a LoadBatch timestamp so you can trace issues fast.

Method 3 — Python/pandas: scalable and auditable

Reading many CSVs with pandas is straightforward: use glob, map headers into your standard schema, parse dates (dayfirst when needed), and clean up amounts (strip thousands separators, turn parentheses into negatives). Build the UniqueKey and call drop_duplicates on it.

Sort by status and date so posted or latest wins. Add a SourceFile column for traceability. Export with to_excel and you’re done. Store your rename map and rules in JSON or YAML so non-coders can request tweaks without touching code.

Schedule it with cron or a task runner. Add a quick reconciliation step: group by month and compare net change to expected statement deltas. If you later move to BankXLSX, your normalization rules carry over easily.

Handling real-world banking pitfalls

Overlapping months? Very common. Your de-dupe key must work across files and months. Locale stuff will bite: lock in date and decimal settings per bank or account to avoid quiet errors.

CSV quirks: repeated headers in the middle of data, embedded commas/quotes, even line breaks inside descriptions. Use a parser that respects quoted fields. Convert CR/DR flags into a single signed Amount.

Multiple currencies in one dataset? Always include Currency in the key. For pending/posting, keep a 1–3 day window and prefer posted (or the later date). Watch for subtle schema changes over time—whitelist expected columns and fail loudly if something new appears.

Reconciliation and quality checks before you trust the output

Count rows first: how many in, how many out after dedupe, how many final. A small drop (often a few percent) is normal when removing duplicates from overlapping statements.

Reconcile monthly: compare the net change in your merged data to opening/closing statement balances. Build a quick pivot—sum Amount by Month and Account—and look for mismatches. Seed a running balance and make sure it lands on each closing balance from the statements.

Do a few spot checks. Pick high-value items and a random handful, tie them back to SourceFile (and PDF if you have it). Keep an “Exceptions” tab for rows you excluded on purpose (like pending) with a reason.

Automate and audit with BankXLSX (fastest SaaS route)

When you want this done with less babysitting, BankXLSX pulls your CSVs (and PDFs), detects formats, maps fields, and applies banking-aware dedupe. You can force rules like “keep posted,” avoid flagging reversal pairs, and pick Transaction ID or a composite key.

It runs balance checks, spots gaps or overlaps, and shows a preview before export. One click gives you an Excel .xlsx—either one consolidated tab or split by account. You can schedule monthly runs and get a summary by email or webhook.

Every row carries SourceFile, and PDFs can include SourcePage, so answering “where did this come from?” takes seconds. It’s a reliable way to convert bank statement CSV to Excel with less manual effort and better traceability.

Scheduling, governance, and collaboration

Set a monthly routine tied to when statements land. Decide who uploads files, who reviews duplicates, and who signs off on reconciliations. Keep a simple log: timestamps, file counts, validation results.

In Excel, store the workbook in a shared drive and protect the query sheets. In Python, commit the script and the config to version control. In BankXLSX, use role-based access and check the processing logs.

Add a SourceFile column everywhere and include a LoadBatch ID for quick forensics. Set alerts for weird row count swings, missing files, or balance mismatches. If your data grows past Excel’s limits (~1,048,576 rows per sheet), split by account or month, move to the data model, or push the job to Python or BankXLSX.

Troubleshooting common issues

Dates show up as text? In Power Query, set Data Type using the correct Locale. In pandas, use to_datetime with dayfirst or a specific format. Amounts won’t parse? Remove thousands separators, convert parentheses to negatives, and check decimal settings.

Seeing header rows in the middle? Filter rows where Date equals the literal “Date” or where date parsing fails. Still getting duplicates? Make sure your key includes Account and Currency so you don’t merge across ledgers.

Pending vs. posted confusion? Sort by a status weight and date, then group and keep the last. Encoding gibberish (�)? Re-import as UTF‑8. Power Query broke after new files? Lock the schema with Choose Columns and set defaults for optional headers. Huge datasets? Stage in Power Query or switch to Python/BankXLSX.

FAQs

How do I ensure I’m not removing legitimate repeats?
Use a composite key (Account, Currency, Date, signed Amount in cents, Normalized Description) with a 1–3 day window that keeps posted and drops pending. Don’t de-dupe debit/credit reversal pairs.

Can I mix banks and currencies?
Yes. Standardize columns first, then include Account and Currency in your key to avoid cross-ledger collisions.

What if my CSVs don’t share headers?
Create a small mapping from each bank’s headers to your standard schema. In Power Query, use Transform Sample File; in Python, use a rename map.

How big can Excel go?
Roughly 1,048,576 rows per sheet. Above that, split the output, use the data model, move to Python, or automate with BankXLSX.

How do I keep posted and drop pending?
Add a Status field (or infer from the description), sort by status and date, and keep the later entry for the same Account/Currency/Amount/Description.

Templates and checklists

Use these basics to keep your process consistent and quick to review.

Pre-merge checklist:

All files grouped by account and bank
Target schema documented
Dedupe strategy chosen (Transaction ID or composite)
SourceFile and LoadBatch fields added

Sample rename map:

Posting Date, Book Date → Date
Details, Narrative → Description
Debit/Credit → Amount (signed)

Normalization rules:

Uppercase + trim descriptions; collapse whitespace
Convert parentheses to negatives; remove thousands separators
Parse dates with explicit locale settings

Validation checklist:

Row count before/after de-dupe
Monthly net change vs. statements
Running balance spot checks
High-value and random sample ties to SourceFile

These lightweight pieces help you automate a monthly merge-and-dedupe workflow without overcomplicating anything.

Key Points

Build a repeatable flow: combine multiple bank statement CSV files into one Excel ledger with Power Query (From Folder) or automate end-to-end with BankXLSX (CSV/PDF ingestion, Smart Merge, one-click .xlsx export).
Deduplicate safely: prefer Transaction ID when available; otherwise use Account | Currency | Date | SignedAmountInCents | NormalizedDescription. Keep posted over pending within 1–3 days, and never remove debit/credit reversals.
Standardize first: map headers, convert Debit/Credit to a signed Amount, fix dates and decimal separators, handle negative parentheses and encoding, and add a SourceFile for audit trails.
Validate and automate: reconcile monthly net change to statements, run row-count and running-balance checks, then schedule a monthly workflow (split output by account if needed) to ship trusted .xlsx files on demand.

Next steps

Pick one account and the last six months as a pilot. Set up Power Query (no code), a small Python script, or BankXLSX if you want hands-off runs. Build your de-dupe key, test pending vs. posted handling, and make sure reversals stay put.

Add SourceFile and LoadBatch, write down the steps, and reconcile to opening/closing balances. Once you’re happy, roll it out to all accounts and set a monthly schedule. If your team wants a central place with audit logs and PDF support, move the process into BankXLSX and export clean .xlsx files whenever you need them.

Conclusion

The recipe is simple: standardize your columns, de-dupe with a banking-aware key, and reconcile to statement balances. Whether you use Excel, Sheets, or Python, track SourceFile, prefer posted, and round to cents to avoid silly mismatches. Ready to automate? Let BankXLSX ingest your CSVs and PDFs, apply Smart Merge with an audit trail, and export clean .xlsx on a schedule. Try it or grab a demo and see the workflow end to end.