How to Extract Transactions from a Bank Statement PDF (3 Ways)

Jun 24, 2026

You can extract transactions from a bank statement PDF in three ways: upload it to a converter and download a spreadsheet with no code, call a bank statement parser API that hands your app the transactions as JSON, or write your own script with a Python PDF library. This guide walks through all three, with working code, and shows where each one earns its place.

Why a bank statement PDF is hard to parse

A PDF stores text by position on the page, not as a table with rows and columns. When you copy a statement into a spreadsheet, the date, description, and amount land in one cell, multi-line descriptions split across rows, and the repeated page headers scatter your data. Debits and credits often share a column, the running balance hides at the far right, and a scanned or photographed statement has no text layer to read at all. Those quirks are why a quick copy and paste almost never gives you clean numbers, and why every approach below has to deal with structure, not just text.
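
To make the "position, not rows" point concrete, here is a minimal sketch of how positioned words can be rebuilt into rows. It assumes each word is a dict with `text`, `x0` (left edge), and `top` (distance from the top of the page), which mirrors the shape pdfplumber's `extract_words()` produces. The function name, the sample words, and the 3-point tolerance are illustrative assumptions, not part of any library.

```python
def group_words_into_rows(words, y_tolerance=3.0):
    """Cluster positioned words into rows, then order each row left to right.

    Assumes each word is a dict with "text", "x0", and "top" keys.
    Words whose "top" lies within y_tolerance of the first word in the
    current row are treated as belonging to that row.
    """
    rows = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if rows and abs(word["top"] - rows[-1]["top"]) <= y_tolerance:
            rows[-1]["words"].append(word)
        else:
            rows.append({"top": word["top"], "words": [word]})
    return [
        [w["text"] for w in sorted(row["words"], key=lambda w: w["x0"])]
        for row in rows
    ]


# Hypothetical words in the arbitrary order a PDF might store them.
sample_words = [
    {"text": "-42.50", "x0": 400.0, "top": 100.4},
    {"text": "01/03", "x0": 40.0, "top": 100.0},
    {"text": "COFFEE", "x0": 120.0, "top": 100.2},
    {"text": "01/04", "x0": 40.0, "top": 114.0},
    {"text": "PAYROLL", "x0": 120.0, "top": 114.1},
    {"text": "2,500.00", "x0": 400.0, "top": 113.9},
]
rows = group_words_into_rows(sample_words)
for row in rows:
    print(row)
```

Even this toy version has to guess a tolerance, and it still cannot tell a debit from a credit or a wrapped description from a new transaction. Real layouts need more of these guesses.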

How do I extract data from a bank statement PDF?

The fastest way to extract data from a bank statement PDF is to run it through a tool built for the job rather than copying by hand. For one statement or a handful, a no-code converter gives you a clean spreadsheet in under a minute. For software that needs to do this on its own, a parser API returns the data as JSON. The three methods below cover both, plus the build-it-yourself route.

Method 1: A no-code converter (one or a few statements)

If you just need the numbers out of a few statements, drag the PDF into a bank statement converter and download the result as Excel or CSV with date, description, debit, credit, and running balance columns already separated. It reads digital PDFs, scanned statements, and phone photos, so you do not have to retype anything. When your destination is a spreadsheet, the PDF bank statement to Excel converter produces a workbook you can sort, filter, and total straight away. This is the right tool when a person, not a program, is doing the conversion.

Is there an API to parse bank statements?

Yes. When the parsing has to happen inside your product, on a user's behalf, or at a volume nobody wants to click through, use a bank statement parser API. You send the file over HTTP and get the transactions back as structured data your code can store or post to a ledger, with no spreadsheet step in the middle.

Method 2: A bank statement parser API (inside your product)

The bank statement converter API uses a three-call workflow: upload the file, start an extraction, then fetch the parsed transactions. Here is the full round trip in Python with the requests library.

import requests

TOKEN = "YOUR_API_TOKEN"
auth = {"Authorization": f"Bearer {TOKEN}"}

# 1. Upload the statement (the with-block closes the file handle afterwards)
with open("statement.pdf", "rb") as pdf_file:
    up = requests.post(
        "https://bankxlsx.com/api/documents/upload",
        headers=auth,
        files={"document": pdf_file},
        timeout=60,
    )
up.raise_for_status()  # fail on an HTTP error rather than on a missing key
file_id = up.json()["data"]["file_id"]

# 2. Start the extraction (json= sets the Content-Type header for us)
ex = requests.post(
    "https://bankxlsx.com/api/documents/extract",
    headers=auth,
    json={"file_id": file_id},
    timeout=60,
)
ex.raise_for_status()
extraction_hash = ex.json()["data"]["extraction_hash"]

# 3. Retrieve the parsed transactions
result = requests.get(
    f"https://bankxlsx.com/api/documents/extraction/{extraction_hash}",
    headers=auth,
    timeout=60,
)
result.raise_for_status()
print(result.json()["data"]["fields"])

The response is JSON: account-level fields such as account number, statement date, and balance, plus one object per transaction with its date, description, and amount. That maps cleanly onto a ledger row, so you can store it or categorize it without touching a spreadsheet. The full endpoint and field reference lives on the bank statement API page, and because it reads more than 90 US bank layouts and runs OCR on scans, you skip building a parser per bank.
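
As a rough illustration of that mapping, the sketch below turns transaction objects into typed ledger rows. The key names (`date`, `description`, `amount`), the ISO date format, and the `LedgerRow` type are assumptions made for this example. Check the API's field reference for the real schema before relying on them.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class LedgerRow:
    posted_on: date
    description: str
    amount: Decimal


def to_ledger_rows(transactions):
    """Convert transaction dicts into LedgerRow objects.

    Assumes each dict has "date" (YYYY-MM-DD), "description", and
    "amount" keys; these names are hypothetical, not the API's schema.
    """
    rows = []
    for tx in transactions:
        rows.append(
            LedgerRow(
                posted_on=date.fromisoformat(tx["date"]),
                # Collapse runs of whitespace left over from the statement layout.
                description=" ".join(tx["description"].split()),
                # Go through str() so floats do not drag binary rounding into Decimal.
                amount=Decimal(str(tx["amount"])),
            )
        )
    return rows


# Hypothetical transactions standing in for the parsed API response.
sample = [
    {"date": "2026-01-03", "description": "COFFEE  SHOP", "amount": "-4.50"},
    {"date": "2026-01-04", "description": "PAYROLL", "amount": 2500.0},
]
ledger = to_ledger_rows(sample)
net = sum((row.amount for row in ledger), Decimal("0"))
print(net)
```

Using `Decimal` rather than `float` keeps the totals exact to the cent, which matters once the rows feed a ledger or a reconciliation.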

Can Python read a bank statement PDF?

Yes, Python can read a bank statement PDF, and for a single known bank layout a homegrown script can work well. The usual libraries are pdfplumber and Camelot for digital PDFs, with an OCR step from pytesseract or a cloud service when the statement is scanned. A simple table grab looks like this.

import pdfplumber

rows = []
with pdfplumber.open("statement.pdf") as pdf:
    for page in pdf.pages:
        for table in page.extract_tables():
            rows.extend(table)

for row in rows:
    print(row)

This gets you a quick proof of concept, but the gap between a demo and production is wide. You end up writing a separate parser for each bank, re-tuning it every time a bank redesigns its statement, and bolting on OCR for scanned files and a cleanup pass for amounts that arrive as text. For one fixed layout it is a weekend project. For many banks it quietly becomes a product of its own, which is why most teams move from a script to an API once they support more than one or two institutions.
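
To give a sense of what that cleanup pass involves, here is a minimal sketch that folds wrapped description lines back into their transaction and converts text amounts into signed numbers. It assumes three-column rows of date, description, and amount, with empty or `None` cells on continuation lines. The function names and the amount formats are illustrative assumptions. A real statement will need rules tuned to its bank.

```python
import re
from decimal import Decimal


def parse_amount(text):
    """Convert text such as '$1,234.56', '(45.00)', '12.30-' or '12.30 DR'
    into a signed Decimal. Parentheses, a minus sign, or a DR suffix are
    treated as negative; that convention is an assumption."""
    raw = text.strip()
    negative = (
        (raw.startswith("(") and raw.endswith(")"))
        or "-" in raw
        or raw.upper().endswith("DR")
    )
    digits = re.sub(r"[^0-9.]", "", raw)
    if not digits:
        raise ValueError(f"no amount found in {text!r}")
    value = Decimal(digits)
    return -value if negative else value


def merge_continuation_rows(rows):
    """Fold rows with an empty date cell into the previous row's description."""
    merged = []
    for date_cell, description, amount in rows:
        text = (description or "").strip()
        if not (date_cell or "").strip() and merged:
            merged[-1][1] = f"{merged[-1][1]} {text}".strip()
        else:
            merged.append([date_cell, text, amount])
    return merged


# Hypothetical rows, shaped like the output of a table grab.
raw_rows = [
    ["01/03", "CARD PURCHASE", "(42.50)"],
    [None, "COFFEE SHOP #12", None],
    ["01/04", "PAYROLL", "$2,500.00"],
]
cleaned = [
    (d, desc, parse_amount(amt)) for d, desc, amt in merge_continuation_rows(raw_rows)
]
for row in cleaned:
    print(row)
```

Each bank tends to break one of these assumptions in its own way, which is where the per-bank parsers come from.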

How do I convert a bank statement PDF to JSON?

To convert a bank statement PDF to JSON, send it to a parser API and read the JSON it returns. The three-call workflow above gives you the account fields and every transaction as JSON objects, ready to consume directly in your code. If you would rather have a flat file for an import job than an integration, the same conversion can also be exported as JSON, CSV, or Excel with no code on your side.
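
If you are on the homegrown route instead, rows you extracted yourself can be serialized with Python's standard `json` module. The sketch below is one way to do it. The column names and the `transactions` wrapper key are assumptions chosen for the example and do not reproduce the API's response schema.

```python
import json


def rows_to_json(rows, columns=("date", "description", "amount")):
    """Serialize table rows to a JSON string, one object per transaction.

    Assumes every row has its cells in the same order as `columns`.
    """
    records = [dict(zip(columns, row)) for row in rows]
    return json.dumps({"transactions": records}, indent=2)


# Hypothetical rows from a script-based extraction.
payload = rows_to_json(
    [
        ["01/03", "COFFEE", "-4.50"],
        ["01/04", "PAYROLL", "2500.00"],
    ]
)
print(payload)
```

Amounts are kept as strings here so that no precision is lost in transit. The consumer can convert them to a decimal type on its side.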

Can I extract transactions from a scanned bank statement?

Yes. A scanned or photographed statement is an image with no text to copy, so you need OCR to read it. Both the converter and the API run OCR automatically on scanned PDFs and on JPG, PNG, HEIC, and TIFF photos, returning the same structured transactions you get from a digital PDF. With a homegrown Python script, OCR is an extra layer you have to add and tune yourself, which is one more reason teams reach for a managed parser as soon as real-world uploads include scans.
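
In a homegrown script, the first step of that extra layer is usually deciding which pages need OCR at all. Below is a minimal sketch, assuming you already have the text extracted per page (for example from pdfplumber's `page.extract_text()`, which yields nothing useful for an image-only page). The function name and the 20-character threshold are arbitrary assumptions to tune against your own files.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return the indexes of pages whose extracted text is missing or too
    short to be a real text layer, so they can be routed to OCR."""
    return [
        index
        for index, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


# Hypothetical per-page text: one digital page, then three image-like pages.
sample_texts = [
    "Date Description Amount\n01/03 COFFEE SHOP -4.50",
    "",
    None,
    "  7 ",
]
flagged = pages_needing_ocr(sample_texts)
print(flagged)
```

Pages that come back flagged would then go through an OCR engine such as pytesseract, and the recognized text would need the same row and amount cleanup as a digital page.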

Which method should you use?

Your situation | Best method
A few statements, no code, need a spreadsheet | No-code converter
Parsing inside your app, many banks, or high volume | Parser API
One fixed bank layout and you want full control | Python script

After you have the transactions

Bank statements are rarely the only document a finance workflow touches. The same API approach reads other documents like contracts and tax forms, turns invoices into structured data for accounts payable, and if files reach you by email you can parse incoming emails into JSON before they hit your queue. If you are building for lenders, the bank statement converter for lenders covers the underwriting angle, and automated bank statement processing handles incoming files hands-off once your pipeline is in place.