How to convert PDF tables to Excel

PDFs were never designed to be edited. When someone sends you a financial report, a price list or a registry export as a PDF and you need it in Excel, the right approach depends entirely on what kind of PDF it is. A digital PDF (created from Word or Excel) still contains the actual numbers as text, so extraction is fast and accurate. A scanned PDF is a picture of a table, so you need OCR — optical character recognition — to read the numbers off the image. This guide covers both, including the editing steps that turn ragged extraction into clean Excel data.

  1. Check whether the PDF is digital or scanned. Open the PDF and try to select a number with your cursor. If the cursor highlights the digits, it's a digital PDF — extraction will be accurate. If the cursor draws a rectangle (no text selectable), it's a scan and you'll need OCR.
  2. For digital PDFs: copy-paste. Select the table, copy, and paste into Excel. If columns merge into one, paste into Word first, then copy the table from Word to Excel — Word interprets the column structure better than Excel does directly.
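  3. For digital PDFs: use Excel's Get Data. Excel for Microsoft 365 on Windows has Data → Get Data → From File → From PDF; older perpetual versions such as Excel 2016 do not include the PDF connector. The Navigator lists the tables it detects on each page and lets you pick which to import. The result is fully editable, and column types are inferred automatically, though they are worth checking.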
  4. For scanned PDFs: run OCR first. Use Adobe Acrobat (Tools → Recognise Text → In This File) or a free alternative like OCRmyPDF. After OCR, the scan becomes a hybrid PDF you can copy text from — then follow the digital-PDF steps above.
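
The cursor test in step 1 can also be scripted when you have many files to sort. The following is a minimal sketch, not a definitive method: it assumes you already have the text extracted from each page, and it treats a page with fewer than `min_chars` characters as having no text layer. The function names and the threshold of 20 are illustrative assumptions. The helper `page_texts_from_pdf` assumes the third-party pypdf package is installed.

```python
def classify_pdf(page_texts, min_chars=20):
    """Classify a PDF as 'digital', 'scanned' or 'mixed'.

    page_texts: one string per page, as returned by a text extractor.
    min_chars: assumed threshold; pages with less text count as image-only.
    """
    if not page_texts:
        raise ValueError("no pages to classify")
    has_text = [len(text.strip()) >= min_chars for text in page_texts]
    if all(has_text):
        return "digital"
    if not any(has_text):
        return "scanned"
    return "mixed"


def page_texts_from_pdf(path):
    """Extract text per page. Assumption: pypdf is installed."""
    from pypdf import PdfReader  # imported lazily so classify_pdf works without it

    return [page.extract_text() or "" for page in PdfReader(path).pages]
```

A "mixed" result usually means some pages were scanned and inserted into an otherwise digital file, so only those pages need OCR.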

Real cases where this comes up

A bank statement export

Most banks export PDFs that are digital. Use Excel → Get Data → From PDF and you'll have one row per transaction in seconds. Convert dates to date format and you can pivot, sum, and filter immediately.

A scanned invoice from a supplier

Run OCR first (Acrobat or OCRmyPDF), then copy-paste the line items into Excel. Expect to fix a few characters where the OCR misread a 0 as O: even at around 99% character accuracy on a clean scan, roughly one character in a hundred is wrong, so check the totals against the original.

A government data report

Many open-data PDFs use multi-row headers that confuse extractors. Import to Excel, then merge the header rows manually and clean up the column names. Tools like Tabula are better for these complex layouts.
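
If you would rather script the header merge than do it by hand, the idea can be sketched as below. This is an illustration under assumptions: the header rows have already been exported as lists of strings, blank cells in the upper rows mean "same group as the cell to the left", and the function name `merge_header_rows` is hypothetical.

```python
def merge_header_rows(header_rows, sep=" "):
    """Collapse several header rows into one list of column names."""
    width = max(len(row) for row in header_rows)
    filled = []
    for index, row in enumerate(header_rows):
        cells = [cell.strip() for cell in row] + [""] * (width - len(row))
        if index < len(header_rows) - 1:
            # Upper rows: a blank cell inherits the spanning label to its left.
            last = ""
            for col, cell in enumerate(cells):
                if cell:
                    last = cell
                else:
                    cells[col] = last
        filled.append(cells)
    # Join the non-empty parts of each column from top to bottom.
    return [sep.join(part for part in column if part) for column in zip(*filled)]


rows = [
    ["Region", "2023", "", "2024", ""],
    ["", "Q1", "Q2", "Q1", "Q2"],
]
print(merge_header_rows(rows))
# ['Region', '2023 Q1', '2023 Q2', '2024 Q1', '2024 Q2']
```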

Catalogue prices from a competitor

If the catalogue is digital, Excel → Get Data picks up every product as one row. If it's scanned, OCR + manual cleanup is faster than retyping.

PDF-to-Excel pitfalls

Numbers came in as text

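Excel often imports digits as text, sometimes with hidden non-breaking spaces. Select the column → Data → Text to Columns → Delimited → Finish. That re-parses every cell and converts digit strings to numbers. If some cells still refuse to convert, remove the non-breaking spaces first with Find & Replace (on Windows, type the character into the Find box with Alt+0160 and leave Replace empty). For currency, format the column as Currency afterwards.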
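
The same cleanup can be done outside Excel before the data is loaded. The sketch below assumes US-style numbers (comma as thousands separator, point as decimal); the function name `text_to_number` and the list of invisible characters are illustrative choices, not a complete inventory.

```python
from decimal import Decimal, InvalidOperation

# Non-breaking space, narrow no-break space, thin space, zero-width space.
INVISIBLE = "\u00a0\u202f\u2009\u200b"


def text_to_number(cell):
    """Return the cell as a Decimal, or None if it is not a number."""
    cleaned = cell.translate({ord(ch): None for ch in INVISIBLE}).strip()
    cleaned = cleaned.replace(",", "")  # assumption: commas are thousands separators
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None
```

Cells that come back as `None` are the ones worth inspecting by hand, since they contain something other than digits and separators.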

Columns merged into one

PDF text is positioned, not tabulated. The extractor sees "Apple 1.20 1" as one string. Try the paste-via-Word trick, or use Excel's Get Data which is much better at detecting column boundaries from coordinates.
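
To see why coordinates help, here is a minimal sketch of gap-based splitting for a single line of text. It assumes an extractor has already given you each word with its left and right x-position in PDF points; the tuple layout, the function name and the 8-point gap threshold are all hypothetical.

```python
def split_row_by_gaps(words, max_gap=8.0):
    """Group the words of one line into cells.

    words: list of (x0, x1, text) tuples, where x0 and x1 are the left and
    right edges of the word. A horizontal gap wider than max_gap starts a
    new cell; a narrower gap is treated as an ordinary space inside a cell.
    """
    cells = []
    prev_x1 = None
    for x0, x1, text in sorted(words):
        if prev_x1 is not None and x0 - prev_x1 <= max_gap:
            cells[-1] += " " + text
        else:
            cells.append(text)
        prev_x1 = x1
    return cells


line = [(72.0, 95.0, "Green"), (98.0, 125.0, "Apple"),
        (200.0, 220.0, "1.20"), (300.0, 306.0, "1")]
print(split_row_by_gaps(line))
# ['Green Apple', '1.20', '1']
```

This sketch does not detect empty cells, so a row with a missing value would come out one column short; real extractors also compare positions across rows.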

OCR mis-read characters

0 vs O, 1 vs l, 5 vs S are the usual culprits. Run a Find-Replace pass on the suspect column. Better: re-scan at 300 DPI or higher; OCR accuracy drops sharply below 200 DPI.
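
A Find-Replace pass can be approximated in a script, as long as it only touches columns you know should be numeric. The sketch below swaps the three look-alikes named above and keeps the change only if the result looks like a number; the names and the regular expression are illustrative assumptions.

```python
import re

# Letter -> digit swaps for the usual OCR confusions: O/0, l/1, S/5.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "l": "1", "S": "5"})

# Assumed shape of a numeric cell: optional sign, digits, separators.
NUMERIC = re.compile(r"^[+-]?\d[\d.,]*$")


def fix_numeric_cell(cell):
    """Repair look-alike letters in a cell from a numeric column.

    Returns the original cell unchanged when the repaired text still
    does not look like a number.
    """
    candidate = cell.strip().translate(OCR_DIGIT_FIXES)
    return candidate if NUMERIC.match(candidate) else cell


print(fix_numeric_cell("1O5.S0"))  # 105.50
print(fix_numeric_cell("SOLD"))    # SOLD
```

Do not run this over text columns: a short label made only of these letters would be turned into digits.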

Multi-page tables import as separate sheets

Excel's Get Data treats the table on each page as a separate query, so a table that spans several pages loads as several sheets. After import, copy each sheet's data underneath the previous one in a master sheet, or use Power Query's Append Queries to combine them automatically.
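
If the per-page tables are exported as lists of rows, the append step amounts to stacking them and dropping the header that repeats on each page. A minimal sketch, assuming the first row of the first table is the header; the function name `append_tables` is hypothetical.

```python
def append_tables(tables):
    """Stack per-page tables into one, keeping a single header row.

    tables: list of tables, each a list of rows (lists of strings).
    A page whose first row equals the header has that row dropped.
    """
    tables = [table for table in tables if table]  # ignore empty pages
    if not tables:
        return []
    header = tables[0][0]
    combined = [header]
    for table in tables:
        rows = table[1:] if table[0] == header else table
        combined.extend(rows)
    return combined
```

Pages that continue the table without repeating the header are appended as they are.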

Decimal separators are wrong

Many European PDFs write numbers as "1.234,56"; Excel with US regional settings reads this as text. In Power Query, convert the column with a matching locale (Change Type → Using Locale, then pick for example German or Dutch) so the decimals parse correctly.
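
What the locale setting does can be sketched in a few lines. This is an illustration, assuming every value in the column uses a point or space for thousands and a comma for decimals; the function name `parse_european_number` is hypothetical.

```python
from decimal import Decimal


def parse_european_number(text):
    """Parse '1.234,56' style text into a Decimal.

    Assumption: '.' and spaces are thousands separators, ',' is the decimal mark.
    """
    cleaned = text.strip()
    for separator in (".", " ", "\u00a0"):
        cleaned = cleaned.replace(separator, "")
    return Decimal(cleaned.replace(",", "."))


print(parse_european_number("1.234,56"))  # 1234.56
```

Applying this to a column that is actually in US format would silently produce wrong values, which is why the locale has to be chosen per source rather than guessed per cell.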
