Menu OCR

Layout-aware OCR pipeline for extracting structured menu items and prices from noisy restaurant photos.

November 8, 2023 · 7 min read

Problem

Restaurant menu images are highly inconsistent in layout, font quality, and lighting, which makes raw OCR output unreliable for downstream structured use.

Solution

Built a modular OCR system combining PaddleOCR and OpenCV preprocessing, then applied DBSCAN-based column grouping, FSM/Viterbi hierarchy enforcement, and Hungarian matching for robust item-price pairing.
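
As a rough illustration of the column-grouping step, here is a minimal sketch. The write-up does not say which features were clustered, so this assumes the left-edge x-coordinates of the OCR boxes, and it uses a small pure-Python DBSCAN so that it runs without dependencies. In practice a library implementation such as scikit-learn's DBSCAN would be the usual choice. The names dbscan_1d, eps, and min_samples are illustrative, not taken from the project.

```python
def dbscan_1d(xs, eps, min_samples):
    """Cluster 1-D coordinates with DBSCAN; returns one label per point, -1 = noise."""
    n = len(xs)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(n) if abs(xs[i] - xs[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_samples:
                queue.extend(nj)  # j is a core point, so keep expanding
    return labels


# Assumed input: left-edge x-coordinates (pixels) of OCR text boxes.
left_edges = [40, 42, 45, 41, 320, 318, 322, 600]
print(dbscan_1d(left_edges, eps=10, min_samples=2))
# [0, 0, 0, 0, 1, 1, 1, -1]
```

Boxes sharing a label are treated as one column; the stray box at x=600 is labeled noise rather than forced into a column, which is the main reason to prefer a density-based method over a fixed number of clusters here.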

Pipeline

  1. Image normalization for contrast, skew, and noise reduction.
  2. OCR extraction of text candidates and bounding regions.
  3. Layout reconstruction to infer columns and item blocks.
  4. Semantic correction to map sections, items, and prices.
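
The semantic correction step can be sketched as Viterbi decoding over a small state machine. This is a hedged illustration only: the states, the transition scores, and the emission heuristics (uppercase text suggests a section, a numeric pattern suggests a price) are assumptions made for the example, not the project's actual model.

```python
import re

STATES = ("SECTION", "ITEM", "PRICE")
NEG_INF = float("-inf")

# Assumed hierarchy rules as log-scores; NEG_INF forbids a transition.
# Here a price may only follow an item.
TRANS = {
    ("START", "SECTION"): 0.0, ("START", "ITEM"): -1.0, ("START", "PRICE"): NEG_INF,
    ("SECTION", "SECTION"): -2.0, ("SECTION", "ITEM"): 0.0, ("SECTION", "PRICE"): NEG_INF,
    ("ITEM", "SECTION"): -1.0, ("ITEM", "ITEM"): -0.5, ("ITEM", "PRICE"): 0.0,
    ("PRICE", "SECTION"): -0.5, ("PRICE", "ITEM"): 0.0, ("PRICE", "PRICE"): NEG_INF,
}


def emission(token, state):
    """Heuristic log-score for how well a token fits a state (illustrative)."""
    looks_like_price = re.fullmatch(r"[$€£]?\d+([.,]\d{2})?", token) is not None
    if state == "PRICE":
        return 0.0 if looks_like_price else -5.0
    if looks_like_price:
        return -5.0
    if state == "SECTION":
        return 0.0 if token.isupper() else -2.0
    return -0.5 if token.isupper() else 0.0  # ITEM


def viterbi(tokens):
    """Return the highest-scoring state sequence for tokens in reading order."""
    if not tokens:
        return []
    # score[s] = best log-score of any labeling that ends in state s
    score = {s: TRANS[("START", s)] + emission(tokens[0], s) for s in STATES}
    back = []
    for tok in tokens[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + TRANS[(p, s)])
            new_score[s] = score[prev] + TRANS[(prev, s)] + emission(tok, s)
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


tokens = ["STARTERS", "Garlic Bread", "4.50", "MAINS", "Lasagne", "12.00"]
print(viterbi(tokens))
# ['SECTION', 'ITEM', 'PRICE', 'SECTION', 'ITEM', 'PRICE']
```

Because the forbidden transitions carry a score of negative infinity, the decoder never emits a labeling that violates them, whatever the per-token heuristics say.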

Why It Worked

The quality improvement came from treating OCR as a full information extraction problem rather than just text recognition, with explicit modeling of layout and business rules.
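
To make "explicit modeling of layout and business rules" concrete, item-price pairing can be framed as a minimum-cost assignment. The sketch below is an assumption-laden illustration: the cost function (vertical gap plus a penalty when the price is not to the right of the item) and the max_cost cutoff are invented for the example. It also swaps in a brute-force search over permutations in place of the Hungarian algorithm; both find the same minimum-cost assignment, but brute force is only practical for a handful of boxes. At realistic sizes, scipy.optimize.linear_sum_assignment would be the usual choice.

```python
from itertools import permutations


def pair_cost(item, price):
    """Cost of pairing an item box with a price box; lower is better."""
    _, ix, iy = item
    _, px, py = price
    cost = abs(iy - py)  # boxes on the same text line have a small vertical gap
    if px <= ix:
        cost += 1000.0   # assumed rule: a price sits to the right of its item
    return cost


def pair_items_with_prices(items, prices, max_cost=20.0):
    """Return (item_text, price_text) pairs that minimize the total pairing cost.

    items and prices are lists of (text, x, y) tuples. Brute force stands in
    for the Hungarian algorithm here.
    """
    if len(prices) > len(items):
        raise ValueError("sketch assumes at most one price per item")
    best_total, best_perm = float("inf"), ()
    # perm[j] is the index of the item assigned to price j.
    for perm in permutations(range(len(items)), len(prices)):
        total = sum(pair_cost(items[i], prices[j]) for j, i in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    # Drop pairs whose individual cost is implausibly high.
    return [
        (items[i][0], prices[j][0])
        for j, i in enumerate(best_perm)
        if pair_cost(items[i], prices[j]) <= max_cost
    ]


items = [("Margherita", 40, 100), ("Pepperoni", 40, 130), ("Garlic Bread", 40, 160)]
prices = [("9.50", 300, 102), ("11.00", 300, 131)]
print(pair_items_with_prices(items, prices))
# [('Margherita', '9.50'), ('Pepperoni', '11.00')]
```

Solving the pairing globally, rather than greedily taking the nearest price for each item, stops one slightly misplaced box from stealing a neighbor's price.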