The problem.

I'm a student. My professors are great at teaching, but their course materials are stuck in the past: scanned photocopies of handwritten notes from the 80s, image-only PDFs with no text layer, no copy-paste, no search. Dense physics, statics, fluid dynamics, differential equations, tensor notation, all locked inside blurry images. I tried the usual OCR tools. Tesseract chokes on ∂²u/∂x². Google Docs mangles integrals. Nothing really works for STEM content.

The idea.

So I built Palimpsest: a tool that takes those old scans and rewrites them as proper, structured, beautifully typeset LaTeX documents. Drop a PDF, get back a clean .tex source and a compiled .pdf. That's it. The name comes from old manuscript pages that were scraped clean and written over, which is basically what the tool does to a tired scan.

How it works.

Under the hood it's a small pipeline: extract pages at high DPI, clean them up with OpenCV (binarize, deskew, denoise), then send each page to a vision model that reads the formulas directly and produces LaTeX. A context layer keeps track of variables and notation across pages so equations stay consistent over a 50-page document. Everything is then merged into a single .tex file and compiled with xelatex. There's a CLI for power users and a small web UI for everyone else: drag, drop, watch the pages get processed live, download the result.
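To make the context layer and the merge step a bit more concrete, here is a minimal sketch in Python. It is not Palimpsest's actual code: the names `NotationContext`, `as_prompt`, `merge_pages` and `PREAMBLE` are hypothetical, and the "first definition wins" rule is just one plausible way to keep notation stable across pages. The vision-model call itself is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class NotationContext:
    """Running glossary of symbols seen so far (hypothetical structure)."""

    symbols: dict = field(default_factory=dict)

    def update(self, new_symbols):
        # Assumption: the first meaning recorded for a symbol is kept,
        # so a later page cannot silently redefine it.
        for symbol, meaning in new_symbols.items():
            self.symbols.setdefault(symbol, meaning)

    def as_prompt(self):
        # Text that could be prepended to the next page's request
        # so the model reuses the notation already established.
        if not self.symbols:
            return "No notation established yet."
        lines = [f"{s}: {m}" for s, m in sorted(self.symbols.items())]
        return "Notation used so far:\n" + "\n".join(lines)


# Illustrative preamble only; a real one would load more packages.
PREAMBLE = r"""\documentclass{article}
\usepackage{amsmath}
\begin{document}
"""


def merge_pages(page_bodies):
    """Join per-page LaTeX bodies into one document string."""
    body = "\n\n".join(b.strip() for b in page_bodies)
    return PREAMBLE + body + "\n\\end{document}\n"
```

In this sketch, each processed page would call `update` with whatever symbols it introduced, and the next page's request would start with `as_prompt()`. The string returned by `merge_pages` is what would be written to a .tex file and handed to xelatex.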

Open source.

Palimpsest is free and open source under the MIT licence. I built it for myself, but it solves a problem a lot of students share, so I put it on GitHub for anyone who wants to use it, fork it, or contribute. Issues and PRs are welcome. If you're also stuck with terrible scanned PDFs, jump in. The repo is here: github.com/cmrabdu/Palimpsest.

I like small tools that quietly remove a daily source of friction.

Why I built it.

This one started as a weekend hack to make my own exam revision easier and grew into a proper little project, with a Docker image, a self-hosted deployment, and a roadmap I keep poking at. No profit, no startup, no catch. Just a way to make old knowledge accessible again.