Project Overview

GPT Document Intelligence

Upload entire PDF libraries and query them with natural language. Built on LangChain, OpenAI embeddings, and Pinecone. Features hybrid BM25 + dense retrieval, citation tracking, and a streaming chat UI. Handles 10k-page corpora with sub-200ms retrieval.
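A minimal sketch of how hybrid BM25 + dense retrieval can be combined, for illustration only. The project's actual fusion method is not stated here; this example assumes reciprocal rank fusion (RRF), a pure-Python BM25 scorer, and small hand-written vectors standing in for OpenAI embeddings and a Pinecone index. All function names (`bm25_scores`, `dense_scores`, `rrf_fuse`, `hybrid_search`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would normalize further.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def dense_scores(query_vec, doc_vecs):
    """Cosine similarity between the query vector and each document vector."""
    return [cosine(query_vec, v) for v in doc_vecs]


def rank(scores):
    # Document indices ordered from highest to lowest score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) across all rankings."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common()]


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3):
    """Fuse the sparse (BM25) and dense rankings, then return the top_k indices."""
    sparse = rank(bm25_scores(query, docs))
    dense = rank(dense_scores(query_vec, doc_vecs))
    return rrf_fuse([sparse, dense])[:top_k]


if __name__ == "__main__":
    docs = [
        "pinecone stores dense vectors",
        "bm25 ranks by term frequency",
        "fastapi serves the chat ui",
    ]
    # Toy 3-dimensional vectors; assumed stand-ins for real embeddings.
    doc_vecs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.1, 0.9]]
    query_vec = [0.1, 0.9, 0.0]
    print(hybrid_search("bm25 term ranking", query_vec, docs, doc_vecs, top_k=2))
```

RRF is used in this sketch because it combines rankings without requiring BM25 and cosine scores to be on the same scale; a weighted sum of normalized scores is another common choice.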

Python, LangChain, OpenAI, Pinecone, FastAPI
Private
GALLERY
Hybrid BM25 + dense retrieval results
PDF library upload & indexing pipeline

Streaming natural language chat with citations