On-premises AI document processing system for a law firm that handles depositions, contracts, and case files while maintaining attorney-client privilege.

Qwen 2.5 (72B) · Python · LangChain · FastAPI · Docker · PyTorch · NVIDIA H100 · Tesseract OCR · Whisper
A major international law firm approached us in early 2024 needing an AI system to process legal documents faster. They couldn't use cloud-based AI services like ChatGPT or Claude because attorney-client privilege required all data to stay on their own servers. Document review was taking weeks, and they needed a solution that would speed up this process while keeping everything secure.
We built an on-premises AI system using Qwen 2.5 that runs entirely on their hardware. The system processes depositions, contracts, legal briefs, and case files. It extracts key information, generates summaries, and helps attorneys find relevant information quickly.
The firm's attorneys spent most of their time manually reading through depositions, discovery documents, and case files. For a typical case, they would receive hundreds or thousands of pages of documents that needed to be reviewed, tagged, and summarized. This process took 4-6 weeks per case.
They had looked at commercial AI services but couldn't use them. Attorney-client privilege means they cannot send confidential case information to external servers. Everything had to run on their own infrastructure with no data leaving their network.
We implemented the system in three stages over six months. Each stage built on the previous one to gradually introduce AI capabilities.
| Stage | Focus Area | Status | Key Deliverables |
|---|---|---|---|
| 1 | Infrastructure Setup | Completed | NVIDIA H100 server installation, Qwen 2.5 model deployment, security configuration, network isolation, encrypted storage |
| 2 | Document Processing | Completed | OCR for scanned documents, text extraction pipeline, deposition transcription, document categorization, search indexing |
| 3 | AI Analysis Tools | Completed | Document summarization, key fact extraction, timeline generation, citation finder, question answering interface |
We deployed Qwen 2.5 (72 billion parameters) on a server with four NVIDIA H100 GPUs. The model runs entirely on their premises with no internet connection to the AI components. All data stays within their network.
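As a rough sanity check on that hardware choice, the arithmetic below estimates the memory needed for the model weights alone. It assumes 16-bit weights and the 80 GB H100 variant, neither of which is stated above, and it ignores KV cache and activation memory, so treat the headroom figure as an upper bound.

```python
# Back-of-the-envelope memory check. All constants are assumptions,
# not measurements from the deployed system.
PARAMS = 72e9            # parameter count implied by the model name (72B)
BYTES_PER_PARAM = 2      # assumes 16-bit weights (FP16 or BF16)
GPU_MEMORY_GB = 80       # assumes the 80 GB H100 variant
NUM_GPUS = 4             # from the deployment description


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9


weights_gb = weight_memory_gb(PARAMS, BYTES_PER_PARAM)
total_gb = GPU_MEMORY_GB * NUM_GPUS
headroom_gb = total_gb - weights_gb  # left for KV cache, activations, etc.

print(f"weights: {weights_gb:.0f} GB")
print(f"total GPU memory: {total_gb} GB")
print(f"headroom: {headroom_gb:.0f} GB")
```

Under these assumptions the weights do not fit on a single GPU, which is consistent with spreading the model across several cards.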
The system handles multiple document types: scanned PDFs require OCR processing first, audio depositions go through speech-to-text conversion, and native digital documents are processed directly. We set up automated pipelines that process incoming documents overnight so attorneys have results ready each morning.
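The routing described above can be sketched as a small dispatch function. This is an illustration only: the `Route` enum, the suffix sets, and the `has_text_layer` flag are hypothetical names, and a real pipeline would detect a PDF's text layer by inspecting the file rather than taking it as an argument.

```python
from enum import Enum
from pathlib import Path


class Route(Enum):
    OCR = "ocr"                        # scanned documents
    SPEECH_TO_TEXT = "speech_to_text"  # audio depositions
    DIRECT = "direct"                  # native digital documents


# Hypothetical suffix lists; the actual set of accepted formats is not
# given in the case study.
AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def choose_route(filename: str, has_text_layer: bool = True) -> Route:
    """Pick a processing route from the file suffix and text-layer flag."""
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_SUFFIXES:
        return Route.SPEECH_TO_TEXT
    if suffix in IMAGE_SUFFIXES:
        return Route.OCR
    if suffix == ".pdf" and not has_text_layer:
        return Route.OCR
    return Route.DIRECT


if __name__ == "__main__":
    for name, text_layer in [("deposition_01.wav", True),
                             ("exhibit_scan.pdf", False),
                             ("contract.docx", True)]:
        print(name, "->", choose_route(name, text_layer).value)
```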
The system extracts text from scanned documents using OCR, then applies the Qwen model to interpret content and structure. For depositions, it transcribes audio recordings and identifies speakers, questions, and answers. The model recognizes legal terminology and reads it in the context of the surrounding document.
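The case study does not say how questions and answers are identified. As one simple illustration, the sketch below pairs them with a rule-based pass, assuming the transcript already marks lines with `Q.` and `A.` prefixes. The `Exchange` class and `pair_questions_and_answers` function are hypothetical names, not part of the deployed system.

```python
import re
from dataclasses import dataclass


@dataclass
class Exchange:
    question: str
    answer: str


# Assumes each question or answer sits on one line starting with "Q." or "A."
# (a colon is also accepted). Real transcripts are messier than this.
LINE_RE = re.compile(r"^\s*([QA])[.:]\s*(.+)$")


def pair_questions_and_answers(transcript: str) -> list[Exchange]:
    """Pair each question with the first answer that follows it."""
    exchanges: list[Exchange] = []
    pending_question = None
    for line in transcript.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip headers, page numbers, and other unlabeled lines
        label, text = match.groups()
        if label == "Q":
            pending_question = text.strip()
        elif pending_question is not None:
            exchanges.append(Exchange(pending_question, text.strip()))
            pending_question = None
    return exchanges


sample = """\
Q. Did you sign the contract?
A. Yes, on March 3.
Q. Who else was present?
A. My business partner.
"""

exchanges = pair_questions_and_answers(sample)
for ex in exchanges:
    print(f"Q: {ex.question}\nA: {ex.answer}\n")
```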
Attorneys can upload documents through a web interface. The system categorizes each document (deposition, contract, motion, discovery, etc.), extracts key dates and parties, and generates a summary. Everything is searchable, and attorneys can ask questions about the documents in natural language.
Timeline Generator: Automatically creates chronological timelines by extracting dates and events from multiple documents. For a case with 50 depositions and hundreds of exhibits, it builds a timeline showing what happened when.
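A minimal sketch of the idea behind a timeline like this: find dates in each document, keep the sentence each date appears in, and sort everything chronologically. It assumes dates written as "Month D, YYYY" and uses a regular expression in place of the model-driven extraction described above. `TimelineEntry` and `build_timeline` are hypothetical names.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class TimelineEntry(NamedTuple):
    when: date
    source: str
    sentence: str


# Only matches dates such as "March 3, 2021". Other formats are ignored.
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def build_timeline(documents: dict[str, str]) -> list[TimelineEntry]:
    """Collect dated sentences from all documents, oldest first."""
    entries = []
    for source, text in documents.items():
        for sentence in SENTENCE_SPLIT_RE.split(text):
            for match in DATE_RE.finditer(sentence):
                # %B relies on English month names (the default C locale).
                when = datetime.strptime(match.group(), "%B %d, %Y").date()
                entries.append(TimelineEntry(when, source, sentence.strip()))
    return sorted(entries)  # tuples sort by date first


docs = {
    "deposition_smith.txt": "The parties met on March 3, 2021. "
                            "The contract was signed on April 12, 2021.",
    "exhibit_14.txt": "An invoice was issued on January 15, 2021.",
}

timeline = build_timeline(docs)
for entry in timeline:
    print(entry.when.isoformat(), f"[{entry.source}]", entry.sentence)
```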
Citation Finder: Identifies when documents reference case law, statutes, or other legal precedents. Attorneys can quickly see what legal authorities are mentioned in their case materials.
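To illustrate the kind of matching involved, the sketch below finds two common citation shapes with regular expressions: U.S. Reports case citations and U.S. Code sections. The patterns and the `find_citations` function are assumptions made for this example. Real citation formats vary far more than this, and the case study does not describe how the production system detects them.

```python
import re

# Two illustrative patterns only; reporters, state codes, and short-form
# citations are not covered.
CITATION_PATTERNS = {
    "case (U.S. Reports)": re.compile(r"\b\d{1,3} U\.S\. \d{1,4}\b"),
    "statute (U.S. Code)": re.compile(
        r"\b\d{1,2} U\.S\.C\. § ?\d+[a-z]?(?:\([a-z0-9]+\))*"
    ),
}


def find_citations(text: str) -> list[tuple[str, str]]:
    """Return (kind, citation) pairs in the order they appear in the text."""
    found = []
    for kind, pattern in CITATION_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), kind, match.group()))
    return [(kind, cite) for _, kind, cite in sorted(found)]


text = (
    "Plaintiff relies on Brown v. Board of Education, 347 U.S. 483 (1954), "
    "and brings a claim under 42 U.S.C. § 1983."
)

citations = find_citations(text)
for kind, cite in citations:
    print(f"{kind}: {cite}")
```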
Fact Extraction: Pulls out key facts, names, dates, locations, and dollar amounts. This saves attorneys from manually highlighting and tagging these elements.
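The more regular of these elements, such as dollar amounts and numeric dates, can be pulled out with patterns alone, as the sketch below shows. Names and locations generally need a model or a named-entity recognizer and are left out. The function name and both patterns are illustrative assumptions.

```python
import re

# "$4,500", "$1,250,000.00", "$4500"
MONEY_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")
# Numeric dates such as 03/15/2023; the field order is not validated.
NUMERIC_DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract_facts(text: str) -> dict[str, list[str]]:
    """Return dollar amounts and numeric dates in order of appearance."""
    return {
        "amounts": MONEY_RE.findall(text),
        "dates": NUMERIC_DATE_RE.findall(text),
    }


text = ("On 03/15/2023 the buyer wired $1,250,000.00 "
        "and a further $4,500 on 04/01/2023.")

facts = extract_facts(text)
print(facts["amounts"])
print(facts["dates"])
```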
Question Answering: Attorneys type questions like 'What did witness X say about the contract?' and the system searches through all documents to provide relevant excerpts with source citations.
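The retrieval half of a flow like this can be sketched with plain term overlap: score each passage by how often the question's terms occur in it, then return the best passages with their sources. This stands in for whatever search index and model prompting the production system uses, which the case study does not detail. The `retrieve` function, the stopword list, and the sample passages are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one
# or a different retrieval method altogether.
STOPWORDS = {"what", "did", "the", "a", "an", "about", "say",
             "of", "to", "in", "on", "and"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]


def retrieve(question: str, passages: list[tuple[str, str]],
             top_k: int = 2) -> list[tuple[int, str, str]]:
    """Return up to top_k (score, source, text) tuples, best match first."""
    query_terms = set(tokenize(question))
    scored = []
    for source, text in passages:
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, source, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


passages = [
    ("deposition_smith.txt, p. 12",
     "Smith testified that the contract was signed without review."),
    ("exhibit_14.txt, p. 1",
     "The invoice lists delivery charges."),
    ("deposition_jones.txt, p. 4",
     "Jones said the contract terms were changed after the meeting."),
]

results = retrieve("What did Smith say about the contract?", passages)
for score, source, text in results:
    print(f"[{source}] (score {score}) {text}")
```

Keeping the source label attached to every excerpt is what lets an answer point back to the page it came from.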
Document review time dropped from 4-6 weeks to 1-2 weeks per case. Attorneys spend less time searching for information and more time on legal strategy. The firm can now handle more cases with the same number of attorneys.
The system processes about 2,000 documents per day. OCR accuracy is around 98% for typed documents and 95% for handwritten notes. The AI summaries are accurate enough that attorneys use them as starting points rather than reading every document from scratch.
Because everything runs on its own servers, the firm maintains complete control over confidential information. It passed security audits conducted by its state bar association and by major clients concerned about data handling.
If your team is one or two unknowns away from a system like this one, a thirty-minute call is the fastest way to find out.
Engagements range from two-week diagnostics to multi-month builds, scoped after a single discovery call.
Every project on this page shipped because we said no to the wrong scope before we said yes to the right one. Half the value of working with us is the engagement we will not take. The other half is the system that ends up running in your business.
Healthcare, defense-adjacent, and enterprise clients sign NDAs that prevent naming. Engagement scope, technology stack, and measured outcomes can be shared publicly. Client identity stays protected.