flowchart LR
    A[PDF Input] --> B[Text Extraction]
    B --> C[LLM Analysis]
    C --> D[Error Detection]
    D --> E[Report Generation]
    E --> F[Visualizations]
VeritaScribe Documentation
AI-Powered Thesis Review Tool
1 Welcome to VeritaScribe
VeritaScribe is an intelligent document analysis system that automatically reviews PDF thesis documents for quality issues including grammar errors, content plausibility problems, and citation format inconsistencies.
1.1 What is VeritaScribe?
VeritaScribe combines advanced AI language models with structured document processing to provide comprehensive academic document review. It is built with modern Python tools, including DSPy for LLM orchestration, Pydantic for structured data modeling, and PyMuPDF for PDF processing.
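To make the PDF-processing part of this stack concrete, here is a minimal sketch of location-aware text extraction with PyMuPDF. It is not VeritaScribe's actual extraction code: the `TextBlock` type and both function names are hypothetical. The only library behavior relied on is that `page.get_text("blocks")` returns tuples of the form `(x0, y0, x1, y1, text, block_no, block_type)`.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class TextBlock:
    """Extracted text plus its position on the page (hypothetical type)."""
    page: int                                # 1-based page number
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points
    text: str


def blocks_from_page(page_number: int, raw_blocks: Iterable[tuple]) -> List[TextBlock]:
    """Convert PyMuPDF block tuples into TextBlock records.

    Each tuple is (x0, y0, x1, y1, text, block_no, block_type);
    block_type 0 marks a text block, 1 marks an image block.
    """
    result = []
    for x0, y0, x1, y1, text, _block_no, block_type in raw_blocks:
        if block_type != 0 or not text.strip():
            continue  # skip image blocks and whitespace-only text
        result.append(TextBlock(page_number, (x0, y0, x1, y1), text.strip()))
    return result


def extract_blocks(pdf_path: str) -> List[TextBlock]:
    """Read every page of a PDF and return its text blocks with locations."""
    import fitz  # PyMuPDF, imported lazily so the helper above runs without it

    blocks: List[TextBlock] = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc, start=1):
            blocks.extend(blocks_from_page(index, page.get_text("blocks")))
    return blocks
```

Keeping the bounding box with each block is what would let a later stage report an error at a specific place on a specific page rather than only quoting the text.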
1.2 Key Features
1.2.1 🔍 Comprehensive Analysis
- Grammar and linguistic error detection
- Content plausibility validation
- Citation format verification
- Error severity classification
1.2.2 📊 Smart Reporting
- Detailed error reports with locations
- Visual analytics and charts
- JSON data export
- Markdown reports
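As a rough illustration of the JSON and Markdown outputs listed above, the sketch below renders the same findings in both formats using only the standard library. The record fields (`page`, `type`, `severity`, `message`) and the sample data are assumptions made for this example, not VeritaScribe's real report schema.

```python
import json
from collections import Counter
from typing import Dict, List

# Hypothetical finding records; field names and values are illustrative only.
findings: List[Dict] = [
    {"page": 7, "type": "citation", "severity": "high", "message": "Reference missing from bibliography."},
    {"page": 3, "type": "grammar", "severity": "low", "message": "Subject-verb disagreement."},
    {"page": 7, "type": "grammar", "severity": "medium", "message": "Run-on sentence."},
]


def to_json(items: List[Dict]) -> str:
    """Serialize findings together with a per-type count."""
    summary = dict(Counter(item["type"] for item in items))
    return json.dumps({"summary": summary, "findings": items}, indent=2)


def to_markdown(items: List[Dict]) -> str:
    """Render findings as a Markdown table ordered by page number."""
    lines = ["| Page | Type | Severity | Message |", "| --- | --- | --- | --- |"]
    for item in sorted(items, key=lambda f: f["page"]):
        lines.append(
            f"| {item['page']} | {item['type']} | {item['severity']} | {item['message']} |"
        )
    return "\n".join(lines)


print(to_json(findings))
print(to_markdown(findings))
```

The JSON form suits further processing by other tools, while the Markdown table is meant to be read directly.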
1.2.3 ⚙️ Flexible Configuration
- Multiple LLM model support
- Customizable analysis parameters
- Citation style configuration
- Processing optimization settings
1.2.4 🚀 Easy to Use
- Command-line interface
- Quick analysis mode
- Demo mode for testing
- Comprehensive error messages
1.3 How It Works
- PDF Processing: Extracts text while preserving layout and location information
- AI Analysis: Uses large language models to analyze content for various types of errors
- Error Classification: Categorizes and scores errors by type and severity
- Report Generation: Creates comprehensive reports and visualizations
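The four stages above can be sketched as a chain of plain functions. This is a simplified illustration, not the tool's implementation: every name is hypothetical, the severity mapping is invented, and a keyword-based `toy_analyzer` stands in for the LLM call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    location: str
    kind: str
    severity: str = "unrated"


def extract(pages: List[str]) -> List[Dict[str, str]]:
    """Stage 1: split page text into paragraphs, keeping their locations."""
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        for para_no, para in enumerate(text.split("\n\n"), start=1):
            if para.strip():
                chunks.append({"location": f"p{page_no}:para{para_no}", "text": para.strip()})
    return chunks


def analyze(chunks: List[Dict[str, str]], analyzer: Callable[[str], List[str]]) -> List[Finding]:
    """Stage 2: ask the analyzer (an LLM in the real tool) for issue kinds."""
    return [Finding(c["location"], kind) for c in chunks for kind in analyzer(c["text"])]


# Assumed mapping, chosen only for this example.
SEVERITY_BY_KIND = {"grammar": "low", "citation": "medium", "content": "high"}


def classify(findings: List[Finding]) -> List[Finding]:
    """Stage 3: attach a severity to each finding based on its kind."""
    for finding in findings:
        finding.severity = SEVERITY_BY_KIND.get(finding.kind, "low")
    return findings


def report(findings: List[Finding]) -> str:
    """Stage 4: render one line per finding."""
    return "\n".join(f"{f.location}: {f.kind} ({f.severity})" for f in findings)


def toy_analyzer(text: str) -> List[str]:
    """Keyword stand-in for the LLM so the sketch runs without an API key."""
    kinds = []
    if "teh" in text:
        kinds.append("grammar")
    if "[?]" in text:
        kinds.append("citation")
    return kinds


pages = ["This is teh first result.\n\nPrior work [?] disagrees.", "All claims hold."]
summary = report(classify(analyze(extract(pages), toy_analyzer)))
print(summary)
```

Passing the analyzer in as a function keeps the LLM-dependent step replaceable, which is convenient for testing the other stages in isolation.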
1.4 Quick Start
Get started with VeritaScribe in just a few steps:
Install dependencies:
uv sync
Configure API key:
cp .env.example .env
# Edit .env to add your OpenAI API key
Try the demo:
uv run python -m veritascribe demo
Analyze your document:
uv run python -m veritascribe analyze your_thesis.pdf
1.5 Error Types Detected
1.5.1 Grammar and Linguistics
- Spelling mistakes and typos
- Grammatical inconsistencies
- Punctuation errors
- Style and readability issues
1.5.2 Content Quality
- Logical inconsistencies
- Factual accuracy concerns
- Argument structure problems
- Citation-content mismatches
1.5.3 Citation Format
- Incorrect citation style formatting
- Missing or incomplete references
- Inconsistent bibliography formatting
- Citation accuracy issues
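The three categories above map naturally onto a small typed record. The sketch below uses standard-library dataclasses and enums as a stand-in for the Pydantic schemas the tool is described as using, so it runs without extra dependencies. All class names, field names, and severity levels are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorType(str, Enum):
    GRAMMAR = "grammar"    # spelling, punctuation, style
    CONTENT = "content"    # logic, plausibility, argument structure
    CITATION = "citation"  # style, missing references, bibliography


class Severity(str, Enum):  # assumed three-level scale
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class DetectedError:
    error_type: ErrorType
    severity: Severity
    page: int
    excerpt: str
    suggestion: str = ""

    def __post_init__(self) -> None:
        # Reject records that could not be located in the document.
        if self.page < 1:
            raise ValueError("page numbers start at 1")
        if not self.excerpt.strip():
            raise ValueError("excerpt must not be empty")


def parse_error(raw: dict) -> DetectedError:
    """Coerce a loosely typed dict (e.g. parsed model output) into a record.

    Unknown error types or severities raise ValueError from the Enum lookup.
    """
    return DetectedError(
        error_type=ErrorType(raw["error_type"]),
        severity=Severity(raw["severity"]),
        page=int(raw["page"]),
        excerpt=raw["excerpt"],
        suggestion=raw.get("suggestion", ""),
    )


example = parse_error(
    {"error_type": "citation", "severity": "high", "page": "12", "excerpt": "(Smith 2020"}
)
print(example)
```

Validating at the boundary means a malformed model response fails loudly when it is parsed instead of surfacing later as a confusing report entry.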
1.6 Architecture Overview
VeritaScribe follows a modular pipeline architecture:
- Configuration Layer: Environment-based settings management
- PDF Processing: Text extraction with layout preservation
- LLM Analysis: DSPy-based structured analysis modules
- Data Models: Pydantic schemas for type safety
- Report Generation: Multi-format output with visualizations
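To illustrate what environment-based settings management can look like, here is a minimal sketch of a configuration layer that reads values from environment variables and fails early when the API key is missing. The variable names (`OPENAI_API_KEY`, `EXAMPLE_MODEL`, `EXAMPLE_CITATION_STYLE`, `EXAMPLE_MAX_TOKENS`) and all defaults are assumptions for this example; the settings VeritaScribe actually reads are listed in its Configuration Reference. The sketch also assumes the contents of `.env` have already been loaded into the environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    model: str
    citation_style: str
    max_tokens: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Fail before any document is processed rather than mid-analysis.
        raise RuntimeError("OPENAI_API_KEY is not set")

    return Settings(
        api_key=api_key,
        model=env.get("EXAMPLE_MODEL", "example-model"),          # placeholder default
        citation_style=env.get("EXAMPLE_CITATION_STYLE", "APA"),  # placeholder default
        max_tokens=int(env.get("EXAMPLE_MAX_TOKENS", "2000")),    # placeholder default
    )


# Passing a mapping explicitly makes the loader easy to exercise in tests.
settings = load_settings({"OPENAI_API_KEY": "sk-test", "EXAMPLE_MAX_TOKENS": "512"})
print(settings.model, settings.citation_style, settings.max_tokens)
```

Collecting every setting into one immutable object gives the later stages a single, typed source of configuration instead of scattered environment lookups.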
1.7 Next Steps
- Installation Guide: Detailed setup instructions
- Configuration Reference: Complete configuration options
- Usage Guide: Comprehensive usage examples
- API Reference: Technical documentation
- Architecture Guide: System design and development
1.8 Support
If you encounter issues or have questions:
- Check the Troubleshooting Guide
- Run system diagnostics:
uv run python -m veritascribe test
- Review configuration:
uv run python -m veritascribe config
VeritaScribe is designed for defensive security and academic quality assurance purposes only.