Usage Guide

This guide covers how to use VeritaScribe to analyze thesis documents: choosing a provider, running each command, reading the output, and keeping API costs under control.

1.1 Command Overview

VeritaScribe provides several commands for different use cases:

Command           Purpose                             Use Case
demo              Create and analyze sample document  Testing setup
quick             Fast analysis of document subset    Quick feedback
analyze           Full document analysis              Complete review
config            View configuration                  Troubleshooting
providers         List available LLM providers        Provider setup
optimize-prompts  Fine-tune analysis prompts          Accuracy improvement
test              System diagnostics                  Verify installation

1.2 Provider Selection

VeritaScribe supports multiple LLM providers. Choose based on your needs:

1.2.1 View Available Providers

uv run python -m veritascribe providers

This shows all supported providers, their models, and configuration examples.

1.2.2 Provider Recommendations

For Academic Use:

  • OpenAI: Most reliable, extensive model selection
  • Anthropic: Excellent reasoning, safety-focused
  • OpenRouter: Access to multiple providers, competitive pricing

For Cost Optimization:

  • OpenRouter: Free models available (z-ai/glm-4.5-air:free)
  • Local Ollama: No API costs, privacy-focused
  • OpenAI gpt-3.5-turbo: Low-cost commercial option

For Privacy/Security:

  • Local Ollama: Complete data privacy
  • Custom endpoints: Organizational control
  • Azure OpenAI: Enterprise compliance

1.2.3 Quick Provider Setup

# Choose one provider block; each block appends its settings to .env

# OpenAI (standard)
echo "LLM_PROVIDER=openai" >> .env
echo "OPENAI_API_KEY=your-key" >> .env

# OpenRouter (100+ models)
echo "LLM_PROVIDER=openrouter" >> .env  
echo "OPENROUTER_API_KEY=your-key" >> .env
echo "DEFAULT_MODEL=anthropic/claude-3.5-sonnet" >> .env

# Anthropic (direct Claude)
echo "LLM_PROVIDER=anthropic" >> .env
echo "ANTHROPIC_API_KEY=your-key" >> .env

# Local Ollama (free)
echo "LLM_PROVIDER=custom" >> .env
echo "OPENAI_BASE_URL=http://localhost:11434/v1" >> .env
echo "DEFAULT_MODEL=llama3.1:8b" >> .env
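Because the commands above append with `>>`, running them more than once (or switching providers) leaves duplicate keys in `.env`. The following is a minimal sketch of an alternative that replaces existing keys in place. The helper name `set_env_values` is hypothetical and not part of VeritaScribe; it only assumes that `.env` holds plain `KEY=value` lines.

```python
from pathlib import Path


def set_env_values(env_path, updates):
    """Set KEY=value pairs in a .env file without creating duplicate keys.

    Hypothetical helper for illustration; not part of VeritaScribe.
    """
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    written = set()
    result = []
    for line in lines:
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        is_assignment = "=" in stripped and not stripped.startswith("#")
        if is_assignment and key in updates:
            if key not in written:
                # Replace the first occurrence with the new value.
                result.append(f"{key}={updates[key]}")
                written.add(key)
            # Later duplicates of the same key are dropped.
        else:
            # Comments, blank lines, and unrelated keys are kept as-is.
            result.append(line)
    # Keys that were not in the file yet are appended at the end.
    for key, value in updates.items():
        if key not in written:
            result.append(f"{key}={value}")
    path.write_text("\n".join(result) + "\n")
```

For example, `set_env_values(".env", {"LLM_PROVIDER": "openrouter", "DEFAULT_MODEL": "anthropic/claude-3.5-sonnet"})` switches the provider while leaving any stored API keys untouched.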

1.3 Getting Started

1.3.1 Step 1: Try the Demo

Start with the demo to familiarize yourself with VeritaScribe:

uv run python -m veritascribe demo

This command will:

  • Create a sample thesis PDF (demo_thesis.pdf)
  • Perform quick analysis if API key is configured
  • Show example output and reports

1.3.2 Step 2: Quick Analysis

For rapid feedback on your document:

uv run python -m veritascribe quick your_thesis.pdf

Quick analysis:

  • Analyzes first 5 text blocks by default
  • Provides immediate feedback
  • Useful during writing process
  • Lower API costs

Customize block count:

uv run python -m veritascribe quick your_thesis.pdf --blocks 10

1.3.3 Step 3: Full Analysis

For comprehensive document review:

uv run python -m veritascribe analyze your_thesis.pdf

This performs:

  • Complete document analysis
  • Detection of all error types
  • Detailed reporting
  • Visualization generation

1.4 Command Details

1.4.1 analyze - Full Document Analysis

The primary command for comprehensive thesis analysis.

1.4.1.1 Basic Usage

uv run python -m veritascribe analyze thesis.pdf

1.4.1.2 Advanced Options

uv run python -m veritascribe analyze thesis.pdf \
  --output ./results \
  --citation-style APA \
  --annotate \
  --verbose

1.4.1.3 Options Reference

Option            Short   Description             Default
--output          -o      Output directory        ./analysis_output
--citation-style  -c      Citation style          APA
--quick           -q      Quick mode (10 blocks)  false
--no-viz          (none)  Skip visualizations     false
--annotate        (none)  Generate annotated PDF  false
--verbose         -v      Verbose logging         false

1.4.1.4 Citation Styles Supported

# American Psychological Association
--citation-style APA

# Modern Language Association  
--citation-style MLA

# Chicago Manual of Style
--citation-style Chicago

# IEEE format
--citation-style IEEE

# Harvard referencing
--citation-style Harvard

1.4.1.5 Example Workflows

Standard Analysis:

uv run python -m veritascribe analyze thesis.pdf

Generate Annotated PDF:

uv run python -m veritascribe analyze thesis.pdf --annotate

The annotated PDF will contain:

  • Color-coded highlights on problematic text (red/orange/yellow by severity)
  • Detailed sticky note annotations with suggestions and explanations
  • All original document formatting preserved

Custom Output Location:

uv run python -m veritascribe analyze thesis.pdf \
  --output ~/Documents/thesis_review
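When the analyze command is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only the flags and citation styles documented above; the function name `build_analyze_command` is hypothetical, and the resulting list is meant to be passed to `subprocess.run`.

```python
import shlex

# Citation styles listed in this guide.
SUPPORTED_STYLES = {"APA", "MLA", "Chicago", "IEEE", "Harvard"}


def build_analyze_command(pdf_path, output=None, citation_style=None,
                          annotate=False, no_viz=False, verbose=False):
    """Return the argument list for a `veritascribe analyze` invocation.

    Hypothetical helper for illustration; not part of VeritaScribe.
    """
    if citation_style is not None and citation_style not in SUPPORTED_STYLES:
        raise ValueError(f"unsupported citation style: {citation_style}")
    cmd = ["uv", "run", "python", "-m", "veritascribe", "analyze", str(pdf_path)]
    if output is not None:
        cmd += ["--output", str(output)]
    if citation_style is not None:
        cmd += ["--citation-style", citation_style]
    if annotate:
        cmd.append("--annotate")
    if no_viz:
        cmd.append("--no-viz")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_analyze_command("thesis.pdf", output="./results",
                            citation_style="APA", annotate=True, verbose=True)
# Print the command as a shell-quoted string for inspection.
print(shlex.join(cmd))
```

Building a list rather than a single string avoids quoting problems with file names that contain spaces.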

1.4.2 quick - Fast Analysis

Ideal for iterative writing and quick feedback.

1.4.2.1 Basic Usage

uv run python -m veritascribe quick thesis.pdf

1.4.2.2 Customize Analysis Scope

# Analyze first 3 blocks
uv run python -m veritascribe quick thesis.pdf --blocks 3

# Analyze first 15 blocks  
uv run python -m veritascribe quick thesis.pdf --blocks 15

1.4.3 optimize-prompts - DSPy Prompt Optimization

Fine-tunes the analysis prompts using few-shot learning with bilingual training data for improved accuracy.

uv run python -m veritascribe optimize-prompts

1.4.3.1 What This Does

The prompt optimization process:

  1. Trains Language-Specific Modules: Creates optimized DSPy modules for both English and German
  2. Uses Curated Training Data: Leverages hand-crafted examples of grammar, content, and citation errors
  3. Applies Few-Shot Learning: Uses DSPy’s BootstrapFewShot compilation to improve prompt performance
  4. Enhances Accuracy: Aims to improve error detection and reduce false positives compared with the default prompts

1.4.3.2 When to Use

  • After Installation: Run once to set up optimized prompts for better results
  • When Analysis Quality is Poor: If you notice many false positives or missed errors
  • After Updating Training Data: When you’ve added new examples to the training dataset
  • For Research/Academic Use: When maximum accuracy is more important than processing speed

1.4.3.3 Process Details

# The optimization process will:
# 1. Load bilingual training examples
# 2. Compile optimized modules for each analysis type
# 3. Save compiled modules for future use
# 4. Take 3-5 minutes to complete

uv run python -m veritascribe optimize-prompts

Note: This process requires significant LLM API usage as it trains multiple modules. Budget approximately $2-5 in API costs for the full optimization process.

1.4.4 demo - Create Sample Document

Creates and analyzes a demo thesis document for testing and demonstration.

uv run python -m veritascribe demo

This command:

  • Creates a sample PDF document (demo_thesis.pdf) with intentional errors
  • Runs quick analysis if API key is configured
  • Provides example output to familiarize you with the system
  • Perfect for testing your setup and configuration

1.4.5 config - View Configuration

Displays current configuration settings and provider information.

uv run python -m veritascribe config

Shows:

  • Active LLM provider and model
  • API key configuration status
  • Analysis settings (temperature, max tokens, etc.)
  • Recommended models for your provider
  • Configuration validation results

1.4.6 providers - List Available Providers

Shows all supported LLM providers, models, and configuration examples.

uv run python -m veritascribe providers

Displays:

  • Available providers (OpenAI, OpenRouter, Anthropic, Custom)
  • Recommended models for each provider
  • Configuration examples
  • Quick setup instructions for each provider

1.4.7 test - Run System Tests

Verifies that all components are working correctly.

uv run python -m veritascribe test

Tests:

  • Configuration loading
  • PDF processing functionality
  • Analysis modules (if API key configured)
  • System integration and dependencies

The command also prints diagnostic information for troubleshooting.

1.5 Understanding Output

1.5.1 Console Output

VeritaScribe provides rich console output with progress indicators and summaries:

Starting analysis of: thesis.pdf
Output directory: ./analysis_output

Analyzing document... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%

Analysis Results: thesis.pdf
┌─────────────────────────────────────────────────────────────────────────────────┐
│ 📄 Pages: 45                                                                    │
│ 📝 Words: 12,543                                                               │
│ 🔍 Text blocks analyzed: 87                                                     │
│ ⚠️  Total errors: 23                                                            │
│ 📊 Error rate: 1.83 per 1,000 words                                           │
│ ⏱️  Processing time: 45.32s                                                     │
│ 🔤 Token usage: 15,234 tokens                                                    │
│ 💰 Estimated cost: $0.0457 USD                                                   │
└─────────────────────────────────────────────────────────────────────────────────┘

Errors by Type
...
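The derived figures in the sample summary follow from the raw counts. The sketch below reproduces the error rate and shows the blended per-token price implied by the sample output. That price is inferred from this one example and is an assumption; the actual cost calculation may price input and output tokens separately.

```python
def errors_per_thousand_words(total_errors, word_count):
    """Error density as reported in the summary box."""
    return total_errors / word_count * 1000


def implied_price_per_million_tokens(estimated_cost_usd, token_count):
    """Blended USD price per one million tokens implied by a cost estimate."""
    return estimated_cost_usd / token_count * 1_000_000


# Values taken from the sample console output above.
rate = errors_per_thousand_words(23, 12_543)
price = implied_price_per_million_tokens(0.0457, 15_234)

print(f"Error rate: {rate:.2f} per 1,000 words")       # 1.83
print(f"Implied price: ${price:.2f} per 1M tokens")    # 3.00
```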

1.5.2 Generated Files

Each analysis produces several output files:

1.5.2.1 Text Report (.md)

Comprehensive Markdown report with summaries, detailed error listings, and cost information.

1.5.2.2 JSON Data Export (.json)

Structured data for programmatic access, including all errors, locations, and analysis statistics.

1.5.2.3 Visualizations

Charts and graphs showing error distribution, density, and severity.

1.5.2.4 Annotated PDF (_annotated.pdf)

A copy of your original PDF with highlights and sticky note annotations that mark each detected error in place.

What You Get:

  • Visual Error Highlighting: Problematic text is highlighted directly on the page using a severity-based color system:

    • 🔴 Red Highlights: High-severity errors requiring immediate attention (grammar mistakes, logical inconsistencies, missing citations)
    • 🟠 Orange Highlights: Medium-severity errors needing improvement (style issues, minor grammar problems, formatting inconsistencies)
    • 🟡 Yellow Highlights: Low-severity suggestions for enhancement (stylistic preferences, optional improvements)
  • Detailed Sticky Note Annotations: Each error includes a comprehensive sticky note with:

    ERROR: Grammar
    Severity: High
    
    Original: The results shows that our hypothesis
    Suggested: The results show that our hypothesis
    
    Explanation: Subject-verb disagreement: 'results' is plural 
    and requires the plural verb 'show'
    
    Confidence: 95.0%
  • Smart Annotation Positioning: Annotations are automatically positioned to avoid overlaps while maintaining readability

  • Preserves Original Document: All original formatting, images, and layout are maintained while adding the review layer
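The sticky note layout and the severity colors described above can be sketched as a small formatter. The field names and the function `format_sticky_note` are hypothetical and chosen for illustration; they do not describe VeritaScribe's internal data model.

```python
# Highlight color per severity level, as described in this section.
SEVERITY_COLORS = {"high": "red", "medium": "orange", "low": "yellow"}


def format_sticky_note(error_type, severity, original, suggested,
                       explanation, confidence):
    """Render one error as sticky note text in the layout shown above.

    `confidence` is a fraction between 0 and 1.
    """
    return "\n".join([
        f"ERROR: {error_type}",
        f"Severity: {severity.capitalize()}",
        "",
        f"Original: {original}",
        f"Suggested: {suggested}",
        "",
        f"Explanation: {explanation}",
        "",
        f"Confidence: {confidence:.1%}",
    ])


note = format_sticky_note(
    error_type="Grammar",
    severity="high",
    original="The results shows that our hypothesis",
    suggested="The results show that our hypothesis",
    explanation="Subject-verb disagreement: 'results' is plural "
                "and requires the plural verb 'show'",
    confidence=0.95,
)
print(note)
print("Highlight color:", SEVERITY_COLORS["high"])
```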

Perfect Use Cases:

  • Supervisor Review: Share annotated PDFs with advisors for targeted feedback discussions
  • Collaborative Editing: Team members can see exactly what needs attention
  • Iterative Writing: Quickly identify areas needing revision during the writing process
  • Self-Review: Visual representation helps you understand error patterns in your writing
  • Academic Presentations: Demonstrate quality control processes to committees

Important Notes:

  • Annotated PDFs are only generated when errors are found
  • The --annotate flag must be used explicitly
  • Annotations preserve the original PDF’s formatting and can be viewed in any standard PDF reader

1.6 Multi-Language Document Analysis

VeritaScribe detects the document language automatically and applies language-specific analysis for English and German.

1.6.1 Automatic Language Detection

The system automatically detects document language and applies appropriate analysis:

# English document - automatically detected and analyzed with English grammar rules
uv run python -m veritascribe analyze english_thesis.pdf

# German document - automatically detected and analyzed with German grammar rules  
uv run python -m veritascribe analyze deutsche_abschlussarbeit.pdf

# Mixed-language document - intelligently handles language switching
uv run python -m veritascribe analyze multilingual_thesis.pdf
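This guide does not describe how VeritaScribe detects the language internally. To illustrate the general idea only, here is a toy heuristic that counts common function words per language. The stopword lists and the function `guess_language` are assumptions made for this sketch, not the tool's actual method.

```python
import re

# Small, hand-picked stopword lists (illustrative only).
EN_STOPWORDS = {"the", "and", "of", "is", "that", "to", "by", "with", "are", "this"}
DE_STOPWORDS = {"der", "die", "das", "und", "ist", "dass", "nicht", "durch", "wird", "mit"}


def guess_language(text):
    """Return 'en', 'de', or 'unknown' based on stopword counts."""
    words = re.findall(r"[a-zäöüß]+", text.lower())
    en_hits = sum(word in EN_STOPWORDS for word in words)
    de_hits = sum(word in DE_STOPWORDS for word in words)
    if en_hits == de_hits:
        # No evidence either way, or a tie.
        return "unknown"
    return "en" if en_hits > de_hits else "de"


print(guess_language("The results show that the hypothesis is supported by the data"))
print(guess_language("Die Ergebnisse zeigen, dass die Hypothese durch die Daten gestützt wird"))
```

Applying such a check per text block rather than per document is one way a mixed-language thesis could be handled.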

1.6.2 Language-Specific Features

1.6.2.1 English Analysis

  • Grammar: Subject-verb agreement, tense consistency, article usage
  • Academic Style: Formal writing conventions, passive voice usage
  • Citations: APA, MLA, Chicago style formatting
  • Content: Logical flow, argument structure, evidence validation

1.6.2.2 German Analysis

  • Grammar: Kasus-Kongruenz (case agreement), Subjekt-Verb-Kongruenz
  • Academic Style: German academic writing conventions, complex sentence structure
  • Citations: German academic citation styles, bibliography formatting
  • Content: German academic argumentation patterns, cultural context awareness

1.6.3 Optimization for Specific Languages

# Run prompt optimization to improve accuracy for both languages
uv run python -m veritascribe optimize-prompts

# This creates optimized modules for:
# - English grammar, content, and citation analysis
# - German grammar, content, and citation analysis
# - Language detection and switching logic

1.6.4 Best Practices for Multi-Language Documents

  1. Language Consistency: Ensure your document maintains consistent language usage
  2. Cultural Context: Be aware of different academic writing conventions
  3. Citation Styles: Use appropriate citation styles for your document’s language
  4. Optimization: Run prompt optimization for best results with non-English documents

1.7 Advanced Usage Patterns

1.7.1 Batch Processing

Process multiple documents:

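# Create script for batch processing
cat > batch_analyze.sh << 'EOF'
#!/bin/bash
# Skip the loop when no PDFs match, instead of passing a literal "*.pdf"
shopt -s nullglob
for pdf in *.pdf; do
  echo "Analyzing $pdf..."
  # Each document gets its own output directory under ./results
  uv run python -m veritascribe analyze "$pdf" \
    --output "./results/$(basename "$pdf" .pdf)"
done
EOF

chmod +x batch_analyze.sh
./batch_analyze.sh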
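A batch run can also be planned from Python before anything is sent to the API. The sketch below maps each PDF to its own output directory and skips files whose names end in `_annotated`, on the assumption that those are earlier VeritaScribe outputs rather than source documents. The function `plan_batch` is hypothetical.

```python
from pathlib import Path


def plan_batch(pdf_dir, results_dir="results"):
    """Return (pdf_path, output_dir) pairs for every source PDF in pdf_dir."""
    jobs = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        if pdf.stem.endswith("_annotated"):
            # Assumed to be a previously generated annotated copy; skip it.
            continue
        jobs.append((pdf, Path(results_dir) / pdf.stem))
    return jobs
```

Each pair can then be turned into an `analyze` invocation with `--output` set to the second element.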

1.7.2 Iterative Review Process

Workflow for document improvement:

# Step 1: Initial quick review
uv run python -m veritascribe quick draft.pdf --blocks 10

# Step 2: Address major issues, then full analysis with annotation
uv run python -m veritascribe analyze draft.pdf --output ./review_1 --annotate

# The annotated PDF will visually show all errors with severity-based highlighting

# Step 3: After revisions, re-analyze
uv run python -m veritascribe analyze revised_draft.pdf --output ./review_2

# Step 4: Compare results
diff ./review_1/draft_*_data.json ./review_2/revised_draft_*_data.json
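A raw `diff` of two JSON exports is noisy, because locations and token counts change between runs. A per-type comparison is often easier to read. The sketch below assumes the export contains a top-level `errors` list whose items carry an `error_type` field; both names are assumptions, so check them against your own JSON file before relying on this.

```python
import json
from collections import Counter


def load_report(path):
    """Load a JSON data export from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_errors_by_type(report):
    """Count errors per type (assumes report['errors'][i]['error_type'])."""
    return Counter(err["error_type"] for err in report.get("errors", []))


def compare_reports(before, after):
    """Return {error_type: change in count} between two reports."""
    old = count_errors_by_type(before)
    new = count_errors_by_type(after)
    return {kind: new[kind] - old[kind] for kind in sorted(set(old) | set(new))}
```

A negative value means fewer errors of that type after revision; a positive value flags a regression.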

1.7.3 Cost Management Strategies

Optimize API usage for large documents using different providers and models:

# Strategy 1: Use free OpenRouter models
LLM_PROVIDER=openrouter \
DEFAULT_MODEL=z-ai/glm-4.5-air:free \
uv run python -m veritascribe analyze large_thesis.pdf

# Strategy 2: Use cheaper OpenAI models
LLM_PROVIDER=openai \
DEFAULT_MODEL=gpt-3.5-turbo \
uv run python -m veritascribe analyze large_thesis.pdf

# Strategy 3: Use local models (no API costs)
LLM_PROVIDER=custom \
OPENAI_BASE_URL=http://localhost:11434/v1 \
DEFAULT_MODEL=llama3.1:8b \
uv run python -m veritascribe analyze large_thesis.pdf

1.8 Best Practices

1.8.1 Document Preparation

  1. Use text-based PDFs: Avoid scanned documents when possible
  2. Ensure proper formatting: Well-structured documents analyze better
  3. Check PDF integrity: Corrupted files may cause issues
  4. Remove passwords: Encrypted PDFs cannot be processed
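Some of these checks can be approximated before spending API credits. The following heuristic sketch inspects raw bytes only: it looks for the `%PDF-` header, an `%%EOF` marker near the end, and an `/Encrypt` entry. It does not parse the PDF and cannot tell scanned pages from text-based ones. The function `preflight_pdf` is hypothetical and not part of VeritaScribe.

```python
def preflight_pdf(data: bytes):
    """Return a list of likely problems found in raw PDF bytes (heuristic)."""
    problems = []
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    if b"%%EOF" not in data[-1024:]:
        problems.append("missing %%EOF marker near end of file (possibly truncated)")
    if b"/Encrypt" in data:
        problems.append("contains /Encrypt entry (possibly password-protected)")
    return problems
```

For a file on disk, call it as `preflight_pdf(Path("thesis.pdf").read_bytes())` after importing `Path` from `pathlib`; an empty list means none of the three checks fired.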

1.8.2 Analysis Strategy

  1. Start with quick analysis: Get overview before full analysis
  2. Choose appropriate provider: Match your needs and budget
  3. Focus on high-priority issues: Address critical errors first
  4. Use optimize-prompts: Fine-tune prompts for better accuracy, especially for non-English documents
  5. Consider document language: German documents in particular benefit from prompt optimization, since the training data covers English and German

For troubleshooting common issues, see the Troubleshooting Guide.