1 Usage Guide
This guide covers how to use VeritaScribe to analyze thesis documents, from provider setup and commands to interpreting the output.
1.1 Command Overview
VeritaScribe provides several commands for different use cases:
| Command | Purpose | Use Case |
|---|---|---|
| demo | Create and analyze sample document | Testing setup |
| quick | Fast analysis of document subset | Quick feedback |
| analyze | Full document analysis | Complete review |
| config | View configuration | Troubleshooting |
| providers | List available LLM providers | Provider setup |
| optimize-prompts | Fine-tune analysis prompts | Accuracy improvement |
| test | System diagnostics | Verify installation |
1.2 Provider Selection
VeritaScribe supports multiple LLM providers. Choose based on your needs:
1.2.1 View Available Providers
uv run python -m veritascribe providers
This shows all supported providers, their models, and configuration examples.
1.2.2 Provider Recommendations
For Academic Use:
- OpenAI: Most reliable, extensive model selection
- Anthropic: Excellent reasoning, safety-focused
- OpenRouter: Access to multiple providers, competitive pricing
For Cost Optimization:
- OpenRouter: Free models available (z-ai/glm-4.5-air:free)
- Local Ollama: No API costs, privacy-focused
- OpenAI gpt-3.5-turbo: Cheapest commercial option
For Privacy/Security:
- Local Ollama: Complete data privacy
- Custom endpoints: Organizational control
- Azure OpenAI: Enterprise compliance
1.2.3 Quick Provider Setup
# OpenAI (standard)
echo "LLM_PROVIDER=openai" >> .env
echo "OPENAI_API_KEY=your-key" >> .env
# OpenRouter (100+ models)
echo "LLM_PROVIDER=openrouter" >> .env
echo "OPENROUTER_API_KEY=your-key" >> .env
echo "DEFAULT_MODEL=anthropic/claude-3.5-sonnet" >> .env
# Anthropic (direct Claude)
echo "LLM_PROVIDER=anthropic" >> .env
echo "ANTHROPIC_API_KEY=your-key" >> .env
# Local Ollama (free)
echo "LLM_PROVIDER=custom" >> .env
echo "OPENAI_BASE_URL=http://localhost:11434/v1" >> .env
echo "DEFAULT_MODEL=llama3.1:8b" >> .env
1.3 Getting Started
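Before moving on, note that each `echo ... >> .env` command in Quick Provider Setup appends a line, so running more than one provider block leaves several `LLM_PROVIDER` or `DEFAULT_MODEL` entries in the file. How VeritaScribe resolves duplicate keys is not stated in this guide, so keeping a single value per key may be the safer choice. The sketch below is a hypothetical helper (`set_env_value` is not part of VeritaScribe) that removes existing entries for a key and writes the new value once.

```python
from pathlib import Path


def set_env_value(env_path: str, key: str, value: str) -> None:
    """Set KEY=value in a dotenv-style file so that KEY appears exactly once.

    Hypothetical helper for illustration; it is not part of VeritaScribe.
    """
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []

    # Drop every existing assignment of this key and keep all other lines as-is.
    kept = [line for line in lines if not line.strip().startswith(f"{key}=")]

    # Append the new assignment at the end of the file.
    kept.append(f"{key}={value}")
    path.write_text("\n".join(kept) + "\n")
```

For example, `set_env_value(".env", "LLM_PROVIDER", "openrouter")` followed by `set_env_value(".env", "DEFAULT_MODEL", "anthropic/claude-3.5-sonnet")` switches an existing file to OpenRouter without leaving stale entries behind.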
1.3.1 1. Try the Demo
Start with the demo to familiarize yourself with VeritaScribe:
uv run python -m veritascribe demo
This command will:
- Create a sample thesis PDF (demo_thesis.pdf)
- Perform quick analysis if API key is configured
- Show example output and reports
1.3.2 2. Quick Analysis
For rapid feedback on your document:
uv run python -m veritascribe quick your_thesis.pdf
Quick analysis:
- Analyzes first 5 text blocks by default
- Provides immediate feedback
- Useful during writing process
- Lower API costs
Customize block count:
uv run python -m veritascribe quick your_thesis.pdf --blocks 10
1.3.3 3. Full Analysis
For comprehensive document review:
uv run python -m veritascribe analyze your_thesis.pdf
This performs:
- Complete document analysis
- All error types detection
- Detailed reporting
- Visualization generation
1.4 Command Details
1.4.1 analyze - Full Document Analysis
The primary command for comprehensive thesis analysis.
1.4.1.1 Basic Usage
uv run python -m veritascribe analyze thesis.pdf
1.4.1.2 Advanced Options
uv run python -m veritascribe analyze thesis.pdf \
  --output ./results \
  --citation-style APA \
  --annotate \
  --verbose
1.4.1.3 Options Reference
| Option | Short | Description | Default |
|---|---|---|---|
| --output | -o | Output directory | ./analysis_output |
| --citation-style | -c | Citation style | APA |
| --quick | -q | Quick mode (10 blocks) | false |
| --no-viz | | Skip visualizations | false |
| --annotate | | Generate annotated PDF | false |
| --verbose | -v | Verbose logging | false |
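When `analyze` is driven from a script, it can be convenient to assemble the argument list in one place. The following is a minimal sketch that uses only the long flags from the options table above; `build_analyze_command` is a hypothetical helper name, not something VeritaScribe ships.

```python
from typing import List, Optional


def build_analyze_command(
    pdf_path: str,
    output_dir: Optional[str] = None,
    citation_style: Optional[str] = None,
    annotate: bool = False,
    no_viz: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build the argument list for an `analyze` run (hypothetical helper)."""
    cmd = ["uv", "run", "python", "-m", "veritascribe", "analyze", pdf_path]

    # Options that take a value are added only when the caller supplies one,
    # so the tool's own defaults apply otherwise.
    if output_dir is not None:
        cmd += ["--output", output_dir]
    if citation_style is not None:
        cmd += ["--citation-style", citation_style]

    # Boolean switches are appended only when enabled.
    if annotate:
        cmd.append("--annotate")
    if no_viz:
        cmd.append("--no-viz")
    if verbose:
        cmd.append("--verbose")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single shell string avoids quoting problems with file names that contain spaces.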
1.4.1.4 Citation Styles Supported
# American Psychological Association
--citation-style APA
# Modern Language Association
--citation-style MLA
# Chicago Manual of Style
--citation-style Chicago
# IEEE format
--citation-style IEEE
# Harvard referencing
--citation-style Harvard
1.4.1.5 Example Workflows
Standard Analysis:
uv run python -m veritascribe analyze thesis.pdf
Generate Annotated PDF:
uv run python -m veritascribe analyze thesis.pdf --annotate
The annotated PDF will contain:
- Color-coded highlights on problematic text (red/orange/yellow by severity)
- Detailed sticky note annotations with suggestions and explanations
- All original document formatting preserved
Custom Output Location:
uv run python -m veritascribe analyze thesis.pdf \
  --output ~/Documents/thesis_review
1.4.2 quick - Fast Analysis
Ideal for iterative writing and quick feedback.
1.4.2.1 Basic Usage
uv run python -m veritascribe quick thesis.pdf
1.4.2.2 Customize Analysis Scope
# Analyze first 3 blocks
uv run python -m veritascribe quick thesis.pdf --blocks 3
# Analyze first 15 blocks
uv run python -m veritascribe quick thesis.pdf --blocks 15
1.4.3 optimize-prompts - DSPy Prompt Optimization
Fine-tunes the analysis prompts using few-shot learning with bilingual training data for improved accuracy.
uv run python -m veritascribe optimize-prompts
1.4.3.1 What This Does
The prompt optimization process:
- Trains Language-Specific Modules: Creates optimized DSPy modules for both English and German
- Uses Curated Training Data: Leverages hand-crafted examples of grammar, content, and citation errors
- Applies Few-Shot Learning: Uses DSPy’s BootstrapFewShot compilation to improve prompt performance
- Enhances Accuracy: Aims to improve error detection and reduce false positives compared with the unoptimized prompts
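To make the few-shot idea concrete, here is a plain-Python stand-in for the bootstrapping step: run the current predictor over the training examples, keep the ones where its output passes a metric, and prepend those as demonstrations to later prompts. This is neither DSPy's implementation nor VeritaScribe's code; the record shape, the `toy_predict` stub, and all function names are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical record shape: {"text": <passage>, "label": <error type>}.
Example = Dict[str, str]


def toy_predict(text: str) -> str:
    """Stand-in for an LLM call; recognizes one hard-coded agreement error."""
    return "grammar" if "results shows" in text else "none"


def bootstrap_demos(
    trainset: List[Example],
    predict: Callable[[str], str],
    metric: Callable[[Example, str], bool],
    max_demos: int = 4,
) -> List[Example]:
    """Collect up to max_demos examples on which the predictor passes the metric."""
    demos: List[Example] = []
    for example in trainset:
        prediction = predict(example["text"])
        # Only predictions that pass the metric become demonstrations.
        if metric(example, prediction):
            demos.append({"text": example["text"], "label": prediction})
        if len(demos) >= max_demos:
            break
    return demos


def build_prompt(instruction: str, demos: List[Example], new_text: str) -> str:
    """Place the selected demonstrations between the instruction and the new input."""
    parts = [instruction]
    for demo in demos:
        parts.append(f"Text: {demo['text']}\nError type: {demo['label']}")
    parts.append(f"Text: {new_text}\nError type:")
    return "\n\n".join(parts)
```

In DSPy itself this role is played by the `BootstrapFewShot` optimizer, which compiles a module against a training set and a metric; the sketch only mirrors the general shape of that process.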
1.4.3.2 When to Use
- After Installation: Run once to set up optimized prompts for better results
- When Analysis Quality is Poor: If you notice many false positives or missed errors
- After Updating Training Data: When you’ve added new examples to the training dataset
- For Research/Academic Use: When maximum accuracy is more important than processing speed
1.4.3.3 Process Details
# The optimization process will:
# 1. Load bilingual training examples
# 2. Compile optimized modules for each analysis type
# 3. Save compiled modules for future use
# 4. Take 3-5 minutes to complete
uv run python -m veritascribe optimize-prompts
Note: This process makes many LLM API calls because it trains multiple modules. Budget approximately $2-5 in API costs for the full optimization run.
1.4.4 demo - Create Sample Document
Creates and analyzes a demo thesis document for testing and demonstration.
uv run python -m veritascribe demo
This command:
- Creates a sample PDF document (demo_thesis.pdf) with intentional errors
- Runs quick analysis if API key is configured
- Provides example output to familiarize you with the system
- Perfect for testing your setup and configuration
1.4.5 config - View Configuration
Displays current configuration settings and provider information.
uv run python -m veritascribe config
Shows:
- Active LLM provider and model
- API key configuration status
- Analysis settings (temperature, max tokens, etc.)
- Recommended models for your provider
- Configuration validation results
1.4.6 providers - List Available Providers
Shows all supported LLM providers, models, and configuration examples.
uv run python -m veritascribe providers
Displays:
- Available providers (OpenAI, OpenRouter, Anthropic, Custom)
- Recommended models for each provider
- Configuration examples
- Quick setup instructions for each provider
1.4.7 test - Run System Tests
Verifies that all components are working correctly.
uv run python -m veritascribe test
Tests:
- Configuration loading
- PDF processing functionality
- Analysis modules (if API key configured)
- System integration and dependencies
The command also prints diagnostic information for troubleshooting.
1.5 Understanding Output
1.5.1 Console Output
VeritaScribe provides rich console output with progress indicators and summaries:
Starting analysis of: thesis.pdf
Output directory: ./analysis_output
Analyzing document... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
Analysis Results: thesis.pdf
┌─────────────────────────────────────────────────────────────────────────────────┐
│ 📄 Pages: 45 │
│ 📝 Words: 12,543 │
│ 🔍 Text blocks analyzed: 87 │
│ ⚠️ Total errors: 23 │
│ 📊 Error rate: 1.83 per 1,000 words │
│ ⏱️ Processing time: 45.32s │
│ 🔤 Token usage: 15,234 tokens │
│ 💰 Estimated cost: $0.0457 USD │
└─────────────────────────────────────────────────────────────────────────────────┘
Errors by Type
...
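The summary figures in the sample output are related by simple arithmetic, which is useful when sanity-checking a report. The sketch below reproduces the error rate from the sample and derives the price per million tokens that the sample cost would imply. The function names are hypothetical, and the implied price is only a blended figure inferred from the sample numbers, not a published rate for any provider.

```python
def errors_per_1000_words(total_errors: int, word_count: int) -> float:
    """Error rate as reported in the summary box: errors per 1,000 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return total_errors / word_count * 1000


def implied_price_per_million_tokens(cost_usd: float, tokens: int) -> float:
    """Blended USD price per 1M tokens implied by a cost and token total."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return cost_usd / tokens * 1_000_000


# Figures taken from the sample console output above.
print(f"{errors_per_1000_words(23, 12_543):.2f} errors per 1,000 words")   # 1.83
print(f"${implied_price_per_million_tokens(0.0457, 15_234):.2f} per 1M tokens")  # $3.00
```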
1.5.2 Generated Files
Each analysis produces several output files:
1.5.2.1 1. Text Report (.md)
Comprehensive Markdown report with summaries, detailed error listings, and cost information.
1.5.2.2 2. JSON Data Export (.json)
Structured data for programmatic access, including all errors, locations, and analysis statistics.
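As one example of programmatic access, the sketch below tallies how often each value of a given key occurs anywhere in an export and compares two runs. The export's schema is not documented in this guide, so the code avoids assuming a layout: it walks the whole JSON tree. The key name `severity` used in the usage note is an assumption; inspect your own export to find the actual field names.

```python
import json
from collections import Counter
from typing import Any, Dict


def count_key_values(data: Any, key: str) -> Counter:
    """Recursively count scalar values stored under `key` anywhere in parsed JSON."""
    counts: Counter = Counter()
    if isinstance(data, dict):
        for name, value in data.items():
            if name == key and isinstance(value, (str, int, float, bool)):
                counts[value] += 1
            else:
                # Descend into nested objects and arrays.
                counts.update(count_key_values(value, key))
    elif isinstance(data, list):
        for item in data:
            counts.update(count_key_values(item, key))
    return counts


def load_counts(json_path: str, key: str) -> Counter:
    """Load an export file and count the values found under `key`."""
    with open(json_path, encoding="utf-8") as handle:
        return count_key_values(json.load(handle), key)


def count_changes(before: Counter, after: Counter) -> Dict[Any, int]:
    """Return after-minus-before for every value seen in either run."""
    names = sorted(set(before) | set(after), key=str)
    return {name: after[name] - before[name] for name in names}
```

Assuming each error record carries a `severity` field, `count_changes(load_counts(old_path, "severity"), load_counts(new_path, "severity"))` gives a per-severity difference between two analysis runs, which is easier to read than a raw `diff` of the JSON files.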
1.5.2.3 3. Visualizations
Charts and graphs showing error distribution, density, and severity.
1.5.2.4 4. Annotated PDF (_annotated.pdf)
A copy of your original PDF with highlights and sticky-note annotations added, so you can review each detected error in context.
What You Get:
Visual Error Highlighting: Problematic text is highlighted directly on the page using a severity-based color system:
- 🔴 Red Highlights: High-severity errors requiring immediate attention (grammar mistakes, logical inconsistencies, missing citations)
- 🟠 Orange Highlights: Medium-severity errors needing improvement (style issues, minor grammar problems, formatting inconsistencies)
- 🟡 Yellow Highlights: Low-severity suggestions for enhancement (stylistic preferences, optional improvements)
Detailed Sticky Note Annotations: Each error includes a comprehensive sticky note with:
ERROR: Grammar
Severity: High
Original: The results shows that our hypothesis
Suggested: The results show that our hypothesis
Explanation: Subject-verb disagreement: 'results' is plural and requires the plural verb 'show'
Confidence: 95.0%
Smart Annotation Positioning: Annotations are automatically positioned to avoid overlaps while maintaining readability
Preserves Original Document: All original formatting, images, and layout are maintained while adding the review layer
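The severity colors and the sticky-note fields described above can be summarized as a small data model. The sketch below is purely illustrative: the `Annotation` class, its field names, and the lowercase severity keys are assumptions, not VeritaScribe's internal types.

```python
from dataclasses import dataclass

# Severity-to-highlight mapping as described in this guide.
HIGHLIGHT_COLORS = {"high": "red", "medium": "orange", "low": "yellow"}


@dataclass
class Annotation:
    """Hypothetical record for one annotated error."""

    error_type: str
    severity: str
    original: str
    suggested: str
    explanation: str
    confidence: float  # fraction between 0.0 and 1.0

    @property
    def color(self) -> str:
        """Highlight color for this annotation's severity."""
        return HIGHLIGHT_COLORS[self.severity.lower()]

    def note_text(self) -> str:
        """Render the fields in the same order as the sticky-note example."""
        return "\n".join(
            [
                f"ERROR: {self.error_type}",
                f"Severity: {self.severity}",
                f"Original: {self.original}",
                f"Suggested: {self.suggested}",
                f"Explanation: {self.explanation}",
                f"Confidence: {self.confidence:.1%}",
            ]
        )
```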
Perfect Use Cases:
- Supervisor Review: Share annotated PDFs with advisors for targeted feedback discussions
- Collaborative Editing: Team members can see exactly what needs attention
- Iterative Writing: Quickly identify areas needing revision during the writing process
- Self-Review: Visual representation helps you understand error patterns in your writing
- Academic Presentations: Demonstrate quality control processes to committees
Important Notes:
- Annotated PDFs are only generated when errors are found
- The --annotate flag must be used explicitly
- Annotations preserve the original PDF’s formatting and can be viewed in any standard PDF reader
1.6 Multi-Language Document Analysis
VeritaScribe provides intelligent multi-language support with automatic language detection and language-specific analysis.
1.6.1 Automatic Language Detection
The system automatically detects document language and applies appropriate analysis:
# English document - automatically detected and analyzed with English grammar rules
uv run python -m veritascribe analyze english_thesis.pdf
# German document - automatically detected and analyzed with German grammar rules
uv run python -m veritascribe analyze deutsche_abschlussarbeit.pdf
# Mixed-language document - intelligently handles language switching
uv run python -m veritascribe analyze multilingual_thesis.pdf
1.6.2 Language-Specific Features
1.6.2.1 English Analysis
- Grammar: Subject-verb agreement, tense consistency, article usage
- Academic Style: Formal writing conventions, passive voice usage
- Citations: APA, MLA, Chicago style formatting
- Content: Logical flow, argument structure, evidence validation
1.6.2.2 German Analysis
- Grammar: Kasus-Kongruenz (case agreement), Subjekt-Verb-Kongruenz
- Academic Style: German academic writing conventions, complex sentence structure
- Citations: German academic citation styles, bibliography formatting
- Content: German academic argumentation patterns, cultural context awareness
1.6.3 Optimization for Specific Languages
# Run prompt optimization to improve accuracy for both languages
uv run python -m veritascribe optimize-prompts
# This creates optimized modules for:
# - English grammar, content, and citation analysis
# - German grammar, content, and citation analysis
# - Language detection and switching logic
1.6.4 Best Practices for Multi-Language Documents
- Language Consistency: Ensure your document maintains consistent language usage
- Cultural Context: Be aware of different academic writing conventions
- Citation Styles: Use appropriate citation styles for your document’s language
- Optimization: Run prompt optimization for best results with non-English documents
1.7 Advanced Usage Patterns
1.7.1 Batch Processing
Process multiple documents:
# Create script for batch processing
cat > batch_analyze.sh << 'EOF'
#!/bin/bash
# Analyze every PDF in the current directory, one output folder per document.
shopt -s nullglob  # skip the loop when no PDF files match
for pdf in *.pdf; do
  echo "Analyzing $pdf..."
  uv run python -m veritascribe analyze "$pdf" \
    --output "./results/$(basename "$pdf" .pdf)"
done
EOF
chmod +x batch_analyze.sh
./batch_analyze.sh
1.7.2 Iterative Review Process
Workflow for document improvement:
# Step 1: Initial quick review
uv run python -m veritascribe quick draft.pdf --blocks 10
# Step 2: Address major issues, then full analysis with annotation
uv run python -m veritascribe analyze draft.pdf --output ./review_1 --annotate
# The annotated PDF will visually show all errors with severity-based highlighting
# Step 3: After revisions, re-analyze
uv run python -m veritascribe analyze revised_draft.pdf --output ./review_2
# Step 4: Compare results
diff ./review_1/draft_*_data.json ./review_2/revised_draft_*_data.json
1.7.3 Cost Management Strategies
Optimize API usage for large documents using different providers and models:
# Strategy 1: Use free OpenRouter models
LLM_PROVIDER=openrouter \
DEFAULT_MODEL=z-ai/glm-4.5-air:free \
uv run python -m veritascribe analyze large_thesis.pdf
# Strategy 2: Use cheaper OpenAI models
LLM_PROVIDER=openai \
DEFAULT_MODEL=gpt-3.5-turbo \
uv run python -m veritascribe analyze large_thesis.pdf
# Strategy 3: Use local models (no API costs)
LLM_PROVIDER=custom \
OPENAI_BASE_URL=http://localhost:11434/v1 \
DEFAULT_MODEL=llama3.1:8b \
uv run python -m veritascribe analyze large_thesis.pdf
1.8 Best Practices
1.8.1 Document Preparation
- Use text-based PDFs: Avoid scanned documents when possible
- Ensure proper formatting: Well-structured documents analyze better
- Check PDF integrity: Corrupted files may cause issues
- Remove passwords: Encrypted PDFs cannot be processed
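Some of these checks can be approximated before spending API calls. The sketch below is a byte-level heuristic using only the standard library: it looks for the PDF header, the end-of-file marker, and an `/Encrypt` entry. `preflight_pdf` is a hypothetical helper, it can produce false positives, and it cannot tell a scanned document from a text-based one; that would need a PDF parsing library.

```python
from pathlib import Path
from typing import List


def preflight_pdf(pdf_path: str) -> List[str]:
    """Return likely problems; an empty list means no obvious issue was found."""
    problems: List[str] = []
    data = Path(pdf_path).read_bytes()

    # A PDF normally starts with a "%PDF-" header near the top of the file.
    if b"%PDF-" not in data[:1024]:
        problems.append("missing %PDF- header (not a PDF or badly corrupted)")

    # The end-of-file marker should appear near the end of the file.
    if b"%%EOF" not in data[-1024:]:
        problems.append("missing %%EOF marker (file may be truncated)")

    # Encrypted PDFs reference an /Encrypt dictionary from the trailer.
    if b"/Encrypt" in data:
        problems.append("contains /Encrypt (file appears to be password-protected)")

    return problems
```

A batch script could call `preflight_pdf(path)` for each file and skip any document that returns a non-empty list.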
1.8.2 Analysis Strategy
- Start with quick analysis: Get overview before full analysis
- Choose appropriate provider: Match your needs and budget
- Focus on high-priority issues: Address critical errors first
- Use optimize-prompts: Fine-tune prompts for better accuracy, especially for non-English documents
- Consider document language: German and other non-English documents benefit significantly from prompt optimization
For troubleshooting common issues, see the Troubleshooting Guide.