1 Usage Guide
This comprehensive guide covers all aspects of using VeritaScribe to analyze thesis documents.
1.1 Command Overview
VeritaScribe provides several commands for different use cases:
Command | Purpose | Use Case |
---|---|---|
demo | Create and analyze sample document | Testing setup |
quick | Fast analysis of document subset | Quick feedback |
analyze | Full document analysis | Complete review |
config | View configuration | Troubleshooting |
providers | List available LLM providers | Provider setup |
optimize-prompts | Fine-tune analysis prompts | Accuracy improvement |
test | System diagnostics | Verify installation |
1.2 Provider Selection
VeritaScribe supports multiple LLM providers. Choose based on your needs:
1.2.1 View Available Providers
uv run python -m veritascribe providers
This shows all supported providers, their models, and configuration examples.
1.2.2 Provider Recommendations
For Academic Use:
- OpenAI: Most reliable, extensive model selection
- Anthropic: Excellent reasoning, safety-focused
- OpenRouter: Access to multiple providers, competitive pricing
For Cost Optimization:
- OpenRouter: Free models available (z-ai/glm-4.5-air:free)
- Local Ollama: No API costs, privacy-focused
- OpenAI gpt-3.5-turbo: Cheapest commercial option
For Privacy/Security:
- Local Ollama: Complete data privacy
- Custom endpoints: Organizational control
- Azure OpenAI: Enterprise compliance
1.2.3 Quick Provider Setup
# OpenAI (standard)
echo "LLM_PROVIDER=openai" >> .env
echo "OPENAI_API_KEY=your-key" >> .env
# OpenRouter (100+ models)
echo "LLM_PROVIDER=openrouter" >> .env
echo "OPENROUTER_API_KEY=your-key" >> .env
echo "DEFAULT_MODEL=anthropic/claude-3.5-sonnet" >> .env
# Anthropic (direct Claude)
echo "LLM_PROVIDER=anthropic" >> .env
echo "ANTHROPIC_API_KEY=your-key" >> .env
# Local Ollama (free)
echo "LLM_PROVIDER=custom" >> .env
echo "OPENAI_BASE_URL=http://localhost:11434/v1" >> .env
echo "DEFAULT_MODEL=llama3.1:8b" >> .env
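Because `echo ... >> .env` appends, running these commands twice or switching providers leaves duplicate keys in the file, and which value wins then depends on how the file is loaded. The following is a minimal Python sketch of an idempotent alternative. The helper name `upsert_env` is hypothetical and not part of VeritaScribe, and the sketch assumes a plain `KEY=value` file without quoting or multi-line values.

```python
def upsert_env(text: str, key: str, value: str) -> str:
    """Return .env content with key set to value, replacing any existing entries."""
    entry = f"{key}={value}"
    result = []
    written = False
    for line in text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name == key:
            # Keep one updated entry at the first position; drop later duplicates.
            if not written:
                result.append(entry)
                written = True
            continue
        result.append(line)
    if not written:
        result.append(entry)
    return "\n".join(result) + "\n"


# Example: switch an existing configuration from OpenAI to OpenRouter.
content = "LLM_PROVIDER=openai\nOPENAI_API_KEY=your-key\n"
content = upsert_env(content, "LLM_PROVIDER", "openrouter")
content = upsert_env(content, "OPENROUTER_API_KEY", "your-key")
print(content, end="")
```

To apply this to a real file, read `.env` with `pathlib.Path.read_text()`, pass the content through `upsert_env`, and write the result back.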
1.3 Getting Started
1.3.1 1. Try the Demo
Start with the demo to familiarize yourself with VeritaScribe:
uv run python -m veritascribe demo
This command will:
- Create a sample thesis PDF (demo_thesis.pdf)
- Perform a quick analysis if an API key is configured
- Show example output and reports
1.3.2 2. Quick Analysis
For rapid feedback on your document:
uv run python -m veritascribe quick your_thesis.pdf
Quick analysis:
- Analyzes the first 5 text blocks by default
- Provides immediate feedback
- Is useful during the writing process
- Keeps API costs lower than a full analysis
Customize block count:
uv run python -m veritascribe quick your_thesis.pdf --blocks 10
1.3.3 3. Full Analysis
For comprehensive document review:
uv run python -m veritascribe analyze your_thesis.pdf
This performs:
- Complete document analysis
- Detection of all error types
- Detailed reporting
- Visualization generation
1.4 Command Details
1.4.1 analyze - Full Document Analysis
The primary command for comprehensive thesis analysis.
1.4.1.1 Basic Usage
uv run python -m veritascribe analyze thesis.pdf
1.4.1.2 Advanced Options
uv run python -m veritascribe analyze thesis.pdf \
  --output ./results \
  --citation-style APA \
  --annotate \
  --verbose
1.4.1.3 Options Reference
Option | Short | Description | Default |
---|---|---|---|
--output | -o | Output directory | ./analysis_output |
--citation-style | -c | Citation style | APA |
--quick | -q | Quick mode (10 blocks) | false |
--no-viz | | Skip visualizations | false |
--annotate | | Generate annotated PDF | false |
--verbose | -v | Verbose logging | false |
1.4.1.4 Citation Styles Supported
# American Psychological Association
--citation-style APA
# Modern Language Association
--citation-style MLA
# Chicago Manual of Style
--citation-style Chicago
# IEEE format
--citation-style IEEE
# Harvard referencing
--citation-style Harvard
1.4.1.5 Example Workflows
Standard Analysis:
uv run python -m veritascribe analyze thesis.pdf
Generate Annotated PDF:
uv run python -m veritascribe analyze thesis.pdf --annotate
The annotated PDF will contain:
- Color-coded highlights on problematic text (red/orange/yellow by severity)
- Detailed sticky note annotations with suggestions and explanations
- All original document formatting preserved
Custom Output Location:
uv run python -m veritascribe analyze thesis.pdf \
  --output ~/Documents/thesis_review
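When the analyze command is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only the flags and citation styles listed in this guide. The function name `build_analyze_command` is hypothetical, and checking the citation style up front is an assumption about what is convenient, not a description of how VeritaScribe itself validates input.

```python
import shlex

# Citation styles listed in this guide.
CITATION_STYLES = {"APA", "MLA", "Chicago", "IEEE", "Harvard"}


def build_analyze_command(pdf, output=None, citation_style=None,
                          annotate=False, no_viz=False, verbose=False):
    """Return the argument list for a full analysis run."""
    cmd = ["uv", "run", "python", "-m", "veritascribe", "analyze", pdf]
    if output:
        cmd += ["--output", output]
    if citation_style:
        if citation_style not in CITATION_STYLES:
            raise ValueError(f"Unsupported citation style: {citation_style}")
        cmd += ["--citation-style", citation_style]
    if annotate:
        cmd.append("--annotate")
    if no_viz:
        cmd.append("--no-viz")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_analyze_command("thesis.pdf", output="./results",
                            citation_style="APA", annotate=True, verbose=True)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd)`, which avoids shell quoting problems with file names that contain spaces.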
1.4.2 quick - Fast Analysis
Ideal for iterative writing and quick feedback.
1.4.2.1 Basic Usage
uv run python -m veritascribe quick thesis.pdf
1.4.2.2 Customize Analysis Scope
# Analyze first 3 blocks
uv run python -m veritascribe quick thesis.pdf --blocks 3
# Analyze first 15 blocks
uv run python -m veritascribe quick thesis.pdf --blocks 15
1.4.3 optimize-prompts - DSPy Prompt Optimization
Fine-tunes the analysis prompts using few-shot learning with bilingual training data for improved accuracy.
uv run python -m veritascribe optimize-prompts
1.4.3.1 What This Does
The prompt optimization process:
- Trains Language-Specific Modules: Creates optimized DSPy modules for both English and German
- Uses Curated Training Data: Leverages hand-crafted examples of grammar, content, and citation errors
- Applies Few-Shot Learning: Uses DSPy’s BootstrapFewShot compilation to improve prompt performance
- Enhances Accuracy: Intended to improve error detection and reduce false positives
1.4.3.2 When to Use
- After Installation: Run once to set up optimized prompts for better results
- When Analysis Quality is Poor: If you notice many false positives or missed errors
- After Updating Training Data: When you’ve added new examples to the training dataset
- For Research/Academic Use: When maximum accuracy is more important than processing speed
1.4.3.3 Process Details
# The optimization process will:
# 1. Load bilingual training examples
# 2. Compile optimized modules for each analysis type
# 3. Save compiled modules for future use
# 4. Take 3-5 minutes to complete
uv run python -m veritascribe optimize-prompts
Note: This process requires significant LLM API usage as it trains multiple modules. Budget approximately $2-5 in API costs for the full optimization process.
1.4.4 demo - Create Sample Document
Creates and analyzes a demo thesis document for testing and demonstration.
uv run python -m veritascribe demo
This command:
- Creates a sample PDF document (demo_thesis.pdf) with intentional errors
- Runs a quick analysis if an API key is configured
- Provides example output to familiarize you with the system
- Is well suited to testing your setup and configuration
1.4.5 config - View Configuration
Displays current configuration settings and provider information.
uv run python -m veritascribe config
Shows:
- Active LLM provider and model
- API key configuration status
- Analysis settings (temperature, max tokens, etc.)
- Recommended models for your provider
- Configuration validation results
1.4.6 providers - List Available Providers
Shows all supported LLM providers, models, and configuration examples.
uv run python -m veritascribe providers
Displays:
- Available providers (OpenAI, OpenRouter, Anthropic, Custom)
- Recommended models for each provider
- Configuration examples
- Quick setup instructions for each provider
1.4.7 test - Run System Tests
Verifies that all components are working correctly.
uv run python -m veritascribe test
Tests:
- Configuration loading
- PDF processing functionality
- Analysis modules (if an API key is configured)
- System integration and dependencies
The command also prints diagnostic information for troubleshooting.
1.5 Understanding Output
1.5.1 Console Output
VeritaScribe provides rich console output with progress indicators and summaries:
Starting analysis of: thesis.pdf
Output directory: ./analysis_output
Analyzing document... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
Analysis Results: thesis.pdf
┌─────────────────────────────────────────────────────────────────────────────────┐
│ 📄 Pages: 45 │
│ 📝 Words: 12,543 │
│ 🔍 Text blocks analyzed: 87 │
│ ⚠️ Total errors: 23 │
│ 📊 Error rate: 1.83 per 1,000 words │
│ ⏱️ Processing time: 45.32s │
│ 🔤 Token usage: 15,234 tokens │
│ 💰 Estimated cost: $0.0457 USD │
└─────────────────────────────────────────────────────────────────────────────────┘
Errors by Type
...
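The summary figures in this sample output are consistent with simple ratios: 23 errors over 12,543 words gives the reported error rate, and the estimated cost divided by the token count implies a blended price of about $3 per million tokens. The sketch below reproduces that arithmetic. The function name is hypothetical, and how VeritaScribe actually computes these values is an assumption here.

```python
def errors_per_thousand_words(total_errors: int, word_count: int) -> float:
    """Error density normalized to 1,000 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return total_errors / word_count * 1000


# Figures taken from the sample console output above.
rate = errors_per_thousand_words(23, 12_543)
print(f"Error rate: {rate:.2f} per 1,000 words")

# Blended price implied by the sample cost and token usage.
implied_price = 0.0457 / 15_234 * 1_000_000
print(f"Implied price: ${implied_price:.2f} per million tokens")
```

Normalizing by word count makes error rates comparable between a short draft chapter and a full thesis.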
1.5.2 Generated Files
Each analysis produces several output files:
1.5.2.1 1. Text Report (.md)
Comprehensive Markdown report with summaries, detailed error listings, and cost information.
1.5.2.2 2. JSON Data Export (.json)
Structured data for programmatic access, including all errors, locations, and analysis statistics.
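As an illustration of programmatic access, the sketch below tallies errors by type and severity. The JSON layout is an assumption: the keys `errors`, `error_type`, and `severity` are hypothetical and should be checked against a real export before use.

```python
import json
from collections import Counter

# Hypothetical report content; the real export may use different keys.
sample_report = """
{
  "errors": [
    {"error_type": "grammar", "severity": "high"},
    {"error_type": "citation", "severity": "medium"},
    {"error_type": "grammar", "severity": "low"}
  ]
}
"""


def summarize(report: dict) -> dict:
    """Count errors overall, by type, and by severity."""
    errors = report.get("errors", [])
    return {
        "total": len(errors),
        "by_type": dict(Counter(e["error_type"] for e in errors)),
        "by_severity": dict(Counter(e["severity"] for e in errors)),
    }


summary = summarize(json.loads(sample_report))
print(summary)
```

For a file on disk, replace `json.loads(sample_report)` with `json.load()` on the opened export.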
1.5.2.3 3. Visualizations
Charts and graphs showing error distribution, density, and severity.
1.5.2.4 4. Annotated PDF (_annotated.pdf)
An interactive PDF that transforms your original document into a comprehensive visual review tool.
What You Get:
Visual Error Highlighting: Problematic text is highlighted directly on the page using a severity-based color system:
- 🔴 Red Highlights: High-severity errors requiring immediate attention (grammar mistakes, logical inconsistencies, missing citations)
- 🟠 Orange Highlights: Medium-severity errors needing improvement (style issues, minor grammar problems, formatting inconsistencies)
- 🟡 Yellow Highlights: Low-severity suggestions for enhancement (stylistic preferences, optional improvements)
Detailed Sticky Note Annotations: Each error includes a comprehensive sticky note with:
ERROR: Grammar
Severity: High
Original: The results shows that our hypothesis
Suggested: The results show that our hypothesis
Explanation: Subject-verb disagreement: 'results' is plural and requires the plural verb 'show'
Confidence: 95.0%
Smart Annotation Positioning: Annotations are automatically positioned to avoid overlaps while maintaining readability
Preserves Original Document: All original formatting, images, and layout are maintained while adding the review layer
Perfect Use Cases:
- Supervisor Review: Share annotated PDFs with advisors for targeted feedback discussions
- Collaborative Editing: Team members can see exactly what needs attention
- Iterative Writing: Quickly identify areas needing revision during the writing process
- Self-Review: Visual representation helps you understand error patterns in your writing
- Academic Presentations: Demonstrate quality control processes to committees
Important Notes:
- Annotated PDFs are only generated when errors are found
- The --annotate flag must be used explicitly
- Annotations preserve the original PDF’s formatting and can be viewed in any standard PDF reader
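The severity colors and the sticky note layout described in this section can be modeled in a few lines. This is only an illustrative sketch: the `Finding` class, its field names, and `render_note` are hypothetical and do not reflect VeritaScribe's internal data model.

```python
from dataclasses import dataclass

# Highlight colors by severity, as listed in this section.
SEVERITY_COLORS = {"high": "red", "medium": "orange", "low": "yellow"}


@dataclass
class Finding:
    """Hypothetical record holding the fields shown in a sticky note."""
    error_type: str
    severity: str      # "high", "medium", or "low"
    original: str
    suggested: str
    explanation: str
    confidence: float  # 0.0 to 1.0


def render_note(finding: Finding) -> str:
    """Format a finding using the field order of the sample note."""
    return "\n".join([
        f"ERROR: {finding.error_type}",
        f"Severity: {finding.severity.capitalize()}",
        f"Original: {finding.original}",
        f"Suggested: {finding.suggested}",
        f"Explanation: {finding.explanation}",
        f"Confidence: {finding.confidence:.1%}",
    ])


finding = Finding(
    error_type="Grammar",
    severity="high",
    original="The results shows that our hypothesis",
    suggested="The results show that our hypothesis",
    explanation="Subject-verb disagreement",
    confidence=0.95,
)
print(SEVERITY_COLORS[finding.severity])
print(render_note(finding))
```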
1.6 Multi-Language Document Analysis
VeritaScribe provides intelligent multi-language support with automatic language detection and language-specific analysis.
1.6.1 Automatic Language Detection
The system automatically detects document language and applies appropriate analysis:
# English document - automatically detected and analyzed with English grammar rules
uv run python -m veritascribe analyze english_thesis.pdf
# German document - automatically detected and analyzed with German grammar rules
uv run python -m veritascribe analyze deutsche_abschlussarbeit.pdf
# Mixed-language document - intelligently handles language switching
uv run python -m veritascribe analyze multilingual_thesis.pdf
1.6.2 Language-Specific Features
1.6.2.1 English Analysis
- Grammar: Subject-verb agreement, tense consistency, article usage
- Academic Style: Formal writing conventions, passive voice usage
- Citations: APA, MLA, Chicago style formatting
- Content: Logical flow, argument structure, evidence validation
1.6.2.2 German Analysis
- Grammar: Kasus-Kongruenz (case agreement), Subjekt-Verb-Kongruenz
- Academic Style: German academic writing conventions, complex sentence structure
- Citations: German academic citation styles, bibliography formatting
- Content: German academic argumentation patterns, cultural context awareness
1.6.3 Optimization for Specific Languages
# Run prompt optimization to improve accuracy for both languages
uv run python -m veritascribe optimize-prompts
# This creates optimized modules for:
# - English grammar, content, and citation analysis
# - German grammar, content, and citation analysis
# - Language detection and switching logic
1.6.4 Best Practices for Multi-Language Documents
- Language Consistency: Ensure your document maintains consistent language usage
- Cultural Context: Be aware of different academic writing conventions
- Citation Styles: Use appropriate citation styles for your document’s language
- Optimization: Run prompt optimization for best results with non-English documents
1.7 Advanced Usage Patterns
1.7.1 Batch Processing
Process multiple documents:
# Create script for batch processing
cat > batch_analyze.sh << 'EOF'
#!/bin/bash
for pdf in *.pdf; do
  # Skip the literal pattern when no PDFs match
  [ -e "$pdf" ] || continue
  echo "Analyzing $pdf..."
  uv run python -m veritascribe analyze "$pdf" \
    --output "./results/$(basename "$pdf" .pdf)"
done
EOF
chmod +x batch_analyze.sh
./batch_analyze.sh
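The same loop can be written in Python, which makes it easier to add logging or error handling later. The sketch below only plans the commands and prints them. The function name `plan_batch` is hypothetical, and the per-document output directory mirrors the `basename` logic of the shell script.

```python
from pathlib import Path
import shlex


def plan_batch(pdf_paths, results_root="results"):
    """Build one analyze command per PDF, each with its own output directory."""
    commands = []
    for pdf in sorted(Path(p) for p in pdf_paths):
        out_dir = Path(results_root) / pdf.stem  # thesis.pdf -> results/thesis
        commands.append([
            "uv", "run", "python", "-m", "veritascribe", "analyze",
            str(pdf), "--output", str(out_dir),
        ])
    return commands


# Dry run with example file names; nothing is executed here.
for cmd in plan_batch(["chapter2.pdf", "chapter1.pdf"]):
    print(shlex.join(cmd))
```

To run the analyses, pass `Path(".").glob("*.pdf")` to `plan_batch` and call `subprocess.run(cmd, check=False)` for each command, so that one failing document does not stop the batch.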
1.7.2 Iterative Review Process
Workflow for document improvement:
# Step 1: Initial quick review
uv run python -m veritascribe quick draft.pdf --blocks 10
# Step 2: Address major issues, then full analysis with annotation
uv run python -m veritascribe analyze draft.pdf --output ./review_1 --annotate
# The annotated PDF will visually show all errors with severity-based highlighting
# Step 3: After revisions, re-analyze
uv run python -m veritascribe analyze revised_draft.pdf --output ./review_2
# Step 4: Compare results
diff ./review_1/draft_*_data.json ./review_2/revised_draft_*_data.json
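A plain `diff` reports every changed line, which can be noisy when the goal is to see whether the error counts went down. The sketch below compares two reports by error type instead. The JSON keys `errors` and `error_type` are hypothetical, so adjust them to the actual export structure.

```python
from collections import Counter


def count_by_type(report: dict) -> Counter:
    """Count errors per type in one report (hypothetical schema)."""
    return Counter(e["error_type"] for e in report.get("errors", []))


def compare_reports(before: dict, after: dict) -> dict:
    """Return the change in error count per type (negative means fewer errors)."""
    b, a = count_by_type(before), count_by_type(after)
    return {t: a[t] - b[t] for t in sorted(set(b) | set(a))}


# Illustrative data standing in for the two JSON exports.
review_1 = {"errors": [{"error_type": "grammar"}, {"error_type": "grammar"},
                       {"error_type": "citation"}]}
review_2 = {"errors": [{"error_type": "grammar"}, {"error_type": "content"}]}

delta = compare_reports(review_1, review_2)
print(delta)
```

With real files, load each export using `json.load()` and pass the two resulting dictionaries to `compare_reports`.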
1.7.3 Cost Management Strategies
Optimize API usage for large documents using different providers and models:
# Strategy 1: Use free OpenRouter models
LLM_PROVIDER=openrouter \
DEFAULT_MODEL=z-ai/glm-4.5-air:free \
uv run python -m veritascribe analyze large_thesis.pdf
# Strategy 2: Use cheaper OpenAI models
LLM_PROVIDER=openai \
DEFAULT_MODEL=gpt-3.5-turbo \
uv run python -m veritascribe analyze large_thesis.pdf
# Strategy 3: Use local models (no API costs)
LLM_PROVIDER=custom \
OPENAI_BASE_URL=http://localhost:11434/v1 \
DEFAULT_MODEL=llama3.1:8b \
uv run python -m veritascribe analyze large_thesis.pdf
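When runs are launched from Python, the same per-run overrides can be passed through the `env` argument of `subprocess.run` without touching `.env`. The sketch below collects the three strategies above in one table. The strategy labels and the helper `env_for` are hypothetical names chosen for illustration.

```python
import os

# Environment overrides taken from the three strategies in this guide.
STRATEGIES = {
    "openrouter-free": {
        "LLM_PROVIDER": "openrouter",
        "DEFAULT_MODEL": "z-ai/glm-4.5-air:free",
    },
    "openai-cheap": {
        "LLM_PROVIDER": "openai",
        "DEFAULT_MODEL": "gpt-3.5-turbo",
    },
    "local-ollama": {
        "LLM_PROVIDER": "custom",
        "OPENAI_BASE_URL": "http://localhost:11434/v1",
        "DEFAULT_MODEL": "llama3.1:8b",
    },
}


def env_for(strategy: str, base_env=None) -> dict:
    """Return a copy of the base environment with the strategy's overrides applied."""
    env = dict(os.environ) if base_env is None else dict(base_env)
    env.update(STRATEGIES[strategy])
    return env


env = env_for("local-ollama", base_env={"PATH": "/usr/bin"})
print(env["LLM_PROVIDER"], env["DEFAULT_MODEL"])
```

A run would then look like `subprocess.run(["uv", "run", "python", "-m", "veritascribe", "analyze", "large_thesis.pdf"], env=env_for("openrouter-free"))`.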
1.8 Best Practices
1.8.1 Document Preparation
- Use text-based PDFs: Avoid scanned documents when possible
- Ensure proper formatting: Well-structured documents analyze better
- Check PDF integrity: Corrupted files may cause issues
- Remove passwords: Encrypted PDFs cannot be processed
1.8.2 Analysis Strategy
- Start with quick analysis: Get overview before full analysis
- Choose appropriate provider: Match your needs and budget
- Focus on high-priority issues: Address critical errors first
- Use optimize-prompts: Fine-tune prompts for better accuracy, especially for non-English documents
- Consider document language: German and other non-English documents benefit significantly from prompt optimization
For troubleshooting common issues, see the Troubleshooting Guide.