
Ghost Writer Documentation

Complete reference for the Ghost Protocol methodology, 5-stage pipeline, 40-point QA system, platform specs, voice profiles, detection evasion, and API integration.

1. Ghost Protocol Methodology

Ghost Writer operates under three core laws that govern every piece of content:

  1. Controlled Chaos — Deliberate imperfection in every piece. Sentence fragments, unexpected metaphors, paragraphs that run long because the thought demands it. Never uniform.
  2. Signature Voice — Every content type gets a specific voice profile. The engine writes as a defined persona with consistent tone, vocabulary, and structural habits.
  3. Invisible Architecture — Detection evasion is baked into generation, not bolted on. The writing passes because of how it's constructed, not because of post-processing.

2. 5-Stage Pipeline

Every piece of content flows through five sequential stages:

  1. Profile — Load voice profile, set perplexity/burstiness targets, identify platform constraints.
  2. Writer — GPT-5.4 generation with Ghost Protocol system prompt.
  3. QA Engine — 40-point check across 10 blocks.
  4. Adapter — Format for target platform (18 supported).
  5. Polish — Human-pass simulation with 2–3 small edits.
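The five stages above can be sketched as a chain of stage functions. This is an illustrative sketch only: the function names, request fields, and stub behavior are assumptions, not the engine's real implementation.

```python
# Hypothetical sketch of the 5-stage pipeline as sequential stage
# functions. Names, fields, and stubbed logic are illustrative.

def profile(request):
    """Stage 1: attach voice profile and statistical targets."""
    request["targets"] = {"perplexity": 30, "burstiness_stdev": 5}
    return request

def writer(request):
    """Stage 2: draft generation (stubbed here)."""
    request["draft"] = f"Draft about {request['topic']}."
    return request

def qa_engine(request):
    """Stage 3: run the 40-point check suite (stubbed)."""
    request["checks"] = {"passed": 40, "failed": 0}
    return request

def adapter(request):
    """Stage 4: apply platform formatting rules (e.g. a character cap)."""
    request["content"] = request["draft"][:3000]
    return request

def polish(request):
    """Stage 5: simulate 2-3 small human edits (stubbed as a count)."""
    request["edits"] = 2
    return request

def run_pipeline(request):
    for stage in (profile, writer, qa_engine, adapter, polish):
        request = stage(request)
    return request
```

Each stage receives the output of the previous one, so a failure can be reported at the exact stage where it occurred.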

3. The 40-Point QA System

Every piece is validated against 40 checks organized into 10 blocks. Hard checks must pass; soft checks inform quality scoring.

Block A: Statistical (#1–7)

ID | Name | Type | Target | Description
#1 | Sentence Length Variance | Hard | stdev ≥ 5 | Sentence-length standard deviation must meet the minimum for burstiness.
#2 | Vocabulary Richness (TTR) | Soft | ≥ 0.45 | Type-token ratio indicates lexical diversity.
#3 | Hapax Legomena Ratio | Soft | ≥ 0.25 | Ratio of words used once to total unique words.
#4 | Average Sentence Length | Soft | 8–25 words | Within the human-typical range.
#5 | Short Sentence Presence | Hard | ≥ 1 sentence ≤ 5 words | At least one short sentence or fragment.
#6 | Long Sentence Presence | Soft | ≥ 1 sentence ≥ 25 words | At least one longer, complex sentence.
#7 | N-gram Diversity | Soft | Varied distribution | Token distribution should not be overly predictable.
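Checks #1 through #3 can be implemented with basic text statistics. The tokenization below is an approximation; the real QA engine's sentence and word splitting rules are not documented here.

```python
# Illustrative implementations of checks #1-#3: sentence-length
# variance, type-token ratio, and hapax legomena ratio.
import re
from statistics import pstdev

def sentence_lengths(text):
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def check_length_variance(text, minimum=5.0):
    """Check #1 (Hard): sentence-length stdev >= 5."""
    lengths = sentence_lengths(text)
    return pstdev(lengths) >= minimum if len(lengths) > 1 else False

def type_token_ratio(text):
    """Check #2 (Soft): unique words / total words, target >= 0.45."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

def hapax_ratio(text):
    """Check #3 (Soft): words used exactly once / unique words, target >= 0.25."""
    words = re.findall(r"[a-z']+", text.lower())
    unique = set(words)
    hapaxes = [w for w in unique if words.count(w) == 1]
    return len(hapaxes) / len(unique) if unique else 0.0
```

Hard check #1 gates the piece; the two soft ratios feed the quality score.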

Block B: Classifier Resistance (#8–12)

ID | Name | Type | Target | Description
#8 | Conjunction Starters | Hard | ≥ 1 paragraph | At least one paragraph starts with And/But/So.
#9 | Fragment Usage | Soft | Contains fragments | Content includes sentence fragments.
#10 | Parenthetical Asides | Soft | Contains () or — | Parentheticals or em-dashes present.
#11 | Temperature Variance | Soft | 0.85–0.95 | Generation temperature at creation.
#12 | Model Attribution Defense | Soft | Varied patterns | Patterns that resist model-specific attribution.
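The text-level checks in this block (#8, #9, #10) reduce to simple scans. These heuristics are illustrative; in particular, the fragment detector below is a crude proxy, not real syntactic analysis.

```python
# Sketches of checks #8-#10: conjunction paragraph starters,
# fragment presence, and parenthetical asides.
import re

def conjunction_starter(text):
    """Check #8 (Hard): some paragraph starts with And/But/So."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return any(re.match(r"(And|But|So)\b", p) for p in paragraphs)

def has_fragment(text):
    """Check #9 (Soft), crude proxy: any very short "sentence"."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return any(len(s.split()) <= 3 for s in sentences)

def has_parenthetical(text):
    """Check #10 (Soft): parentheses or an em-dash (U+2014) present."""
    return "(" in text or "\u2014" in text
```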

Block C: Linguistic (#13–18)

ID | Name | Type | Target | Description
#13 | Phrase Blacklist | Hard | 0 hits | Zero hits from 120+ banned AI-detectable phrases.
#14 | Lexical Diversity | Soft | TTR ≥ 0.50 | Vocabulary richness threshold.
#15 | Readability Variance | Soft | Flesch-Kincaid 20–100 | Readability score within range.
#16 | Syntactic Variety | Soft | stdev ≥ 4 | Sentence structure variation.
#17 | Emotional Authenticity | Soft | Voice-driven | Tone matches voice profile.
#18 | Metaphor/Analogy Presence | Soft | ≥ 1 | At least one metaphor or analogy.
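Check #13 is a straight substring scan against the banned-phrase list. The real list has 120+ entries and is not published here; the four phrases below are illustrative stand-ins.

```python
# Minimal version of check #13 (phrase blacklist). The tiny list
# below is illustrative; the engine's real list has 120+ phrases.
BLACKLIST = [
    "delve into",
    "in today's fast-paced world",
    "it's important to note",
    "game-changer",
]

def blacklist_hits(text):
    """Return every banned phrase found in the text."""
    lowered = text.lower()
    return [phrase for phrase in BLACKLIST if phrase in lowered]

def check_blacklist(text):
    """Hard check: passes only with zero hits."""
    return not blacklist_hits(text)
```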

Block D: Watermark (#19–20)

ID | Name | Type | Target | Description
#19 | Unicode Normalization | Hard | Clean | No invisible characters or watermark artifacts.
#20 | Metadata Clean | Hard | None | No embedded metadata or hidden markers.
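A watermark scan like check #19 looks for invisible and zero-width characters that can be injected as a text fingerprint. The character set below covers common cases but is an assumption; the engine's actual list is not documented here.

```python
# Sketch of check #19: find and strip invisible characters commonly
# used as text watermarks. The character set is illustrative.
import unicodedata

INVISIBLES = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\ufeff",  # byte-order mark
    "\u00a0",  # non-breaking space (often substituted for U+0020)
}

def find_invisibles(text):
    """Return (index, codepoint) pairs for every suspicious character."""
    return [(i, hex(ord(ch))) for i, ch in enumerate(text) if ch in INVISIBLES]

def normalize(text):
    """Drop zero-width characters, map NBSP to a plain space, then NFC-normalize."""
    cleaned = "".join(" " if ch == "\u00a0" else ch
                      for ch in text if ch not in INVISIBLES - {"\u00a0"})
    return unicodedata.normalize("NFC", cleaned)
```

A clean piece returns an empty list from `find_invisibles` and is unchanged by `normalize`.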

Block E: Scoring (#21–25)

Confidence targeting, sentence-level clean, plagiarism check, anti-humanizer resistance, language authenticity.

Block F: Bias (#26–28)

Non-native bias clear, domain patterns validated, length optimization.

Block G: Adversarial (#29–31)

Pattern diversity, translation proof, authorship consistency.

Block H: Infrastructure (#32–34)

Multi-detector validation, plain text normalization, platform compliance.

Block I: Evaluation (#35–37)

Third-party benchmark, FPR exploitation clear, AI-assisted classification.

Block J: Governance (#38–40)

Disclosure compliance, audit trail, provenance proof.

4. Platform Specs

All 18 supported platforms with character limits, truncation rules, format, and best practices.

Platform | Max Chars | Truncation | Format | Best Length | Key Rules
LinkedIn | 3,000 | 140 (mobile) | plain | 300–1200 | 3 hashtags max, line breaks only
X/Twitter | 280 / 25K (premium) | — | plain | 200–280 | Thread format (1/n)
Reddit | 40,000 | — | markdown | 400–1500 words | TL;DR for >300 words
Instagram | 2,200 | 125 | plain | 125–500 | 3–5 hashtags, emojis = 2 chars
Facebook | 63,206 | 125 (mobile) | plain | 40–250 | Front-load message
TikTok | 4,000 | 100 | plain | 100–300 | Hook-first
YouTube | 5,000 | 200 | plain | 200–1000 | Timestamps, chapters
Substack | unlimited | — | html/markdown | 800–3000 words | H2/H3, pull quotes
Email | — | — | html+plain | 50–300 words | subject < 60, preheader < 90
Blog | unlimited | — | markdown | 800–2500 words | H2/H3, meta < 160
White Paper | unlimited | — | markdown | 2000–5000 words | Exec summary, citations
Threads | 500 | 500 | plain | 100–500 | Complete thought
Medium | unlimited | — | markdown | 800–2500 words | 5 tags, subtitle
Pinterest | 500 | — | plain | 100–300 | Keyword-rich, no hashtag spam
GBP | 1,500 | — | plain | 150–300 | CTA button types
Website | unlimited | — | html | 300–800 words | Conversion copy
Reply | context-matched | context-matched | — | 50–200 words | Acknowledge + answer
Reddit Comment | 10,000 | — | markdown | 50–300 words | Conversational
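An adapter applying these limits can be sketched as a lookup plus word-boundary truncation. The spec dictionary below uses three limits from the table; the structure and the exact truncation behavior are assumptions, not the engine's real adapter.

```python
# Illustrative platform adapter using character limits from the
# platform table. Spec structure and truncation logic are assumed.
PLATFORMS = {
    "linkedin":  {"max_chars": 3000, "preview": 140},
    "x":         {"max_chars": 280,  "preview": None},
    "instagram": {"max_chars": 2200, "preview": 125},
}

def adapt(text, platform):
    """Return text unchanged if it fits, else cut on a word boundary."""
    limit = PLATFORMS[platform]["max_chars"]
    if len(text) <= limit:
        return text
    # Leave one character of room for the ellipsis.
    cut = text[: limit - 1].rsplit(" ", 1)[0]
    return cut + "\u2026"
```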

5. Voice Profiles

Four built-in profiles plus custom import:

Profile | Style | Avg Words | Stdev
john-williams | Direct, opinionated, coaching analogies | 16 | 9
agency | Professional, data-driven | 18 | 7
technical | Precise, specification-heavy | 14 | 6
casual | Conversational, fragment-heavy | 12 | 11

Voice Import

Paste 2–5 writing samples → engine analyzes sentence length, vocabulary, structure, tone → extracts fingerprint → saves as custom profile.
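The analysis step of voice import can be sketched with the same statistics used in Block A. The field names mirror the `/api/writing-agent-voice` response shown later, but this sketch and its tokenization rules are assumptions, not the real analyzer.

```python
# Sketch of voice-import analysis: derive a fingerprint (sentence-
# length mean/stdev, TTR) from 2-5 pasted samples.
import re
from statistics import mean, pstdev

def fingerprint(samples):
    text = " ".join(samples)
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "sentenceLength": {
            "mean": round(mean(lengths), 1),
            "stdev": round(pstdev(lengths), 1),
        },
        "vocabulary": {"ttr": round(len(set(words)) / len(words), 2)},
    }
```

The resulting fingerprint can then drive generation the same way a built-in profile's Avg Words and Stdev columns do.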

6. Detection Methodology

How each detector works and how Ghost Writer defeats it:

GPTZero

Scores text on perplexity and burstiness plus 7 supporting indicators. Ghost Writer targets perplexity > 30 and burstiness stdev > 5, and injects fragments and unexpected word choices.

Pangram v3

Classifies text as AI, AI-Assisted, or Human and attributes it to a specific model. Ghost Writer varies token-distribution patterns and uses voice-specific vocabulary.

Originality.ai v2

Claims 99% accuracy and catches paraphrased AI text. Ghost Writer generates with human patterns from scratch rather than paraphrasing AI output.

7. API Reference

POST /api/writing-agent — Generate content

// Request
{
  "platform": "linkedin",
  "voice": "john-williams",
  "topic": "Why PMax works better with brand campaigns",
  "context": "B2B SaaS audience",
  "length": "500"
}

// Response
{
  "content": "...",
  "platform": "linkedin",
  "checks": { "passed": 38, "failed": 2, "details": [...] }
}
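A client call against this endpoint can be built with the standard library. The base URL is an assumption (a locally running instance); only the path and JSON shapes come from the spec above.

```python
# Hypothetical client for POST /api/writing-agent. The base URL is
# an assumption; the payload fields match the request example above.
import json
import urllib.request

def build_request(payload, base_url="http://localhost:3000"):
    """Construct the POST request without sending it."""
    return urllib.request.Request(
        base_url + "/api/writing-agent",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(payload):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(payload)) as resp:
        return json.load(resp)
```

The response's `checks` object reports the 40-point QA outcome for the generated piece.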

POST /api/writing-agent-check — Check existing text

// Request
{
  "text": "Your existing content to analyze..."
}

// Response
{
  "readability": { "fleschKincaid": 65, "grade": "8th grade", ... },
  "tone": ["confident", "direct"],
  "aiScore": 0.12,
  "suggestions": [...]
}

POST /api/writing-agent-voice — Voice profile management

// Request (analyze samples)
{
  "action": "analyze",
  "samples": ["Sample 1...", "Sample 2...", "Sample 3..."]
}

// Response
{
  "fingerprint": {
    "sentenceLength": { "mean": 16, "stdev": 9, ... },
    "vocabulary": { "ttr": 0.52, "domainTerms": [...], ... }
  }
}

8. Research & Citations