How the Pipeline Works

AIDRAN is a fully automated system. No human writes, edits, or approves the content before publication. Every piece of editorial output traces back to a deterministic pipeline: public discourse goes in, AI-generated narratives come out. This page explains every stage of that pipeline — what runs, how often, what thresholds govern promotion, and where AI makes decisions.

529k records indexed · 29k in last 24h · 24 active beats · 8 platforms · 63 signals (48h)
1. Ingestion

Every 30 min – 4 hrs

Cron-triggered workers pull public posts, comments, articles, and threads from six active platforms. Each source has its own schedule tuned to its update cadence: Reddit and Hacker News poll hourly; news, YouTube, and arXiv run every four hours; X (Twitter) runs daily.

| Source      | Schedule      | 24h Volume |
|-------------|---------------|------------|
| Reddit      | Every hour    | 17k        |
| Hacker News | Every hour    | 23         |
| News        | Every 4 hours | 5.2k       |
| YouTube     | Every 4 hours | 772        |
| arXiv       | Every 4 hours | 128        |
| X (Twitter) | Daily         | n/a        |
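The per-source cadence above can be expressed as a simple schedule map. A minimal sketch; the names and structure are illustrative, not the system's actual scheduler:

```python
from datetime import timedelta

# Hypothetical schedule map; intervals mirror the table above.
SOURCE_SCHEDULES = {
    "reddit": timedelta(hours=1),
    "hackernews": timedelta(hours=1),
    "news": timedelta(hours=4),
    "youtube": timedelta(hours=4),
    "arxiv": timedelta(hours=4),
    "x": timedelta(days=1),
}

def is_due(source: str, last_run_age: timedelta) -> bool:
    """A source is due when its interval has elapsed since its last run."""
    return last_run_age >= SOURCE_SCHEDULES[source]
```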
2. Embedding

Every 5 min

New records enter an embedding queue. A worker drains the queue in batches, sending text to Voyage AI (model: voyage-4-lite, 512 dimensions). The resulting vectors are stored directly on the discourse record and used for semantic search, clustering, and narrative deduplication.

The embedding space is shared across all beats. Cosine similarity between any two records captures semantic relatedness regardless of platform or phrasing.
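Because relatedness reduces to a cosine between stored vectors, the comparison itself is a few lines. A minimal sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (e.g. 512-dim)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Identical directions score 1.0, orthogonal ones 0.0, regardless of which platform the underlying text came from.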

3. AI Analysis

Every 10 min (+ Batch API)

Every record passes through AI for structured analysis. The system uses Groq Llama 3.3 70B for high-throughput analysis, extracting four dimensions per record:

Sentiment

Label (positive / neutral / negative) + continuous score from -1.0 to +1.0. Materialized as indexed columns for fast aggregation.

Named Entities

Up to 10 per record. Typed as: person, organization, state, government, product, technology, legislation, or concept.

Key Phrases

3-5 short phrases capturing core claims. Used for trending phrase detection and cluster labeling.

Emotional Register

One of ten registers: analytical, anxious, optimistic, defiant, resigned, celebratory, skeptical, fearful, pragmatic, or satirical.
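Taken together, the four dimensions suggest a record shape like the following. The field names and validation rules are illustrative assumptions, not the system's actual schema:

```python
from dataclasses import dataclass, field

ENTITY_TYPES = {"person", "organization", "state", "government",
                "product", "technology", "legislation", "concept"}
REGISTERS = {"analytical", "anxious", "optimistic", "defiant", "resigned",
             "celebratory", "skeptical", "fearful", "pragmatic", "satirical"}

@dataclass
class Analysis:
    sentiment_label: str          # positive / neutral / negative
    sentiment_score: float        # continuous, -1.0 to +1.0
    entities: list[tuple[str, str]] = field(default_factory=list)  # (name, type), max 10
    key_phrases: list[str] = field(default_factory=list)           # 3-5 short phrases
    register: str = "analytical"  # one of the ten registers

    def validate(self) -> bool:
        """Check the bounds stated in the text above."""
        return (-1.0 <= self.sentiment_score <= 1.0
                and len(self.entities) <= 10
                and all(t in ENTITY_TYPES for _, t in self.entities)
                and self.register in REGISTERS)
```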

Current sentiment distribution (48h)

Positive 24% · Neutral 57% · Negative 18%
4. Signal Detection

Every 30 min

Signals are automated alerts that something in the discourse has changed. The detector compares the last 24 hours against a 7-day rolling baseline for every active beat, running seven independent detectors per beat. A signal fires only if it passes its threshold and has not been seen in the last 12 hours (the dedup window).

Volume Anomaly

Conversation volume (raw or engagement-weighted) is running significantly above baseline. Uses Reddit score and Bluesky likes as engagement proxies.

threshold: 1.5x baseline (low) / 2x (medium) / 3x (high)
severity: low → medium → high

Sentiment Shift

The proportion of positive or negative posts shifted by at least 15 percentage points in 24 hours compared to the 7-day average.

threshold: 15pp shift (low) / 20pp (medium) / 30pp (high)
severity: low → medium → high

Platform Divergence

Average sentiment score differs by more than 0.6 (on a -1 to +1 scale) between any two platforms with 5+ posts each in the last 48 hours.

threshold: 0.6 divergence (medium) / 0.9 (high)
severity: medium → high

Entity Surge

A named entity appears in more than 20% of recent discourse for a beat, dominating the conversation.

threshold: 20% prevalence (low) / 30% (medium) / 50% (high)
severity: low → medium → high

Emotional Register Shift

The dominant emotional register changed (e.g., analytical to anxious) with the new register appearing in more than 40% of recent posts.

threshold: 40% dominance + register change
severity: medium (same polarity) / high (crosses positive/negative boundary)

Key Phrase Trending

A phrase appears in 10%+ of recent discourse but was present in less than 2% of the prior week. Identifies new talking points.

threshold: 10% recent / <2% baseline (low) / 20% (medium) / 30% (high)
severity: low → medium → high

Cross Topic Correlation

Two or more beats spike simultaneously (both above 2x baseline) while sharing top-5 entities. Runs once across all topics, not per-beat.

threshold: Both topics >2x baseline + shared entities
severity: low (no shared entities) → medium (1 shared) → high (2+)

Signal distribution (last 7 days)

Entity Surge 197 · Volume Anomaly 96 · Platform Divergence 89 · Sentiment Shift 65 · Cross Topic Correlation 25 · Key Phrase Trending 13 · Emotional Register Shift 11
5. Narrative Clustering

Weekly (Mondays)

K-means clustering groups embedded discourse records into narrative threads. For each active beat, the system fetches up to 500 records from the last 7 days and clusters them into k=5 groups using cosine distance with k-means++ initialization.

k=5 clusters per beat · 500 records max · 5 iterations · min 3 records per cluster
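A minimal version of the clustering step, using plain random initialization for brevity where the pipeline uses k-means++, and cosine distance via dot products on unit-normalized vectors:

```python
import math
import random

def _normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

def kmeans_cosine(records: list[list[float]], k: int = 5,
                  iters: int = 5, seed: int = 0) -> list[int]:
    """Cluster embedding vectors; on unit vectors, cosine distance = 1 - dot.

    Random init for brevity -- the production run uses k-means++.
    Returns a cluster index per record.
    """
    rng = random.Random(seed)
    vecs = [_normalize(v) for v in records]
    centroids = [list(v) for v in rng.sample(vecs, k)]
    assign = [0] * len(vecs)
    for _ in range(iters):
        # Assign each record to the nearest centroid (max cosine similarity).
        for i, v in enumerate(vecs):
            assign[i] = max(range(k),
                            key=lambda c: sum(a * b for a, b in zip(v, centroids[c])))
        # Recompute each centroid as the normalized mean of its members.
        for c in range(k):
            members = [vecs[i] for i in range(len(vecs)) if assign[i] == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = _normalize(mean)
    return assign
```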
6. Meta-Pattern Detection

Twice daily (8am, 8pm UTC)

A cross-beat awareness layer detects system-level patterns that only emerge when you zoom out from individual beats. Five detectors run in parallel:

  • Sentiment convergence — 70%+ of beats shifting the same direction
  • Platform anomalies — a source surging 2.5x+ or going quiet (under 30% of normal)
  • Aggregate volume trends — total discourse 2x+ or under 40% of the weekly average
  • Milestones — system-level round numbers (e.g., 100k records processed)
  • Cross-topic correlation — the same signal type firing across 3+ beats within 12 hours

Meta-observations are injected as additional context into narrative generation, giving the AI writer awareness of the broader landscape when writing about individual beats.

7. Narrative Generation

Every 6 hrs (8am–10pm UTC)

Claude (claude-sonnet-4-6) generates all editorial content via structured output. Each narrative type receives different context depth and editorial instructions.

| Type | Scope | Context | Length |
|------|-------|---------|--------|
| Lead Story | Cross-topic | All signals + top 3 topics' anchor posts | 3-5 paragraphs |
| Secondary Story | Per-topic | 15 discourse samples + signals + stats | 3-4 paragraphs |
| Beat Narrative | Per-topic | 30 discourse samples + full signal context | 4-6 paragraphs |
| Dispatch | Per-signal | Signal + 5 discourse samples | 1-2 sentences |
| Entity Narrative | Per-entity | Cross-beat entity mentions + sentiment | 3-5 paragraphs |
8. Dispatch Generation

3x daily (9:30, 15:30, 21:30 UTC)

Dispatches are short wire-style observations — 1-2 sentences about a specific signal. Only editorially worthy signals trigger a dispatch: low-severity signals are filtered out, entity surges on generic terms (like "AI" dominating AI discourse) are rejected as tautological, and platform divergence is demoted to beat narrative territory.

If Claude determines the signal only supports a statistical pattern with no specific event behind it, it returns "SKIP" and no dispatch is published.

9. Daily Snapshots

After each narration cycle

After narrative generation, the system computes and upserts one snapshot per active beat per day. Each snapshot materializes volume, sentiment distribution, platform mix, top entities, and signal count — replacing expensive time-bucketed scans of raw discourse for historical queries and sparkline charts.
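An upsert keyed on (beat, day) gives the replace-on-rerun behavior. A minimal sketch with illustrative metric fields; the real snapshot schema may differ:

```python
from collections import Counter
from datetime import date

def upsert_snapshot(store: dict, beat: str, day: date, records: list[dict]) -> None:
    """Materialize one snapshot per (beat, day); a re-run replaces the row,
    so historical queries never rescan raw discourse."""
    store[(beat, day)] = {
        "volume": len(records),
        "sentiment": Counter(r["sentiment"] for r in records),
        "platforms": Counter(r["platform"] for r in records),
        "top_entities": Counter(e for r in records for e in r["entities"]).most_common(5),
    }
```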

10. Publishing & Social

Every 30 min

Content is published to the site immediately via ISR revalidation. Social distribution to Bluesky uses a saliency-based curator that mimics a human editor — selecting the single best piece to share at each posting window based on a weighted score:

Saliency Formula

saliency = signalSeverity * 0.45 + volumeMagnitude * 0.25
+ crossTopicBonus * 0.15 + recencyDecay * 0.15

Breaking (saliency >= 0.85): bypasses all cadence, 15-min minimum gap

Elevated (saliency >= 0.70): overrides daily budget, respects type gaps

Normal: respects all cadence rules (45-min gap, 4-6 posts/day, 11-15/week)
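The formula and tier cutoffs translate directly into code. A sketch, assuming all four inputs arrive pre-normalized to [0, 1]:

```python
def saliency(signal_severity: float, volume_magnitude: float,
             cross_topic_bonus: float, recency_decay: float) -> float:
    """Weighted saliency score; inputs assumed normalized to [0, 1]."""
    return (signal_severity * 0.45 + volume_magnitude * 0.25
            + cross_topic_bonus * 0.15 + recency_decay * 0.15)

def tier(score: float) -> str:
    """Map a saliency score to its posting tier."""
    if score >= 0.85:
        return "breaking"   # bypasses all cadence, 15-min minimum gap
    if score >= 0.70:
        return "elevated"   # overrides daily budget, respects type gaps
    return "normal"         # 45-min gap, 4-6 posts/day, 11-15/week
```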

Editorial Principles

The system prompts that govern narrative generation encode specific editorial rules. These aren't suggestions — they're hard constraints that Claude must follow:

  • Narrative leads, data supports. Data never announces itself. Numbers appear woven into prose where they support a claim. "Volume increased 340%" is forbidden; "the conversation tripled after the Senate hearing" is required.
  • Anchor in specifics. Every story must reference specific posts, subreddits, or community dynamics. If you can remove all community names and the piece still reads the same, the grounding is decorative.
  • No dashboards in prose. Platform-by-platform sentiment tours are banned. Stories organize by insight, not by source.
  • Commit to claims. No hedging with "it remains to be seen" or "only time will tell." If the data supports a claim, the AI states it.
  • No manufactured urgency. "SURGE DETECTED" framing is banned. Urgency is calibrated to magnitude.

What This System Does Not Do

  • It does not access private messages, locked accounts, or content behind authentication walls.
  • It does not predict outcomes or argue that AI is good or bad.
  • It does not use human editors to write, review, or approve content before publication.
  • It does not track individual users across platforms or store personally identifiable information.
  • It does not use user-submitted content — all source material is publicly available discourse.