Methodology
How the Pipeline Works
AIDRAN is a fully automated system. No human writes, edits, or approves the content before publication. Every piece of editorial output traces back to a deterministic pipeline: public discourse goes in, AI-generated narratives come out. This page explains every stage of that pipeline — what runs, how often, what thresholds govern promotion, and where AI makes decisions.
Ingestion
Every 30 min – 4 hrs

Cron-triggered workers pull public posts, comments, articles, and threads from six active platforms. Each source has its own schedule tuned to its update cadence — Reddit and Hacker News poll hourly, news, YouTube, and arXiv run every four hours, and X (Twitter) is polled daily.
| Source | Schedule | 24h Volume |
|---|---|---|
| Reddit | Every hour | 17k |
| Hacker News | Every hour | 23 |
| News | Every 4 hours | 5.2k |
| YouTube | Every 4 hours | 772 |
| arXiv | Every 4 hours | 128 |
| X (Twitter) | Daily | — |
Embedding
Every 5 min

New records enter an embedding queue. A worker drains the queue in batches, sending text to Voyage AI (model: voyage-4-lite, 512 dimensions). The resulting vectors are stored directly on the discourse record and used for semantic search, clustering, and narrative deduplication.
The embedding space is shared across all beats. Cosine similarity between any two records captures semantic relatedness regardless of platform or phrasing.
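Cosine similarity over stored vectors is standard; as a generic illustration (not AIDRAN's actual code), it can be computed directly from two embedding vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes.
    1.0 means identical direction; 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Because the score depends only on vector direction, two records phrased differently but about the same topic land close together regardless of source platform.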
AI Analysis
Every 10 min (+ Batch API)

Every record passes through AI for structured analysis. The system uses Groq Llama 3.3 70B for high-throughput analysis, extracting four dimensions per record:
Sentiment
Label (positive / neutral / negative) + continuous score from -1.0 to +1.0. Materialized as indexed columns for fast aggregation.
Named Entities
Up to 10 per record. Typed as: person, organization, state, government, product, technology, legislation, or concept.
Key Phrases
3-5 short phrases capturing core claims. Used for trending phrase detection and cluster labeling.
Emotional Register
One of ten registers: analytical, anxious, optimistic, defiant, resigned, celebratory, skeptical, fearful, pragmatic, or satirical.
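The four dimensions above can be sketched as a single analysis record with the stated constraints encoded as validation. Field names here are assumptions for illustration, not AIDRAN's actual schema:

```python
from dataclasses import dataclass

REGISTERS = {"analytical", "anxious", "optimistic", "defiant", "resigned",
             "celebratory", "skeptical", "fearful", "pragmatic", "satirical"}
ENTITY_TYPES = {"person", "organization", "state", "government", "product",
                "technology", "legislation", "concept"}

@dataclass
class Analysis:
    sentiment_label: str             # positive / neutral / negative
    sentiment_score: float           # continuous, -1.0 to +1.0
    entities: list[tuple[str, str]]  # (name, type), up to 10 per record
    key_phrases: list[str]           # 3-5 short phrases
    register: str                    # one of the ten emotional registers

    def validate(self) -> "Analysis":
        assert self.sentiment_label in {"positive", "neutral", "negative"}
        assert -1.0 <= self.sentiment_score <= 1.0
        assert len(self.entities) <= 10
        assert all(t in ENTITY_TYPES for _, t in self.entities)
        assert 3 <= len(self.key_phrases) <= 5
        assert self.register in REGISTERS
        return self
```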
Current sentiment distribution (48h)
Signal Detection
Every 30 min

A signal is an anomaly flag indicating that something changed. The detector compares the last 24 hours against a 7-day rolling baseline for every active beat, running seven independent detectors per beat. A signal fires only if it passes its threshold and hasn't been seen in the last 12 hours (dedup window).
- Volume spike — Conversation volume (raw or engagement-weighted) is running significantly above baseline. Uses Reddit score and Bluesky likes as engagement proxies.
- Sentiment shift — The proportion of positive or negative posts shifted by at least 15 percentage points in 24 hours compared to the 7-day average.
- Platform divergence — Average sentiment score differs by more than 0.6 (on a -1 to +1 scale) between any two platforms with 5+ posts each in the last 48 hours.
- Entity surge — A named entity appears in more than 20% of recent discourse for a beat, dominating the conversation.
- Register shift — The dominant emotional register changed (e.g., analytical to anxious), with the new register appearing in more than 40% of recent posts.
- Novel phrase — A phrase appears in 10%+ of recent discourse but was present in less than 2% of the prior week. Identifies new talking points.
- Cross-topic correlation — Two or more beats spike simultaneously (both above 2x baseline) while sharing top-5 entities. Runs once across all topics, not per-beat.
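The baseline comparison and dedup window can be sketched as follows. The 2x multiplier is an assumption borrowed from the cross-topic spike description; each detector has its own threshold:

```python
from datetime import datetime, timedelta

def volume_spike(last_24h: int, week_total: int, threshold: float = 2.0) -> bool:
    """Compare the last 24 hours against the 7-day rolling daily baseline."""
    baseline = week_total / 7  # average daily volume over the trailing week
    return baseline > 0 and last_24h >= threshold * baseline

def should_fire(fired_at: dict, key: str, now: datetime,
                dedup: timedelta = timedelta(hours=12)) -> bool:
    """Suppress a signal already seen inside the dedup window;
    record the firing time for signals that pass."""
    last = fired_at.get(key)
    if last is not None and now - last < dedup:
        return False
    fired_at[key] = now
    return True
```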
Signal distribution (last 7 days)
Narrative Clustering
Weekly (Mondays)

K-means clustering groups embedded discourse records into narrative threads. For each active beat, the system fetches up to 500 records from the last 7 days and clusters them into k=5 groups using cosine distance with k-means++ initialization.
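A minimal sketch of this step: unit-normalizing the vectors first lets ordinary Euclidean k-means stand in for cosine-distance clustering, since distances on the unit sphere order pairs the same way as cosine distance. This is a generic illustration, not the production implementation:

```python
import math, random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def kmeans_cosine(vectors, k=5, iters=50, seed=0):
    """K-means with k-means++ init over unit-normalized vectors."""
    rng = random.Random(seed)
    X = [normalize(v) for v in vectors]
    d2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    # k-means++: first centroid uniform, rest weighted by squared distance
    C = [list(rng.choice(X))]
    while len(C) < k:
        w = [min(d2(x, c) for c in C) for x in X]
        C.append(list(rng.choices(X, weights=w)[0]))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: d2(x, C[j])) for x in X]
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                C[j] = normalize(mean)  # re-project centroid onto unit sphere
    return labels
```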
Meta-Pattern Detection
Twice daily (8am, 8pm UTC)

A cross-beat awareness layer detects system-level patterns that only emerge when you zoom out from individual beats. Five detectors run in parallel:
- Sentiment convergence — 70%+ of beats shifting the same direction
- Platform anomalies — a source surging 2.5x+ or going quiet (under 30% of normal)
- Aggregate volume trends — total discourse 2x+ or under 40% of the weekly average
- Milestones — system-level round numbers (e.g., 100k records processed)
- Cross-topic correlation — the same signal type firing across 3+ beats within 12 hours
Meta-observations are injected as additional context into narrative generation, giving the AI writer awareness of the broader landscape when writing about individual beats.
Narrative Generation
Every 6 hrs (8am–10pm UTC)

Claude (claude-sonnet-4-6) generates all editorial content via structured output. Each narrative type receives different context depth and editorial instructions.
| Type | Scope | Context | Length |
|---|---|---|---|
| Lead Story | Cross-topic | All signals + top 3 topics' anchor posts | 3-5 paragraphs |
| Secondary Story | Per-topic | 15 discourse samples + signals + stats | 3-4 paragraphs |
| Beat Narrative | Per-topic | 30 discourse samples + full signal context | 4-6 paragraphs |
| Dispatch | Per-signal | Signal + 5 discourse samples | 1-2 sentences |
| Entity Narrative | Per-entity | Cross-beat entity mentions + sentiment | 3-5 paragraphs |
Dispatch Generation
3x daily (9:30, 15:30, 21:30 UTC)

Dispatches are short wire-style observations — 1-2 sentences about a specific signal. Only editorially worthy signals trigger a dispatch: low-severity signals are filtered out, entity surges on generic terms (like "AI" dominating AI discourse) are rejected as tautological, and platform divergence is demoted to beat narrative territory.
If Claude determines the signal only supports a statistical pattern with no specific event behind it, it returns "SKIP" and no dispatch is published.
Daily Snapshots
After each narrative cycle

Once narrative generation completes, the system computes and upserts one snapshot per active beat per day. Each snapshot materializes volume, sentiment distribution, platform mix, top entities, and signal count — replacing expensive time-bucketed scans of raw discourse for historical queries and sparkline charts.
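The upsert pattern can be sketched with an in-memory dict standing in for the datastore; keying on (beat, day) means a rerun in the same cycle overwrites rather than duplicates. Record field names are illustrative:

```python
from collections import Counter
from datetime import date

snapshots: dict[tuple[str, date], dict] = {}

def upsert_snapshot(beat: str, day: date, records: list[dict]) -> dict:
    """Materialize one snapshot per beat per day from raw records."""
    snap = {
        "volume": len(records),
        "sentiment_mix": dict(Counter(r["sentiment"] for r in records)),
        "platform_mix": dict(Counter(r["platform"] for r in records)),
        "top_entities": [e for e, _ in Counter(
            ent for r in records for ent in r["entities"]).most_common(5)],
    }
    snapshots[(beat, day)] = snap  # upsert: same key replaces the prior run
    return snap
```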
Publishing & Social
Every 30 min

Content is published to the site immediately via ISR revalidation. Social distribution to Bluesky uses a saliency-based curator that mimics a human editor — selecting the single best piece to share at each posting window based on a weighted score:
Saliency Formula
saliency = signalSeverity * 0.45 + volumeMagnitude * 0.25 + crossTopicBonus * 0.15 + recencyDecay * 0.15

Breaking (saliency >= 0.85): bypasses all cadence, 15-min minimum gap
Elevated (saliency >= 0.70): overrides daily budget, respects type gaps
Normal: respects all cadence rules (45-min gap, 4-6 posts/day, 11-15/week)
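The formula and the three tiers above translate directly into code. This sketch assumes all four inputs are pre-normalized to the 0–1 range:

```python
def saliency(signal_severity: float, volume_magnitude: float,
             cross_topic_bonus: float, recency_decay: float) -> float:
    """Weighted saliency score; each input is assumed normalized to 0..1."""
    return (signal_severity * 0.45 + volume_magnitude * 0.25
            + cross_topic_bonus * 0.15 + recency_decay * 0.15)

def tier(score: float) -> str:
    """Map a saliency score to its posting tier."""
    if score >= 0.85:
        return "breaking"  # bypasses all cadence, 15-min minimum gap
    if score >= 0.70:
        return "elevated"  # overrides daily budget, respects type gaps
    return "normal"        # 45-min gap, 4-6 posts/day, 11-15/week
```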
Editorial Principles
The system prompts that govern narrative generation encode specific editorial rules. These aren't suggestions — they're hard constraints that Claude must follow:
- Narrative leads, data supports. Data never announces itself. Numbers appear woven into prose where they support a claim. "Volume increased 340%" is forbidden; "the conversation tripled after the Senate hearing" is required.
- Anchor in specifics. Every story must reference specific posts, subreddits, or community dynamics. If you can remove all community names and the piece still reads the same, the grounding is decorative.
- No dashboards in prose. Platform-by-platform sentiment tours are banned. Stories organize by insight, not by source.
- Commit to claims. No hedging with "it remains to be seen" or "only time will tell." If the data supports a claim, the AI states it.
- No manufactured urgency. "SURGE DETECTED" framing is banned. Urgency is calibrated to magnitude.
What This System Does Not Do
- It does not access private messages, locked accounts, or content behind authentication walls.
- It does not predict outcomes or argue that AI is good or bad.
- It does not use human editors to write, review, or approve content before publication.
- It does not track individual users across platforms or store personally identifiable information.
- It does not use user-submitted content — all source material is publicly available discourse.