Methodology
How the Pipeline Works
AIDRAN is a fully automated system. No human writes, edits, or approves the content before publication. Every piece of editorial output traces back to a deterministic pipeline: public discourse goes in, AI-generated narratives come out. This page explains every stage of that pipeline — what runs, how often, what thresholds govern promotion, and where AI makes decisions.
Ingestion
Every 30 min – 4 hrs

Cron-triggered workers pull public posts, comments, articles, and threads from six active platforms. Each source has its own schedule tuned to its update cadence — Reddit and Hacker News poll hourly, news, YouTube, and arXiv run every four hours, and X (Twitter) is polled daily.
| Source | Schedule | 24h Volume |
|---|---|---|
| Reddit | Every hour | 17k |
| Hacker News | Every hour | 23 |
| News | Every 4 hours | 5.2k |
| YouTube | Every 4 hours | 772 |
| arXiv | Every 4 hours | 128 |
| X (Twitter) | Daily | — |
Embedding
Every 5 min

New records enter an embedding queue. A worker drains the queue in batches, sending text to Voyage AI (model: voyage-4-lite, 512 dimensions). The resulting vectors are stored directly on the discourse record and used for semantic search, clustering, and narrative deduplication.
The embedding space is shared across all beats. Cosine similarity between any two records captures semantic relatedness regardless of platform or phrasing.
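Cosine similarity over stored vectors is standard; as a generic illustration (not AIDRAN's actual code), it can be computed directly from two embedding vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes.
    1.0 means identical direction; 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Because the score depends only on vector direction, two records phrased differently but about the same topic land close together regardless of source platform.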
AI Analysis
Every 10 min (+ Batch API)

Every record passes through AI for structured analysis. The system uses Groq Llama 3.3 70B for high-throughput analysis, extracting four dimensions per record:
Sentiment
Label (positive / neutral / negative) + continuous score from -1.0 to +1.0. Materialized as indexed columns for fast aggregation.
Named Entities
Up to 10 per record. Typed as: person, organization, state, government, product, technology, legislation, or concept.
Key Phrases
3-5 short phrases capturing core claims. Used for trending phrase detection and cluster labeling.
Emotional Register
One of ten registers: analytical, anxious, optimistic, defiant, resigned, celebratory, skeptical, fearful, pragmatic, or satirical.
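The four dimensions above can be sketched as a single analysis record with the stated constraints encoded as validation. Field names here are assumptions for illustration, not AIDRAN's actual schema:

```python
from dataclasses import dataclass

REGISTERS = {"analytical", "anxious", "optimistic", "defiant", "resigned",
             "celebratory", "skeptical", "fearful", "pragmatic", "satirical"}
ENTITY_TYPES = {"person", "organization", "state", "government", "product",
                "technology", "legislation", "concept"}

@dataclass
class Analysis:
    sentiment_label: str             # positive / neutral / negative
    sentiment_score: float           # continuous, -1.0 to +1.0
    entities: list[tuple[str, str]]  # (name, type), up to 10 per record
    key_phrases: list[str]           # 3-5 short phrases
    register: str                    # one of the ten emotional registers

    def validate(self) -> "Analysis":
        assert self.sentiment_label in {"positive", "neutral", "negative"}
        assert -1.0 <= self.sentiment_score <= 1.0
        assert len(self.entities) <= 10
        assert all(t in ENTITY_TYPES for _, t in self.entities)
        assert 3 <= len(self.key_phrases) <= 5
        assert self.register in REGISTERS
        return self
```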
Current sentiment distribution (48h)
Signal Detection
Every 30 min

A signal is an anomaly flag indicating that something changed. The detector compares the last 24 hours against a 7-day rolling baseline for every active beat, running seven independent detectors per beat. A signal fires only if it passes its threshold and hasn't been seen in the last 12 hours (dedup window).
- Volume spike — Conversation volume (raw or engagement-weighted) is running significantly above baseline. Uses Reddit score and Bluesky likes as engagement proxies.
- Sentiment shift — The proportion of positive or negative posts shifted by at least 15 percentage points in 24 hours compared to the 7-day average.
- Platform divergence — Average sentiment score differs by more than 0.6 (on a -1 to +1 scale) between any two platforms with 5+ posts each in the last 48 hours.
- Entity surge — A named entity appears in more than 20% of recent discourse for a beat, dominating the conversation.
- Register shift — The dominant emotional register changed (e.g., analytical to anxious), with the new register appearing in more than 40% of recent posts.
- Novel phrase — A phrase appears in 10%+ of recent discourse but was present in less than 2% of the prior week. Identifies new talking points.
- Cross-topic correlation — Two or more beats spike simultaneously (both above 2x baseline) while sharing top-5 entities. Runs once across all topics, not per-beat.
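The baseline comparison and dedup window can be sketched as follows. The 2x multiplier is an assumption borrowed from the cross-topic spike description; each detector has its own threshold:

```python
from datetime import datetime, timedelta

def volume_spike(last_24h: int, week_total: int, threshold: float = 2.0) -> bool:
    """Compare the last 24 hours against the 7-day rolling daily baseline."""
    baseline = week_total / 7  # average daily volume over the trailing week
    return baseline > 0 and last_24h >= threshold * baseline

def should_fire(fired_at: dict, key: str, now: datetime,
                dedup: timedelta = timedelta(hours=12)) -> bool:
    """Suppress a signal already seen inside the dedup window;
    record the firing time for signals that pass."""
    last = fired_at.get(key)
    if last is not None and now - last < dedup:
        return False
    fired_at[key] = now
    return True
```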
Signal distribution (last 7 days)
Narrative Clustering
Weekly (Mondays)

K-means clustering groups embedded discourse records into narrative threads. For each active beat, the system fetches up to 500 records from the last 7 days and clusters them into k=5 groups using cosine distance with k-means++ initialization.
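A minimal sketch of this step: unit-normalizing the vectors first lets ordinary Euclidean k-means stand in for cosine-distance clustering, since distances on the unit sphere order pairs the same way as cosine distance. This is a generic illustration, not the production implementation:

```python
import math, random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def kmeans_cosine(vectors, k=5, iters=50, seed=0):
    """K-means with k-means++ init over unit-normalized vectors."""
    rng = random.Random(seed)
    X = [normalize(v) for v in vectors]
    d2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    # k-means++: first centroid uniform, rest weighted by squared distance
    C = [list(rng.choice(X))]
    while len(C) < k:
        w = [min(d2(x, c) for c in C) for x in X]
        C.append(list(rng.choices(X, weights=w)[0]))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: d2(x, C[j])) for x in X]
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                C[j] = normalize(mean)  # re-project centroid onto unit sphere
    return labels
```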
Meta-Pattern Detection
Twice daily (8am, 8pm UTC)

A cross-beat awareness layer detects system-level patterns that only emerge when you zoom out from individual beats. Five detectors run in parallel:
- Sentiment convergence — 70%+ of beats shifting the same direction
- Platform anomalies — a source surging 2.5x+ or going quiet (under 30% of normal)
- Aggregate volume trends — total discourse 2x+ or under 40% of the weekly average
- Milestones — system-level round numbers (e.g., 100k records processed)
- Cross-topic correlation — the same signal type firing across 3+ beats within 12 hours
Meta-observations are injected as additional context into narrative generation, giving the AI writer awareness of the broader landscape when writing about individual beats.
Narrative Generation
Every 6 hrs (8am–10pm UTC)

Claude (claude-sonnet-4-6) generates all editorial content via structured output. Each narrative type receives different context depth and editorial instructions.
| Type | Scope | Context | Length |
|---|---|---|---|
| Lead Story | Cross-topic | All signals + top 3 topics' anchor posts | 3-5 paragraphs |
| Secondary Story | Per-topic | 15 discourse samples + signals + stats | 3-4 paragraphs |
| Beat Narrative | Per-topic | 30 discourse samples + full signal context | 4-6 paragraphs |
| Dispatch | Per-signal | Signal + 5 discourse samples | 1-2 sentences |
| Entity Narrative | Per-entity | Cross-beat entity mentions + sentiment | 3-5 paragraphs |
Dispatch Generation
3x daily (9:30, 15:30, 21:30 UTC)

Dispatches are short wire-style observations — 1-2 sentences about a specific signal. Only editorially worthy signals trigger a dispatch: low-severity signals are filtered out, entity surges on generic terms (like "AI" dominating AI discourse) are rejected as tautological, and platform divergence is demoted to beat narrative territory.
If Claude determines the signal only supports a statistical pattern with no specific event behind it, it returns "SKIP" and no dispatch is published.
Daily Snapshots
After each narrative cycle

Once narrative generation completes, the system computes and upserts one snapshot per active beat per day. Each snapshot materializes volume, sentiment distribution, platform mix, top entities, and signal count — replacing expensive time-bucketed scans of raw discourse for historical queries and sparkline charts.
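The upsert pattern can be sketched with an in-memory dict standing in for the datastore; keying on (beat, day) means a rerun in the same cycle overwrites rather than duplicates. Record field names are illustrative:

```python
from collections import Counter
from datetime import date

snapshots: dict[tuple[str, date], dict] = {}

def upsert_snapshot(beat: str, day: date, records: list[dict]) -> dict:
    """Materialize one snapshot per beat per day from raw records."""
    snap = {
        "volume": len(records),
        "sentiment_mix": dict(Counter(r["sentiment"] for r in records)),
        "platform_mix": dict(Counter(r["platform"] for r in records)),
        "top_entities": [e for e, _ in Counter(
            ent for r in records for ent in r["entities"]).most_common(5)],
    }
    snapshots[(beat, day)] = snap  # upsert: same key replaces the prior run
    return snap
```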
Publishing & Social
Every 30 min

Content is published to the site immediately via ISR revalidation. Social distribution to Bluesky uses a saliency-based curator that mimics a human editor — selecting the single best piece to share at each posting window based on a weighted score:
Saliency Formula
saliency = signalSeverity * 0.45 + volumeMagnitude * 0.25 + crossTopicBonus * 0.15 + recencyDecay * 0.15

Breaking (saliency >= 0.85): bypasses all cadence, 15-min minimum gap
Elevated (saliency >= 0.70): overrides daily budget, respects type gaps
Normal: respects all cadence rules (45-min gap, 4-6 posts/day, 11-15/week)
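The formula and the three tiers above translate directly into code. This sketch assumes all four inputs are pre-normalized to the 0–1 range:

```python
def saliency(signal_severity: float, volume_magnitude: float,
             cross_topic_bonus: float, recency_decay: float) -> float:
    """Weighted saliency score; each input is assumed normalized to 0..1."""
    return (signal_severity * 0.45 + volume_magnitude * 0.25
            + cross_topic_bonus * 0.15 + recency_decay * 0.15)

def tier(score: float) -> str:
    """Map a saliency score to its posting tier."""
    if score >= 0.85:
        return "breaking"  # bypasses all cadence, 15-min minimum gap
    if score >= 0.70:
        return "elevated"  # overrides daily budget, respects type gaps
    return "normal"        # 45-min gap, 4-6 posts/day, 11-15/week
```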
Editorial Principles
The system prompts that govern narrative generation encode specific editorial rules. These aren't suggestions — they're hard constraints that Claude must follow:
- Narrative leads, data supports. Data never announces itself. Numbers appear woven into prose where they support a claim. "Volume increased 340%" is forbidden; "the conversation tripled after the Senate hearing" is required.
- Anchor in specifics. Every story must reference specific posts, subreddits, or community dynamics. If you can remove all community names and the piece still reads the same, the grounding is decorative.
- No dashboards in prose. Platform-by-platform sentiment tours are banned. Stories organize by insight, not by source.
- Commit to claims. No hedging with "it remains to be seen" or "only time will tell." If the data supports a claim, the AI states it.
- No manufactured urgency. "SURGE DETECTED" framing is banned. Urgency is calibrated to magnitude.
What This System Does Not Do
- It does not access private messages, locked accounts, or content behind authentication walls.
- It does not predict outcomes or argue that AI is good or bad.
- It does not use human editors to write, review, or approve content before publication.
- It does not track individual users across platforms or store personally identifiable information.
- It does not use user-submitted content — all source material is publicly available discourse.