Overview
In plain English
AIDRAN has seven scheduled ingestion sources, plus public Exa, Webset, and Hugging Face article records used for enrichment, watchlists, and imports. We do not access private messages, locked accounts, or paywalled article bodies.
AIDRAN’s scheduled ingestion layer currently has seven upstream adapters: arXiv, Bluesky, Hacker News, Google News, Reddit, Twitter/X, and YouTube. Those workers write source configuration rows and public record rows. The corpus also recognizes three article-category source kinds: Exa for public web enrichment, Websets for curated article imports, and Hugging Face for public AI repository watchlists. Records from these sources preserve the actual publisher domain for attribution when one is present. Analysis, signal detection, story generation, and the web app read from that corpus through the Delivery API.
The system is built around public evidence. It does not read private messages, locked accounts, authentication-only pages, or paywalled article bodies. When a public source exposes author handles, bylines, timestamps, links, or engagement metrics, those fields may be stored so AIDRAN can attribute and weigh the record.
Display Buckets
In plain English
The website may show article records as News. That is a display bucket, not a hidden source list.
Source kind and public display label are not always the same thing. The database keeps the upstream source kind on each record, while the web app groups article-category records under a reader-facing News label where that is clearer than naming a discovery or enrichment provider.
Google News, one of the seven scheduled ingestion sources, is an article-category source. arXiv, Exa, and Websets are article-category sources as well. Hugging Face is article-category too, but it keeps a distinct Hugging Face label because readers need to distinguish model, paper, and dataset watchlist records from generic article discovery. External articles surfaced during story enrichment may also appear in the same News bucket, with the actual publisher domain shown when available. News is therefore a presentation bucket for public article records and web citations, not a separate private feed.
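The split between stored source kind and display label can be sketched as a small lookup. This is an illustrative sketch only: the source-kind identifiers and the function name are hypothetical, and it assumes every article-category kind other than Hugging Face is grouped under News.

```python
# Hypothetical source-kind identifiers; the values stored by AIDRAN may differ.
ARTICLE_KINDS = {"google_news", "arxiv", "exa", "websets", "huggingface"}

# Article-category kinds that keep their own reader-facing label.
DISTINCT_LABELS = {"huggingface": "Hugging Face"}


def display_label(source_kind: str) -> str:
    """Map a stored source kind to a reader-facing label."""
    if source_kind in DISTINCT_LABELS:
        return DISTINCT_LABELS[source_kind]
    if source_kind in ARTICLE_KINDS:
        return "News"
    # Labels for non-article kinds are not described here, so this sketch
    # simply returns the source kind unchanged.
    return source_kind


for kind in ("google_news", "exa", "huggingface", "reddit"):
    print(kind, "->", display_label(kind))
```

The stored record keeps its original source kind either way; only the label shown to readers changes.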
Cadence And Status
In plain English
Ingestion is run by source-specific Render Workflow tasks. Cadence and enabled status are operational settings, not public promises.
Each scheduled ingestion source has its own Render Workflow task and deployment cron. Cadence, limits, and whether a credentialed source is enabled can change as upstream APIs, quotas, and reliability change. Enrichment article records are created by story-enrichment work or curated imports, not by a public scrape schedule. Delivery exposes source rows with enabled status and recent record volume; public stories and citations are generated from the records and citation links actually present in the corpus.
Reddit
In plain English
Public subreddit discussions provide structured community-level AI discourse.
Reddit records come from public AI-related subreddit listings. The ingestion worker iterates a maintained subreddit set, skips removed or deleted text, and stores public post fields such as title, text when present, URL, author handle, subreddit, score, comment count, flair, and permalink.
- Record category: Discourse
- Record type: Public posts from AI-related subreddits
- Context stored: Subreddit, link metadata, and public engagement fields
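The filtering and field selection described above might look roughly like the following. It is a minimal sketch, assuming that a post whose body is a removed or deleted marker is dropped entirely. The input keys follow Reddit's public listing JSON; the output field names and the function name are hypothetical.

```python
from typing import Optional

# Body placeholders Reddit shows for removed or deleted text.
REMOVED_MARKERS = {"[removed]", "[deleted]"}


def normalize_reddit_post(post: dict) -> Optional[dict]:
    """Return a storable record for one public post, or None to skip it."""
    text = (post.get("selftext") or "").strip()
    if text in REMOVED_MARKERS:
        return None
    permalink = post.get("permalink")  # a site-relative path in listing JSON
    return {
        "title": post.get("title"),
        "text": text or None,  # link posts have no body text
        "url": post.get("url"),
        "author": post.get("author"),
        "subreddit": post.get("subreddit"),
        "score": post.get("score", 0),
        "comment_count": post.get("num_comments", 0),
        "flair": post.get("link_flair_text"),
        "permalink": f"https://www.reddit.com{permalink}" if permalink else None,
    }
```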
Bluesky
In plain English
Public Bluesky search results capture AT Protocol posts about AI.
Bluesky records come from public AT Protocol search results. The adapter searches for AI-related language, paginates through public posts, and stores text, author handle or DID, URL, language, and public reply, repost, and like counts.
- Record category: Discourse
- Record type: Public posts
- Context stored: Handles, language, and public engagement fields
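As a rough illustration of the pagination and field mapping above, the sketch below walks cursor-based result pages and flattens one post view. The input keys follow the public `app.bsky.feed.searchPosts` response shape; `fetch_page`, the page cap, and the output field names are assumptions made for this example.

```python
from typing import Callable, Iterator, Optional


def iter_posts(fetch_page: Callable[[Optional[str]], dict],
               max_pages: int = 5) -> Iterator[dict]:
    """Yield posts page by page, following the cursor until it runs out."""
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page.get("posts", [])
        cursor = page.get("cursor")
        if not cursor:
            break


def normalize_bluesky_post(post: dict) -> dict:
    """Flatten one public post view into a storable record."""
    author = post.get("author", {})
    record = post.get("record", {})
    identity = author.get("handle") or author.get("did")  # handle, else DID
    rkey = post["uri"].rsplit("/", 1)[-1]  # last segment of the at:// URI
    langs = record.get("langs") or []
    return {
        "text": record.get("text"),
        "author": identity,
        "url": f"https://bsky.app/profile/{identity}/post/{rkey}",
        "language": langs[0] if langs else None,
        "reply_count": post.get("replyCount", 0),
        "repost_count": post.get("repostCount", 0),
        "like_count": post.get("likeCount", 0),
    }
```

Passing the page fetcher in as a callable keeps the loop independent of any particular HTTP client.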
Hacker News
In plain English
Hacker News provides public technical-community discussion through its read-only API.
Hacker News records come from the public Firebase API. The current worker reads the public top-stories feed, fetches item details, filters dead or deleted items, and stores title, text when present, URL, author, score, descendant count, and item metadata.
- Record category: Discourse
- Record type: Public story items
- Context stored: Points, descendant counts, item type, and public links
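The read path described above can be sketched with the standard library alone. The endpoint paths and item keys are those of the public Hacker News Firebase API; the function names, output field names, and the default limit are hypothetical, and the actual worker may differ.

```python
import json
from typing import Optional
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path: str):
    """Fetch one read-only JSON resource from the public API."""
    with urlopen(f"{HN_API}/{path}.json", timeout=10) as resp:
        return json.load(resp)


def normalize_hn_item(item: Optional[dict]) -> Optional[dict]:
    """Return a storable record, or None for missing, dead, or deleted items."""
    if not item or item.get("dead") or item.get("deleted"):
        return None
    return {
        "id": item["id"],
        "type": item.get("type"),
        "title": item.get("title"),
        "text": item.get("text"),  # present only on text posts
        "url": item.get("url"),
        "author": item.get("by"),
        "score": item.get("score", 0),
        "descendants": item.get("descendants", 0),
        "discussion_url": f"https://news.ycombinator.com/item?id={item['id']}",
    }


def top_story_records(limit: int = 30) -> list:
    """Read the top-stories feed, fetch each item, and drop filtered ones."""
    ids = fetch_json("topstories")[:limit]
    items = (fetch_json(f"item/{item_id}") for item_id in ids)
    return [rec for rec in map(normalize_hn_item, items) if rec is not None]
```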
Google News
In plain English
Google News is a public RSS discovery source for AI-related articles.
Google News records come from public RSS search results. AIDRAN stores article titles, descriptions from the feed, publisher names when present, publication times, and the stable Google News redirect URL. It does not bypass publisher paywalls or claim to store the full article body from the linked site.
- Record category: Article
- Record type: Public RSS article entries
- Context stored: Publisher, description, publication time, and source URL
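A feed entry of this kind can be read with a standard XML parser. The sketch below assumes ordinary RSS 2.0 items with `title`, `link`, `pubDate`, `description`, and `source` elements; the sample feed, the function name, and the output field names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Invented sample entry in the shape of an RSS 2.0 search feed.
SAMPLE_FEED = """<rss version="2.0"><channel><item>
<title>Example headline</title>
<link>https://news.google.com/rss/articles/EXAMPLE</link>
<pubDate>Mon, 06 Jan 2025 12:00:00 GMT</pubDate>
<description>Short description from the feed.</description>
<source url="https://publisher.example">Example Publisher</source>
</item></channel></rss>"""


def parse_news_rss(xml_text: str) -> list:
    """Extract feed-level article metadata; linked pages are never fetched."""
    records = []
    for item in ET.fromstring(xml_text).iter("item"):
        source = item.find("source")
        published = item.findtext("pubDate")
        records.append({
            "title": item.findtext("title"),
            "description": item.findtext("description"),
            "publisher": source.text if source is not None else None,
            "published_at": (parsedate_to_datetime(published).isoformat()
                             if published else None),
            "url": item.findtext("link"),  # redirect URL from the feed
        })
    return records


records = parse_news_rss(SAMPLE_FEED)
```

Only fields present in the feed are read, which matches the statement that the full article body is not stored.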
YouTube
In plain English
YouTube records capture public video metadata and engagement statistics.
YouTube records come from the YouTube Data API. The worker searches recent public videos about AI and stores the video title, description, channel information, publication time, thumbnail URL, and public view, like, and comment counts, plus tags when the API provides them.
- Record category: Discourse
- Record type: Public video metadata
- Context stored: Channel, description, thumbnail, available tags, and public metrics
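The stored fields map onto a YouTube Data API video resource roughly as follows. This sketch assumes the item is a `videos.list` resource with `snippet` and `statistics` parts (where `id` is a plain string); the function names and output field names are hypothetical.

```python
from typing import Optional


def _count(stats: dict, key: str) -> Optional[int]:
    """The API returns counts as strings and omits ones that are hidden."""
    value = stats.get(key)
    return int(value) if value is not None else None


def normalize_youtube_video(item: dict) -> dict:
    """Flatten one video resource into a storable metadata record."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    thumbnails = snippet.get("thumbnails", {})
    thumbnail = thumbnails.get("high") or thumbnails.get("default") or {}
    video_id = item["id"]
    return {
        "video_id": video_id,
        "url": f"https://www.youtube.com/watch?v={video_id}",
        "title": snippet.get("title"),
        "description": snippet.get("description"),
        "channel_id": snippet.get("channelId"),
        "channel_title": snippet.get("channelTitle"),
        "published_at": snippet.get("publishedAt"),
        "thumbnail_url": thumbnail.get("url"),
        "tags": snippet.get("tags", []),  # absent when the video has none
        "view_count": _count(stats, "viewCount"),
        "like_count": _count(stats, "likeCount"),
        "comment_count": _count(stats, "commentCount"),
    }
```

Returning `None` for a missing count keeps "hidden" distinct from "zero".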
arXiv
In plain English
arXiv provides public research preprints in AI-relevant categories.
arXiv records come from the public Atom API. AIDRAN searches AI-relevant categories, including cs.AI, cs.LG, cs.CL, and stat.ML, and stores paper title, abstract, authors, categories, publication time, and canonical arXiv URL.
- Record category: Article
- Record type: Public preprint metadata and abstracts
- Context stored: Authors, categories, abstract, and canonical URL
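A category search against the public arXiv API can be expressed as a single query URL. The sketch below uses the documented `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters; the category list holds only the four categories named above, and the sort order, page size, and function name are assumptions.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

# Only the categories named in this document; the real set may be larger.
CATEGORIES = ["cs.AI", "cs.LG", "cs.CL", "stat.ML"]


def build_arxiv_query_url(categories=CATEGORIES, max_results: int = 50,
                          start: int = 0) -> str:
    """Build an Atom API URL that lists the newest papers in the categories."""
    search_query = " OR ".join(f"cat:{category}" for category in categories)
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",  # assumed: newest submissions first
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


print(build_arxiv_query_url(max_results=5))
```

The response is an Atom feed whose entries carry the title, abstract, authors, categories, publication time, and canonical URL.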
X (Twitter)
In plain English
Twitter/X captures public recent-search posts when API access is configured.
Twitter/X records come from the API v2 recent-search endpoint when a bearer token is configured. The adapter searches public English-language AI posts, excludes retweets and replies in its query, and stores text, author ID, public URL, publication time, language, public metrics, and expanded URLs.
- Record category: Discourse
- Record type: Public recent-search posts
- Context stored: Public metrics, language, linked URLs, and tweet URL
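The query described above can be sketched as a request builder that returns nothing when no token is configured. The `lang:en`, `-is:retweet`, and `-is:reply` operators and the `tweet.fields` names are standard API v2 features; the search terms, function name, and return shape are hypothetical.

```python
from typing import Optional

RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_recent_search_request(terms: list, bearer_token: Optional[str],
                                max_results: int = 100) -> Optional[dict]:
    """Describe one recent-search request, or None when the source is off."""
    if not bearer_token:
        return None  # no credential configured, so the adapter does not run
    # Quote multi-word terms so they match as phrases.
    keywords = " OR ".join(f'"{t}"' if " " in t else t for t in terms)
    query = f"({keywords}) lang:en -is:retweet -is:reply"
    return {
        "url": RECENT_SEARCH_URL,
        "headers": {"Authorization": f"Bearer {bearer_token}"},
        "params": {
            "query": query,
            # The endpoint accepts between 10 and 100 results per page.
            "max_results": max(10, min(max_results, 100)),
            "tweet.fields": "author_id,created_at,lang,public_metrics,entities",
        },
    }
```

Expanded URLs come back under `entities.urls` in each result, which is why `entities` is requested.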
Exa and Websets
In plain English
Public web articles can supplement story context. The site displays them as News with publisher or domain attribution when available.
Story enrichment can use public web article results from Exa and curated Webset article imports when a story needs outside article context. Live Exa results may be stored as article records, and curated Webset entries can be imported as article records. Some cited web sources appear only as external citation links attached to a story rather than as scheduled ingestion rows.
- Record category: Article
- Record type: Public web article results and curated public article entries
- Context stored: Publisher or domain, title, snippet or excerpt, URL, publication date when available, and provider metadata
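Publisher or domain attribution can be derived from the article URL when the provider supplies no publisher name. This is a minimal sketch under that assumption; the function names and record keys are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def publisher_domain(url: str) -> Optional[str]:
    """Return the lowercased host of an article URL without a leading www."""
    host = urlsplit(url).hostname
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def attribution(record: dict) -> Optional[str]:
    """Prefer an explicit publisher name, then fall back to the URL's domain."""
    return record.get("publisher") or publisher_domain(record.get("url", ""))


print(attribution({"url": "https://www.example.com/story"}))
```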
Hugging Face
In plain English
Hugging Face records track public AI model, paper, and dataset pages without treating them as generic News.
Hugging Face records come from public watchlist targets on huggingface.co, such as model, paper, organization, and dataset pages relevant to AI discourse. AIDRAN stores public page metadata, titles, URLs, timestamps when available, and provider metadata needed for attribution.
- Record category: Article
- Record type: Public AI repository and research watchlist items
- Context stored: Title, URL, public page metadata, and provider metadata
What We Don't Collect
In plain English
No private messages, locked accounts, or paywalled article bodies. Public attribution fields may be stored when the source provides them.
- Private messages, DMs, or non-public account content
- Content behind authentication walls that the public cannot access
- Paywalled article bodies or paywall bypasses
- Reader behavior profiles or user-submitted private material
- Content from locked or private accounts
- Private identity enrichment beyond public handles, bylines, and source attribution fields
Content Removal
In plain English
If your public post or article metadata appears in AIDRAN and you want it removed, send us the original URL or source identifier.
If you are the author or rights holder for content that appears in AIDRAN and would like it removed, please contact us at privacy@aidran.ai with a link to the original content or enough source information for us to identify the record. We will process removal requests within 30 days.