The pipeline

How PULSE turns raw worldwide noise into a single reasoned answer.

PULSE is not a chatbot bolted onto a news feed. It's a brain built from five stages: ingestion, enrichment, hybrid retrieval, two-stage reasoning, and a knowledge graph. The first two run continuously in the background; the last three run on every question, so each answer is built against a fresh, cross-domain map of the world.

01

Ingestion — listening to the whole world, not one beat

Hundreds of RSS sources across politics, economy, geopolitics, sports, entertainment, science, tech, weather, health and culture. A scheduled job pulls them on rotation.

Each new article gets a normalised record: title, URL, source, publish time, raw content. We deliberately keep ALL categories on. A drought, a football match, a celebrity scandal and a chip embargo can all bend the same outcome — siloed feeds miss those couplings.

  • Per-source rate limits, dedup on URL, and a shared cluster id for articles covering the same story.
  • Stored in the worldwide data table — the only ground truth downstream stages may cite.
  • Recency wins ties: every signal carries its published_at for later decay.
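A minimal Python sketch of the normalised record and URL dedup described above. The field names, the choice to strip utm_* tracking parameters, fragments and trailing slashes before comparing, and the in-memory seen_urls set are all assumptions for illustration; a real deployment would presumably check against the worldwide data table instead. Story clustering is left out: cluster_id is only a slot to fill later.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


@dataclass
class Article:
    """Normalised record for one ingested item (field names are illustrative)."""
    title: str
    url: str
    source: str
    published_at: datetime            # kept for recency decay at retrieval time
    raw_content: str
    cluster_id: Optional[str] = None  # assigned later, once the story is clustered


def canonical_url(url: str) -> str:
    """Lower-case scheme and host; drop utm_* params, the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")])
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def dedupe(batch: list, seen_urls: set) -> list:
    """Return only articles whose canonical URL has not been stored before."""
    fresh = []
    for article in batch:
        key = canonical_url(article.url)
        if key in seen_urls:
            continue  # duplicate of something already ingested
        seen_urls.add(key)
        fresh.append(article)
    return fresh
```

Normalising before comparing means the same story shared with and without tracking parameters is stored once.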

02

Enrichment — every signal becomes machine-thinkable

A Lovable AI pass extracts structure from each article so retrieval and reasoning can work on facts, not on prose.

For each article we produce: a tight AI title and summary, keywords, sentiment, urgency, an embedding vector, plus three structured layers — entities (people, orgs, places, products, with role), numbers (value, unit, kind, context), and claims (atomic statements with confidence).
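One way to pin down the enrichment output as typed records, sketched in Python. The class and field names, and the score ranges (sentiment from -1 to 1, urgency and claim confidence from 0 to 1), are assumptions; the text names the fields but not their scales. parse_enrichment is a hypothetical helper that rejects out-of-range model output before it reaches retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str   # e.g. "person", "org", "place", "product"
    role: str   # role in this story, e.g. "exporter", "regulator"


@dataclass
class NumberFact:
    value: float
    unit: str     # e.g. "USD", "%", "mm"
    kind: str     # e.g. "price", "rate", "count"
    context: str  # short phrase saying what the number measures


@dataclass
class Claim:
    text: str          # one atomic statement
    confidence: float  # assumed range 0.0 to 1.0


@dataclass
class Enrichment:
    ai_title: str
    summary: str
    keywords: list
    sentiment: float   # assumed range -1.0 (negative) to 1.0 (positive)
    urgency: float     # assumed range 0.0 to 1.0
    embedding: list
    entities: list = field(default_factory=list)
    numbers: list = field(default_factory=list)
    claims: list = field(default_factory=list)


def parse_enrichment(raw: dict) -> Enrichment:
    """Turn the model's JSON output into typed records, rejecting out-of-range scores."""
    if not -1.0 <= raw["sentiment"] <= 1.0:
        raise ValueError("sentiment out of range")
    if not 0.0 <= raw["urgency"] <= 1.0:
        raise ValueError("urgency out of range")
    claims = [Claim(**c) for c in raw.get("claims", [])]
    if any(not 0.0 <= c.confidence <= 1.0 for c in claims):
        raise ValueError("claim confidence out of range")
    return Enrichment(
        ai_title=raw["ai_title"],
        summary=raw["summary"],
        keywords=list(raw["keywords"]),
        sentiment=float(raw["sentiment"]),
        urgency=float(raw["urgency"]),
        embedding=[float(x) for x in raw["embedding"]],
        entities=[Entity(**e) for e in raw.get("entities", [])],
        numbers=[NumberFact(**n) for n in raw.get("numbers", [])],
        claims=claims,
    )
```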

  • Embedding → enables semantic search beyond keywords.
  • Entities + numbers + claims → enable cross-story joins (e.g. same company in two unrelated feeds).
  • Novelty score → boosts signals that are not already echoed by the rest of the feed.
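The entity join and novelty score above could look roughly like this. Novelty is assumed here to be one minus the closest cosine similarity to recently ingested embeddings, clamped to [0, 1]; the text does not define the formula. The entity index is a plain case-folded name lookup, enough to connect the same company across two unrelated feeds, though it ignores aliases.

```python
import math
from collections import defaultdict


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def novelty(embedding: list, recent_embeddings: list) -> float:
    """Assumed definition: 1 minus the closest match among recent articles, clamped to [0, 1]."""
    if not recent_embeddings:
        return 1.0  # nothing to echo yet
    best = max(cosine(embedding, other) for other in recent_embeddings)
    return max(0.0, min(1.0, 1.0 - best))


def entity_index(mentions: dict) -> dict:
    """Map a case-folded entity name to the ids of every article that mentions it."""
    index = defaultdict(set)
    for article_id, names in mentions.items():
        for name in names:
            index[name.casefold()].add(article_id)
    return dict(index)
```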

03

Hybrid retrieval — picking what matters for THIS question

When you ask a question, PULSE does not LLM-scan everything. It runs a four-channel retrieval, then fuses results with Reciprocal Rank Fusion.

The question is first expanded by a tiny Flash-lite call into reformulations + named entities. Four rankers then race in parallel:

  • Vector — semantic match against the question and its reformulations.
  • Keyword — token overlap on titles, summaries, keywords.
  • Entity — direct hits on extracted entity names.
  • Diversity — at least 1–2 picks per category so other domains are never starved.
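A sketch of the four rankers in Python, each returning point ids best first. It assumes the Flash-lite expansion has already produced embedding vectors for the question and its reformulations plus a list of entity names; Point, tokens, per_category and the scoring details are hypothetical. The diversity channel here simply re-walks another ranking and keeps the top picks from each category.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Point:
    id: str
    category: str
    text: str        # title + summary + keywords, concatenated
    entities: set    # lower-cased entity names from enrichment
    embedding: list  # enrichment embedding vector


def tokens(s: str) -> set:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_rank(points, query_vecs):
    """Best cosine match against the question or any reformulation (query_vecs must be non-empty)."""
    best = {p.id: max(cosine(p.embedding, q) for q in query_vecs) for p in points}
    return sorted(best, key=lambda pid: -best[pid])


def _rank_by_overlap(points, wanted, bag_of):
    """Rank by size of the overlap with `wanted`; points with no overlap are dropped."""
    scores = {p.id: len(wanted & bag_of(p)) for p in points}
    return [pid for pid in sorted(scores, key=lambda pid: -scores[pid]) if scores[pid] > 0]


def keyword_rank(points, question):
    return _rank_by_overlap(points, tokens(question), lambda p: tokens(p.text))


def entity_rank(points, names):
    return _rank_by_overlap(points, {n.lower() for n in names}, lambda p: p.entities)


def diversity_rank(ranked_ids, category_of, per_category=2):
    """Keep at most `per_category` ids from each category, in ranking order."""
    taken, picks = {}, []
    for pid in ranked_ids:
        cat = category_of[pid]
        if taken.get(cat, 0) < per_category:
            taken[cat] = taken.get(cat, 0) + 1
            picks.append(pid)
    return picks
```

Because each ranker is independent, the four calls can run in parallel before fusion.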

Results are merged with RRF, decayed by recency (14-day half-life), boosted by novelty, and collapsed by story cluster so the same event never crowds out the others. The output is ~22 worldwide data points, each tagged with the retrieval channels that surfaced it.
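The fusion step can be sketched as below. Reciprocal Rank Fusion scores each point as the sum of 1 / (k + rank) over every channel that returned it; k = 60 is a common default, not a value the text states. Recency decay multiplies by 0.5 ** (age_days / 14), so a 14-day-old point keeps half its weight. The (1 + novelty) boost is an assumed form, and the Candidate class is hypothetical. Cluster collapse keeps only the best-scoring point per cluster_id.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cluster_id: str
    age_days: float  # days since published_at
    novelty: float   # 0.0 = echoed everywhere, 1.0 = nothing similar in the feed


def fuse(rankings, candidates, k=60, half_life_days=14.0, limit=22):
    """RRF over all channels, then recency decay, novelty boost and per-cluster collapse.

    rankings:   channel name -> list of point ids, best first
    candidates: point id -> Candidate
    Returns [(point_id, [channels that surfaced it]), ...], best first.
    """
    scores, surfaced_by = {}, {}
    for channel, ranked_ids in rankings.items():
        for rank, pid in enumerate(ranked_ids, start=1):
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (k + rank)
            surfaced_by.setdefault(pid, []).append(channel)

    for pid in scores:
        c = candidates[pid]
        scores[pid] *= 0.5 ** (c.age_days / half_life_days)  # weight halves every half-life
        scores[pid] *= 1.0 + c.novelty                        # assumed multiplicative boost

    # Walk best-first; the first point seen for each cluster represents that story.
    best_per_cluster = {}
    for pid in sorted(scores, key=lambda p: -scores[p]):
        best_per_cluster.setdefault(candidates[pid].cluster_id, pid)

    winners = sorted(best_per_cluster.values(), key=lambda p: -scores[p])[:limit]
    return [(pid, surfaced_by[pid]) for pid in winners]
```

RRF only looks at ranks, so the channels never need comparable raw scores.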

04

Two-stage reasoning — first think, then write

A single prompt that does both retrieval-reading and prose-writing is brittle. PULSE splits them.

Stage A — Reasoner

Reads every data point. Classifies each as direct, indirect or speculative. Builds explicit causal chains (point → first-order effect → second-order effect → relevance). Produces 2–4 perspectives. Lists data gaps. Emits a knowledge graph of nodes and edges. The output is a single structured JSON object returned via tool-calling, with no prose.
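Because Stage A answers through tool-calling, its output can be checked mechanically before Stage B sees it. The sketch below assumes a JSON shape (classifications, chains with first_order, second_order and relevance steps, perspectives, data_gaps, graph) that mirrors the description above; the actual field names are not given in the text, and validate_reasoning is a hypothetical helper.

```python
KINDS = {"direct", "indirect", "speculative"}


def validate_reasoning(r: dict, n_points: int) -> list:
    """Return a list of problems in Stage A output; an empty list means it can go to Stage B."""
    problems = []
    for item in r.get("classifications", []):
        if not 1 <= item["point"] <= n_points:
            problems.append(f"classification cites unknown point #{item['point']}")
        if item["kind"] not in KINDS:
            problems.append(f"point #{item['point']} has invalid kind {item['kind']!r}")
    for chain in r.get("chains", []):
        if not 1 <= chain["point"] <= n_points:
            problems.append(f"chain starts from unknown point #{chain['point']}")
        # point -> first-order effect -> second-order effect -> relevance
        for step in ("first_order", "second_order", "relevance"):
            if not chain.get(step):
                problems.append(f"chain from #{chain['point']} is missing {step}")
    if not 2 <= len(r.get("perspectives", [])) <= 4:
        problems.append("expected 2 to 4 perspectives")
    if "data_gaps" not in r:
        problems.append("data_gaps list is missing")
    if not r.get("graph", {}).get("nodes"):
        problems.append("knowledge graph has no nodes")
    return problems
```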

Stage B — Writer

Receives the reasoning JSON and writes the briefing in plain English. Cites with [#n]. Never invents sources or chains beyond what Stage A produced. Stays calm and observational — maps reality, never recommends. Ends with the knowledge graph as a fenced block.
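Two Stage B guarantees are cheap to verify after the fact: every [#n] must point at a data point Stage A reasoned about, and the briefing must end with the graph block. This sketch assumes the graph is JSON inside the final fence; the helper names are hypothetical, and FENCE is built from single backticks so the example itself stays one code block.

```python
import json
import re

CITATION = re.compile(r"\[#(\d+)\]")
FENCE = "`" * 3  # three backticks, assembled so this sketch contains no literal fence


def invented_citations(briefing: str, stage_a_points: set) -> set:
    """Citation numbers the writer used that Stage A never reasoned about."""
    cited = {int(n) for n in CITATION.findall(briefing)}
    return cited - stage_a_points


def trailing_graph(briefing: str) -> dict:
    """Parse the knowledge graph the writer appends as the final fenced block."""
    body = briefing.rstrip()
    if not body.endswith(FENCE):
        raise ValueError("briefing does not end with a fenced block")
    start = body.rfind(FENCE, 0, len(body) - 3)  # opening fence of that last block
    if start == -1:
        raise ValueError("closing fence has no opening fence")
    block = body[start + 3 : len(body) - 3]
    _, _, payload = block.partition("\n")  # drop an optional info string such as "json"
    return json.loads(payload)
```

A non-empty result from invented_citations is a signal to regenerate the briefing rather than ship it.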

05

Knowledge graph — the reasoning, made visible

Every answer ships with a small graph: the entities, concepts and effects the reasoner actually connected to arrive at this response.

Click a [#n] citation or a node and the original data point opens in a side panel — title, source, summary, entities, numbers, claims, link to the origin. The graph is not decoration; it is a hand-off so you can audit the chain and decide for yourself.

  • Nodes are typed: question, entity, data_point, concept, effect, conclusion.
  • Edges are labelled with the relation (e.g. 'pressures', 'displaces', 'echoes').
  • The sources panel groups data points by story cluster, so you can see how many independent sources back each lead.
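A sketch of the graph contract and the per-cluster source grouping. The six node types and the idea of labelled edges come from the list above; the Node and Edge classes, the check_graph helper and the (point_id, cluster_id, source) tuples are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

NODE_TYPES = {"question", "entity", "data_point", "concept", "effect", "conclusion"}


@dataclass
class Node:
    id: str
    type: str
    label: str


@dataclass
class Edge:
    source: str
    target: str
    relation: str  # e.g. "pressures", "displaces", "echoes"


def check_graph(nodes: list, edges: list) -> list:
    """Flag unknown node types, dangling edges and unlabelled edges."""
    problems = [f"{n.id}: unknown type {n.type!r}" for n in nodes if n.type not in NODE_TYPES]
    ids = {n.id for n in nodes}
    for e in edges:
        if e.source not in ids or e.target not in ids:
            problems.append(f"edge {e.source}->{e.target} points at a missing node")
        if not e.relation:
            problems.append(f"edge {e.source}->{e.target} has no relation label")
    return problems


def sources_by_cluster(points: list) -> dict:
    """Group (point_id, cluster_id, source) triples into distinct sources per cluster."""
    groups = defaultdict(set)
    for _, cluster_id, source in points:
        groups[cluster_id].add(source)
    return {cid: sorted(srcs) for cid, srcs in groups.items()}
```

The length of each cluster's source list is the "how many independent sources" figure the panel would show.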

Design principles

What PULSE is — and isn't.

Observational, not prescriptive

PULSE maps reality. It never tells you what to do. The language is "the data points suggest", "one signal indicates" — you decide.

Cross-domain by default

The world is coupled. A category filter at retrieval time is a feature, not a bug — but PULSE always reserves slots for the "wrong" domain to surface lateral links.

Grounded in cited data points

No claim without a [#n]. Speculative chains are allowed only when labelled speculative and built on top of real cited signals.
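The "no claim without a [#n]" rule can be linted with a naive sentence split, as sketched here. The regex-based splitter is an assumption and will misfire on abbreviations such as "U.S."; treat it as a first-pass filter, not a guarantee of grounding. Both helper names are hypothetical.

```python
import re

CITATION = re.compile(r"\[#\d+\]")
# Naive split: break after ., ! or ? when followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def ungrounded_sentences(text: str) -> list:
    """Sentences that carry no [#n] citation."""
    sentences = [s.strip() for s in SENTENCE_BREAK.split(text) if s.strip()]
    return [s for s in sentences if not CITATION.search(s)]


def unsupported_speculation(text: str) -> list:
    """Sentences labelled speculative that do not cite the signal they build on."""
    return [s for s in ungrounded_sentences(text) if "speculative" in s.lower()]
```

Run on "Exports fell 12% [#1]. Prices may follow.", ungrounded_sentences flags only the second sentence.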

Auditable by design

The graph, the chain classification (direct / indirect / speculative), and the side-panel source view exist so you can reconstruct exactly how the answer was built.

Ask the brain.

Open a research thread and watch the pipeline run end-to-end on your question.