Changelog

Every PR merged by human or robot. Full transparency on what changed and why.

30 entries / generated May 8, 2026

Apr 19, 2026 18:51 · Human
#199 Per-day lead theme + summary (regenerates per export)

Added `generate_day_summary()` function (mirroring `generate_week_theme`) that uses Haiku to generate JSON with `dayTheme` and `daySummary` per digest. The digest loop detects the latest day and regenerates both fields on each export; prior days reuse cached values from the previous JSON.

The weekly digest hero needs a headline that reflects the current day's top-story selection and refreshes hourly, but the existing week-scoped `weekTheme`/`weekSummary` are cached and don't change between exports.
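A minimal sketch of the regenerate-latest/reuse-cached pattern this describes (function and field names are hypothetical stand-ins for the real `generate_day_summary` integration):

```python
def merge_day_fields(digests, cached, generate_day_summary):
    """Regenerate dayTheme/daySummary for the latest day only; earlier
    days reuse the values cached from the previously exported JSON."""
    latest = max(d["date"] for d in digests)
    for d in digests:
        if d["date"] == latest or d["date"] not in cached:
            d.update(generate_day_summary(d))  # fresh Haiku call
        else:
            d["dayTheme"] = cached[d["date"]]["dayTheme"]
            d["daySummary"] = cached[d["date"]]["daySummary"]
    return digests
```

Only the latest day ever triggers a model call, so hourly re-exports stay cheap while still refreshing the hero headline.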

Apr 10, 2026 22:04 · Evolve
#198 feat: derive tweet hashtags from glossary entities

Added hashtag derivation logic that queries prediction-linked entities and top global entities (10+ mentions), converts them to cased hashtags (e.g., #OpenAI), and passes the candidates to the tweet prompt so Haiku selects the 2–4 most relevant ones.

Tweets lack relevant, consistent hashtags. By leveraging the entity glossary (already extracted during scoring), the system can automatically suggest contextually appropriate hashtags while preserving brand identity.
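The candidate-building step might look like this sketch (the casing rule and helper names are assumptions, not the actual implementation):

```python
import re

def to_hashtag(entity: str) -> str:
    """Collapse an entity name like 'Open AI' into a cased hashtag '#OpenAI'."""
    words = re.split(r"[\s\-_]+", entity.strip())
    return "#" + "".join(w if w[:1].isupper() else w.capitalize() for w in words)

def hashtag_candidates(prediction_entities, global_counts, min_mentions=10):
    """Union of prediction-linked entities and entities with 10+ global
    mentions, deduplicated case-insensitively, order preserved."""
    pool = list(prediction_entities) + [
        e for e, n in global_counts.items() if n >= min_mentions
    ]
    seen, out = set(), []
    for e in pool:
        tag = to_hashtag(e)
        if tag.lower() not in seen:
            seen.add(tag.lower())
            out.append(tag)
    return out
```

The model then only picks from this vetted pool, so hashtags stay grounded in entities the scorer actually extracted.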

Apr 10, 2026 21:34 · Evolve
#197 feat: enrich predictions export with signals, metrics and all fields

Added a LEFT JOIN with `prediction_metrics_snapshot` to include topic velocity and source metadata, populated `signals[]` with evidence stories via a separate query, and exposed 7 previously unexported columns (evaluatedAt, targetDate, evalHorizon, tweet, tweetReply, postedToXAt)—all additive, no breaking changes.

The predictions export was missing contextual data (topic velocity, source signals, evaluation timestamps) that would help the frontend and downstream consumers understand the grounding and temporal context of each prediction.
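The additive property of the LEFT JOIN is the key detail: predictions without a metrics snapshot still export, just with null metrics. A self-contained illustration against a hypothetical minimal schema (the real tables have more columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE predictions (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE prediction_metrics_snapshot (
    prediction_id INTEGER, topic_velocity REAL, source_count INTEGER);
INSERT INTO predictions VALUES (1, 'p1'), (2, 'p2');
INSERT INTO prediction_metrics_snapshot VALUES (1, 3.2, 5);
""")

# LEFT JOIN keeps prediction 2 even though it has no snapshot row.
rows = conn.execute("""
    SELECT p.id, p.text, m.topic_velocity, m.source_count
    FROM predictions p
    LEFT JOIN prediction_metrics_snapshot m ON m.prediction_id = p.id
    ORDER BY p.id
""").fetchall()
```

An INNER JOIN here would silently drop snapshot-less predictions from the export, which is exactly the breaking change this PR avoids.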

Apr 10, 2026 16:42 · Human
#196 Emit tag/tier distribution and fix recent stories in push_stats

Added `get_tag_distribution()` (top 20 tags from last 30 days) and `get_tier_distribution()` (notable/essential split from last 7 days) functions to `push_stats.py`, and fixed `get_recent_scored_stories()` to include all digested stories and sort by `scored_at` instead of `fetched_at`. Hoisted `stories_per_hour_24h` and `backlog_trend_6h` to the top-level stats dict so the oracle-pi API receives them correctly.

The oracle-pi site needed insights into tag distribution, tier split, and recent scoring activity to populate the Data Quality dashboard. The `push_stats` pipeline was missing these metrics and had a bug where recent stories were incorrectly filtered and sorted, limiting visibility into what the system was scoring.

Apr 10, 2026 12:43 · Human
#195 Add per-source daily cap and set-config command

`fetch.py` now enforces a `daily_cap` config key per RSS source, counting stories fetched today and skipping the source once the cap is reached. `manage_sources.py` validates `daily_cap` and `max_per_fetch` as positive integers and adds a `set-config` subcommand for updating source configuration from the CLI.

Noisy RSS sources can overwhelm the pipeline by fetching excessive stories per day. This change enables per-source rate limiting to control ingestion volume and prevent resource waste.
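The cap check and the CLI validation reduce to two small predicates; this sketch assumes hypothetical helper names and a dict-shaped source config:

```python
def should_skip_source(source_config: dict, fetched_today: int) -> bool:
    """Skip a source once today's fetched-story count reaches its
    daily_cap. Sources without a daily_cap are never capped."""
    cap = source_config.get("daily_cap")
    return cap is not None and fetched_today >= cap

def validate_positive_int(name: str, value) -> int:
    """Reject zero, negatives, and non-numeric strings from set-config."""
    try:
        n = int(value)
    except (TypeError, ValueError):
        raise ValueError(f"{name} must be a positive integer")
    if n <= 0:
        raise ValueError(f"{name} must be a positive integer")
    return n
```

Keeping the cap optional means existing sources are unaffected until an operator opts them in.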

Apr 8, 2026 05:50 · Evolve
#182 feat: enrich GlossaryEntity export with computed DB fields

- Adds `subjectCount`, `sourceCount`, `recentMentionCount`, `peakRelevanceScore` via a single batch aggregation JOIN (no N+1)

Apr 8, 2026 05:22 · Evolve
#181 feat: add mentionsByDay to glossary export

- Adds `mentionsByDay` to the glossary JSON export — 30-day daily entity mention counts from `entity_mentions` + `stories` join

Apr 8, 2026 00:46 · Evolve
#180 fix: normalize common relevance score strings

Added normalization logic that converts common relevance score strings to their numeric equivalents, while preserving validation failure behavior for unrecognized values so they retry through existing error handling.

Claude's relevance score field was sometimes returning string values ("high"/"medium"/"low") instead of numeric scores, causing validation failures and blocking the pipeline.
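A sketch of the pass-through normalization described here (the exact word-to-number mapping is an assumption; the source only names the string values):

```python
# Hypothetical mapping of word scores onto the 1-5 relevance scale.
WORD_SCORES = {"low": 2, "medium": 3, "high": 4}

def normalize_relevance(value):
    """Map common string scores to ints. Unrecognized values pass
    through unchanged so existing validation can reject them and
    trigger the normal retry path."""
    if isinstance(value, str):
        return WORD_SCORES.get(value.strip().lower(), value)
    return value
```

Passing unknown strings through rather than raising keeps the failure behavior identical to before the fix.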

Apr 7, 2026 02:52 · Evolve
#174 feat: count delivered stories from digest ledger

Switched delivery counting from story status to the digest ledger, updated "processed today" logic to check fetch/digest timestamps instead, and added regression tests for cases where story status has drifted.

Story status fields can drift away from actual delivery state, leading to inaccurate delivery counts. The digest ledger provides authoritative tracking of what was actually delivered.

Apr 7, 2026 02:18 · Human
#173 Handle tuple-wrapped score outputs in process_one

Updated the `score_one` function's normalization logic to unwrap nested tuple wrappers from `score_story` outputs, and added regression test coverage for both direct and nested tuple-shaped outputs.

Score outputs were being wrapped in nested tuples, causing the normalization logic to fail during story scoring. This prevented the pipeline from handling certain score output formats.
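The unwrapping step can be sketched as a small loop (the real `score_one` normalization may handle more shapes; this covers the nested single-element wrappers the entry describes):

```python
def unwrap_score(result):
    """Unwrap arbitrarily nested single-element tuple/list wrappers
    around a score payload, e.g. ((score,),) -> score."""
    while isinstance(result, (tuple, list)) and len(result) == 1:
        result = result[0]
    return result
```

Looping rather than unwrapping once is what makes the doubly nested regression case pass.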

Apr 7, 2026 01:22 · Evolve
#167 fix: use timezone-aware prediction timestamps

Switched `evaluated_at` and `expiry` fields to timezone-aware UTC datetimes, updated prediction context-building to include same-day stories, and added regression tests for prediction evaluation and queue ordering.

Timezone-naive timestamps were causing incorrect prediction evaluation and expiry logic, leading to off-by-one errors in story context and missed predictions (issue #166).
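The core of the fix is replacing naive timestamps with aware UTC ones; a minimal sketch (helper names assumed):

```python
from datetime import datetime, timedelta, timezone

def utc_now() -> datetime:
    """Timezone-aware 'now'. Naive datetime.utcnow() values compare
    incorrectly against aware timestamps and shift day boundaries."""
    return datetime.now(timezone.utc)

def expiry_for(horizon_days: int) -> datetime:
    """Aware expiry timestamp for a prediction's evaluation horizon."""
    return utc_now() + timedelta(days=horizon_days)
```

Once both `evaluated_at` and `expiry` carry `tzinfo`, "same-day" story selection no longer drifts by the local UTC offset.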

Apr 6, 2026 17:20 · Evolve
#163 feat: confidence-aware deep predictions prompt

Added confidence levels (high/medium/moonshot) to past predictions shown to Opus and replaced flat deduplication logic with confidence-aware guidance—high-confidence predictions can be revisited and refined, while medium/moonshot and resolved predictions maintain the existing dedup rule.

Prior predictions were deduplicated uniformly, preventing follow-up insights on high-confidence topics even when new signals emerged. The change enables the deep predictions engine to reinforce or challenge high-confidence prior predictions based on fresh data.

Apr 6, 2026 11:24 · Human
#162 fix/entity backfill unknown first

Coerced missing score fields before validation to handle malformed data gracefully, tightened entity backfill prompts for better extraction accuracy, and updated deep predictions learnings based on operational feedback.

Stories with missing score fields were causing validation failures (#122), and entity backfill prompts needed to be more precise to handle unknown entities correctly.

Apr 6, 2026 11:25 · Human
#161 fix/deep predictions skill docs

The fix coerces missing score fields to default values before validation runs, preventing validation errors. Deep predictions skill docs were updated to clarify usage and implementation details.

Article validation was failing when score fields were missing (issue #122), and the deep predictions skill documentation needed updating to reflect current learnings and best practices.

Apr 6, 2026 02:59 · Evolve
#156 fix: coerce missing score fields before validation

Added field coercion with defaults before validation runs, so stories with partial Claude responses now score successfully as long as the essential fields are populated.

Claude responses for story scoring sometimes omitted optional fields (summary_long, summary_short, entities), causing validation to fail even when core scoring data was present.

Apr 6, 2026 02:54 · Evolve
#155 fix: keep post_to_x dry-run read-only

Refactored the dry-run path to be purely read-only, eliminating Claude calls and DB writes during preview, and added regression tests to prevent future violations.

Dry-run mode was making Claude API calls and writing to the database despite being intended as a preview-only operation, wasting API quota and creating unintended side effects.

Apr 6, 2026 02:18 · Evolve
#154 fix: handle tuple-wrapped filter output

Hardened `process_one` filter handling to normalize tuple-wrapped outputs, matching the existing score-path normalization, and added a regression test to verify stories advance cleanly through tuple-wrapped filter results.

The `process_one` filter path was not handling tuple-wrapped return values, while the score path already had normalization for this case. This inconsistency caused stories to fail advancement when filters returned tuples (issue #132).

Apr 6, 2026 02:11 · Evolve
#153 fix: coerce digit-string relevance scores

Added type coercion in the validation pipeline to normalize digit-string scores to integers before validation; included a regression test covering the batch scoring path and updated the improvement log.

Claude's JSON output sometimes wraps `relevance_score` as a digit-string (e.g., `"4"` instead of `4`), causing valid scores to fail validation and block story ingestion.

Apr 6, 2026 02:07 · Evolve
#152 fix: retry HN API fetches

Added retry logic (up to 3 attempts) for Hacker News top-stories and item JSON requests to tolerate temporary connection failures.

Hacker News API requests were failing due to transient network issues, causing unnecessary fetch errors and reducing pipeline reliability.
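A sketch of the bounded-retry wrapper (the `opener` parameter is an assumption added here so the pattern is testable without the network; the real code presumably calls the HN endpoints directly):

```python
import json
import time
import urllib.request

def fetch_json(url, attempts=3, backoff=1.0, opener=None):
    """Fetch and parse JSON, retrying up to `attempts` times on any
    failure with a linearly growing sleep between tries."""
    get = opener or (lambda u: urllib.request.urlopen(u, timeout=10).read())
    last_err = None
    for i in range(attempts):
        try:
            return json.loads(get(url))
        except Exception as e:  # connection reset, timeout, bad JSON
            last_err = e
            if i < attempts - 1:
                time.sleep(backoff * (i + 1))
    raise last_err
```

Three attempts absorbs the transient resets without hiding a genuinely down API, which still surfaces as the last exception.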

Apr 6, 2026 00:51 · Evolve
#151 fix: use shared db connection in export site

Replaced `export_site.py`'s local connection wrapper with the shared DB connection helper from `utils/`, added regression tests for the shared connection/migration behavior, and updated tests to match the current JSON and predictions schema.

Eliminated duplicate database connection management in `export_site.py` to enforce the "one writer" architectural pattern documented in the repo doctrine, ensuring consistent connection behavior and migration handling across the codebase.

Apr 5, 2026 22:42 · Evolve
#148 feat: push last 10 scored stories to Redis for status page

Added `get_recent_scored_stories()` to `push_stats.py` that queries the last 10 scored/digested stories and writes them to Redis (`oracle:stats:recent_stories`) on every push_stats run (every minute), including title, URL, score, timestamp, and source metadata.

The status page on oracle-pi needs access to recent scored stories to display live data without querying SQLite directly.

Apr 5, 2026 22:12 · Evolve
#147 feat: expand Redis stats history and backfill tweet replies

Expanded `push_stats.py` to include additional pipeline/throughput/scoring/system fields, increased hourly history retention from 48 to 720 entries (~30 days), and added a new 90-day daily summary key; added backfill logic to `post_to_x.py` to generate missing tweet replies via Haiku for predictions that have tweets.

Extend observability of the pipeline by capturing more granular metrics over longer retention periods, and ensure all predictions have corresponding tweet replies.

Apr 5, 2026 22:10 · Evolve
#146 perf: cache haikus, week theme, and PR summaries on re-export

Added three-tier caching for haikus, week themes, and PR summaries: check the SQLite DB and previously exported JSON files first, only generate via Haiku API for genuinely new dates/content.

Re-exports to oracle-pi were making ~37 unnecessary Haiku API calls per run when content hadn't changed, wasting quota and slowing the pipeline.
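The three-tier lookup reduces to a short cascade; this sketch treats both caches as dicts for clarity (the real tiers are SQLite and the prior export's JSON):

```python
def get_or_generate(key, db_cache, json_cache, generate):
    """Three-tier lookup: SQLite cache first, then the previously
    exported JSON, then an actual Haiku call for genuinely new keys."""
    if key in db_cache:
        return db_cache[key]
    if key in json_cache:
        db_cache[key] = json_cache[key]  # promote so next run hits tier 1
        return json_cache[key]
    value = generate(key)
    db_cache[key] = value
    return value
```

Promoting JSON-cache hits into the DB means a re-export after a DB wipe pays the API cost at most once per item.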

Apr 5, 2026 21:51 · Evolve
#145 refactor: emit JSON data files instead of TS literals

Replaced `escape_ts_string`/`render_ts_file` with `json.dumps`, emitted `manifest.json` instead of `index.ts` dynamic imports, and converted all downstream data files (glossary, trends, changelog, predictions) to JSON, with support for historical re-exports via a `--week` flag.

Hand-built TypeScript string rendering was complex and fragile; switching to JSON simplifies the data export pipeline and eliminates 311 lines of string-building code.

Apr 5, 2026 11:50 · Evolve
#144 feat: tweet thread replies with analytical tone

Added `tweet_reply` column to predictions table to store signal reasoning; `post_to_x.py` now posts prediction as main tweet + self-reply thread explaining the supporting data points, with reply failure not blocking the main tweet.

Predictions were using overly declarative language ("bet", "will") that suggested false certainty; shifting to analytical framing ("signal suggests", "watching for") grounds predictions in data and makes them more credible.
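The "reply failure must not block the main tweet" behavior is the interesting invariant; a sketch with a generic client interface (the actual X API wrapper is not shown in this entry):

```python
def post_prediction_thread(client, tweet: str, reply: str):
    """Post the prediction, then attempt a self-reply carrying the
    signal reasoning. The reply is best-effort: its failure never
    takes down the already-posted main tweet."""
    main_id = client.post(tweet)
    reply_id = None
    if reply:
        try:
            reply_id = client.post(reply, in_reply_to=main_id)
        except Exception:
            pass  # swallow: the thread degrades to a lone tweet
    return main_id, reply_id
```

Returning both IDs lets the caller record `postedToXAt` for the main tweet even when the reply leg failed.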

Apr 5, 2026 03:06 · Evolve
#141 fix: validate structured score output

Added validation to explicitly check that structured output is a dict before processing; non-dict payloads are now treated as scoring failures and blocked. Includes a regression test for malformed payloads.

Claude's structured output could return malformed payloads that weren't caught, allowing broken data to flow downstream instead of failing at the source.

Apr 5, 2026 02:21 · Human
#138 Fix tuple-wrapped score outputs in process_one

Added normalisation to unwrap tuple/list-wrapped scores before validation, backfilled missing optional score fields for legacy compatibility, added regression tests for tuple-wrapped results, and implemented a sqlite_vec compatibility fallback for entity embedding deserialisation.

process_one was failing when score outputs came wrapped in tuples/lists, and legacy outputs lacked optional score fields. Additionally, entity embedding deserialisation was breaking on older sqlite_vec builds.

Apr 4, 2026 23:23 · Evolve
#136 feat: semantic deduplication for entities and predictions

Implemented semantic embedding-based deduplication using all-MiniLM-L6-v2 (384-dim) embeddings with conservative thresholds: entities with similarity >0.90 are auto-merged, and predictions flagged as near-dupes within 7 days are excluded from exports.

Entities and predictions were being duplicated in the system due to naming variants (e.g., 'OpenAI' vs 'Open AI') that simple string matching couldn't catch, leading to redundant data in the knowledge base and exports.
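The merge step over precomputed embeddings can be sketched as a greedy pass with cosine similarity (the embeddings come from all-MiniLM-L6-v2 per the entry; the 2-d vectors and helper names below are illustrative only):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def merge_near_duplicates(entities, embeddings, threshold=0.90):
    """Greedy merge: each entity maps to the first earlier-kept entity
    whose embedding similarity exceeds the threshold, else to itself."""
    canonical, kept = {}, []
    for name in entities:
        for other in kept:
            if cosine(embeddings[name], embeddings[other]) > threshold:
                canonical[name] = other
                break
        else:
            kept.append(name)
            canonical[name] = name
    return canonical
```

The conservative 0.90 cutoff is what lets 'OpenAI' vs 'Open AI' collapse while keeping genuinely distinct orgs apart.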

Apr 4, 2026 20:53 · Evolve
#133 feat: glossary entity edges for graph visualization

Added `compute_glossary_edges()` to extract co-occurrence patterns between entities and export them as weighted edges (source, target, weight) in `glossary.ts`. Also improved entity extraction prompts to emphasize canonical naming and widen extraction range from 3–8 to 2–10 entities.

The oracle-pi frontend needs relationship data to visualize entities as an interactive graph, showing how glossary entities co-occur and relate to each other.
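The co-occurrence computation described for `compute_glossary_edges()` amounts to counting entity pairs per story; a minimal sketch (input shape assumed to be one entity list per story):

```python
from collections import Counter
from itertools import combinations

def compute_edges(story_entities):
    """Weighted co-occurrence edges: one increment per story in which
    an entity pair appears together. Pairs are sorted so (A, B) and
    (B, A) count as the same edge."""
    weights = Counter()
    for entities in story_entities:
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return [{"source": a, "target": b, "weight": w}
            for (a, b), w in sorted(weights.items())]
```

Deduplicating within a story (`set`) keeps an entity mentioned twice in one article from inflating its edge weights.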

Apr 4, 2026 12:24 · Evolve
#121 fix: robust score output validation

Extracted a `validate_score_output()` helper in `score.py` that enforces all required fields and type constraints (`relevance_score` must be int 1–5), then applied it consistently in both `score_all()` and `score_one()`. Added comprehensive test suite covering invalid types, out-of-range values, and missing fields.

Score output validation was fragmented across the pipeline with inconsistent checks, allowing invalid Claude responses (wrong types, missing fields) to slip through or cause failures downstream. This PR centralizes validation to catch all issues upfront.
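The shape of such a centralized validator might look like this (the `relevance_score` int-1–5 rule is from the entry; the other required fields are illustrative assumptions):

```python
def validate_score_output(score, required=("relevance_score", "summary_short")):
    """Raise ValueError on non-dict payloads, missing required fields,
    or an out-of-range relevance_score; callers treat the exception
    as a scoring failure and route the story through retry handling."""
    if not isinstance(score, dict):
        raise ValueError("score output must be a dict")
    for field in required:
        if field not in score:
            raise ValueError(f"missing field: {field}")
    rs = score["relevance_score"]
    if isinstance(rs, bool) or not isinstance(rs, int) or not 1 <= rs <= 5:
        raise ValueError(f"relevance_score must be int 1-5, got {rs!r}")
    return score
```

The explicit `bool` check matters because `True` is an `int` in Python and would otherwise pass as a score of 1.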