AI Safety

I built a semantic search engine for LessWrong

Developer builds a Connected Papers-style tool for LessWrong, using Voyage-3.5 embeddings over roughly 372k posts and comments.

Deep Dive

Developer 'neo' has launched Situate.info, a semantic search engine built for the LessWrong rationalist community and adjacent publications. The tool works much like Connected Papers: users enter a LessWrong or Substack URL and receive the 100 most semantically similar documents. The current database, updated April 19th, contains approximately 372,000 documents, including 36,220 LessWrong posts, 289,721 comments, and 40,289 posts from 173 Substack publications such as Astral Codex Ten and Overcoming Bias.

The technical pipeline scrapes content, keeps only items with karma of at least 5, and splits each document into chunks of roughly 360 tokens. Each chunk is embedded with Voyage-3.5 as a 1024-dimensional vector, and a document's chunk vectors are averaged into a single document vector, so retrieval needs only one cosine-distance comparison per document. The creator's motivation is to improve collective epistemics and help newcomers 'situate' themselves in the dense, rapidly evolving field of AI safety by connecting new ideas to past discussions.
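The chunk, average, and rank steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's pipeline: the function names are hypothetical, a plain list of tokens stands in for real tokenizer output, and the embedding call itself is omitted, so the vectors here are small hand-written stand-ins for 1024-dimensional Voyage-3.5 embeddings.

```python
import math

CHUNK_SIZE = 360  # approximate tokens per chunk, per the description above
MIN_KARMA = 5     # karma threshold, per the description above


def filter_by_karma(docs, min_karma=MIN_KARMA):
    """Keep documents at or above the threshold (the "karma" key is an assumption)."""
    return [d for d in docs if d["karma"] >= min_karma]


def chunk_tokens(tokens, size=CHUNK_SIZE):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def mean_pool(chunk_vectors):
    """Average per-chunk embeddings into a single document vector."""
    n = len(chunk_vectors)
    dim = len(chunk_vectors[0])
    return [sum(vec[d] for vec in chunk_vectors) / n for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, doc_vectors, k=100):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors stand in for real embeddings.
    index = {"post-a": [1.0, 0.0], "post-b": [0.0, 1.0], "post-c": [1.0, 1.0]}
    query = mean_pool([[1.0, 0.0], [1.0, 0.2]])  # two chunk vectors, one document
    for doc_id, score in top_k_similar(query, index, k=2):
        print(doc_id, round(score, 3))
```

Averaging trades some precision for a much smaller index: a long post with many chunks still occupies one vector, at the cost of blurring together sections that cover different topics.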

Future plans include open-sourcing the ingestion and embedding pipeline, adding features like BGE reranking for improved accuracy, and potentially creating a connected force graph visualization. While acknowledging that the Lightcone team could build a similar tool, neo sees value in a third-party system that can ingest information from diverse corners of the internet to aid navigation and synthesis of complex ideas.

Key Points
  • Indexes roughly 372,000 documents from LessWrong and 173 Substack publications using Voyage-3.5 embeddings
  • Returns top 100 similar documents via cosine similarity for any input URL
  • Aims to help newcomers navigate AI safety discourse by connecting posts to historical context

Why It Matters

Situate.info links new AI safety posts to the earlier discussions they build on, helping newcomers get oriented quickly in a specialized community whose debates are complex and fast-moving.