Hosted on hyper.media via the Hypermedia Protocol

    Problem

      Search lives in a dropdown. You type, you get title matches, you pick one. It works for "I know the name of the thing I want." It falls apart for everything else.

      Try searching for a concept that spans multiple documents. Or finding something you wrote last week but can't remember the title of. Or figuring out which version of a doc had that specific paragraph.

      We need a dedicated search page, but the current search API wasn't built for one. With semantic search and RRF (Reciprocal Rank Fusion) combining now in place, we can build a full search experience that is genuinely useful.

    Solution

      1. IRI Filter — Scope Search to a Document or Subpath

        Currently, search is scoped either to an entire account (for web search) or to the whole library. IRI filtering lets users narrow a search to a specific document or folder subpath.

        Examples:

        Search only within a single document: hm://<account>/cars/honda

        Search within a subpath: hm://<account>/cars/* (all documents under "cars")

        Leave empty to search the entire account (current behavior)

        This lets the search page offer a "search within" dropdown scoped to the user's current location in the document tree.
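        A minimal sketch of how the filter could be matched, in Python. It assumes IRIs are compared as plain normalized strings with no version or query suffix, and that a trailing /* matches documents strictly below the prefix (not the prefix document itself). iri_matches is a hypothetical helper, not part of the existing API.

```python
def iri_matches(doc_iri: str, iri_filter: str) -> bool:
    """Return True if doc_iri falls inside the scope described by iri_filter.

    Hypothetical helper. Assumes both IRIs are normalized strings such as
    "hm://<account>/cars/honda" without version or query parameters.
    """
    if not iri_filter:
        # Empty filter: no narrowing beyond the account (current behavior).
        return True
    if iri_filter.endswith("/*"):
        # Subpath filter: match documents below the prefix. The trailing "/"
        # keeps "hm://acct/cars/*" from matching "hm://acct/carsales".
        prefix = iri_filter[:-2]
        return doc_iri.startswith(prefix + "/")
    # Otherwise the filter names a single document: exact match only.
    return doc_iri == iri_filter
```

        For example, iri_matches("hm://alice/cars/honda/civic", "hm://alice/cars/*") is true, while the same document does not match the single-document filter hm://alice/cars/honda. Whether the "cars" document itself should appear in /* results is not specified; this sketch excludes it.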

      2. Content-Type Filter — Choose What Gets Searched

        Today the search either looks at titles only, or titles + document bodies. The new content-type filter gives fine-grained control over which content types are included:

          Document — document body content

          Comment — comments on documents

          Contact — contact/profile information

          Title — document titles

        The search page can expose these as checkboxes or a filter bar. When no filter is selected, the existing behavior is preserved.
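        The filter semantics can be sketched as a post-filter in Python. ContentType, Result, and apply_content_filter are hypothetical names. A real backend would more likely push the filter into the index query rather than filter afterward, and the existing title/body behavior for an empty selection is assumed to be handled before this step.

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    # The four content types listed in the pitch.
    DOCUMENT = "Document"
    COMMENT = "Comment"
    CONTACT = "Contact"
    TITLE = "Title"


@dataclass
class Result:
    id: str
    content_type: ContentType


def apply_content_filter(results, selected):
    """Keep only results whose content type was selected.

    An empty selection returns the results untouched, mirroring the rule that
    "no filter" preserves the existing behavior (assumed decided upstream).
    """
    if not selected:
        return list(results)
    allowed = set(selected)
    return [r for r in results if r.content_type in allowed]
```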

      3. Authority Ranking — Citation-based Result Quality


        Opt-in ranking signal that uses citation data to surface more authoritative results. Two signals are blended into the existing search ranking:

        Document authority — how many other documents cite/link to this document

        Author authority — how many external citations the document's author has received across all their work (self-citations excluded)
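        Both signals can be sketched as simple counts over a list of citation links, in Python. The (source, target) link shape and the document_authority and author_authority helpers are assumptions for illustration; the pitch only states that the scores come from existing indexed citation data.

```python
from collections import defaultdict


def document_authority(links):
    """Count distinct other documents that link to each document.

    links: iterable of (source_doc, target_doc) pairs (hypothetical shape).
    """
    sources = defaultdict(set)
    for src, dst in links:
        if src != dst:  # a document linking to itself is not a citation
            sources[dst].add(src)
    return {doc: len(srcs) for doc, srcs in sources.items()}


def author_authority(links, doc_author):
    """Count external citations per author, excluding self-citations.

    doc_author maps document id -> author id. A link is a self-citation
    when the citing and cited documents share an author.
    """
    counts = defaultdict(int)
    for src, dst in set(links):  # ignore duplicate link pairs
        cited = doc_author.get(dst)
        if cited is None or doc_author.get(src) == cited:
            continue
        counts[cited] += 1
    return dict(counts)
```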

        When enabled, the ranking weights become:

        Semantic similarity 35%

        Keyword match 35%

        Document citations 20%

        Author citations 10%
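        The blend amounts to a weighted sum. The pitch gives the weights but not how the raw signals are scaled, so this Python sketch assumes the semantic and keyword scores are already in [0, 1] and max-normalizes the two citation counts within the candidate set. Candidate and rank_with_authority are hypothetical names.

```python
from dataclasses import dataclass

# Weights from the pitch, used only when authority ranking is enabled.
W_SEMANTIC, W_KEYWORD, W_DOC_CITES, W_AUTHOR_CITES = 0.35, 0.35, 0.20, 0.10


@dataclass
class Candidate:
    doc_id: str
    semantic: float        # assumed already normalized to [0, 1]
    keyword: float         # assumed already normalized to [0, 1]
    doc_citations: int     # incoming citations to the document
    author_citations: int  # external citations to the author


def normalize(values):
    # Hypothetical scaling: divide by the largest value in the result set.
    top = max(values, default=0)
    return [v / top if top > 0 else 0.0 for v in values]


def rank_with_authority(cands):
    doc_n = normalize([c.doc_citations for c in cands])
    auth_n = normalize([c.author_citations for c in cands])
    scored = []
    for c, d, a in zip(cands, doc_n, auth_n):
        score = (W_SEMANTIC * c.semantic + W_KEYWORD * c.keyword
                 + W_DOC_CITES * d + W_AUTHOR_CITES * a)
        scored.append((score, c.doc_id))
    # Highest blended score first.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

        Max-normalization is only one option; a log scale would keep a single heavily cited document from flattening everyone else's citation score.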

        Why exclude self-citations? In testing, 98% of one author's citations were self-citations; counting them inflated that author's score from 4 to 227. Filtering self-citations keeps the signal honest.

        Performance: Authority scores are computed on-the-fly from existing indexed data — ~7ms for 200 documents. No precomputation or caching needed.

      4. Semantic Dedup — Remove Near-Duplicate Versions

        Problem: When a document has multiple versions with minor edits (e.g., "cars" changed to "cars."), search returns both versions as separate results even though they're semantically identical.

        Solution: For semantic and hybrid search modes, group results by document + block + content type, then compare how similarly each version matches the query. If two consecutive versions score within 20% of each other, only the newest version is kept.

        Versions with meaningfully different content (>20% score difference) are both preserved

        Keyword-only search keeps the existing exact-match dedup (appropriate since it's character-level matching)

        This reduces clutter from minor edits without hiding genuinely different content across versions.
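        The grouping and threshold logic can be sketched in Python. It assumes "within 20%" means the relative difference |a - b| / max(a, b), that a larger version_ts means a newer version, and that each version is compared with the next-newer one, following the wording "consecutive versions". Hit and semantic_dedup are hypothetical names; keyword-only results would bypass this function.

```python
from dataclasses import dataclass
from itertools import groupby

THRESHOLD = 0.20  # the 20% figure from the pitch


@dataclass
class Hit:
    doc: str
    block: str
    content_type: str
    version_ts: int  # assumption: larger means newer
    score: float


def within_threshold(a, b, threshold=THRESHOLD):
    # Relative difference against the larger of the two scores.
    top = max(a, b)
    if top <= 0:
        return True
    return abs(a - b) / top <= threshold


def semantic_dedup(hits):
    key = lambda h: (h.doc, h.block, h.content_type)
    kept = []
    # groupby only merges adjacent items, so sort by the group key first.
    for _, group in groupby(sorted(hits, key=key), key=key):
        versions = sorted(group, key=lambda h: h.version_ts, reverse=True)
        kept.append(versions[0])  # the newest version always survives
        for newer, older in zip(versions, versions[1:]):
            if not within_threshold(newer.score, older.score):
                kept.append(older)  # meaningfully different: keep it too
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept
```

        Under the "consecutive" reading, a slow chain of small edits can collapse into a single result even if the oldest and newest versions differ by more than 20%; comparing against the last kept version instead would preserve such drift.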

    Results Visualization

      The dedicated search page displays results as a vertical scrollable list of cards. Each card contains:

      Document title — clickable, navigates to the document

      Full path breadcrumb — e.g. My Account / cars / honda / civic showing the document's position in the hierarchy

      Version indicator — which version matched (timestamp or version label)

      Content snippet — preview of the matched block with query terms highlighted inline


      Cards stack vertically in a single column, optimized for scanability and variable-length snippets.

      Title-only matches show the first content block as a fallback snippet.
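      A card could be assembled along these lines in Python. The Card shape, build_card, and returning highlights as (text, is_match) segments instead of HTML (which would also need escaping) are assumptions; the page's real rendering layer and data model are not described here.

```python
import re
from dataclasses import dataclass


@dataclass
class Card:
    title: str
    breadcrumb: str
    version: str
    snippet: list  # list of (text, is_match) segments


def highlight(text, query):
    """Split text into (segment, is_match) pairs for inline highlighting."""
    terms = sorted(set(query.split()), key=len, reverse=True)
    if not terms:
        return [(text, False)]
    # Longest terms first so overlapping alternatives prefer the longer match.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    segments, pos = [], 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            segments.append((text[pos:m.start()], False))
        segments.append((m.group(0), True))
        pos = m.end()
    if pos < len(text):
        segments.append((text[pos:], False))
    return segments


def build_card(title, account_name, path, version,
               matched_block, first_block, query, title_only):
    # Breadcrumb: account name followed by each path segment.
    crumbs = [account_name] + [p for p in path.split("/") if p]
    # Title-only matches have no matched block, so fall back to the first block.
    body = first_block if title_only else matched_block
    return Card(title, " / ".join(crumbs), version, highlight(body, query))
```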

    Scope

      Four backward-compatible additions to the existing search API: IRI filter, content-type filter, authority ranking, and semantic dedup

      Backend-only changes — no database migrations, no new tables

    Rabbit Holes

      Materialized authority caches — not needed, batch queries are fast enough on indexed data

      Configurable ranking weights or A/B testing — premature; hardcoded constants for now

      Complex dedup strategies (e.g., per-paragraph diffing) — percentage threshold on query score is simpler and sufficient

    No Gos

    Open Question

      How should search type (keyword / semantic / hybrid) be exposed on the search page?

      Explicit toggle — full control, potentially confusing for non-technical users

      Smart defaults with override — hybrid for search page, keyword for dropdown

      Backend heuristic — auto-select based on query length

      Always hybrid: the simplest option?

      This doesn't block backend work since the search type selection already exists in the API.