Full Stack Cloudflare Part 2 — Unified Content Discovery

May 8, 2025 · Tory Briggs · 6 min read

Siloed Content

Finding the right information in a sea of rich media — whether it’s audio, video, or text — can be tough. Valuable insights get buried in formats that traditional search just can’t reach, leaving content isolated and hard to discover.

People expect more. They want to quickly uncover what matters most, whether it’s hidden in a podcast episode, tucked away in an article, or featured in a video. Meeting that need requires more than keyword search; it requires real understanding.

At PressBox, we’re building a unified discovery layer. We’re harnessing semantic embeddings that let us connect content based on meaning and context, not just words, using a host of Cloudflare’s AI and data tools to transform everything — text, audio, and video — into a common, discoverable format.

In this post, we’ll share how we process different types of media to extract meaningful text, generate and store embeddings using Cloudflare AI and Vectorize, and create a seamless discovery experience.

The Magic of Plain Text

To achieve this unified discovery layer, we built a smart backend using Cloudflare Queues that turns all forms of media into discoverable text. So no matter where the content comes from, we always end up with text, since that’s what we need to create an embedding. The steps to get there vary depending on the media type, but in the end, everything gets turned into an embedding that we can drop into our vector database.

Here’s what it looks like for each data type:

  1. Text (Topics & Articles): This one’s easy — we’re already working with text, so the main thing we do is generate summaries and tags based on the article’s content.

  2. Audio: For audio, we run it through a speech-to-text engine to turn it into text. After that, we treat it just like an article: we create summaries and tags from the transcription.

  3. Video (YouTube and Others): With video, we first pull out the audio, convert it to text using speech-to-text, and then — just like with everything else — we generate summaries and tags from the resulting text.
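
Putting those three paths together, the normalization step can be sketched roughly like this (a simplified sketch; `transcribe` and `extractAudio` are hypothetical stand-ins for the real speech-to-text and audio-extraction stages):

```typescript
type MediaType = "text" | "audio" | "video";

// Hypothetical stand-ins for the real speech-to-text and
// audio-extraction stages in the pipeline
async function transcribe(audioUrl: string): Promise<string> {
  return `transcript of ${audioUrl}`;
}
async function extractAudio(videoUrl: string): Promise<string> {
  return `${videoUrl}.audio`;
}

// Every media type funnels down to plain text before embedding
async function toPlainText(type: MediaType, source: string): Promise<string> {
  switch (type) {
    case "text":
      return source; // articles are already text
    case "audio":
      return transcribe(source); // speech-to-text on the raw audio
    case "video":
      return transcribe(await extractAudio(source)); // pull the audio out first
  }
}
```

However the content arrives, the output of this step is always a plain string, which is exactly what the embedding step downstream expects.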

Once all the content types have been converted to text, Cloudflare makes the embedding creation and vector insertion easy. Here’s an example snippet for a video summary:

/**
 * Creates an embedding from a string and upserts it into Vectorize
 * @param env
 * @param contentId
 * @param textContent
 * @param videoSummaryIndex
 */
async upsertEmbedding(
  env: Env,
  contentId: string,
  textContent: string,
  videoSummaryIndex: VectorizeIndex,
): Promise<void> {
  const embedding = await createEmbedding(textContent, env);

  // A stable id means re-processing the same content updates its
  // vector in place instead of inserting a duplicate
  await videoSummaryIndex.upsert([
    { id: contentId, values: embedding },
  ]);
}

The createEmbedding function in this example matters: you need to use the same embedding model across all content to ensure consistent retrieval. Cloudflare AI makes this easy as well:

/**
 * Create a vector embedding for a given text
 *
 * @param text
 * @param env
 */
export async function createEmbedding(text: string, env: Env) {
  const { data } = await env.AI.run("@cf/baai/bge-large-en-v1.5", {
    text: [text],
  });
  return data[0];
}

This consistency ensures that vectors from articles, videos, and audio all live in the same semantic space. The result: fans can search across all types of content and get truly relevant results, turning what used to be siloed media into a unified, intelligent knowledge base.
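
Because everything shares one semantic space, a user-facing search can embed the query text with that same model and ask the index for its nearest neighbors. Here’s a rough sketch of that flow (`SearchIndex` is a hypothetical slice of a vector index’s query surface, and `embed` stands in for the `createEmbedding` call above):

```typescript
// Hypothetical subset of a vector index's query surface
interface SearchIndex {
  query(
    vector: number[],
    options: { topK: number },
  ): Promise<{ matches: Array<{ id: string; score: number }> }>;
}

// Embed the query with the SAME model used at ingest time,
// then return the closest items regardless of media type.
async function searchAllContent(
  queryText: string,
  embed: (text: string) => Promise<number[]>,
  index: SearchIndex,
  topK = 5,
): Promise<Array<{ id: string; score: number }>> {
  const queryVector = await embed(queryText);
  const { matches } = await index.query(queryVector, { topK });
  return matches;
}
```

The matches can be articles, podcast transcripts, or video summaries interchangeably, since they were all embedded the same way.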

How We Use Embeddings for Discovery

Once we’ve turned all our different types of content into text embeddings and stored them in Cloudflare Vectorize, we can start finding interesting connections between them. This is where semantic search really shines.

At PressBox, we don’t use Vectorize just for storing embeddings — we also use it to quickly search for content that’s similar. For example, when someone’s looking at a piece of content, or when we want to recommend something related, we take the embedding for that content (the “source embedding”) and search Vectorize for other items that are close to it in meaning.

The key to figuring out what’s “close” is a bit of math called cosine similarity. It measures how similar two vectors are by looking at the angle between them. If the vectors point in almost the same direction, the score is close to 1, meaning they’re very similar. If they point in unrelated directions, the score drops toward 0, and vectors pointing in opposite directions score all the way down to -1.

Here’s a TypeScript example of what a cosine similarity function might look like:

/**
 * Calculate the cosine similarity between two vectors
 *
 * @param vectorA
 * @param vectorB
 */
export function cosineSimilarity(vectorA: number[], vectorB: number[]): number {
  const dotProduct = vectorA.reduce(
    (sum, _, i) => sum + vectorA[i] * vectorB[i],
    0,
  );
  const magnitudeA = Math.sqrt(
    vectorA.reduce((sum, val) => sum + val * val, 0),
  );
  const magnitudeB = Math.sqrt(
    vectorB.reduce((sum, val) => sum + val * val, 0),
  );
  // Ensure no division by zero
  if (magnitudeA === 0 || magnitudeB === 0) {
    return 0;
  }
  return dotProduct / (magnitudeA * magnitudeB);
}
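
For intuition, here’s what that scoring does on a few toy vectors (the helper below inlines the same formula in compact form so the snippet stands on its own):

```typescript
// Compact inline version of the cosine similarity formula above
const cos = (a: number[], b: number[]): number =>
  a.reduce((sum, v, i) => sum + v * b[i], 0) /
  (Math.hypot(...a) * Math.hypot(...b));

console.log(cos([1, 2, 0], [2, 4, 0])); // ≈ 1: same direction, different magnitude
console.log(cos([1, 0, 0], [0, 1, 0])); // 0: orthogonal, unrelated
console.log(cos([1, 0, 0], [-1, 0, 0])); // -1: opposite direction
```

Notice that magnitude doesn’t matter, only direction. That’s why a long article and a short transcript about the same topic can still score as near-identical.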

And here’s a conceptual example, illustrating how we might use this to find related content:

/**
 * Finds content items with semantically similar embeddings to a reference item.
 * 
 * @param sourceEmbedding - The embedding vector of the reference content
 * @param contentEmbeddings - Array of content items with their embedding vectors
 * @param similarityThreshold - Only returns items with similarity scores at or above this value (0-1)
 * @returns Array of matching items with their similarity scores, sorted by relevance
 */
function findSimilarContent(
  sourceEmbedding: number[],
  contentEmbeddings: Array<{ id: string; values: number[] }>,
  similarityThreshold = 0.85
): Array<{ id: string; score: number }> {
  const similarItems: Array<{ id: string; score: number }> = [];

  for (const item of contentEmbeddings) {
    const similarity = cosineSimilarity(sourceEmbedding, item.values);
    
    if (similarity >= similarityThreshold) {
      similarItems.push({ id: item.id, score: similarity });
    }
  }
  
  // Sort by score, highest similarity first
  return similarItems.sort((a, b) => b.score - a.score);
}

Benefits of a Unified Multi-Modal Approach

This unified, embedding-driven architecture offers many advantages:

  • Enhanced Content Discoverability: We can proactively surface insights about a topic (like a specific athlete or event), whether they originate from an article, a video discussion, or a podcast, dramatically increasing the chances of fans finding exactly what they need.

  • Increased Fan Engagement: We boost fan engagement by showing people content that’s actually relevant to what they’re already interested in. For example, if someone follows Stephen Curry, we can suggest a podcast episode and the specific timestamps where he’s mentioned. This helps fans dive deeper, stick around longer, and discover more of what they’re interested in.

  • Maximized Content Value: Every piece of content, regardless of its original format, becomes an accessible and searchable asset. This ensures that valuable information locked within video or audio is just as discoverable as our written articles, maximizing the return on our content creation efforts.

At PressBox, we’ve set out to make it easy for fans to discover content — whether it’s text, audio, or video — all in one place. There are plenty of services you could stitch together for this workflow, but having it all on one platform with Cloudflare makes a real difference. We use Workers and Queues to coordinate everything behind the scenes, Workers AI to turn all our content into searchable text, and Vectorize to make finding similar content fast and seamless. The result is a media library where everything is connected and easy to search, no matter the original format.
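
For context, wiring those pieces together in a Worker comes down to a handful of bindings in `wrangler.toml`. The binding and resource names below are hypothetical, but the shape looks something like this:

```toml
name = "pressbox-discovery"

# Workers AI binding (speech-to-text, embeddings)
[ai]
binding = "AI"

# Vectorize index for storing and querying embeddings
[[vectorize]]
binding = "VIDEO_SUMMARY_INDEX"
index_name = "video-summaries"

# Queue that coordinates the media-processing pipeline
[[queues.producers]]
binding = "MEDIA_QUEUE"
queue = "media-processing"

[[queues.consumers]]
queue = "media-processing"
```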

This means people aren’t stuck searching by keywords alone. Instead, they can find what they’re looking for in a way that feels natural — surfaced across articles, videos, and audio all at once. We think this approach — bringing different types of content together and making them discoverable in the same way — is really powerful.
