Use Cases & Examples
This page demonstrates practical implementations and use cases for the Open Embeddings specification.
Real-World Use Case: The Self-Improvement Video Pipeline
The Challenge: Building a script to collect self-improvement videos from YouTube, Vimeo, and TikTok for “the path to my best self” without overwhelming a laptop with embedding computation.
The Open Embeddings Solution:
- Video platforms expose embeddings via open-embeddings.json
- Your script encodes the query “self-improvement techniques”
- Calculate similarity against pre-computed embeddings
- Receive ranked, relevant videos for further screening
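The ranking step above can be sketched in a few lines of Python. This is a minimal illustration, not the format defined by the specification: the manifest layout (a `model` field plus an `items` list of `url` and `embedding` entries), the example URLs, and the helper names `cosine_similarity` and `rank_items` are all assumptions made for the example, and the query vector is a stand-in for whatever your encoder returns.

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def rank_items(query_embedding, manifest, top_k=10):
    """Return (score, url) pairs sorted from most to least similar."""
    scored = [
        (cosine_similarity(query_embedding, item["embedding"]), item["url"])
        for item in manifest["items"]
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical manifest contents (assumed layout); a real script would
# fetch and parse the publisher's open-embeddings.json instead.
example_manifest = json.loads("""
{
  "model": "example-model",
  "items": [
    {"url": "https://video.example/a", "embedding": [0.9, 0.1, 0.0]},
    {"url": "https://video.example/b", "embedding": [0.0, 1.0, 0.0]},
    {"url": "https://video.example/c", "embedding": [0.7, 0.7, 0.1]}
  ]
}
""")

# Stand-in for the encoded query "self-improvement techniques".
query_embedding = [1.0, 0.0, 0.0]

# Keep the two closest videos for further screening.
ranked = rank_items(query_embedding, example_manifest, top_k=2)
```

The scores are only meaningful when the query is encoded with the same model that produced the published embeddings, which is why the sketch carries a `model` field alongside the vectors.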
Benefits:
- No need to download every video to check relevance
- Reduced bandwidth and compute costs
- Publisher-agnostic discovery across platforms
Use Cases - Walk the modality ladder
Static Content Sites
As I publish content, I want to make it easier to search. This will require plugins for Docusaurus, Jekyll, and other static site generators that generate the open-embeddings.json file automatically. We’ll also want to support offloading that work to a remote GPU, as we can’t guarantee the local user has the compute power to generate embeddings.
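As a rough sketch of what such a plugin might do at build time, the Python below walks a content directory, embeds each Markdown file, and writes the result to open-embeddings.json. The manifest layout, the function names `build_manifest` and `write_manifest`, and the `toy_embed` placeholder are assumptions for illustration only; the RFC remains the authority on the actual file format. The embedder is passed in as a callable so it can be a local model or a thin client for a remote GPU service.

```python
import json
from pathlib import Path


def build_manifest(content_dir, base_url, embed, model_name):
    """Embed every Markdown file under content_dir and return a manifest dict."""
    root = Path(content_dir)
    items = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        # Map docs/intro.md to <base_url>/docs/intro (assumed URL scheme).
        relative = path.relative_to(root).with_suffix("").as_posix()
        items.append({
            "url": f"{base_url.rstrip('/')}/{relative}",
            "embedding": [float(x) for x in embed(text)],
        })
    return {"model": model_name, "items": items}


def write_manifest(manifest, output_dir):
    """Write the manifest as open-embeddings.json inside output_dir."""
    target = Path(output_dir) / "open-embeddings.json"
    target.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return target


def toy_embed(text):
    """Placeholder embedder: NOT a real model, just simple character counts."""
    return [len(text), text.count(" "), text.count("\n")]
```

A build script could then call `write_manifest(build_manifest("docs", "https://site.example", toy_embed, "toy"), "build")`, swapping `toy_embed` for a real local or remote embedder.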
Images
Google DeepMind just published the whole earth dataset with embeddings. Querying these embeddings dynamically raises two problems:
- Intent - they were intended to be consumed by an engine, but that may not be why I’m asking
- Model access - I may not have access to the model that generated the embeddings, so I need a way to transform between embedding spaces.
Podcasts and Audio Content
Podcast publishers may want to expose both the transcript embeddings and the audio embeddings by major area.
YouTube, Vimeo, TikTok, and Other Video Platforms
What to expose is at the discretion of the publisher: embeddings of the audio transcript, the characters and time indexes, or anything else they choose. The key is the file format (see RFC).