Project Charter & Roadmap

Mission Statement

Open Embeddings aims to create a sustainable, standardized way to lower the barrier to entry for new AI agents, scripts, and services, while reducing the load that repeated re-embedding of the same content across multiple models places on content providers.

Vision

We envision an internet where content discovery is AI-native by default, where:

  • Content providers can expose their material efficiently without bandwidth waste
  • Developers can build semantic search applications without re-embedding costs
  • End users can discover relevant content across platforms and models seamlessly
  • The open internet remains accessible and avoids capture behind walled gardens

Project Goals

Readers of this charter will hopefully come away understanding the thesis, and some will go on to help us:

  1. Define an open format and specification that lets content providers support multiple models
  2. Update commonly used content publishing tools to support the format securely
  3. Generate a distributed corpus of cross-space material that supports translation between data encoded by closed models and data encoded by open models

Organizational Structure

Non-Profit Foundation

  • Run as a non-profit organization funded through donations and open-source grants
  • Low, transparent administrative fees on any paid work
  • Community volunteer-driven development for core functionality
  • Seek partnerships with organizations sharing our vision of an open, accessible internet

Target Partners

  • Mozilla and Google - Initial outreach for website and RFC development support
  • Tech companies committed to open standards
  • Educational institutions researching AI and semantic web technologies
  • Non-profits focused on open-source software and AI ethics

Development Roadmap

Phase 1: Foundation (Q1 2024)

Status: In Progress

  • Define core problem statement and solution approach
  • Create project documentation and website
  • Draft initial RFC specification
  • Build reference parsers in Python and JavaScript
  • Establish community feedback channels
  • Create validation tools and testing frameworks

Success Metrics:

  • RFC draft complete and published
  • Reference implementations available
  • Community engagement initiated

Phase 2: Adoption (Q2-Q3 2024)

Status: Planning

  • Integrate with popular CMS platforms and static site generators (WordPress, Drupal, Jekyll)
  • Partner with content creators for pilot implementations
  • Develop browser extensions and developer tools
  • Create interactive demos and educational content
  • Submit to standards bodies for formal review

Success Metrics:

  • 10+ websites implementing Open Embeddings
  • CMS plugin ecosystem established
  • Standards body engagement initiated

Phase 3: Scale (Q4 2024)

Status: Future

  • Address hard implementation problems (model sprawl, cache invalidation)
  • Research cross-model embedding transformation frameworks
  • Develop enterprise-grade security and performance features
  • Create distributed embedding network protocols
  • Establish certification and compliance programs

Success Metrics:

  • 100+ websites using Open Embeddings
  • Academic research partnerships established
  • Industry adoption by major platforms

Technical Priorities

Hard Implementation Problems

  1. Model Sprawl
    • Research recent academic work on embedding space transformations
    • Develop framework for multi-modal model compatibility
    • Create standardized APIs for model conversion
  2. Cache Invalidation
    • Design trust mechanisms for embedding freshness
    • Implement content change detection systems
    • Develop distributed validation networks
  3. Performance Optimization
    • Compression algorithms for embedding vectors
    • Pagination strategies for large content sets
    • CDN integration and caching strategies
  4. Security & Privacy
    • Prevent sensitive information leakage in embeddings
    • Access control mechanisms for private content
    • Audit trails and compliance monitoring
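
Two of the items above, content change detection and compression of embedding vectors, can be illustrated with a minimal Python sketch. This is not part of any published Open Embeddings specification: the record field `content_sha256` and the function names `content_fingerprint`, `is_stale`, `quantize_int8`, and `dequantize_int8` are hypothetical, chosen only for illustration. The sketch assumes a publisher stores a hash of the source text next to each embedding, and that lossy 8-bit quantization is an acceptable trade-off for size.

```python
import hashlib
import struct


def content_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the whitespace-normalized text."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_stale(record: dict, current_text: str) -> bool:
    """An embedding is stale when its source content has changed.

    `record["content_sha256"]` is a hypothetical field holding the
    fingerprint of the text the embedding was computed from.
    """
    return record["content_sha256"] != content_fingerprint(current_text)


def quantize_int8(vector: list[float]) -> tuple[float, bytes]:
    """Compress each float to one signed byte, plus a single shared scale."""
    peak = max((abs(x) for x in vector), default=0.0)
    scale = peak / 127 if peak > 0 else 1.0
    quantized = (round(x / scale) for x in vector)
    return scale, struct.pack(f"{len(vector)}b", *quantized)


def dequantize_int8(scale: float, packed: bytes) -> list[float]:
    """Approximately recover the original floats (the compression is lossy)."""
    return [q * scale for q in struct.unpack(f"{len(packed)}b", packed)]


if __name__ == "__main__":
    record = {"content_sha256": content_fingerprint("Hello,  world")}
    print(is_stale(record, "Hello, world"))  # whitespace-only change: False
    print(is_stale(record, "Hello, there"))  # wording changed: True

    scale, packed = quantize_int8([0.5, -1.0, 0.25])
    print(len(packed), dequantize_int8(scale, packed))
```

Hashing normalized text means cosmetic whitespace changes do not trigger re-embedding, while any change in wording does. The quantizer stores one byte per dimension instead of a 4- or 8-byte float, at the cost of a rounding error of at most half the scale per component; whether that loss is acceptable would depend on the model and the use case.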

Community Engagement

Call to Action Hooks

  • Participate in the draft spec - Review and contribute to RFC development
  • Build POCs - Create proof-of-concept implementations
  • Share your use cases - Help us understand real-world requirements
  • Contribute to the project - Join development, documentation, or outreach efforts

Competitive Landscape

We are not currently aware of any other group pushing in this direction. Our strategy:

  • Encourage collaboration - Invite similar initiatives to join forces
  • Open development - Transparent, community-driven specification process
  • Broad participation - Welcome input from all stakeholders

Sustainability Model

Funding Sources

  • Individual donations from community members
  • Open-source grants from foundations and tech companies
  • Partnership agreements with aligned organizations
  • Revenue from optional certification and compliance services

Cost Management

  • Volunteer-driven core development
  • Minimal infrastructure costs (static site hosting, basic tooling)
  • Community-contributed documentation and examples
  • Grant funding for major development initiatives

Success Indicators

Short-term (6 months)

  • RFC specification stabilized and published
  • Reference implementations available and tested
  • Active community of 50+ contributors and users
  • Partnership agreements with 2+ major organizations

Medium-term (1 year)

  • 100+ websites implementing Open Embeddings
  • Integration with major CMS platforms
  • Academic research partnerships established
  • Standards body submission completed

Long-term (2+ years)

  • Widespread adoption across the web
  • Reduction in content re-embedding costs industry-wide
  • Thriving ecosystem of tools and services
  • Measurable impact on open internet accessibility

Get Involved

Ready to help shape the future of AI-native content discovery?

Together, we can ensure the spice continues to flow freely across the open internet.