Build Semantic Search with OpenAI and Weaviate
Semantic search is rapidly replacing traditional keyword-based search across enterprise applications, and combining OpenAI's embedding models with the Weaviate vector database offers one of the most accessible paths to building production-ready semantic search systems. This tutorial walks developers through the entire implementation pipeline — from generating vector embeddings to querying semantically similar results in milliseconds.
Unlike conventional search engines that match exact keywords, semantic search understands the meaning behind queries. A user searching for 'how to fix a broken screen' will surface results about 'display repair guides' even if those exact words never appear in the content.
Key Takeaways Before You Start
- OpenAI's text-embedding-3-small model costs just $0.02 per 1 million tokens — making it one of the most affordable embedding options available
- Weaviate is an open-source vector database that supports hybrid search combining vector and keyword matching
- The full implementation requires fewer than 150 lines of Python code
- Semantic search delivers 40-60% better relevance compared to traditional keyword search in most benchmarks
- Weaviate offers both self-hosted (Docker) and managed cloud deployment options
- This stack scales comfortably to millions of documents with proper indexing
Setting Up Your Development Environment
Before writing any code, you need to install the required dependencies and configure API access. The stack relies on 3 core components: Python 3.9+, the OpenAI Python SDK, and the Weaviate client library.
Run the following installation commands:
- pip install openai: OpenAI's official Python client (v1.0+)
- pip install weaviate-client: Weaviate's Python client (v4.0+)
- pip install python-dotenv: for managing environment variables securely
- docker pull semitechnologies/weaviate: to run Weaviate locally
Store your OpenAI API key in a .env file rather than hardcoding it. You can obtain an API key from OpenAI's platform dashboard at platform.openai.com for as little as $5 in prepaid credits.
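As a rough sketch of the .env approach, the helper below reads the key from the environment and fails loudly if it is missing. The function name load_openai_key and the variable name OPENAI_API_KEY are assumptions chosen for illustration; python-dotenv is treated as optional so the snippet also works when the variable is exported in the shell.

```python
import os


def load_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, loading .env first if possible."""
    try:
        # python-dotenv is optional here; without it we rely on variables
        # that are already exported in the shell.
        from dotenv import load_dotenv

        load_dotenv()  # does not override variables that are already set
    except ImportError:
        pass

    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"{env_var} is not set; add it to your .env file")
    return key
```

Keeping the lookup in one place means the rest of the code never touches the raw key, and a missing key surfaces at startup rather than on the first API call.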
Launch Weaviate locally using Docker Compose. Weaviate's default configuration serves the REST API on port 8080 and the gRPC interface used by the Python client on port 50051, so expose both ports from the container. For production deployments, Weaviate Cloud Services (WCS) offers a managed alternative starting at $0 for sandbox environments.
Creating Your Weaviate Schema and Collection
Weaviate organizes data into collections (previously called 'classes'). Each collection defines the structure of your searchable objects, including which properties to store and how vectors should be indexed.
Start by connecting to your local Weaviate instance. The v4 Python client uses a connect_to_local() method that handles connection pooling automatically. For WCS deployments, use connect_to_weaviate_cloud() with your cluster URL and API key.
Define a collection named 'Article' with properties like title, body, and source. The critical configuration here is the vectorizer setting. While Weaviate supports built-in vectorizer modules, this tutorial uses a 'none' vectorizer because we generate embeddings externally through OpenAI's API. This approach gives you full control over the embedding process.
Set the vector index type to HNSW (Hierarchical Navigable Small World), which is Weaviate's default and most battle-tested indexing algorithm. HNSW is an approximate nearest-neighbor index: query times typically stay in the low milliseconds even at millions of vectors, whereas brute-force search slows linearly with dataset size.
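The collection setup described above might look roughly like the following with the v4 Python client. This is a sketch under assumptions: all three properties are stored as plain text, the helper name create_article_collection is made up for this example, and the vectorizer_config argument reflects the v4 API as commonly documented (newer client releases may prefer a different argument name, so check your installed version).

```python
# Property names taken from the tutorial text; all stored as plain text here.
ARTICLE_PROPERTIES = ("title", "body", "source")


def create_article_collection(client, name: str = "Article"):
    """Create a collection that expects externally supplied vectors."""
    # Imported lazily so this module loads even where weaviate-client is absent.
    from weaviate.classes.config import Configure, DataType, Property

    return client.collections.create(
        name=name,
        properties=[
            Property(name=prop, data_type=DataType.TEXT)
            for prop in ARTICLE_PROPERTIES
        ],
        # "none" vectorizer: Weaviate will not call an embedding module itself.
        vectorizer_config=Configure.Vectorizer.none(),
        # HNSW is already the default index; stated explicitly for clarity.
        vector_index_config=Configure.VectorIndex.hnsw(),
    )


# Usage (requires a running local instance):
#   import weaviate
#   client = weaviate.connect_to_local()
#   try:
#       articles = create_article_collection(client)
#   finally:
#       client.close()
```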
Generating Embeddings with OpenAI's API
The embedding step converts raw text into dense numerical vectors that capture semantic meaning. OpenAI currently offers 3 embedding models, and the choice matters significantly for both cost and performance.
- text-embedding-3-small: 1,536 dimensions, $0.02/1M tokens — best for most use cases
- text-embedding-3-large: 3,072 dimensions, $0.13/1M tokens — higher accuracy for complex domains
- text-embedding-ada-002: 1,536 dimensions, $0.10/1M tokens — legacy model, now largely superseded
The text-embedding-3-small model offers the best price-to-performance ratio for semantic search. It outperforms the older ada-002 model on standard benchmarks while costing 80% less.
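To make the pricing comparison concrete, here is a small calculation using only the per-million-token prices listed above. The function names are illustrative.

```python
# Prices in USD per 1 million tokens, as listed in this article.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}


def embedding_cost(tokens: int, model: str) -> float:
    """Cost in USD of embedding the given number of tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]


def savings_vs(model: str, baseline: str) -> float:
    """Fractional price reduction of `model` relative to `baseline`."""
    return 1 - PRICE_PER_MILLION_TOKENS[model] / PRICE_PER_MILLION_TOKENS[baseline]


if __name__ == "__main__":
    # 50 million tokens with the small model, and its discount against ada-002.
    print(round(embedding_cost(50_000_000, "text-embedding-3-small"), 2))
    print(round(savings_vs("text-embedding-3-small", "text-embedding-ada-002"), 2))
```

The second figure is where the 80% number comes from: $0.02 is one fifth of $0.10.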
To generate embeddings, call client.embeddings.create() with your text and model name. The API returns a list of embedding objects, each containing a float array. For batch processing, you can send up to 2,048 text inputs in a single API call, dramatically reducing HTTP overhead.
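A minimal sketch of batched embedding follows. It assumes `client` is an `openai.OpenAI()` instance (v1.0+ SDK); the helper name embed_texts and the constant name are invented for this example. The function only relies on `client.embeddings.create()` and the `index` and `embedding` fields of each returned item.

```python
from typing import List, Sequence

MAX_INPUTS_PER_REQUEST = 2048  # per-request input limit cited in the text


def embed_texts(
    client,
    texts: Sequence[str],
    model: str = "text-embedding-3-small",
    batch_size: int = MAX_INPUTS_PER_REQUEST,
) -> List[List[float]]:
    """Embed texts in batches and return vectors in the same order as the input."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        response = client.embeddings.create(model=model, input=batch)
        # Sort by index so each vector lines up with its input text.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors


# Usage (needs OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   vectors = embed_texts(OpenAI(), ["display repair guide", "battery tips"])
```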
One critical optimization: chunk your documents before embedding. OpenAI's embedding models accept inputs of up to 8,191 tokens, but shorter chunks (200-500 tokens) typically produce more focused and accurate embeddings. Let consecutive chunks overlap by approximately 50 tokens so that context is preserved across chunk boundaries.
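The chunking strategy can be sketched as a sliding window. Note the simplifying assumption: this example splits on whitespace, so "tokens" here are words rather than model tokens. For exact token counts you would swap in a real tokenizer. The function names are illustrative.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_with_overlap(
    tokens: Sequence[T], size: int = 300, overlap: int = 50
) -> List[List[T]]:
    """Split tokens into windows of `size`, each sharing `overlap` items with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks: List[List[T]] = []
    step = size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


def chunk_text(text: str, size: int = 300, overlap: int = 50) -> List[str]:
    """Whitespace-based approximation: words stand in for model tokens."""
    return [" ".join(chunk) for chunk in chunk_with_overlap(text.split(), size, overlap)]
```

For example, `chunk_text("a b c d e", size=3, overlap=1)` yields two chunks, "a b c" and "c d e", with the word "c" shared across the boundary.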
Inserting Data into Weaviate
With embeddings generated, the next step is inserting them alongside your original text into Weaviate. The v4 client provides a batch import API that dramatically accelerates ingestion compared to individual inserts.
Open a batch context using collection.batch.dynamic(). For each document, create a data object containing your text properties and pass the corresponding embedding vector. Weaviate automatically indexes the vector using the HNSW algorithm configured earlier.
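A hedged sketch of the batch import step is shown below. It assumes `collection` is a v4 collection handle (for example from `client.collections.get("Article")`) and that `articles` and `vectors` are parallel lists; the helper name import_articles is made up for this example.

```python
from typing import Dict, List, Sequence


def import_articles(
    collection, articles: Sequence[Dict[str, str]], vectors: Sequence[List[float]]
) -> int:
    """Insert articles with their precomputed vectors; return the number of failed objects."""
    if len(articles) != len(vectors):
        raise ValueError("articles and vectors must have the same length")

    # The context manager flushes any remaining objects when it exits.
    with collection.batch.dynamic() as batch:
        for properties, vector in zip(articles, vectors):
            batch.add_object(properties=properties, vector=vector)

    # failed_objects lists objects that were rejected during the batch.
    return len(collection.batch.failed_objects)
```

Checking the failure count after every import is worthwhile because batch errors do not necessarily raise an exception on their own.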
Batch import rates typically reach 5,000-10,000 objects per second on modern hardware. For datasets exceeding 100,000 documents, consider these optimizations:
- Increase the batch size to 200 objects per request
- Disable automatic schema validation during bulk imports
- Use gRPC instead of REST for the client connection (enabled by default in v4)
- Monitor memory usage — HNSW indexes are memory-resident by default
After import completes, verify your data by running a simple object count query. Weaviate's aggregate endpoint returns collection-level statistics without scanning individual objects.
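The count check could be written along these lines, assuming a v4 collection handle. The helper names are illustrative.

```python
def count_objects(collection) -> int:
    """Return the total number of objects in the collection via an aggregate query."""
    result = collection.aggregate.over_all(total_count=True)
    return result.total_count


def verify_import(collection, expected: int) -> bool:
    """True when the collection holds exactly the number of objects we tried to insert."""
    return count_objects(collection) == expected
```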
Querying for Semantically Similar Results
This is where the magic happens. To perform a semantic search, you embed the user's query using the same OpenAI model and pass the resulting vector to Weaviate's near_vector search.
The query workflow follows 3 steps. First, generate an embedding for the search query text. Second, call collection.query.near_vector() with the query embedding. Third, process the returned objects ranked by cosine similarity.
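The three steps can be sketched as one function. Assumptions: `openai_client` is an `openai.OpenAI()` instance, `collection` is a v4 collection handle, and the name semantic_search is invented for this example. Distances are only populated if you request them through `return_metadata`, as shown in the usage comment.

```python
from typing import Any, List, Optional, Tuple


def semantic_search(
    openai_client,
    collection,
    query: str,
    model: str = "text-embedding-3-small",
    limit: int = 5,
    return_metadata: Optional[Any] = None,
) -> List[Tuple[dict, Optional[float]]]:
    """Embed the query, run a near_vector search, and return (properties, distance) pairs."""
    # Step 1: embed the query with the same model used for the documents.
    response = openai_client.embeddings.create(model=model, input=query)
    query_vector = response.data[0].embedding

    # Step 2: nearest-neighbor search against the stored vectors.
    result = collection.query.near_vector(
        near_vector=query_vector, limit=limit, return_metadata=return_metadata
    )

    # Step 3: results arrive ordered from most to least similar.
    return [
        (obj.properties, getattr(obj.metadata, "distance", None))
        for obj in result.objects
    ]


# Usage:
#   from weaviate.classes.query import MetadataQuery
#   hits = semantic_search(openai_client, articles, "how to fix a broken screen",
#                          return_metadata=MetadataQuery(distance=True))
```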
Weaviate returns a distance score with each result, where lower values indicate higher similarity. For text-embedding-3-small, distances below 0.3 typically represent strong semantic matches, while values above 0.7 indicate weak relevance.
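For intuition about the distance score: with the cosine metric, distance is one minus cosine similarity, so identical directions give 0 and opposite directions give 2. The pure-Python function below illustrates the formula only; it is not how Weaviate computes distances internally.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 - (a . b) / (|a| * |b|). Lower means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm
```

Because the measure depends only on direction, scaling a vector does not change its distance to any other vector.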
You can also leverage hybrid search, which combines vector similarity with BM25 keyword matching. This approach catches edge cases where semantic search alone might miss exact terminology, which is particularly useful in legal, medical, or technical domains. Because the collection uses the 'none' vectorizer, pass the query embedding to the hybrid call alongside the query text. Set the alpha parameter between 0 and 1 to control the balance: 0 is pure keyword search, 1 is pure vector search, so 0.75 favors vector search while 0.25 favors keyword matching.
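A hybrid query might be wrapped as follows. This sketch assumes a v4 collection handle and a query vector produced with the same OpenAI model as the documents; the helper name hybrid_search is illustrative.

```python
from typing import List, Sequence


def hybrid_search(
    collection,
    query_text: str,
    query_vector: Sequence[float],
    alpha: float = 0.75,
    limit: int = 5,
) -> List[dict]:
    """Run a hybrid (BM25 + vector) query; alpha=1 is pure vector, alpha=0 pure keyword."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    result = collection.query.hybrid(
        query=query_text,            # used for BM25 keyword matching
        vector=list(query_vector),   # supplied manually: the collection has no vectorizer
        alpha=alpha,
        limit=limit,
    )
    return [obj.properties for obj in result.objects]
```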
Adding Filters and Metadata to Refine Results
Real-world search applications rarely rely on pure vector similarity. Users expect filtering by date, category, source, or other metadata. Weaviate supports pre-filtering, which applies metadata filters before the vector search — ensuring accurate result counts without sacrificing performance.
Use the Filter class from the Weaviate client to construct conditions. Common patterns include:
- Filter by date range: Filter.by_property('published_date').greater_than('2024-01-01')
- Filter by category: Filter.by_property('category').equal('technology')
- Combine multiple filters with & (AND) or | (OR) operators
- Filter by cross-reference properties for relational data
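The filter patterns above could be combined like this. It is a sketch that assumes the collection has `published_date` (date) and `category` (text) properties, which the 'Article' collection defined earlier would need to add; the helper names are invented for this example. A timezone-aware datetime is used for the date comparison to avoid ambiguity.

```python
from datetime import datetime, timezone


def utc_midnight(year: int, month: int, day: int) -> datetime:
    """Timezone-aware datetime at 00:00 UTC on the given date."""
    return datetime(year, month, day, tzinfo=timezone.utc)


def recent_technology_filter(since: datetime):
    """Build 'published after `since` AND category == technology'."""
    # Imported lazily so this module loads even where weaviate-client is absent.
    from weaviate.classes.query import Filter

    return (
        Filter.by_property("published_date").greater_than(since)
        & Filter.by_property("category").equal("technology")
    )


# Usage with a vector query (query_vector comes from the embedding step):
#   filters = recent_technology_filter(utc_midnight(2024, 1, 1))
#   result = collection.query.near_vector(
#       near_vector=query_vector, filters=filters, limit=5
#   )
```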
Pre-filtered vector search in Weaviate maintains sub-10ms latency for most workloads, compared to post-filtering approaches that waste compute on irrelevant vectors.
Performance Tuning for Production Deployments
Moving from prototype to production requires attention to several performance dimensions. HNSW tuning is the highest-impact optimization — the efConstruction parameter (default 128) controls index build quality, while ef (default -1, auto) controls query-time accuracy.
For datasets under 1 million vectors, the defaults work well. Beyond that threshold, consider increasing efConstruction to 256 and setting ef to 128 for a balanced accuracy-speed tradeoff.
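The tuning guidance can be expressed as a small helper. The 1 million threshold and the parameter values come straight from the text above; the function names are illustrative, and the right values for a real workload should be confirmed by measuring recall and latency.

```python
from typing import Dict


def hnsw_settings(expected_vectors: int) -> Dict[str, int]:
    """Pick HNSW parameters following the rule of thumb in this article."""
    if expected_vectors < 1_000_000:
        # Weaviate defaults: ef=-1 lets Weaviate choose ef dynamically.
        return {"ef_construction": 128, "ef": -1}
    return {"ef_construction": 256, "ef": 128}


def hnsw_index_config(expected_vectors: int):
    """Build a v4 vector index config from the chosen settings."""
    # Imported lazily so this module loads even where weaviate-client is absent.
    from weaviate.classes.config import Configure

    return Configure.VectorIndex.hnsw(**hnsw_settings(expected_vectors))


# Usage: pass the result as vector_index_config when creating the collection.
```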
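Caching OpenAI embeddings keeps both cost and latency under control. Store generated embeddings in a local cache (Redis works well) to avoid re-embedding identical queries. At $0.02 per million tokens, query costs are modest: assuming roughly 20 tokens per query, 10 million queries a month is about 200 million tokens, or around $4. A bill of $200 or more implies roughly 10 billion tokens, which is more typical of repeatedly re-embedding a large document corpus than of query traffic, so cache document embeddings as well as query embeddings.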
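A minimal caching sketch follows. To keep it self-contained it uses an in-memory dict as a stand-in for Redis; the class name EmbeddingCache and the `embed_fn` callable (any function that maps a string to a vector) are assumptions for this example.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Memoize embeddings by (model, text); a dict stands in for Redis here."""

    def __init__(self, embed_fn: Callable[[str], List[float]], model: str):
        self._embed_fn = embed_fn
        self._model = model
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the underlying API was actually called

    def _key(self, text: str) -> str:
        # Include the model name so vectors from different models never collide.
        return hashlib.sha256(f"{self._model}\n{text}".encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]
```

Hashing the text keeps keys at a fixed length regardless of query size, which also suits Redis if you later replace the dict.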
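Monitor Weaviate's Prometheus-compatible metrics for latency percentiles, memory usage, and index health. Metrics are exposed when PROMETHEUS_MONITORING_ENABLED is set to true and are served by default at /metrics on port 2112. Set alerts for p99 latency exceeding 50ms and memory utilization above 80%.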
What This Means for Developers and Businesses
Semantic search is no longer experimental. The combination of affordable OpenAI embeddings and purpose-built vector databases like Weaviate makes it accessible to teams of any size. Compared to Elasticsearch's dense vector support or Pinecone's managed offering, the OpenAI-Weaviate stack stands out for its flexibility and transparent pricing.
E-commerce platforms using this approach report 25-35% improvements in search conversion rates. Internal knowledge bases see support ticket volumes drop as employees find answers faster. Content platforms surface more relevant recommendations, increasing engagement metrics.
Looking Ahead: The Evolving Search Stack
OpenAI continues to improve embedding model quality while reducing costs — the jump from ada-002 to text-embedding-3-small represented an 80% price reduction with better accuracy. Expect further improvements in 2025 as competition from Cohere, Google, and open-source models like Nomic Embed intensifies.
Weaviate's roadmap includes GPU-accelerated indexing and native support for multi-modal embeddings, enabling search across text, images, and audio simultaneously. Recent releases already support vector quantization; scalar quantization, for example, stores each dimension in 8 bits instead of 32, cutting vector memory by roughly 75% with minimal accuracy loss.
For developers starting today, this OpenAI-Weaviate stack provides a solid foundation that scales from prototype to production. The total cost for a modest deployment — 500,000 documents, 10,000 daily queries — runs under $50 per month including both embedding generation and Weaviate hosting.
📌 Source: GogoAI News (www.gogoai.xin)
🔗 Original: https://www.gogoai.xin/article/build-semantic-search-with-openai-and-weaviate