Important things

  • HTTP 302 response with a Location header for URL redirection (GET and HEAD); 307 for temporary redirection that preserves the request method

  • Spring Cloud Sleuth: distributed tracing in microservices

  • https://astikanand.github.io/techblogs/high-level-system-design/design-bookmyshow

  • https://www.hellointerview.com/learn/system-design/in-a-hurry/introduction

Saturday, 26 April 2025

Vector Database

 

Vector databases are specialized databases designed to store, index, and search high-dimensional vectors. They are widely used in machine learning and artificial intelligence, especially in applications like:

  • Semantic search

  • Recommendation systems

  • Image, video, or audio similarity

  • Natural language processing (e.g., embeddings from models like BERT or OpenAI's models)

🧠 What is a "vector" in this context?

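A vector is a list of numbers that represents a piece of data numerically. For instance, a word or sentence can be turned into a vector using an embedding model, such as word2vec (for words) or OpenAI’s embedding models (for longer text). Inputs with similar meaning end up with vectors that are close to each other.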

Example vector:

[0.21, -0.53, 0.88, ..., 0.05]

These vectors typically have hundreds or thousands of dimensions.
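To make "similar vectors" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three 4-dimensional vectors are made-up toy values, not output from a real embedding model, and the names `cat`, `kitten`, and `car` are purely illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values chosen for illustration only).
cat = [0.9, 0.1, 0.0, 0.3]
kitten = [0.8, 0.2, 0.1, 0.4]
car = [0.0, 0.9, 0.8, 0.1]

print(cosine_similarity(cat, kitten))  # close to 1: vectors point the same way
print(cosine_similarity(cat, car))     # much lower: vectors point elsewhere
```

A score near 1 means the two vectors point in almost the same direction. Real systems may use other measures as well, such as Euclidean distance or dot product.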


⚙️ How vector databases work

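Vector databases use Approximate Nearest Neighbor (ANN) algorithms to find similar vectors quickly. ANN trades a small amount of accuracy for much faster search than comparing the query against every stored vector. This is key when you're answering queries like: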

  • "Find the most similar document to this one"

  • "Which image looks closest to this?"
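For comparison, the exact (non-approximate) version of such a query is a brute-force scan over every stored vector. The sketch below shows that baseline, which ANN indexes try to approximate at lower cost. The document ids and 3-dimensional vectors are hypothetical toy data.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, items, k=1):
    """Exact k-NN: score every stored vector and return the k closest ids."""
    ranked = sorted(items, key=lambda item_id: euclidean(query, items[item_id]))
    return ranked[:k]

# Hypothetical stored vectors keyed by document id.
documents = {
    "doc_a": [0.1, 0.9, 0.2],
    "doc_b": [0.8, 0.1, 0.1],
    "doc_c": [0.2, 0.8, 0.3],
}
query = [0.1, 0.95, 0.2]

print(nearest_neighbors(query, documents, k=2))
```

The scan touches every vector on every query, so its cost grows linearly with the size of the collection. That is the cost ANN indexes are designed to avoid.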

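Popular ANN algorithms and index techniques:

  • HNSW (Hierarchical Navigable Small World) – a layered graph that is traversed greedily toward the query

  • IVF (Inverted File Index) – vectors are grouped into clusters, and only the clusters nearest the query are scanned

  • PQ (Product Quantization) – vectors are compressed into short codes to save memory and speed up distance computation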
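As an illustration of the IVF idea, the following is a simplified sketch, not how any particular database implements it. The class `IVFIndex`, its parameters `n_lists` and `n_probe`, and the evenly spaced centroid seeding are assumptions made to keep the example short and deterministic. It assumes at least `n_lists` distinct input vectors.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train_centroids(vectors, k, iterations=10):
    """Tiny k-means. Seeds with evenly spaced input vectors (a simplification)."""
    step = len(vectors) // k
    centroids = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: dist(v, centroids[c]))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

class IVFIndex:
    def __init__(self, vectors, n_lists=2):
        self.centroids = train_centroids(vectors, n_lists)
        self.lists = [[] for _ in self.centroids]  # one inverted list per centroid
        for idx, v in enumerate(vectors):
            self.lists[self._closest_lists(v, 1)[0]].append((idx, v))

    def _closest_lists(self, v, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda c: dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k=1, n_probe=1):
        """Scan only the n_probe lists whose centroids are closest to the query."""
        candidates = []
        for list_id in self._closest_lists(query, n_probe):
            candidates.extend(self.lists[list_id])
        candidates.sort(key=lambda pair: dist(query, pair[1]))
        return [idx for idx, _ in candidates[:k]]

vectors = [
    [0.0, 0.1], [0.1, 0.0], [0.2, 0.1],   # group near the origin
    [5.0, 5.1], [5.1, 4.9], [4.9, 5.0],   # group near (5, 5)
]
index = IVFIndex(vectors, n_lists=2)
print(index.search([5.0, 5.2], k=2, n_probe=1))  # indices of the 2 closest vectors
```

With `n_probe=1` the search scans only one of the two lists, which is where the speedup comes from. It is also why the result is approximate: a true neighbor that landed in an unprobed list is missed. Raising `n_probe` improves recall at the cost of scanning more vectors.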


🔥 Popular Vector Databases

  • Pinecone – Fully managed, scalable, simple to use

  • Weaviate – Open-source with built-in ML features

  • Milvus – High-performance and scalable, also open-source

  • FAISS (by Meta) – Library for similarity search (not a full database, but often used with others)

  • Qdrant – Open-source, supports filtering and metadata

  • Chroma – Lightweight and often used for LLM apps