Vector databases are specialized databases designed to store, index, and search high-dimensional vectors. They are widely used in machine learning and AI applications such as:
- Semantic search
- Recommendation systems
- Image, video, or audio similarity
- Natural language processing (e.g., embeddings from models like BERT or OpenAI's models)
🧠 What is a "vector" in this context?
A vector is a list of numbers that represents a piece of data. For instance, a sentence can be turned into a vector using an embedding model such as one of OpenAI's embedding models, while word2vec does the same for individual words.
Example vector:
In practice, these vectors often have hundreds or thousands of dimensions.
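To make the idea of "similar vectors" concrete, here is a minimal sketch that compares toy vectors with cosine similarity, one common similarity measure. The 4-dimensional values and the names `cat`, `kitten`, and `car` are made up for illustration; real embeddings come from a model and are far longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional "embeddings" (illustrative values, not model output)
cat = [0.9, 0.1, 0.0, 0.3]
kitten = [0.8, 0.2, 0.1, 0.4]
car = [0.0, 0.9, 0.8, 0.1]

sim_cat_kitten = cosine_similarity(cat, kitten)
sim_cat_car = cosine_similarity(cat, car)

# Vectors pointing in similar directions score higher
print(sim_cat_kitten > sim_cat_car)
```

With these toy values, `cat` scores much closer to `kitten` than to `car`, which is the property a similarity search relies on.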
⚙️ How vector databases work
They use Approximate Nearest Neighbor (ANN) algorithms, which trade a small amount of accuracy for much faster lookups than comparing the query against every stored vector. This is key for queries like:
- "Find the most similar document to this one"
- "Which image looks closest to this?"
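For reference, the exact (non-approximate) version of such a query can be sketched as an exhaustive scan. This is the baseline that ANN methods try to speed up, not how a vector database works internally. The function name `exact_nearest`, the document ids, and the 2-dimensional vectors are assumptions made for this example.

```python
import heapq
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_nearest(query, vectors, k=1):
    """Return the ids of the k stored vectors closest to `query`.

    Scans every stored vector, so the cost grows linearly with collection size.
    """
    return heapq.nsmallest(k, vectors, key=lambda vid: euclidean(query, vectors[vid]))

# Hypothetical document vectors (illustrative values only)
documents = {
    "doc_a": [0.1, 0.9],
    "doc_b": [0.8, 0.2],
    "doc_c": [0.15, 0.85],
}

top2 = exact_nearest([0.12, 0.88], documents, k=2)
print(top2)
```

An exhaustive scan is fine for a few thousand vectors. At millions of vectors it becomes the bottleneck, which is where approximate indexes come in.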
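Popular ANN algorithms and index techniques:
- HNSW (Hierarchical Navigable Small World): a layered graph index, searched by hopping between neighboring vectors
- IVF (Inverted File Index): groups vectors into clusters and searches only the clusters nearest to the query
- PQ (Product Quantization): compresses vectors into short codes to reduce memory use and speed up distance computation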
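As a rough illustration of the IVF idea, here is a toy sketch: each vector is filed under its nearest centroid, and a search only looks inside the `nprobe` closest cells. The class name `ToyIVF` and the hand-picked centroids are assumptions for this example. Real implementations learn the centroids (typically with k-means) and are far more optimized.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVF:
    """Toy inverted-file index with fixed, hand-picked centroids."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one bucket per centroid

    def _ranked_cells(self, vec):
        # Cell indices ordered from nearest centroid to farthest
        return sorted(range(len(self.centroids)),
                      key=lambda i: dist(vec, self.centroids[i]))

    def add(self, vec_id, vec):
        # File the vector under its nearest centroid
        self.lists[self._ranked_cells(vec)[0]].append((vec_id, vec))

    def search(self, query, k=1, nprobe=1):
        # Only scan the nprobe cells closest to the query
        candidates = []
        for cell in self._ranked_cells(query)[:nprobe]:
            candidates.extend(self.lists[cell])
        candidates.sort(key=lambda item: dist(query, item[1]))
        return [vec_id for vec_id, _ in candidates[:k]]

index = ToyIVF(centroids=[[0.0, 0.0], [1.0, 1.0]])
index.add("a", [0.1, 0.2])
index.add("b", [0.4, 0.4])
index.add("c", [0.9, 0.8])
index.add("d", [0.55, 0.55])

query = [0.49, 0.49]
approx = index.search(query, k=1, nprobe=1)  # scans one cell only
exact = index.search(query, k=1, nprobe=2)   # scans every cell
print(approx, exact)
```

Here the query falls just inside the first cell, so `nprobe=1` returns `b` even though `d`, filed in the other cell, is slightly closer. Raising `nprobe` recovers the true nearest neighbor at the cost of scanning more vectors, which is the speed/accuracy trade-off behind "approximate" search.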
🔥 Popular Vector Databases
- Pinecone – Fully managed, scalable, simple to use
- Weaviate – Open-source with built-in ML features
- Milvus – High-performance and scalable, also open-source
- FAISS (by Meta) – Library for similarity search (not a full database, but often used with others)
- Qdrant – Open-source, supports filtering and metadata
- Chroma – Lightweight and often used for LLM apps