Algorithms, Data Structure, System Design Problems: April 2025

Saturday, 26 April 2025

Vector Database

Vector databases are specialized databases designed to store, manage, and search high-dimensional vectors—often used in machine learning, artificial intelligence, and especially in applications like:

Semantic search
Recommendation systems
Image, video, or audio similarity
Natural language processing (e.g., embeddings from models like BERT or OpenAI's models)

🧠 What is a "vector" in this context?

A vector is basically a list of numbers that represents data in a numerical format. For instance, a sentence can be turned into a vector using an embedding model, like OpenAI’s embedding models or word2vec.

Example vector:

[0.21, -0.53, 0.88, ..., 0.05]

These vectors are often hundreds or thousands of dimensions long.

⚙️ How vector databases work

They use Approximate Nearest Neighbor (ANN) algorithms to find similar vectors quickly. This is key when you're doing things like:

"Find the most similar document to this one"
"Which image looks closest to this?"

Popular ANN algorithms:

HNSW (Hierarchical Navigable Small World)
IVF (Inverted File Index)
PQ (Product Quantization)

🔥 Popular Vector Databases

Pinecone – Fully managed, scalable, simple to use
Weaviate – Open-source with built-in ML features
Milvus – High-performance and scalable, also open-source
FAISS (by Meta) – Library for similarity search (not a full database, but often used with others)
Qdrant – Open-source, supports filtering and metadata
Chroma – Lightweight and often used for LLM apps

Important things

Saturday, 26 April 2025

Vector Database

🧠 What is a "vector" in this context?

⚙️ How vector databases work

🔥 Popular Vector Databases

Database