how do computers understand meaning?
embeddings explained 🧮
we intuitively know that “dog” and “puppy” have similar meanings, while “dog” and “car” don’t. but to a computer, words are just text with no meaning attached.
but for modern llms and a lot of text-based applications (autofill/autocorrect, etc.), it’s necessary for computers to compare and reason about language.
this is done through embeddings: a way to turn text (or images) into numbers that capture meaning. more specifically, each piece of text is mapped to a list of numbers (a vector), and pieces of text with similar meanings get similar numbers.
let’s say every word is a point in space. similar words are closer together, while different words are farther apart in this space. so, something like “cat” would be near “kitten” but far away from “chair.”
that location in space is the embedding.
“cat” → [0.21, -0.44, 0.89 ...]
“kitten” → [0.18, -0.42, 0.87 ...]
“chair” → [-0.67, 0.11, 0.07 ...]
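“closeness” here is something you can actually compute. one common choice is cosine similarity, which scores two vectors near 1 when they point the same way and near 0 (or below) when they don’t. a minimal sketch in plain python, using the toy 3-number vectors above (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # dot product of the two vectors, scaled by their lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# illustrative toy embeddings from above, not output of any real model
cat    = [0.21, -0.44, 0.89]
kitten = [0.18, -0.42, 0.87]
chair  = [-0.67, 0.11, 0.07]

print(cosine_similarity(cat, kitten))  # close to 1 → similar meaning
print(cosine_similarity(cat, chair))   # low/negative → unrelated
```

so “similar meaning” just becomes “high similarity score” — no understanding required, only arithmetic.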
a model learns these numbers by observing how words appear in context across massive amounts of training data.
this is used in all sorts of ai tasks like semantic search (not just keyword matching), chatbots, and recommendations.
for example, in semantic search, the user’s query is converted into an embedding, and so is each document. the query’s embedding is then compared against the documents’ embeddings by distance, and the closest matches are returned.
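that whole pipeline fits in a few lines. a sketch, where `embed` is a stand-in for a real embedding model (real systems would call a model api here; the texts and vectors below are made up for illustration):

```python
import math

def cosine(a, b):
    # similarity between two vectors: 1 = same direction, ≤ 0 = unrelated
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def embed(text):
    # hypothetical stand-in for an embedding model: a real one would
    # compute a vector; this toy version just looks up hand-made ones
    toy = {
        "how do i adopt a kitten":    [0.20, -0.43, 0.88],
        "guide to caring for cats":   [0.22, -0.40, 0.90],
        "assembling your new chair":  [-0.65, 0.12, 0.08],
    }
    return toy[text]

documents = ["guide to caring for cats", "assembling your new chair"]
query = "how do i adopt a kitten"

# embed the query and every document, then rank documents by closeness
query_vec = embed(query)
ranked = sorted(documents, key=lambda d: cosine(embed(d), query_vec), reverse=True)
print(ranked[0])  # the cat guide ranks first, even with zero shared keywords
```

note the query and the top result share no words at all — that’s the difference from keyword matching.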
so technically, embeddings don’t exactly understand meaning. instead, they encode patterns of similarity based on data.
today’s drops 🔎
check out the generator residency for ai safety ($6k/mo stipend)
technology internship @ accenture
20 days left of this $10,000 ai hackathon

