What are vector embeddings?
Vector embeddings are numeric representations of text that make it possible to relate semantic similarity between chunks of text to the proximity of the embeddings in vector space.
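A minimal sketch of the idea, assuming the sentence-transformers library and the all-MiniLM-L6-v2 model (my choices for illustration; any embedding model works the same way):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat on the mat.",
    "A feline rested on the rug.",
    "Quarterly revenue grew by 12%.",
]
embeddings = model.encode(sentences)  # one dense vector per sentence

# Semantically similar sentences end up near each other in vector space,
# so their cosine similarity is higher.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: similar meaning
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated
```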
What is semantic similarity in the context of machine learning?
Semantics (noun): “the branch of linguistics and logic concerned with meaning.”1
“Semantics is the study of linguistic meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts.”2
From the Pinecone article Vector Embeddings: representing text as vectors “makes it possible to translate semantic similarity, as perceived by humans to proximity in vector space” (emphasis mine). But the article links to the Wikipedia definition of semantic similarity: “a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content….”3 (emphasis mine). The point I’m making is that it’s not clear whether the terms meaning and semantic similarity, as used in the context of machine learning, refer to a metric or to the human sense of the meaning of text.
There’s something interesting going on with the definitions. See notes/The Meaning of Meaning (book).
I’ll stick with the normal human understanding of semantic similarity — how similar the meaning (in the non-academic sense) is between items of text, audio, video, etc.
Vector embeddings allow for comparison of the meaning of objects
Vector embeddings are numeric representations of objects, with objects being things along the lines of text, audio, video, etc. The relatedness of the meaning of objects is inferred from how close together their embeddings are in vector space.
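“Close together” is usually measured with cosine similarity (or a distance such as Euclidean distance). A from-scratch sketch with NumPy, using made-up 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(theta) = (a . b) / (|a| |b|): 1 means same direction, 0 orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for embeddings of three objects.
cat   = np.array([0.90, 0.10, 0.00])
kitty = np.array([0.85, 0.15, 0.05])
stock = np.array([0.00, 0.20, 0.95])

print(cosine_similarity(cat, kitty))  # near 1: related meanings
print(cosine_similarity(cat, stock))  # near 0: unrelated meanings
```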
What tasks are vector embeddings used for?
Vector embeddings are used for search, clustering, recommendation, classification….
Recommendation and semantic search may be the same task. In the context of search, the system recommends results that are semantically similar (in the mathematical sense) to the search query.
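A sketch of semantic search in that spirit, again assuming sentence-transformers (the corpus and query are made up): embed the corpus once, embed the query, and return the nearest neighbors.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "How to reset a forgotten password",
    "Best hiking trails near Seattle",
    "Troubleshooting login failures",
]
corpus_embeddings = model.encode(corpus)

query = "I can't sign in to my account"
query_embedding = model.encode(query)

# Rank corpus items by cosine similarity to the query; the top hits
# are the "recommended" results.
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
for score, text in sorted(zip(scores.tolist(), corpus), reverse=True):
    print(f"{score:.3f}  {text}")
```

In a real system the corpus embeddings would live in a vector database (Pinecone, pgvector, etc.) rather than in memory.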
How are vector embeddings created?
Feature engineering
“in medical imaging, we use medical expertise to quantify a set of features such as shape, color, and regions in an image that capture the semantics.”4 Feature engineering requires domain knowledge and is difficult to scale.
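A toy illustration of the hand-crafted approach (my own example, not from the article): a human picks the features and stacks them into a vector.

```python
import numpy as np

def engineered_features(image: np.ndarray) -> np.ndarray:
    """Hand-crafted features for an RGB image of shape (H, W, 3).

    Every feature here was chosen by a person, which is why this
    approach needs domain expertise and scales poorly.
    """
    mean_rgb = image.mean(axis=(0, 1))        # average color (3 values)
    brightness = image.mean()                 # overall brightness
    contrast = image.std()                    # crude contrast measure
    aspect = image.shape[1] / image.shape[0]  # width / height
    return np.concatenate([mean_rgb, [brightness, contrast, aspect]])

image = np.random.rand(64, 128, 3)  # stand-in for a real medical image
print(engineered_features(image))   # a 6-dimensional, hand-built "embedding"
```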
Train models to translate objects to vectors
See notes/Differences between GPT and embedding models.
Images can be embedded using CNNs. Audio can be transformed into vectors by applying image-embedding techniques to visual representations of audio frequencies (spectrograms).5 I wonder how much has changed in this area since 2023.
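A sketch of the audio-as-image idea, assuming the librosa library and a synthetic waveform: convert the signal to a mel spectrogram, a 2-D array that an image-embedding model (e.g. a CNN) can consume.

```python
import numpy as np
import librosa

# Synthetic stand-in for real audio: one second of a 440 Hz sine wave.
sr = 22050  # sample rate in Hz
t = np.linspace(0, 1, sr, endpoint=False)
waveform = np.sin(2 * np.pi * 440 * t)

# Mel spectrogram: energy per frequency band over time, i.e. an "image".
mel = librosa.feature.melspectrogram(y=waveform, sr=sr)
mel_db = librosa.power_to_db(mel)  # log scale, as typically fed to CNNs

print(mel_db.shape)  # (n_mels, n_frames) -- image-like input for a CNN
```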
Resources
Tripathi, Rajat (Pinecone). “What are Vector Embeddings.” June 30, 2023. https://www.pinecone.io/learn/vector-embeddings/.
Wikipedia Contributors. “Word embedding.” Last edit: November 19, 2025. https://en.wikipedia.org/wiki/Word_embedding.
Wikipedia Contributors. “Semantics.” Last edit: January 7, 2026. https://en.wikipedia.org/wiki/Semantics.
1. OxfordLanguages, “semantics”. ↩︎
2. Wikipedia Contributors, “Semantics,” last edit: January 7, 2026, https://en.wikipedia.org/wiki/Semantics. ↩︎
3. Wikipedia Contributors, “Semantic similarity,” last edit: October 3, 2025, https://en.wikipedia.org/wiki/Semantic_similarity. ↩︎
4. Rajat Tripathi, “What are Vector Embeddings,” June 30, 2023, https://www.pinecone.io/learn/vector-embeddings/. ↩︎
5. Rajat Tripathi, “What are Vector Embeddings,” June 30, 2023, https://www.pinecone.io/learn/vector-embeddings/. ↩︎