Vector embeddings: Difference between revisions

==Understanding Vector Embeddings==
===Dog and Puppy Example===
In text data, words with similar meanings, such as "dog" and "puppy", need representations that capture their [[semantic similarity]]. [[Vector representation]]s achieve this by transforming data objects into fixed-length arrays of real numbers, typically with hundreds to thousands of elements. These arrays are generated by machine learning models through a process called [[vectorization]].



<poem style="border: 1px solid; padding: 1rem">
dog = [1.5, -0.4, 7.2, 19.6, 3.1, ..., 20.2]
puppy = [1.5, -0.4, 7.2, 19.5, 3.2, ..., 20.8]
</poem>


These vectors are highly similar, whereas the vectors for words like "banjo" or "comedy" would not be similar to either of them. In this way, vector embeddings capture the semantic similarity of words. What each number in a vector means depends on the machine learning model that generated it, and is not always interpretable in terms of human notions of language and meaning.


===King and Queen Example===
Vector-based representations of meaning have gained attention because mathematical operations on word vectors can reveal semantic relationships. A famous example is:


This result suggests that the difference between "king" and "man" represents some sort of "royalty", which applies analogously to "queen" minus "woman". Various concepts, such as "woman", "girl", and "boy", can be vectorized into arrays of numbers, whose individual positions are often referred to as dimensions. These arrays can be visualized and correlated with familiar words, giving insight into their meaning.
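
The arithmetic behind the commonly cited analogy king - man + woman ≈ queen can be sketched as follows. This is a minimal illustration, not output from a real model: the three-dimensional <code>toy_vectors</code>, the extra word "apple", and the helper names <code>euclidean</code> and <code>analogy</code> are all assumptions made for this example. Real embeddings have far more dimensions, and their axes are rarely this interpretable.

```python
import math

# Hypothetical 3-dimensional toy embeddings (illustrative values only).
# The axes loosely stand for: royalty, masculinity, femininity.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vectors[a] - vectors[b] + vectors[c].

    The three query words are excluded from the candidates.
    """
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: euclidean(vectors[w], target))

result = analogy("king", "man", "woman", toy_vectors)
print(result)
```

With these toy values, subtracting "man" from "king" leaves mostly the royalty component, and adding "woman" moves the result next to "queen".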


===Example of 5 Words===
* Objects (data): words such as kitty, puppy, orange, blueberry, structure, motorbike
* Search term: fruit

A basic set of vector embeddings (limited to 5 dimensions) for the objects and the search term might appear as follows:
{| class="wikitable"
! Word !! Vector Embedding
|-
| kitty || [1.5, -0.4, 7.2, 19.6, 20.2]
|-
| puppy || [1.7, -0.3, 6.9, 19.1, 21.1]
|-
| orange || [-5.2, 3.1, 0.2, 8.1, 3.5]
|-
| blueberry || [-4.9, 3.6, 0.9, 7.8, 3.6]
|-
| structure || [60.1, -60.3, 10, -12.3, 9.2]
|-
| motorbike || [81.6, -72.1, 16, -20.2, 102]
|-
| fruit || [-5.1, 2.9, 0.8, 7.9, 3.1]
|}
Comparing the 5 components of each vector shows that kitty and puppy are much closer to each other than puppy and orange are, even without computing the distances. Similarly, fruit is significantly closer to orange and blueberry than to the other words, making them the top results for the "fruit" search.
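
The search described above can be checked numerically. The sketch below uses the vectors from the table and ranks the objects by Euclidean distance to the search term. The choice of Euclidean distance and the names <code>embeddings</code>, <code>query</code>, and <code>ranked</code> are assumptions for illustration, since the text does not specify a distance measure.

```python
import math

# Vectors copied from the table above.
embeddings = {
    "kitty":     [1.5, -0.4, 7.2, 19.6, 20.2],
    "puppy":     [1.7, -0.3, 6.9, 19.1, 21.1],
    "orange":    [-5.2, 3.1, 0.2, 8.1, 3.5],
    "blueberry": [-4.9, 3.6, 0.9, 7.8, 3.6],
    "structure": [60.1, -60.3, 10, -12.3, 9.2],
    "motorbike": [81.6, -72.1, 16, -20.2, 102],
}
query = [-5.1, 2.9, 0.8, 7.9, 3.1]  # embedding of the search term "fruit"

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Sort the objects from nearest to farthest relative to the query.
ranked = sorted(embeddings, key=lambda word: euclidean(embeddings[word], query))

for word in ranked:
    print(f"{word:10s} {euclidean(embeddings[word], query):8.2f}")
```

The two nearest objects are orange and blueberry, each less than 1 unit from the query, while every other object is more than 20 units away.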
===More Than Just Words===
Vector embeddings can represent more than just word meanings. They can be generated from virtually any data object, including [[text]], [[images]], [[audio]], [[time series data]], [[3D models]], [[video]], and [[molecules]]. Embeddings are constructed so that two objects with similar semantics have vectors that are "close" to each other in vector space, that is, with a "small" distance between them.
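
What counts as "close" depends on the measure used. The sketch below contrasts two common choices, Euclidean distance and cosine similarity, on hypothetical vectors <code>a</code>, <code>b</code>, and <code>c</code> chosen for illustration. Which measure is appropriate depends on the model that produced the embeddings, and the text above does not prescribe one.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors (illustrative values only).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length
c = [-3.0, 0.0, 1.0]  # perpendicular to a (dot product is 0)

print(cosine_similarity(a, b), euclidean_distance(a, b))
print(cosine_similarity(a, c), euclidean_distance(a, c))
```

Here <code>a</code> and <code>b</code> point in the same direction, so their cosine similarity is essentially 1 even though their Euclidean distance is not zero.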




===Example: Image Embedding with a Convolutional Neural Network===
[[File:image embedding with cnn1.png|400px|right]]
In this example, raw images are represented as greyscale pixels, that is, as a matrix of integer values ranging from 0 to 255, where 0 signifies black and 255 represents white. The matrix values define a vector embedding whose first coordinate is the matrix's upper-left cell and whose last coordinate is the lower-right cell.


While such embeddings preserve the semantic information of a pixel's neighborhood in an image, they are highly sensitive to transformations such as [[shifts]], [[scaling]], [[cropping]], and other [[image manipulation]] operations. Consequently, they are often used as raw inputs for learning more robust embeddings.
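
The raw pixel embedding and its sensitivity to shifts can be illustrated with a small sketch. The 4x4 <code>image</code> and the helpers <code>flatten</code> and <code>shift_right</code> are hypothetical and exist only for this example. Real images are much larger, but the effect is the same.

```python
# Hypothetical 4x4 greyscale image: a white vertical bar on a black background.
image = [
    [0, 255, 0, 0],
    [0, 255, 0, 0],
    [0, 255, 0, 0],
    [0, 255, 0, 0],
]

def flatten(matrix):
    """Row-major flattening: upper-left cell first, lower-right cell last."""
    return [pixel for row in matrix for pixel in row]

def shift_right(matrix, fill=0):
    """Shift every row one pixel to the right, padding the left edge with `fill`."""
    return [[fill] + row[:-1] for row in matrix]

original = flatten(image)
shifted = flatten(shift_right(image))

# Count the coordinates that changed after a one-pixel shift.
differing = sum(1 for p, q in zip(original, shifted) if p != q)
print(len(original), differing)
```

A one-pixel shift leaves the picture visually almost unchanged, yet half of the 16 coordinates of the raw embedding differ.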


A Convolutional Neural Network (CNN or [[ConvNet]]) is a class of [[deep learning architecture]]s typically applied to visual data, transforming images into embeddings. CNNs process their input hierarchically through small local sub-inputs known as receptive fields. Each neuron in a network layer processes a specific receptive field from the previous layer. Each layer either applies a convolution to the receptive field or reduces the input size through a process called subsampling.


In a typical CNN structure, the receptive fields are sub-squares in each layer, and each one serves as input to a single [[neuron]] in the following layer. Subsampling operations reduce layer size, while convolution operations can expand it, typically by producing additional feature maps. The resulting vector embedding is obtained through a fully connected layer.
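
The three stages described above (convolution over receptive fields, subsampling, and a fully connected layer) can be sketched in plain Python. This is a toy illustration under stated assumptions, not a trained network: the 5x5 <code>image</code>, the fixed <code>kernel</code>, the <code>weights</code>, and all function names are hypothetical, subsampling is implemented as max pooling, and a real CNN would learn its kernels and weights from data and use many feature maps per layer.

```python
def convolve2d(image, kernel):
    """Slide the kernel over every receptive field (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Subsampling: keep the maximum of each non-overlapping size x size block."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

def fully_connected(inputs, weights):
    """One output per weight row: the weighted sum of all inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 9, 9, 9] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # hypothetical kernel that responds to vertical edges

feature_map = convolve2d(image, kernel)          # 4x4
pooled = max_pool(feature_map)                   # 2x2
flat = [value for row in pooled for value in row]

weights = [[0.5, 0, 0.5, 0], [0, 1, 0, 1]]       # hypothetical, untrained weights
embedding = fully_connected(flat, weights)
print(feature_map, pooled, embedding)
```

In this single-channel sketch each stage shrinks the spatial size (5x5 to 4x4 to 2x2), and the final fully connected layer turns the pooled values into a 2-dimensional embedding.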


Even when embeddings are not used directly by an application, many popular machine learning models and methods rely on them internally. For instance, in [[encoder-decoder architectures]], the embeddings generated by the encoder contain the information the decoder needs to produce a result. This architecture is widely used in applications such as [[machine translation]] and [[caption generation]].
===Products===
[[Vector database]]
==Explain {{PAGENAME}} Like I'm 5 (ELI5)==
Imagine you have a box of different toys like cars, dolls, and balls. Now, we want to sort these toys based on how similar they are. We can use something called "vector embedding" to help us with this. Vector embedding is like giving each toy a secret code made of numbers. Toys that are similar will have secret codes that are very close to each other, and toys that are not similar will have secret codes that are very different.
For example, let's say we have a red car, a blue car, and a doll. We can give them secret codes like this:
<poem style="border: 1px solid; padding: 1rem">
Red car: [1, 2, 3]
Blue car: [1, 2, 4]
Doll: [5, 6, 7]
</poem>
See how the red car and the blue car have secret codes that are very close to each other, while the doll has a different secret code? That's because the cars are more similar to each other than the doll.
Vector embedding can also be used for words, pictures, sounds, and many other things. It helps computers understand and sort these things by how similar they are, just like we sorted the toys!


[[Category:Terms]] [[Category:Artificial intelligence terms]]