Jump to content

Vector database: Difference between revisions

no edit summary
No edit summary
No edit summary
Line 8: Line 8:
In a [[relational database]], data is organized in rows and columns, while in a [[document database]], it is organized in documents and collections. In contrast, a vector database stores arrays of numbers that are clustered based on [[similarity]]. These databases can be queried with [[ultra-low latency]], making them ideal for AI-driven applications.
In a [[relational database]], data is organized in rows and columns, while in a [[document database]], it is organized in documents and collections. In contrast, a vector database stores arrays of numbers that are clustered based on [[similarity]]. These databases can be queried with [[ultra-low latency]], making them ideal for AI-driven applications.


==Vector Database Solutions==
==Vector Database Products==
Several vector databases have emerged to cater to the growing demand for AI applications. Some of the popular native vector databases include open-source options like [[Weaviate]] and [[Milvus]], both written in Go. [[Pinecone]] is another popular vector database, although it is not open source. [[Chroma]], based on Clickhouse, is an open-source project with a growing following. Relational databases like Postgres have tools like pgVector, and Redis has first-class vector support to accommodate this type of functionality.
Several vector databases have emerged to cater to the growing demand for AI applications. Some of the popular native vector databases include open-source options like [[Weaviate]] and [[Milvus]], both written in Go. [[Pinecone]] is another popular vector database, although it is not open source. [[Chroma]], based on Clickhouse, is an open-source project with a growing following. Relational databases like [[Postgres]] have tools like [[pgVector]], and [[Redis]] has first-class vector support to accommodate this type of functionality.


==Using Vector Databases with Large Language Models==
==Using Vector Databases with Large Language Models==
One of the primary reasons for the increasing popularity of vector databases is their ability to extend large language models (LLMs) with long-term memory. By providing a general-purpose model, such as OpenAI's GPT-4, Meta's LLMA, or Google's Lambda, users can store their own data in a vector database. When prompted, the database can query relevant documents to update the context, customizing the final response and providing the AI with long-term memory.
One of the primary reasons for the increasing popularity of vector databases is their ability to extend large language models (LLMs) with long-term memory. By providing a general-purpose model, such as [[OpenAI]]'s [[GPT-4]], [[Meta]]'s [[LLaMA]], or [[Google]]'s [[LaMDA]], users can store their own data in a vector database. When [[prompt]]ed, the database can query relevant documents to update the context, customizing the final response and providing the AI with long-term memory.


In addition, vector databases can integrate with tools like LangChain, which combine multiple LLMs together for more advanced applications.
In addition, vector databases can integrate with tools like [[LangChain]], which combine multiple LLMs together for more advanced applications.


==Example Code==
==Example Code==
To demonstrate the usage of a vector database, the following example shows how to use Chroma with JavaScript. First, create the client and define an embedding function. In this case, the OpenAI API is used to update the embeddings whenever a new data point is added. Each data point is a document with an ID and some text. The database can be queried by passing a string of text, and the result includes the data along with an array of distances, where smaller numbers indicate higher degrees of similarity.
To demonstrate the usage of a vector database, the following example shows how to use [[Chroma]] with [[JavaScript]]. First, create the client and define an [[embedding function]]. In this case, the [[OpenAI API]] is used to update the embeddings whenever a new data point is added. Each data point is a document with an ID and some text. The database can be queried by passing a string of text, and the result includes the data along with an array of distances, where smaller numbers indicate higher degrees of similarity.
 
==Related Developments==
The growing interest in vector databases can be seen in the top trending repositories on GitHub, with many of them focusing on creating artificial general intelligence. Examples include Microsoft's Jarvis, AutoGPT, and BabyAGI, which are tools that use vector databases and LLMs to prompt themselves and expand their capabilities.
370

edits