Tokens

*[https://platform.openai.com/tokenizer OpenAI's interactive Tokenizer tool]
*[https://github.com/openai/tiktoken Tiktoken], a fast BPE tokenizer specifically for OpenAI models (see the usage sketch after this list)
*[[Transformers]] package for Python
*[https://www.npmjs.com/package/gpt-3-encoder gpt-3-encoder package for Node.js]
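Tiktoken, listed above, can be called from Python to count and inspect tokens before sending text to a model. The snippet below is a minimal sketch: it assumes the <code>tiktoken</code> package is installed and uses the <code>cl100k_base</code> encoding (used by recent OpenAI chat models); the same calls work with the older GPT-3 encodings.

<syntaxhighlight lang="python">
# Minimal sketch: counting and inspecting tokens with tiktoken.
# Assumes the package is installed (pip install tiktoken).
import tiktoken

# cl100k_base is the encoding used by recent OpenAI chat models;
# older GPT-3 models use r50k_base / p50k_base instead.
encoding = tiktoken.get_encoding("cl100k_base")

text = "Tokens are the chunks of text a language model actually reads."
token_ids = encoding.encode(text)

print(token_ids)                    # the integer IDs the model sees
print(len(token_ids))               # how many tokens the text costs
print(encoding.decode(token_ids))   # decoding round-trips to the original text
</syntaxhighlight>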
*Uppercase at the beginning of a sentence: "Red" (token: "7738")
*The more likely or common a token is, the lower the token number assigned to it. For example, the period maps to the same low token ID ("13") in all three sentences because it is extremely common throughout the corpus data (see the sketch after the gallery below).
<gallery mode=packed>
File:tokens3.png
File:tokens2.png
File:tokens1.png
</gallery>
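This behaviour can be checked directly with a tokenizer library. The sketch below is illustrative only: it uses Tiktoken's <code>r50k_base</code> (GPT-3 era) encoding and three made-up sentences rather than the examples pictured above, and the exact ID numbers it prints depend on the encoding chosen, so they may differ from the values quoted from the OpenAI Tokenizer tool.

<syntaxhighlight lang="python">
# Illustrative sketch: how casing and position change which token IDs appear.
# Exact IDs depend on the encoding; the values quoted above came from the
# OpenAI Tokenizer tool and may not match other encodings.
import tiktoken

encoding = tiktoken.get_encoding("r50k_base")  # GPT-3 era BPE vocabulary

sentences = [
    "Red is my favorite color.",
    "My favorite color is red.",
    "MY FAVORITE COLOR IS RED.",
]

for sentence in sentences:
    ids = encoding.encode(sentence)
    # Decode each ID individually to see which piece of text it covers,
    # e.g. "Red", " red" and " RED" map to different vocabulary entries.
    pieces = [encoding.decode([i]) for i in ids]
    print(list(zip(pieces, ids)))
</syntaxhighlight>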


==Prompt Design and Token Knowledge==