SentencePieceTokenizer.GetIndexByTokenCountFromEnd Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
| GetIndexByTokenCountFromEnd(ReadOnlySpan<Char>, Boolean, Boolean, Int32, Boolean, String, Int32) | Finds the start index of the longest trailing portion of the text that can be encoded without exceeding the token limit. |
| GetIndexByTokenCountFromEnd(String, Boolean, Boolean, Int32, Boolean, String, Int32) | Finds the start index of the longest trailing portion of the text that can be encoded without exceeding the token limit. |
GetIndexByTokenCountFromEnd(ReadOnlySpan<Char>, Boolean, Boolean, Int32, Boolean, String, Int32)
- Source:
- SentencePieceTokenizer.cs
Finds the start index of the longest trailing portion of the text that can be encoded without exceeding the token limit.
public int GetIndexByTokenCountFromEnd(ReadOnlySpan<char> text, bool addBeginningOfSentence, bool addEndOfSentence, int maxTokenCount, bool considerNormalization, out string? normalizedText, out int tokenCount);
override this.GetIndexByTokenCountFromEnd : ReadOnlySpan<char> * bool * bool * int * bool * string * int -> int
Public Function GetIndexByTokenCountFromEnd (text As ReadOnlySpan(Of Char), addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, maxTokenCount As Integer, considerNormalization As Boolean, ByRef normalizedText As String, ByRef tokenCount As Integer) As Integer
Parameters
- text
- ReadOnlySpan<Char>
The text to encode.
- addBeginningOfSentence
- Boolean
Indicates whether to emit the beginning-of-sentence token during encoding.
- addEndOfSentence
- Boolean
Indicates whether to emit the end-of-sentence token during encoding.
- maxTokenCount
- Int32
The maximum number of tokens the encoded portion of the text may contain.
- considerNormalization
- Boolean
Indicates whether to apply normalization before tokenization.
- normalizedText
- String
When the tokenizer's normalization is enabled and considerNormalization is true, this is set to text in its normalized form; otherwise, it is set to null.
- tokenCount
- Int32
The number of tokens generated for the included portion of the text, which does not exceed maxTokenCount.
Returns
The start index, within the processed text, of the longest trailing portion that can be encoded without exceeding the token limit.
This is the index of the first character to include. If no tokens fit, the result is the length of the processed text (normalizedText when it is not null); conversely, if all tokens fit, the result is 0.
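The following is a minimal sketch of how this overload might be used to keep only the tail of a text that fits a token budget. It assumes an already-constructed SentencePieceTokenizer is passed in; the class name TailTruncation and the helpers KeepTail and SliceProcessed are hypothetical names introduced only for illustration. The call itself uses the signature documented above.

```csharp
#nullable enable
using System;
using Microsoft.ML.Tokenizers;

// Hypothetical helper class, not part of the library.
public static class TailTruncation
{
    // Returns the trailing part of the processed text that encodes to at most
    // maxTokenCount tokens, together with the token count reported by the tokenizer.
    public static (string Tail, int TokenCount) KeepTail(
        SentencePieceTokenizer tokenizer, ReadOnlySpan<char> text, int maxTokenCount)
    {
        int startIndex = tokenizer.GetIndexByTokenCountFromEnd(
            text,
            addBeginningOfSentence: false,
            addEndOfSentence: false,
            maxTokenCount: maxTokenCount,
            considerNormalization: true,
            normalizedText: out string? normalizedText,
            tokenCount: out int tokenCount);

        return (SliceProcessed(text, normalizedText, startIndex), tokenCount);
    }

    // The returned index refers to the processed text: the normalized form when
    // one was produced, otherwise the original input.
    public static string SliceProcessed(
        ReadOnlySpan<char> text, string? normalizedText, int startIndex)
    {
        return normalizedText is null
            ? text.Slice(startIndex).ToString()
            : normalizedText.Substring(startIndex);
    }
}
```

Slicing the processed text rather than the original input matters because normalization can change the length of the text, so an index into one is not necessarily valid for the other.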
GetIndexByTokenCountFromEnd(String, Boolean, Boolean, Int32, Boolean, String, Int32)
- Source:
- SentencePieceTokenizer.cs
Finds the start index of the longest trailing portion of the text that can be encoded without exceeding the token limit.
public int GetIndexByTokenCountFromEnd(string text, bool addBeginningOfSentence, bool addEndOfSentence, int maxTokenCount, bool considerNormalization, out string? normalizedText, out int tokenCount);
override this.GetIndexByTokenCountFromEnd : string * bool * bool * int * bool * string * int -> int
Public Function GetIndexByTokenCountFromEnd (text As String, addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, maxTokenCount As Integer, considerNormalization As Boolean, ByRef normalizedText As String, ByRef tokenCount As Integer) As Integer
Parameters
- text
- String
The text to encode.
- addBeginningOfSentence
- Boolean
Indicates whether to emit the beginning-of-sentence token during encoding.
- addEndOfSentence
- Boolean
Indicates whether to emit the end-of-sentence token during encoding.
- maxTokenCount
- Int32
The maximum number of tokens the encoded portion of the text may contain.
- considerNormalization
- Boolean
Indicates whether to apply normalization before tokenization.
- normalizedText
- String
When the tokenizer's normalization is enabled and considerNormalization is true, this is set to text in its normalized form; otherwise, it is set to null.
- tokenCount
- Int32
The number of tokens generated for the included portion of the text, which does not exceed maxTokenCount.
Returns
The start index, within the processed text, of the longest trailing portion that can be encoded without exceeding the token limit.
This is the index of the first character to include. If no tokens fit, the result is the length of the processed text (normalizedText when it is not null); conversely, if all tokens fit, the result is 0.
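The boundary cases of the return value can be sketched as a small classification step applied after the call. This is an illustrative helper under stated assumptions, not library code: the TailFit enum and TailFitClassifier class are hypothetical names, and processedLength is assumed to be the length of normalizedText when it is not null, or of text otherwise.

```csharp
using System;

// Hypothetical result categories, introduced only for this sketch.
public enum TailFit { Nothing, Partial, Everything }

public static class TailFitClassifier
{
    // Interprets the index returned by GetIndexByTokenCountFromEnd.
    // processedLength is assumed to be (normalizedText ?? text).Length.
    public static TailFit Classify(int startIndex, int processedLength)
    {
        if (startIndex < 0 || startIndex > processedLength)
        {
            throw new ArgumentOutOfRangeException(nameof(startIndex));
        }

        // Index 0 means the whole processed text fits within the limit.
        // For empty input both boundary cases coincide; this sketch treats
        // that as Everything.
        if (startIndex == 0)
        {
            return TailFit.Everything;
        }

        // An index equal to the length means no tokens fit.
        if (startIndex == processedLength)
        {
            return TailFit.Nothing;
        }

        return TailFit.Partial;
    }
}
```

A caller might invoke it as `TailFitClassifier.Classify(startIndex, (normalizedText ?? text).Length)` using the values produced by the method call.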