Tokenizer.EncodeToTokens Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
| EncodeToTokens(String, ReadOnlySpan<Char>, EncodeSettings) | Encodes input text to a list of EncodedTokens. |
| EncodeToTokens(ReadOnlySpan<Char>, String, Boolean, Boolean) | Encodes input text to a list of EncodedTokens. |
| EncodeToTokens(String, String, Boolean, Boolean) | Encodes input text to a list of EncodedTokens. |
EncodeToTokens(String, ReadOnlySpan<Char>, EncodeSettings)
- Source:
- Tokenizer.cs
Encodes input text to a list of EncodedTokens.
protected abstract Microsoft.ML.Tokenizers.EncodeResults<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(string? text, ReadOnlySpan<char> textSpan, Microsoft.ML.Tokenizers.EncodeSettings settings);
abstract member EncodeToTokens : string * ReadOnlySpan<char> * Microsoft.ML.Tokenizers.EncodeSettings -> Microsoft.ML.Tokenizers.EncodeResults<Microsoft.ML.Tokenizers.EncodedToken>
Protected MustOverride Function EncodeToTokens (text As String, textSpan As ReadOnlySpan(Of Char), settings As EncodeSettings) As EncodeResults(Of EncodedToken)
Parameters
- text
- String
The text to encode.
- textSpan
- ReadOnlySpan<Char>
The span of the text to encode. It is used only if text is null.
- settings
- EncodeSettings
The settings used to encode the text.
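The fallback between text and textSpan described above can be sketched as follows. This is a minimal illustration of how an override might choose its input, not the library's implementation; InputSelection and SelectInput are hypothetical names that do not exist in Microsoft.ML.Tokenizers.

```csharp
#nullable enable
using System;

// Hypothetical helper, not part of Microsoft.ML.Tokenizers.
static class InputSelection
{
    // Returns the characters of `text` when it is non-null;
    // otherwise falls back to `textSpan`, mirroring the parameter contract above.
    public static ReadOnlySpan<char> SelectInput(string? text, ReadOnlySpan<char> textSpan)
        => text is null ? textSpan : text.AsSpan();
}
```

An override of the protected overload could call such a helper once at the top and then tokenize the resulting span, so the string and span entry points share a single code path.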
Returns
An EncodeResults<EncodedToken> containing the encoded tokens.
EncodeToTokens(ReadOnlySpan<Char>, String, Boolean, Boolean)
- Source:
- Tokenizer.cs
Encodes input text to a list of EncodedTokens.
public System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(ReadOnlySpan<char> text, out string? normalizedText, bool considerPreTokenization = true, bool considerNormalization = true);
member this.EncodeToTokens : ReadOnlySpan<char> * string * bool * bool -> System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken>
Public Function EncodeToTokens (text As ReadOnlySpan(Of Char), ByRef normalizedText As String, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of EncodedToken)
Parameters
- text
- ReadOnlySpan<Char>
The text to encode.
- normalizedText
- String
If the tokenizer's normalization is enabled and considerNormalization is true, this will be set to text in its normalized form; otherwise, this value will be set to null.
- considerPreTokenization
- Boolean
Indicates whether to apply pre-tokenization before tokenization.
- considerNormalization
- Boolean
Indicates whether to apply normalization before tokenization.
Returns
The list of EncodedTokens produced by encoding the text.
EncodeToTokens(String, String, Boolean, Boolean)
- Source:
- Tokenizer.cs
Encodes input text to a list of EncodedTokens.
public System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(string text, out string? normalizedText, bool considerPreTokenization = true, bool considerNormalization = true);
member this.EncodeToTokens : string * string * bool * bool -> System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken>
Public Function EncodeToTokens (text As String, ByRef normalizedText As String, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of EncodedToken)
Parameters
- text
- String
The text to encode.
- normalizedText
- String
If the tokenizer's normalization is enabled and considerNormalization is true, this will be set to text in its normalized form; otherwise, this value will be set to null.
- considerPreTokenization
- Boolean
Indicates whether to apply pre-tokenization before tokenization.
- considerNormalization
- Boolean
Indicates whether to apply normalization before tokenization.
Returns
The list of EncodedTokens produced by encoding the text.
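The two public overloads can be called as in the sketch below. It assumes the caller already holds a concrete Tokenizer instance, and that EncodedToken exposes Id and Value properties; TokenDump and Describe are hypothetical names used only for illustration. Because this API is prerelease, member names may differ between package versions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Hypothetical helper, not part of Microsoft.ML.Tokenizers.
static class TokenDump
{
    // `tokenizer` is any concrete Tokenizer the caller has already created.
    public static void Describe(Tokenizer tokenizer, string text)
    {
        // String overload with the default flags (pre-tokenization and normalization both considered).
        IReadOnlyList<EncodedToken> tokens =
            tokenizer.EncodeToTokens(text, out string? normalizedText);

        foreach (EncodedToken token in tokens)
        {
            // Assumption: Id and Value are available on EncodedToken in the installed version.
            Console.WriteLine($"{token.Id}\t{token.Value}");
        }

        string shown = normalizedText ?? "(null)";
        Console.WriteLine($"Normalized text: {shown}");

        // Span overload, asking the tokenizer to skip normalization.
        IReadOnlyList<EncodedToken> raw = tokenizer.EncodeToTokens(
            text.AsSpan(), out string? rawNormalized, considerNormalization: false);

        Console.WriteLine($"Token count without normalization: {raw.Count}");
    }
}
```

The span overload avoids allocating a new string when the input is already a slice of a larger buffer; the string overload is the more convenient choice when the text is already a String.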