

Tokenizer.EncodeToTokens Method

Definition

Overloads

EncodeToTokens(String, ReadOnlySpan<Char>, EncodeSettings)

Source:
Tokenizer.cs

Encodes the input text into a list of EncodedToken values.

protected abstract Microsoft.ML.Tokenizers.EncodeResults<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(string? text, ReadOnlySpan<char> textSpan, Microsoft.ML.Tokenizers.EncodeSettings settings);
abstract member EncodeToTokens : string * ReadOnlySpan<char> * Microsoft.ML.Tokenizers.EncodeSettings -> Microsoft.ML.Tokenizers.EncodeResults<Microsoft.ML.Tokenizers.EncodedToken>
Protected MustOverride Function EncodeToTokens (text As String, textSpan As ReadOnlySpan(Of Char), settings As EncodeSettings) As EncodeResults(Of EncodedToken)

Parameters

text
String

The text to encode.

textSpan
ReadOnlySpan<Char>

The span of text to encode. It is used only when text is null.

settings
EncodeSettings

The settings used to encode the text.
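The parameter descriptions above imply a simple input-selection rule for an override: read text when it is non-null, otherwise fall back to textSpan. The following C# sketch models only that rule, using a toy whitespace splitter in place of a real model. ResolveInput and ToyEncode are hypothetical helpers, not Microsoft.ML.Tokenizers APIs, and the sketch does not use EncodeSettings or EncodeResults.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Microsoft.ML.Tokenizers): textSpan is
// consulted only when text is null, as the parameter docs describe.
static ReadOnlySpan<char> ResolveInput(string? text, ReadOnlySpan<char> textSpan)
    => text is null ? textSpan : text.AsSpan();

// Hypothetical toy encoder standing in for a real override: it splits the
// resolved input on spaces and records each piece with its start offset.
static List<(string Value, int Offset)> ToyEncode(string? text, ReadOnlySpan<char> textSpan)
{
    ReadOnlySpan<char> input = ResolveInput(text, textSpan);
    var tokens = new List<(string Value, int Offset)>();
    int start = 0;
    for (int i = 0; i <= input.Length; i++)
    {
        // A space or the end of the input closes the current piece.
        if (i == input.Length || input[i] == ' ')
        {
            if (i > start)
                tokens.Add((input.Slice(start, i - start).ToString(), start));
            start = i + 1;
        }
    }
    return tokens;
}

// A string-based caller passes the string; a span-only caller passes null for text.
var fromString = ToyEncode("hello world", default);
var fromSpan = ToyEncode(null, "hello world".AsSpan());
Console.WriteLine(string.Join(", ", fromString)); // (hello, 0), (world, 6)
```

Both calls produce the same two tokens, because the toy encoder reads the same characters whichever parameter supplies them.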

Returns

EncodeResults<EncodedToken>

The encoding results, containing the list of encoded tokens.


EncodeToTokens(ReadOnlySpan<Char>, String, Boolean, Boolean)

Source:
Tokenizer.cs

Encodes the input text into a list of EncodedToken values.

public System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(ReadOnlySpan<char> text, out string? normalizedText, bool considerPreTokenization = true, bool considerNormalization = true);
member this.EncodeToTokens : ReadOnlySpan<char> * string * bool * bool -> System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken>
Public Function EncodeToTokens (text As ReadOnlySpan(Of Char), ByRef normalizedText As String, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of EncodedToken)

Parameters

text
ReadOnlySpan<Char>

The text to encode.

normalizedText
String

If the tokenizer has a normalizer and considerNormalization is true, this is set to the normalized form of text; otherwise, it is set to null.

considerPreTokenization
Boolean

Indicates whether to apply pre-tokenization before tokenization.

considerNormalization
Boolean

Indicates whether to apply normalization before tokenization.

Returns

IReadOnlyList<EncodedToken>

The list of encoded tokens.
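To make the interaction between normalizedText and considerNormalization concrete, here is a self-contained C# sketch. It assumes the reading that normalizedText is populated only when the tokenizer has a normalizer and considerNormalization is true, and is null otherwise. ToyEncodeToTokens and the lowercasing normalizer are hypothetical stand-ins; the sketch does not call Microsoft.ML.Tokenizers, and its whitespace splitting is not how a real model tokenizes.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical model of the normalizedText contract (not the library's code).
// `normalizer` stands in for the tokenizer's configured normalizer, if any.
static List<string> ToyEncodeToTokens(
    ReadOnlySpan<char> text,
    Func<string, string>? normalizer,
    out string? normalizedText,
    bool considerNormalization = true)
{
    // Normalized text exists only when a normalizer is present and the caller asked for it.
    normalizedText = normalizer is not null && considerNormalization
        ? normalizer(text.ToString())
        : null;

    // Encode the normalized form when there is one, otherwise the original input.
    string source = normalizedText ?? text.ToString();
    return new List<string>(source.Split(' ', StringSplitOptions.RemoveEmptyEntries));
}

Func<string, string> lower = s => s.ToLowerInvariant(); // hypothetical normalizer

var a = ToyEncodeToTokens("Hello World".AsSpan(), lower, out string? normA);
var b = ToyEncodeToTokens("Hello World".AsSpan(), lower, out string? normB, considerNormalization: false);
var c = ToyEncodeToTokens("Hello World".AsSpan(), null, out string? normC);

Console.WriteLine($"{normA ?? "null"} | {normB ?? "null"} | {normC ?? "null"}");
// hello world | null | null
```

Only the first call, which has a normalizer and leaves considerNormalization at its default of true, reports a normalized string; its tokens are also taken from that normalized form.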


EncodeToTokens(String, String, Boolean, Boolean)

Source:
Tokenizer.cs

Encodes the input text into a list of EncodedToken values.

public System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(string text, out string? normalizedText, bool considerPreTokenization = true, bool considerNormalization = true);
member this.EncodeToTokens : string * string * bool * bool -> System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken>
Public Function EncodeToTokens (text As String, ByRef normalizedText As String, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of EncodedToken)

Parameters

text
String

The text to encode.

normalizedText
String

If the tokenizer has a normalizer and considerNormalization is true, this is set to the normalized form of text; otherwise, it is set to null.

considerPreTokenization
Boolean

Indicates whether to apply pre-tokenization before tokenization.

considerNormalization
Boolean

Indicates whether to apply normalization before tokenization.

Returns

IReadOnlyList<EncodedToken>

The list of encoded tokens.
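The effect of considerPreTokenization can be illustrated with a toy model that performs greedy longest-match lookups against a tiny vocabulary. In this C# sketch, splitting on spaces plays the role of the pre-tokenizer; when it is skipped, the model sees the whole string and can emit a token that spans a word boundary. The vocabulary, Model, and ToyEncode are hypothetical and only approximate what a real pre-tokenizer and model (such as BPE) do.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model: greedy longest match against the vocabulary,
// falling back to a single character (-1 marks an unknown piece).
static List<int> Model(string chunk, Dictionary<string, int> vocabulary)
{
    var ids = new List<int>();
    int pos = 0;
    while (pos < chunk.Length)
    {
        int len = chunk.Length - pos;
        // Shrink the candidate until it is in the vocabulary or one character long.
        while (len > 1 && !vocabulary.ContainsKey(chunk.Substring(pos, len))) len--;
        ids.Add(vocabulary.TryGetValue(chunk.Substring(pos, len), out int id) ? id : -1);
        pos += len;
    }
    return ids;
}

// Hypothetical encoder: with pre-tokenization the input is split into words
// first; without it, the model receives the whole string as one chunk.
static List<int> ToyEncode(string text, Dictionary<string, int> vocabulary, bool considerPreTokenization = true)
{
    string[] chunks = considerPreTokenization
        ? text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
        : new[] { text };
    var ids = new List<int>();
    foreach (string chunk in chunks) ids.AddRange(Model(chunk, vocabulary));
    return ids;
}

var vocab = new Dictionary<string, int> { ["hello"] = 1, ["world"] = 2, ["hello world"] = 3 };
var withPre = ToyEncode("hello world", vocab);
var without = ToyEncode("hello world", vocab, considerPreTokenization: false);
Console.WriteLine(string.Join(",", withPre)); // 1,2
Console.WriteLine(string.Join(",", without)); // 3
```

With pre-tokenization the model never sees the space, so the multi-word entry "hello world" cannot match; skipping it lets the model treat the input as a single chunk.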
