CodeGenTokenizer.EncodeToTokens Method

Definition

Namespace:: Microsoft.ML.Tokenizers

Assembly:: Microsoft.ML.Tokenizers.dll

Package:: Microsoft.ML.Tokenizers v0.22.0, v1.0.1, v2.0.0-preview.1.25125.4

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Overloads

EncodeToTokens(String, ReadOnlySpan<Char>, EncodeSettings)	Encodes input text to a list of EncodedTokens.
EncodeToTokens(ReadOnlySpan<Char>, Boolean, Boolean, Boolean, String, Boolean, Boolean)	Encodes input text to a list of EncodedToken objects, which carry the token values, token IDs, and offset mappings.
EncodeToTokens(String, Boolean, Boolean, Boolean, String, Boolean, Boolean)	Encodes input text to a list of EncodedToken objects, which carry the token values, token IDs, and offset mappings.

EncodeToTokens(String, ReadOnlySpan<Char>, EncodeSettings)

Source:: CodeGenTokenizer.cs

Encodes input text to a list of EncodedTokens.

protected override Microsoft.ML.Tokenizers.EncodeResults<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(string? text, ReadOnlySpan<char> textSpan, Microsoft.ML.Tokenizers.EncodeSettings settings);

override this.EncodeToTokens : string * ReadOnlySpan<char> * Microsoft.ML.Tokenizers.EncodeSettings -> Microsoft.ML.Tokenizers.EncodeResults<Microsoft.ML.Tokenizers.EncodedToken>

Protected Overrides Function EncodeToTokens (text As String, textSpan As ReadOnlySpan(Of Char), settings As EncodeSettings) As EncodeResults(Of EncodedToken)

Parameters

text: String

The text to encode.

textSpan: ReadOnlySpan<Char>

The span of the text to encode. It is used only when text is null.

settings: EncodeSettings

The settings used to encode the text.

Returns

EncodeResults<EncodedToken>

Applies to

EncodeToTokens(ReadOnlySpan<Char>, Boolean, Boolean, Boolean, String, Boolean, Boolean)

Source:: CodeGenTokenizer.cs

Encodes input text to a list of EncodedToken objects, which carry the token values, token IDs, and offset mappings.

public System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(ReadOnlySpan<char> text, bool addPrefixSpace, bool addBeginningOfSentence, bool addEndOfSentence, out string? normalizedText, bool considerPreTokenization = true, bool considerNormalization = true);

override this.EncodeToTokens : ReadOnlySpan<char> * bool * bool * bool * string * bool * bool -> System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken>

Public Function EncodeToTokens (text As ReadOnlySpan(Of Char), addPrefixSpace As Boolean, addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, ByRef normalizedText As String, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of EncodedToken)

Parameters

text: ReadOnlySpan<Char>

The text to encode.

addPrefixSpace: Boolean

Indicates whether to add a leading space to the text before encoding it.

addBeginningOfSentence: Boolean

Indicates whether to include the beginning-of-sentence token in the encoding.

addEndOfSentence: Boolean

Indicates whether to include the end-of-sentence token in the encoding.

normalizedText: String

If the tokenizer's normalization is enabled, this receives the input text in its normalized form; otherwise, it is null.

considerPreTokenization: Boolean

Indicates whether pre-tokenization is applied before tokenization.

considerNormalization: Boolean

Indicates whether normalization is applied before tokenization.

Returns

IReadOnlyList<EncodedToken>

The list of encoded tokens, including each token's value, ID, and offset mapping.

Applies to

EncodeToTokens(String, Boolean, Boolean, Boolean, String, Boolean, Boolean)

Source:: CodeGenTokenizer.cs

Encodes input text to a list of EncodedToken objects, which carry the token values, token IDs, and offset mappings.

public System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(string text, bool addPrefixSpace, bool addBeginningOfSentence, bool addEndOfSentence, out string? normalizedText, bool considerPreTokenization = true, bool considerNormalization = true);

override this.EncodeToTokens : string * bool * bool * bool * string * bool * bool -> System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken>

Public Function EncodeToTokens (text As String, addPrefixSpace As Boolean, addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, ByRef normalizedText As String, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of EncodedToken)

Parameters

text: String

The text to encode.

addPrefixSpace: Boolean

Indicates whether to add a leading space to the text before encoding it.

addBeginningOfSentence: Boolean

Indicates whether to include the beginning-of-sentence token in the encoding.

addEndOfSentence: Boolean

Indicates whether to include the end-of-sentence token in the encoding.

normalizedText: String

If the tokenizer's normalization is enabled, this receives the input text in its normalized form; otherwise, it is null.

considerPreTokenization: Boolean

Indicates whether pre-tokenization is applied before tokenization.

considerNormalization: Boolean

Indicates whether normalization is applied before tokenization.

Returns

IReadOnlyList<EncodedToken>

The list of encoded tokens, including each token's value, ID, and offset mapping.
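
Examples

The following is a minimal sketch of calling the string overload, not an official sample. It assumes that a `CodeGenTokenizer.Create(Stream, Stream)` factory is available in the package version in use and that `EncodedToken` exposes `Id` and `Value` properties; neither is described on this page. The file names `vocab.json` and `merges.txt` are placeholders for the vocabulary and merges files of a CodeGen-style model.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML.Tokenizers;

// Placeholder paths: replace with the vocabulary and merges files for your model.
using Stream vocabStream = File.OpenRead("vocab.json");
using Stream mergesStream = File.OpenRead("merges.txt");

// Assumption: this factory overload exists in the installed package version.
CodeGenTokenizer tokenizer = CodeGenTokenizer.Create(vocabStream, mergesStream);

// Encode a string, requesting a beginning-of-sentence token but no prefix space
// and no end-of-sentence token. The two optional parameters keep their defaults (true).
IReadOnlyList<EncodedToken> tokens = tokenizer.EncodeToTokens(
    "Hello World",
    addPrefixSpace: false,
    addBeginningOfSentence: true,
    addEndOfSentence: false,
    normalizedText: out string? normalizedText);

// Assumption: EncodedToken exposes Id and Value.
foreach (EncodedToken token in tokens)
{
    Console.WriteLine($"{token.Id}\t{token.Value}");
}

// normalizedText is null when the tokenizer's normalization is not enabled.
Console.WriteLine(normalizedText ?? "(normalization not enabled)");
```

The `ReadOnlySpan<Char>` overload takes the same arguments and can be preferable when the input is a slice of a larger buffer, since it avoids allocating a substring.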

Applies to
