
CodeGenTokenizer.EncodeToTokens Method

Definition

Overloads

EncodeToTokens(String, ReadOnlySpan<Char>, EncodeSettings)

Encodes the input text into a list of EncodedToken objects.

EncodeToTokens(ReadOnlySpan<Char>, Boolean, Boolean, Boolean, String, Boolean, Boolean)

Encodes the input text into a list of tokens that carries the token values, token IDs, and token offset mapping.

EncodeToTokens(String, Boolean, Boolean, Boolean, String, Boolean, Boolean)

Encodes the input text into a list of tokens that carries the token values, token IDs, and token offset mapping.

EncodeToTokens(String, ReadOnlySpan<Char>, EncodeSettings)

Source:
CodeGenTokenizer.cs

Encodes the input text into a list of EncodedToken objects.

protected override Microsoft.ML.Tokenizers.EncodeResults<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(string? text, ReadOnlySpan<char> textSpan, Microsoft.ML.Tokenizers.EncodeSettings settings);
override this.EncodeToTokens : string * ReadOnlySpan<char> * Microsoft.ML.Tokenizers.EncodeSettings -> Microsoft.ML.Tokenizers.EncodeResults<Microsoft.ML.Tokenizers.EncodedToken>
Protected Overrides Function EncodeToTokens (text As String, textSpan As ReadOnlySpan(Of Char), settings As EncodeSettings) As EncodeResults(Of EncodedToken)

Parameters

text
String

The text to encode.

textSpan
ReadOnlySpan<Char>

The span of the text to encode. It is used only when text is null.

settings
EncodeSettings

The settings used to encode the text.

Returns

The encoding results, containing the list of EncodedToken objects produced from the input text.

Applies to

EncodeToTokens(ReadOnlySpan<Char>, Boolean, Boolean, Boolean, String, Boolean, Boolean)

Source:
CodeGenTokenizer.cs

Encodes the input text into a list of tokens that carries the token values, token IDs, and token offset mapping.

public System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(ReadOnlySpan<char> text, bool addPrefixSpace, bool addBeginningOfSentence, bool addEndOfSentence, out string? normalizedText, bool considerPreTokenization = true, bool considerNormalization = true);
member this.EncodeToTokens : ReadOnlySpan<char> * bool * bool * bool * string * bool * bool -> System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken>
Public Function EncodeToTokens (text As ReadOnlySpan(Of Char), addPrefixSpace As Boolean, addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, ByRef normalizedText As String, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of EncodedToken)

Parameters

text
ReadOnlySpan<Char>

The text to encode.

addPrefixSpace
Boolean

Indicates whether to add a leading space to the text before encoding it.

addBeginningOfSentence
Boolean

Indicates whether to include the beginning-of-sentence token in the encoding.

addEndOfSentence
Boolean

Indicates whether to include the end-of-sentence token in the encoding.

normalizedText
String

If the tokenizer's normalization is enabled, receives the input text in its normalized form; otherwise, it is null.

considerPreTokenization
Boolean

Indicates whether to apply pre-tokenization before tokenization.

considerNormalization
Boolean

Indicates whether to apply normalization before tokenization.

Returns

The tokenization result: a list of tokens that carries the token values, token IDs, and token offset mapping.
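The span overload lets you encode part of a larger string without allocating a substring. The sketch below assumes you already have a CodeGenTokenizer instance (creating one from vocabulary and merges files is not shown). SpanEncodingExample and EncodeFirstLine are hypothetical names; only the EncodeToTokens call follows the signature documented above. The Offset property is printed through its ToString(), since its exact type is not described on this page.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

public static class SpanEncodingExample
{
    // Hypothetical helper: encodes only the first line of `document`.
    // Assumes `tokenizer` was created elsewhere; construction is out of scope.
    public static IReadOnlyList<EncodedToken> EncodeFirstLine(CodeGenTokenizer tokenizer, string document)
    {
        // Slice the first line as a span, so no new string is allocated.
        int newline = document.IndexOf('\n');
        ReadOnlySpan<char> firstLine = newline >= 0
            ? document.AsSpan(0, newline)
            : document.AsSpan();

        IReadOnlyList<EncodedToken> tokens = tokenizer.EncodeToTokens(
            firstLine,
            addPrefixSpace: false,
            addBeginningOfSentence: true,
            addEndOfSentence: false,
            normalizedText: out string? normalizedText);

        // normalizedText stays null unless the tokenizer performs normalization.
        Console.WriteLine($"Normalized: {normalizedText ?? "(none)"}");

        foreach (EncodedToken token in tokens)
        {
            // Print each token's ID, string value, and offset information.
            Console.WriteLine($"{token.Id}\t{token.Value}\t{token.Offset}");
        }

        return tokens;
    }
}
```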

Applies to

EncodeToTokens(String, Boolean, Boolean, Boolean, String, Boolean, Boolean)

Source:
CodeGenTokenizer.cs

Encodes the input text into a list of tokens that carries the token values, token IDs, and token offset mapping.

public System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken> EncodeToTokens(string text, bool addPrefixSpace, bool addBeginningOfSentence, bool addEndOfSentence, out string? normalizedText, bool considerPreTokenization = true, bool considerNormalization = true);
member this.EncodeToTokens : string * bool * bool * bool * string * bool * bool -> System.Collections.Generic.IReadOnlyList<Microsoft.ML.Tokenizers.EncodedToken>
Public Function EncodeToTokens (text As String, addPrefixSpace As Boolean, addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, ByRef normalizedText As String, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of EncodedToken)

Parameters

text
String

The text to encode.

addPrefixSpace
Boolean

Indicates whether to add a leading space to the text before encoding it.

addBeginningOfSentence
Boolean

Indicates whether to include the beginning-of-sentence token in the encoding.

addEndOfSentence
Boolean

Indicates whether to include the end-of-sentence token in the encoding.

normalizedText
String

If the tokenizer's normalization is enabled, receives the input text in its normalized form; otherwise, it is null.

considerPreTokenization
Boolean

Indicates whether to apply pre-tokenization before tokenization.

considerNormalization
Boolean

Indicates whether to apply normalization before tokenization.

Returns

The tokenization result: a list of tokens that carries the token values, token IDs, and token offset mapping.
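To see the effect of the addBeginningOfSentence and addEndOfSentence flags, you can encode the same string twice and compare the token counts. This is a minimal sketch that again assumes an existing CodeGenTokenizer instance; SpecialTokenExample and ExtraSpecialTokens are hypothetical names. The difference you observe depends on which special tokens the tokenizer's vocabulary defines, so no particular value is claimed here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

public static class SpecialTokenExample
{
    // Hypothetical helper: returns how many tokens the BOS/EOS flags add for `text`.
    // Assumes `tokenizer` was created elsewhere; construction is out of scope.
    public static int ExtraSpecialTokens(CodeGenTokenizer tokenizer, string text)
    {
        // Encode without beginning-/end-of-sentence tokens; discard the normalized text.
        IReadOnlyList<EncodedToken> plain = tokenizer.EncodeToTokens(
            text,
            addPrefixSpace: false,
            addBeginningOfSentence: false,
            addEndOfSentence: false,
            normalizedText: out _);

        // Encode the same text, this time requesting both special tokens.
        IReadOnlyList<EncodedToken> wrapped = tokenizer.EncodeToTokens(
            text,
            addPrefixSpace: false,
            addBeginningOfSentence: true,
            addEndOfSentence: true,
            normalizedText: out _);

        return wrapped.Count - plain.Count;
    }
}
```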

Applies to