Voice input in DirectX

Note

This article relates to the legacy WinRT native APIs. For new native app projects, we recommend using the OpenXR API.

This article explains how to implement voice commands, as well as recognition of short phrases and sentences, in a DirectX app for Windows Mixed Reality.

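Note

The code snippets in this article use C++/CX rather than C++17-compliant C++/WinRT, which is what the C++ holographic project template uses. The concepts are equivalent for a C++/WinRT project, but you'll need to translate the code.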

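Use SpeechRecognizer for continuous speech recognition

This section describes how to use continuous speech recognition to enable voice commands in your app. The walk-through uses code from the HolographicVoiceInput sample. While the sample is running, speak the name of one of the registered color commands to change the color of the spinning cube.

First, create a new Windows::Media::SpeechRecognition::SpeechRecognizer instance.

From HolographicVoiceInputSampleMain::CreateSpeechConstraintsForCurrentState: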

m_speechRecognizer = ref new SpeechRecognizer();

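Create a list of speech commands for the recognizer to listen for. Here, we construct a set of commands that change the color of a hologram. For convenience, we also create the data that each command will use later.

   m_speechCommandList = ref new Platform::Collections::Vector<String^>();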
   m_speechCommandData.clear();
   m_speechCommandList->Append(StringReference(L"white"));
   m_speechCommandData.push_back(float4(1.f, 1.f, 1.f, 1.f));
   m_speechCommandList->Append(StringReference(L"grey"));
   m_speechCommandData.push_back(float4(0.5f, 0.5f, 0.5f, 1.f));
   m_speechCommandList->Append(StringReference(L"green"));
   m_speechCommandData.push_back(float4(0.f, 1.f, 0.f, 1.f));
   m_speechCommandList->Append(StringReference(L"black"));
   m_speechCommandData.push_back(float4(0.1f, 0.1f, 0.1f, 1.f));
   m_speechCommandList->Append(StringReference(L"red"));
   m_speechCommandData.push_back(float4(1.f, 0.f, 0.f, 1.f));
   m_speechCommandList->Append(StringReference(L"yellow"));
   m_speechCommandData.push_back(float4(1.f, 1.f, 0.f, 1.f));
   m_speechCommandList->Append(StringReference(L"aquamarine"));
   m_speechCommandData.push_back(float4(0.f, 1.f, 1.f, 1.f));
   m_speechCommandList->Append(StringReference(L"blue"));
   m_speechCommandData.push_back(float4(0.f, 0.f, 1.f, 1.f));
   m_speechCommandList->Append(StringReference(L"purple"));
   m_speechCommandData.push_back(float4(1.f, 0.f, 1.f, 1.f));

You can also specify commands by using phonetic words that might not be in a dictionary.

   m_speechCommandList->Append(StringReference(L"SpeechRecognizer"));
   m_speechCommandData.push_back(float4(0.5f, 0.1f, 1.f, 1.f));

To load the command list into the speech recognizer's list of constraints, use a SpeechRecognitionListConstraint object.

   SpeechRecognitionListConstraint^ spConstraint = ref new SpeechRecognitionListConstraint(m_speechCommandList);
   m_speechRecognizer->Constraints->Clear();
   m_speechRecognizer->Constraints->Append(spConstraint);
   create_task(m_speechRecognizer->CompileConstraintsAsync()).then([this](SpeechRecognitionCompilationResult^ compilationResult)
   {
       if (compilationResult->Status == SpeechRecognitionResultStatus::Success)
       {
           m_speechRecognizer->ContinuousRecognitionSession->StartAsync();
       }
       else
       {
           // Handle errors here.
       }
   });

Subscribe to the ResultGenerated event on the speech recognizer's SpeechContinuousRecognitionSession. This event notifies your app when one of your commands has been recognized.

   m_speechRecognizer->ContinuousRecognitionSession->ResultGenerated +=
       ref new TypedEventHandler<SpeechContinuousRecognitionSession^, SpeechContinuousRecognitionResultGeneratedEventArgs^>(
           std::bind(&HolographicVoiceInputSampleMain::OnResultGenerated, this, _1, _2)
           );

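Your OnResultGenerated event handler receives the event data in a SpeechContinuousRecognitionResultGeneratedEventArgs instance. If the confidence is greater than the threshold you defined, your app should record that the event happened. Save the event data so that you can use it in a later update loop.

From HolographicVoiceInputSampleMain.cpp:

   // Change the cube color, if we get a valid result.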
   void HolographicVoiceInputSampleMain::OnResultGenerated(SpeechContinuousRecognitionSession ^sender, SpeechContinuousRecognitionResultGeneratedEventArgs ^args)
   {
       if (args->Result->RawConfidence > 0.5f)
       {
           m_lastCommand = args->Result->Text;
       }
   }
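The handler above stores the recognized text so that the update loop can pick it up later. Recognition events might be raised on a different thread from the one that runs the update loop (an assumption worth verifying for your app), in which case the handoff needs synchronization. The following is a minimal standard C++ sketch of such a handoff, not code from the sample: `CommandMailbox`, `Post`, and `Take` are hypothetical names, and `std::wstring` stands in for `Platform::String^`.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical helper: holds the most recent recognized command until the
// update loop consumes it. Not part of the HolographicVoiceInput sample.
class CommandMailbox
{
public:
    // Called from the recognition event handler. Results at or below the
    // confidence threshold are ignored, mirroring the 0.5 check above.
    void Post(std::wstring text, double rawConfidence)
    {
        if (rawConfidence > m_threshold)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_pending = std::move(text);
            m_hasPending = true;
        }
    }

    // Called once per frame from the update loop. Copies the pending command
    // into 'out' and clears it, so that each command is handled only once.
    bool Take(std::wstring& out)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_hasPending)
        {
            return false;
        }
        out = std::move(m_pending);
        m_pending.clear();
        m_hasPending = false;
        return true;
    }

private:
    std::mutex m_mutex;
    std::wstring m_pending;
    bool m_hasPending = false;
    double m_threshold = 0.5; // assumed value, matching the sample's check
};
```

If a second command arrives before the update loop runs, this sketch keeps only the newest one, which matches the single `m_lastCommand` member in the sample.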

In our example code, we change the color of the spinning hologram cube according to the user's command.

From HolographicVoiceInputSampleMain::Update:

   // Check for new speech input since the last frame.
   if (m_lastCommand != nullptr)
   {
       auto command = m_lastCommand;
       m_lastCommand = nullptr;

       int i = 0;
       for each (auto& iter in m_speechCommandList)
       {
           if (iter == command)
           {
               m_spinningCubeRenderer->SetColor(m_speechCommandData[i]);
               break;
           }

           ++i;
       }
   }
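The loop above depends on `m_speechCommandList` and `m_speechCommandData` staying in the same order. One alternative, sketched here in standard C++ rather than C++/CX, keeps each command name next to its color so that the two can't drift apart. `Color`, `ColorCommand`, `FindColor`, and `MakeColorCommands` are hypothetical names used only for illustration, and `Color` stands in for `float4`.

```cpp
#include <string>
#include <vector>

// Stand-in for the float4 color type used by the sample.
struct Color { float r, g, b, a; };

// One voice command together with the data it applies.
struct ColorCommand
{
    std::wstring name;
    Color color;
};

// Returns a pointer to the color for an exactly matching command name,
// or nullptr when the text is not a registered command.
const Color* FindColor(const std::vector<ColorCommand>& commands, const std::wstring& text)
{
    for (const ColorCommand& command : commands)
    {
        if (command.name == text)
        {
            return &command.color;
        }
    }
    return nullptr;
}

// A few of the commands registered earlier, with the same color values.
std::vector<ColorCommand> MakeColorCommands()
{
    return {
        { L"white", { 1.f, 1.f, 1.f, 1.f } },
        { L"grey",  { 0.5f, 0.5f, 0.5f, 1.f } },
        { L"green", { 0.f, 1.f, 0.f, 1.f } },
        { L"red",   { 1.f, 0.f, 0.f, 1.f } },
    };
}
```

With this layout, the list passed to the recognizer would be built by walking the same table, so adding a command means touching only one place.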

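Use "one-shot" recognition

You can configure a speech recognizer to listen for phrases or sentences that the user speaks. In this case, we apply a SpeechRecognitionTopicConstraint that tells the speech recognizer what type of input to expect. Here's the app workflow for this scenario:

1. Your app creates the SpeechRecognizer, provides UI prompts, and starts listening for a spoken command.
2. The user speaks a phrase or sentence.
3. The system recognizes the user's speech and returns the result to the app. At this point, your app should provide a UI prompt to indicate that recognition has occurred.
4. Depending on the confidence level that you want to respond to and the confidence level of the speech recognition result, your app processes the result and responds as appropriate.

This section describes how to create a SpeechRecognizer, compile the constraint, and listen for speech input.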

The following code compiles the topic constraint, which in this case is optimized for web search.

   auto constraint = ref new SpeechRecognitionTopicConstraint(SpeechRecognitionScenario::WebSearch, L"webSearch");
   m_speechRecognizer->Constraints->Clear();
   m_speechRecognizer->Constraints->Append(constraint);
   return create_task(m_speechRecognizer->CompileConstraintsAsync())
       .then([this](task<SpeechRecognitionCompilationResult^> previousTask)
   {

If compilation succeeds, we can continue with speech recognition.

       try
       {
           SpeechRecognitionCompilationResult^ compilationResult = previousTask.get();

           // Check to make sure that the constraints were in a proper format and the recognizer was able to compile it.
           if (compilationResult->Status == SpeechRecognitionResultStatus::Success)
           {
               // If the compilation succeeded, we can start listening for the user's spoken phrase or sentence.
               create_task(m_speechRecognizer->RecognizeAsync()).then([this](task<SpeechRecognitionResult^>& previousTask)
               {

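The result is then returned to the app. If we're confident enough in the result, we can process the command. This code example processes results that have at least medium confidence.

                   try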
                   {
                       auto result = previousTask.get();

                       if (result->Status != SpeechRecognitionResultStatus::Success)
                       {
                           PrintWstringToDebugConsole(
                               std::wstring(L"Speech recognition was not successful: ") +
                               result->Status.ToString()->Data() +
                               L"\n"
                               );
                       }

                       // In this example, we look for at least medium confidence in the speech result.
                       if ((result->Confidence == SpeechRecognitionConfidence::High) ||
                           (result->Confidence == SpeechRecognitionConfidence::Medium))
                       {
                           // If the user said a color name anywhere in their phrase, it will be recognized in the
                           // Update loop; then, the cube will change color.
                           m_lastCommand = result->Text;

                           PrintWstringToDebugConsole(
                               std::wstring(L"Speech phrase was: ") +
                               m_lastCommand->Data() +
                               L"\n"
                               );
                       }
                       else
                       {
                           PrintWstringToDebugConsole(
                               std::wstring(L"Recognition confidence not high enough: ") +
                               result->Confidence.ToString()->Data() +
                               L"\n"
                               );
                       }
                   }

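Whenever you use speech recognition, watch for exceptions that could indicate that the user has turned off the microphone in the system privacy settings. This can happen during initialization or during recognition.

                   catch (Exception^ exception)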
                   {
                       // Note that if you get an "Access is denied" exception, you might need to enable the microphone
                       // privacy setting on the device and/or add the microphone capability to your app manifest.

                       PrintWstringToDebugConsole(
                           std::wstring(L"Speech recognizer error: ") +
                           exception->ToString()->Data() +
                           L"\n"
                           );
                   }
               });

               return true;
           }
           else
           {
               OutputDebugStringW(L"Could not initialize predefined grammar speech engine!\n");

               // Handle errors here.
               return false;
           }
       }
       catch (Exception^ exception)
       {
           // Note that if you get an "Access is denied" exception, you might need to enable the microphone
           // privacy setting on the device and/or add the microphone capability to your app manifest.

           PrintWstringToDebugConsole(
               std::wstring(L"Exception while trying to initialize predefined grammar speech engine:") +
               exception->Message->Data() +
               L"\n"
               );

           // Handle exceptions here.
           return false;
       }
   });
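A comment in the code above says that a color name spoken anywhere in the phrase is picked up later in the update loop, but this article doesn't show that matching step. The following standard C++ sketch is one possible way to do it, not the sample's actual implementation: it splits the recognized phrase into lowercase words and returns the first word that matches a registered command. `SplitIntoWords` and `FindCommandInPhrase` are hypothetical names, and `std::wstring` stands in for `Platform::String^`.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <vector>

// Splits a phrase into lowercase words, treating any non-letter as a separator.
std::vector<std::wstring> SplitIntoWords(const std::wstring& phrase)
{
    std::vector<std::wstring> words;
    std::wstring current;
    for (wchar_t ch : phrase)
    {
        if (std::iswalpha(static_cast<std::wint_t>(ch)))
        {
            current.push_back(static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(ch))));
        }
        else if (!current.empty())
        {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
    {
        words.push_back(current);
    }
    return words;
}

// Returns the index of the first command that appears as a whole word in the
// phrase, or -1 if none does. Commands are assumed to be lowercase single words.
int FindCommandInPhrase(const std::vector<std::wstring>& commands, const std::wstring& phrase)
{
    for (const std::wstring& word : SplitIntoWords(phrase))
    {
        for (std::size_t i = 0; i < commands.size(); ++i)
        {
            if (commands[i] == word)
            {
                return static_cast<int>(i);
            }
        }
    }
    return -1;
}
```

Matching whole words rather than substrings avoids false positives such as finding "red" inside "bored". Multi-word commands would need a different approach.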

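Note

There are several predefined SpeechRecognitionScenario values that you can use to optimize speech recognition.

To optimize for dictation, use the dictation scenario.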

// Compile the dictation topic constraint, which optimizes for speech dictation.
auto dictationConstraint = ref new SpeechRecognitionTopicConstraint(SpeechRecognitionScenario::Dictation, "dictation");
m_speechRecognizer->Constraints->Append(dictationConstraint);

For spoken web searches, use the following web-specific scenario constraint.

// Add a web search topic constraint to the recognizer.
auto webSearchConstraint = ref new SpeechRecognitionTopicConstraint(SpeechRecognitionScenario::WebSearch, "webSearch");
m_speechRecognizer->Constraints->Append(webSearchConstraint);

To fill out forms, use the form constraint. In this case, it's best to apply your own grammar that's optimized for filling out the form.

// Add a form constraint to the recognizer.
auto formConstraint = ref new SpeechRecognitionTopicConstraint(SpeechRecognitionScenario::FormFilling, "formFilling");
m_speechRecognizer->Constraints->Append(formConstraint);

You can provide your own grammar in the SRGS format.

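Use continuous recognition

For the continuous dictation scenario, see the Windows 10 UWP speech code sample.

Handle quality degradation

Environmental conditions sometimes interfere with speech recognition. For example, the room might be too noisy, or the user might speak too loudly. Whenever possible, the speech recognition API provides information about the conditions that caused the quality degradation. This information is pushed to your app through a WinRT event. The following example shows how to subscribe to this event.

   m_speechRecognizer->RecognitionQualityDegrading +=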
       ref new TypedEventHandler<SpeechRecognizer^, SpeechRecognitionQualityDegradingEventArgs^>(
           std::bind(&HolographicVoiceInputSampleMain::OnSpeechQualityDegraded, this, _1, _2)
           );

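In our code sample, we write the condition information to the debug console. An app might want to give the user feedback through the UI, speech synthesis, or another method. Or it might need to behave differently when speech is interrupted by a temporary drop in quality.

   void HolographicSpeechPromptSampleMain::OnSpeechQualityDegraded(SpeechRecognizer^ recognizer, SpeechRecognitionQualityDegradingEventArgs^ args)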
   {
       switch (args->Problem)
       {
       case SpeechRecognitionAudioProblem::TooFast:
           OutputDebugStringW(L"The user spoke too quickly.\n");
           break;

       case SpeechRecognitionAudioProblem::TooSlow:
           OutputDebugStringW(L"The user spoke too slowly.\n");
           break;

       case SpeechRecognitionAudioProblem::TooQuiet:
           OutputDebugStringW(L"The user spoke too softly.\n");
           break;

       case SpeechRecognitionAudioProblem::TooLoud:
           OutputDebugStringW(L"The user spoke too loudly.\n");
           break;

       case SpeechRecognitionAudioProblem::TooNoisy:
           OutputDebugStringW(L"There is too much noise in the signal.\n");
           break;

       case SpeechRecognitionAudioProblem::NoSignal:
           OutputDebugStringW(L"There is no signal.\n");
           break;

       case SpeechRecognitionAudioProblem::None:
       default:
           OutputDebugStringW(L"An error was reported with no information.\n");
           break;
       }
   }
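To give the user feedback rather than only logging, the same switch can return a short hint that the app then shows in its UI or passes to speech synthesis. The following standard C++ sketch illustrates the idea. `AudioProblem` is a locally declared stand-in for `SpeechRecognitionAudioProblem` (it makes no claim about the real enum's numeric values), and `HintForProblem` and the hint wording are hypothetical.

```cpp
#include <string>

// Stand-in for Windows::Media::SpeechRecognition::SpeechRecognitionAudioProblem,
// declared locally so that the sketch compiles without the Windows Runtime.
enum class AudioProblem { None, TooNoisy, NoSignal, TooLoud, TooQuiet, TooFast, TooSlow };

// Returns a short hint to show or speak to the user, or an empty string
// when there is nothing actionable to say.
std::wstring HintForProblem(AudioProblem problem)
{
    switch (problem)
    {
    case AudioProblem::TooFast:
        return L"Try speaking a little more slowly.";
    case AudioProblem::TooSlow:
        return L"Try speaking a little faster.";
    case AudioProblem::TooQuiet:
        return L"Try speaking a little louder.";
    case AudioProblem::TooLoud:
        return L"Try speaking a little more quietly.";
    case AudioProblem::TooNoisy:
        return L"It's noisy here. Try moving somewhere quieter.";
    case AudioProblem::NoSignal:
        return L"I can't hear anything. Check the microphone.";
    case AudioProblem::None:
    default:
        return L"";
    }
}
```

An empty return value lets the caller skip the prompt entirely when no specific problem was reported.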

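If you're not using ref classes to create your DirectX app, you must unsubscribe from the events before you release or recreate your speech recognizer. Unsubscribing requires the EventRegistrationToken that the += subscription returns, which is why the routine below refers to stored token members. The HolographicSpeechPromptSample has a routine that stops recognition and unsubscribes from the events.

   Concurrency::task<void> HolographicSpeechPromptSampleMain::StopCurrentRecognizerIfExists()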
   {
       return create_task([this]()
       {
           if (m_speechRecognizer != nullptr)
           {
               return create_task(m_speechRecognizer->StopRecognitionAsync()).then([this]()
               {
                   m_speechRecognizer->RecognitionQualityDegrading -= m_speechRecognitionQualityDegradedToken;

                   if (m_speechRecognizer->ContinuousRecognitionSession != nullptr)
                   {
                       m_speechRecognizer->ContinuousRecognitionSession->ResultGenerated -= m_speechRecognizerResultEventToken;
                   }
               });
           }
           else
           {
               return create_task([this]() { m_speechRecognizer = nullptr; });
           }
       });
   }
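Because forgetting to unsubscribe is an easy mistake to make, some apps wrap the unsubscribe step in a small RAII guard. The following standard C++ sketch shows the pattern in isolation. `ScopedSubscription` is a hypothetical helper, not part of either sample. In a real app, the callback would perform the `-=` with the stored token, as in the routine above, and recognition would still need to be stopped first.

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII helper: runs an "unsubscribe" callback exactly once,
// either when Reset() is called or when the guard is destroyed.
class ScopedSubscription
{
public:
    ScopedSubscription() = default;

    explicit ScopedSubscription(std::function<void()> unsubscribe)
        : m_unsubscribe(std::move(unsubscribe))
    {
    }

    // Non-copyable, so that the callback can never run twice.
    ScopedSubscription(const ScopedSubscription&) = delete;
    ScopedSubscription& operator=(const ScopedSubscription&) = delete;

    // Movable: ownership of the callback transfers to the new guard.
    ScopedSubscription(ScopedSubscription&& other)
        : m_unsubscribe(std::move(other.m_unsubscribe))
    {
        other.m_unsubscribe = nullptr;
    }

    ~ScopedSubscription() { Reset(); }

    // Unsubscribes now. Calling Reset() again afterwards does nothing.
    void Reset()
    {
        if (m_unsubscribe)
        {
            std::function<void()> callback = std::move(m_unsubscribe);
            m_unsubscribe = nullptr;
            callback();
        }
    }

private:
    std::function<void()> m_unsubscribe;
};
```

Declaring the guard after the recognizer member means it is destroyed first, so the unsubscribe runs while the recognizer is still alive.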

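Use speech synthesis to provide audible prompts

The holographic speech samples use speech synthesis to give the user audible instructions. This section shows how to create a synthesized voice sample and then play it back through the HRTF audio APIs.

We recommend that you provide your own speech prompts when you request phrase input. Prompts can also help indicate when speech commands can be spoken in a continuous-recognition scenario. The following example demonstrates how to do this with a speech synthesizer. You could also use a prerecorded voice clip, a visual UI, or another indicator of what to say, for example in scenarios where the prompt isn't dynamic.

First, create the SpeechSynthesizer object.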

auto speechSynthesizer = ref new Windows::Media::SpeechSynthesis::SpeechSynthesizer();

You also need a string that contains the text to synthesize.

   // Phrase recognition works best when requesting a phrase or sentence.
   StringReference voicePrompt = L"At the prompt: Say a phrase, asking me to change the cube to a specific color.";

Speech is synthesized asynchronously through SynthesizeTextToStreamAsync. Here, we start an async task to synthesize the speech.

   create_task(speechSynthesizer->SynthesizeTextToStreamAsync(voicePrompt), task_continuation_context::use_current())
       .then([this, speechSynthesizer](task<Windows::Media::SpeechSynthesis::SpeechSynthesisStream^> synthesisStreamTask)
   {
       try
       {

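The synthesized speech is delivered as a byte stream. We can use that byte stream to initialize an XAudio2 voice. In our holographic code samples, we play it back as an HRTF audio effect.

           Windows::Media::SpeechSynthesis::SpeechSynthesisStream^ stream = synthesisStreamTask.get();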

           auto hr = m_speechSynthesisSound.Initialize(stream, 0);
           if (SUCCEEDED(hr))
           {
               m_speechSynthesisSound.SetEnvironment(HrtfEnvironment::Small);
               m_speechSynthesisSound.Start();

               // Amount of time to pause after the audio prompt is complete, before listening
               // for speech input.
               static const float bufferTime = 0.15f;

               // Wait until the prompt is done before listening.
               m_secondsUntilSoundIsComplete = m_speechSynthesisSound.GetDuration() + bufferTime;
               m_waitingForSpeechPrompt = true;
           }
       }

As with speech recognition, speech synthesis throws an exception if something goes wrong.

       catch (Exception^ exception)
       {
           PrintWstringToDebugConsole(
               std::wstring(L"Exception while trying to synthesize speech: ") +
               exception->Message->Data() +
               L"\n"
               );

           // Handle exceptions here.
       }
   });
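The code above records how long the prompt lasts and sets a flag, but this article doesn't show how those values are consumed. One plausible approach is to count the time down in the per-frame update and start listening once it reaches zero. The following standard C++ sketch assumes that approach. `PromptTimer` and its members are hypothetical names, and the default buffer matches the `bufferTime` value above.

```cpp
// Hypothetical per-frame timer that tracks when a spoken prompt has finished.
class PromptTimer
{
public:
    // Call when the prompt starts playing. bufferSeconds adds a short pause
    // after the audio ends, like bufferTime in the code above.
    void Start(float promptDurationSeconds, float bufferSeconds = 0.15f)
    {
        m_secondsRemaining = promptDurationSeconds + bufferSeconds;
        m_waiting = true;
    }

    // Call once per frame with the elapsed time. Returns true exactly once,
    // on the frame where the wait ends and listening can begin.
    bool Update(float deltaSeconds)
    {
        if (!m_waiting)
        {
            return false;
        }
        m_secondsRemaining -= deltaSeconds;
        if (m_secondsRemaining > 0.f)
        {
            return false;
        }
        m_waiting = false;
        return true;
    }

    bool IsWaiting() const { return m_waiting; }

private:
    float m_secondsRemaining = 0.f;
    bool m_waiting = false;
};
```

In an update loop, the frame on which `Update` returns true would be the point to start recognition, so the recognizer doesn't pick up the app's own prompt.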


Last updated on 2025-02-18
