In this article, you do the following steps:

- Run a custom script to install the Microsoft Cognitive Toolkit on an Azure HDInsight Spark cluster.
- Upload a Jupyter Notebook to the Apache Spark cluster to see how to apply a trained Microsoft Cognitive Toolkit deep learning model to files in an Azure Blob storage account using the Spark Python API (PySpark).
Prerequisites
An Apache Spark cluster on HDInsight. See Create an Apache Spark cluster.
Familiarity with using Jupyter Notebooks with Spark on HDInsight. For more information, see Load data and run queries with Apache Spark on HDInsight.
How does this solution flow?
This solution is divided between this article and a Jupyter Notebook that you upload as part of this article. In this article, you complete the following steps:
- Run a script action on an HDInsight Spark cluster to install Microsoft Cognitive Toolkit and Python packages.
- Upload the Jupyter Notebook that runs the solution to the HDInsight Spark cluster.
The remaining steps are covered in the Jupyter Notebook:

- Load sample images into a Spark resilient distributed dataset (RDD).
  - Load modules and define presets.
  - Download the dataset locally on the Spark cluster.
  - Convert the dataset into an RDD.
- Score the images using a trained Cognitive Toolkit model.
  - Download the trained Cognitive Toolkit model to the Spark cluster.
  - Define functions to be used by worker nodes.
  - Score the images on worker nodes.
- Evaluate model accuracy.
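To make the scoring pattern concrete, the following is a minimal plain-Python sketch of what the notebook does with PySpark: a scoring function is shipped to worker nodes (via `rdd.mapPartitions` in the notebook), each partition's images are scored, and accuracy is computed against the true labels. The names `stub_model`, `score_partition`, and `accuracy` are illustrative placeholders, not names from the notebook, and the stub model stands in for the trained Cognitive Toolkit network so the control flow runs anywhere.

```python
def stub_model(image):
    # Placeholder "model": predicts class 0 for dark images, 1 otherwise.
    # In the notebook, this is the trained CNTK network instead.
    return 0 if sum(image) / len(image) < 0.5 else 1

def score_partition(images):
    # Each worker loads the model once per partition, then scores
    # every image in that partition.
    model = stub_model  # in the notebook: load the downloaded CNTK model
    for image in images:
        yield model(image)

def accuracy(predictions, labels):
    # Fraction of images whose predicted class matches the true label.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

if __name__ == "__main__":
    dataset = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.3], [0.7, 0.9]]  # toy "images"
    labels = [0, 1, 0, 1]
    preds = list(score_partition(dataset))
    print(preds)                    # [0, 1, 0, 1]
    print(accuracy(preds, labels))  # 1.0
```

In the notebook, the same shape appears as `rdd.mapPartitions(score_partition)`, which lets each executor load the model once rather than once per image.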
Install Microsoft Cognitive Toolkit
You can install the Microsoft Cognitive Toolkit on a Spark cluster using a script action. Script actions use custom scripts to install components on the cluster that aren't available by default. You can run the custom script from the Azure portal, with the HDInsight .NET SDK, or with Azure PowerShell. You can use the script to install the toolkit either as part of cluster creation, or after the cluster is up and running.
In this article, we use the portal to install the toolkit after the cluster has been created. For other ways to run the custom script, see Customize HDInsight clusters using Script Action.
Use the Azure portal
For instructions on how to use the Azure portal to run a script action, see Customize HDInsight clusters using Script Action. Use the following values for your script action to install the Microsoft Cognitive Toolkit:
| Property | Value |
|---|---|
| Script type | Custom |
| Name | Install MCT |
| Bash script URI | https://raw.githubusercontent.com/Azure-Samples/hdinsight-pyspark-cntk-integration/master/cntk-install.sh |
| Node type(s) | Head, Worker |
| Parameters | None |
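The same script action can also be submitted programmatically. The sketch below builds the JSON body used by the HDInsight "Execute Script Actions" REST operation from the table values above. The field names (`scriptActions`, `uri`, `roles`, `persistOnSuccess`) follow my reading of the Azure REST API and should be treated as an assumption to verify against the current API reference before use.

```python
# Sketch: the script action from the table expressed as the JSON body
# accepted by the HDInsight "Execute Script Actions" REST operation.
# Field names are an assumption; verify against the Azure REST API docs.
import json

script_action_body = {
    "scriptActions": [
        {
            "name": "Install MCT",
            "uri": ("https://raw.githubusercontent.com/Azure-Samples/"
                    "hdinsight-pyspark-cntk-integration/master/cntk-install.sh"),
            "roles": ["headnode", "workernode"],  # Head and Worker node types
        }
    ],
    # Keep the script so it also runs on nodes added when scaling out.
    "persistOnSuccess": True,
}

print(json.dumps(script_action_body, indent=2))
```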
Upload the Jupyter Notebook to Azure HDInsight Spark cluster
To use the Microsoft Cognitive Toolkit with the Azure HDInsight Spark cluster, you must upload the Jupyter Notebook CNTK_model_scoring_on_Spark_walkthrough.ipynb to the cluster. This notebook is available on GitHub at https://github.com/Azure-Samples/hdinsight-pyspark-cntk-integration.

1. Download and unzip https://github.com/Azure-Samples/hdinsight-pyspark-cntk-integration.
2. In a web browser, navigate to https://CLUSTERNAME.azurehdinsight.net/jupyter, where CLUSTERNAME is the name of your cluster.
3. From the Jupyter Notebook, select Upload in the top-right corner, navigate to the download, and select the file CNTK_model_scoring_on_Spark_walkthrough.ipynb.
4. Select Upload again.
5. After the notebook is uploaded, select the name of the notebook and then follow the instructions in the notebook itself on how to load the dataset and complete the walkthrough.
See also
Scenarios
- Apache Spark with BI: Perform interactive data analysis using Spark in HDInsight with BI tools
- Apache Spark with Machine Learning: Use Spark in HDInsight to analyze building temperature using HVAC data
- Apache Spark with Machine Learning: Use Spark in HDInsight to predict food inspection results
- Website log analysis using Apache Spark in HDInsight
- Application Insight telemetry data analysis using Apache Spark in HDInsight
Create and run applications
- Create a standalone application using Scala
- Run jobs remotely on an Apache Spark cluster using Apache Livy
Tools and extensions
- Use the HDInsight Tools Plugin for IntelliJ IDEA to create and submit Spark Scala applications
- Use the HDInsight Tools Plugin for IntelliJ IDEA to debug Apache Spark applications remotely
- Use Apache Zeppelin notebooks with an Apache Spark cluster on HDInsight
- Kernels available for Jupyter Notebook in the Apache Spark cluster for HDInsight
- Use external packages with Jupyter Notebooks
- Install Jupyter on your computer and connect to an HDInsight Spark cluster