How to create and update a Spark job definition with the Microsoft Fabric REST API

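The Microsoft Fabric REST API provides a service endpoint for CRUD operations on Fabric items. This tutorial walks through an end-to-end scenario for creating and updating a Spark job definition item. It involves three high-level steps: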

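Create a Spark job definition item with some initial state.
Upload the main definition file and other library files.
Update the Spark job definition item with the OneLake URLs of the main definition file and the other library files.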

Prerequisites

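A Microsoft Entra token is required to access the Fabric REST API. The MSAL library is recommended for acquiring the token. For more information, see Authentication flow support in MSAL.
A storage token is required to access the OneLake API. For more information, see MSAL for Python.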

Create a Spark job definition item with the initial state

The Microsoft Fabric REST API defines a unified endpoint for CRUD operations on Fabric items. The endpoint is https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items.

The item details are specified in the request body. Here is an example request body for creating a Spark job definition item:

{
    "displayName": "SJDHelloWorld",
    "type": "SparkJobDefinition",
    "definition": {
        "format": "SparkJobDefinitionV1",
        "parts": [
            {
                "path": "SparkJobDefinitionV1.json",
                "payload": "<REDACTED>",
                "payloadType": "InlineBase64"
            }
        ]
    }
}

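In this example, the Spark job definition item is named SJDHelloWorld. The payload field is the Base64-encoded content of the detailed settings. After decoding, the content is: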

{
    "executableFile":null,
    "defaultLakehouseArtifactId":"",
    "mainClass":"",
    "additionalLakehouseIds":[],
    "retryPolicy":null,
    "commandLineArguments":"",
    "additionalLibraryUris":[],
    "language":"",
    "environmentArtifactId":null
}

Here are two helper functions for encoding and decoding the detailed settings:

import base64
import json

def json_to_base64(json_data):
    # Serialize the JSON data to a string
    json_string = json.dumps(json_data)
    
    # Encode the JSON string as bytes
    json_bytes = json_string.encode('utf-8')
    
    # Encode the bytes as Base64
    base64_encoded = base64.b64encode(json_bytes).decode('utf-8')
    
    return base64_encoded

def base64_to_json(base64_data):
    # Decode the Base64-encoded string to bytes
    base64_bytes = base64_data.encode('utf-8')
    
    # Decode the bytes to a JSON string
    json_string = base64.b64decode(base64_bytes).decode('utf-8')
    
    # Deserialize the JSON string to a Python dictionary
    json_data = json.loads(json_string)
    
    return json_data
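As a quick check of the encoding step, the following sketch encodes the initial settings shown earlier and decodes them back. The names `encode_definition` and `decode_definition` are illustrative stand-ins written for this example, not part of any Fabric SDK.

```python
import base64
import json

# Initial settings for a new Spark job definition, matching the decoded example.
initial_definition = {
    "executableFile": None,
    "defaultLakehouseArtifactId": "",
    "mainClass": "",
    "additionalLakehouseIds": [],
    "retryPolicy": None,
    "commandLineArguments": "",
    "additionalLibraryUris": [],
    "language": "",
    "environmentArtifactId": None,
}

def encode_definition(definition):
    """Serialize a dict to JSON and return it as a Base64 string."""
    return base64.b64encode(json.dumps(definition).encode("utf-8")).decode("utf-8")

def decode_definition(payload):
    """Reverse encode_definition: Base64 string back to a dict."""
    return json.loads(base64.b64decode(payload).decode("utf-8"))

payload = encode_definition(initial_definition)
round_tripped = decode_definition(payload)
print(round_tripped == initial_definition)
```

The resulting `payload` string is what would go into the "payload" field of the request body, with "payloadType" set to "InlineBase64".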

Here is the code snippet for creating a Spark job definition item:

import requests

bearerToken = "<REDACTED>"  # Replace this token with the real AAD token

headers = {
    "Authorization": f"Bearer {bearerToken}", 
    "Content-Type": "application/json"  # Set the content type based on your request
}

payload = "<REDACTED>"

# Define the payload data for the POST request
payload_data = {
    "displayName": "SJDHelloWorld",
    "type": "SparkJobDefinition",
    "definition": {
        "format": "SparkJobDefinitionV1",
        "parts": [
            {
                "path": "SparkJobDefinitionV1.json",
                "payload": payload,
                "payloadType": "InlineBase64"
            }
        ]
    }
}

# Make the POST request with Bearer authentication
workspaceId = "<workspaceId>"  # Replace with the ID of your workspace
sjdCreateUrl = f"https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items"
response = requests.post(sjdCreateUrl, json=payload_data, headers=headers)
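Later steps need the ID of the newly created item, which is returned in the response body of the create call. The sketch below pulls it out under two assumptions that should be checked against the API reference: a successful create returns status 201, and the JSON body carries the ID in an `id` field. The helper name `extract_item_id` and the sample body are hypothetical.

```python
def extract_item_id(status_code, body):
    """Return the new item's ID from a create-item response.

    Assumes a 201 response whose JSON body has an "id" field; anything
    else is treated as a failure for the purposes of this sketch.
    """
    if status_code != 201:
        raise RuntimeError(f"Create request did not return 201: {status_code} {body}")
    item_id = body.get("id")
    if not item_id:
        raise ValueError(f"No 'id' field in response body: {body}")
    return item_id

# Hand-written sample body; with requests you would pass
# response.status_code and response.json() instead.
sample_body = {
    "id": "00000000-0000-0000-0000-000000000001",
    "type": "SparkJobDefinition",
    "displayName": "SJDHelloWorld",
}
sjdartifactid = extract_item_id(201, sample_body)
print(sjdartifactid)
```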

Upload the main definition file and other lib files

A storage token is required to upload files to OneLake. Here is a helper function for getting the storage token:

import msal

def getOnelakeStorageToken():
    app = msal.PublicClientApplication(
        "<REDACTED>",  # This field should be the client ID 
        authority="https://login.microsoftonline.com/microsoft.com")

    result = app.acquire_token_interactive(scopes=["https://storage.azure.com/.default"])

    print(f"Successfully acquired AAD token with storage audience:{result['access_token']}")

    return result['access_token']
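The helper above indexes `result['access_token']` directly, which raises a `KeyError` if sign-in fails. MSAL for Python returns a dictionary that contains `access_token` on success and `error` / `error_description` on failure, so a small guard can surface the reason. The function name `token_from_msal_result` is a hypothetical helper for this sketch, and the sample dictionaries are hand-written.

```python
def token_from_msal_result(result):
    """Return the access token from an MSAL result dict, or raise with the error details."""
    if "access_token" in result:
        return result["access_token"]
    raise RuntimeError(
        f"Token acquisition failed: {result.get('error')}: {result.get('error_description')}"
    )

# Hand-written sample results standing in for app.acquire_token_interactive(...)
ok_result = {"access_token": "sample-token", "token_type": "Bearer"}
failed_result = {"error": "access_denied", "error_description": "User cancelled the sign-in."}

print(token_from_msal_result(ok_result))
try:
    token_from_msal_result(failed_result)
except RuntimeError as exc:
    print(exc)
```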

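The Spark job definition item has now been created. To make it runnable, we need to set the main definition file and the required properties. The endpoint for uploading files for this SJD item is https://onelake.dfs.fabric.microsoft.com/{workspaceId}/{sjdartifactid}. Use the same workspaceId as in the previous step. The value of sjdartifactid can be found in the response body of the previous step. Here is the code snippet for setting up the main definition file: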

import requests

# Three steps are required: create file, append file, flush file

onelakeEndPoint = "https://onelake.dfs.fabric.microsoft.com/workspaceId/sjdartifactid"  # Replace the ID of workspace and artifact with the right one
mainExecutableFile = "main.py"  # The name of the main executable file
mainSubFolder = "Main"  # The sub folder name of the main executable file. Don't change this value


onelakeRequestMainFileCreateUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?resource=file"  # The URL for creating the main executable file via the 'file' resource type
onelakeStorageToken = getOnelakeStorageToken()  # The storage token comes from the helper function above
onelakePutRequestHeaders = {
    "Authorization": f"Bearer {onelakeStorageToken}",
}

onelakeCreateMainFileResponse = requests.put(onelakeRequestMainFileCreateUrl, headers=onelakePutRequestHeaders)
if onelakeCreateMainFileResponse.status_code == 201:
    # Request was successful
    print(f"Main File '{mainExecutableFile}' was successfully created in OneLake.")

# With the previous step, the main executable file is created in OneLake. Now we need to append the content of the main executable file

appendPosition = 0
appendAction = "append"

### Main File Append.
onelakeRequestMainFileAppendUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?position={appendPosition}&action={appendAction}"
mainFileContents = "<REDACTED>"  # The content of the main executable file, please replace this with the real content of the main executable file
mainFileBytes = mainFileContents.encode("utf-8")  # Send explicit UTF-8 bytes so the byte count below matches what is uploaded
mainExecutableFileSizeInBytes = len(mainFileBytes)  # The size of the main executable file in bytes, used later as the flush position

onelakePatchRequestHeaders = {
    "Authorization": f"Bearer {onelakeStorageToken}",
    "Content-Type": "text/plain"
}

onelakeAppendMainFileResponse = requests.patch(onelakeRequestMainFileAppendUrl, data=mainFileBytes, headers=onelakePatchRequestHeaders)
if onelakeAppendMainFileResponse.status_code == 202:
    # Request was successful
    print(f"Successfully accepted main file '{mainExecutableFile}' append data.")

# With the previous step, the content of the main executable file is appended to the file in OneLake. Now we need to flush the file

flushAction = "flush"

### Main File flush
onelakeRequestMainFileFlushUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?position={mainExecutableFileSizeInBytes}&action={flushAction}"
print(onelakeRequestMainFileFlushUrl)
onelakeFlushMainFileResponse = requests.patch(onelakeRequestMainFileFlushUrl, headers=onelakePatchRequestHeaders)
if onelakeFlushMainFileResponse.status_code == 200:
    print(f"Successfully flushed main file '{mainExecutableFile}' contents.")
else:
    print(onelakeFlushMainFileResponse.json())

Follow the same process to upload other lib files as needed.
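Because the create, append, and flush calls differ only in their query strings, the three URLs for any file can be derived from one helper. The sketch below is one way to do that; `build_upload_urls` and the file name `utils.py` are hypothetical, and the `Libs` subfolder is taken from the ABFSS library path used in the update step. The flush position is computed from the UTF-8 byte length rather than the character count, since the two differ for non-ASCII content.

```python
def build_upload_urls(onelake_endpoint, sub_folder, file_name, content):
    """Return the create/append/flush URLs and the encoded bytes for one file."""
    data = content.encode("utf-8")
    base = f"{onelake_endpoint}/{sub_folder}/{file_name}"
    urls = {
        "create": f"{base}?resource=file",                     # PUT: create an empty file
        "append": f"{base}?position=0&action=append",          # PATCH: send the bytes
        "flush": f"{base}?position={len(data)}&action=flush",  # PATCH: commit at final byte offset
    }
    return urls, data

# Placeholder endpoint, as in the snippet above; replace the IDs with real values.
endpoint = "https://onelake.dfs.fabric.microsoft.com/workspaceId/sjdartifactid"
lib_contents = "# caf\u00e9\n"  # 7 characters, 8 bytes in UTF-8

urls, data = build_upload_urls(endpoint, "Libs", "utils.py", lib_contents)
for step, url in urls.items():
    print(step, url)
```

Each URL would then be used with the same `requests.put` / `requests.patch` calls and headers as for the main file.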

Update the Spark job definition item with the OneLake URLs of the main definition file and other lib files

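So far, we have created a Spark job definition item with some initial state and uploaded the main definition file and other lib files. The last step is to update the Spark job definition item to set the URL properties of the main definition file and the other lib files. The endpoint for updating the Spark job definition item is https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{sjdartifactid}. Use the same workspaceId and sjdartifactid as in the previous steps. Here is the code snippet for updating the Spark job definition item, which posts to the updateDefinition path under that URL: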

mainAbfssPath = f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{sjdartifactid}/Main/{mainExecutableFile}"  # The workspaceId and sjdartifactid are the same as previous steps, the mainExecutableFile is the name of the main executable file
libsAbfssPath = f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{sjdartifactid}/Libs/{libsFile}"  # The workspaceId and sjdartifactid are the same as previous steps, the libsFile is the name of the libs file
defaultLakehouseId = '<REDACTED>'  # Replace this with the real default lakehouse ID

updateRequestBodyJson = {
    "executableFile": mainAbfssPath,
    "defaultLakehouseArtifactId": defaultLakehouseId,
    "mainClass": "",
    "additionalLakehouseIds": [],
    "retryPolicy": None,
    "commandLineArguments": "",
    "additionalLibraryUris": [libsAbfssPath],
    "language": "Python",
    "environmentArtifactId": None}

# Encode the bytes as a Base64-encoded string
base64EncodedUpdateSJDPayload = json_to_base64(updateRequestBodyJson)

# Print the Base64-encoded string
print("Base64-encoded JSON payload for SJD Update:")
print(base64EncodedUpdateSJDPayload)

# Define the API URL
updateSjdUrl = f"https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{sjdartifactid}/updateDefinition"

updatePayload = base64EncodedUpdateSJDPayload
payloadType = "InlineBase64"
path = "SparkJobDefinitionV1.json"
format = "SparkJobDefinitionV1"
Type = "SparkJobDefinition"

# Define the headers with Bearer authentication
bearerToken = "<REDACTED>"  # Replace this token with the real AAD token

headers = {
    "Authorization": f"Bearer {bearerToken}", 
    "Content-Type": "application/json"  # Set the content type based on your request
}

# Define the payload data for the POST request
payload_data = {
    "displayName": "sjdCreateTest11",
    "type": Type,
    "definition": {
        "format": format,
        "parts": [
            {
                "path": path,
                "payload": updatePayload,
                "payloadType": payloadType
            }
        ]
    }
}


# Make the POST request with Bearer authentication
response = requests.post(updateSjdUrl, json=payload_data, headers=headers)
if response.status_code == 200:
    print("Successfully updated SJD.")
else:
    print(response.json())
    print(response.status_code)
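The update body above can also be assembled by a single helper, which keeps the ABFSS paths and the Base64 payload in one place. This is a sketch only: `build_update_body`, the sample IDs, and the file names are hypothetical, and the body mirrors the fields used in the snippet above rather than a confirmed minimal set for the endpoint.

```python
import base64
import json

def build_update_body(workspace_id, item_id, display_name, main_file, lib_files, lakehouse_id):
    """Build the request body for updating a Spark job definition."""
    root = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{item_id}"
    settings = {
        "executableFile": f"{root}/Main/{main_file}",
        "defaultLakehouseArtifactId": lakehouse_id,
        "mainClass": "",
        "additionalLakehouseIds": [],
        "retryPolicy": None,
        "commandLineArguments": "",
        "additionalLibraryUris": [f"{root}/Libs/{name}" for name in lib_files],
        "language": "Python",
        "environmentArtifactId": None,
    }
    payload = base64.b64encode(json.dumps(settings).encode("utf-8")).decode("utf-8")
    return {
        "displayName": display_name,
        "type": "SparkJobDefinition",
        "definition": {
            "format": "SparkJobDefinitionV1",
            "parts": [
                {
                    "path": "SparkJobDefinitionV1.json",
                    "payload": payload,
                    "payloadType": "InlineBase64",
                }
            ],
        },
    }

# Sample values for illustration only.
body = build_update_body("ws-id", "sjd-id", "SJDHelloWorld", "main.py", ["utils.py"], "lakehouse-id")
decoded = json.loads(base64.b64decode(body["definition"]["parts"][0]["payload"]))
print(decoded["executableFile"])
```

The returned dictionary can be passed as `json=` to `requests.post`, in place of the hand-built `payload_data`.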

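To recap the whole process, both the Fabric REST API and the OneLake API are needed to create and update a Spark job definition item. The Fabric REST API is used to create and update the Spark job definition item, and the OneLake API is used to upload the main definition file and other lib files. The main definition file and other lib files are uploaded to OneLake first. Then the URL properties of the main definition file and other lib files are set in the Spark job definition item.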

Schedule and run an Apache Spark job definition


Last updated on 2025-12-05