This multi-part tutorial covers the creation of a new data source extension for Power Query. The tutorial is meant to be done sequentially—each lesson builds on the connector created in previous lessons, incrementally adding new capabilities to your connector.
In this lesson, you:
- Define a fixed schema for a REST API
- Dynamically set data types for columns
- Enforce a table structure to avoid transformation errors due to missing columns
- Hide columns from the result set
One of the big advantages of an OData service over a standard REST API is its $metadata definition. The $metadata document describes the data found on the service, including the schema for all of its Entities (Tables) and Fields (Columns). The OData.Feed function uses this schema definition to automatically set data type information, so instead of getting all text and number fields (as you would from Json.Document), end users get dates, whole numbers, times, and so on, providing a better overall user experience.
Many REST APIs don't have a way to programmatically determine their schema. In these cases, you need to include schema definitions within your connector. In this lesson, you define a simple, hardcoded schema for each of your tables, and enforce the schema on the data you read from the service.
Note
The approach described here should work for many REST services. Future lessons build upon this approach by recursively enforcing schemas on structured columns (record, list, table). They also provide sample implementations that can programmatically generate a schema table from CSDL or JSON Schema documents.
Overall, enforcing a schema on the data returned by your connector has multiple benefits, such as:
- Setting the correct data types
- Removing columns that don't need to be shown to end users (such as internal IDs or state information)
- Ensuring that each page of data has the same shape by adding any columns that might be missing from a response (a common way for REST APIs to indicate a field should be null)
Viewing the existing schema with Table.Schema
The connector created in the previous lesson displays three tables from the TripPin service: Airlines, Airports, and People. Run the following query to view the Airlines table:
let
    source = TripPin.Contents(),
    data = source{[Name="Airlines"]}[Data]
in
    data
In the results, four columns are returned:
- @odata.id
- @odata.editLink
- AirlineCode
- Name
The "@odata.*" columns are part of the OData protocol, and not something you want or need to show to the end users of your connector. AirlineCode and Name are the two columns you want to keep. If you look at the schema of the table (using the handy Table.Schema function), you can see that all of the columns in the table have a data type of Any.Type.
let
    source = TripPin.Contents(),
    data = source{[Name="Airlines"]}[Data]
in
    Table.Schema(data)
Table.Schema returns a lot of metadata about the columns in a table, including names, positions, type information, and many advanced properties, such as Precision, Scale, and MaxLength. Future lessons provide design patterns for setting these advanced properties; for now, you only need to concern yourself with the ascribed type (TypeName), primitive type (Kind), and whether the column value might be null (IsNullable).
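For example, you can narrow the Table.Schema output to just those three fields with Table.SelectColumns:

```powerquery-m
let
    source = TripPin.Contents(),
    data = source{[Name="Airlines"]}[Data],
    // Keep only the schema fields relevant to this lesson
    schema = Table.SelectColumns(Table.Schema(data), {"Name", "TypeName", "Kind", "IsNullable"})
in
    schema
```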
Defining a simple schema table
Your schema table is going to be composed of two columns:
| Column | Details |
|---|---|
| Name | The name of the column. This name must match the name in the results returned by the service. |
| Type | The M data type you're going to set. This type can be a primitive type (text, number, datetime, and so on), or an ascribed type (Int64.Type, Currency.Type, and so on). |
The hardcoded schema table for the Airlines table sets its AirlineCode and Name columns to text, and looks like this:
Airlines = #table({"Name", "Type"}, {
    {"AirlineCode", type text},
    {"Name", type text}
});
The Airports table has four fields you want to keep (including one of type record):
Airports = #table({"Name", "Type"}, {
    {"IcaoCode", type text},
    {"Name", type text},
    {"IataCode", type text},
    {"Location", type record}
});
Finally, the People table has seven fields, including lists (Emails, AddressInfo), a nullable column (Gender), and a column with an ascribed type (Concurrency).
People = #table({"Name", "Type"}, {
    {"UserName", type text},
    {"FirstName", type text},
    {"LastName", type text},
    {"Emails", type list},
    {"AddressInfo", type list},
    {"Gender", type nullable text},
    {"Concurrency", Int64.Type}
});
The SchemaTransformTable helper function
The SchemaTransformTable helper function described in this section is used to enforce schemas on your data. It takes the following parameters:
| Parameter | Type | Description |
|---|---|---|
| table | table | The table of data you want to enforce your schema on. |
| schema | table | The schema table to read column information from, with the following type: type table [Name = text, Type = type]. |
| enforceSchema | number | (optional) An enum that controls the behavior of the function. The default value (EnforceSchema.Strict = 1) ensures that the output table matches the provided schema table by adding any missing columns and removing extra columns. The EnforceSchema.IgnoreExtraColumns = 2 option can be used to preserve extra columns in the result. When EnforceSchema.IgnoreMissingColumns = 3 is used, both missing columns and extra columns are ignored. |
The logic for this function looks something like this:
- Determine if there are any missing columns from the source table.
- Determine if there are any extra columns.
- Ignore structured columns (of type list, record, and table), and columns set to type any.
- Use Table.TransformColumnTypes to set each column type.
- Reorder columns based on the order they appear in the schema table.
- Set the type on the table itself using Value.ReplaceType.
Note
The last step of setting the table type removes the need for the Power Query UI to infer type information when viewing the results in the query editor. It also removes the double-request issue you saw at the end of the previous tutorial.
The following helper code can be copied and pasted into your extension:
EnforceSchema.Strict = 1;               // Add any missing columns, remove extra columns, set table type
EnforceSchema.IgnoreExtraColumns = 2;   // Add missing columns, do not remove extra columns
EnforceSchema.IgnoreMissingColumns = 3; // Do not add or remove columns

SchemaTransformTable = (table as table, schema as table, optional enforceSchema as number) as table =>
    let
        // Default to EnforceSchema.Strict
        _enforceSchema = if (enforceSchema <> null) then enforceSchema else EnforceSchema.Strict,

        // Applies type transforms to a given table
        EnforceTypes = (table as table, schema as table) as table =>
            let
                map = (t) => if Type.Is(t, type list) or Type.Is(t, type record) or t = type any then null else t,
                mapped = Table.TransformColumns(schema, {"Type", map}),
                omitted = Table.SelectRows(mapped, each [Type] <> null),
                existingColumns = Table.ColumnNames(table),
                removeMissing = Table.SelectRows(omitted, each List.Contains(existingColumns, [Name])),
                primitiveTransforms = Table.ToRows(removeMissing),
                changedPrimitives = Table.TransformColumnTypes(table, primitiveTransforms)
            in
                changedPrimitives,

        // Returns the table type for a given schema
        SchemaToTableType = (schema as table) as type =>
            let
                toList = List.Transform(schema[Type], (t) => [Type=t, Optional=false]),
                toRecord = Record.FromList(toList, schema[Name]),
                toType = Type.ForRecord(toRecord, false)
            in
                type table (toType),

        // Determine if we have extra/missing columns.
        // The enforceSchema parameter determines what we do about them.
        schemaNames = schema[Name],
        foundNames = Table.ColumnNames(table),
        addNames = List.RemoveItems(schemaNames, foundNames),
        extraNames = List.RemoveItems(foundNames, schemaNames),
        tmp = Text.NewGuid(),
        added = Table.AddColumn(table, tmp, each []),
        expanded = Table.ExpandRecordColumn(added, tmp, addNames),
        result = if List.IsEmpty(addNames) then table else expanded,
        fullList =
            if (_enforceSchema = EnforceSchema.Strict) then
                schemaNames
            else if (_enforceSchema = EnforceSchema.IgnoreMissingColumns) then
                foundNames
            else
                schemaNames & extraNames,

        // Select the final list of columns.
        // These will be ordered according to the schema table.
        reordered = Table.SelectColumns(result, fullList, MissingField.Ignore),
        enforcedTypes = EnforceTypes(reordered, schema),
        withType = if (_enforceSchema = EnforceSchema.Strict) then Value.ReplaceType(enforcedTypes, SchemaToTableType(schema)) else enforcedTypes
    in
        withType;
Updating the TripPin connector
To make use of the new schema enforcement code, make the following changes to your connector.
- Define a master schema table (SchemaTable) that holds all of your schema definitions.
- Update the TripPin.Feed, GetPage, and GetAllPagesByNextLink functions to accept a schema parameter.
- Enforce your schema in GetPage.
- Update your navigation table code to wrap each table with a call to a new function (GetEntity). This function gives you more flexibility to manipulate the table definitions in the future.
Master schema table
Now consolidate your schema definitions into a single table, and add a helper function (GetSchemaForEntity) that lets you look up the definition based on an entity name (for example, GetSchemaForEntity("Airlines")).
SchemaTable = #table({"Entity", "SchemaTable"}, {
    {"Airlines", #table({"Name", "Type"}, {
        {"AirlineCode", type text},
        {"Name", type text}
    })},
    {"Airports", #table({"Name", "Type"}, {
        {"IcaoCode", type text},
        {"Name", type text},
        {"IataCode", type text},
        {"Location", type record}
    })},
    {"People", #table({"Name", "Type"}, {
        {"UserName", type text},
        {"FirstName", type text},
        {"LastName", type text},
        {"Emails", type list},
        {"AddressInfo", type list},
        {"Gender", type nullable text},
        {"Concurrency", Int64.Type}
    })}
});
GetSchemaForEntity = (entity as text) as table =>
    try SchemaTable{[Entity=entity]}[SchemaTable] otherwise error "Couldn't find entity: '" & entity & "'";
Adding schema support to data functions
Now add an optional schema parameter to the TripPin.Feed, GetPage, and GetAllPagesByNextLink functions.
This parameter allows you to pass down the schema (when you want to) to the paging functions, where it's applied to the results you get back from the service.
TripPin.Feed = (url as text, optional schema as table) as table => ...
GetPage = (url as text, optional schema as table) as table => ...
GetAllPagesByNextLink = (url as text, optional schema as table) as table => ...
You also need to update all of the calls to these functions to make sure that you pass the schema through correctly.
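For example, the paging logic needs to hand the schema to each page request. A sketch of what this might look like, assuming the Table.GenerateByPage helper and the GetAllPagesByNextLink implementation from the earlier paging lesson:

```powerquery-m
GetAllPagesByNextLink = (url as text, optional schema as table) as table =>
    Table.GenerateByPage((previous) =>
        let
            // If we have a previous page, read its NextLink from metadata
            nextLink = if (previous = null) then url else Value.Metadata(previous)[NextLink]?,
            // Pass the schema down to each page request
            page = if (nextLink <> null) then GetPage(nextLink, schema) else null
        in
            page
    );
```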
Enforcing the schema
The actual schema enforcement is done in your GetPage function.
GetPage = (url as text, optional schema as table) as table =>
    let
        response = Web.Contents(url, [ Headers = DefaultRequestHeaders ]),
        body = Json.Document(response),
        nextLink = GetNextLink(body),
        data = Table.FromRecords(body[value]),
        // enforce the schema
        withSchema = if (schema <> null) then SchemaTransformTable(data, schema) else data
    in
        withSchema meta [NextLink = nextLink];
Note
This GetPage implementation uses Table.FromRecords to convert the list of records in the JSON response to a table. A major downside to using Table.FromRecords is that it assumes all records in the list have the same set of fields. This assumption holds for the TripPin service, since the OData records are guaranteed to contain the same fields, but it might not hold for all REST APIs.
A more robust implementation would use a combination of Table.FromList and Table.ExpandRecordColumn. Later tutorials demonstrate how to change the implementation to get the column list from the schema table, ensuring that no columns are lost or missing during the JSON to M translation.
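A minimal sketch of that more robust pattern, assuming the schema table is in scope to drive the expansion (these are hypothetical let-expression steps, not a complete GetPage):

```powerquery-m
// Convert the list of records to a single-column table without merging
// record fields, then expand using the schema's column names so that
// missing fields become null rather than being dropped.
asTable = Table.FromList(body[value], Splitter.SplitByNothing(), {"Column1"}),
expanded = Table.ExpandRecordColumn(asTable, "Column1", schema[Name]),
```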
Adding the GetEntity function
The GetEntity function wraps your call to TripPin.Feed. It looks up a schema definition based on the entity name, and builds the full request URL.
GetEntity = (url as text, entity as text) as table =>
    let
        fullUrl = Uri.Combine(url, entity),
        schemaTable = GetSchemaForEntity(entity),
        result = TripPin.Feed(fullUrl, schemaTable)
    in
        result;
You then update your TripPinNavTable function to call GetEntity, rather than making all of the calls inline.
The main advantage to this update is that it lets you continue modifying your entity building code, without having to touch your nav table logic.
TripPinNavTable = (url as text) as table =>
    let
        entitiesAsTable = Table.FromList(RootEntities, Splitter.SplitByNothing()),
        rename = Table.RenameColumns(entitiesAsTable, {{"Column1", "Name"}}),
        // Add Data as a calculated column
        withData = Table.AddColumn(rename, "Data", each GetEntity(url, [Name]), type table),
        // Add ItemKind and ItemName as fixed text values
        withItemKind = Table.AddColumn(withData, "ItemKind", each "Table", type text),
        withItemName = Table.AddColumn(withItemKind, "ItemName", each "Table", type text),
        // Indicate that the node should not be expandable
        withIsLeaf = Table.AddColumn(withItemName, "IsLeaf", each true, type logical),
        // Generate the nav table
        navTable = Table.ToNavigationTable(withIsLeaf, {"Name"}, "Name", "Data", "ItemKind", "ItemName", "IsLeaf")
    in
        navTable;
Putting it all together
Once all of the code changes are made, compile and rerun the test query that calls Table.Schema for the Airlines table.
let
    source = TripPin.Contents(),
    data = source{[Name="Airlines"]}[Data]
in
    Table.Schema(data)
Your Airlines table now contains only the two columns you defined in its schema.
If you run the same code against the People table...
let
    source = TripPin.Contents(),
    data = source{[Name="People"]}[Data]
in
    Table.Schema(data)
Note that the ascribed type you used (Int64.Type) was also set correctly.
Although this implementation of SchemaTransformTable doesn't modify the types of list and record columns, the Emails and AddressInfo columns are still typed as list. This is because Json.Document correctly maps JSON arrays to M lists and JSON objects to M records. If you expand a list or record column in Power Query, all of the expanded columns are of type any. Future tutorials improve the implementation to recursively set type information for nested complex types.
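You can see this behavior by expanding one of the list columns and checking its schema again, for example:

```powerquery-m
let
    source = TripPin.Contents(),
    people = source{[Name="People"]}[Data],
    // Each email becomes its own row; the expanded column is typed as any
    expanded = Table.ExpandListColumn(people, "Emails")
in
    Table.Schema(expanded)
```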
Conclusion
This tutorial provided a sample implementation for enforcing a schema on JSON data returned from a REST service. While this sample uses a simple hardcoded schema table format, the approach could be expanded upon by dynamically building a schema table definition from another source, such as a JSON schema file, or metadata service/endpoint exposed by the data source.
In addition to modifying column types (and values), your code also sets the correct type information on the table itself. Setting this type information benefits performance when running inside Power Query, because the user experience always attempts to infer type information to display the right UI cues to the end user, and these inference calls can end up triggering additional calls to the underlying data APIs.
If you view the People table using the TripPin connector from the previous lesson, all of the columns have a 'type any' icon (even the columns that contain lists).
If you run the same query with the TripPin connector from this lesson, the type information is now displayed correctly.