Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
APPLIES TO:
Azure Data Factory
Azure Synapse Analytics
Tip
Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!
Data flows are available in both Azure Data Factory pipelines and Azure Synapse Analytics pipelines. This article applies to mapping data flows. If you're new to transformations, refer to the introductory article Transform data using mapping data flows.
This article provides details about the use of all expressions and functions supported by Azure Data Factory and Azure Synapse Analytics in mapping data flows. For summaries of each type of function supported, refer to the following articles:
- Aggregate functions
- Array functions
- Cached lookup functions
- Conversion functions
- Date and time functions
- Expression functions
- Map functions
- Metafunctions
- Window functions
Alphabetical list of all functions
The following alphabetical list includes all the functions that are available in mapping data flows.
A
abs
abs(<value1> : number) => number
Returns the absolute value of a number.
abs(-20) -> 20abs(10) -> 10
acos
acos(<value1> : number) => double
Calculates a cosine inverse value.
acos(1) -> 0.0
add
add(<value1> : any, <value2> : any) => any
Adds a pair of strings or numbers. Adds a date to many days. Adds a duration to a time stamp. Appends one array of a similar type to another. Same as the + operator.
add(10, 20) -> 3010 + 20 -> 30add('ice', 'cream') -> 'icecream''ice' + 'cream' + ' cone' -> 'icecream cone'add(toDate('2012-12-12'), 3) -> toDate('2012-12-15')toDate('2012-12-12') + 3 -> toDate('2012-12-15')[10, 20] + [30, 40] -> [10, 20, 30, 40]toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') + (days(1) + hours(2) - seconds(10)) -> toTimestamp('2019-02-04 07:19:18.871', 'yyyy-MM-dd HH:mm:ss.SSS')
addDays
addDays(<date/timestamp> : datetime, <days to add> : integral) => datetime
Adds days to a date or time stamp. Same as the + operator for dates.
addDays(toDate('2016-08-08'), 1) -> toDate('2016-08-09')
addMonths
addMonths(<date/timestamp> : datetime, <months to add> : integral, [<value3> : string]) => datetime
Adds months to a date or time stamp. You can optionally pass a time zone.
addMonths(toDate('2016-08-31'), 1) -> toDate('2016-09-30')addMonths(toTimestamp('2016-09-30 10:10:10'), -1) -> toTimestamp('2016-08-31 10:10:10')
and
and(<value1> : boolean, <value2> : boolean) => boolean
Uses the logical AND operator. Same as &&.
and(true, false) -> falsetrue && false -> false
approxDistinctCount
approxDistinctCount(<value1> : any, [ <value2> : double ]) => long
Gets the approximate aggregate count of distinct values for a column. The optional second parameter is to control the estimation error.
approxDistinctCount(ProductID, .05) => long
array
array([<value1> : any], ...) => array
Creates an array of items. All items should be of the same type. If no items are specified, an empty string array is the default. Same as a [] creation operator.
array('Seattle', 'Washington')['Seattle', 'Washington']['Seattle', 'Washington'][1]'Washington'
ascii
ascii(<Input> : string) => number
Returns the numeric value of the input character. If the input string has more than one character, the numeric value of the first character is returned.
ascii('A') -> 65ascii('a') -> 97
asin
asin(<value1> : number) => double
Calculates an inverse sine value.
asin(0) -> 0.0
assertErrorMessages
assertErrorMessages() => map
Returns a map of all error messages for the row with assert ID as the key.
Examples
assertErrorMessages() => ['assert1': 'This row failed on assert1.', 'assert2': 'This row failed on assert2.']. In this example, at(assertErrorMessages(), 'assert1') would return 'This row failed on assert1.'
associate
reassociate(<value1> : map, <value2> : binaryFunction) => map
Creates a map of key/values. All the keys and values should be of the same type. If no items are specified, it defaults to a map of string to string type. Same as a [ -> ] creation operator. Keys and values should alternate with each other.
associate('fruit', 'apple', 'vegetable', 'carrot' )=> ['fruit' -> 'apple', 'vegetable' -> 'carrot']
at
at(<value1> : array/map, <value2> : integer/key type) => array
Finds the element at an array index. The index is 1 based. An out-of-bounds index results in a null value. Finds a value in a map when given a key. If the key isn't found, it returns a null value.
at(['apples', 'pears'], 1) => 'apples'at(['fruit' -> 'apples', 'vegetable' -> 'carrot'], 'fruit') => 'apples'
atan
atan(<value1> : number) => double
Calculates an inverse tangent value.
atan(0) -> 0.0
atan2
atan2(<value1> : number, <value2> : number) => double
Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates.
atan2(0, 0) -> 0.0
avg
avg(<value1> : number) => number
Gets the average of values of a column.
avg(sales)
avgIf
avgIf(<value1> : boolean, <value2> : number) => number
Gets the average of values of a column, based on criteria.
avgIf(region == 'West', sales)
B
between
between(<value1> : any, <value2> : any, <value3> : any) => boolean
Checks if the first value is in between two other values inclusively. You can compare numeric, string, and datetime values.
between(10, 5, 24)truebetween(currentDate(), currentDate() + 10, currentDate() + 20)false
bitwiseAnd
bitwiseAnd(<value1> : integral, <value2> : integral) => integral
Uses the bitwise And operator across integral types. Same as the & operator.
bitwiseAnd(0xf4, 0xef)0xe4(0xf4 & 0xef)0xe4
bitwiseOr
bitwiseOr(<value1> : integral, <value2> : integral) => integral
Uses the bitwise Or operator across integral types. Same as the | operator.
bitwiseOr(0xf4, 0xef)0xff(0xf4 | 0xef)0xff
bitwiseXor
bitwiseXor(<value1> : any, <value2> : any) => any
Uses the bitwise Or operator across integral types. Same as the | operator
bitwiseXor(0xf4, 0xef)0x1b(0xf4 ^ 0xef)0x1b(true ^ false)true(true ^ true)false
blake2b
blake2b(<value1> : integer, <value2> : any, ...) => string
Calculates the Blake2 digest of a set of columns of varying primitive data types when given a bit length, which can only be multiples of 8 between 8 and 512. You can use it to calculate a fingerprint for a row.
blake2b(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4'))'c9521a5080d8da30dffb430c50ce253c345cc4c4effc315dab2162dac974711d'
blake2bBinary
blake2bBinary(<value1> : integer, <value2> : any, ...) => binary
Calculates the Blake2 digest of a set of columns of varying primitive datatypes when given a bit length, which can only be multiples of 8 between 8 and 512. You can use it to calculate a fingerprint for a row.
blake2bBinary(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4'))unHex('c9521a5080d8da30dffb430c50ce253c345cc4c4effc315dab2162dac974711d')
byItem
byItem(<parent column> : any, <column name> : string) => any
Finds a sub item within a structure or array of structure. If there are multiple matches, the first match is returned. If there are no matches, a NULL value is returned. The returned value must be type converted by one of the type conversion actions (such as ? date and ? string). Address column names known at design time by their names. Computed inputs aren't supported, but you can use parameter substitutions.
byItem( byName('customer'), 'orderItems') ? (itemName as string, itemQty as integer)byItem( byItem( byName('customer'), 'orderItems'), 'itemName') ? string
byName
byName(<column name> : string, [<stream name> : string]) => any
Selects a column value by name in the stream. You can pass an optional stream name as the second argument. If there are multiple matches, the first match is returned. If there are no matches, a NULL value is returned. The returned value must be type converted by one of the type conversion functions (such as TO_DATE and TO_STRING). Address column names known at design time by their names. Computed inputs aren't supported, but you can use parameter substitutions.
toString(byName('parent'))toLong(byName('income'))toBoolean(byName('foster'))toLong(byName($debtCol))toString(byName('Bogus Column'))toString(byName('Bogus Column', 'DeriveStream'))
byNames
byNames(<column names> : array, [<stream name> : string]) => any
Select an array of columns by name in the stream. You can pass an optional stream name as the second argument. If there are multiple matches, the first match is returned. If there are no matches for a column, the entire output is a NULL value. The returned value requires a type conversion function (such as toDate and toString). Address column names known at design time by their names. Computed inputs aren't supported, but you can use parameter substitutions.
toString(byNames(['parent', 'child']))byNames(['parent']) ? stringtoLong(byNames(['income']))byNames(['income']) ? longtoBoolean(byNames(['foster']))toLong(byNames($debtCols))toString(byNames(['a Column']))toString(byNames(['a Column'], 'DeriveStream'))byNames(['orderItem']) ? (itemName as string, itemQty as integer)
byOrigin
byOrigin(<column name> : string, [<origin stream name> : string]) => any
Selects a column value by name in the origin stream. The second argument is the origin stream name. If there are multiple matches, the first match is returned. If there are no matches, a NULL value is returned. The returned value must be type converted by one of the type conversion functions (such as TO_DATE and TO_STRING). Address column names known at design time by their names. Computed inputs aren't supported, but you can use parameter substitutions.
toString(byOrigin('ancestor', 'ancestorStream'))
byOrigins
byOrigins(<column names> : array, [<origin stream name> : string]) => any
Selects an array of columns by name in the stream. The second argument is the stream from where it originated. If there are multiple matches, the first match is returned. If there are no matches, a NULL value is returned. The returned value must be type converted by one of the type conversion functions (such as TO_DATE and TO_STRING). Address column names known at design time by their names. Computed inputs aren't supported, but you can use parameter substitutions.
toString(byOrigins(['ancestor1', 'ancestor2'], 'ancestorStream'))
byPath
byPath(<value1> : string, [<streamName> : string]) => any
Finds a hierarchical path by name in the stream. You can pass an optional stream name as the second argument. If no such path is found, it returns NULL. Column names/paths known at design time should be addressed only by using their name or dot notation path. Computed inputs aren't supported, but you can use parameter substitutions.
byPath('grandpa.parent.child') => column
byPosition
byPosition(<position> : integer) => any
Selects a column value by its relative position (1 based) in the stream. If the position is out of bounds, it returns a NULL value. The returned value must be type converted by one of the type conversion functions (such as TO_DATE or TO_STRING). Computed inputs aren't supported, but you can use parameter substitutions.
toString(byPosition(1))toDecimal(byPosition(2), 10, 2)toBoolean(byName(4))toString(byName($colName))toString(byPosition(1234))
C
case
case(<condition> : boolean, <true_expression> : any, <false_expression> : any, ...) => any
Based on alternating conditions, the case function applies one value or the other. If the number of inputs is even, the other is defaulted to NULL for the last condition.
case(10 + 20 == 30, 'dumbo', 'gumbo') -> 'dumbo'case(10 + 20 == 25, 'bojjus', 'do' < 'go', 'gunchus') -> 'gunchus'isNull(case(10 + 20 == 25, 'bojjus', 'do' > 'go', 'gunchus')) -> truecase(10 + 20 == 25, 'bojjus', 'do' > 'go', 'gunchus', 'dumbo') -> 'dumbo'
cbrt
cbrt(<value1> : number) => double
Calculates the cube root of a number.
cbrt(8) -> 2.0
ceil
ceil(<value1> : number) => number
Returns the smallest integer not smaller than the number.
ceil(-0.1) -> 0
char
char(<Input> : number) => string
Returns the ASCII character represented by the input number. If the number is greater than 256, the result is equivalent to char (number % 256).
char(65) -> 'A'char(97) -> 'a'
coalesce
coalesce(<value1> : any, ...) => any
Returns the first not null value from a set of inputs. All inputs should be of the same type.
coalesce(10, 20) -> 10coalesce(toString(null), toString(null), 'dumbo', 'bo', 'go') -> 'dumbo'
collect
collect(<value1> : any) => array
Collects all values of the expression in the aggregated group into an array. You can collect and transform structures to alternate structures during this process. The number of items is equal to the number of rows in that group and can contain null values. The number of collected items should be small.
collect(salesPerson)collect(firstName + lastName))collect(@(name = salesPerson, sales = salesAmount) )
collectUnique
collectUnique(<value1> : any) => array
Collects all values of the expression in the aggregated group into a unique array. You can collect and transform structures to alternate structures during this process. The number of items is equal to the number of rows in that group and can contain null values. The number of collected items should be small.
collect(salesPerson)collect(firstName + lastName))collect(@(name = salesPerson, sales = salesAmount) )
columnNames
columnNames(<value1> : string, i><value1> : boolean) => array
Gets the names of all output columns for a stream. You can pass an optional stream name as the first argument. The second argument is also optional, with false as the default. If you set the second argument to true(), Data Factory returns only columns that are drifted via schema drift.
columnNames()columnNames('DeriveStream')columnNames('DeriveStream', true())columnNames('', true())
columns
columns([<stream name> : string]) => any
Gets the values of all output columns for a stream. You can pass an optional stream name as the second argument.
columns()columns('DeriveStream')
compare
compare(<value1> : any, <value2> : any) => integer
Compares two values of the same type. Returns a negative integer if value1 < value2, 0 if value1 == value2, and a positive value if value1 > value2.
(compare(12, 24) < 1) -> true(compare('dumbo', 'dum') > 0) -> true
concat
concat(<this> : string, <that> : string, ...) => string
Concatenates a variable number of strings together. Same as the + operator with strings.
concat('dataflow', 'is', 'awesome') -> 'dataflowisawesome''dataflow' + 'is' + 'awesome' -> 'dataflowisawesome'isNull('sql' + null) -> true
concatWS
concatWS(<separator> : string, <this> : string, <that> : string, ...) => string
Concatenates a variable number of strings together with a separator. The first parameter is the separator.
concatWS(' ', 'dataflow', 'is', 'awesome') -> 'dataflow is awesome'isNull(concatWS(null, 'dataflow', 'is', 'awesome')) -> trueconcatWS(' is ', 'dataflow', 'awesome') -> 'dataflow is awesome'
contains
contains(<value1> : array, <value2> : unaryfunction) => boolean
Returns true if any element in the provided array evaluates as true in the provided predicate. Th contains function expects a reference to one element in the predicate function as #item.
contains([1, 2, 3, 4], #item == 3) -> truecontains([1, 2, 3, 4], #item > 5) -> false
cos
cos(<value1> : number) => double
Calculates a cosine value.
cos(10) -> -0.8390715290764524
cosh
cosh(<value1> : number) => double
Calculates a hyperbolic cosine of a value.
cosh(0) -> 1.0
count
count([<value1> : any]) => long
Gets the aggregate count of values. If one or more optional columns are specified, it ignores NULL values in the count.
count(custId)count(custId, custName)count()count(iif(isNull(custId), 1, NULL))
countAll
countAll([<value1> : any]) => long
Gets the aggregate count of values including null values.
countAll(custId)countAll()
countDistinct
countDistinct(<value1> : any, [<value2> : any], ...) => long
Gets the aggregate count of distinct values of a set of columns.
countDistinct(custId, custName)
countAllDistinct
countAllDistinct(<value1> : any, [<value2> : any], ...) => long
Gets the aggregate count of distinct values of a set of columns including null values.
countAllDistinct(custId, custName)
countIf
countIf(<value1> : boolean, [<value2> : any]) => long
Gets the aggregate count of values, based on criteria. If the optional column is specified, it ignores NULL values in the count.
countIf(state == 'CA' && commission < 10000, name)
covariancePopulation
covariancePopulation(<value1> : number, <value2> : number) => double
Gets the population covariance between two columns.
covariancePopulation(sales, profit)
covariancePopulationIf
covariancePopulationIf(<value1> : boolean, <value2> : number, <value3> : number) => double
Gets the population covariance of two columns, based on criteria.
covariancePopulationIf(region == 'West', sales)
covarianceSample
covarianceSample(<value1> : number, <value2> : number) => double
Gets the sample covariance of two columns.
covarianceSample(sales, profit)
covarianceSampleIf
covarianceSampleIf(<value1> : boolean, <value2> : number, <value3> : number) => double
Gets the sample covariance of two columns, based on criteria.
covarianceSampleIf(region == 'West', sales, profit)
crc32
crc32(<value1> : any, ...) => long
Calculates the CRC32 hash of a set of columns of varying primitive data types when given a bit length, which can only be of values 0(256), 224, 256, 384, and 512. You can use it to calculate a fingerprint for a row.
crc32(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> 3630253689L
cumeDist
cumeDist() => integer
The cumeDist function computes the position of a value relative to all values in the partition. The result is the number of rows preceding or equal to the current row in the ordering of the partition divided by the total number of rows in the window partition. Any tie values in the ordering evaluate to the same position.
cumeDist()
currentDate
currentDate([<value1> : string]) => date
Gets the current date when this job starts to run. You can pass an optional time zone in the form of GMT, PST, UTC, or America/Cayman. The local time zone of the data factory's data center/region is used as the default. Refer to Java's SimpleDateFormat class for available formats.
currentDate() == toDate('2250-12-31') -> falsecurrentDate('PST') == toDate('2250-12-31') -> falsecurrentDate('America/New_York') == toDate('2250-12-31') -> false
currentTimestamp
currentTimestamp() => timestamp
Gets the current time stamp when the job starts to run with a local time zone.
currentTimestamp() == toTimestamp('2250-12-31 12:12:12') -> false
currentUTC
currentUTC([<value1> : string]) => timestamp
Gets the current time stamp as UTC. If you want your current time to be interpreted in a different time zone than your cluster time zone, you can pass an optional time zone in the form of GMT, PST, UTC, or America/Cayman. It defaults to the current time zone. Refer to Java's SimpleDateFormat class for available formats. To convert the UTC time to a different time zone, use fromUTC().
currentUTC() == toTimestamp('2050-12-12 19:18:12') -> falsecurrentUTC() != toTimestamp('2050-12-12 19:18:12') -> truefromUTC(currentUTC(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true
D
dayOfMonth
dayOfMonth(<value1> : datetime) => integer
Gets the day of the month when given a date.
dayOfMonth(toDate('2018-06-08')) -> 8
dayOfWeek
dayOfWeek(<value1> : datetime) => integer
Gets the day of the week when given a date. For example, 1 is Sunday, 2 is Monday, ... and 7 is Saturday.
dayOfWeek(toDate('2018-06-08')) -> 6
dayOfYear
dayOfYear(<value1> : datetime) => integer
Gets the day of the year when given a date.
dayOfYear(toDate('2016-04-09')) -> 100
days
days(<value1> : integer) => long
Duration in milliseconds for the number of days.
days(2) -> 172800000L
decode
decode(<Input> : any, <Charset> : string) => binary
Decodes the encoded input data into a string based on the given charset. You can use a second (optional) argument to specify which charset to use. Examples are US-ASCII, ISO-8859-1, UTF-8 (default), UTF-16BE, UTF-16LE, and UTF-16.
decode(array(toByte(97),toByte(98),toByte(99)), 'US-ASCII') -> abc
degrees
degrees(<value1> : number) => double
Converts radians to degrees.
degrees(3.141592653589793) -> 180
denseRank
denseRank() => integer
Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values don't produce gaps in the sequence. The denseRank function works even when data isn't sorted and looks for change in values.
denseRank()
distinct
distinct(<value1> : array) => array
Returns a distinct set of items from an array.
distinct([10, 20, 30, 10]) => [10, 20, 30]
divide
divide(<value1> : any, <value2> : any) => any
Divides pair of numbers. Same as the / operator.
divide(20, 10) -> 220 / 10 -> 2
dropLeft
dropLeft(<value1> : string, <value2> : integer) => string
Removes as many characters from the left of the string. If the drop requested exceeds the length of the string, an empty string is returned.
dropLeft('bojjus', 2) => 'jjus'dropLeft('cake', 10) => ''
dropRight
dropRight(<value1> : string, <value2> : integer) => string
Removes as many characters from the right of the string. If the drop requested exceeds the length of the string, an empty string is returned.
dropRight('bojjus', 2) => 'bojj'dropRight('cake', 10) => ''
E
encode
encode(<Input> : string, <Charset> : string) => binary
Encodes the input string data into binary based on a charset. A second (optional) argument can be used to specify which charset to use. Examples are US-ASCII, ISO-8859-1, UTF-8 (default), UTF-16BE, UTF-16LE, and UTF-16.
encode('abc', 'US-ASCII') -> array(toByte(97),toByte(98),toByte(99))
endsWith
endsWith(<string> : string, <substring to check> : string) => boolean
Checks if the string ends with the supplied string.
endsWith('dumbo', 'mbo') -> true
equals
equals(<value1> : any, <value2> : any) => boolean
Uses the comparison equals operator. Same as the == operator.
equals(12, 24) -> false12 == 24 -> false'bad' == 'bad' -> trueisNull('good' == toString(null)) -> trueisNull(null == null) -> true
equalsIgnoreCase
equalsIgnoreCase(<value1> : string, <value2> : string) => boolean
Uses the comparison equals operator ignoring case. Same as the <=> operator.
'abc'<=>'Abc' -> trueequalsIgnoreCase('abc', 'Abc') -> true
escape
escape(<string_to_escape> : string, <format> : string) => string
Escapes a string according to a format. Literal values for acceptable format are json, xml, ecmascript, html, and java.
except
except(<value1> : array, <value2> : array) => array
Returns a difference set of one array from another dropping duplicates.
except([10, 20, 30], [20, 40]) => [10, 30]
expr
expr(<expr> : string) => any
Results in an expression from a string, which is the same as writing this expression in a nonliteral form. You can use it to pass parameters as string representations.
expr('price * discount') => any
F
factorial
factorial(<value1> : number) => long
Calculates the factorial of a number.
factorial(5) -> 120
false
false() => boolean
Always returns a false value. Use the function syntax(false()) if a column is named false.
(10 + 20 > 30) -> false(10 + 20 > 30) -> false()
filter
filter(<value1> : array, <value2> : unaryfunction) => array
Filters elements out of the array that don't meet the provided predicate. Filter expects a reference to one element in the predicate function as #item.
filter([1, 2, 3, 4], #item > 2) -> [3, 4]filter(['a', 'b', 'c', 'd'], #item == 'a' || #item == 'b') -> ['a', 'b']
find
find(<value1> : array, <value2> : unaryfunction) => any
Find the first item from an array that matches the condition. It takes a filter function where you can address the item in the array as #item. For deeply nested maps, you can refer to the parent maps by using the #item_n (#item_1, #item_2...) notation.
find([10, 20, 30], #item > 10) -> 20find(['azure', 'data', 'factory'], length(#item) > 4) -> 'azure'find([ @( name = 'Daniel', types = [ @(mood = 'jovial', behavior = 'terrific'), @(mood = 'grumpy', behavior = 'bad') ] ), @( name = 'Mark', types = [ @(mood = 'happy', behavior = 'awesome'), @(mood = 'calm', behavior = 'reclusive') ] ) ], contains(#item.types, #item.mood=='happy') /*Filter out the happy kid*/ )@( name = 'Mark', types = [ @(mood = 'happy', behavior = 'awesome'), @(mood = 'calm', behavior = 'reclusive') ] )
first
first(<value1> : any, [<value2> : boolean]) => any
Gets the first value of a column group. If the second parameter ignoreNulls is omitted, Data Factory assumes false.
first(sales)first(sales, false)
flatten
flatten(<array> : array, <value2> : array ..., <value2> : boolean) => array
Flattens an array or arrays into a single array. Arrays of atomic items are returned unaltered. The last argument is optional and is defaulted to false to flatten recursively more than one level deep.
flatten([['bojjus', 'girl'], ['gunchus', 'boy']]) => ['bojjus', 'girl', 'gunchus', 'boy']flatten([[['bojjus', 'gunchus']]] , true) => ['bojjus', 'gunchus']
floor
floor(<value1> : number) => number
Returns the largest integer not greater than the number.
floor(-0.1) -> -1
fromBase64
fromBase64(<value1> : string, <encoding type> : string) => string
Decodes the specific base64-encoded string. You can optionally pass the encoding type.
fromBase64('Z3VuY2h1cw==') -> 'gunchus'fromBase64('SGVsbG8gV29ybGQ=', 'Windows-1252') -> 'Hello World'
fromUTC
fromUTC(<value1> : timestamp, [<value2> : string]) => timestamp
Converts to the time stamp from UTC. You can optionally pass the time zone in the form of GMT, PST, UTC, or America/Cayman. It defaults to the current time zone. Refer to Java's SimpleDateFormat class for available formats.
fromUTC(currentTimestamp()) == toTimestamp('2050-12-12 19:18:12') -> falsefromUTC(currentTimestamp(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true
G
greater
greater(<value1> : any, <value2> : any) => boolean
Uses the comparison greater operator. Same as > operator.
greater(12, 24) -> false('dumbo' > 'dum') -> true(toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS') > toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> true
greaterOrEqual
greaterOrEqual(<value1> : any, <value2> : any) => boolean
Uses the comparison greater than or equal to operator. Same as >= operator.
greaterOrEqual(12, 12) -> true('dumbo' >= 'dum') -> true
greatest
greatest(<value1> : any, ...) => any
Returns the greatest value among the list of values as input, skipping null values. Returns null if all inputs are null.
greatest(10, 30, 15, 20) -> 30greatest(10, toInteger(null), 20) -> 20greatest(toDate('2010-12-12'), toDate('2011-12-12'), toDate('2000-12-12')) -> toDate('2011-12-12')greatest(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS'), toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')) -> toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')
H
hasColumn
hasColumn(<column name> : string, [<stream name> : string]) => boolean
Checks for a column value by name in the stream. You can pass an optional stream name as the second argument. Address column names known at design time by their names. Computed inputs aren't supported, but you can use parameter substitutions.
hasColumn('parent')
hasError
hasError([<value1> : string]) => boolean
Checks if the asset with a provided ID is marked as an error.
Examples
hasError('assert1')hasError('assert2')
hasPath
hasPath(<value1> : string, [<streamName> : string]) => boolean
Checks if a certain hierarchical path exists by name in the stream. You can pass an optional stream name as the second argument. Column names/paths known at design time should be addressed only by using their name or dot notation path. Computed inputs aren't supported, but you can use parameter substitutions.
hasPath('grandpa.parent.child') => boolean
hex
hex(<value1>: binary) => string
Returns a hex string representation of a binary value
hex(toBinary([toByte(0x1f), toByte(0xad), toByte(0xbe)])) -> '1fadbe'
hour
hour(<value1> : timestamp, [<value2> : string]) => integer
Gets the hour value of a time stamp. You can pass an optional time zone in the form of GMT, PST, UTC, or America/Cayman. The local time zone is used as the default. Refer to Java's SimpleDateFormat class for available formats.
hour(toTimestamp('2009-07-30 12:58:59')) -> 12hour(toTimestamp('2009-07-30 12:58:59'), 'PST') -> 12
hours
hours(<value1> : integer) => long
Gets the duration in milliseconds for the number of hours.
hours(2) -> 7200000L
I
iif
iif(<condition> : boolean, <true_expression> : any, [<false_expression> : any]) => any
Applies one value or the other based on a condition. If the other is unspecified, the value is considered NULL. Both the values must be compatible (such as numeric and string).
iif(10 + 20 == 30, 'dumbo', 'gumbo') -> 'dumbo'iif(10 > 30, 'dumbo', 'gumbo') -> 'gumbo'iif(month(toDate('2018-12-01')) == 12, 345.12, 102.67) -> 345.12
iifNull
iifNull(<value1> : any, [<value2> : any], ...) => any
Returns the first not null item when given two or more inputs. This function is equivalent to the coalesce function.
iifNull(10, 20) -> 10iifNull(null, 20, 40) -> 20iifNull('azure', 'data', 'factory') -> 'azure'iifNull(null, 'data', 'factory') -> 'data'
in
in(<array of items> : array, <item to find> : any) => boolean
Checks if an item is in the array.
in([10, 20, 30], 10) -> truein(['good', 'kid'], 'bad') -> false
initCap
initCap(<value1> : string) => string
Converts the first letter of every word to uppercase. Words are identified as separated by whitespace.
initCap('cool iceCREAM') -> 'Cool Icecream'
instr
instr(<string> : string, <substring to find> : string) => integer
Finds the position (1 based) of the substring within a string. If not found, 0 is returned.
instr('dumbo', 'mbo') -> 3instr('microsoft', 'o') -> 5instr('good', 'bad') -> 0
intersect
intersect(<value1> : array, <value2> : array) => array
Returns an intersection set of distinct items from two arrays.
intersect([10, 20, 30], [20, 40]) => [20]
isBitSet
isBitSet (<value1> : array, <value2>:integer ) => boolean
Checks if a bit position is set in this bitset.
isBitSet(toBitSet([10, 32, 98]), 10) => true
isBoolean
isBoolean(<value1>: string) => boolean
Checks if the string value is a Boolean value according to the rules of toBoolean().
isBoolean('true') -> trueisBoolean('no') -> trueisBoolean('microsoft') -> false
isByte
isByte(<value1> : string) => boolean
Checks if the string value is a byte value when given an optional format according to the rules of toByte().
isByte('123') -> trueisByte('chocolate') -> false
isDate
isDate (<value1> : string, [<format>: string]) => boolean
Checks if the input date string is a date by using an optional input date format. Refer to Java's SimpleDateFormat class for available formats. If the input date format is omitted, the default format is yyyy-[M]M-[d]d. Accepted formats are :[ yyyy, yyyy-[M]M, yyyy-[M]M-[d]d, yyyy-[M]M-[d]dT* ].
isDate('2012-8-18') -> trueisDate('12/18--234234' -> 'MM/dd/yyyy') -> false
isDecimal
isDecimal (<value1> : string) => boolean
Checks if the string value is a decimal value when given an optional format according to the rules of toDecimal().
isDecimal('123.45') -> trueisDecimal('12/12/2000') -> false
isDelete
isDelete([<value1> : integer]) => boolean
Checks if the row should be deleted. For transformations that take more than one input stream, you can pass the (1-based) index of the stream. The stream index should be either 1 or 2, and the default value is 1.
isDelete()isDelete(1)
isDistinct
isDistinct(<value1> : any , <value1> : any) => boolean
Finds if a column or set of columns is distinct. It doesn't count null as a distinct value.
isDistinct(custId, custName) => boolean
isDouble
isDouble (<value1> : string, [<format>: string]) => boolean
Checks if the string value is a double value when given an optional format according to the rules of toDouble().
isDouble('123') -> trueisDouble('$123.45' -> '$###.00') -> trueisDouble('icecream') -> false
isError
isError([<value1> : integer]) => boolean
Checks if the row is marked as an error. For transformations that take more than one input stream, you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isError()isError(1)
isFloat
isFloat (<value1> : string, [<format>: string]) => boolean
Checks if the string value is a float value when given an optional format according to the rules of toFloat().
isFloat('123') -> trueisFloat('$123.45' -> '$###.00') -> trueisFloat('icecream') -> false
isIgnore
isIgnore([<value1> : integer]) => boolean
Checks if the row should be ignored. For transformations taking more than one input stream, you can pass the (1-based) index of the stream. The stream index should be either 1 or 2, and the default value is 1.
isIgnore()isIgnore(1)
isInsert
isInsert([<value1> : integer]) => boolean
Checks if the row is marked for insert. For transformations taking more than one input stream, you can pass the (1-based) index of the stream. The stream index should be either 1 or 2, and the default value is 1.
isInsert()isInsert(1)
isInteger
isInteger (<value1> : string, [<format>: string]) => boolean
Checks if the string value is an integer value when given an optional format according to the rules of toInteger().
isInteger('123') -> trueisInteger('$123' -> '$###') -> trueisInteger('microsoft') -> false
isLong
isLong (<value1> : string, [<format>: string]) => boolean
Checks if the string value is a long value when given an optional format according to the rules of toLong().
isLong('123') -> trueisLong('$123' -> '$###') -> trueisLong('gunchus') -> false
isMatch
isMatch([<value1> : integer]) => boolean
Checks if the row is matched at lookup. For transformations taking more than one input stream, you can pass the (1-based) index of the stream. The stream index should be either 1 or 2, and the default value is 1.
isMatch()isMatch(1)
isNan
isNan (<value1> : integral) => boolean
Checks if the value isn't a number.
isNan(10.2) => false
isNull
isNull(<value1> : any) => boolean
Checks if the value is NULL.
isNull(NULL()) -> trueisNull('') -> false
isShort
isShort (<value1> : string, [<format>: string]) => boolean
Checks if the string value is a short value when given an optional format according to the rules of toShort().
isShort('123') -> trueisShort('$123' -> '$###') -> trueisShort('microsoft') -> false
isTimestamp
isTimestamp (<value1> : string, [<format>: string]) => boolean
Checks if the input date string is a time stamp by using an optional input time stamp format. Refer to Java's SimpleDateFormat class for available formats. If the time stamp is omitted, the default pattern yyyy-[M]M-[d]d hh:mm:ss[.f...] is used. You can pass an optional time zone in the form of GMT, PST, UTC, or America/Cayman. The timestamp function supports up to millisecond accuracy with a value of 999. Refer to Java's SimpleDateFormat class for available formats.
isTimestamp('2016-12-31 00:12:00') -> trueisTimestamp('2016-12-31T00:12:00' -> 'yyyy-MM-dd\\'T\\'HH:mm:ss' -> 'PST') -> trueisTimestamp('2012-8222.18') -> false
isUpdate
isUpdate([<value1> : integer]) => boolean
Checks if the row is marked for update. For transformations taking more than one input stream, you can pass the (1-based) index of the stream. The stream index should be either 1 or 2, and the default value is 1.
isUpdate()isUpdate(1)
isUpsert
isUpsert([<value1> : integer]) => boolean
Checks if the row is marked for insert. For transformations taking more than one input stream, you can pass the (1-based) index of the stream. The stream index should be either 1 or 2, and the default value is 1.
isUpsert()isUpsert(1)
J
jaroWinkler
jaroWinkler(<value1> : string, <value2> : string) => double
Gets the JaroWinkler distance between two strings.
jaroWinkler('frog', 'frog') => 1.0
K
keyValues
keyValues(<value1> : array, <value2> : array) => map
Creates a map of key/values. The first parameter is an array of keys, and the second is the array of values. Both arrays should have equal length.
keyValues(['bojjus', 'appa'], ['gunchus', 'ammi']) => ['bojjus' -> 'gunchus', 'appa' -> 'ammi']
kurtosis
kurtosis(<value1> : number) => double
Gets the kurtosis of a column.
kurtosis(sales)
kurtosisIf
kurtosisIf(<value1> : boolean, <value2> : number) => double
Gets the kurtosis of a column, based on criteria.
kurtosisIf(region == 'West', sales)
L
lag
lag(<value> : any, [<number of rows to look before> : number], [<default value> : any]) => any
Gets the value of the first parameter evaluated n rows before the current row. The second parameter is the number of rows to look back, and the default value is 1. If there aren't as many rows, a value of null is returned unless a default value is specified.
lag(amount, 2)lag(amount, 2000, 100)
last
last(<value1> : any, [<value2> : boolean]) => any
Gets the last value of a column group. If the second parameter ignoreNulls is omitted, it's assumed false.
last(sales)last(sales, false)
lastDayOfMonth
lastDayOfMonth(<value1> : datetime) => date
Gets the last date of the month when given a date.
lastDayOfMonth(toDate('2009-01-12')) -> toDate('2009-01-31')
lead
lead(<value> : any, [<number of rows to look after> : number], [<default value> : any]) => any
Gets the value of the first parameter evaluated n rows after the current row. The second parameter is the number of rows to look forward, and the default value is 1. If there aren't as many rows, a value of null is returned unless a default value is specified.
lead(amount, 2)lead(amount, 2000, 100)
least
least(<value1> : any, ...) => any
Uses the comparison lesser than or equal to operator. Same as the <= operator.
least(10, 30, 15, 20) -> 10least(toDate('2010-12-12'), toDate('2011-12-12'), toDate('2000-12-12')) -> toDate('2000-12-12')
left
left(<string to subset> : string, <number of characters> : integral) => string
Extracts a substring start at index 1 with the number of characters. Same as SUBSTRING(str, 1, n).
left('bojjus', 2) -> 'bo'left('bojjus', 20) -> 'bojjus'
length
length(<value1> : string) => integer
Returns the length of the string.
length('dumbo') -> 5
lesser
lesser(<value1> : any, <value2> : any) => boolean
Uses the comparison less operator. Same as the < operator.
lesser(12, 24) -> true('abcd' < 'abc') -> false(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') < toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')) -> true
lesserOrEqual
lesserOrEqual(<value1> : any, <value2> : any) => boolean
Uses the comparison lesser than or equal to operator. Same as the <= operator.
lesserOrEqual(12, 12) -> true('dumbo' <= 'dum') -> false
levenshtein
levenshtein(<from string> : string, <to string> : string) => integer
Gets the levenshtein distance between two strings.
levenshtein('boys', 'girls') -> 4
like
like(<string> : string, <pattern match> : string) => boolean
Uses a string pattern that's matched literally. The exceptions are the following special symbols: _ matches any one character in the input (similar to .* in posix regular expressions).
% matches zero or more characters in the input (similar to .* in posix regular expressions).
The escape character is ''. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It's invalid to escape any other character.
like('icecream', 'ice%') -> true
locate
locate(<substring to find> : string, <string> : string, [<from index - 1-based> : integral]) => integer
Finds the position (1 based) of the substring within a string starting at a certain position. If the position is omitted, it starts from the beginning of the string. If not found, 0 is returned.
locate('mbo', 'dumbo') -> 3locate('o', 'microsoft', 6) -> 7locate('bad', 'good') -> 0
log
log(<value1> : number, [<value2> : number]) => double
Calculates the log value. You can supply an optional base or else a Euler number if used.
log(100, 10) -> 2
log10
log10(<value1> : number) => double
Calculates the log value based on 10 base.
log10(100) -> 2
lookup
lookup(key, key2, ...) => complex[]
Looks up the first row from the cached sink by using the specified keys that match the keys from the cached sink.
cacheSink#lookup(movieId)
lower
lower(<value1> : string) => string
Lowercases a string.
lower('GunChus') -> 'gunchus'
lpad
lpad(<string to pad> : string, <final padded length> : integral, <padding> : string) => string
Left pads the string by the supplied padding until the string is of a certain length. If the string is equal to or greater than the length, the string is trimmed to the length.
lpad('dumbo', 10, '-') -> '-----dumbo'lpad('dumbo', 4, '-') -> 'dumb'
ltrim
ltrim(<string to trim> : string, [<trim characters> : string]) => string
Left trims a string of leading characters. If the second parameter is unspecified, it trims whitespace. Otherwise, it trims any character specified in the second parameter.
ltrim(' dumbo ') -> 'dumbo 'ltrim('!--!du!mbo!', '-!') -> 'du!mbo!'
M
map
map(<value1> : array, <value2> : unaryfunction) => any
Maps each element of the array to a new element by using the provided expression. The map function expects a reference to one element in the expression function as #item.
map([1, 2, 3, 4], #item + 2) -> [3, 4, 5, 6]map(['a', 'b', 'c', 'd'], #item + '_processed') -> ['a_processed', 'b_processed', 'c_processed', 'd_processed']
mapAssociation
mapAssociation(<value1> : map, <value2> : binaryFunction) => array
Transforms a map by associating the keys to new values. Returns an array. It takes a mapping function where you can address the item as #key and the current value as #value.
mapAssociation(['bojjus' -> 'gunchus', 'appa' -> 'ammi'], @(key = #key, value = #value)) => [@(key = 'bojjus', value = 'gunchus'), @(key = 'appa', value = 'ammi')]
mapIf
mapIf (<value1> : array, <value2> : binaryfunction, <value3>: binaryFunction) => any
Conditionally maps an array to another array of the same or smaller length. The values can be of any data type, including structTypes. It takes a mapping function where you can address the item in the array as #item and the current index as #index. For deeply nested maps, you can refer to the parent maps by using the ``#item_[n] (#item_1, #index_1`) notation.
mapIf([10, 20, 30], #item > 10, #item + 5) -> [25, 35]mapIf(['icecream', 'cake', 'soda'], length(#item) > 4, upper(#item)) -> ['ICECREAM', 'CAKE']
mapIndex
mapIndex(<value1> : array, <value2> : binaryfunction) => any
Maps each element of the array to a new element by using the provided expression. The map function expects a reference to one element in the expression function as #item and a reference to the element index as #index.
mapIndex([1, 2, 3, 4], #item + 2 + #index) -> [4, 6, 8, 10]
mapLoop
mapLoop(<value1> : integer, <value2> : unaryfunction) => any
Loops through from 1 to the length to create an array of that length. It takes a mapping function where you can address the index in the array as #index. For deeply nested maps, you can refer to the parent maps by using the #index_n (#index_1, #index_2) notation.
mapLoop(3, #index * 10) -> [10, 20, 30]
max
max(<value1> : any) => any
Gets the maximum value of a column.
max(sales)
maxIf
maxIf(<value1> : boolean, <value2> : any) => any
Gets the maximum value of a column, based on criteria.
maxIf(region == 'West', sales)
md5
md5(<value1> : any, ...) => string
Calculates the MD5 digest of a set of columns of varying primitive data types and returns a 32-character hex string. You can use it to calculate a fingerprint for a row.
md5(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> '4ce8a880bd621a1ffad0bca905e1bc5a'
mean
mean(<value1> : number) => number
Gets the mean of values of a column. Same as AVG.
mean(sales)
meanIf
meanIf(<value1> : boolean, <value2> : number) => number
Gets the mean of values of a column, based on criteria. Same as avgIf.
meanIf(region == 'West', sales)
millisecond
millisecond(<value1> : timestamp, [<value2> : string]) => integer
Gets the millisecond value of a date. You can pass an optional time zone in the form of GMT, PST, UTC, or America/Cayman. The local time zone is used as the default. Refer to Java's SimpleDateFormat class for available formats.
millisecond(toTimestamp('2009-07-30 12:58:59.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> 871
milliseconds
milliseconds(<value1> : integer) => long
Gets the duration in milliseconds for the number of milliseconds.
milliseconds(2) -> 2L
min
min(<value1> : any) => any
Gets the minimum value of a column.
min(sales)
minIf
minIf(<value1> : boolean, <value2> : any) => any
Gets the minimum value of a column, based on criteria.
minIf(region == 'West', sales)
minus
minus(<value1> : any, <value2> : any) => any
Subtracts numbers. Subtracts the number of days from a date. Subtracts duration from a time stamp. Subtracts two time stamps to get the difference in milliseconds. Same as the - operator.
minus(20, 10) -> 1020 - 10 -> 10minus(toDate('2012-12-15'), 3) -> toDate('2012-12-12')toDate('2012-12-15') - 3 -> toDate('2012-12-12')toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') + (days(1) + hours(2) - seconds(10)) -> toTimestamp('2019-02-04 07:19:18.871', 'yyyy-MM-dd HH:mm:ss.SSS')toTimestamp('2019-02-03 05:21:34.851', 'yyyy-MM-dd HH:mm:ss.SSS') - toTimestamp('2019-02-03 05:21:36.923', 'yyyy-MM-dd HH:mm:ss.SSS') -> -2072
minute
minute(<value1> : timestamp, [<value2> : string]) => integer
Gets the minute value of a time stamp. You can pass an optional time zone in the form of GMT, PST, UTC, or America/Cayman. The local time zone is used as the default. Refer to Java's SimpleDateFormat class for available formats.
minute(toTimestamp('2009-07-30 12:58:59')) -> 58minute(toTimestamp('2009-07-30 12:58:59'), 'PST') -> 58
minutes
minutes(<value1> : integer) => long
Gets the duration in milliseconds for the number of minutes.
minutes(2) -> 120000L
mlookup
mlookup(key, key2, ...) => complex[]
Looks up all the matching rows from the cached sink by using the specified keys that match the keys from the cached sink.
cacheSink#mlookup(movieId)
mod
mod(<value1> : any, <value2> : any) => any
Gets the modulus of a pair of numbers. Same as the % operator.
mod(20, 8) -> 420 % 8 -> 4
month
month(<value1> : datetime) => integer
Gets the month value of a date or time stamp.
month(toDate('2012-8-8')) -> 8
monthsBetween
monthsBetween(<from date/timestamp> : datetime, <to date/timestamp> : datetime, [<roundoff> : boolean], [<time zone> : string]) => double
Gets the number of months between two dates. You can round off the calculation. You can pass an optional time zone in the form of GMT, PST, UTC, or America/Cayman. The local time zone is used as the default. Refer to Java's SimpleDateFormat class for available formats.
monthsBetween(toTimestamp('1997-02-28 10:30:00'), toDate('1996-10-30')) -> 3.94959677
multiply
multiply(<value1> : any, <value2> : any) => any
Multiplies a pair of numbers. Same as the * operator.
multiply(20, 10) -> 20020 * 10 -> 200
N
negate
negate(<value1> : number) => number
Negates a number. Turns positive numbers to negative and vice versa.
negate(13) -> -13
nextSequence
nextSequence() => long
Returns the next unique sequence. The number is consecutive only within a partition and is prefixed by the partition ID.
nextSequence() == 12313112 -> false
normalize
normalize(<String to normalize> : string) => string
Normalizes the string value to separate accented Unicode characters.
regexReplace(normalize('bo²s'), `\p{M}`, '') -> 'boys'
not
not(<value1> : boolean) => boolean
Uses the logical negation operator.
not(true) -> falsenot(10 == 20) -> true
notEquals
notEquals(<value1> : any, <value2> : any) => boolean
Uses the comparison not equals operator. Same as the != operator.
12 != 24 -> true'bojjus' != 'bo' + 'jjus' -> false
nTile
nTile([<value1> : integer]) => integer
The nTile function divides the rows for each window partition into n buckets ranging from 1 to at most n. Bucket values differ by, at most, 1. If the number of rows in the partition doesn't divide evenly into the number of buckets, then the remainder values are distributed one per bucket, starting with the first bucket. The nTile function is useful for the calculation of tertiles, quartiles, deciles, and other common summary statistics. The function calculates two variables during initialization. The size of a regular bucket has one extra row added to it. Both variables are based on the size of the current partition. During the calculation process, the function keeps track of the current row number, the current bucket number, and the row number at which the bucket changes (bucketThreshold). When the current row number reaches the bucket threshold, the bucket value increases by one. The threshold increases by the bucket size (plus one extra if the current bucket is padded).
nTile()nTile(numOfBuckets)
null
null() => null
Returns a NULL value. Use the function syntax(null()) if a column is named null. Any operation that uses null results in a NULL value.
isNull('dumbo' + `null`) -> trueisNull(10 * `null`) -> trueisNull('') -> falseisNull(10 + 20) -> falseisNull(10/0) -> true
O
or
or(<value1> : boolean, <value2> : boolean) => boolean
Uses the logical OR operator. Same as ||.
or(true, false) -> truetrue || false -> true
originColumns
originColumns(<streamName> : string) => any
Gets all output columns for an origin stream where columns were created. Must be enclosed in another function.
array(toString(originColumns('source1')))
output
output() => any
Returns the first row of the results of the cache sink.
cacheSink#output()
outputs
output() => any
Returns the entire output row set of the results of the cache sink.
cacheSink#outputs()
P
partitionId
partitionId() => integer
Returns the current partition ID that the input row is in.
partitionId()
pMod
pMod(<value1> : any, <value2> : any) => any
Gives the positive modulus of a pair of numbers.
pmod(-20, 8) -> 4
power
power(<value1> : number, <value2> : number) => double
Raises one number to the power of another.
power(10, 2) -> 100
R
radians
radians(<value1> : number) => double
Converts degrees to radians.
radians(180) => 3.141592653589793
random
random(<value1> : integral) => double
Returns a random number when given a seed within a partition. The seed should be a fixed value and is used with the partition ID to produce random values in a range (0.0-1.0).
random(1) == 1 -> false
rank
rank() => integer
Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values produce gaps in the sequence. The rank function works even when data isn't sorted and looks for change in values.
rank()
reassociate
reassociate(<value1> : map, <value2> : binaryFunction) => map
Transforms a map by associating the keys to new values. It takes a mapping function where you can address the item as #key and the current value as #value.
reassociate(['fruit' -> 'apple', 'vegetable' -> 'tomato'], substring(#key, 1, 1) + substring(#value, 1, 1)) => ['fruit' -> 'fa', 'vegetable' -> 'vt']
reduce
reduce(<value1> : array, <value2> : any, <value3> : binaryfunction, <value4> : unaryfunction) => any
Accumulates elements in an array. The reduce function expects a reference to an accumulator and one element in the first expression function as #acc and #item. It expects the resulting value as #result to be used in the second expression function.
toString(reduce(['1', '2', '3', '4'], '0', #acc + #item, #result)) -> '01234'
regexExtract
regexExtract(<string> : string, <regex to find> : string, [<match group 1-based index> : integral]) => string
Extracts a matching substring for a given regex pattern. The last parameter identifies the match group and defaults to 1 if omitted. Use <regex> to match a string without escaping. Index 0 returns all matches. Without match groups, index 1 and above don't return any result.
regexExtract('Cost is between 600 and 800 dollars', '(\\d+) and (\\d+)', 2) -> '800'regexExtract('Cost is between 600 and 800 dollars', `(\d+) and (\d+)`, 2) -> '800'
regexMatch
regexMatch(<string> : string, <regex to match> : string) => boolean
Checks if the string matches the given regex pattern. Use <regex> to match a string without escaping.
regexMatch('200.50', '(\\d+).(\\d+)') -> trueregexMatch('200.50', `(\d+).(\d+)`) -> true
regexReplace
regexReplace(<string> : string, <regex to find> : string, <substring to replace> : string) => string
Replaces all occurrences of a regex pattern with another substring in the specific string. Use <regex> to match a string without escaping.
regexReplace('100 and 200', '(\\d+)', 'bojjus') -> 'bojjus and bojjus'regexReplace('100 and 200', `(\d+)`, 'gunchus') -> 'gunchus and gunchus'
regexSplit
regexSplit(<string to split> : string, <regex expression> : string) => array
Splits a string based on a delimiter based on regex and returns an array of strings.
regexSplit('bojjusAgunchusBdumbo', `[CAB]`) -> ['bojjus', 'gunchus', 'dumbo']regexSplit('bojjusAgunchusBdumboC', `[CAB]`) -> ['bojjus', 'gunchus', 'dumbo', ''](regexSplit('bojjusAgunchusBdumboC', `[CAB]`)[1]) -> 'bojjus'isNull(regexSplit('bojjusAgunchusBdumboC', `[CAB]`)[20]) -> true
replace
replace(<string> : string, <substring to find> : string, [<substring to replace> : string]) => string
Replaces all occurrences of a substring with another substring in the specific string. If the last parameter is omitted, it defaults to an empty string.
replace('doggie dog', 'dog', 'cat') -> 'catgie cat'replace('doggie dog', 'dog', '') -> 'gie 'replace('doggie dog', 'dog') -> 'gie '
reverse
reverse(<value1> : string) => string
Reverses a string.
reverse('gunchus') -> 'suhcnug'
right
right(<string to subset> : string, <number of characters> : integral) => string
Extracts a substring with a number of characters from the right. Same as SUBSTRING(str, LENGTH(str) - n, n).
right('bojjus', 2) -> 'us'right('bojjus', 20) -> 'bojjus'
rlike
rlike(<string> : string, <pattern match> : string) => boolean
Checks if the string matches the given regex pattern.
rlike('200.50', `(\d+).(\d+)`) -> truerlike('bogus', `M[0-9]+.*`) -> false
round
round(<number> : number, [<scale to round> : number], [<rounding option> : integral]) => double
Rounds a number when given an optional scale and an optional rounding mode. If the scale is omitted, it defaults to 0. If the mode is omitted, it defaults to ROUND_HALF_UP(5). The values for rounding include:
ROUND_UP: Rounding mode to round away from zero.ROUND_DOWN: Rounding mode to round toward zero.ROUND_CEILING: Rounding mode to round toward positive infinity. (Same asROUND_UPif input is positive. If negative, it behaves asROUND_DOWN. For example, -1.1 would be -1.0 withROUND_CEILINGand -2 withROUND_UP.)ROUND_FLOOR: Rounding mode to round toward negative infinity. (Same asROUND_DOWNif input is positive. If negative, it behaves asROUND_UP.)ROUND_HALF_UP: Rounding mode to round toward "nearest neighbor" unless both neighbors are equidistant, in which case it behaves asROUND_UP. (Most common + default for Dataflow.)ROUND_HALF_DOWN: Rounding mode to round toward "nearest neighbor" unless both neighbors are equidistant, in which caseROUND_DOWN.ROUND_HALF_EVEN: Rounding mode to round toward the "nearest neighbor" unless both neighbors are equidistant, in which case, round toward the even neighbor.ROUND_UNNECESSARY: Rounding mode to assert that the round operation has an exact result, so no rounding is necessary.round(100.123) -> 100.0round(2.5, 0) -> 3.0round(5.3999999999999995, 2, 7) -> 5.40
rowNumber
rowNumber() => integer
Assigns a sequential row numbering for rows in a window starting with 1.
rowNumber()
rpad
rpad(<string to pad> : string, <final padded length> : integral, <padding> : string) => string
Right pads the string by the supplied padding until the string is of a certain length. If the string is equal to or greater than the length, the string is trimmed to the length.
rpad('dumbo', 10, '-') -> 'dumbo-----'rpad('dumbo', 4, '-') -> 'dumb'rpad('dumbo', 8, '<>') -> 'dumbo<><'
rtrim
rtrim(<string to trim> : string, [<trim characters> : string]) => string
Right trims a string of trailing characters. If the second parameter is unspecified, it trims whitespace. Otherwise, it trims any character specified in the second parameter.
rtrim(' dumbo ') -> ' dumbo'rtrim('!--!du!mbo!', '-!') -> '!--!du!mbo'
S
second
second(<value1> : timestamp, [<value2> : string]) => integer
Gets the second value of a date. You can pass an optional time zone in the form of GMT, PST, UTC, or America/Cayman. The local time zone is used as the default. Refer to Java's SimpleDateFormat class for available formats.
second(toTimestamp('2009-07-30 12:58:59')) -> 59
seconds
seconds(<value1> : integer) => long
Gives the duration in milliseconds for the number of seconds.
seconds(2) -> 2000L
setBitSet
setBitSet (<value1>: array, <value2>:array) => array
Sets bit positions in this bitset.
setBitSet(toBitSet([10, 32]), [98]) => [4294968320L, 17179869184L]
sha1
sha1(<value1> : any, ...) => string
Calculates the SHA-1 digest of a set of columns of varying primitive data types and returns a 40-character hex string. You can use it to calculate a fingerprint for a row.
sha1(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> '46d3b478e8ec4e1f3b453ac3d8e59d5854e282bb'
sha2
sha2(<value1> : integer, <value2> : any, ...) => string
Calculates the SHA-2 digest of a set of columns of varying primitive data types when given a bit length, which can only be of values 0(256), 224, 256, 384, and 512. You can use it to calculate a fingerprint for a row.
sha2(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> 'afe8a553b1761c67d76f8c31ceef7f71b66a1ee6f4e6d3b5478bf68b47d06bd3'
sin
sin(<value1> : number) => double
Calculates a sine value.
sin(2) -> 0.9092974268256817
sinh
sinh(<value1> : number) => double
Calculates a hyperbolic sine value.
sinh(0) -> 0.0
size
size(<value1> : any) => integer
Finds the size of an array or map type.
size(['element1', 'element2']) -> 2size([1,2,3]) -> 3
skewness
skewness(<value1> : number) => double
Gets the skewness of a column.
skewness(sales)
skewnessIf
skewnessIf(<value1> : boolean, <value2> : number) => double
Gets the skewness of a column, based on criteria.
skewnessIf(region == 'West', sales)
slice
slice(<array to slice> : array, <from 1-based index> : integral, [<number of items> : integral]) => array
Extracts a subset of an array from a position. The position is 1 based. If the length is omitted, it defaults to the end of the string.
slice([10, 20, 30, 40], 1, 2) -> [10, 20]slice([10, 20, 30, 40], 2) -> [20, 30, 40]slice([10, 20, 30, 40], 2)[1] -> 20isNull(slice([10, 20, 30, 40], 2)[0]) -> trueisNull(slice([10, 20, 30, 40], 2)[20]) -> trueslice(['a', 'b', 'c', 'd'], 8) -> []
sort
sort(<value1> : array, <value2> : binaryfunction) => array
Sorts the array by using the provided predicate function. The sort function expects a reference to two consecutive elements in the expression function as #item1 and #item2.
sort([4, 8, 2, 3], compare(#item1, #item2)) -> [2, 3, 4, 8]sort(['a3', 'b2', 'c1'], iif(right(#item1, 1) >= right(#item2, 1), 1, -1)) -> ['c1', 'b2', 'a3']
soundex
soundex(<value1> : string) => string
Gets the soundex code for the string.
soundex('genius') -> 'G520'
split
split(<string to split> : string, <split characters> : string) => array
Splits a string based on a delimiter and returns an array of strings.
split('bojjus,guchus,dumbo', ',') -> ['bojjus', 'guchus', 'dumbo']split('bojjus,guchus,dumbo', '|') -> ['bojjus,guchus,dumbo']split('bojjus, guchus, dumbo', ', ') -> ['bojjus', 'guchus', 'dumbo']split('bojjus, guchus, dumbo', ', ')[1] -> 'bojjus'isNull(split('bojjus, guchus, dumbo', ', ')[0]) -> trueisNull(split('bojjus, guchus, dumbo', ', ')[20]) -> truesplit('bojjusguchusdumbo', ',') -> ['bojjusguchusdumbo']
sqrt
sqrt(<value1> : number) => double
Calculates the square root of a number.
sqrt(9) -> 3
startsWith
startsWith(<string> : string, <substring to check> : string) => boolean
Checks if the string starts with the supplied string.
startsWith('dumbo', 'du') -> true
stddev
stddev(<value1> : number) => double
Gets the standard deviation of a column.
stdDev(sales)
stddevIf
stddevIf(<value1> : boolean, <value2> : number) => double
Gets the standard deviation of a column, based on criteria.
stddevIf(region == 'West', sales)
stddevPopulation
stddevPopulation(<value1> : number) => double
Gets the population standard deviation of a column.
stddevPopulation(sales)
stddevPopulationIf
stddevPopulationIf(<value1> : boolean, <value2> : number) => double
Gets the population standard deviation of a column, based on criteria.
stddevPopulationIf(region == 'West', sales)
stddevSample
stddevSample(<value1> : number) => double
Gets the sample standard deviation of a column.
stddevSample(sales)
stddevSampleIf
stddevSampleIf(<value1> : boolean, <value2> : number) => double
Gets the sample standard deviation of a column, based on criteria.
stddevSampleIf(region == 'West', sales)
subDays
subDays(<date/timestamp> : datetime, <days to subtract> : integral) => datetime
Subtracts days from a date or time stamp. Same as the - operator for date.
subDays(toDate('2016-08-08'), 1) -> toDate('2016-08-07')
subMonths
subMonths(<date/timestamp> : datetime, <months to subtract> : integral) => datetime
Subtracts months from a date or time stamp.
subMonths(toDate('2016-09-30'), 1) -> toDate('2016-08-31')
substring
substring(<string to subset> : string, <from 1-based index> : integral, [<number of characters> : integral]) => string
Extracts a substring of a certain length from a position. The position is 1 based. If the length is omitted, it defaults to the end of the string.
substring('Cat in the hat', 5, 2) -> 'in'substring('Cat in the hat', 5, 100) -> 'in the hat'substring('Cat in the hat', 5) -> 'in the hat'substring('Cat in the hat', 100, 100) -> ''
substringIndex
substringIndex(<string to subset> : string, <delimiter> : string, <count of delimiter occurrences> : integral]) => string
Extracts the substring before count occurrences of the delimiter. If the count is positive, everything to the left of the final delimiter (counting from the left) is returned. If the count is negative, everything to the right of the final delimiter (counting from the right) is returned.
substringIndex('111-222-333', '-', 1) -> '111'substringIndex('111-222-333', '-', 2) -> '111-222'substringIndex('111-222-333', '-', -1) -> '333'substringIndex('111-222-333', '-', -2) -> '222-333'
sum
sum(<value1> : number) => number
Gets the aggregate sum of a numeric column.
sum(col)
sumDistinct
sumDistinct(<value1> : number) => number
Gets the aggregate sum of distinct values of a numeric column.
sumDistinct(col)
sumDistinctIf
sumDistinctIf(<value1> : boolean, <value2> : number) => number
Gets the aggregate sum of a numeric column, based on criteria. The condition can be based on any column.
sumDistinctIf(state == 'CA' && commission < 10000, sales)sumDistinctIf(true, sales)
sumIf
sumIf(<value1> : boolean, <value2> : number) => number
Gets the aggregate sum of a numeric column, based on criteria. The condition can be based on any column.
sumIf(state == 'CA' && commission < 10000, sales)sumIf(true, sales)
T
tan
tan(<value1> : number) => double
Calculates a tangent value.
tan(0) -> 0.0
tanh
tanh(<value1> : number) => double
Calculates a hyperbolic tangent value.
tanh(0) -> 0.0
toBase64
toBase64(<value1> : string, <encoding type> : string]) => string
Encodes the specific string in base64. You can optionally pass the encoding type.
toBase64('bojjus') -> 'Ym9qanVz'toBase64('± 25000, € 5.000,- |', 'Windows-1252') -> 'sSAyNTAwMCwggCA1LjAwMCwtIHw='
toBinary
toBinary(<value1> : any) => binary
Converts any numeric, date, time stamp, or string to binary representation.
toBinary(3) -> [0x11]
toBoolean
toBoolean(<value1> : string) => boolean
Converts a value of (t, true, y, yes, 1) to true and (f, false, n, no, 0) to false and NULL for any other value.
toBoolean('true') -> truetoBoolean('n') -> falseisNull(toBoolean('truthy')) -> true
toByte
toByte(<value> : any, [<format> : string], [<locale> : string]) => byte
Converts any numeric or string to a byte value. You can use an optional Java decimal format for the conversion.
toByte(123)123toByte(0xFF)-1toByte('123')123
toDate
toDate(<string> : any, [<date format> : string]) => date
Converts an input date string to date by using an optional input date format. Refer to Java's SimpleDateFormat class for available formats. If the input date format is omitted, the default format is yyyy-[M]M-[d]d. Accepted formats are :[ yyyy, yyyy-[M]M, yyyy-[M]M-[d]d, yyyy-[M]M-[d]dT* ].
toDate('2012-8-18') -> toDate('2012-08-18')toDate('12/18/2012', 'MM/dd/yyyy') -> toDate('2012-12-18')
toDecimal
toDecimal(<value> : any, [<precision> : integral], [<scale> : integral], [<format> : string], [<locale> : string]) => decimal(10,0)
Converts any numeric or string to a decimal value. If precision and scale aren't specified, it defaults to (10,2). You can use an optional Java decimal format for the conversion. Use an optional locale format in the form of a BCP47 language like en-US, de, or zh-CN.
toDecimal(123.45) -> 123.45toDecimal('123.45', 8, 4) -> 123.4500toDecimal('$123.45', 8, 4,'$###.00') -> 123.4500toDecimal('Ç123,45', 10, 2, 'Ç###,##', 'de') -> 123.45
toDouble
toDouble(<value> : any, [<format> : string], [<locale> : string]) => double
Converts any numeric or string to a double value. You can use an optional Java decimal format for the conversion. Use an optional locale format in the form of a BCP47 language like en-US, de, or zh-CN.
toDouble(123.45) -> 123.45toDouble('123.45') -> 123.45toDouble('$123.45', '$###.00') -> 123.45toDouble('Ç123,45', 'Ç###,##', 'de') -> 123.45
toFloat
toFloat(<value> : any, [<format> : string], [<locale> : string]) => float
Converts any numeric or string to a float value. You can use an optional Java decimal format for the conversion. Truncates any double.
toFloat(123.45) -> 123.45ftoFloat('123.45') -> 123.45ftoFloat('$123.45', '$###.00') -> 123.45f
toInteger
toInteger(<value> : any, [<format> : string], [<locale> : string]) => integer
Converts any numeric or string to an integer value. You can use an optional Java decimal format for the conversion. Truncates any long, float, double.
toInteger(123) -> 123toInteger('123') -> 123toInteger('$123', '$###') -> 123
toLong
toLong(<value> : any, [<format> : string], [<locale> : string]) => long
Converts any numeric or string to a long value. You can use an optional Java decimal format for the conversion. Truncates any float, double.
toLong(123) -> 123toLong('123') -> 123toLong('$123', '$###') -> 123
topN
topN(<column/expression> : any, <count> : long, <n> : integer) => array
Gets the top N values for this column based on the count argument.
topN(custId, count, 5)topN(productId, num_sales, 10)
toShort
toShort(<value> : any, [<format> : string], [<locale> : string]) => short
Converts any numeric or string to a short value. You can use an optional Java decimal format for the conversion. Truncates any integer, long, float, double.
toShort(123) -> 123toShort('123') -> 123toShort('$123', '$###') -> 123
toString
toString(<value> : any, [<number format/date format> : string], [<date locale> : string]) => string
Converts a primitive data type to a string. You can specify a format for numbers and date. If unspecified, the system default is picked. Java decimal format is used for numbers. Refer to Java's SimpleDateFormat class for available formats. The default format is yyyy-MM-dd. For a date or time stamp, you can optionally specify a locale.
toString(10) -> '10'toString('engineer') -> 'engineer'toString(123456.789, '##,###.##') -> '123,456.79'toString(123.78, '000000.000') -> '000123.780'toString(12345, '##0.#####E0') -> '12.345E3'toString(toDate('2018-12-31')) -> '2018-12-31'isNull(toString(toDate('2018-12-31', 'MM/dd/yy'))) -> truetoString(4 == 20) -> 'false'toString(toDate('12/31/18', 'MM/dd/yy', 'es-ES'), 'MM/dd/yy', 'de-DE')
toTimestamp
toTimestamp(<string> : any, [<timestamp format> : string], [<time zone> : string]) => timestamp
Converts a string to a time stamp when given an optional time-stamp format. If the time stamp is omitted, the default pattern yyyy-[M]M-[d]d hh:mm:ss[.f...] is used. You can pass an optional time zone in the form of GMT, PST, UTC, or America/Cayman. The timestamp function supports up to millisecond accuracy with a value of 999. Refer to Java's SimpleDateFormat class for available formats.
toTimestamp('2016-12-31 00:12:00') -> toTimestamp('2016-12-31 00:12:00')toTimestamp('2016-12-31T00:12:00', 'yyyy-MM-dd\'T\'HH:mm:ss', 'PST') -> toTimestamp('2016-12-31 00:12:00')toTimestamp('12/31/2016T00:12:00', 'MM/dd/yyyy\'T\'HH:mm:ss') -> toTimestamp('2016-12-31 00:12:00')millisecond(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> 871
toUTC
toUTC(<value1> : timestamp, [<value2> : string]) => timestamp
Converts the time stamp to UTC. You can pass an optional time zone in the form of GMT, PST, UTC, or America/Cayman. It defaults to the current time zone. Refer to Java's SimpleDateFormat class for available formats.
toUTC(currentTimestamp()) == toTimestamp('2050-12-12 19:18:12') -> falsetoUTC(currentTimestamp(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true
translate
translate(<string to translate> : string, <lookup characters> : string, <replace characters> : string) => string
Replace one set of characters by another set of characters in the string. Characters have a one to one replacement.
translate('(bojjus)', '()', '[]') -> '[bojjus]'translate('(gunchus)', '()', '[') -> '[gunchus'
trim
trim(<string to trim> : string, [<trim characters> : string]) => string
Trims a string of leading and trailing characters. If the second parameter is unspecified, it trims whitespace. Otherwise, it trims any character specified in the second parameter.
trim(' dumbo ') -> 'dumbo'trim('!--!du!mbo!', '-!') -> 'dumbo'
true
true() => boolean
Always returns a true value. Use the function syntax(true()) if a column is named true.
(10 + 20 == 30) -> true(10 + 20 == 30) -> true()
typeMatch
typeMatch(<type> : string, <base type> : string) => boolean
Matches the type of the column. You can use it in pattern expressions only. The number function matches short, integer, long, double, float, or decimal. The integral function matches short, integer, long. The fractional function matches double, float, decimal. The datetime function matches the date or time stamp type.
typeMatch(type, 'number')typeMatch('date', 'datetime')
U
unescape
unescape(<string_to_escape> : string, <format> : string) => string
Unescapes a string according to a format. Literal values for acceptable formats are json, xml, ecmascript, html, and java.
unescape('{\\\\\"value\\\\\": 10}', 'json')'{\\\"value\\\": 10}'
unfold
unfold (<value1>: array) => any
Unfolds an array into a set of rows and repeats the values for the remaining columns in every row.
unfold(addresses) => anyunfold( @(name = salesPerson, sales = salesAmount) ) => any
unhex
unhex(<value1>: string) => binary
Unhexes a binary value from its string representation. You can use it with sha2, md5 to convert from string to binary representation.
unhex('1fadbe') -> toBinary([toByte(0x1f), toByte(0xad), toByte(0xbe)])unhex(md5(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4'))) -> toBinary([toByte(0x4c),toByte(0xe8),toByte(0xa8),toByte(0x80),toByte(0xbd),toByte(0x62),toByte(0x1a),toByte(0x1f),toByte(0xfa),toByte(0xd0),toByte(0xbc),toByte(0xa9),toByte(0x05),toByte(0xe1),toByte(0xbc),toByte(0x5a)])
union
union(<value1>: array, <value2> : array) => array
Returns a union set of distinct items from two arrays.
union([10, 20, 30], [20, 40]) => [10, 20, 30, 40]
upper
upper(<value1> : string) => string
Uppercases a string.
upper('bojjus') -> 'BOJJUS'
uuid
uuid() => string
Returns the generated UUID.
uuid()
V
variance
variance(<value1> : number) => double
Gets the variance of a column.
variance(sales)
varianceIf
varianceIf(<value1> : boolean, <value2> : number) => double
Gets the variance of a column, based on criteria.
varianceIf(region == 'West', sales)
variancePopulation
variancePopulation(<value1> : number) => double
Gets the population variance of a column.
variancePopulation(sales)
variancePopulationIf
variancePopulationIf(<value1> : boolean, <value2> : number) => double
Gets the population variance of a column, based on criteria.
variancePopulationIf(region == 'West', sales)
varianceSample
varianceSample(<value1> : number) => double
Gets the unbiased variance of a column.
varianceSample(sales)
varianceSampleIf
varianceSampleIf(<value1> : boolean, <value2> : number) => double
Gets the unbiased variance of a column, based on criteria.
varianceSampleIf(region == 'West', sales)
W
weekOfYear
weekOfYear(<value1> : datetime) => integer
Gets the week of the year when given a date.
weekOfYear(toDate('2008-02-20')) -> 8
weeks
weeks(<value1> : integer) => long
Gets the duration in milliseconds for the number of weeks.
weeks(2) -> 1209600000L
X
xor
xor(<value1> : boolean, <value2> : boolean) => boolean
Uses the logical XOR operator. Same as the ^ operator.
xor(true, false) -> truexor(true, true) -> falsetrue ^ false -> true
Y
year
year(<value1> : datetime) => integer
Gets the year value of a date.
year(toDate('2012-8-8')) -> 2012
Related content
- List of all aggregate functions.
- List of all array functions.
- List of all cached lookup functions.
- List of all conversion functions.
- List of all date and time functions.
- List of all expression functions.
- List of all map functions.
- List of all metafunctions.
- List of all window functions.
- Learn how to use Expression Builder.