Data Transformations
Our data pipeline product provides two ways to create data transformation rules that clean retrieved data into the preferred format: Formulas or JavaScript functions
Formulas
You can perform various data transformations and manipulations by using our formulas. Starting from creating if conditions =IF()
, replacing data =REPLACE()
, concatenating data =CONCATENATE()
to using our custom formulas to tell the system to remove specific rows REMOVE()
. Our product offers various formulas. Some of them you might know from Microsoft Excel or Google Sheets.
Here are a few examples of formulas that you can use inside our pipeline product:
=CONCATENATE(firstName;" "; lastName)
=REPLACE(phoneNumber; 1; 0; "+49")
=IF(ERROR();REMOVE())
=IF(EMPTY(organisation);REMOVE())
=IF(OR(amount > 10; amount < 2); amount; 20)
=TEXT(closingDate;"dd.mm.yyyy")
These are just a few examples of the many formulas available in our tool. Using these, you can efficiently perform complex data manipulations and transformations.
Date & time
Function ID | Description | Syntax |
---|---|---|
DATE | Returns the specified date as the number of full days since today | DATE(Year; Month; Day) |
DATEDIF | Calculates distance between two dates, in provided unit parameter (“D”, “M” or “Y”). | DATEDIF(Date1; Date2; Units) |
DATEVALUE | Parses a date string and returns it as the number of full days since today | DATEVALUE(Datestring) |
DAY | Returns the day of the given date value. | DAY(Number) |
DAYS | Calculates the difference between two date values. | DAYS(Date2; Date1) |
HOUR | Returns hour component of given time. | HOUR(Time) |
ISOWEEKNUM | Returns an ISO week number that corresponds to the week of year. | ISOWEEKNUM(Date) |
MINUTE | Returns minute component of given time. | MINUTE(Time) |
MONTH | Returns the month for the given date value. | MONTH(Number) |
NOW | Returns current date + time as a number of days since today. | NOW() |
TODAY | Returns an integer representing the current date as the number of full days since today. | TODAY() |
Logical
Function ID | Description | Syntax |
---|---|---|
AND | Returns true if all arguments are true. | AND(Logicalvalue1; Logicalvalue2; ...; Logicalvalue30) |
IF | Specifies a logical test to be performed. If the test is true the second argument is returned. If it is false, the third argument is returned. The third argument is optional. | IF(Test; Then value; Otherwisevalue) |
ERROR | Returns true if the output value contains an error. | ERROR() |
NOT | Complements (inverts) a logical value. | NOT(Logicalvalue) |
SWITCH | Evaluates a list of arguments, consisting of an expression followed by a value. | SWITCH(Expression1; Value1[; Expression2; Value2[...; Expression_n; Value_n]]) |
OR | Returns true if at least one argument is true. | OR(Logicalvalue1; Logicalvalue2; ...; Logicalvalue30) |
XOR | Returns true if an odd number of arguments evaluates to true. | XOR(Logicalvalue1; Logicalvalue2; ...; Logicalvalue30) |
Math & trigonometry
Function ID | Description | Syntax |
---|---|---|
ABS | Returns the absolute value of a number. | ABS(Number) |
CEILING | Rounds a number up to the nearest multiple of Significance. | CEILING(Number; Significance) |
CEILING.PRECISE | Rounds a number up to the nearest multiple of Significance. | CEILING.PRECISE(Number[; Significance]) |
COMBIN | Returns number of combinations (without repetitions). | COMBIN(Number; Number) |
COMBINA | Returns number of combinations (with repetitions). | COMBINA(Number; Number) |
DECIMAL | Converts text with characters from a number system to a positive integer in the base radix given. | DECIMAL("Text"; Radix) |
EVEN | Rounds a positive number up to the next even integer and a negative number down to the next even integer. | EVEN(Number) |
EXP | Returns constant e raised to the power of a number. | EXP(Number) |
FLOOR | Rounds a number down to the nearest multiple of Significance. | FLOOR(Number; Significance) |
INT | Rounds a number down to the nearest integer. | INT(Number) |
MOD | Returns the remainder when one integer is divided by another. | MOD(Dividend; Divisor) |
POWER | Returns a number raised to another number. | POWER(Base; Exponent) |
ROUND | Rounds a number to a certain number of decimal places. | ROUND(Number; Count) |
ROUNDDOWN | Rounds a number down, toward zero, to a certain precision. | ROUNDDOWN(Number; Count) |
ROUNDUP | Rounds a number up, away from zero, to a certain precision. | ROUNDUP(Number; Count) |
SQRT | Returns the positive square root of a number. | SQRT(Number) |
SQRTPI | Returns sqrt of number times pi. | SQRTPI(Number) |
SUBTOTAL | Computes aggregation using function specified by number. | SUBTOTAL(Function; Number1; Number2; ...; Number30) |
SUM | Sums up the values of the specified cells. | SUM(Number1; Number2; ...; Number30) |
Operator
Function ID | Description | Syntax |
---|---|---|
ADD | Adds two values. | ADD(Number; Number) |
CONCAT | Concatenates two strings. | CONCAT(String; String) |
DIVIDE | Divides two values. | DIVIDE(Number; Number) |
EQ | Tests two values for equality. | EQ(Value; Value) |
LTE | Tests two values for less-equal relation. | LEQ(Value; Value) |
LT | Tests two values for less-than relation. | LT(Value; Value) |
GTE | Tests two values for greater-equal relation. | GEQ(Value; Value) |
GT | Tests two values for greater-than relation. | GT(Value; Value) |
MINUS | Subtracts two values. | MINUS(Number; Number) |
MULTIPLY | Multiplies two values. | MULTIPLY(Number; Number) |
UNARY_PERCENT | Applies percent operator. | UNARY_PERCENT(Number) |
Text
Function ID | Description | Syntax |
---|---|---|
CHAR | Converts a number into a character according to the current code table. | CHAR(Number) |
CLEAN | Returns text that has been "cleaned" of line breaks and other non-printable characters. | CLEAN("Text") |
CONCATENATE | Combines several text strings into one string. | CONCATENATE("Text1"; ...; "Text30") |
EXACT | Returns true if both text strings are exactly the same. | EXACT("Text"; "Text") |
LEFT | Extracts a given number of characters from the left side of a text string. | LEFT("Text"; Number) |
LEN | Returns length of a given text. | LEN("Text") |
LOWER | Returns text converted to lowercase. | LOWER("Text") |
MID | Returns substring of a given length starting from Start_position. | MID("Text"; Start_position; Length) |
PROPER | Capitalizes words given text string. | PROPER("Text") |
REPLACE | Replaces substring of a text of a given length that starts at given position. | REPLACE("Text"; Start_position; Length; "New_text") |
REPT | Repeats text a given number of times. | REPT("Text"; Number) |
SPLIT | Divides the provided text using the space character as a separator and returns the substring at the zero-based position specified by the second argument. SPLIT("Lorem ipsum"; 0) -> "Lorem" – SPLIT("Lorem ipsum"; 1) -> "ipsum" | SPLIT("Text"; Index) |
TEXT | Converts a number into text according to a given format. | TEXT(Number; Format) |
TRIM | Strips extra spaces from text. | TRIM("Text") |
UPPER | Returns text converted to uppercase. | UPPER("Text") |
Custom formulas
Function ID | Description | Syntax |
---|---|---|
REMOVE | If called on a row, the row will be removed/ignored for the output. | REMOVE() |
EMPTY | Returns true if the given input value is empty. | EMPTY(column) |
ERROR | Returns true if the output value contains an error. | ERROR() |
VALUEAT | Returns the value of the given column at the given index | VALUEAT(column; index) |
Best Practices
Mathematical Operators
As you can see above, you can use all the mathematical operators. But instead of using ADD()
, MINUS()
, DIVIDE()
and MULTIPLY()
we recommend you to use +
, -
, /
and *
. This saves you time when creating formulas.
For example, you would create a formula for a multiplication like =amount*costs
instead of =MULTIPLY(amount; costs)
.
Delimiter (";" instead of ",")
If you are used to Excel, you probably use "," as the delimiter for formulas with more than one argument. When you work inside our data pipeline tool, you must adjust because our syntax prefers to use ";" as a delimiter, similar to Google Sheets.
For example, you would create a formula for an if-condition like =IF(amount > 1000; REMOVE())
instead of =IF(amount > 1000, REMOVE())
.
Custom formula: REMOVE()
With the custom formula REMOVE()
, you can tell our system when to ignore/discard a row for the pipeline output. Every row marked as "Removed" will not be exported or arrive at the defined output target. TIP: We recommend using REMOVE()
with if-conditions to remove the unwanted rows selectively.
Custom formula: EMPTY()
With the custom formula EMPTY()
, you can check whether an input value is empty. If the value equals empty, the formula returns true. If not, the formula returns false. TIP: EMPTY()
can be combined with IF()
and REMOVE()
to selectively remove the rows whose input is empty (=IF(EMPTY(column); REMOVE())
).
Custom formula: ERROR()
With the custom formula ERROR()
, you can check whether a row's output throws an error regarding the validations. If the output throws an error, the formula returns true. If not, the formula returns false. TIP: ERROR()
can be combined with IF()
and REMOVE()
to selectively remove the rows whose output throws an error (=IF(ERROR(); REMOVE())
).
Compare operators
If you want to compare specfic values, you can use the following operators: =
, >
, <
, >=
or <=
For example, if you want to check if the amount is greater than 1000, the formula would be: =IF(amount > 1000; Then value; Otherwisevalue)
Javascript
JavaScript offers a wide range of functionality, including mathematical operations, string manipulation, conditional statements, loops, and more. By using JavaScript in your application, users can perform various data transformations, mappings, validations, and cleanings.
Here are a few examples of common JavaScript functions that your users might find useful:
Mathematical Operations
Addition: var x = 5 + 3;
Subtraction: var x = 5 - 3;
Multiplication: var x = 5 * 3;
Division: var x = 5 / 3;
Modulus: var x = 5 % 3;
String Manipulation
Concatenation: var name = "John" + " " + "Doe";
Length: var length = name.length;
Substring: var sub = name.substring(0, 4);
Replace: var replaced = name.replace("John", "Jane");
Conditional Statements
var age = 25;
if (age >= 18) {
return "You are an adult";
} else {
return "You are not an adult";
}
Loops
for (var i = 0; i < 5; i++) {
return i;
}
var numbers = [1, 2, 3, 4, 5];
numbers.forEach(function(number) {
return number;
});
Async
async (row, rowIndex) => {
try {
const response = await fetch("https://api.example.com/data");
const data = await response.json();
return data;
} catch (error) {
return row.data;
}
};
These are just a few examples of what your users can do with JavaScript. The possibilities are endless, and your users can combine these functions and add their own custom logic to perform even more complex data transformations, mappings, validations, and cleanings. For more information, check out the JavaScript documentation.
Global States
Central state management can be achieved through the use of global states. These states enable you to execute an asynchronous function once and save the output for future use across multiple columns. As a user, you can create up to 5 global states for each pipeline. The data returned from these states can then be accessed in every column mapping, making it convenient for data validation and cleaning purposes.
Global states are especially useful in scenarios where you need to make a single API request and use the results to validate input data for multiple rows and columns. By using global states, you can avoid making multiple API requests and reduce the overall processing time.
The memory capacity for each global state is limited to 1mb.
Best Practices
HTTP 429 Too Many Requests
The HTTP 429 Too Many Requests response status code is a HTTP standard response code indicating that the user has sent too many requests in a specified time frame, also known as "rate limiting." This status code is used to protect the server from being overwhelmed with too many requests.
To avoid hitting the rate limit, it's important to use global states to store the fetched data. For example, if you want to retrieve the data from the database, we suggest to use global states to make the async call:
async () => {
const response = await fetch(`https://api.example.com/data/${row.id}`);
const data = await response.json();
return data;
};
The data stored inside the global state can be accessed across column.