First steps for using nuvo's pipeline
Data pipeline workflow
Our data pipeline product helps you automate and monitor data onboarding and migration. Follow these five steps to get your pipeline running; you can then track its runs on the pipeline's detail page:
- Connector Selection: Define the source your data comes from, the target the cleaned data should go to, and the structure and validations of that cleaned data. We currently support (S)FTP, HTTP(S), and AWS S3; further connectors are planned.
- Header Selection: Select the row containing the column names for the input data.
- Data Transformation: Our ML-supported algorithm recommends mappings between the imported columns and the columns of the target data model. You can resolve these recommendations and create new mappings. You can also define data transformation rules with a formula (similar to Microsoft Excel or Google Sheets) or a JavaScript function. NOTE: You can map multiple input columns to one output column and one input column to multiple output columns.
- Review Entries: Evaluate the cleaned data in your preferred output structure and clean up any outliers and errors. NOTE: Unlike data transformation rules, manual changes are not remembered for future runs.
- Schedule Pipeline: Determine a pipeline name, optionally specify an error threshold for future runs, and schedule the pipeline.
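The mapping options described in the Data Transformation step can be illustrated with a minimal sketch in plain JavaScript. It is not the product's actual function signature: the row shape, the column names (`first_name`, `last_name`, `address`), and the helper names `toFullName` and `splitAddress` are all assumptions made for illustration.

```javascript
// Hypothetical input row, keyed by the imported column names (assumed shape).
const inputRow = {
  first_name: " Ada ",
  last_name: "Lovelace",
  address: "12 Main St, London",
};

// Many-to-one: combine two input columns into one output column.
function toFullName(row) {
  return [row.first_name, row.last_name]
    .map((part) => String(part ?? "").trim())
    .filter((part) => part.length > 0)
    .join(" ");
}

// One-to-many: split one input column into two output columns.
function splitAddress(row) {
  const [street = "", city = ""] = String(row.address ?? "")
    .split(",")
    .map((part) => part.trim());
  return { street, city };
}

// Build the output row in the (assumed) target structure.
const outputRow = { full_name: toFullName(inputRow), ...splitAddress(inputRow) };
console.log(outputRow);
// full_name is "Ada Lovelace", street is "12 Main St", city is "London"
```

Because the logic lives in a rule rather than in a manual edit, it would be reapplied on every run, which is the difference the Review Entries note points out.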
Further functionalities
Our data pipeline product offers three additional important functionalities that you can use before or after setting up a pipeline.
Set up a target data model: Before setting up a pipeline, you need to determine your preferred output schema. You can define the structure and its validations as a target data model (TDM) in our target data model generator. NOTE: You can also define a TDM during the data pipeline creation.
Create your connectors: To obtain data and deliver the cleaned data to your desired location, you need to create input and output connectors. You can define these within the "Connector" tab or during the pipeline creation. We currently support the connector types (S)FTP, HTTP(S), AWS S3, and Custom Connectors.
Monitor & fix your pipeline: After creating a pipeline, you can always see what happened in every run. For successful runs, you can open the view mode and see the retrieved data, the applied data transformations, and the mappings. If a run fails, the error source is displayed, and in the edit mode you can fix the run by adjusting the header selection, the transformation rules, or the mappings.
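To make the idea of a target data model with validations and an error threshold more concrete, here is a minimal sketch in plain JavaScript. It does not reproduce the product's real TDM format or threshold logic: the model shape (`key`, `label`, `required`, `pattern`), the helpers `validateRow` and `exceedsErrorThreshold`, and the reading of the threshold as the maximum share of rows with errors are all assumptions made for illustration.

```javascript
// Hypothetical target data model: one entry per output column (assumed shape).
const targetDataModel = [
  { key: "email", label: "Email", required: true, pattern: /^[^@\s]+@[^@\s]+\.[^@\s]+$/ },
  { key: "age", label: "Age", required: false, pattern: /^\d+$/ },
];

// Return a list of human-readable errors for one row.
function validateRow(row, model) {
  const errors = [];
  for (const column of model) {
    const value = row[column.key];
    const isEmpty =
      value === undefined || value === null || String(value).trim() === "";
    if (isEmpty) {
      if (column.required) errors.push(`${column.label} is required`);
      continue; // empty optional values are accepted
    }
    if (column.pattern && !column.pattern.test(String(value))) {
      errors.push(`${column.label} has an invalid format`);
    }
  }
  return errors;
}

// Assumed semantics: a run exceeds the threshold when the share of rows
// with at least one error is greater than the threshold (0 to 1).
function exceedsErrorThreshold(rows, model, threshold) {
  if (rows.length === 0) return false;
  const invalid = rows.filter((row) => validateRow(row, model).length > 0).length;
  return invalid / rows.length > threshold;
}

const rows = [
  { email: "ada@example.com", age: "36" },
  { email: "not-an-email", age: "41" },
  { email: "", age: "abc" },
  { email: "grace@example.com", age: "" },
];

// 2 of 4 rows have errors, so a threshold of 0.25 is exceeded.
console.log(exceedsErrorThreshold(rows, targetDataModel, 0.25)); // true
```

Keeping the validations in the model rather than in the transformation code means the same checks can be applied to every run of the pipeline.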