Lineage

This page describes the Lineage tab for stream and file sources.

The Lineage tab shows your data's journey. It visually displays where datasets and jobs interact, lets you drill into each entity, and shows how pipelines and datasets relate and connect.

Viewing data lineage

The lineage diagram shows the datasets related to the current job. The current job is always highlighted in the diagram to show its position within the pipeline architecture.

Adjust the view

Use your mouse scroll wheel to zoom in and out on the diagram, or use the zoom control in the bottom left-hand corner of the screen to set the view. Click fit view to fill the screen with the diagram. To move the diagram, click and drag it to a new position on the screen.

Data source

Click the data source icon to display a pop-up with the name of the topic or bucket location where the data is sourced, along with the connection the job uses to copy the data.

Job

Click the highlighted job icon to display information including the full name of the job and whether it is a sync or non-sync job. From here, click Info to display the SQL syntax that created the job. Optionally, click Copy to copy the code so that you can paste it into a worksheet.

Data target

The target includes additional information to provide a top-level view of the schema. Click the dataset icon to open the pop-up and view the table and schema names, and the connection used by the job to write the data.

Schema

Click Info to open the modal, which provides an overview of your data.

From this modal, click Entity Page to navigate to Datasets and view more detailed information about the data. Alternatively, click the Entity Page link in the pop-up to open the entity in Datasets.

SQL

The SQL tab in the modal displays the syntax used to create the table, along with job options and configuration settings. Optionally, click Copy to copy the code so that you can paste it into a worksheet.

Extended lineage

As well as viewing the immediate data sources that are processed by the job, you can click Display Extended Lineage to view where the job exists in relation to other entities in your ecosystem.

Select the Display Extended Lineage checkbox to extend the lineage diagram.

In this example, the diagram extends to show that the current job, which is always highlighted, shares its data source with another job that writes to a different location. Viewing a job in relation to other entities helps you plan for changes that may have a wider impact within your organization.

As with the previous view, you can click any related entity in the extended diagram to see further information and drill into the details.
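
The difference between the default view and the extended view can be pictured as two traversals of the same graph. The following Python sketch is illustrative only and does not reflect how the product builds its diagram. The entity names, the edge list, and the functions `immediate_lineage` and `extended_lineage` are all hypothetical. It assumes the default view shows the entities directly connected to the job, and the extended view shows everything reachable from the job through any chain of connections.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges as (upstream, downstream) pairs.
# Two jobs read from the same source and write to different targets.
EDGES = [
    ("source:orders_topic", "job:load_orders"),
    ("job:load_orders", "table:staging.orders"),
    ("source:orders_topic", "job:archive_orders"),
    ("job:archive_orders", "table:archive.orders"),
]


def build_neighbors(edges):
    """Map each entity to its directly connected entities, ignoring direction."""
    neighbors = defaultdict(set)
    for upstream, downstream in edges:
        neighbors[upstream].add(downstream)
        neighbors[downstream].add(upstream)
    return neighbors


def immediate_lineage(edges, job):
    """Return the job plus the entities it reads from or writes to directly."""
    neighbors = build_neighbors(edges)
    return {job} | neighbors[job]


def extended_lineage(edges, job):
    """Return every entity reachable from the job (breadth-first search)."""
    neighbors = build_neighbors(edges)
    seen = {job}
    queue = deque([job])
    while queue:
        current = queue.popleft()
        for entity in neighbors[current]:
            if entity not in seen:
                seen.add(entity)
                queue.append(entity)
    return seen


if __name__ == "__main__":
    print(sorted(immediate_lineage(EDGES, "job:load_orders")))
    print(sorted(extended_lineage(EDGES, "job:load_orders")))
```

With this sample data, the immediate view of `job:load_orders` contains only its own source and target, while the extended view also includes `job:archive_orders` and its target, because both jobs share `source:orders_topic`.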
