JDBC data source

This article provides an introduction to JDBC along with a guide on how to create a JDBC data source in Upsolver.

What is JDBC?

Java Database Connectivity (JDBC) is an application programming interface (API) for the programming language Java, which defines how a client may access a database. It is a Java-based data access technology used for Java database connectivity and is part of the Java Standard Edition platform from Oracle Corporation.

Upsolver supports uploading data from the following JDBC sources:

  • Microsoft SQL Server

  • MySQL

  • PostgreSQL

  • Amazon Redshift

  • Snowflake

  • Oracle

Create a JDBC data source

1. From the Data Sources page, click New.

2. Select JDBC.

3. Name this data source.

4. From the dropdown, select a compute cluster (or create a new one) to run the calculation on.

5. Select a target storage connection (or create a new one) where the data read will be stored (output storage).

6. Under Parallelism, enter in the number of independent shards to parse data, increase parallelism, and reduce latency.

This should remain 1 in most cases and be no more than the number of shards used to read the data from the source.

7. Enter the JDBC connection string.

See: Connections for more information on the string format for a specific connection type.

8. (Optional) Configure any additional properties necessary to make the connection.

9. Enter in the connection username.

10. Enter in the corresponding password.

11. Enter the name of the table to connect to.

12. (Optional) Identify an incrementing column.

This is a column that is always incrementing in value.

Every time the data is ingested, it starts with the number following the last number loaded (e.g. an ID column).

If you do not specify an incrementing column, Upsolver will automatically try and identify an incrementing column.

13. (Optional) Identify any timestamp columns.

This is a column with a timestamp. When the data source executes, it queries the data for rows with a more recent timestamp.

Multiple timestamp columns are also supported. If you specify multiple timestamp columns, the query uses the first non-empty time stamp that it finds.

Examples:

  • If you have two columns (time1, time2) and a specific row only has an entry in time2, this will be used.

  • If there is an entry in both time1 and time2, it uses the timestamp from time1.

You can also specify both an incrementing column and timestamp columns.

In this case, it queries the data for rows with a more recent timestamp, or rows with a pre-existing timestamp but with an higher value in the incrementing column.

14. (Optional) Set a read delay (in terms of seconds) to provide a buffer for how far back in time to look in the data.

This can be useful in cases such as when the timestamp column is rounded to minutes, a read delay of 30 seconds can ensure that you do not miss any relevant rows.

15. (Optional) Enter the full load interval.

16. Click Continue to preview your data and review the following metrics:

  • Sample Size

  • Parsed Successfully

  • # of Errors

17. (Optional) If there are any errors, click Back to change the settings as required.

18. Click Create.

You can now use your JDBC data source.