Table Statistics
The Table Statistics tab is visible if you have used the Iceberg Table Optimizer to tune your tables, or if you are using Upsolver to ingest data into Iceberg. Upsolver continuously monitors the files that comprise your Iceberg tables and tunes their performance based on maintenance best practices.
From Datasets, expand the navigation tree to display the table you want to view. If you are using Upsolver purely to manage Iceberg tables created by another source, you will see the tables that are currently selected for optimization. If you have also created pipelines using Upsolver, you will see all your datasets within the navigation tree.
At any time, you can remove an external table from Upsolver's optimization process. From Datasets, use the navigation tree to select the table you want to remove.
In the top right-hand corner of the tab, click the ellipsis to open the menu, then select Remove Table from Optimization. In the confirmation window, click Remove Table from Optimization, or click Cancel to exit and resume optimization.
The Optimization card provides instant insight into the size of your table.
Metric | Description |
---|---|
Current Table Size | The current size of the table in GB. |
Projected Table Size | The expected size of the table in GB following compaction. |
Current Scan Overhead | The time it currently takes to scan the table. |
Projected Scan Overhead | The expected time it will take to scan the table following compaction. |
The Table Statistics card provides deeper insights into the files and partitions that comprise your table. Use this card for instant insight into the underlying volume of files and partitions. You can also monitor the size of your partitions, and discover the maximum number of files in a partition.
Metric | Description |
---|---|
Partitions | The number of partitions currently comprising the table. |
Avg. Partition Size | The average size of a partition forming the table. |
Total Files | The total number of files across all partitions in the table. |
Avg. File Size | The average size of a file across all partitions that form the table. |
Avg. Files per Partition | The average number of files within each partition in the table. |
Max. Files per Partition | The maximum number of files in any of the partitions comprising the table. |
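To make the relationship between these metrics concrete, the following is a minimal Python sketch of how they could be derived from a list of data files. It is an illustration only, not Upsolver's implementation: the function name `table_statistics`, the `(partition_name, file_size_bytes)` input shape, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict


def table_statistics(files):
    """Summarize a table from (partition_name, file_size_bytes) tuples.

    Hypothetical helper for illustration; the input shape is an assumption.
    """
    # Group file sizes by the partition they belong to.
    sizes_by_partition = defaultdict(list)
    for partition, size in files:
        sizes_by_partition[partition].append(size)

    partitions = len(sizes_by_partition)
    total_files = sum(len(sizes) for sizes in sizes_by_partition.values())
    total_bytes = sum(sum(sizes) for sizes in sizes_by_partition.values())

    if partitions == 0:
        # An empty table has no meaningful averages.
        return {
            "partitions": 0,
            "avg_partition_size": 0.0,
            "total_files": 0,
            "avg_file_size": 0.0,
            "avg_files_per_partition": 0.0,
            "max_files_per_partition": 0,
        }

    return {
        "partitions": partitions,
        "avg_partition_size": total_bytes / partitions,
        "total_files": total_files,
        "avg_file_size": total_bytes / total_files,
        "avg_files_per_partition": total_files / partitions,
        "max_files_per_partition": max(
            len(sizes) for sizes in sizes_by_partition.values()
        ),
    }


# Made-up sample: two partitions, three files, sizes in bytes.
sample_files = [
    ("2024-01-01", 100),
    ("2024-01-01", 300),
    ("2024-01-02", 200),
]
stats = table_statistics(sample_files)
print(stats)
```

In this sample, a large gap between the average and maximum files per partition would point to a partition holding a disproportionate number of files.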
The File Count graph displays the number of files across all partitions comprising your tables. Upsolver continuously monitors the files within your table partitions and provides visibility into the underlying count. The file count will naturally rise as new data is added, and fall after Upsolver's compaction process reduces the number of files required to store the data.
Ideally, compaction produces fewer, denser files, leading to fast and efficient data retrieval. Although the file count ultimately depends on the overall size of the table, the aim is to remove gaps and reduce the number of files that hold only small volumes of data, as these are less efficient.
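The effect of small files on the file count can be sketched with some simple arithmetic. The example below is a rough illustration under stated assumptions, not a description of how Upsolver compacts data: the function `compaction_estimate`, the `target_size` parameter, and the sample sizes are hypothetical, and real compaction is also constrained by partition boundaries and other factors.

```python
import math


def compaction_estimate(file_sizes, target_size):
    """Estimate how far a file count could shrink (hypothetical sketch).

    file_sizes: sizes of the current files, in any consistent unit.
    target_size: assumed ideal size of a compacted file, in the same unit.
    """
    total = sum(file_sizes)
    # Files below the assumed target are candidates for merging.
    small_files = sum(1 for size in file_sizes if size < target_size)
    # Lower bound: the same data packed into files of the target size.
    min_files_after = math.ceil(total / target_size) if total else 0
    return {
        "current_files": len(file_sizes),
        "small_files": small_files,
        "min_files_after": min_files_after,
    }


# Made-up sizes in MB, with an assumed target of 512 MB per file.
estimate = compaction_estimate([10, 20, 30, 512, 40], target_size=512)
print(estimate)
```

Here four of the five files fall below the assumed target, and the same volume of data could fit in as few as two files.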
As with the file count, the size of your table rises and falls in line with the volume of data being written and deleted, and reduces when Upsolver runs a compaction process on the partitions.
You can use the Table Size graph to monitor the size of your table.
The Top Partitions table displays detailed statistics for each partition used by the table, with the count of partitions displayed in the header label. You can use the Search box to find a partition, or a group of partitions, and view the relevant statistics. Type in the full or partial name of the partition(s) and press Enter.
Metric | Description |
---|---|
Partition | The name of the partition. |
Avg. File Size | The average size of a file within the partition. |
Total Files | The total number of files stored within the partition. |
Partition Size | The current size of the partition before optimization. |
Projected Partition Size | The estimated size of the partition after optimization. |
Scan Overhead | The time in minutes required to scan the partition. |
Projected Scan Overhead | The estimated time in minutes to scan the partition after optimization. |
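The partial-name search described above behaves like a substring filter over the partition rows. The following Python sketch shows the idea; it is not Upsolver's code. The function `search_partitions`, the row dictionaries, and the choice of case-insensitive matching are assumptions made for illustration.

```python
def search_partitions(rows, query):
    """Return rows whose partition name contains the query text.

    Hypothetical sketch: case-insensitive substring matching is an assumption.
    """
    needle = query.strip().lower()
    return [row for row in rows if needle in row["partition"].lower()]


# Made-up rows with a subset of the columns shown in the table above.
rows = [
    {"partition": "date=2024-01-01", "total_files": 120},
    {"partition": "date=2024-01-02", "total_files": 8},
    {"partition": "date=2024-02-01", "total_files": 45},
]

# A partial name matches a group of partitions.
january = search_partitions(rows, "2024-01")
print([row["partition"] for row in january])
```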