Reads are suffering badly

I have a two-dimensional dense array with float attributes.

```cpp
domain.add_dimension(Dimension::create<int>(ctx, "rows", {{row_dimension.start, row_dimension.end}}, row_dimension.tileExtent))
      .add_dimension(Dimension::create<int>(ctx, "cols", {{column_dimension.start, column_dimension.end}}, column_dimension.tileExtent));
```

My tile configuration is as follows:

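(1, 5000, 1, 1, 5000000, 5000000), i.e. rows 1 to 5,000 with a tile extent of 1, and columns 1 to 5,000,000 with a tile extent of 5,000,000. In short, the array is 5K × 5M and each tile is 1 row × 5M columns, for a total of 5K tiles.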
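To make the tile arithmetic explicit, here is a small sketch in plain C++ (no TileDB calls) that derives the tile count and tile size from the (start, end, extent) triples. The `DimSpec` struct and the helper functions are hypothetical illustrations, not part of the TileDB API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the TileDB API): one dimension's inclusive
// domain [start, end] and its tile extent, mirroring the (start, end, extent)
// triples in the configuration above.
struct DimSpec {
    int64_t start;
    int64_t end;
    int64_t extent;
};

// Number of tiles along one dimension: ceil(domain length / tile extent).
int64_t tiles_along(const DimSpec& d) {
    assert(d.extent > 0 && d.end >= d.start);
    const int64_t length = d.end - d.start + 1;
    return (length + d.extent - 1) / d.extent;
}

// Total number of space tiles in a 2D dense array.
int64_t total_tiles(const DimSpec& rows, const DimSpec& cols) {
    return tiles_along(rows) * tiles_along(cols);
}

// Number of cells in one full space tile.
int64_t cells_per_tile(const DimSpec& rows, const DimSpec& cols) {
    return rows.extent * cols.extent;
}
```

With rows `{1, 5000, 1}` and columns `{1, 5000000, 5000000}`, `total_tiles` returns 5,000 and `cells_per_tile` returns 5,000,000. Assuming a single 4-byte float attribute, that is roughly 20 MB per tile before compression.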

I am writing to this array in row-major order, where each row is a processed sample. Writes are fast and I have no issues there. Reads, however, are severely slow on large arrays. On a small array (15 rows × 5M columns), reading is fast, but on the full array (5K rows × 5M columns), reading a slice of 100 rows × 100K columns takes about 2 minutes, which seems bizarre. Any suggestions?

@TileDbUser, I think you need to adjust your tiling scheme to get a better “average case”. Here the tiling is (1, 5_000_000), so every time you read from a 1-row tile, you pay the I/O cost of all 5_000_000 (compressed) cells in it. If a single-row query reads 15_000 columns that all fall within one tile, you fetch 5_000_000 cells to return 15_000, roughly 333× more data than you need (an I/O efficiency of about 15_000 / 5_000_000 ≈ 0.3%). This works decently well if most of your queries read a single row at a time. If they span many rows, though, each additional row costs another full 5_000_000-cell tile, so the I/O efficiency is probably very low (you can verify this by instrumenting the reads with the TileDB stats output, tiledb::Stats in the C++ API).
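The cost model above can be sketched as follows, assuming (as described) that every tile a query overlaps is fetched and decompressed in full. `DimSpec`, `Range`, and the helper functions are hypothetical, written in plain C++ with no TileDB dependency, and the model ignores caching, actual compression ratios, and partial edge tiles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types, not part of the TileDB API.
struct DimSpec {
    int64_t start;   // first coordinate of the domain
    int64_t end;     // last coordinate of the domain (inclusive)
    int64_t extent;  // space-tile extent along this dimension
};

struct Range {
    int64_t lo;  // first coordinate requested (inclusive)
    int64_t hi;  // last coordinate requested (inclusive)
};

// Number of tiles along one dimension that a query range overlaps.
int64_t tiles_touched(const DimSpec& d, const Range& r) {
    assert(d.extent > 0 && d.start <= r.lo && r.lo <= r.hi && r.hi <= d.end);
    const int64_t first = (r.lo - d.start) / d.extent;
    const int64_t last = (r.hi - d.start) / d.extent;
    return last - first + 1;
}

// Cells fetched for a rectangular query: every overlapped tile is read whole.
int64_t cells_read(const DimSpec& rows, const DimSpec& cols,
                   const Range& qr, const Range& qc) {
    return tiles_touched(rows, qr) * tiles_touched(cols, qc) *
           rows.extent * cols.extent;
}

// Cells the query actually asked for.
int64_t cells_needed(const Range& qr, const Range& qc) {
    return (qr.hi - qr.lo + 1) * (qc.hi - qc.lo + 1);
}

// Read amplification: cells read per cell returned (1.0 is ideal).
double amplification(const DimSpec& rows, const DimSpec& cols,
                     const Range& qr, const Range& qc) {
    return static_cast<double>(cells_read(rows, cols, qr, qc)) /
           static_cast<double>(cells_needed(qr, qc));
}
```

For the single-row, 15,000-column example, `cells_read` gives 5,000,000 against 15,000 cells needed (about 333×). For the 100-row × 100,000-column slice from the question, the current tiling touches 100 tiles and reads 500,000,000 cells to return 10,000,000, a 50× amplification and roughly 2 GB of uncompressed float data, which would be consistent with the slow read.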

The solution here is to pick a better compromise for your workload: tile more rows together and reduce the column tile extent. By experimenting with a subset of the data and looking at the stats output, you can get a better sense of a good tiling scheme. Unfortunately, it’s hard to tell a priori what this is, because each use case is different, but we are working on ways to suggest better ones.
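One way to approach this compromise is to score a few candidate tilings against a representative set of queries before trying them on real data. The sketch below reuses the same whole-tile cost model; the candidate extents and the two-query workload are hypothetical, chosen only for illustration. Queries that start at coordinate 1 align with tile boundaries, which flatters larger tiles, so measuring with the stats output on a data subset remains the real test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper types, not part of the TileDB API.
struct DimSpec {
    int64_t start;   // first coordinate of the domain
    int64_t end;     // last coordinate of the domain (inclusive)
    int64_t extent;  // space-tile extent along this dimension
};

struct Range {
    int64_t lo;  // first coordinate requested (inclusive)
    int64_t hi;  // last coordinate requested (inclusive)
};

struct Query {
    Range rows;
    Range cols;
};

// Number of tiles along one dimension that a query range overlaps.
int64_t tiles_touched(const DimSpec& d, const Range& r) {
    assert(d.extent > 0 && d.start <= r.lo && r.lo <= r.hi && r.hi <= d.end);
    return (r.hi - d.start) / d.extent - (r.lo - d.start) / d.extent + 1;
}

// Total cells fetched by a workload under one tiling, treating every
// overlapped tile as read in full.
int64_t workload_cells_read(const DimSpec& rows, const DimSpec& cols,
                            const std::vector<Query>& workload) {
    int64_t total = 0;
    for (const Query& q : workload) {
        total += tiles_touched(rows, q.rows) * tiles_touched(cols, q.cols) *
                 rows.extent * cols.extent;
    }
    return total;
}

// Index of the (row_extent, col_extent) candidate that reads the fewest cells
// for an array with domain [1, num_rows] x [1, num_cols].
std::size_t best_tiling(int64_t num_rows, int64_t num_cols,
                        const std::vector<std::pair<int64_t, int64_t>>& candidates,
                        const std::vector<Query>& workload) {
    assert(!candidates.empty());
    std::size_t best = 0;
    int64_t best_cells = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const DimSpec rows{1, num_rows, candidates[i].first};
        const DimSpec cols{1, num_cols, candidates[i].second};
        const int64_t cells = workload_cells_read(rows, cols, workload);
        if (best_cells < 0 || cells < best_cells) {
            best = i;
            best_cells = cells;
        }
    }
    return best;
}
```

For a workload of the 100 × 100,000 slice plus a single-row, 15,000-column read, the (1, 5,000,000) tiling reads 505,000,000 cells, (10, 500,000) reads 55,000,000, and (100, 100,000) reads 20,000,000, so `best_tiling` returns index 2. The (100, 100,000) tiling reads twice as many cells as the current one for the single-row query, which is the trade-off between row-at-a-time and multi-row access.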


These docs might also help you better understand the effect of tiling on performance (e.g., the Space Tiling paragraph).

Thanks, Jake. It was very helpful.

Thank you. I will look into this.