Reads are suffering badly

I have a two-dimensional dense array with float attributes.

domain.add_dimension(Dimension::create<int>(ctx, "rows", {{row_dimension.start, row_dimension.end}}, row_dimension.tileExtent))
            .add_dimension(Dimension::create<int>(ctx, "cols", {{column_dimension.start, column_dimension.end}}, column_dimension.tileExtent));

My tile configuration is as follows:

(1, 5000, 1, 1, 5000000, 5000000). In short, an array of 5K rows x 5M columns where each tile is 1 row x 5M columns, for a total of 5K tiles.

I am writing to this array in row-major order, where each row is a sample that has been processed. I have no issues with write times; writes are pretty fast. Reads, however, are severely slow on a large array. On a small array (15 rows x 5M columns) reading is pretty fast, but when I try it on the full array (5K rows x 5M columns) and request a slice of 100 rows x 100K columns, it takes about 2 minutes, which seems bizarre. Any suggestions?

@TileDbUser, I think you have to adjust your tiling scheme a bit to get a better “average case”. Here the tiling is (1, 5_000_000). Every time you read a 1-row tile, your I/O cost is 5_000_000 (compressed) cells, so for a read of 15_000 columns in 1 row the ratio of cells fetched to cells requested is ~ 5_000_000 / 15_000 (about 333), provided all 15_000 columns are contained within one tile. This will work decently well if most of your queries are a single row at a time. If they span many rows, though, the I/O efficiency is probably very low (you can verify this by instrumenting the reads with the tiledb::stats output).
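As a rough illustration of that arithmetic, here is a minimal C++ sketch of the cost model. It is not TileDB code: `Range`, `tiles_overlapped`, `cells_fetched` and `cells_requested` are hypothetical helper names, and the model simply assumes that every tile a slice touches is read whole, ignoring compression and caching.

```cpp
#include <cassert>
#include <cstdint>

// Inclusive 1-D range of cell coordinates, e.g. rows 1..100.
struct Range {
    std::int64_t lo;
    std::int64_t hi;
};

// Number of tiles of size `extent` that overlap `r`, when the first tile
// starts at `domain_start`.
inline std::int64_t tiles_overlapped(Range r, std::int64_t domain_start,
                                     std::int64_t extent) {
    assert(extent > 0 && r.lo >= domain_start && r.hi >= r.lo);
    return (r.hi - domain_start) / extent - (r.lo - domain_start) / extent + 1;
}

// Cells fetched for a 2-D slice under the assumption that each overlapped
// tile is read in full. The domain is assumed to start at 1 in both
// dimensions, as in the question.
inline std::int64_t cells_fetched(Range rows, Range cols,
                                  std::int64_t row_extent,
                                  std::int64_t col_extent,
                                  std::int64_t domain_start = 1) {
    return tiles_overlapped(rows, domain_start, row_extent) *
           tiles_overlapped(cols, domain_start, col_extent) *
           row_extent * col_extent;
}

// Cells the query actually asked for.
inline std::int64_t cells_requested(Range rows, Range cols) {
    return (rows.hi - rows.lo + 1) * (cols.hi - cols.lo + 1);
}
```

With the tiling above, `cells_fetched(Range{1, 100}, Range{1, 100000}, 1, 5000000)` gives 500,000,000 cells against 10,000,000 requested, so the 100 row x 100K column slice from the question fetches about 50 times more cells than it needs. For a single row and 15,000 columns the ratio is about 333.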

The solution here is to pick a better compromise for your workload: tile more rows together and reduce the column tile extent. By experimenting with a subset of the data and looking at the stats output, you can get a better sense of a good tiling scheme. Unfortunately, it’s hard to tell a priori what this is because each use case is different, but we are working on ways to suggest better ones.
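Before re-ingesting data, a back-of-the-envelope comparison can narrow down the candidates. The sketch below is a hypothetical helper, not part of TileDB: `Slice`, `TileShape`, `tiles_hit`, `workload_cost` and `cheapest` are made-up names. It assumes the domain starts at 1 and that each overlapped tile is read whole. It only models cells fetched, so the real stats output remains the final word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A 2-D query with inclusive bounds, and a candidate tile shape.
struct Slice { std::int64_t row_lo, row_hi, col_lo, col_hi; };
struct TileShape { std::int64_t rows, cols; };

// Tiles of size `extent` overlapped by [lo, hi]; the first tile starts at 1.
inline std::int64_t tiles_hit(std::int64_t lo, std::int64_t hi,
                              std::int64_t extent) {
    assert(extent > 0 && lo >= 1 && hi >= lo);
    return (hi - 1) / extent - (lo - 1) / extent + 1;
}

// Total cells fetched over a set of representative queries.
inline std::int64_t workload_cost(const std::vector<Slice>& queries,
                                  TileShape t) {
    std::int64_t total = 0;
    for (const Slice& q : queries) {
        total += tiles_hit(q.row_lo, q.row_hi, t.rows) *
                 tiles_hit(q.col_lo, q.col_hi, t.cols) * t.rows * t.cols;
    }
    return total;
}

// Index of the candidate with the lowest modeled cost (first wins on ties).
inline std::size_t cheapest(const std::vector<Slice>& queries,
                            const std::vector<TileShape>& candidates) {
    assert(!candidates.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        if (workload_cost(queries, candidates[i]) <
            workload_cost(queries, candidates[best])) {
            best = i;
        }
    }
    return best;
}
```

As an example with made-up inputs, take two 100 row x 100K column queries (rows 1..100, columns 1..100,000 and rows 51..150, columns 25,001..125,000). The tile shapes {1, 5000000}, {10, 500000} and {100, 50000} then cost 1,000,000,000, 100,000,000 and 40,000,000 cells respectively, so `cheapest` picks the third. Taller tiles may also change the cost of writing one row at a time, so it is worth measuring writes too.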

-Jake

These docs might also help you better understand the effect of tiling on performance (e.g., the paragraph on Space Tiling).

Thanks Jake. It was very helpful.

Thank you. I will look into this.