Space vs. data tiles vs. MBRs diagram cell entry ordering


#1

I am looking at https://docs.tiledb.io/en/latest/tutorials/tiling-sparse.html#space-vs-data-tiles-vs-mbrs and there are six diagram showing the differences as one changes the space and cell order and the capacity.

Looking at the 4x4 -> 8x8 row major diagrams I am wondering if they are correct as shown?

Since the with 4x4 space ordering the first 16 entries are in the upper quadrant, I would have thought converting to a 8x8 space ordering those first 16 entries would have occupied the first two rows.

The original first 4 rows in the 4x4 space tiling diagram is:
X, 1, 2, X, 4, 5, 6, X
3, X, X, X, 7, 8, 9, X
X, X, X, X, 10, 11, X, X
X, X, X, X, 12, 13, 14, 15

Instead of the first 4 rows in the 8x8 space tiling diagram being:
X, 1, 2, X, 3, 4, 5, X
6, X, X, X, 7, 8, 9, X
X, X, X, X, 10, 11, X, X
X, X, X, X, 12, 13, 14, 15

I would have thought they would have been:
X, 1, 2, X, 3, X, X, X
X, X, X, X, X, X, X, X
4, 5, 6, X, 7, 8, 9, X
10, 11, X, X, 12, 13, 14, 15

X = empty cells

Don’t you still preserve the empty cell ordering when changing the tiling?

Thanks,
Terry


#2

Hi Terry,

I think there is a bit of confusion with the above diagrams. The above diagrams are showing the MBRs according to the physical sort order of the cells on disk (or in a result buffer) in TileDB’s “global order” (ex the space filling curve defined by the tile extents per dimension, tile layout and cell layout). So if you had a Sparse Matrix in COO format these show the order of the the I, J and V values (V is omitted). Logically if you ascribed zero values for the empty cells and materialized them, your I, J, and V buffers would contain the same values, just in a different sorted order.

Best,
Jake


#3

Hi Jake,

Yeah, I am confused but not sure exactly where… yet? Maybe attacking this from what confused me first might clear this up. In the first two diagrams why did values 3 and 6 change positions? Was this done because of the change in tiling or did this happen for some other reason?

Thanks,
Terry


#4

HI Terry,

3 and 6 don’t represent physical cell values, but enumerate the sorted order for the given array with the given non-empty cells under different space filling curves (the grey cells can represent any non-empty cell value(s)).

Starting with diagram #2 (simpler) - the space tile extents are 8x8 which cover the whole array domain, this is the order you would expect for a row major Sparse COO matrix. Cells are sorted in row-major order within each tile domain. Since the array domain == tile domain, the space filling curve for this array schema is traditional row major order for the whole array.

Diagram #1 has a space tile extent of 4x4. There are 4 tiles which cover the domain, the tile order is row major and the cell order is row major but the cells are sorted within each tile domain, and then across the sorted order of tiles. You can see this in https://docs.tiledb.io/en/latest/_images/tiling_sparse_physical_cell_layout.png

Best,
Jake


#5

Yeah, it just occurred to me that is what you were saying in your first response.

Thanks,
Terry


#6

Yes this fancy ordering doesn’t make too much sense from a traditional sparse matrix / hpc perspective, it comes more from the spatial database world where you want to aggregate multiple non-empty cells over a spatial extent (ex a range query of latitude / longitude). These types of range queries come up in many different applications though (ex. genomics, where you want a range of genomic positions / across a range of samples). Even if the range query is not contiguous, tuning the space tiling in this way helps to minimize the amount of results to read from disk / cache and the result set a given query has to scan over to satisfy the given constraints if the original data has some defined ordering.

-Jake