Key/values of dense arrays

Hi, I am learning how to model my application domain data in TileDB. For example, I have many images and I can easily save those as dense arrays. But then, I want to group a collection of images together by a key.

We want to access all the images with a key; in Python, I’d model this as a dictionary with string keys and numpy array values.

Looking at the docs, I see TileDB has a key-value data type. Can the value itself be a dense array? My read of the documentation suggests that the values are just dumb string/byte arrays.

I see there is also an object hierarchy, which appears to be a separate feature from the key-value data type (is that correct?). That feature is documented to allow nested structures.

So my guess is that I would use the object hierarchy, with multiple levels, to handle the top-level access and then reach into the actual dense arrays.

Hi there,

Thanks for reaching out!

You could group your images (modeled as dense arrays) in a hierarchy of groups (i.e., folders / common array URI prefixes), instead of actually using a key-value structure. Then the common URI path prefix of the images/arrays becomes your “key”. This is probably the easiest and fastest solution to this problem.
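As a rough illustration of the group-per-key layout, here is a minimal Python sketch. It assumes the TileDB-Py functions `tiledb.object_type`, `tiledb.group_create`, `tiledb.from_numpy`, `tiledb.ls` and `tiledb.open` behave as described in the TileDB-Py documentation, so please check it against your installed version. The helper names (`array_uri`, `save_images`, `load_images`) and the `root/key/name` layout are made up for this example.

```python
def array_uri(root, key, name):
    """Build the URI of one image array stored under the group named `key`."""
    return "/".join([root.rstrip("/"), key, name])


def save_images(root, key, images):
    """Write each image (a NumPy array) as a dense array under root/key.

    `images` is a dict mapping an image name to a NumPy array.
    """
    import tiledb  # imported here so array_uri works without TileDB installed

    group = "/".join([root.rstrip("/"), key])
    # Create the top-level group and the per-key group if they are missing.
    for uri in (root, group):
        if tiledb.object_type(uri) != "group":
            tiledb.group_create(uri)
    for name, image in images.items():
        tiledb.from_numpy(array_uri(root, key, name), image)


def load_images(root, key):
    """Return {name: NumPy array} for every array found under root/key."""
    import tiledb

    group = "/".join([root.rstrip("/"), key])
    found = []

    def collect(path, kind):
        # tiledb.ls calls this once per child object of the group.
        if kind == "array":
            found.append(path)

    tiledb.ls(group, collect)

    images = {}
    for path in found:
        name = path.rstrip("/").rsplit("/", 1)[-1]
        with tiledb.open(path) as arr:
            images[name] = arr[:]  # read the whole dense array
    return images
```

With this layout, `load_images(root, "cats")` plays the role of `d["cats"]` in the dictionary model from the question, except that each call lists the group and reads the arrays from storage.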

To retrieve the arrays associated with a group, all you need to do is list the corresponding URI prefix. Note though that this may incur some non-negligible cost for backends such as S3. If you foresee numerous such group listings, you may want to use a fast external key-value store (e.g., RocksDB?) for listing by common prefix.
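To make the external-index idea concrete, here is a small sketch that records which array URIs belong to which key, so a lookup does not need a listing call against the storage backend. It uses Python's built-in `sqlite3` module purely as a stand-in for a fast key-value store such as RocksDB. The table layout, function names and example URIs are all hypothetical.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the index that maps a group key to its array URIs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS arrays ("
        "key TEXT NOT NULL, uri TEXT NOT NULL, PRIMARY KEY (key, uri))"
    )
    return conn


def register(conn, key, uri):
    """Record that the array at `uri` belongs to `key` (safe to call twice)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO arrays (key, uri) VALUES (?, ?)", (key, uri)
        )


def uris_for(conn, key):
    """Return the sorted array URIs registered under `key`."""
    rows = conn.execute(
        "SELECT uri FROM arrays WHERE key = ? ORDER BY uri", (key,)
    )
    return [uri for (uri,) in rows]
```

You would call `register` whenever you write a new array and `uris_for` instead of listing the prefix. The trade-off is that the index has to be kept in sync with the arrays that actually exist.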

Thanks for your interest in TileDB. Please let us know if you have any other questions.

Best,
Stavros