AWS S3, basic read/write, "Unable to connect to endpoint with address..."

Hi,

  • I’m trying to write a sparse array to AWS S3 from Python 3.6 on Ubuntu 18.04, with the pip-installed stock version 0.4.3. (conda install -c conda-forge tiledb would install a v1.* package, with which import of tiledb fails, so I’m installing with pip inside conda instead.)
  • The array works fine when written and read locally, but rewriting it from scratch with the S3 bucket and prefix added to the path doesn’t seem to work.
  • First tried with no explicitly set configs, e.g. just the env vars set
  • also tried aws-syncing it to the bucket first, then reading it
  • tried to set vfs configs to the credentials and region
  • nevertheless I seem to get:
    => “Error message: Unable to connect to endpoint with address x.x.x.x”

My AWS user has r/w privileges to the bucket. I’m using the usual AWS env vars AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION, AWS_ACCESS_KEY_ID, which work fine with the AWS CLI.

First tried with ‘s3://my-bucket/my-prefix/my-array-name’, then ‘s3://my-bucket/my-array-name’, no change.
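For reference, the explicit vfs config attempt was roughly along these lines (values are placeholders here; parameter names as in the TileDB config docs):

    import tiledb

    config = tiledb.Config()
    config['vfs.s3.region'] = '<my-region>'
    config['vfs.s3.aws_access_key_id'] = '<my-access-key-id>'
    config['vfs.s3.aws_secret_access_key'] = '<my-secret-key>'
    ctx = tiledb.Ctx(config)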

This should be a fairly vanilla operation, shouldn’t it… any ideas? Thanks in advance!

Hello @luotzi,

The problem you’ve encountered is likely due to a curl CA certificate mismatch. If you are using conda, the simple fix is to uninstall tiledb from pip and use our Python conda package, tiledb-py.

conda install -c conda-forge tiledb-py

It is also possible to install from pip without using the binary wheels and compile from source; you would need to install cmake, g++, and build-essential.

pip install tiledb --no-binary tiledb

The specifics of the issue: when curl is compiled, it detects the location of the CA certificates and hard-codes that path into the binary. The pip binary package is built for manylinux1, which uses an older CentOS and keeps the CA certificates in a different location. We have a fix for this coming in the next TileDB release, 1.7, which does runtime detection of CA certificate locations on Linux.
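In the meantime, if you want to keep using the pip wheel, one thing you could try (assuming your installed version already exposes these config parameters) is pointing the S3 backend at the system CA bundle explicitly, e.g. on Ubuntu:

    import tiledb

    config = tiledb.Config()
    # Ubuntu keeps the system CA bundle here; adjust for your distribution.
    config['vfs.s3.ca_file'] = '/etc/ssl/certs/ca-certificates.crt'
    ctx = tiledb.Ctx(config)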

Let us know if you have problems with the tiledb-py conda package.

(Many thanks for a swift response!)
Apparently, filters have been relocated? Error below…

tiledb 1.2.2 0 conda-forge
tiledb-py 0.1.1 py36h7eb728f_1 conda-forge

AttributeError: module ‘tiledb’ has no attribute ‘LZ4Filter’

Hi @luotzi,

That is an old version of tiledb and tiledb-py, and it doesn’t have S3 support (plus apparently some other linking issue). The latest version of tiledb-py is 0.4.3.

Without seeing the rest of the environment, it is not clear why conda installed the old version, but my guess is that you have something else installed in your existing environment which prevents conda from installing the latest version due to pinning/compatibility checks.

One option for testing purposes would be to create a new conda environment and ensure that the latest tiledb-py is installed there, and works for your needs:

conda create -p /tmp/testenv
conda activate /tmp/testenv
conda install -c conda-forge tiledb-py python
conda list
<then try S3>

(if it doesn’t work, please send the full output of conda list)

Hope that helps,
Isaiah

Thanks,

some upgrades helped. Now I’m trying to optimize the write of a large array into S3, or actually a small part of it first. With default configs, the write succeeds but is quite slow. I have a total of 8 threads on 4 cores. I use Bzip2 filtering for the data; that part seems to work in parallel. It looks like S3 writes are not being parallelized at all, though. Below is a config attempt to enable all but one thread for S3 writes. Any further configuration tips?

config['vfs.max_parallel_ops'] = 7
config['sm.num_writer_threads'] = 7

It seems to have no effect; from the stats dump, I have:
.
.
.
vfs_s3_num_parts_written, 81
vfs_s3_write_num_parallelized, 0

Summary:

Hardware concurrency: 8
Reads:
Read query submits: 0
Tile cache hit ratio: 0 / 0
Fixed-length tile data copy-to-read ratio: 0 / 0 bytes
Var-length tile data copy-to-read ratio: 0 / 0 bytes
Total tile data copy-to-read ratio: 0 / 0 bytes
Read compression ratio: 518 / 450 bytes (1.2x)
Writes:
Write query submits: 20
Tiles written: 23646
Write compression ratio: 710357323 / 67085935 bytes (10.6x)

Could you please share your array schema and the write query layout?

From the number of bytes and tiles written, I suspect your tile size is very small, which may be affecting the compression time. I also suspect that bzip2 compression dominates the total time (this is the slowest compressor).

Finally, note that the parameter that controls parallelism on S3 is "vfs.s3.max_parallel_ops" (all S3 parameters have a vfs.s3.* prefix). "vfs.s3.multipart_part_size" is also relevant here.
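For example, something along these lines (the values here are only illustrative):

    import tiledb

    config = tiledb.Config()
    # Number of parallel S3 multipart operations (note the vfs.s3.* prefix).
    config['vfs.s3.max_parallel_ops'] = '7'
    # Size of each multipart part; writes smaller than this are not parallelized.
    config['vfs.s3.multipart_part_size'] = str(5 * 1024 ** 2)
    ctx = tiledb.Ctx(config)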

For the complete array, of which I’m now only testing small slices, 27 min out of 1 h 10 min seem to have been spent on bzip2, judging from compressor_bzip_compress in the dump. I tried gzip; it did not reduce the total time very much, so perhaps the bottleneck really is S3 parallelization? Setting vfs.s3.max_parallel_ops had no effect; the stats dump still shows

  vfs_s3_write_num_parallelized, 0

although in config,

    config['vfs.max_parallel_ops'] = 7
    config['vfs.s3.max_parallel_ops'] = 7
    config['sm.num_writer_threads'] = 7

Schema:

CHUNK_SIZE_DIM1 = CHUNK_SIZE_DIM2 = 1000
DIMENSIONS_DATA_TYPE = np.int64
ATTR_DATA_TYPE = np.int16
CHUNK_SIZE_DIM3 = int(1e6)
CHUNK_SIZE_DIM4 = int(3.5e6)
COORDS_COMPRESSION = [tiledb.DoubleDeltaFilter(), tiledb.Bzip2Filter()]
DATA_COMPRESSION = [tiledb.Bzip2Filter()]
CHUNK_CAPACITY = 2500

def create_array(array_path, ctx=None):

    # 4-D sparse domain
    dom = tiledb.Domain(
        tiledb.Dim(name='dim1', domain=(0, DIM1_MAX), tile=CHUNK_SIZE_DIM1, dtype=DIMENSIONS_DATA_TYPE),
        tiledb.Dim(name='dim2', domain=(0, DIM2_MAX), tile=CHUNK_SIZE_DIM2, dtype=DIMENSIONS_DATA_TYPE),
        tiledb.Dim(name='dim3', domain=(DIM3_MIN, DIM3_MAX), tile=CHUNK_SIZE_DIM3, dtype=DIMENSIONS_DATA_TYPE),
        tiledb.Dim(name='dim4', domain=(DIM4_MIN, DIM4_MAX), tile=CHUNK_SIZE_DIM4, dtype=DIMENSIONS_DATA_TYPE))

    # Two int16 attributes, bzip2-compressed; double-delta + bzip2 on coordinates
    schema = tiledb.ArraySchema(
        ctx=ctx, domain=dom, sparse=True, capacity=CHUNK_CAPACITY,
        coords_filters=tiledb.FilterList(COORDS_COMPRESSION),
        attrs=[tiledb.Attr(name='attr1', dtype=ATTR_DATA_TYPE, filters=tiledb.FilterList(DATA_COMPRESSION)),
               tiledb.Attr(name='attr2', dtype=ATTR_DATA_TYPE, filters=tiledb.FilterList(DATA_COMPRESSION))])

    tiledb.SparseArray.create(array_path, schema)

The write profile consists of hypercubes bounded by dim1 and dim2, written in patches of 1000x1000. dim3 and dim4 are piecewise-linearly behaving values, hence DoubleDelta (which seems to work great). All one million cells in each write op have attribute values.
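In code, each patch write is roughly as follows (a simplified sketch; the real coordinate and attribute arrays of course come from the data):

    import numpy as np
    import tiledb

    n = 1000 * 1000                                    # one 1000x1000 patch
    d1 = np.repeat(np.arange(1000, dtype=np.int64), 1000)
    d2 = np.tile(np.arange(1000, dtype=np.int64), 1000)
    d3 = np.arange(n, dtype=np.int64)                  # piecewise-linear in reality
    d4 = np.arange(n, dtype=np.int64)

    with tiledb.SparseArray(array_path, mode='w') as A:
        A[d1, d2, d3, d4] = {'attr1': np.zeros(n, dtype=np.int16),
                             'attr2': np.zeros(n, dtype=np.int16)}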

versions:

tiledb-py                 0.4.3            py37hb78b526_1    conda-forge
Python 3.7.4

The above stats were just for a small slice, since the whole array takes quite a long time. Here are the stats for one complete run I made with a tile size of (dim1, dim2) = (10, 10). Changing it to (1000, 1000) didn’t seem to make a difference…

.
.
.
writer_num_attr_tiles_written, 910971
writer_num_bytes_before_filtering, 27327986064
writer_num_bytes_written, 2675107932
sm_contexts_created, 1
sm_query_submit_layout_col_major, 0
sm_query_submit_layout_row_major, 0
sm_query_submit_layout_global_order, 0
sm_query_submit_layout_unordered, 792
sm_query_submit_read, 0
sm_query_submit_write, 792
tileio_read_num_bytes_read, 450
tileio_read_num_resulting_bytes, 518
tileio_write_num_bytes_written, 9560597
tileio_write_num_input_bytes, 39039583
vfs_read_total_bytes, 450
vfs_write_total_bytes, 2684819009
vfs_read_num_parallelized, 0
vfs_read_all_total_regions, 0
vfs_posix_write_num_parallelized, 0
vfs_win32_write_num_parallelized, 0
vfs_s3_num_parts_written, 3169
vfs_s3_write_num_parallelized, 0

Summary:

Hardware concurrency: 8
Reads:
Read query submits: 0
Tile cache hit ratio: 0 / 0
Fixed-length tile data copy-to-read ratio: 0 / 0 bytes
Var-length tile data copy-to-read ratio: 0 / 0 bytes
Total tile data copy-to-read ratio: 0 / 0 bytes
Read compression ratio: 518 / 450 bytes (1.2x)
Writes:
Write query submits: 792
Tiles written: 910971
Write compression ratio: 27367025647 / 2684668529 bytes (10.2x)

OK, a couple of interesting comments to make.

On write parallelism

  • TileDB parallelizes writes across attributes (i.e., tiles across attributes are written in parallel)
  • TileDB potentially parallelizes the write of sequential tiles on each attribute. That is, it takes the tiles that will be placed contiguously in the attribute file, and if their collective size is big enough, it issues a multipart upload to S3. This is based on vfs.s3.max_parallel_ops and vfs.s3.multipart_part_size, so if the tiles you are writing in a single write session are smaller than vfs.s3.multipart_part_size, the tiles are written serially.

Since you are writing 1,000,000 cells at a time, this means that for the coordinate tiles you end up writing 1,000,000 * 4 (dimensions) * 8 (bytes) = 32,000,000 bytes. But since you experience a ~10x compression ratio, TileDB really ends up writing ~3MB. The default vfs.s3.multipart_part_size is 5MB, and this is why you don’t see any parallelism here. Note though that you do get parallelism across attributes, i.e., TileDB writes the coordinate tiles and the attribute tiles on attr1 and attr2 in parallel, but you essentially utilize only 3 of your threads. Also note that the sizes of attr1 and attr2 are considerably smaller (~200KB compressed each).
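The same arithmetic, spelled out (numbers as above, default part size of 5MB):

    cells_per_write = 1_000_000
    coord_bytes = cells_per_write * 4 * 8        # 4 int64 dimensions = 32,000,000 bytes
    compressed = coord_bytes / 10                # ~10x compression -> ~3.2 MB
    part_size = 5 * 1024 ** 2                    # default vfs.s3.multipart_part_size
    parts = compressed / part_size               # < 1 part -> no parallel multipart upload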

In other words, each of your writes is too small to fully take advantage of TileDB’s parallelism, and the S3 latency must be killing you.

On compression parallelism

  • TileDB parallelizes across attributes (and coordinates)
  • TileDB parallelizes within each tile by chunking. Default chunking is 64KB.

Your tile capacity (CHUNK_CAPACITY) is 2500, so each coordinate tile is 2500 * 4 * 8 = 80,000 bytes and each attribute tile is 2500 * 2 = 5,000 bytes. This is too small to take advantage of (i) a good compression ratio from the compressors, and (ii) parallelism across tile chunks and L1 cache locality.
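Spelled out with the numbers above (64KB default chunking):

    capacity = 2500
    coord_tile_bytes = capacity * 4 * 8              # 80,000 bytes per coordinate tile
    attr_tile_bytes = capacity * 2                   # 5,000 bytes per int16 attribute tile
    chunk_bytes = 64 * 1024
    coord_chunks = coord_tile_bytes / chunk_bytes    # ~1.2 chunks -> almost no intra-tile parallelism
    attr_chunks = attr_tile_bytes / chunk_bytes      # < 1 chunk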

Solution

  1. Increase your chunk capacity to be on the order of 100,000 or 1,000,000, so that the tiles are big enough to be decomposed into chunks that will be compressed in parallel.

  2. Perform much larger writes. Make sure that each time you go to S3, you go for real, i.e., for many MBs at a time for each attribute. So perhaps buffer many more cells and dispatch big writes in TILEDB_UNORDERED mode (so that you don’t have to worry about sorting; TileDB will do it for you). A sketch combining both suggestions follows below.
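A minimal sketch of what the two suggestions could look like in tiledb-py (the capacity value, the buffering helper, and the number of buffered patches are illustrative, not tuned):

    import numpy as np
    import tiledb

    # 1. Larger tile capacity, so each tile decomposes into many 64KB chunks
    #    that can be compressed in parallel.
    schema = tiledb.ArraySchema(
        domain=dom, sparse=True, capacity=1_000_000,     # dom as in create_array above
        coords_filters=tiledb.FilterList([tiledb.DoubleDeltaFilter(), tiledb.Bzip2Filter()]),
        attrs=[tiledb.Attr(name='attr1', dtype=np.int16, filters=tiledb.FilterList([tiledb.Bzip2Filter()])),
               tiledb.Attr(name='attr2', dtype=np.int16, filters=tiledb.FilterList([tiledb.Bzip2Filter()]))])
    tiledb.SparseArray.create(array_path, schema)

    # 2. Buffer several 1000x1000 patches and issue one big write; sparse
    #    assignment in tiledb-py submits in unordered layout, so TileDB sorts.
    d1, d2, d3, d4, a1, a2 = accumulate_patches(n_patches=10)   # hypothetical buffering helper
    with tiledb.SparseArray(array_path, mode='w') as A:
        A[d1, d2, d3, d4] = {'attr1': a1, 'attr2': a2}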

On array design

Are you sure that the array should be sparse? The first two of your dimensions seem dense. How about moving DIM3 and DIM4 to attributes and creating a 2D dense array instead? If the first two dimensions are selective enough, you can filter on DIM3 and DIM4 later outside of TileDB without too much overhead. I expect this to be much more efficient.
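For illustration only, such a 2D dense variant could look roughly like this (names and constants reused from your snippet above):

    import numpy as np
    import tiledb

    dom = tiledb.Domain(
        tiledb.Dim(name='dim1', domain=(0, DIM1_MAX), tile=1000, dtype=np.int64),
        tiledb.Dim(name='dim2', domain=(0, DIM2_MAX), tile=1000, dtype=np.int64))

    schema = tiledb.ArraySchema(
        domain=dom, sparse=False,
        attrs=[tiledb.Attr(name='dim3', dtype=np.int64,
                           filters=tiledb.FilterList([tiledb.DoubleDeltaFilter(), tiledb.Bzip2Filter()])),
               tiledb.Attr(name='dim4', dtype=np.int64,
                           filters=tiledb.FilterList([tiledb.DoubleDeltaFilter(), tiledb.Bzip2Filter()])),
               tiledb.Attr(name='attr1', dtype=np.int16, filters=tiledb.FilterList([tiledb.Bzip2Filter()])),
               tiledb.Attr(name='attr2', dtype=np.int16, filters=tiledb.FilterList([tiledb.Bzip2Filter()]))])

    tiledb.DenseArray.create(array_path, schema)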

(Thanks again for the fast and to-the-point replies.)

Will study those tips in detail… In the meantime, a quick answer to the sparse vs. dense question: the main motivation of the application really is to be able to slice the read queries by dim3 and dim4, which are indeed sparse…

Regarding “bigger writes”, you can also consider writing in TILEDB_GLOBAL_ORDER (https://docs.tiledb.io/en/stable/tutorials/writing-sparse.html#writing-in-global-layout), as TileDB will do the proper buffering before writing to S3. The downside is that you need to provide the cells to TileDB sorted in the global order, which is cumbersome.

If slicing needs to be done by DIM3 and DIM4, I would suggest making those the two dimensions of a sparse array and moving DIM1 and DIM2 to attributes. At the very least, I would suggest making DIM3 and DIM4 the first two dimensions, if you think that those are indeed the selective dimensions.