The number of key/value pairs is the same, and the total amount of data is very close. The keys are 96 bytes, but the length of the values and the contents were tweaked to trigger the four different storage types: inline uncompressed, inline compressed, toast uncompressed, and toast compressed. To compare the performance of queries on the three "schemaless" types, I created a table with 1M rows, each containing a single column with 10 key/value pairs. For more details, see the Postgres documentation on page layout, or a less formal article, particularly the section "How The Rows are Stored". This causes the performance of queries that need to read these values to get substantially worse. If the tuple is still too large, it stores the variable-length fields in a separate table using a mechanism called TOAST (The Oversized-Attribute Storage Technique). When a row is larger than this threshold, Postgres first tries to compress variable-length fields. The exact number is 2032 bytes: (8196 byte page - 24 byte page header - 4×4 byte item offsets) / 4, rounded down to 8 byte alignment (see heaptoast.h). It attempts to store a minimum of 4 tuples per page, which means a tuple has a maximum size of a bit less than 2 kiB. Still: it seems like something that might be worth investigating for the Postgres maintainers. This is large enough that it may hurt some workloads. For an SSD with a 75 µs random read latency and throughput of 3000 MiB/sec (approximate 2022 numbers), it would increase from about 77 µs to 95 µs, which is a 23% increase. If we assume a 200 MiB/sec transfer rate and a 5 ms seek time (approximately a 2022 data center hard drive), going from 8 kiB pages to 64 kiB pages would mean random reads would go from about 5.04 ms to 5.4 ms, or an increase of about 7%, which means this would likely be a performance win for any application with large rows. These benchmarks show that larger pages can be better for large rows, since it would increase the amount of storage before falling off this cliff. A rough rule of thumb is that smaller pages are better for workloads that read and write small values, but larger pages are likely better for queries that scan ranges. I suspect the 8 kiB page may be the wrong default these days. However, Postgres's row length limit is pretty low. This is a general database problem and not a Postgres problem: MySQL and others have their own performance cliffs for large values. This should reduce the performance penalty for compressed values. I didn't test it, but others have found it to use a bit more space but be signficantly faster ( 1, 2). If you are using Postgres 14 or later, you should use LZ4 compression. In cases where you need excellent query performance, you may want to consider trying to split large JSON values across multiple rows. Compressed values makes queries take about 2× more time, and queries for values stored in external TOAST tables take about 5× more time. Never use JSON because the performance is terrible. Queries on HSTORE values are slightly faster (~10-20%), so if performance is critical and string key/value pairs are sufficient, it is worth considering. Accessing JSONB values is about 2× slower than accessing a BYTEA column. It stores parsed JSON in a binary format, so queries are efficient. Most applications should use JSONB for schemaless data. My conclusion is that you should expect a 2-10× slower queries once a row gets larger than Postgres's 2 kiB limit. This article contains some quick-and-dirty benchmark results to explore how Postgres's performance changes for the "schemaless" data types when they become large. The same performance cliff applies to any variable-length types, like TEXT and BYTEA. Unfortunately, the performance of queries of all three gets substantially slower (2-10×) for values larger than about 2 kiB, due to how Postgres stores long variable-length data ( TOAST). Postgres supports three types for "schemaless" data: JSON (added in 9.2), JSONB (added in 9.4), and HSTORE (added in 8.2 as an extension).
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |