Column based stores are optimized for reading, but Vertica boasts quick write times and no locking issues. How does it manage this? Vertica uses its memory as a write optimized store before commiting data to disk in the main read optimized store.
Data loading
There are two commands for inserting data into the database. The first any SQL user should be familiar with; INSERT
. This works pretty much as you would expect. The second is COPY
. COPY
takes delimited plain text data from a file or STDIN
and bulk loads it into the database.
This is where the nuance of the WOS and ROS come in. COPY can run in three modes. The default mode (AUTO
) fills up the WOS before writing to disk, and is intended for small to medium sized updates. COPY DIRECT
inserts straight into the ROS and should be used for large sets of data. A third mode, COPY TRICKLE
is available for small amounts of data. It is more efficient than the default, but will fail if your load uses up all the RAM.
Tuple Mover
We may have an optimised write store, but the data does need to get to disk eventually. This process is handled by the Tuple Mover. It batch compresses and writes items in the WOS to disk. This batching reduces the overheads of disk access and spreads it across the entire batch.
Data on disk is stored as a collection of read optimized stores. Each transaction will write to a new store as a way of preventing the overheads of merging and fragmentation. This also has the advantage of meaning that all accessible data is read only, avoiding the need for locks. Over time, however, these do need to be merged for performance. The Tuple Mover handles this process too.
COPY Gotchas
COPY
is highly optimized for speed. This means that corners are cut more than with INSERT
. Data is not transformed by default, you will have to explicitly set your conversion (e.g. for dates) when writing you COPY
statement.
More worryingly, referential integrity is not checked. It is check at query time rather than load time. This means that once you’ve spent the time to load your data you may have the server turn around and complain then you try to access it. As a result special care is needed when using COPY
.