Vertica Error 4115 and new tables

I’m in the process of creating data tables to be populated by our load system. This will be updated periodically, so using a MERGE statement makes sense. After creating the table and attempting to merge I receive the following error:

ERROR 4115:  No super projection found for table table_name

But isn’t a super projection created by default when a table is created? Checking the documentation confirms that yes a super projection should be created. So why am I getting this error?

This comes down to one of Vertica’s counter-intuitive optimizations. For this table I didn’t describe the super projection so I am relying on Vertica to create one for me (I may revisit the projections once I’ve made my proof of concept). So the information can be derived and the projection will be created if I INSERT into the table.

In fact, doing that resolves the issue. The call to INSERT creates the projection and the problem resolves itself. It would seem that Vertica does not think to create the projection at MERGE time despite the fact the INSERTs and SELECTs will work independently of one another.

EDIT: Yes, this does have the desired effect, but the preferred work around is to add an encoding type to any column. Even if you explicitly set it to auto this does the trick. This is slightly less convoluted, but I don’t see why this step should be needed at all.

Black Light Burns live 14th February 2013

At The Underworld

Black Light Burns is creative work of on and off Limp Bizkit guitarist Wes Borland. He was described it as his main project and free of the need to be commercial or fit someone else’s vision he has had the freedom to be far more experimental. We’re still talking industrial rock, but it definitely has a flavour of its’ own.

I like the Underworld as a venue. It’s small and dingy with that “everything painted black” look. I wasn’t sure the stage was going to be big enough, but somehow they managed to fit a drum kit in the back.

The main support was Jayce Lewis who I’ve seen before supporting Combichrist. I don’t know whether it was the smaller stage, the new material or just having played more, but I thought they were a much better band. Their industrial rock sound was cohesive. Jayce playing additional percussion didn’t feel crowbarred in and they even made a show piece of it. They have an album out later in the year.

Black Light Burns made an unpretentious entrance and played a solid set from start to finish. The fact they were on tour barely registered. It felt almost as if we were in a small bar in their home town and they were bantering with the regulars. There was a strong sense of camaraderie and we were part of it. Jayce Lewis and his band felt it too and thanked the band for their involvement with the tour.

The time passed too quickly with Black Light Burns playing to curfew. Not even pausing for the ritual of an encore.

Storage and Dataloading in Vertica

Column based stores are optimized for reading, but Vertica boasts quick write times and no locking issues. How does it manage this? Vertica uses its memory as a write optimized store before commiting data to disk in the main read optimized store.

Vertica's ROS and WOS architecture

Data loading

There are two commands for inserting data into the database. The first any SQL user should be familiar with; INSERT. This works pretty much as you would expect. The second is COPY. COPY takes delimited plain text data from a file or STDIN and bulk loads it into the database.

This is where the nuance of the WOS and ROS come in. COPY can run in three modes. The default mode (AUTO) fills up the WOS before writing to disk, and is intended for small to medium sized updates. COPY DIRECT inserts straight into the ROS and should be used for large sets of data. A third mode, COPY TRICKLE is available for small amounts of data. It is more efficient than the default, but will fail if your load uses up all the RAM.

Tuple Mover

We may have an optimised write store, but the data does need to get to disk eventually. This process is handled by the Tuple Mover. It batch compresses and writes items in the WOS to disk. This batching reduces the overheads of disk access and spreads it across the entire batch.

Data on disk is stored as a collection of read optimized stores. Each transaction will write to a new store as a way of preventing the overheads of merging and fragmentation. This also has the advantage of meaning that all accessible data is read only, avoiding the need for locks. Over time, however, these do need to be merged for performance. The Tuple Mover handles this process too.

Vertica's Tuple Mover

COPY Gotchas

COPY is highly optimized for speed. This means that corners are cut more than with INSERT. Data is not transformed by default, you will have to explicitly set your conversion (e.g. for dates) when writing you COPY statement.

More worryingly, referential integrity is not checked. It is check at query time rather than load time. This means that once you’ve spent the time to load your data you may have the server turn around and complain then you try to access it. As a result special care is needed when using COPY.