Disk trends will drive the need for column stores
In a recent post, Sam Madden explained the benefits that column-store architecture offers decision support applications. Summarizing briefly, columns stores have two significant advantages over row stores for such applications: First, only the columns needed by a query need to be read from disk; second, column stores, by their very nature, can be compressed much more effectively than row stores.
In this post we will discuss how trends in disk drive technology will eventually compel vendors that take decision support queries seriously to adopt column-store architecture. In a follow-on post, I will discuss how trends in CPU technology also favor the use of column-store architectures.
Disk capacity is up ...
In 1980, a "big" disk drive had a capacity of 80 MB (e.g., CDC model 9760) and a transfer rate of 1.2 MB/second. Today, 500 GB drives are common and TB drives are available. Thus, over a period of a little more than 25 years, disk capacities have increased by about a factor of 10,000. During this same period of capacity explosion, the cost per byte of storage has dropped dramatically. Those 1980-era 80 MB drives cost thousands of dollars. Today, a half-terabyte drive costs about $100. And this increase in capacity at lower costs is important for companies. While 10 years ago we were all in awe of Wal-Mart's 10 TB database, databases of this size have become almost commonplace.
... but ETB is down
Unfortunately, while capacities and costs have enabled users to store their databases on fewer drives, disk bandwidth issues are working against database systems that rely on row-store architecture. While we have witnessed capacity increases of four orders of magnitude over the last 25 years, disk transfer rates over this same period have increased less than two orders of magnitude. An increase of 1.2 MB/second to about 60 MB/second is typical, depending on which cylinder you are reading from and how big the blocks are that a database system uses (bigger is definitely better). The story on seek times is even worse with only about a factor of four improvement.
The problem is what I call decreasing ETB, the effective transfer bandwidth per byte of capacity. Despite capacities and transfer rates increasing by factors or 10,000 and 100 respectively, typical drive ETB has actually decreased by a factor of 100. As our good friend and late colleague Jim Gray was fond of saying, "Disks have become tapes."
Consider, for example, a 10 TB database. Ten years ago, this database would have occupied two thousand 5 GB drives - a common size at the time. With a 3 MB/second transfer rate, the aggregate bandwidth of these 2,000 drives would have been 6 GB/second, enabling the entire database to be scanned in about 30 minutes. Today, only about 20 higher-capacity drives would be needed to hold this same database. Those 20 drives would have an aggregate bandwidth of 1.2 GB/second, increasing the time required to scan the entire database to 150 minutes - an increase of two hours.
Unless Flash becomes the standard technology for constructing mass storage devices these trends are going to continue. Every two years, drive capacities will double and transfer rates (which are a function of rotation rate and density) will only increase marginally. Since very few organizations are doubling in size every two years, companies will need fewer and fewer drives to store their entire database. Already we see transaction processing systems running on farms of mostly empty disk drives to obtain enough seeks/second to satisfy their transaction processing rates.
Reducing the transfer bottleneck
Transfer-rate trends are one reason why users are going to find the use of column-store database systems like Vertica's so critical. How do column stores beat row stores in terms of dealing with the transfer bottleneck?
In summary, although both drive capacity and transfer rates have improved dramatically over the last twenty-five years, since they have not increased at the same rate we have actually witnessed a reduction in the transfer capacity per byte of storage by a factor of 100. Consequently, transfer bandwidth has become a bottleneck. While very large block sizes help, the only truly effective solution is to adopt a storage architecture based on a column store. In this way, only the columns needed for a query are read from disk and no precious disk bandwidth is wasted transferring columns that are not needed.
In this post we will discuss how trends in disk drive technology will eventually compel vendors that take decision support queries seriously to adopt column-store architecture. In a follow-on post, I will discuss how trends in CPU technology also favor the use of column-store architectures.
Disk capacity is up ...
In 1980, a "big" disk drive had a capacity of 80 MB (e.g., CDC model 9760) and a transfer rate of 1.2 MB/second. Today, 500 GB drives are common and TB drives are available. Thus, over a period of a little more than 25 years, disk capacities have increased by about a factor of 10,000. During this same period of capacity explosion, the cost per byte of storage has dropped dramatically. Those 1980-era 80 MB drives cost thousands of dollars. Today, a half-terabyte drive costs about $100. And this increase in capacity at lower costs is important for companies. While 10 years ago we were all in awe of Wal-Mart's 10 TB database, databases of this size have become almost commonplace.
... but ETB is down
Unfortunately, while capacities and costs have enabled users to store their databases on fewer drives, disk bandwidth issues are working against database systems that rely on row-store architecture. While we have witnessed capacity increases of four orders of magnitude over the last 25 years, disk transfer rates over this same period have increased less than two orders of magnitude. An increase of 1.2 MB/second to about 60 MB/second is typical, depending on which cylinder you are reading from and how big the blocks are that a database system uses (bigger is definitely better). The story on seek times is even worse with only about a factor of four improvement.
The problem is what I call decreasing ETB, the effective transfer bandwidth per byte of capacity. Despite capacities and transfer rates increasing by factors or 10,000 and 100 respectively, typical drive ETB has actually decreased by a factor of 100. As our good friend and late colleague Jim Gray was fond of saying, "Disks have become tapes."
Consider, for example, a 10 TB database. Ten years ago, this database would have occupied two thousand 5 GB drives - a common size at the time. With a 3 MB/second transfer rate, the aggregate bandwidth of these 2,000 drives would have been 6 GB/second, enabling the entire database to be scanned in about 30 minutes. Today, only about 20 higher-capacity drives would be needed to hold this same database. Those 20 drives would have an aggregate bandwidth of 1.2 GB/second, increasing the time required to scan the entire database to 150 minutes - an increase of two hours.
Unless Flash becomes the standard technology for constructing mass storage devices these trends are going to continue. Every two years, drive capacities will double and transfer rates (which are a function of rotation rate and density) will only increase marginally. Since very few organizations are doubling in size every two years, companies will need fewer and fewer drives to store their entire database. Already we see transaction processing systems running on farms of mostly empty disk drives to obtain enough seeks/second to satisfy their transaction processing rates.
Reducing the transfer bottleneck
Transfer-rate trends are one reason why users are going to find the use of column-store database systems like Vertica's so critical. How do column stores beat row stores in terms of dealing with the transfer bottleneck?
- By reading only the columns that a query needs, a column store consumes the least possible disk bandwidth. If a typical OLAP query references 5 out of 200 columns, only having to read the 5 columns needed for a query, in effect, means column store users will have a disk subsystem that is operating 40 times faster than if they were using a row store. The judicious use of materialized views can help but only if there exists a minimal materialized view for essentially every query executed.
- Compression is, in effect, a huge I/O accelerator. Being able to compress each column individually frequently yields factors of 10s or more in reduction in size. The result is like getting a disk that is 10 times faster for free. While row stores can indeed be compressed, as Sam explained in his post, it is impossible to achieve the same level of compression that a column store can.
In summary, although both drive capacity and transfer rates have improved dramatically over the last twenty-five years, since they have not increased at the same rate we have actually witnessed a reduction in the transfer capacity per byte of storage by a factor of 100. Consequently, transfer bandwidth has become a bottleneck. While very large block sizes help, the only truly effective solution is to adopt a storage architecture based on a column store. In this way, only the columns needed for a query are read from disk and no precious disk bandwidth is wasted transferring columns that are not needed.
Categories
Database architecture3 Comments
Leave a comment

In reading about larger drives, I have noticed some concern expressed over increasing failure rates.
I think we may be approaching a point where people will buy several lower capacity drives that tend to fail less often, provide better throughput and recoverability, rather than fewer larger drives to manage their data. For example, RAID setups are becoming more common in the mainstream (there are lots of consumer RAID devices available) and very affordable, which will facilitate this.
If this is true, the benefits of column based storage are still there, but they could become much less relevant.
Concerning "ETB" I was wondering to what extent the column based approach may exacerbate the problem.
It would seem reasonable to me to assume that in a column based database the values associated with a given row are unlikely to be stored in close proximity on the physical disk (since columns are stored in separate files). If this is correct then referencing n different columns would incur at lease some multiple of n seeks. Given that a seek is one of the worse performing disk actions there must be some crossover point at which the I/O incurred by reading data not directly referenced by a query is less time consuming than repositioning the disk head(s). An obvious answer to this would be to say that caching would reduce actual disk activity - but then wouldn't most of the arguments for a column based approach as apposed to a row based evaporate when the data is heavily cached?
On a slightly different topic, I'd be interested to see comment on techniques such as index covering as used by some row based databases.
This is an excellent observation. We sometimes refer to this problem as "gluing" the columns required by a query back together. If this is done naively the resulting performance can indeed be much worse than a row store. In particular, assume the project list of a query specifies five attributes. If one simply reads one page of each column at a time every page access will indeed incur a seek and the resulting performance of the scan can be very poor. The solution is relatively straightforward and entails reading many blocks of each column before seeking to the next column. For example, one might sequentially read 50 blocks of the first column into the buffer pool, then 50 blocks of the second column, etc. In this way, the cost of seeking between columns can be amortized over many block reads. The 2006 VLDB paper by Harizopoulos, et. al. carefully analyzes the problem you observed and the solution I mentioned above.