Recently in Database architecture Category
The purpose of this post is to explore alternative data warehouse architectures and understand the advantages MPP column store databases offer. It also debunks some of the myths of MPP data warehousing and suggests some evaluation criteria when comparing different MPP implementations.
Continue reading "The Truth About MPP & Data Warehousing" »
The information technology world is buzzing about cloud computing, but what about cloud security? The good news is that cloud security is not that different from enterprise security -- you can use many of the same tools you use to secure your external Web servers,
Continue reading "Securing Your Data in the Cloud" »
Could the tidal wave of virtualization and cloud computing sweeping today's data centers trigger in the analytic database arena a repeat of the historic demises suffered by 14" and 8" disc manufacturers decades earlier? I think so. I predict that over the next few years, proprietary hardware DBMS products will look like albatrosses in a sea of uniformity in both public and private data centers.
Continue reading "The Innovator's Dilemma for Analytic Database Systems" »
In this post, we debunk the myth that the performance advantages of a column store can be gained by replacing the row-oriented storage layer of a DBMS with a column-oriented storage layer without also rewriting the row-oriented query execution system that plans and processes queries on top of the storage system. Read how we did this by running experiments with the C-Store column-store database to discover how closely the performance of column stores is tied to these storage layer optimizations.
Continue reading "Debunking Yet Another Myth: Column-Stores As A Storage-Layer Only Optimization" »
With database volumes growing exponentially and CPUs far outperforming disks, compression has become a hot topic among database management solutions. Just don't believe everything you hear about compression. Real-world compression rates will vary dramatically depending on the data in your warehouse and how you load and query it.
Continue reading "Field Fodder -- Compression in Real World Datasets" »
We debunk another commonly proposed approach for making a row-store perform like a column-store: vertically partitioning a row-store. Vertical partitioning is a trick that some DBAs use to improve performance on read-mostly data warehouse workloads. The idea is to store an n-column table in n new tables. Each of these new tables contains two columns: a tuple ID column and a data value column from the original table.
Continue reading "Debunking Another Myth: Column-Stores vs. Vertical Partitioning" »
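The vertical partitioning scheme described in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `vertically_partition` and `reconstruct`, the sample table, and the use of a row's position as its tuple ID are all hypothetical and are not taken from the post or from any particular DBMS.

```python
def vertically_partition(rows, columns):
    """Split an n-column table into n two-column tables of (tuple_id, value)."""
    tables = {name: [] for name in columns}
    for tuple_id, row in enumerate(rows):  # assumption: row position is the tuple ID
        for name, value in zip(columns, row):
            tables[name].append((tuple_id, value))
    return tables


def reconstruct(tables, wanted):
    """Rejoin only the requested columns by matching on tuple_id."""
    joined = {}
    for name in wanted:
        for tuple_id, value in tables[name]:
            joined.setdefault(tuple_id, {})[name] = value
    return [tuple(joined[t][name] for name in wanted) for t in sorted(joined)]


# Hypothetical three-column table.
columns = ["id", "region", "amount"]
rows = [(1, "east", 10.0), (2, "west", 25.5), (3, "east", 7.25)]

tables = vertically_partition(rows, columns)
two_columns = reconstruct(tables, ["region", "amount"])
```

Note that every partitioned table repeats the tuple ID, and answering a query over several columns requires a join on that ID.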
Consider a traditional, row-oriented database. Indexes are known to improve performance in database systems. They can greatly reduce I/O costs by avoiding the need to perform table scans since they directly contain the data you need to answer a query or contain pointers to such data. If you have a query that accesses only two out of thirty columns from a large table, and you have an index on these two columns, then you can use the indexes to avoid scanning all of the data in a table.
Continue reading "Debunking a Myth: Column-Stores vs. Indexes" »
Both column-stores and data cubes are designed to provide high performance on analytical database workloads (often referred to as Online Analytical Processing, or OLAP). These workloads are characterized by queries that select a subset of tuples, and then aggregate and group along one or more dimensions. In this post, we study how column-stores and data cubes would evaluate a query on a sample database.
Continue reading "Understanding the Difference Between Column-Stores and OLAP Data Cubes" »
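The query shape described in the excerpt above (select a subset of tuples, then group and aggregate along a dimension) can be illustrated with a small Python sketch over column-oriented data. The `sales` table, its column names, and the helper `filter_group_sum` are hypothetical; this is not how any particular column-store or data cube evaluates the query, only a picture of the workload itself.

```python
from collections import defaultdict


def filter_group_sum(table, filter_col, predicate, group_col, measure_col):
    """Sum measure_col per group_col value over rows where predicate holds.

    `table` maps column names to equal-length lists, so only the three
    columns named in the query are ever read.
    """
    totals = defaultdict(float)
    groups = table[group_col]
    measures = table[measure_col]
    for i, value in enumerate(table[filter_col]):
        if predicate(value):
            totals[groups[i]] += measures[i]
    return dict(totals)


# Hypothetical sample data, stored one list per column.
sales = {
    "year": [2007, 2008, 2008, 2008],
    "region": ["east", "east", "west", "east"],
    "amount": [5.0, 10.0, 20.0, 2.5],
}

# Roughly: SELECT region, SUM(amount) FROM sales WHERE year = 2008 GROUP BY region
result = filter_group_sum(sales, "year", lambda y: y == 2008, "region", "amount")
```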
There will soon be a myriad of announcements of DBMS offerings in the cloud. Many of these will NOT be marriages made in heaven. However, the most innovative new DBMS software married to new cloud computing services is here today and truly takes advantage of the cloud architecture in order to change the economics and the responsiveness of business analytics.
Continue reading "DBMS innovations that will make analytics in the cloud a reality" »
In this post, Mike Stonebraker tackles two issues with regard to row- versus column-store databases. In the first, he looks at performance challenges given the demands of users. In the second, he discusses the availability of third-party connectivity as well as automatic database design tools.
Continue reading "Supporting Column Store Performance Claims" »