I'm a big fan of MySQL, but I ran into an interesting performance issue this weekend regarding views. MySQL added view support in 5.0, and I've used it to great effect, but it's not all roses.
Edit (2009-04-22): I used the term "materialized view" below in a misleading way. True materialized views are stored in permanent concrete tables. Below I refer to views that are created into a temp table on demand, used for the query in question, and then deleted.
Within MySQL, there are several ways to categorize views, but the one I care about is materialized vs. rewritten/merged. Materialized views are views that, as you might guess, are materialized when they're needed. In simpler terms, the view definition is used to create a temporary table and then that temporary table is used for the actual query. Rewritten or merged views are combined with the actual query by the SQL engine and executed as a single query.
The performance problem I ran across was with the former view type. It seems that the internals of MySQL manage materializing in a very inefficient way. I can't provide any sort of hard data to support that, but what I experienced was definitely points to the conclusion.
I had a view that had some fairly intensive stuff in it (both structurally and computationally). I knew it was materialized and was fine with a little bit of a slowdown, but I was seeing times in the 15 second range for certain queries that used it on my personal server (which, admittedly, is a only 3GHz Celeron). Manually materializing the view (i.e. running the view statement to create my own table) and then running the queries in question took a few hundred milliseconds tops, counting both the generation of the temp table and the actual query time (with most of it being the former).
The only explanation I can come up with was that the materializing of the view was happening repeatedly. Very repeatedly. Recasting the "view" as a normal table that I repopulate as needed has eliminated all the performance issues, and the queries are now typically running in tens of milliseconds.
I should mention that I've never had any performance issues with rewritten/merged views. MySQL seems to do a fantastic job efficiency-wise with them, and I continue to use them.
Interestingly, I had a very similar problem with Oracle a few months ago. The main difference was that I wasn't using views, but rather subqueries in the FROM clause (i.e., inline views). In that app, creating separate views was both impossible (we didn't own the database), and unnecessary (only one query used the view), and we were able to get around it with "factored subqueries" (the WITH clause).
Isn't this form of "materializing" data from different sources into a new table a form of denormalization? How much benefit is there when you weigh the cost of building these tables and storing the data in duplicate? Compared to just pulled out fresh data with a normal rewritten/merged view? I thought a big part of using normalizing data was that it was more efficient, with less data there is less to store and index, less to scan, less to update, and more can fit in fast RAM?
Thanks!
Jordan,
Yeah, it's absolutely denormalization.
When MySQL does it, it's an implementation detail of the server; if they changed to something else the DB client wouldn't care. As such, it's not a denormalization you'd care about.
When I do it manually, it's a much bigger deal. I'd love to have just changed my view to be of the "rewritten/merged" style, but with the current MySQL implementation, that's impossible. With the database not able to directly address my problem, it's up to me to choose between performance and maintenance overhead.
For the cost of having to rebuild my "view" tables when data is updated and the potential to have slightly outdated statistics in the "views", I can drop a significant number of queries from the 10-15 second range down to 40-100 milliseconds. Often there are multiple instance per page request as well. So the tradeoff seems a no-brainer to me.
I should also have mentioned that this is a CPU-bound limitation. On my laptop (slower disks, faster CPU, equivalent free RAM), the queries with the views run around a second a piece, compared with 10-15 seconds on the server.
What part of the operation makes it CPU bound? Is the view aggregating data, or running functions to produce the output? Is it possible to denormalize just some fields manually into a new table and join, instead of duplicating everything? Or does this reduce the performance gains?
Also just out of curiosity how much data is this working with, thousands, tens of thousands of rows, more? Does this size matter, like the larger it is, the more you can save?
P.S. I wish your blog had a comment email subscription option.
Jordan,
Sadly, the row count is tiny: less than a thousand rows.
I didn't do exceptionally granular testing, but the main offenders were aggregating subqueries. It would be possible to duplicate the subqueries in the "view" throughout the queries that leverage the view to get most of the performance benefit, but the maintenance downside was significat. That's actually the path I went down first, because I didn't want to jack up my database, but it wasn't worth it. The app is very SELECT-centric with relatively few updates, so manually managing the "view" tables isn't actually much of a headache.
There is a feed link at the bottom for subscribing to comments. I'd also naively assumed that WordPress would email thread participants when new comments were posted (like MovableType did), but I just looked it up, and that doesn't seem to be the case. I found a plugin that will provide that functionality; I'll be installing it in just a sec.
I had a similar problem with views on our web reporting platform. The culprit that time was that the view was not using the index from the underlying table. I forced a primary key on the ID, and the query times went down 1000%. Just something to look into…
Mark,
I'm pretty sure that the issue in my case was that the view engine was rematerializing the view over and over again within the same query. If it had just materialized it once and reused it, it would have been fine, even without an index. But still a good thought to keep in mind.
Thanks for your article! I seem to have hit the same problem with a test database on MySQL 5.1. In particular I wanted to mimic the behaviour of queries in Access (from which I am wishing to migrate the data). In Access you can 'stack' queries on top of one another and supply parameter values to the top level query which are inherited and used by lower level queries.
MySQL does not allow parameter values (aka user variables) with views, but there is a workaround using a stored function (which can access and then return the user variable). But the performance is terrible.
Previously I have used sub-queries (in MySQL 4.1), though not with user variables, and they have always been impressively fast (but I had to write my own VBA code to nest sub-query definitions into higher-level queries). Views should be neater but with their inability to access user variables (except inefficiently as it seems through a stored function), and the performance hit you describe, I guess I must stick to coded sub-queries for now.
"You keep using that word. I do not think it means what you think it means."
The error is the statement "Materialized views are views that, as you might guess, are materialized when they're needed."
From http://en.wikipedia.org/wiki/Materialized_view : "A materialized view takes a different approach in which the query result is cached as a concrete table that may be updated from the original base tables from time to time. This enables much more efficient access, at the cost of some data being potentially out-of-date. It is most useful in data warehousing scenarios, where frequent queries of the actual base tables can be extremely expensive."
From http://dev.mysql.com/doc/refman/5.1/en/faqs-views.html :
"23.6.5: Does MySQL 5.1 have materialized views?
No. "
(Same answer for MySQL 5.0, 5.1, and 6.0)
Vaughn,
You are correct; I chose my words poorly. MySQL does not have true materialized views. They have statement-rewriting views and temporary-table views. I was referring to the latter, where the view IS materialized into a concrete table, but then thrown away at the conclusion of the statement. So while it is materialized, it's not persistent, which is the whole point of true materialized views. True materialized views have to be manually created/managed by application code (which is pretty straightforward).
Wow. Way to resurrect a REALLY old post.
Hi all,
As a Oracle architect, ex developper, the MV (Materialized view) can be easily simulated with the SQL used to create a dynamic view (standard view) and replace the CREATE VIEW by CREATE TABLES MV_xxxx as select….. I am convinced argument exist to go against this option. () drop table time, refresh (recreate the MV) frequency etc… What so ever it is a KISS solution, and idiot-proof. B rgds – thierry