Motivation
Most newly written modules have scaling problems. In part it's because they are typically developed without a realistic data set in the database or have been ported from oracle or postgres without the new queries being properly tuned. I took one such module forums which was originally written by Openforce for MIT and looked at how well it performed as the number of users and posts increased.
When I do work like this I typically go through the following steps:
- Set up a testing server.
- Write a script to load the data via the existing API.
- Tune the load process.
- Work through the existing pages checking for slow queries using developer support.
- Document any changes I make as I go.
The documentation step is particularly important since it is easy to create an index in the database, have it be important for performance, but neglect to put it back into the create scripts.
First Steps
I installed OpenACS 4.6 on a server called oatest and
created about 10,000 users using this tcl script:
# Generate a bunch of users to test scalability of UI etc.
ReturnHeaders text/plain
ns_write "Test User creation:\n"
set last_name "test"
for {set i 1000} {$i < 10001} {incr i} {
set first_names "user$i"
set email "test-$i@xorch.net"
set password "passfor$i"
set user_id [db_nextval acs_object_id_seq]
set new_user_id [ad_user_new $email $first_names $last_name $password "" "" "" "t" "approved" $user_id]
ns_write "created ad_user_new $email $first_names $last_name $password $user_id\n"
}
and ran "vacuumdb -v -z -f oatest" to make sure the
query optimizer had the data for properly optimizing queries.
Add Realistic Data
I have a couple months of posts to the usenet comp.* newsgroups,
collected using slrnpull (part of the slrn newsreader package)
which I use as a test dataset; in total 800,000 posts, 3.8GB of data,
some of which is not even porn spam. The first step is to
insert some records (about 1,000) into forums with the script I
will be testing with, loader.tcl (and some data if you would like
to try it out: usenet-tcl.tar.gz)
and again do "vacuumdb -v -z -f oatest".
As an aside, lest you think that all this vacuuming is silly, here is the insert rate without having run vacuumdb...
oatest=# select now(), count(*) from forums_messages;
now | count
-------------------------------+-------
2002-12-02 08:38:38.335564+00 | 865
oatest=# select now(), count(*) from forums_messages;
now | count
-------------------------------+-------
2002-12-02 08:56:12.630173+00 | 871
or a stuning 275 seconds per insert. Stellar. After running vacuumdb it was more like 2 seconds per insert (including a site node per newsgroup and users as needed).
Faster Bulk Inserts
This rate is not too horrifying since it would only take about 3 days or so to insert 100,000 messages but even so it would be nice if it were a little faster. I used developer support to see what is actually taking the time on the inserts (example). Setting up the forum site node (via site_node__new and apm_package__new) takes about 2 seconds/forum (ok, we don't do it that much), creating the forum takes 43ms (great), adding users takes about 600ms/user (not great), and the adding a message takes about 800ms (but more than half of that seems to be due to notifications checking and from calling forum::message::get twice which should be both much faster and is probably unneccessary).
Also, there are some really good candidates for caching in the package_instantiate_object code (it seems crazy to hit the database 4 or 5 times to generate an object instantiation proc); it is fast but since the data just does not change it should be cached.
Looking at the query for forum::message::get.select_message:
select forums_messages.*,
person__name(forums_messages.user_id) as user_name,
party__email(forums_messages.user_id) as user_email,
forums_forum__name(forums_messages.forum_id) as forum_name,
forums_message__root_message_id(forums_messages.message_id) as root_message_id,
(select fm2.subject
from forums_messages fm2
where fm2.message_id = forums_message__root_message_id(forums_messages.message_id)) as root_subject,
to_char(forums_messages.posting_date, 'Mon DD YYYY HH24:MI:SS') as posting_date
from forums_messages
where forums_messages.message_id= :message_id
there is an obvious thing to look at and that is the nested select for getting the root subject. A common consequence of using a function in the where clause is to force a table scan and that is exactly what is going on here.
oatest=# explain select fm2.subject from forums_messages fm2
where fm2.message_id = forums_message__root_message_id(29437) ;
NOTICE: QUERY PLAN:
Seq Scan on forums_messages fm2 (cost=0.00..122.09 rows=1 width=37)
Recreating the function with the attribute
"with(iscachable)" results in a much more satifying
query plan:
oatest-# explain select fm2.subject from forums_messages fm2
where fm2.message_id = forums_message__root_message_id(29437) ;
NOTICE: QUERY PLAN:
Index Scan using forums_messages_pk on forums_messages fm2 (cost=0.00..5.76 rows=1 width=37)
With that change the forum::message::get.select_message takes about 5ms rather than the 165ms it was taking. It still probably does not need to be called twice on a message insert but there are bigger fish to fry. Also, keep in mind that declaring a function iscachable can be wrong and dangerous. In this case, iscachable is true by virtue of there being no functionality to split or reparent threads but if such functionality is added to the forums, the stated semantics required for iscachable would no longer hold. From the PostgreSQL 7.2 Documentation:
Iscachableindicates that the function always returns the same result when given the same argument values (i.e., it does not do database lookups or otherwise use information not directly present in its parameter list). The optimizer usesiscachableto know whether it is safe to pre-evaluate a call of the function.
Note that this is from 7.2 and the 7.3 keywords are now
different (immutable, stable, and
volatile). Iscachable is deprecated
but treated as equivalent to immutable in 7.3. (7.4dev docs)
With the iscachable change and with a change to plpgsql_utility::generate_attribute_parameter_call to cache the parameter lookups, the load of 16 messages take 5.5 seconds with no obvious easy ways to shave off much time (developer support info). Unfortunately the real limit on inserts turns out to be new user creation, the optimization of which is left as an excercise for the reader.
At this point I turned off developer support, which is important. Leaving it on for a page which does thousands of DB requests will cause your server to grow very large until it just falls over since developer support will attempt to cache the text of all executed queries in memory.
I ran an insert overnight, which worked great until the error log file hit 2GB (which I discovered hangs the server on linux). It inserted 55,000 messages and created 10,000 users in 12 hours and did not slow down too much as more data was added (going from 650ms per insert to about 1000ms) which is a good thing.
After this mass of inserts I ran vacuumdb again, as much to see that things were ok as to try and improve performance, and ran into what seems to be a common error in vacuum (and which might be fixed in 7.2.3 or 7.3):
NOTICE: Analyzing forums_forums
NOTICE: --Relation forums_messages--
NOTICE: Pages 24227: Changed 17534, reaped 24093, Empty 0, New 0; Tup 72696: Vac 138195, Keep/VTL 0/0,
UnUsed 1446, MinLen 112, MaxLen 2032; Re-using: Free/Avail. Space 126919692/126914840;
EndEmpty/Avail. Pages 0/24126.
CPU 1.27s/0.28u sec elapsed 23.98 sec.
NOTICE: Index forums_messages_pk: Pages 603; Tuples 72696: Deleted 138195.
CPU 0.16s/0.46u sec elapsed 9.59 sec.
NOTICE: Index forums_mess_sk_forum_un: Pages 1182; Tuples 72696: Deleted 138195.
CPU 0.17s/0.58u sec elapsed 14.97 sec.
ERROR: No one parent tuple was found
vacuumdb: vacuum oatest failed
I ran a vacuumdb -z -v -t forums_messages oatest
rather than full vacuum which worked fine on that table so I
left it at that. The vacuum seemed to bring the average insert
time back down into the same neighborhood as when I started
(650ms or so) but after inserting 15,000 new rows it was back up
to 900ms which seemed unlikely. Looking at the nsd process I
saw it had grown to 300mb and some inserts were taking quite a
long time (the winners being two that took almost 5 minutes
each) and the longest ones correlated pretty well with swapping
activity. I used the telemetry.adp page to check and
there were no nsv's or ns_cache's which were unreasonably big so
I think it's pretty likely that it's just that there is some
sort of memory leak (possibly affecting only long running
threads) and I don't have enough memory to really spare 300mb
wasted in an nsd process. Simply stopping and restarting the
server made the insert process fast again.
After all this what do we have: 121 forums, 33,637 users, 161,378 messages, and 230,454 objects total. That seems like reasonable amount of data to actually start looking at whether forums will work in the "real world".
Lots of Data...Now What
Server startup
The first thing I like to do is restart the server and make sure that it does not take an egregious long time to restart. I first move the oatest-error.log file to a backup, restart the server, wait until says it is accepting connections, copy the error file to oatest-error.log.start, and restart the server again. That way I have a file with just the startup information. I grep out the timestamps and look for gaps with
egrep '^[[]' oatest-error.log.start | sed 's/[]].*//' | uniq
The overall start time is not terrible (3 minutes) so it's not a big deal but there are three gaps that are worth checking into:
- 13/Dec/2002:14:28:10 - 13/Dec/2002:14:28:21
- 13/Dec/2002:14:28:39 - 13/Dec/2002:14:29:04
- 13/Dec/2002:14:29:06 - 13/Dec/2002:14:30:46
The first is the annoyingly slow query to check that there is an admin defined on the system (it uses all_object_party_privilege_map). It could be made faster (more or less instantaneous) by using the acs_permissions_all view but that checks that there is a party assigned the admin permission but not that the party has any actual users as members. I would take this out for most production sites since you want the server to start up even if there is no admin so it's not clear why this is checked every time anyway.
The second two are the query dispatcher load of .xql files (also a little slower than I would like but hard to do much about). It's not obvious from the log that is what is going on (TODO: something we should fix, a starting .xql cache and finishing .xql cache whould be nice).
It may seem like a waste of time to worry so much about the startup time for the server but I have seen enough servers fall over under load or have unanticipated bugs that can cause them to crash that making sure the startup time is reasonable is the difference between a service being completely broken and merely a little flakey but still usuable while you spend the time to really fix the problems causing the server restart.
Using the package
At this point I turn developer support back on so I can see what is going on as I browse around and start looking at pages Starting with the "comp.lang.c" forum (12,417 messages in 1332 threads). Unfortunately it does not take much looking to find one that does not really work as the index.tcl page has a query that takes about 30 seconds (more than that just after a restart, and less once the data is in the kernel or postgresql caches).
24666 ms subquery dbqd.forums.www.index.select_forums: select nsdb0
select forums_forums_enabled.*,
(select count(*)
from forums_messages
where forums_messages.forum_id = forums_forums_enabled.forum_id
and 1 = tree_level(forums_messages.tree_sortkey)) as n_threads,
to_char(last_post, 'Mon DD YYYY HH24:MI:SS') as last_post,
case when last_post > (now() - 1) then 't' else 'f' end as new_p
from forums_forums_enabled
where forums_forums_enabled.package_id = :package_id
and (
forums_forums_enabled.posting_policy = 'open'
or forums_forums_enabled.posting_policy = 'moderated'
or 't' = acs_permission__permission_p(forums_forums_enabled.forum_id, :user_id,'forum_read')
)
order by forums_forums_enabled.name
Once again the likely culprit is the subselect so lets look at that first...
oatest=# explain select count(*)
from forums_messages
where forums_messages.forum_id = forums_forums_enabled.forum_id
and 1 = tree_level(forums_messages.tree_sortkey)
NOTICE: QUERY PLAN:
Aggregate (cost=54175.31..54175.31 rows=1 width=0)
-> Seq Scan on forums_messages (cost=0.00..54175.11 rows=80 width=0)
Nice and short plan, unfortunately it is a table scan. Why? the only two indexes on the table are the message_id primary key and one to enforce the uniqueness of the forum_id, tree_sortkey.
oatest=# \d forums_mess_sk_forum_un
Index "forums_mess_sk_forum_un"
Column | Type
--------------+-------------
tree_sortkey | bit varying
forum_id | integer
unique btree
This should probably have been created the other way around since you always know the forum_id and only sometimes know the tree_sortkey. Creating one on forum_id, tree_sortkey:
oatest=# create unique index forums_mess_forum_sk_un on forums_messages(forum_id, tree_sortkey);
Gives a plan that looks much better:
oatest=# explain select count(*)
from forums_messages
where forums_messages.forum_id = forums_forums_enabled.forum_id
and 1 = tree_level(forums_messages.tree_sortkey)
NOTICE: QUERY PLAN:
Aggregate (cost=48832.01..48832.01 rows=1 width=0)
-> Index Scan using forums_mess_forum_sk_un on forums_messages (cost=0.00..48831.81 rows=80 width=0)
and the execution time for the query is now 1500ms. It's still too slow, since the way I set things up there is one forum per forum site node so only scans one forum on the index page, but a real site (like openacs.org for example) will typically have multiple forums for a goven site node which will make this even slower.
Another thing to check is that the tree_level function is declared "with(iscachable)". It is which means we have to look elsewhere for speedups so first take a look at the plan for the whole query.
Sort (cost=5.04..5.04 rows=1 width=89)
-> Seq Scan on forums_forums (cost=0.00..5.03 rows=1 width=89)
SubPlan
-> Aggregate (cost=5899.73..5899.73 rows=1 width=0)
-> Index Scan using forums_mess_forum_sk_un on forums_messages (cost=0.00..5899.71 rows=10 width=0)
There is no index on package_id and adding one gives a better plan but the query is not really any faster.
Sort (cost=4.22..4.22 rows=1 width=89)
-> Index Scan using forums_package_id on forums_forums (cost=0.00..4.21 rows=1 width=89)
SubPlan
-> Aggregate (cost=5899.73..5899.73 rows=1 width=0)
-> Index Scan using forums_mess_forum_sk_un on forums_messages (cost=0.00..5899.71 rows=10 width=0)
There are not any obvious easy ways to fix this via indexes or tweaking the query, the problem is counting up the threads per forum is expensive so we will file it away as a potential issue (and something we could consider for either caching or denormalizing).
One good general technique to consider is whether there is something that is about as useful to the end user but easier to calculate. In this case we could eliminate the tree_level call if we were to get the count of posts rather than threads. Making that change, the query then takes 43ms. A much more pleasant result! We will still need to check the case where lots of forums exist under one forum package since doing the forum post counts per forum rather than with a subselect which does a group by/aggregate count(*) may still be a problem.
Another general rule is avoid functions in the where
clause. Having made that change I also realized that there is a
duplicated piece of data in the forums_messages table, it has
both a tree_sortkey and a parent_id field. Changing it to look
for parent_id is null rather than
tree_level(tree_sortkey) = 1 is just as fast as
counting posts and gets us thread count again. I prefer the
total number of posts rather than a thread count but that is
something for another article.
One other problem here is that the speed of this query is sensitive to whether or not the data is in the cache. Coming back to the page a few hours later I found loading the page took unreasonably long the first two times about 1 second for subsequent loads.
1st: 144775 ms subquery dbqd.forums.www.index.select_forums: select nsdb0 2nd: 84119 ms subquery dbqd.forums.www.index.select_forums: select nsdb0 3rd: 1287 ms subquery dbqd.forums.www.index.select_forums: select nsdb0
The moral of the story there is "buy memory and lots of it". Also caching is probably necessary as is seeding the cache. It might actually make the most sense to denormalize thread and post count since otherwise the response time can be unreasonably long if the DB does not fit in memory.
Onward
The next page is a list of all threads. It is fast enough except that it does not paginate the thread listing and with comp.lang.c you get 1332 rows in the html table and the page takes about 5 seconds to generate. We will skip that problem for now and take a look at a long thread.
The longest thread, which was someone looking for people to do his homework for $100, had 640 posts (aside, I am not sure why I have 640 and google only 279 but I ignore X-No-Archive and do no filtering other than removing duplicate messages). The db queries to pull out the whole thread are again plenty fast (about 1 second total) but generating the page takes 30 seconds, and again it's a matter of providing pagination.
Deleting Threads
Editing a post is fast (well fast enough), but deleting a thread is horrifying, this was the tail of one thread so was one post:
24932 ms subquery dbqd.forums.tcl.messages-procs.forum::message::delete.delete_message: 0or1row nsdb0
select forums_message__delete_thread(:message_id);
And this was a post with 5 followups:
123063 ms subquery dbqd.forums.tcl.messages-procs.forum::message::delete.delete_message: 0or1row nsdb0
select forums_message__delete_thread(:message_id);
So it scales about linearly with the number of posts and each delete takes about 25 seconds. Probably not acceptable...
Looking at the code for forums_message__delete
sheds no light on the problem. It simply calls
acs_object__delete which in turn constructs and
issues simple deletes for the tables from which the given
object inherits:
for obj_type
in select o2.table_name, o2.id_column
from acs_object_types o1, acs_object_types o2
where o1.object_type = (select object_type
from acs_objects o
where o.object_id = delete__object_id)
and o1.tree_sortkey between o2.tree_sortkey and tree_right(o2.tree_sortkey)
order by o2.tree_sortkey desc
loop
-- Delete from the table.
if table_exists(obj_type.table_name) then
execute ''delete from '' || obj_type.table_name ||
'' where '' || obj_type.id_column || '' = '' || delete__object_id;
end if;
end loop;
In this case there are two deletes issued:
delete from forums_messages where message_id = X; delete from acs_objects where object_id = Y;
Since these are simple deletes it's obviously the case that either one of the triggers is at fault (either an explicitly created OpenACS trigger or implicitly by a references clause). Deleting generic objects is not unreasonably slow so it's likely to be the forums_messages delete causing the problem.
Trying to use explain to figure out what the
problem is does not work that well since we get the following:
explain analyze delete from forums_messages where message_id = 347927; NOTICE: QUERY PLAN: Index Scan using forums_messages_pk on forums_messages (cost=0.00..3.35 rows=1 width=6) (actual time=26.33..26.33 rows=1 loops=1) Total runtime: 26.53 msec
The actual time to delete the row here is much longer so the result is quite deceptive and since we know it is deleting rows from this table that takes so long (and it also happens to be the only table big enough to cause a simple query to take 30 seconds to finish). Looking at the table definition there is an obvious problem; parent_id has a referential integrity constraint which references forums_messages:
parent_id integer
constraint forum_mess_parent_id_fk
references forums_messages (message_id),
there is no index on parent_id. That means when you delete a message from forums_messages postgres has to check there is no message whose parent_id references the message_id you are deleting. Without an index this means it has to do a table scan on this exceedingly large table.
Creating the index:
create index forums_messages_parent_id_idx on forums_messages(parent_id);
seems to solves all the problems and now deleting a single post is quite fast (63ms) and a post with 25 replies takes 1.6s (61ms per post).
Of course this all depends on the data being in the cache. It can still be frighteningly slow to delete if nothing is in the cache...here was the result of trying to delete another thread with 28 replies:
854959 ms subquery dbqd.forums.tcl.messages-procs.forum::message::delete.delete_message: 0or1row nsdb0
select forums_message__delete_thread(:message_id);
It's not clear to me if this can be improved upon. Certainly removing some of the referential integrity checks would make this faster but given how infrequent deletes would be I think it is not necessary. Also, given what I am running on the machine at any given time, there is not a lot of extra ram for the cache (and the data size at about 650mb is about twice as large as the kernel cache and 5 times as large a` the postgres shared buffer on the box on which I am testing).
