Sorry your browser is not supported!

You are using an outdated browser that does not support modern web technologies, in order to use this site please update to a new browser.

Browsers supported include Chrome, FireFox, Safari, Opera, Internet Explorer 10+ or Microsoft Edge.

Geek Culture / MySQL - question about indexes

Author
Message
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 28th Jul 2006 13:35
I'm playing about with Drupal (a CMS) and looking through the database... This is the node table:


And this is the index description...


Now - phpMyAdmin is complaining about fields having multiple indexes on them.

My question is: "Is this a problem?"

Although it may be inneficient on space/storage, if this is not an issue (which is rarely is nowadays) does it cause any other problems?

I have this query:


And it takes 0.4 seconds to run, which is pretty high for a MySQL Query. All fields seem to be indexed, but the n.nid field (for example) is indexed multiple times. Could this slow things down?

[center]
Kenjar
19
Years of Service
User Offline
Joined: 17th Jun 2005
Location: TGC
Posted: 28th Jul 2006 13:41
If there software is retreiving data without problems, then no. If you start getting SQL error messages, then yes there is.


[href]www.openoffice.org[href]
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 28th Jul 2006 13:49
Errors are one thing - speed in another.

If there were error's I'd be ok - I can diagnose errors. This issue is an inneficient query.

running EXPLAIN on that query returns:

(appologies for the lack of readbility on that, I cant embedd HTML tables in this forum!)

[center]
Richard Davey
Retired Moderator
22
Years of Service
User Offline
Joined: 30th Apr 2002
Location: On the Jupiter Probe
Posted: 28th Jul 2006 14:05
Having more than one index is fine, providing your query uses the RIGHT one when it runs! You can see in the explain that it had a choice of loads of indices to use, yet it settled on just using the PRIMARY key, which for that query isn't the best choice.

It looks like there are indexes on most of the columns, but not grouped together? For example in the query you are running, you have a WHERE clause on cn.cid, n.status and n.moderate - so ideally they should be collectively indexed for maximum speed, especially as the LIMIT will always slow it down slightly.

Bite my shiny metal ass
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 28th Jul 2006 14:22 Edited at: 28th Jul 2006 14:25
@Rich - thanks for that explanation. Kind of makes sense...

This is the phpMyAdmin display of the table index structure...



[center]

Attachments

Login to view attachments
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 28th Jul 2006 14:23 Edited at: 28th Jul 2006 14:26
This is the explain (easier to read)


[center]

Attachments

Login to view attachments
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 28th Jul 2006 14:25 Edited at: 28th Jul 2006 14:26
And this is the table structure


How do I tell MySQL which index to use?

EDIT:
Also - Could you please explain what the explain table actually means? I'm not sure I understand it.

[center]

Attachments

Login to view attachments
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 28th Jul 2006 14:28 Edited at: 28th Jul 2006 14:29
Actually, this is the explain. I changed the cn table in a vain attempt to solve it...


[center]

Attachments

Login to view attachments
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 28th Jul 2006 15:48
Ahh I just read up on indexes - I get it now!!

So in the above node index list, there is no point having an index for the status column because status is the left most index in the node_status_type index. There is also no point having a nid index on its own as its the left most part of the primary key!

I'll post this up as a "bug" on the Drupal forum - I still dont get why that query runs slow though. Is it just the sheer number of fields?

[center]
IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 28th Jul 2006 16:09
I can't add much to what Rich has provided (which is far more that I could have done), but I notice in you your query that you have 'WHERE cn.cid IN ...' - you may or may not notice a difference if you change that to 'WHERE n.cid IN ...'

It might make a difference, depending on whether mysql carries out the join before or after matching the IN list, and whether it realises that any selection on n.cid is the same as the same selection on cn.cid.

For free Plug-ins and source code http://www.matrix1.demon.co.uk
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 28th Jul 2006 16:57
I might also notice an error seeing as table "n" doesn't have a field called cid

Thanks IanM though - its those kinds of issues that only a 2nd pair of eyes can see! (also under that category there are the classic missing semi-colon, brace, etc, the rogue capital and finally - the spelling mistake in a variable )

[center]
IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 30th Jul 2006 04:50
Bah, so we both had a typo

Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 30th Jul 2006 11:30
I dont think either of us did. You were unaware of the table structures and I dont think I've done any typo's (not any that make a difference)

[center]
Phaelax
DBPro Master
21
Years of Service
User Offline
Joined: 16th Apr 2003
Location: Metropia
Posted: 30th Jul 2006 20:51
Your version of phpmyadmin looks prettier than mine.

Now I know you can have primary and secondary keys, but I didn't think you could have 2 primary keys. (nid, vid)

"Using Unix is the computing equivalent of listening only to music by David Cassidy" - Rob Pike
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 30th Jul 2006 21:48
You're mixing the term key and index up (or maybe I am?!) but they're two different things.

The index that you reffer to can contain as many fields as you like. MySQL just reffers to them "from the top" (or left). So in your example, nid and vid can be a single index - however if you do a query such as "where vid = 1" without doing anything to nid, it seems MySQL will create a new temporary index for that field as you're not doing anything with the first field, nid.

I use the Artic Ocean theme from the phpMyAdmin site. Its much easier to use as everything is a touch smaller so you can fit more on the screen

[center]
Richard Davey
Retired Moderator
22
Years of Service
User Offline
Joined: 30th Apr 2002
Location: On the Jupiter Probe
Posted: 30th Jul 2006 22:46
You ought to give SQLyog a try. I use it for all my MySQL work, it is the dogs nads. I think there is even a free version.

Bite my shiny metal ass
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 30th Jul 2006 23:39
I've given it a go before I think... I'll try it tomorrow - I just use phpMyAdmin because once its installed then you can use it anywhere you have internet access!

[center]
Phaelax
DBPro Master
21
Years of Service
User Offline
Joined: 16th Apr 2003
Location: Metropia
Posted: 31st Jul 2006 18:15
yea my mistake. I skimmed this thread rather quickly.
it does themes? sweet.

"Using Unix is the computing equivalent of listening only to music by David Cassidy" - Rob Pike
adr
21
Years of Service
User Offline
Joined: 21st May 2003
Location: Job Centre
Posted: 31st Jul 2006 18:59 Edited at: 31st Jul 2006 19:01
The most obvious detriment to having one too many indices is that any update/inserts will have to modify more indices! Writes can't really be cached (unlike reads) so any writes to disk are baaad, mkay.

Quote: "Could you please explain what the explain table actually means? I'm not sure I understand it."

Not sure if anyone covered this specifically, so I'll gloss over it. The most important column is the "extra" field as you already guessed. Temporary Tables/File Sorts are the worst offenders for speed, which usually come about from GROUP BYs and non-indexed ORDER BYs. Good example of how something as simple as ORDER BY RAND() can really mess up your day:
http://jan.kneschke.de/projects/mysql/order-by-rand/

As Rich mentionned, MySQL has a stab at picking the index for a query. If you look at the "Cardinality" of each index, I think that's how many unique rows can be identified from that index (which MySQL can use for simple COUNT(*) or FOUND_ROWS() queries). If your query does something obvious in the WHERE/JOIN clause, MySQL decides which index would lead to the smallest resultset, therefore speeding up GROUPs, LIMITs and ORDERs. If however, your query is too complex to optimise, it looks like it goes with the primary key.

To force the usage of an index do:



[center]
But you see, I have the will of the warrior. Therefore, the battle is already over. The winner? Me!
Richard Davey
Retired Moderator
22
Years of Service
User Offline
Joined: 30th Apr 2002
Location: On the Jupiter Probe
Posted: 31st Jul 2006 19:39
Quote: "If you look at the "Cardinality" of each index, I think that's how many unique rows can be identified from that index"


Cardinality applies to columns (not indexes), but your description of it is bang on. A column that has values of 4, 1, 2, 4, 2, 6 has a cardinality of 4. An index always works better on columns with high cardinality, for example the age range of a person. A poor column to index would be something like 'male' or 'female', since the values can occur about equally, you'd get half the rows no matter which value you search for.

To speed up the optimizer the following tips help: Index SHORT values. For example if you only need to store a low number, don't use an INT. If you only need to store 20 characters for something, don't use a CHAR(100). Check to see which fields can be converted easily into enums or boolean fields. The smaller the value, the less I/O required, the quicker it'll be.

Index the PREFIX of a string value. For example if you've got a char(200) don't index the whole thing if it's likely to be different each time (i.e. peoples names), instead index the first 10 or 20 characters only.

Compare columns that have the SAME data type (by that I mean identical types.. i.e. compare an INT with an INT, not an INT with a BIGINT).

Avoid Type Conversion. For example if num is an INT, use:

SELECT * FROM tbl WHERE num = 4

and NOT

SELECT * FROM tbl WHERE num = '4'

It will have to do a type conversion on the 2nd one, which is slower. More importantly, if num was indexed a comparison involving a type conversion will often skip using the index at all.

You can run PROCEDURE ANALYSE in a table to see it recommend column types (based on the data contained) for you:

SELECT * FROM tbl PROCEDURE ANALYSE(16,256)

(btw, if you can afford it, get the MySQL book Third Edition by Paul DuBois (Developers Library) - it is the best resource you'll ever read on MySQL, way better than the online official docs)

Bite my shiny metal ass
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 1st Aug 2006 11:13
Cheers for your help here guys - there is a lot of good information here to take in.

RE That book... I have the second edition sitting in front of me - its currently raising my monitor up for 2 or 3 inches for a more ergonomic feel. I tend not to read programming books as I've found in the past that I learn more by jumping in the deepend, but if this is a recommended book then I'll give it a go.

I've always wondered what the Cardinality "field" was - its slightly clearer now. It sounds like its to do with how many unique values are in that column(?).

Question: What is the "Size" field for when I create a key in an index?

[center]
adr
21
Years of Service
User Offline
Joined: 21st May 2003
Location: Job Centre
Posted: 1st Aug 2006 11:58 Edited at: 1st Aug 2006 11:58
Quote: "It will have to do a type conversion on the 2nd one, which is slower."


Surely the query optimiser does that? It'll parse the query, check all query values and then look at the table's schema and do the cast then? Slower, yeah, but once the query's been parsed it'll go in the query cache. I guess every little helps, but unless I'm misunderstanding something, the benefit would be releatively small.

Quote: "What is the "Size" field for when I create a key in an index?"

I think thise refers to Rich's comments about keeping indices short...
Quote: "Index the PREFIX of a string value. For example if you've got a char(200) don't index the whole thing if it's likely to be different each time (i.e. peoples names), instead index the first 10 or 20 characters only."


So when indexing a person's second name, don't bother with the full whack - the first 10 characters would be MORE than enough, and would probably strike the balance between having an index which gives 1-1 matche (slow to update/seek) and one which gives 10,000 matches (pointless).

[center]
But you see, I have the will of the warrior. Therefore, the battle is already over. The winner? Me!
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 1st Aug 2006 12:09
Ok - so in that case, with the name example - if we have an index on first names and we make the index size something silly, like 2, then what happens if I search the following list for Nicholas?
Andy
Nick
Nicholas
Nicolas
Nicola
Nickie
Nikie
Nimrod
Niiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
Zebra

Does it just filter out the first and last entries so instead of searching through 10 rows it only has to do 8? After that it just makes a temporary index on the rest?!

Thanks for all your help - this could be VERY valuable!

[center]
Richard Davey
Retired Moderator
22
Years of Service
User Offline
Joined: 30th Apr 2002
Location: On the Jupiter Probe
Posted: 1st Aug 2006 12:12 Edited at: 1st Aug 2006 12:17
Quote: "Surely the query optimiser does that? It'll parse the query, check all query values and then look at the table's schema and do the cast then? Slower, yeah, but once the query's been parsed it'll go in the query cache. I guess every little helps, but unless I'm misunderstanding something, the benefit would be releatively small."


Consider it more a case of 'good design' than pure speed optimisation. Yes the query optimiser does that, but my original comment was on helping the query optimiser do its job as fast as possible. You take a small performance hit on even using the optimiser (which is why in some instances, it's better to compile MySQL without it, although granted this is rare). Plus the more dangerous aspect would be if the index wasn't used because of a needless type conversion.

Bear in mind that the query cache doesn't hold the contents of your query for ever (on a popular server the cache can get filled up very quickly, so your query is dumped - also on tables as soon as you perform an insert or update, all caches are dumped). The cache only works on exact literal string text comparisons. So if your query is unique per user (say you are doing a WHERE including a user ID), the cache will only work for that single user as long as the query remains identical per request, times that by several thousand (or hundred of thousand) users and everything you can do to help the optimiser counts.

Bite my shiny metal ass
Richard Davey
Retired Moderator
22
Years of Service
User Offline
Joined: 30th Apr 2002
Location: On the Jupiter Probe
Posted: 1st Aug 2006 12:15
Quote: "Does it just filter out the first and last entries so instead of searching through 10 rows it only has to do 8?"


Yes.

Bite my shiny metal ass
Dave J
Retired Moderator
21
Years of Service
User Offline
Joined: 11th Feb 2003
Location: Secret Military Pub, Down Under
Posted: 1st Aug 2006 12:19
Quote: "ow I know you can have primary and secondary keys, but I didn't think you could have 2 primary keys. (nid, vid)"


You can't have 2 primary keys in the one table, but you can have a composite primary key consisting of more than one column.


"Computers are useless, they can only give you answers."
adr
21
Years of Service
User Offline
Joined: 21st May 2003
Location: Job Centre
Posted: 1st Aug 2006 12:28
Someone should sticky this in the "How do I make a forum in HTML?" forum.

[center]
But you see, I have the will of the warrior. Therefore, the battle is already over. The winner? Me!
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 1st Aug 2006 13:29
@Rich: Thats some serious (almost "grasping at straws") optimising if you consider removing the optimizer because you want an extra ounce of power and you know you're queries are as good as they'll get!

@Dave: I get what you're saying, but I think thats what Phaelax meant.

@adr:Is that the forum thats next to the "How to make a MMORPG 24 hours"

[center]
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 18th Aug 2006 12:13
Another question....

Take this query:

... and column1 is the primary key and column2 is a seperate index. Will MySQL use two different indexes for the same query or will it try to use the same one for both and if an index that doesn't cover both doesn't exist it will make a temporary one (if that makes sense).

Basically, I have a query which takes 0.1 seconds but if I add to the where statement "and t.vid IN(1,2,3)" (where vid is an indexed column) the query time goes from 0.1 to 0.5 seconds!

Any idea's?

[center]
Richard Davey
Retired Moderator
22
Years of Service
User Offline
Joined: 30th Apr 2002
Location: On the Jupiter Probe
Posted: 18th Aug 2006 13:43
It will use one index, whatever it deems the most suitable for its optimiser.

The problem with using 'IN' is that it has to check all of the rows because it cannot work it out on a computation level. If VID is an INT, and you only want values between 1 and 3, then I'd do vid > 1 and vid < 3 (or inclusive if needed, => <=) because it can then use math to shorten the dataset down (via your index).

God that read horribly, but I hope you get the drift. Try some variations and see.

"Bite my shiny metal ass" - Bender, Futurama
No pixels were harmed in the making of this post
"Don't ping my cheese with your bandwidth"
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 18th Aug 2006 13:49 Edited at: 18th Aug 2006 13:49
I got the drift... Unfortunately, those 1,2,3 are ID's that are preselected from a set of tickboxes (which take their values from the vocab ID's)...

So things to try:
1) Create a single index with tid AND vid as keys (rather than as two seperate indexes)
2) Don't use IN... Try using something else. Would it work better using OR? I did try that but it didn't seem quicker...

EDIT: btw, Thanks Rich

[center]
Richard Davey
Retired Moderator
22
Years of Service
User Offline
Joined: 30th Apr 2002
Location: On the Jupiter Probe
Posted: 18th Aug 2006 14:42
Try the single index route first. What version of MySQL are you running? I only ask because for the new TGC site I have been using Stored Procedures instead of queries from PHP, and it's significantly faster.

"Bite my shiny metal ass" - Bender, Futurama
No pixels were harmed in the making of this post
"Don't ping my cheese with your bandwidth"
CattleRustler
Retired Moderator
21
Years of Service
User Offline
Joined: 8th Aug 2003
Location: case modding at overclock.net
Posted: 18th Aug 2006 14:58 Edited at: 18th Aug 2006 14:59
IN is a form of OR, should not be too much speed difference in your query there. Other more complex queries may show a speed diff however.

Anyone check out SQL Server 2005 yet? We are using it at work and its the dogz arse imho. CLR Sprocs shall be uber. Basicially you write your sprocs not in t-sql but as vb.net/c# oop classes as functions and subs, then register the compiled dll assembly with the sql instance. precompiled

Science, Mathematics, and Physics do not lie - only people do.
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 18th Aug 2006 16:02
I'm running 4.1.20 on my dev server and about the same on the live server too.

Stored procedures sound interesting but sounds incannily like a MySQL 5 feature.

Cheers for you advice, I'll post back with results.

[center]
Kentaree
22
Years of Service
User Offline
Joined: 5th Oct 2002
Location: Clonmel, Ireland
Posted: 18th Aug 2006 16:08
Stored procedures are pretty common to all DBMSes, if I remember correctly it's been in mySql for quite a while

Richard Davey
Retired Moderator
22
Years of Service
User Offline
Joined: 30th Apr 2002
Location: On the Jupiter Probe
Posted: 18th Aug 2006 16:15
Quote: "I'm running 4.1.20 on my dev server and about the same on the live server too."


You need 5.x

Shame.

"Bite my shiny metal ass" - Bender, Futurama
No pixels were harmed in the making of this post
"Don't ping my cheese with your bandwidth"
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 18th Aug 2006 17:39
I'll bare that in mind though - I have 5.0 on my home server. I have a feeling our company is going to need a new server in the not too distant future judging by the amount of investment they're putting into the web side of things. Might suggested a nice and shiny new server which supports MySQL "natively" (things like Fedora Core 4 support 4.x natively and RHEL3 (our live server's OS) prefers 3.23! I had to get our hosts to manually install 4.1.19).

[center]

Login to post a reply

Server time is: 2024-11-17 11:47:35
Your offset time is: 2024-11-17 11:47:35