sql query query - get most relevant

Status
Not open for further replies.

ozonew4m

ThisTimeNextYearRodders
Jul 6, 2007
234
3
0
At my desk!
what is the best way to setup a mysql table and query to get the best search results?

php by the way :)

so for example...

i have a database of thousands of "bodytext" rows containing articles,blog posts,comments,forums posts etc...

I also have a list of keywords/phrases in an array that i would like to match to the best possible "bodytext"

what would be the best way to setup my table and sql query to get the best possible "body text" for my keyword or phrase and / or vice versa..

hopefully I can pull multiple rows sorted by relevance?

full text?
blob?
...
..

@mysql(...............) ?

:anon.sml:
 


what would be the best way to setup my table and sql query to get the best possible "body text" for my keyword or phrase and / or vice versa..

hopefully I can pull multiple rows sorted by relevance?

The best way to sort by relevancy is to index each body text by words and word count. So you would have two new tables - one to hold the words and one to hold the appearance of each word in a content block:

Code:
CREATE TABLE `keywords` (
  `id` int(10) NOT NULL auto_increment,
  `keyword` varchar(50) NOT NULL default '',
  PRIMARY KEY  (`id_keyword`),
  UNIQUE KEY `keyword` (`keyword`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 PACK_KEYS=0;

CREATE TABLE `keywords_blocks` (
  `id` int(10) NOT NULL auto_increment,
  `id_keyword` int(10) NOT NULL default '0',
  `id_block` int(10) NOT NULL default '0',
  `word_count` int(5) NOT NULL default '0',
  PRIMARY KEY  (`id_keyword_page`),
  KEY `id_keyword` (`id_keyword`,`id_sp_page`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 PACK_KEYS=0;
Once you have created those and filled them (with a spider function you'll need to write) you can then search by your keyword(s). For each keyword you want to run a query something like:
Code:
SELECT distinct b.id, word_count
                        FROM blocks b
                            INNER JOIN keywords_block kb ON b.id = kb.id 
                            INNER JOIN keywords k ON k.id = kb.id_keyword           
                        WHERE k.keyword = '$searchword'
                        GROUP BY b.id
the result of that query you can then turn into an array something like this:
Code:
    [keyword] => Array
        (
            [id] => Array
                (
                    [0] => 1434
                    [1] => 1501
                    [2] => 1502
                    [3] => 1503
                    [4] => 1507
                    [5] => 1510
   
                )

            [weight] => Array
                (
                    [1434] => 2
                    [1501] => 4
                    [1502] => 2
                    [1503] => 1
                    [1507] => 1


                )
where the id is the id of your block, and the weight its relevancy.

There are bound to be some open source project around that will do most of the work for you here. When I did it (for a site search function) I coded it all myself and it took days, although the hard part was coding the tweaks to relevancy.

Cheers,

Leon
 
Your way is bad ass dont get me wrong, and nice work on the yahoo account thing btw, but full text indexing is just ... simple, quick and dirty. Like my girlfriend should be.
 
Full text indexing, not isam, google has the proprietary shit going on. Also, I meant match against, not where in.


I was being retarded this morning.
 
Thanks fellas.. I appreciate the help... I might just pull each text item and parse with php and then re store it properly for easier searching....
 
Status
Not open for further replies.