Search Engine
HISTORY
Archie: first search tool for the Internet
Gopher: indexed plain text documents
Jughead: searched the files stored in Gopher index systems
A search engine is a software program that searches for sites based on the words you designate as search terms. Search engines look through their own databases of information to find what you are looking for. "Search engine" is the popular term for an information retrieval (IR) system.
Match the query terms with words in the index
Sort documents by relevance
Display results
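The three steps above (match, sort, display) can be sketched with a tiny in-memory inverted index. This is a minimal illustration, not how any particular engine works: the function names `build_index` and `search`, the sample documents, and the "count of matching words" score are all assumptions made for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index

def search(index, query):
    """Match query terms against the index, then sort by a crude score."""
    scores = defaultdict(int)
    for term in query.lower().split():
        # Step 1: match the query term with words in the index.
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Step 2: sort by score, highest first; ties broken by doc id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical sample documents.
docs = {
    "d1": "river bank erosion",
    "d2": "bank loans and bank accounts",
    "d3": "mountain hiking trails",
}
index = build_index(docs)
results = search(index, "bank")

# Step 3: display results.
for rank, doc_id in enumerate(results, start=1):
    print(rank, doc_id, docs[doc_id])
```

Here "d2" is listed before "d1" because it contains the query word twice; real engines use far richer relevance signals.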
Search Processing
content
search functionality
Product name, picture ID
Category, topic, or subject
Other attributes, for relevance ranking and display
Crawl (spider) via HTTP
Read files on file servers
Access databases (HTTP or API)
Data silos via local APIs
Applications, via Web Services
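As a rough sketch of the first option, crawling, the example below follows links breadth-first and collects the pages it reaches. To stay runnable without a network, it crawls a hypothetical in-memory site (`FAKE_WEB`) through a `fetch` callable; in a real spider that callable would issue HTTP requests. The names `LinkExtractor`, `crawl`, and `FAKE_WEB` are invented for this illustration.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(fetch, start_url, max_pages=10):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical three-page site standing in for HTTP responses.
FAKE_WEB = {
    "/home": '<a href="/about">About</a> <a href="/docs">Docs</a>',
    "/about": '<a href="/home">Home</a>',
    "/docs": '<a href="/missing">Broken</a>',
}
pages = crawl(FAKE_WEB.get, "/home")
print(pages)
```

The `seen` set is what keeps the spider from looping when pages link back to each other, as "/about" does here.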
Index Issues
Metadata
Explicit (tags)
Implicit (context)
Semantics
Database fields
XML tags and attributes
Spiders
Robots
Handle character set, maybe language
Examine and organize the query
Look for field names or metadata
Extract words (just like the indexer)
Deal with letter casing
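A minimal sketch of these query-processing steps follows. The `field:value` syntax, the function name `parse_query`, and the choice of NFKC normalization and lowercasing are assumptions for illustration; actual engines differ in how they treat character sets, fields, and case.

```python
import re
import unicodedata

def parse_query(raw):
    """Split a raw query into field filters and lowercase search terms."""
    # Handle character set: normalize Unicode so equivalent forms compare equal.
    raw = unicodedata.normalize("NFKC", raw)
    fields = {}
    terms = []
    for token in raw.split():
        # Look for field names using a hypothetical "field:value" syntax.
        match = re.fullmatch(r"(\w+):(\S+)", token)
        if match:
            fields[match.group(1).lower()] = match.group(2).lower()
        else:
            # Extract words the way the indexer would, and fold letter case.
            terms.extend(re.findall(r"\w+", token.lower()))
    return {"fields": fields, "terms": terms}

parsed = parse_query("title:Engine Search HISTORY, Gopher!")
print(parsed)
```

The important design point is that the query side and the indexer apply the same word extraction and case folding; otherwise a query for "Gopher" would miss documents indexed under "gopher".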
Resolving a Query
Process the list and, for each document, add weights to its accumulator based on document length.
Find the best-ranked documents and look them up in the mapping table.
Retrieve and summarize the documents.
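The accumulator approach can be sketched as follows. This is an illustrative simplification: the weight used here (term count divided by document length) is an assumed stand-in for whatever weighting a real engine applies, and `postings`, `doc_lengths`, `doc_table`, and `rank` are hypothetical names and data.

```python
from collections import defaultdict

def rank(postings, doc_lengths, doc_table, query_terms, top_k=3):
    """Score documents with per-document accumulators, then map ids to names."""
    accumulators = defaultdict(float)
    for term in query_terms:
        # Process the list of (doc_id, term_count) entries for this term.
        for doc_id, term_count in postings.get(term, []):
            # Assumed weight: term count scaled down by document length.
            accumulators[doc_id] += term_count / doc_lengths[doc_id]
    # Find the best-ranked documents (highest score first, ties by id).
    best = sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    # Look up each document id in the mapping table.
    return [(doc_table[doc_id], score) for doc_id, score in best]

# Hypothetical index data.
postings = {
    "search": [(1, 2), (2, 1)],
    "engine": [(1, 1), (3, 1)],
}
doc_lengths = {1: 10, 2: 4, 3: 20}
doc_table = {1: "intro.txt", 2: "faq.txt", 3: "notes.txt"}

results = rank(postings, doc_lengths, doc_table, ["search", "engine"])
for name, score in results:
    print(name, round(score, 3))
```

Dividing by length keeps long documents from winning simply because they contain more words. Retrieving and summarizing the returned files is left out of the sketch.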
Retrieval = Matching
Single-word queries
Find items containing that word
Relevance Ranking
Theory: sort the matching items, so the most relevant ones appear first
Can't really know what the user wants
Relevance is hard to define and situational
Short queries tend to be deeply ambiguous
What do people mean when they type "bank"?
First 10 results are the most important
The more transparent, the better
Other Algorithms
Vector space
Probabilistic (binary independence)
Fuzzy set theory
Bayesian statistical analysis
Latent semantic indexing
Neural networks
Machine learning
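To make the first of these concrete, here is a minimal sketch of the vector space idea: the query and each document become word-count vectors, and relevance is the cosine of the angle between them. Raw counts are used as an assumption for brevity (practical systems usually apply weighting such as TF-IDF), and `cosine_similarity` and the sample texts are invented for the example.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical documents, ranked against the query "bank loans".
docs = {
    "d1": "river bank erosion",
    "d2": "bank loans and accounts",
    "d3": "mountain hiking trails",
}
query = "bank loans"
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)
```

A document sharing no words with the query scores 0, and a document identical to the query scores 1 (up to floating-point rounding).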
Page layout and navigation
Results header
List of results items
Results footer
Google Results
Yahoo Results
Searching technique
Google's ranking, which includes PageRank, considers:
1. Frequency and location of keywords within the Web page
2. Web page history
3. Number of other Web pages that link to the page in question (the link signal that PageRank measures)
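The link-based part of this can be sketched with the textbook power-iteration form of PageRank. This is a simplified illustration, not Google's production ranking: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the three-page link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict of page -> list of linked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed rule: a page with no links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A and B both link to C, and C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph C ranks highest because two pages link to it, A comes next because it is linked from the highly ranked C, and B comes last because nothing links to it.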
THANK YOU