Search Engine

The document provides an overview of search engine technology. It discusses the history of early search tools like Archie, Gopher, and Wandex. It then explains that search engines use software to search databases for keywords and return relevant results. The document outlines the key components required to build a search engine, including indexing content, handling queries, and ranking results by relevance. It also discusses challenges like ambiguous queries, keeping large indexes up to date, and ensuring searches return useful information to users.

Uploaded by

Praveen Yadav
Copyright
© Attribution Non-Commercial (BY-NC)

Search Engines: A Technology Overview

Parveen Yadav, 4th year, SMEC, Neemrana

HISTORY

Archie: first search tool for the Internet
Gopher: indexed plain text documents
Jughead: searched the files stored in Gopher index systems
Wandex: first Web search engine

Introduction of Search Engine

A search engine is a software program that searches for sites based on the words that you designate as search terms. Search engines look through their own databases of information to find what you are looking for. "Search engine" is the popular term for an Information Retrieval (IR) system.

Purpose of Search Engines

Helping people find what they're looking for

Starts with an information need
Converts it to a query
Gets results

No need to remember the exact URL

Materials available

Web pages
Files and data in every format

What does it take to build a search engine?


Decide what to index
Collect it
Index it (efficiently)
Keep the index up to date
Provide user-friendly query facilities
Provide search forms
Match the query terms with words in the index
Sort documents by relevance
Display results

Search Processing

Search is Mostly Invisible


Like an iceberg, 2/3 below water

user interface

content

search functionality

Text Search vs. Database Query


Text search works for structured content
Keyword search vs. SQL queries
Approximate vs. exact match
Multiple sources of content
Response time and database resources
Relevance ranking: important
Works in the real world (e.g. eBay)
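
The contrast between an exact database match and a keyword search can be sketched as below. This is only an illustration under assumptions: the `items` table, the sample titles, and the helper names `make_db`, `exact_match`, and `keyword_search` are hypothetical and do not come from the slides.

```python
import sqlite3


def make_db(titles):
    """Create an in-memory table of item titles (toy data, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO items (title) VALUES (?)", [(t,) for t in titles])
    return conn


def exact_match(conn, value):
    """Database-style query: the whole field must equal the value."""
    rows = conn.execute(
        "SELECT title FROM items WHERE title = ? ORDER BY id", (value,)
    )
    return [title for (title,) in rows]


def keyword_search(conn, query):
    """Text-style search: any query word may appear anywhere in the title."""
    words = set(query.lower().split())
    rows = conn.execute("SELECT title FROM items ORDER BY id")
    return [title for (title,) in rows if words & set(title.lower().split())]


conn = make_db(["Vintage camera", "camera", "Camera bag, leather"])
print(exact_match(conn, "camera"))     # ['camera']
print(keyword_search(conn, "camera"))  # ['Vintage camera', 'camera', 'Camera bag, leather']
```

The exact query returns only the row whose field equals the value, while the keyword search returns every row that mentions the word, which is why it then needs relevance ranking.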

Search is Only as Good as the Content


Users blame the search engine


Even when the content is unavailable

Understand the scope of site or intranet


Kinds of information
Divided sites: products / corporate info
Dates
Languages
Sources and databases...
Update processes

What the Index Needs

Basic information for document or record


File name / URL / record ID
Title or equivalent
Size, date

Full text of item

More metadata

Product name, picture ID
Category, topic, or subject
Other attributes, for relevance ranking and display
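
One way to picture such an index entry is a small record type. The sketch below is an assumption for illustration only: the class name `IndexRecord`, its field names, and the sample values are hypothetical and simply mirror the items listed above.

```python
from dataclasses import dataclass, field


@dataclass
class IndexRecord:
    """One indexed document or record (hypothetical layout)."""
    locator: str       # file name, URL, or record ID
    title: str         # title or equivalent
    size_bytes: int
    date: str          # kept as an ISO 8601 string for simplicity
    full_text: str     # full text of the item
    metadata: dict = field(default_factory=dict)  # product name, category, ...

    def display_line(self):
        """Format the basic information for a results list."""
        return f"{self.title} ({self.locator}, {self.size_bytes} bytes, {self.date})"


record = IndexRecord(
    locator="https://example.com/products/42",
    title="Blue Widget",
    size_bytes=2048,
    date="2009-05-01",
    full_text="A blue widget for everyday use.",
    metadata={"product_name": "Blue Widget", "category": "widgets"},
)
print(record.display_line())
# Blue Widget (https://example.com/products/42, 2048 bytes, 2009-05-01)
```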

Making a Searchable Index


Store text to search it later
Many ways to gather text

Crawl (spider) via HTTP
Read files on file servers
Access databases (HTTP or API)
Data silos via local APIs
Applications, via Web Services

Security and Access Control
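
"Store text to search it later" is commonly done with an inverted index, which maps each word to the documents that contain it. The sketch below is a minimal version under assumptions: the gathered text is already in memory as a dictionary, and the names `tokenize` and `build_index` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the sorted list of document IDs that contain it.

    documents: dict of doc_id -> text, however the text was gathered.
    """
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {1: "The cat sat on the mat.", 2: "A cat in a hat.", 3: "Hat and mat sale"}
index = build_index(docs)
print(index["cat"])  # [1, 2]
print(index["mat"])  # [1, 3]
```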

Simple Index Diagram


Index Issues

Metadata
Explicit (tags) Implicit (context)

Semantics
Database fields XML tags and attributes


How a Search Engine Works

Spiders

Robots

Robot Indexing and Web Crawling


Search Query Processing


What happens after you click the search button and before retrieval starts, usually in this order:

Handle character set, maybe language
Look at and organize the query
Look for field names or metadata
Extract words (just like the indexer)
Deal with letter casing
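
The steps above can be sketched as a small query parser. This is an assumption-laden illustration, not a description of any particular engine: the `field:value` syntax and the function name `parse_query` are hypothetical, and Unicode NFKC normalization stands in for "handle character set".

```python
import re
import unicodedata


def parse_query(raw):
    """Return (fields, terms) for a raw query string."""
    # Character-set handling: fold compatibility forms such as full-width letters.
    text = unicodedata.normalize("NFKC", raw)
    fields, terms = {}, []
    for token in text.split():                  # organize the query into tokens
        name, sep, value = token.partition(":")
        if sep and name and value:              # field name, e.g. title:cats
            fields[name.casefold()] = value.casefold()
        else:
            # Extract words the way an indexer would, then fold letter case.
            terms.extend(w.casefold() for w in re.findall(r"\w+", token))
    return fields, terms


# "\uff23\uff21\uff34" is "CAT" written in full-width letters.
print(parse_query("title:Cats  Hat-Mat \uff23\uff21\uff34"))
# ({'title': 'cats'}, ['hat', 'mat', 'cat'])
```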

Resolving a Query

Consider the query ( cat hat mat )

Select a word from the query ( cat )

Retrieve the list for the word cat

Process the list and, for each document, add weights to the accumulator based on document length.
Repeat for the remaining query words ( hat, mat ).
Find the best-ranked documents and look up the mapping table. Retrieve and summarize the documents.
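
A minimal sketch of this accumulator approach follows. The weighting used here (term count divided by document length) is an assumption chosen for illustration, since the slides do not give a formula, and the names `resolve_query`, `postings`, `doc_lengths`, and `doc_table` are hypothetical.

```python
from collections import defaultdict


def resolve_query(query_terms, postings, doc_lengths, doc_table, top_k=3):
    """Score documents with one accumulator per document and return the best.

    postings:    term -> {doc_id: number of occurrences in that document}
    doc_lengths: doc_id -> document length in words
    doc_table:   doc_id -> location (the "mapping table")
    """
    accumulators = defaultdict(float)
    for term in query_terms:                          # select each word in turn
        for doc_id, count in postings.get(term, {}).items():
            # Assumed weight: occurrences normalized by document length.
            accumulators[doc_id] += count / doc_lengths[doc_id]
    ranked = sorted(accumulators.items(), key=lambda item: (-item[1], item[0]))
    return [(doc_table[doc_id], round(score, 3)) for doc_id, score in ranked[:top_k]]


postings = {
    "cat": {1: 2, 2: 1},
    "hat": {2: 1, 3: 1},
    "mat": {1: 1, 3: 2},
}
doc_lengths = {1: 10, 2: 5, 3: 20}
doc_table = {1: "/a.html", 2: "/b.html", 3: "/c.html"}
print(resolve_query(["cat", "hat", "mat"], postings, doc_lengths, doc_table))
# [('/b.html', 0.4), ('/a.html', 0.3), ('/c.html', 0.15)]
```

The short document `/b.html` ranks first even though it contains fewer matching words, because each match counts for more in a shorter document.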

Retrieval = Matching

Single-word queries
Find items containing that word

Multi-word queries: combine lists


Any: every item with any query word
All: only items with every word
Phrases: find only items with all words in order

Boolean and complex queries


Use an algorithm to combine lists
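
The three ways of combining lists can be sketched with set operations over a positional index. This is an illustrative sketch under assumptions: the index layout (word to document to word positions) and the function names are hypothetical, and a phrase is treated as the words appearing next to each other in order.

```python
def build_positional_index(documents):
    """Map word -> {doc_id: [positions of the word in that document]}."""
    index = {}
    for doc_id, text in documents.items():
        for pos, word in enumerate(text.lower().split()):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index


def match_any(index, words):
    """Any: union of the lists for all query words."""
    result = set()
    for word in words:
        result |= set(index.get(word, {}))
    return sorted(result)


def match_all(index, words):
    """All: intersection of the lists for all query words."""
    if not words:
        return []
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return sorted(result)


def match_phrase(index, words):
    """Phrase: documents where the words occur consecutively, in order."""
    hits = []
    for doc_id in match_all(index, words):
        starts = index[words[0]][doc_id]
        if any(
            all(start + i in index[w][doc_id] for i, w in enumerate(words))
            for start in starts
        ):
            hits.append(doc_id)
    return hits


docs = {1: "cat in the hat", 2: "hat on the cat", 3: "the mat"}
index = build_positional_index(docs)
print(match_any(index, ["cat", "mat"]))     # [1, 2, 3]
print(match_all(index, ["cat", "hat"]))     # [1, 2]
print(match_phrase(index, ["the", "hat"]))  # [1]
```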

Why Searches Fail


Empty search
Nothing on the site on that topic (scope)
Misspelling or typing mistakes
Vocabulary differences
Restrictive search defaults
Restrictive search choices
Software failure
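
For the misspelling case, one possible mitigation is to suggest close matches from the index vocabulary. The sketch below uses Python's standard `difflib` module; the `suggest` helper, the sample vocabulary, and the 0.75 similarity cutoff are assumptions for illustration, not something the slides prescribe.

```python
import difflib


def suggest(word, vocabulary, limit=3):
    """Return up to `limit` vocabulary words that closely resemble `word`."""
    return difflib.get_close_matches(word.lower(), vocabulary, n=limit, cutoff=0.75)


vocabulary = ["search", "engine", "index", "query", "relevance"]
print(suggest("serach", vocabulary))
print(suggest("zzzz", vocabulary))
```

A query word with no close match returns an empty list, in which case the search has likely failed for another reason, such as scope or vocabulary differences.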

Relevance Ranking

Theory: sort the matching items, so the most relevant ones appear first
Can't really know what the user wants
Relevance is hard to define and situational
Short queries tend to be deeply ambiguous
What do people mean when they type "bank"?
First 10 results are the most important
The more transparent, the better

Other Algorithms

Vector space
Probabilistic (binary independence)
Fuzzy set theory
Bayesian statistical analysis
Latent semantic indexing
Neural networks
Machine learning
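
As an illustration of the first item, a vector-space ranker represents the query and each document as weighted term vectors and sorts documents by cosine similarity. The sketch below uses one common TF-IDF weighting variant; that choice, and the names `tf_idf_vectors`, `cosine`, and `rank`, are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs_tokens):
    """Build a TF-IDF vector per document, plus the IDF table."""
    n = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens.values() for term in set(tokens))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {
        doc_id: {term: count * idf[term] for term, count in Counter(tokens).items()}
        for doc_id, tokens in docs_tokens.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_tokens, docs_tokens):
    """Return document IDs ordered from most to least similar to the query."""
    vectors, idf = tf_idf_vectors(docs_tokens)
    query = {t: c * idf[t] for t, c in Counter(query_tokens).items() if t in idf}
    scores = {doc_id: cosine(query, vec) for doc_id, vec in vectors.items()}
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


docs_tokens = {1: ["cat", "hat"], 2: ["cat", "cat", "mat"], 3: ["dog"]}
print(rank(["cat"], docs_tokens))  # [2, 1, 3]
```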

Search Results Interface


What users see after they click the Search button
The most visible part of search
Elements of the results page

Page layout and navigation
Results header
List of results items
Results footer

Google Results


Yahoo Results


Searching technique

Google uses spiders
Large index of keywords

Google's ranking (PageRank) considers:
1. Frequency and location of keywords within the Web page
2. Web page history
3. Number of other Web pages that link to the page in question
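
The third factor, links from other pages, is the part that the published PageRank algorithm models. The sketch below is the simplified textbook power iteration on a toy link graph, not Google's production system; the damping factor of 0.85, the iteration count, and the sample graph are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict of page -> list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c
```

Page `c` scores highest because three other pages link to it, and page `a` comes next because it receives the only link from `c`.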

Search Will Never Be Perfect


Search engines can't read minds


User queries are short and ambiguous

Things that help


Design a usable interface
Show match words in context
Keep index current and complete
Adjust heuristic weighting
Maintain suggestions and synonyms
Consider faceted metadata search
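
"Show match words in context" can be sketched as a small snippet generator that cuts a window of text around the first match. The function name `snippet`, the bracket markers, and the window width are assumptions chosen for illustration.

```python
import re


def snippet(text, word, width=30, open_mark="[", close_mark="]"):
    """Return the first occurrence of `word` with `width` characters of context."""
    match = re.search(r"\b" + re.escape(word) + r"\b", text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return (
        prefix
        + text[start:match.start()]
        + open_mark + match.group(0) + close_mark
        + text[match.end():end]
        + suffix
    )


text = "Search engines look through their own databases of information."
print(snippet(text, "databases", width=10))
# ...their own [databases] of inform...
```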

THANK YOU
