Application of Eigenvalues and Eigenvectors.
• At first glance, it seems reasonable to imagine that a search engine keeps an index of all web pages, and when a user types in a search query, the engine browses through its index and counts the occurrences of the keywords in each web page. The winners are the pages with the highest number of occurrences of the keywords, and these get displayed back to the user.
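To make this naive approach concrete, here is a minimal Python sketch of keyword-count ranking; the page texts and the query are made-up illustrative data, not real search-engine code:

    # Naive ranking: count query-word occurrences in each page's text.
    pages = {
        "page1.html": "eigenvalues and eigenvectors in linear algebra",
        "page2.html": "eigenvalues eigenvalues eigenvalues everywhere",
        "page3.html": "cooking recipes and kitchen tips",
    }
    query = ["eigenvalues", "eigenvectors"]

    def keyword_score(text, keywords):
        words = text.lower().split()
        return sum(words.count(k) for k in keywords)

    # Sort pages by raw keyword count; the highest count "wins".
    ranking = sorted(pages, key=lambda p: keyword_score(pages[p], query), reverse=True)
    print(ranking)  # page2.html comes first just by repeating one keyword

The sketch also shows why plain counting is easy to manipulate: simply repeating a keyword inflates a page's score.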
How it works:
• Modern search engines employ ranking methods that are more elaborate than plain text matching in order to present the "best" results first. One of the best-known and most influential algorithms for computing the relevance of web pages is the PageRank algorithm used by the Google search engine. It was invented by Larry Page and Sergey Brin while they were graduate students at Stanford, and it became a Google trademark in 1998. The key idea behind PageRank is that the importance of any web page can be judged by looking at the pages that link to it. If we create a web page i and include a hyperlink to web page j, this means that we consider j important and relevant for our topic. If there are a lot of pages that link to j, the common belief is that page j is important. If, on the other hand, j has only one backlink, but it comes from an authoritative site k (like www.google.com, www.cnn.com, www.cornell.edu), we say that k transfers its authority to j; in other words, k asserts that j is important. Whether we talk about popularity or authority, we can iteratively assign a rank to each web page, based on the ranks of the pages that point to it, as the recurrence below shows.
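Written as a formula, one pass of this iterative assignment is (the notation B_j for the set of pages linking to j and L(i) for the number of outgoing links on page i is introduced here for illustration):

    r_{k+1}(j) = \sum_{i \in B_j} \frac{r_k(i)}{L(i)}

A rank vector that no longer changes under this update satisfies r = Mr for a suitable link matrix M; that is, it is an eigenvector of M with eigenvalue 1.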
Google’s PageRank:
• Google's extraordinary success as a search engine was due to its clever use of eigenvalues and eigenvectors. Since it was introduced in 1998, Google's method for delivering the most relevant results for our search queries has evolved in many ways, and PageRank is no longer a factor in the way it was at the beginning.
Example:
• Let us assume the web contains only 3 pages. The author of page A thinks pages B and C have good content and links to them. The author of page B thinks page C has good content and links to it. The author of page C thinks page A has good content and links to it. The links between these three pages are summarized in the diagram.
Cont.
• Considering Page A, it has 2 outgoing links (to pages B and C). So, in the first row of our "links matrix", we place the value 1/2 in columns 2 and 3. The rest of the columns in row 1 get the value 0, since Page A doesn't link to any other page.
• Meanwhile, Page B has one outgoing link, to page C, so in the second row we place the value 1 in column 3 and 0 in the rest.
• Page C has one outgoing link, to page A, so in the third row we place 1 in column 1 and 0 in the rest. The full matrix is assembled below.
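Putting the three rows together, the "links matrix" just described is (each nonzero entry is 1 divided by the number of outgoing links of that row's page):

    L = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}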
Cont.
• Here a teleport/damping factor d (typically d = 0.85) is introduced: with probability d a random surfer follows one of the links on the current page, and with probability 1 - d they jump to a random page. The rank of a page is then given by the PageRank formula PR(j) = (1 - d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1, ..., Tn are the pages linking to j and C(T) is the number of outgoing links on page T. The rank vector satisfying this for every page is an eigenvector of the damped link matrix, corresponding to the eigenvalue 1 (the damping factor itself is not an eigenvalue).
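As a rough sketch of how this iteration can be carried out (assuming d = 0.85 and the link matrix above; the starting vector and iteration count are arbitrary choices here, so the exact numbers can differ from the slide's):

    import numpy as np

    # Link matrix from the example: row i lists where page i points (order A, B, C).
    L = np.array([[0.0, 0.5, 0.5],   # A links to B and C
                  [0.0, 0.0, 1.0],   # B links to C
                  [1.0, 0.0, 0.0]])  # C links to A

    d = 0.85          # teleport/damping factor
    M = L.T           # column j of M spreads page j's rank to the pages it links to
    r = np.ones(3)    # start every page with rank 1

    # Power iteration: apply the damped update until the ranks settle.
    for _ in range(100):
        r = (1 - d) + d * M @ r

    print(dict(zip("ABC", r.round(3))))  # page C ends up with the highest rank

Whatever the starting vector, this iteration converges, because the damped matrix shrinks the difference between successive rank vectors by a factor of d each step.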
Cont.
• Iterating this update gives the rank vector shown on the slide as [M^3]^T = [0.85, 0.64, 1.065], i.e. the approximate ranks of pages A, B, and C.
• As Page C has the highest PageRank (1.065 in the above vector), we conclude it is the most "important", and it will appear at the top of the search results.
• For larger link matrices, the eigenvalues and eigenvectors cannot be found by hand; they must be computed numerically, for example with a library eigensolver (as sketched below) or with the power iteration used earlier, to obtain the page ranks.
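A brief sketch of that numerical route, using NumPy's general eigensolver; the uniform-teleport "Google matrix" G constructed below is the standard textbook form and is an assumption about what the slide's "some formula" refers to:

    import numpy as np

    L = np.array([[0.0, 0.5, 0.5],
                  [0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0]])
    d, n = 0.85, 3

    # Damped "Google matrix": follow a link with probability d, else teleport.
    G = d * L.T + (1 - d) / n * np.ones((n, n))

    # The PageRank vector is the eigenvector for G's largest eigenvalue (which is 1).
    vals, vecs = np.linalg.eig(G)
    principal = vecs[:, np.argmax(vals.real)].real
    principal /= principal.sum()   # normalize the ranks to sum to 1
    print(dict(zip("ABC", principal.round(3))))

For billions of pages an explicit eigensolver is far too expensive, so power iteration on a sparse link matrix is the usual method at that scale.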
Search engine reality checks:
1. Our example web above has 3 pages, whereas Google, Bing, and other search engines need to cope with billions of pages. This requires a lot of computing power and clever mathematics to optimize the processes.
2. PageRank was only one of many ranking factors employed by Google from the beginning. They also looked at keywords in the search query and compared them to the number of times those words appeared on a page, and where they appeared (if they were in headings or page descriptions they were "worth more" than if the words were lower down the page). All these factors were easy to "game" once they were known about, so Google became more secretive about what it uses to rank pages for any search term.
3. Google currently uses over 200 different signals when analyzing web pages, including page speed, whether the content is local, mobile friendliness, amount of text, authority of the overall site, freshness of the content, and so on. They constantly revise those signals to beat "black hat" operators (who try to game the system to get to the top) and to ensure that the best quality and most authoritative pages are presented first.
4.Further reference: https://www.youtube.com/watch?v=0eKVizvYSUQ&feature=youtu.be
Barath Sanjay.S
Harish.B
Moulish Arunachalam.G
Vaishnavi.K
Karthikeyan.K