UNIT 4

RECURRENT NEURAL NETWORK, LONG SHORT-TERM MEMORY, GATED RECURRENT UNIT, TRANSLATION, BEAM SEARCH AND WIDTH, BLEU SCORE, ATTENTION MODEL

Q.1. Explain in detail about recurrent neural network.

Ans. A recurrent neural network (RNN) is an extension of a conventional feedforward neural network, which is able to handle a variable-length sequence input. The RNN handles the variable-length sequence by having a recurrent hidden state whose activation at each time is dependent on that of the previous time.

More formally, given a sequence x = (x_1, x_2, ..., x_T), the RNN updates its recurrent hidden state h_t by

    h_t = 0, if t = 0
    h_t = φ(h_{t-1}, x_t), otherwise   ...(i)

where φ is a non-linear function such as the composition of a logistic sigmoid with an affine transformation. Optionally, the RNN may have an output y = (y_1, y_2, ..., y_T), which may again be of variable length.

Traditionally, the update of the recurrent hidden state in equation (i) is implemented as

    h_t = g(W x_t + U h_{t-1})   ...(ii)

where g is a smooth, bounded function such as a logistic sigmoid function or a hyperbolic tangent function.

A generative RNN outputs a probability distribution over the next element of the sequence, given its current state h_t, and this generative model can capture a distribution over sequences of variable length by using a special output symbol to represent the end of the sequence. The sequence probability can be decomposed into

    p(x_1, ..., x_T) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) ... p(x_T | x_1, ..., x_{T-1})   ...(iii)

where the last element is a special end-of-sequence value. We model each conditional probability distribution with

    p(x_t | x_1, ..., x_{t-1}) = g(h_t)

where h_t is from equation (i).

Q.2. Briefly explain the term long short-term memory unit.

Ans. The long short-term memory (LSTM) unit was initially proposed by Hochreiter and Schmidhuber. Since then, a number of minor modifications to the original LSTM unit have been made.

Unlike the recurrent unit which simply computes a weighted sum of the input signal and applies a non-linear function, each j-th LSTM unit maintains a memory c_t^j at time t. The output h_t^j, or the activation, of the LSTM unit is

    h_t^j = o_t^j tanh(c_t^j)

where o_t^j is an output gate that modulates the amount of memory content exposure. The output gate is computed by

    o_t^j = σ(W_o x_t + U_o h_{t-1} + V_o c_t)^j

where σ is a logistic sigmoid function and V_o is a diagonal matrix.

The memory cell c_t^j is updated by partially forgetting the existing memory and adding a new memory content c̃_t^j,

    c_t^j = f_t^j c_{t-1}^j + i_t^j c̃_t^j

where the new memory content is c̃_t^j = tanh(W_c x_t + U_c h_{t-1})^j. The extent to which the existing memory is forgotten is modulated by a forget gate f_t^j, and the degree to which the new memory content is added to the memory cell is modulated by an input gate i_t^j. The gates are computed by

    f_t^j = σ(W_f x_t + U_f h_{t-1} + V_f c_{t-1})^j
    i_t^j = σ(W_i x_t + U_i h_{t-1} + V_i c_{t-1})^j

where V_f and V_i are diagonal matrices.

Unlike the traditional recurrent unit, which overwrites its content at each time-step (equation (ii) of Q.1), an LSTM unit is able to decide whether to keep the existing memory via the introduced gates. Intuitively, if the LSTM unit detects an important feature from an input sequence at an early stage, it easily carries this information (the existence of the feature) over a long distance, hence capturing potential long-distance dependencies. Here, i, f and o are the input, forget and output gates, respectively, and c and c̃ denote the memory cell and the new memory content, as shown in fig. 4.1.

(Fig. 4.1 Long Short-term Memory)
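To make the gate equations concrete, a minimal NumPy sketch of one LSTM step is given below. It is illustrative only – the parameter layout is assumed, and the peephole terms V_o, V_f, V_i (and bias terms) are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following the equations in Q.2 (peephole and bias terms omitted)."""
    W_f, U_f = params["f"]          # forget gate parameters
    W_i, U_i = params["i"]          # input gate parameters
    W_o, U_o = params["o"]          # output gate parameters
    W_c, U_c = params["c"]          # new memory content parameters
    f_t = sigmoid(W_f @ x_t + U_f @ h_prev)
    i_t = sigmoid(W_i @ x_t + U_i @ h_prev)
    o_t = sigmoid(W_o @ x_t + U_o @ h_prev)
    c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev)
    c_t = f_t * c_prev + i_t * c_tilde      # partially forget, partially add new content
    h_t = o_t * np.tanh(c_t)                # exposed activation
    return h_t, c_t
```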
Q.3. What do you mean by gated recurrent unit ? Explain.

Ans. A gated recurrent unit (GRU) was proposed by Cho et al. to make each recurrent unit adaptively capture dependencies of different time scales. Similarly to the LSTM unit, the GRU has gating units that modulate the flow of information inside the unit, however, without having a separate memory cell.

The activation h_t^j of the GRU at time t is a linear interpolation between the previous activation h_{t-1}^j and the candidate activation h̃_t^j,

    h_t^j = (1 - z_t^j) h_{t-1}^j + z_t^j h̃_t^j   ...(i)

where an update gate z_t^j decides how much the unit updates its activation, or content. The update gate is computed by

    z_t^j = σ(W_z x_t + U_z h_{t-1})^j

This procedure of taking a linear sum between the existing state and the newly computed state is similar to the LSTM unit. The GRU, however, does not have any mechanism to control the degree to which its state is exposed; it exposes the whole state at each time step.

When the reset gate r_t^j is close to 0 (off), it makes the unit act as if it is reading the first symbol of an input sequence, allowing it to forget the previously computed state. The reset gate r_t^j is computed similarly to the update gate –

    r_t^j = σ(W_r x_t + U_r h_{t-1})^j

Here, r and z are the reset and update gates, and h and h̃ are the activation and the candidate activation. The gated recurrent unit is shown in fig. 4.2.

(Fig. 4.2 Gated Recurrent Unit)

Q.4. How can we use SMT to find synonyms ?

Ans. The word "ship" in a particular context can be translated to another word; here the word "ship" is synonymous with the word "transport". So, in our example, a query such as "how to ship a box" might have the same translation as "how to transport a box". The search might then be expanded to include both queries – "how to ship a box" as well as "how to transport a box". A machine translation system may also collect information about words in the same language, to learn about how those words might be related.

Q.5. Write short note on beam search.

Ans. Neural sequence models are widely used to model time-series data. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-to-right fashion, retaining only the top-B candidates, often resulting in sequences that differ only slightly from each other.

The most prevalent method for approximate decoding is beam search, which stores the top-B highly scoring candidates at each time step, where B is known as the beam width. Let us denote the set of B solutions held by BS at the start of time t by Y_{t-1}. At each time step, BS considers all possible single-token extensions of these beams and selects the B most likely extensions.

Q.6. What are the disadvantages of beam search ?

Ans. The disadvantages of beam search are as follows –
(i) Nearly-identical beams make BS a computationally wasteful algorithm, with essentially the same computation being repeated for no significant gain in performance.
(ii) There is a mismatch, i.e. improvements in posterior probabilities do not necessarily correspond to improvements in task-specific metrics. It is common practice to deliberately throttle BS to become a poorer optimization algorithm by using reduced beam widths. This treatment of an inference algorithm as a hyper-parameter is not only intellectually unsatisfying but also has a significant practical side-effect – it leads to the generation of largely bland, generic and "safe" outputs, e.g. always saying "I don't know" in conversation models.
(iii) Most importantly, lack of diversity in the decoded solutions is fundamentally crippling in AI problems with significant ambiguity – e.g. there are multiple ways of describing an image or responding in a conversation that are correct, and it is important to capture this ambiguity by finding several diverse solutions.
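A minimal sketch of the left-to-right beam search procedure is shown below. The callable step_log_probs, and the bos/eos tokens, are placeholders standing in for a real sequence model; the code is a generic illustration, not an implementation from the text.

```python
import heapq

def beam_search(step_log_probs, bos, eos, beam_width, max_len):
    """Generic left-to-right beam search.

    step_log_probs(prefix) -> dict mapping each possible next token to its
    log-probability under the model (a stand-in for an actual sequence model).
    """
    beams = [(0.0, [bos])]                  # (cumulative log-prob, token sequence)
    completed = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos:              # finished hypotheses are set aside
                completed.append((score, seq))
                continue
            for token, logp in step_log_probs(seq).items():
                candidates.append((score + logp, seq + [token]))
        if not candidates:
            break
        # keep only the top-B single-token extensions of the current beams
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    if not completed:                       # nothing reached eos within max_len
        completed = beams
    return max(completed, key=lambda c: c[0])
```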
Q.7. Explain the term BLEU score.

Ans. BLEU (Bilingual Evaluation Understudy) is an algorithm that was proposed to evaluate how accurate a machine-translated text is. Here, the same approach is used to evaluate the quality of the text response that our model generates.

The BLEU score is computed from n-gram precisions, i.e. from the counts of n-gram matches between the candidate and the reference sentences for n = 1 up to 4. The score is generally higher for lower n, and in the worst case it is zero for 4-grams, when no sequence of four consecutive words in the candidate appears in the reference sentences. This is the general methodology.

As mentioned earlier, the BLEU score helps us to determine the next step for our model. As depicted in fig. 4.3, the methodology behind using the BLEU score is to improve our model – a low score indicates that the performance may not be as good as expected, and so we need to improve the model further.
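For reference, a minimal sentence-level BLEU computation might look like the sketch below. This is a generic illustration of the n-gram precision idea (single reference, uniform weights, brevity penalty), not code recovered from the text.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform n-gram weights and a brevity penalty.

    candidate and reference are lists of tokens; a single reference sentence
    is assumed for simplicity (standard BLEU allows several).
    """
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = max(sum(cand.values()), 1)
        if overlap == 0:
            return 0.0                      # any zero n-gram precision zeroes the score
        log_precisions.append(math.log(overlap / total))

    # brevity penalty: 1 if the candidate is longer than the reference, else exp(1 - r/c)
    brevity = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return brevity * math.exp(sum(log_precisions) / max_n)
```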
The value iteration algorithm is summarized in fig. 4.6 –

    Initialize V(s) to arbitrary values
    Repeat
        For all s ∈ S
            For all a ∈ A
                Q(s, a) ← E[r | s, a] + γ Σ_{s'∈S} P(s' | s, a) V(s')
            V(s) ← max_a Q(s, a)
    Until V(s) converge

(Fig. 4.6)

We say that value iteration has converged if the maximum change in the value of any state between two successive iterations l and l + 1 is less than a certain threshold δ, i.e.

    max_{s∈S} | V_{l+1}(s) - V_l(s) | < δ
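A minimal NumPy sketch of the procedure in fig. 4.6 is given below. The array-based representation of P and R is an assumption made for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, delta=1e-6):
    """Value iteration for a finite MDP, following fig. 4.6.

    P has shape (S, A, S) with P[s, a, s'] = transition probability,
    R has shape (S, A) with R[s, a] = E[r | s, a]; both are hypothetical inputs.
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)                       # arbitrary initial values
    while True:
        Q = R + gamma * P @ V                    # Q[s,a] = E[r|s,a] + γ Σ_s' P(s'|s,a) V(s')
        V_new = Q.max(axis=1)                    # V(s) = max_a Q(s, a)
        if np.max(np.abs(V_new - V)) < delta:    # convergence test from the text
            return V_new
        V = V_new
```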
In actor-critic methods, the policy is known as the actor, π : S → A, which makes decisions without the need for optimization procedures on a value function, mapping representations of the states to action-selection probabilities. The value function is known as the critic, Q : S × A → R, which estimates the expected return to reduce variance and accelerate learning, mapping states to the expected cumulative future reward.

Fig. 4.8 shows an architecture design in which the actor and critic are two separate networks that share a common observation through feature-extraction layers. At each step, the action selected by the actor network is also an input to the critic network. In the process of policy improvement, the critic network estimates the state-action value of the current policy by DQN, then the actor network updates its policy in a direction that improves the Q value. Compared with the previous pure policy-gradient methods, which do not have a value function, using a critic network to evaluate the current policy is more conducive to convergence and stability. The better the state-action value evaluation is, the lower the variance of the learning performance, so it is important and helpful to have a better policy evaluation.

(Fig. 4.8 Actor-critic Network)

Policy-gradient-based actor-critic algorithms are useful in many applications because they can search for optimal policies using low-variance gradient estimates. Lillicrap et al. presented the deep deterministic policy gradient (DDPG) algorithm, which combines the actor-critic approach with insights from DQN to solve simulated physics tasks, and it has been widely used in many continuous-control problems. In DDPG, the actor network outputs a deterministic policy and the critic network evaluates it.
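The interplay between the actor and the critic can be illustrated with a simplified one-step actor-critic update for a small tabular problem (a generic sketch with a softmax policy; this is not the DDPG algorithm referred to above).

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def actor_critic_update(theta, V, s, a, r, s_next, done,
                        gamma=0.99, alpha_actor=0.01, alpha_critic=0.1):
    """One-step actor-critic update for a small discrete MDP.

    theta: (S, A) preferences defining a softmax policy (the actor)
    V:     (S,)   state-value estimates (the critic)
    """
    target = r if done else r + gamma * V[s_next]
    td_error = target - V[s]                       # critic's evaluation of the action
    V[s] += alpha_critic * td_error                # improve the critic
    pi = softmax(theta[s])
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0                          # ∇_θ log π(a|s) for a softmax policy
    theta[s] += alpha_actor * td_error * grad_log_pi   # move the actor toward higher value
    return theta, V
```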
UNIT 5

SUPPORT VECTOR MACHINES, BAYESIAN LEARNING, APPLICATION OF MACHINE LEARNING IN COMPUTER VISION, SPEECH PROCESSING, NATURAL LANGUAGE PROCESSING ETC., CASE STUDY – IMAGENET COMPETITION

Q.1. What are support vector machines ? Explain.

Ans. Support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data; they are used for classification. Classification refers to deciding which class an example (for instance, an image) is related to. In machine learning, classification is considered an instance of supervised learning, which refers to the task of inferring a function from labeled training data. For example, image data in an image-retrieval system can be grouped into different classes, each image being put into a particular class.

In SVM training, a model is built from a set of examples, each marked as belonging to one of two categories, and the model assigns new examples to one category or the other. The examples are represented as points in space, and the separating hyperplane is chosen so that the distance between it and the closest data point is as large as possible; this distance is called the margin of separation. The generalization ability of the machine is very high because, for a particular hyperplane, the margin of separation can be controlled (maximized).

Support vectors play an important role in the operation of this class of learning machines. We can define support vectors as the elements of the training data that would change the position of the dividing hyperplane in SVM training if removed. The maximum-margin hyperplane is determined by the samples from the two classes that lie closest to the decision surface; these samples are the support vectors, i.e. the data points that lie closest to the decision surface (on the margin).

Q.2. Discuss key idea of the support vector machines (SVM). (R.G.P.V., Nov. 2018)

Ans. SVM works on the principle of margin calculation. It basically draws margins between the classes. The margins are drawn in such a fashion that the distance between the margin and the classes is maximum, hence minimizing the classification error. The working of SVM is given in fig. 5.1.

(Fig. 5.1 Working of Support Vector Machine)

An iterative training procedure for the SVM proceeds as –

    Input – S, λ, T, k
    Initialize – Choose w_1 s.t. ||w_1|| ≤ 1/√λ
    For t = 1, ..., T
        Choose A_t ⊆ S, where |A_t| = k
        Set A_t⁺ = {(x, y) ∈ A_t : y⟨w_t, x⟩ < 1}
        ...
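The listing above appears to follow a mini-batch sub-gradient (Pegasos-style) solver for a linear SVM. Under that assumption, a complete sketch of such a solver might look as follows; the update and projection steps shown here are the standard Pegasos rules, not text recovered from the scan.

```python
import numpy as np

def pegasos(X, y, lam=0.01, T=1000, k=32, rng=np.random.default_rng(0)):
    """Mini-batch Pegasos-style sub-gradient solver for a linear SVM.

    X: (n, d) inputs, y: (n,) labels in {-1, +1}; lam, T, k play the roles of
    λ, T, k in the listing above, and S is the training set (X, y).
    """
    n, d = X.shape
    w = np.zeros(d)                                       # satisfies ||w|| <= 1/sqrt(lam)
    for t in range(1, T + 1):
        idx = rng.choice(n, size=k, replace=False)        # A_t ⊆ S, |A_t| = k
        A_plus = idx[y[idx] * (X[idx] @ w) < 1]           # A_t+ : margin violators
        eta = 1.0 / (lam * t)
        w = (1 - eta * lam) * w
        if len(A_plus) > 0:
            w += (eta / k) * (y[A_plus, None] * X[A_plus]).sum(axis=0)
        norm = np.linalg.norm(w)                          # optional projection step
        if norm > 1.0 / np.sqrt(lam):
            w *= 1.0 / (np.sqrt(lam) * norm)
    return w
```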
In the game analogy often used to explain SVM –
(i) We can draw lines/planes at any angles (rather than just horizontal or vertical as in the classic game).
(ii) The objective of the game is to segregate balls of different colours into different rooms.

For linearly separable data, the optimal separating hyperplane is obtained as follows. Let m-dimensional inputs x_i (i = 1, ..., M) belong to class 1 or class 2, and let the associated labels be y_i = 1 for class 1 and y_i = -1 for class 2. When these data are linearly separable, a decision function can be determined as

    D(x) = wᵀx + b   ...(i)

where w is an m-dimensional vector, b is a scalar, and

    wᵀx_i + b ≥ 1 for y_i = +1,  wᵀx_i + b ≤ -1 for y_i = -1   ...(ii)

Equation (ii) is equivalent to

    y_i (wᵀx_i + b) ≥ 1, for i = 1, ..., M   ...(iii)

The hyperplane

    D(x) = wᵀx + b = c, for -1 < c < 1   ...(iv)

forms a separating hyperplane that separates the two classes. When c = 0, the separating hyperplane lies in the middle of the two hyperplanes with c = 1 and c = -1. The distance between the separating hyperplane and the training datum nearest to the hyperplane is called the margin. Considering that the hyperplanes D(x) = 1 and D(x) = -1 each include at least one training datum, the hyperplane D(x) = 0 has the maximum margin for -1 < c < 1.
For data that are not linearly separable, a nonnegative slack variable ξ_i (≥ 0) is introduced into equation (iii):

    y_i (wᵀx_i + b) ≥ 1 - ξ_i, for i = 1, ..., M   ...(v)

For the training data x_i, when 0 < ξ_i < 1 the data do not have the maximum margin but are still correctly classified. But if ξ_i ≥ 1, the data are misclassified by the optimal hyperplane. To obtain the optimal hyperplane in which the number of training data that do not have the maximum margin is minimum, we require to minimize

    Q(w) = Σ_{i=1}^{M} θ(ξ_i),  where θ(ξ) = 1 for ξ > 0 and θ(ξ) = 0 for ξ = 0.

But this is a combinatorial optimization problem and is hard to solve. In place of it, we consider minimizing

    Q(w, ξ) = (1/2)||w||² + C Σ_{i=1}^{M} ξ_i   ...(vi)

subject to the constraints

    y_i (wᵀx_i + b) ≥ 1 - ξ_i,  ξ_i ≥ 0, for i = 1, ..., M   ...(vii)

where C represents the upper bound which determines the trade-off between the maximization of the margin and the minimization of the classification error, and is set to a large value. We call the obtained hyperplane the soft margin hyperplane.

Similar to the linearly separable condition, introducing the Lagrange multipliers α_i and β_i, we get

    Q(w, b, ξ, α, β) = (1/2)||w||² + C Σ_i ξ_i - Σ_i α_i (y_i (wᵀx_i + b) - 1 + ξ_i) - Σ_i β_i ξ_i   ...(viii)

The conditions of optimality are given by

    ∂Q(w*, b*, ξ*, α*, β*)/∂w = 0,  ∂Q(w*, b*, ξ*, α*, β*)/∂b = 0,  ∂Q(w*, b*, ξ*, α*, β*)/∂ξ = 0

Using these, the conditions reduce, respectively, to

    Σ_{i=1}^{M} α_i y_i = 0
    w = Σ_{i=1}^{M} α_i y_i x_i,  α_i ≥ 0, for i = 1, ..., M
    α_i + β_i = C,  α_i, β_i ≥ 0, for i = 1, ..., M

Thus we obtain the following dual problem – namely, find the α_i (i = 1, ..., M) that maximize

    Q(α) = Σ_{i=1}^{M} α_i - (1/2) Σ_{i,j=1}^{M} α_i α_j y_i y_j x_iᵀx_j

subject to the constraints

    Σ_{i=1}^{M} y_i α_i = 0,  0 ≤ α_i ≤ C, for i = 1, ..., M

which is similar to the linearly separable condition. According to the Kuhn-Tucker conditions, the optimal solution satisfies

    α_i (y_i (wᵀx_i + b) - 1 + ξ_i) = 0
    β_i ξ_i = (C - α_i) ξ_i = 0

Therefore, there are three cases for α_i –
(i) α_i = 0. Then ξ_i = 0 and x_i is correctly classified.
(ii) 0 < α_i < C. Then y_i (wᵀx_i + b) - 1 + ξ_i = 0 and ξ_i = 0. Hence x_i is a support vector (an unbounded support vector).
(iii) α_i = C. Then y_i (wᵀx_i + b) - 1 + ξ_i = 0 and ξ_i ≥ 0. Hence x_i is a support vector (a bounded support vector); if 0 ≤ ξ_i < 1, x_i is correctly classified, and if ξ_i ≥ 1, x_i is misclassified.

For the separable and non-separable conditions the decision function is the same and is given by

    D(x) = Σ_{i∈S} α_i y_i x_iᵀx + b*

where S is the set of support vector indices. Since the α_i are nonzero only for the support vectors, the summation is taken only over the support vectors.

The unknown datum x is classified as follows – x belongs to class 1 if D(x) > 0, and to class 2 otherwise. Hence, when the training data are separable, the region {x | 1 > D(x) > -1} is a generalization region.
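Evaluating the decision function from the support vectors can be sketched as follows (a small illustrative helper; the array shapes are assumptions for the example):

```python
import numpy as np

def decision_function(x, alpha, sv_x, sv_y, b):
    """D(x) = sum_i alpha_i y_i x_i^T x + b over the support vectors.

    alpha, sv_y: (n_sv,) dual coefficients and labels of the support vectors,
    sv_x: (n_sv, d) support vectors, x: (d,) query point, b: scalar bias.
    """
    return np.sum(alpha * sv_y * (sv_x @ x)) + b

def classify(x, alpha, sv_x, sv_y, b):
    """Class 1 if D(x) > 0, otherwise class 2, following the rule above."""
    return 1 if decision_function(x, alpha, sv_x, sv_y, b) > 0 else 2
```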
Q.5. Differentiate between linear and non-linear SVM classifiers.

Ans. The author has compared linear (LDA) and non-linear (SVM, NN) classifiers for classifying the EEG signal for five different mental tasks. The brain signal for each task (relaxing, writing a letter, solving puzzles, counting, and rotating objects) has been recorded for some seconds, and each task is repeated five times. The comparison between linear and non-linear classifiers based on the five different mental tasks is described below.

We can conclude that a non-linear classifier gives a better classification result than a linear classifier. A non-linear classifier, especially SVM, gives a better result than the linear classifier (LDA) for the high-dimensional nature of the EEG signal. A multilayer back-propagation neural network involves the problem of selecting the proper number of hidden units, so it cannot give a proper generalization model. With the help of a genetic algorithm, SVM has overcome that problem – by selecting an efficient subset of features from the large feature space, the genetic algorithm provides a better generalization approach. As SVM can select a small and efficient subset of features, the number of required electrodes will be less, so the noise will be minimized and the accuracy will be improved for all mental tasks. For different window sizes, the error rates of the three classifiers are calculated for each mental task; the window size in each trial defines the time-wise classification of the different mental tasks. The error rates of SVM are less than those of the other classifiers due to the efficient use of GA with SVM.

Q.6. Describe margin and hard support vector machines (SVM).

Ans. Let S = (x_1, y_1), ..., (x_m, y_m) be a training set of examples, where each x_i ∈ Rⁿ and y_i ∈ {±1}. We say that this training set is linearly separable if there exists a half space (w, b) such that y_i = sign(⟨w, x_i⟩ + b) for all i. Alternatively, this condition can be rewritten as

    y_i [⟨w, x_i⟩ + b] > 0, for all i = 1, ..., m

All half spaces (w, b) that satisfy this condition are ERM hypotheses (their 0-1 error is zero, which is the minimum possible error). For a given training sample, there are many ERM half spaces. Which one of them should the learner pick?

Consider, for example, the training set described in the picture that follows. While both the dashed and the solid hyperplanes separate the four examples, our intuition would probably lead us to prefer the solid hyperplane over the dashed one. One way to formalize this intuition is using the concept of margin. The margin of a hyperplane with respect to a training set is defined to be the minimal distance between a point in the training set and the hyperplane. If a hyperplane has a large margin, then it will still separate the training set even if we slightly perturb each instance.

We will see later on that the true error of a half space can be bounded in terms of the margin it has over the training sample (the larger the margin, the smaller the error), regardless of the Euclidean dimension in which this half space resides.

Hard-SVM is the learning rule in which we return an ERM hyperplane that separates the training set with the largest possible margin. To define Hard-SVM formally, we first express the distance between a point x and a hyperplane using the parameters defining the half space.

Q.7. What is Bayesian learning ? Also describe the features of Bayesian learning methods.

Ans. Bayesian learning is based on Bayes' theorem and includes methods that utilize probabilities. Existing knowledge can be incorporated in the form of initial probabilities.

The features of Bayesian learning methods are as follows –
(i) Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct. This provides a more flexible approach to learning than algorithms that completely eliminate a hypothesis if it is found to be inconsistent with any single example.
(ii) Prior knowledge can be combined with observed data to determine the final probability of a hypothesis. In Bayesian learning, prior knowledge is provided by asserting a prior probability for each candidate hypothesis and a probability distribution over observed data for each possible hypothesis.
(iii) Bayesian methods can accommodate hypotheses that make probabilistic predictions.
(iv) New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.
(v) Even in cases where Bayesian methods prove computationally intractable, they can provide a standard of optimal decision making against which other practical methods can be measured.

Q.8. What are Bayesian classifiers ?

Ans. Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class. Bayesian classification is based on Bayes' theorem. Studies comparing classification algorithms have found a simple Bayesian classifier, known as the naive Bayesian classifier, to be comparable in performance with decision tree and selected neural network classifiers. Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.

Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence. It is made to simplify the computations involved and, in this sense, is considered "naive". Bayesian belief networks are graphical models which, unlike naive Bayesian classifiers, allow the representation of dependencies among subsets of attributes. Bayesian belief networks can also be used for classification.

Q.9. Explain naive Bayesian classification in detail.
Or
Explain naive Bayes classifier with example. (R.G.P.V., May 2019 (VIII-Sem.))

Ans. The naive Bayesian classifier, or simple Bayesian classifier, works as follows –

(i) Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector X = (x_1, x_2, ..., x_n), depicting n measurements made on the tuple from n attributes, respectively, A_1, A_2, ..., A_n.

(ii) Suppose that there are m classes, C_1, C_2, ..., C_m. Given a tuple X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naive Bayesian classifier predicts that tuple X belongs to the class C_i if and only if

    P(C_i|X) > P(C_j|X) for 1 ≤ j ≤ m, j ≠ i

Thus we maximize P(C_i|X).
The class C_i for which P(C_i|X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem,

    P(C_i|X) = P(X|C_i) P(C_i) / P(X)

(iii) As P(X) is constant for all classes, only P(X|C_i) P(C_i) need be maximized. If the class prior probabilities are not known, then it is commonly assumed that the classes are equally likely, that is, P(C_1) = P(C_2) = ... = P(C_m), and we would therefore maximize P(X|C_i). Otherwise, we maximize P(X|C_i) P(C_i).

(iv) Given data sets with many attributes, it would be extremely computationally expensive to compute P(X|C_i). In order to reduce computation in evaluating P(X|C_i), the naive assumption of class conditional independence is made. This presumes that the values of the attributes are conditionally independent of one another, given the class label of the tuple. Thus,

    P(X|C_i) = Π_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) × P(x_2|C_i) × ... × P(x_n|C_i)

(v) In order to predict the class label of X, P(X|C_i) P(C_i) is evaluated for each class C_i. The classifier predicts that the class label of tuple X is the class C_i if and only if

    P(X|C_i) P(C_i) > P(X|C_j) P(C_j) for 1 ≤ j ≤ m, j ≠ i

In other words, the predicted class label is the class C_i for which P(X|C_i) P(C_i) is the maximum.
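A minimal sketch of training and prediction for a naive Bayesian classifier with categorical attributes is given below (illustrative only; probabilities are estimated by simple relative frequencies and no Laplace smoothing is applied).

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate the priors P(C_i) and the conditionals P(x_k | C_i) from counts."""
    n = len(labels)
    prior = {c: cnt / n for c, cnt in Counter(labels).items()}
    cond = defaultdict(lambda: defaultdict(Counter))      # cond[c][k][value] = count
    for row, c in zip(rows, labels):
        for k, value in enumerate(row):
            cond[c][k][value] += 1
    return prior, cond

def predict(x, prior, cond):
    """Pick the class maximizing P(X|C_i) P(C_i) = P(C_i) * prod_k P(x_k | C_i)."""
    best_class, best_score = None, -1.0
    for c, p_c in prior.items():
        n_c = sum(cond[c][0].values())                    # number of tuples in class c
        score = p_c
        for k, value in enumerate(x):
            score *= cond[c][k][value] / n_c              # P(x_k | C_i)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```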
Q.10. What are Bayesian belief networks ?

Ans. Bayesian belief networks specify joint conditional probability distributions. They allow class conditional independencies to be defined between subsets of variables. They provide a graphical model of causal relationships, on which learning can be performed. Trained Bayesian belief networks can be used for classification. Bayesian belief networks are also known as belief networks, Bayesian networks and probabilistic networks.

A belief network is defined by two components – a directed acyclic graph and a set of conditional probability tables. Each node in the directed acyclic graph represents a random variable. The variables may be discrete- or continuous-valued. They may correspond to actual attributes given in the data or to "hidden variables" believed to form a relationship. If an arc is drawn from a node Y to a node Z, then Y is a parent or immediate predecessor of Z, and Z is a descendant of Y. Each variable is conditionally independent of its nondescendants in the graph, given its parents.

A belief network has one conditional probability table (CPT) for each variable. The CPT for a variable Y specifies the conditional distribution P(Y | Parents(Y)), where Parents(Y) are the parents of Y.

Let X = (x_1, ..., x_n) be a data tuple described by the variables or attributes Y_1, ..., Y_n, respectively. Each variable is conditionally independent of its nondescendants in the network graph, given its parents. This allows the network to provide a complete representation of the existing joint probability distribution with the following equation –

    P(x_1, ..., x_n) = Π_{i=1}^{n} P(x_i | Parents(Y_i))

where P(x_1, ..., x_n) is the probability of a particular combination of values of X, and the values for P(x_i | Parents(Y_i)) correspond to the entries in the CPT for Y_i.

A node within the network can be selected as an "output" node, representing a class label attribute. There may be more than one output node. Various algorithms for learning can be applied to the network. In the learning or training of a belief network, a number of scenarios are possible. The network topology may be given in advance or inferred from the data. The network variables may be observable or hidden in all or some of the training tuples. The case of hidden data is also referred to as missing values or incomplete data.

Several algorithms exist for learning the network topology from the training data given observable variables. If the network topology is known and the variables are observable, then training the network is straightforward; it consists of computing the CPT entries, as is similarly done when computing the probabilities involved in naive Bayesian classification.

When the network topology is given and some of the variables are hidden, there are various methods to choose from for training the belief network. Let D be a training set of data tuples, X_1, X_2, ..., X_|D|. Training the belief network means that we must learn the values of the CPT entries. Let w_ijk be a CPT entry for the variable Y_i = y_ij having the parents U_i = u_ik. The w_ijk are viewed as weights, analogous to the weights in the hidden units of neural networks. The set of weights is collectively referred to as W. The weights are initialized to random probability values, and a gradient descent strategy performs greedy hill-climbing. Such a strategy is iterative – it searches for a solution along the negative of the gradient of a criterion function; we want to find the set of weights W that maximize this function. At each iteration or step along the way, the algorithm moves toward what appears to be the best solution at the moment, without backtracking. The weights are updated at each iteration and eventually converge to a local optimum solution.

Q.11. What is Bayes' theorem ? Describe basic probability notation. How are these probabilities estimated ?

Ans. Let X be a data tuple. In Bayesian terms, X is considered "evidence". As usual, it is described by measurements made on a set of n attributes. Let H be some hypothesis, such as that the data tuple X belongs to a specified class C. For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the "evidence" or observed data tuple X. In other words, we are looking for the probability that tuple X belongs to class C, given that we know the attribute description of X.

P(H|X) is the posterior probability, or a posteriori probability, of H conditioned on X. For example, suppose our world of data tuples is confined to customers described by the attributes age and income, and that X is a 35-year-old customer with an income of $40,000. Suppose that H is the hypothesis that our customer will buy a computer. Then P(H|X) reflects the probability that customer X will buy a computer given that we know the customer's age and income.

In contrast, P(H) is the prior probability, or a priori probability, of H. For our example, this is the probability that any given customer will buy a computer, regardless of age, income, or any other information, for that matter. The posterior probability, P(H|X), is based on more information than the prior probability, P(H), which is independent of X.

Similarly, P(X|H) is the posterior probability of X conditioned on H. That is, it is the probability that a customer, X, is 35 years old and earns $40,000, given that we know the customer will buy a computer.

P(X) is the prior probability of X. Using our example, it is the probability that a person from our set of customers is 35 years old and earns $40,000.

P(H), P(X|H) and P(X) may be estimated from the given data. Bayes' theorem is useful in that it provides a way of calculating the posterior probability, P(H|X), from P(H), P(X|H) and P(X). Bayes' theorem states that

    P(H|X) = P(X|H) P(H) / P(X)
Q.12. Explain the applications of machine learning.

Ans. (i) Computer Vision – Many current vision systems, from face recognition systems to systems that automatically classify microscope images of cells, are developed using machine learning, again because the resulting systems are more accurate than hand-crafted programs. One massive-scale application of computer vision trained using machine learning is its use by the US Post Office to automatically sort letters containing handwritten addresses. Over 85% of handwritten mail in the US is sorted automatically, using handwriting-analysis software trained to very high accuracy using machine learning over a very large data set.

(ii) Speech Recognition – Currently available commercial systems for speech recognition all use machine learning in one fashion or another to train the system to recognize speech. The reason is simple – the speech recognition accuracy is greater if one trains the system than if one attempts to program it by hand. In fact, many commercial speech recognition systems involve two distinct learning phases – one before the software is shipped (training the general system in a speaker-independent fashion), and a second phase after the user purchases the software (to achieve greater accuracy by training in a speaker-dependent fashion).

(iii) Bio-surveillance – A variety of government efforts to detect and track disease outbreaks now use machine learning. For example, the RODS project involves real-time collection of admissions reports to emergency rooms across western Pennsylvania, and the use of machine learning software to learn the profile of typical admissions so that it can detect anomalous patterns of symptoms and their geographical distribution. Current work involves adding a rich set of additional data, such as retail purchases of over-the-counter medicines, to increase the information flow into the system, further increasing the need for automated learning methods given this even more complex data set.

(iv) Robot Control – Machine learning methods have been successfully used in a number of robot systems. For example, several researchers have demonstrated the use of machine learning to acquire control strategies for stable helicopter flight and helicopter aerobatics. The recent Darpa-sponsored competition involving a robot driving autonomously for over 100 miles in the desert was won by a robot that used machine learning to refine its ability to detect distant objects (training itself from self-collected data consisting of terrain seen initially in the distance, and seen later up close).

(v) Natural Language Processing – It is a field that enables both the understanding and the manipulation of human language, making information gathering from text possible. It is mostly seen in analysing a large pool of text data, trying to discover new patterns and relationships in it. It is a better way to analyze, understand and find the meaning of human language easily and smartly. By using NLP we can perform tasks such as speech recognition, entity recognition, automatic summarization and automatic translation. Discrete event simulation is a related technique in which independent events are modelled, each associated with a weight, for analysing problematic scenarios. Predictive models built on such techniques are used, for example, by hospitals to predict demand so as to spread expertise which is in short supply, and to support decisions involving radiology data (for example, CT, MRI and radiographs).

Q.13. Describe case study of ImageNet competition.

Ans. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is also simply known as ImageNet. ImageNet is a database of images used for visual recognition competitions. The ImageNet competition is an annual contest where researchers and their teams evaluate developed algorithms on the given datasets, to review improvements in achieved accuracy in visual recognition challenges. ImageNet is a dataset of over 15 million labeled images belonging to roughly 22,000 categories. The images were collected from the web and labeled by human labelers using Amazon's Mechanical Turk crowd-sourcing tool.

Starting in 2010, as part of the Pascal Visual Object Challenge, an annual competition called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been held. ILSVRC uses a subset of ImageNet with roughly 1,000 images in each of 1,000 categories. In all, there are roughly 1.2 million training images, 50,000 validation images and 150,000 testing images.

ILSVRC follows in the footsteps of the PASCAL VOC challenge, established in 2005, which set the precedent for standardized evaluation of recognition algorithms in the form of yearly competitions. As in PASCAL VOC, ILSVRC consists of two components –
(i) a publicly available dataset, and
(ii) an annual competition and corresponding workshop.
The dataset allows for the development and comparison of categorical object recognition algorithms, and the competition and workshop provide a way to track the progress and discuss the lessons learned from the most successful and innovative entries each year.

ILSVRC-2010 is the only version of ILSVRC for which the test set labels are available, so this is the version on which most experiments were performed. On ImageNet, it is customary to report two error rates – top-1 and top-5, where the top-5 error rate is the fraction of test images for which the correct label is not among the five labels considered most probable by the model.

ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality.
Therefore, we down-sampled the images to a fixed resolution of 256 × 256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256 × 256 patch from the resulting image. We did not pre-process the images in any other way, except for subtracting the mean activity over the training set from each pixel, so we trained our network on the (centred) raw RGB values of the pixels.

Some examples of winning architectures in the ImageNet competition are AlexNet, ZFNet, NiN, GoogLeNet, ResNet and SENet – the latest winner, of the ILSVRC 2017 competition – alongside related architectures such as SegNet, DenseNet and FractalNet.
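The rescale-and-centre-crop preprocessing described above can be sketched with Pillow as follows (a generic illustration, not the original pipeline; mean subtraction would be applied afterwards over the training set):

```python
from PIL import Image

def preprocess(path, size=256):
    """Rescale the shorter side to `size`, then take the central size x size crop."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    scale = size / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))
```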