
MODULE – 3

Naive Bayes, Text Classification and Sentiment


Naive Bayes, Text Classification and Sentiment: Naive Bayes Classifiers, Training the Naive
Bayes Classifier, Worked Example, Optimizing for Sentiment Analysis, Naive Bayes for Other
Text Classification Tasks, Naive Bayes as a Language Model.

Textbook 2: Ch. 4.

Introduction

• Classification lies at the heart of both human and machine intelligence: assigning a category to
an input.
• Examples include deciding what letter, word, or image has been presented to our senses,
recognizing faces or voices, sorting mail, and assigning grades to homework.
• Naïve Bayes algorithm for text categorization: the task of assigning a label or category
to an entire text or document.
• Common text categorization tasks:
1. Sentiment analysis, the extraction of sentiment, the positive or negative orientation that
a writer expresses toward some object.
o A review of a movie, book, or product on the web.

Example: + ... any characters and richly applied satire, and some great plot twists

- It was pathetic. The worst part about it was the boxing scenes ...
+ ... awesome caramel sauce and sweet toasty almonds. I love this place!
- ... awful pizza and ridiculously overpriced ...

Words like great, richly, and awesome, versus pathetic, awful, and ridiculously, are very informative
cues for deciding the sentiment of a review.

2. Spam detection:
o Binary classification task of assigning an email to one of the two classes spam or
not-spam.
o Many lexical and other features can be used to perform classification.
Example: An email containing phrases like “online pharmaceutical” or “WITHOUT ANY COST” or
“Dear Winner” is suspicious.

3. Assigning a library subject category or topic label to a text: Various sets of subject
categories exist. Deciding whether a research paper concerns epidemiology, embryology,
etc., is an important component of information retrieval.

Supervised Learning:

• The most common way of doing text classification in language processing is supervised
learning.
• In supervised learning, we have a data set of input observations, each associated with
some correct output (a ‘supervision signal’).
• The goal of the algorithm is to learn how to map from a new observation to a correct
output.
• We have a training set of N documents that have each been hand labeled with a class:
{(d1, c1), ..., (dN, cN)}. Our goal is to learn a classifier that is capable of mapping from a new
document d to its correct class c ∈ C, where C is some set of useful document classes.

3.1 Naive Bayes Classifiers

The intuition of the classifier is shown in Fig. 1. We represent a text document as if it were a bag
of words, that is, an unordered set of words with their position ignored, keeping only their
frequency in the document.

Instead of representing the word order in all the phrases like “I love this movie” and “I would
recommend it”, we simply note that the word I occurred 5 times in the entire excerpt, the word
it 6 times, the words love, recommend, and movie once, and so on.

• Naive Bayes is a probabilistic classifier.

• For a document d, out of all classes c ∈ C, the classifier returns the class ĉ which has the
maximum posterior probability given the document.

ĉ = argmax_{c ∈ C} P(c|d)        (1)

Use Bayes’ rule to break down any conditional probability P(x|y) into three other probabilities:

P(x|y) = P(y|x) P(x) / P(y)        (2)

We can then substitute Eq.2 into Eq.1 to get Eq.3

ĉ = argmax_{c ∈ C} P(d|c) P(c) / P(d)        (3)

Since P(d) doesn't change for each class, we can conveniently simplify Eq. 3 by dropping the
denominator.

ĉ = argmax_{c ∈ C} P(d|c) P(c)        (4)

We call naive Bayes a generative model: Eq. 4 can be read as saying that a class is first sampled
from P(c), and then the words are generated by sampling from P(d|c), thereby generating a document.

Eq. 4 states that we compute the most probable class ĉ given some document d by choosing the
class which has the highest product of two probabilities: the prior probability of the class P(c)
and the likelihood of the document P(d|c):

ĉ = argmax_{c ∈ C} P(d|c) P(c),   where P(d|c) is the likelihood and P(c) the prior        (5)

We can represent a document d as a set of features f1, f2, ..., fn:

ĉ = argmax_{c ∈ C} P(f1, f2, ..., fn | c) P(c)        (6)

Eq. 6 is still too hard to compute directly: without some simplifying assumptions, estimating the
probability of every possible combination of features (for example, every possible set of words
and positions) would require huge numbers of parameters and impossibly large training sets.

Naive Bayes classifiers therefore make two simplifying assumptions.

The first is the bag-of-words assumption, that the features f1, f2, ... ,fn only encode word identity
and not position.

The second is commonly called the naive Bayes assumption, the conditional independence
assumption that the probabilities P(fi|c) are independent given the class c.

Therefore, P(f1, f2, ..., fn | c) = P(f1|c) · P(f2|c) · ... · P(fn|c)        (7)

The final equation for the class chosen by a naive Bayes classifier is:

c_NB = argmax_{c ∈ C} P(c) Π_{f ∈ F} P(f|c)        (8)

To apply the naive Bayes classifier to text, we will use each word in the documents as a feature,
as suggested above, and we consider each of the words in the document by walking an index
through every word position in the document:

c_NB = argmax_{c ∈ C} P(c) Π_{i ∈ positions} P(wi|c)        (9)

Naive Bayes calculations, like calculations for language modelling, are done in log space, to
avoid underflow and increase speed. Thus Eq. 9 is generally instead expressed as,

c_NB = argmax_{c ∈ C} [ log P(c) + Σ_{i ∈ positions} log P(wi|c) ]        (10)

Eq. 10 computes the predicted class as a linear function of the input features. Classifiers that use
a linear combination of the inputs to make a classification decision, like naive Bayes and logistic
regression, are called linear classifiers.
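
To make Eq. 10 concrete, here is a minimal Python sketch of scoring a document in log space. The
table names (logprior, loglikelihood, vocab) are assumptions for illustration; they match the
training sketch given in Section 3.2 below.

import math

def predict(doc_words, classes, logprior, loglikelihood, vocab):
    """Return the class maximizing log P(c) + sum_i log P(wi|c)  (Eq. 10).

    logprior[c] holds log P(c); loglikelihood[(w, c)] holds log P(w|c).
    Words not in the vocabulary are skipped (see Section 3.2)."""
    best_class, best_score = None, -math.inf
    for c in classes:
        score = logprior[c]
        for w in doc_words:
            if w in vocab:                 # unknown test words are ignored
                score += loglikelihood[(w, c)]
        if score > best_score:
            best_class, best_score = c, score
    return best_class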

3.2 Training the Naive Bayes Classifier

How can we learn the probabilities P(c) and P(fi|c)?

To learn the class prior P(c): compute what percentage of the documents in our training set are in
each class c.

Let Nc be the number of documents in our training data with class c, and Ndoc be the total number
of documents. Then:

P̂(c) = Nc / Ndoc        (11)

To learn the probability P(fi|c):

We'll assume a feature is just the existence of a word in the document's bag of words, and so we'll
want P(wi|c), which we compute as the fraction of times the word wi appears among all words in all
documents of topic c.

Concatenate all documents with category c into one big "category c" text. Then we use the
frequency of wi in this concatenated document to give a maximum likelihood estimate of the
probability:

P̂(wi|c) = count(wi, c) / Σ_{w ∈ V} count(w, c)        (12)

Here the vocabulary V consists of the union of all the word types in all classes, not just the
words in one class c.

Issues with training:

1. Zero Probability problem with maximum likelihood training:

Imagine we are trying to estimate the likelihood of the word "fantastic" given class positive, but
suppose there are no training documents that both contain the word "fantastic" and are classified
as positive. Perhaps the word "fantastic" happens to occur (sarcastically?) in the class negative.
In such a case the probability for this feature will be zero:

P̂("fantastic"|positive) = count("fantastic", positive) / Σ_{w ∈ V} count(w, positive) = 0        (13)

Since naive Bayes naively multiplies all the feature likelihoods together, zero probabilities in
the likelihood term for any class will cause the probability of the class to be zero, no matter the
other evidence!

To solve this, we use Laplace smoothing (add-one smoothing). Instead of the maximum likelihood
estimate of Eq. 12, we use:

P̂(wi|c) = (count(wi, c) + 1) / Σ_{w ∈ V} (count(w, c) + 1) = (count(wi, c) + 1) / (Σ_{w ∈ V} count(w, c) + |V|)        (14)

Now "fantastic" will still get a very small probability in the positive class, but not zero.

2. Unknown words, which occur in our test data but are not in our vocabulary:
• The simplest solution is to ignore them: remove them from the test document and do not
include any probability for them at all.
• Some systems also choose to completely ignore another class of words, stop words: very
frequent words like "the" and "a". Stop words can be defined as the top 10-100 vocabulary
entries sorted by frequency, or taken from one of the many predefined stop word lists
available online; each instance of these stop words is then removed from both training and
test documents.
• However, using a stop word list doesn't usually improve performance, so it is more common
to make use of the entire vocabulary.

Fig. The naive Bayes algorithm, using add-1 smoothing. To use add-α smoothing instead, change
the +1 to +α when computing the smoothed log likelihoods during training.
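
Since the algorithm figure itself did not survive extraction, here is a minimal Python sketch of the
training step it describes: priors from Eq. 11 and add-1 smoothed log likelihoods from Eq. 14. The
function name and data layout are assumptions for illustration.

import math
from collections import Counter

def train_naive_bayes(documents, labels):
    """documents: list of token lists; labels: parallel list of class labels.
    Returns log priors (Eq. 11), add-1 smoothed log likelihoods (Eq. 14), and the vocabulary."""
    classes = set(labels)
    vocab = {w for doc in documents for w in doc}
    n_doc = len(documents)
    logprior, loglikelihood = {}, {}
    for c in classes:
        docs_c = [doc for doc, y in zip(documents, labels) if y == c]
        logprior[c] = math.log(len(docs_c) / n_doc)               # Eq. 11
        counts = Counter(w for doc in docs_c for w in doc)        # the big "class c" document
        total = sum(counts.values())
        for w in vocab:
            loglikelihood[(w, c)] = math.log((counts[w] + 1) / (total + len(vocab)))  # Eq. 14
    return logprior, loglikelihood, vocab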

3.3 Worked example:

Let’s use a sentiment analysis domain with the two classes positive (+) and negative (-), and take
the following miniature training and test documents simplified from actual movie reviews.

Step 1: The prior P(c) for the two classes is computed per Eq. 11:

P(-) = 3/5        P(+) = 2/5


Step 2: The word “with” doesn't occur in the training set, so we drop it completely.

Step 3: The likelihoods from the training set for the remaining three words "predictable", "no",
and "fun" are as follows:

Step 4: For the test sentence S = "predictable with no fun", after removing the word "with", the
chosen class is therefore computed via Eq. 9 as follows:
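
This computation can be checked with the two sketches from Sections 3.1 and 3.2. The five miniature
training documents below are assumed to be the ones from the textbook example (they reproduce the
priors P(-) = 3/5 and P(+) = 2/5 computed above):

# Uses train_naive_bayes and predict from the sketches in Sections 3.2 and 3.1.
train_docs = [
    ("just plain boring".split(), "-"),
    ("entirely predictable and lacks energy".split(), "-"),
    ("no surprises and very few laughs".split(), "-"),
    ("very powerful".split(), "+"),
    ("the most fun film of the summer".split(), "+"),
]
docs, labels = zip(*train_docs)
logprior, loglikelihood, vocab = train_naive_bayes(list(docs), list(labels))

# "with" is not in the training vocabulary, so predict() skips it (Step 2).
print(predict("predictable with no fun".split(), {"+", "-"},
              logprior, loglikelihood, vocab))   # expected output: -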

3.4 Optimizing for Sentiment Analysis

While standard naive Bayes text classification can work well for sentiment analysis, some small
changes are generally employed that improve performance.

3.4.1 Clip the word counts (duplicate words) in each document at 1:


• Remove all duplicate words before concatenating them into the single big document
during training and we also remove duplicate words from test documents.
• This variant is called binary multinomial naive Bayes or binary naive Bayes.
• Example:

Fig. An example of binarization for the binary naive Bayes algorithm
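
A minimal sketch of the only change binary naive Bayes requires: clip each document's word counts
at 1 before counting, both at training and at test time. The helper name is illustrative, reusing
the training sketch from Section 3.2.

def binarize(documents):
    """Remove duplicate tokens within each document (binary naive Bayes)."""
    return [list(set(doc)) for doc in documents]

# Training and testing then proceed exactly as before, e.g.:
# logprior, loglikelihood, vocab = train_naive_bayes(binarize(docs), labels)
# predict(set(test_doc_words), classes, logprior, loglikelihood, vocab)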
3.4.2 Deal with negation.

Consider the difference between I really like this movie (positive) and I didn’t like this movie
(negative). Similarly, negation can modify a negative word to produce a positive review (don’t
dismiss this film, doesn’t let us get bored).

Solution: Prepend the prefix NOT to every word after a token of logical negation (n’t, not, no,
never) until the next punctuation mark.

Thus the phrase: didn’t like this movie , but I

becomes: didnt NOT_like NOT_this NOT_movie , but I

'Words' like NOT_like and NOT_recommend will thus occur more often in negative documents and
act as cues for negative sentiment, while words like NOT_bored and NOT_dismiss will acquire
positive associations.
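
A minimal Python sketch of this negation rule; the exact negation-token list and punctuation set
are assumptions for illustration:

import re

NEGATION_TOKENS = {"not", "no", "never"}        # plus any token ending in "n't"
PUNCTUATION = re.compile(r"^[.,;:!?]+$")

def mark_negation(tokens):
    """Prepend NOT_ to every token after a logical negation, until the next punctuation mark."""
    out, negating = [], False
    for tok in tokens:
        if PUNCTUATION.match(tok):
            negating = False
            out.append(tok)
        elif tok.lower() in NEGATION_TOKENS or tok.lower().endswith("n't"):
            negating = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negating else tok)
    return out

print(" ".join(mark_negation("didn't like this movie , but I".split())))
# didn't NOT_like NOT_this NOT_movie , but I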

3.4.3 Insufficient labelled training data:

Derive the positive and negative word features from sentiment lexicons: lists of words that are
pre-annotated with positive or negative sentiment.

For example, the MPQA subjectivity lexicon has 6885 words, each marked for whether it is
strongly or weakly biased positive or negative. Some examples:

+ : admirable, beautiful, confident, dazzling, ecstatic, favor, glee, great

- : awful, bad, bias, catastrophe, cheat, deny, envious, foul, harsh, hate

3.5 Naive Bayes for other text classification tasks

3.5.1 Spam Detection and Naïve Bayes

Spam detection—deciding whether an email is unsolicited bulk mail—was one of the earliest
applications of naïve Bayes in text classification (Sahami et al., 1998). Rather than treating all
words as individual features, effective systems often use predefined sets of words or patterns,
along with non-linguistic features.

For instance, the open-source tool SpamAssassin uses a range of handcrafted features:

• Specific phrases like "one hundred percent guaranteed"

• Regex patterns like mentions of millions of dollars


• Structural properties like HTML with a low text-to-image ratio

• Non-linguistic metadata, such as the email’s delivery path

Other examples of SpamAssassin features include:

• Subject lines written entirely in capital letters

• Urgent phrases like "urgent reply"

• Keywords such as "online pharmaceutical"

• HTML anomalies like unbalanced head tags

• Claims such as "you can be removed from the list"
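
For illustration only, a few such features could be approximated with simple regular expressions.
These are a sketch in the spirit of the rules listed above, not SpamAssassin's actual rule syntax.

import re

ILLUSTRATIVE_RULES = {
    "one_hundred_percent_guaranteed": re.compile(r"one hundred percent guaranteed", re.I),
    "mentions_millions_of_dollars":   re.compile(r"\bmillions? of dollars\b", re.I),
    "online_pharmaceutical":          re.compile(r"online pharmaceutical", re.I),
    "claims_can_be_removed":          re.compile(r"you can be removed from the list", re.I),
}

def spam_features(body, subject=""):
    """Binary handcrafted features for a naive Bayes (or other) spam classifier."""
    feats = {name: bool(rule.search(body)) for name, rule in ILLUSTRATIVE_RULES.items()}
    feats["subject_all_caps"] = bool(subject) and subject.isupper()
    return feats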

3.5.2 Language Identification

In contrast, tasks like language identification rely less on words and more on subword units like
character n-grams or even byte n-grams. These can capture statistical patterns at the start or end
of words, especially when spaces are included as characters.

A well-known system, langid.py (Lui & Baldwin, 2012), starts with all possible n-grams of
lengths 1–4 and uses feature selection to narrow down to the 7,000 most informative.
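
A small sketch of extracting character n-gram features of lengths 1-4. Padding the text with spaces
so that word-initial and word-final patterns are captured is an assumption based on the description
above, not langid.py's exact implementation.

from collections import Counter

def char_ngrams(text, n_min=1, n_max=4):
    """Count character n-grams of lengths n_min..n_max, keeping spaces as characters."""
    padded = " " + text.lower() + " "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts

print(char_ngrams("voulez vous").most_common(5))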

Training data for language ID systems often comes from multilingual sources such as Wikipedia
(in 68+ languages), newswire, and social media. To capture regional and dialectal diversity,
additional corpora include:

• Geo-tagged tweets from Anglophone regions like Nigeria or India

• Translations of the Bible and Quran

• Slang from Urban Dictionary

• Corpora of African American Vernacular English (Blodgett et al., 2016)

These diverse sources help models capture the full range of language use across different
communities and contexts (Jurgens et al., 2017).
3.6 Naive Bayes as a Language Model
• Naive Bayes classifiers can use any sort of feature: dictionaries, URLs, email addresses,
network features, phrases, and so on.
• A naive Bayes model can be viewed as a set of class-specific unigram language models,
in which the model for each class instantiates a unigram language model.
• Since the model assigns a probability P(word|c) to each word, it also assigns a probability to
each sentence:

P(s|c) = Π_{i ∈ positions} P(wi|c)        (15)
Example: Consider a naive Bayes model with the classes positive (+) and negative (-) and the
following model parameters:

w       P(w|+)   P(w|-)
I       0.1      0.2
love    0.1      0.001
this    0.01     0.01
fun     0.05     0.005
film    0.1      0.1

Each of the two columns above instantiates a language model that can assign a probability to
the sentence "I love this fun film":

P("I love this fun film" | +) = 0.1 × 0.1 × 0.01 × 0.05 × 0.1 = 5 × 10^-7
P("I love this fun film" | -) = 0.2 × 0.001 × 0.01 × 0.005 × 0.1 = 1.0 × 10^-9

The positive model assigns a higher probability to the sentence: P(s|pos) > P(s|neg).

Note: This is just the likelihood part of the naive Bayes model; once we multiply in the prior, a
full naive Bayes model might well make a different classification decision.

3.7 Evaluation: Precision, Recall, F-measure

Text classification evaluation often starts with binary detection tasks.


Example 1: Spam Detection

• Goal: Label each text as spam (positive) or not spam (negative).


• Need to compare:
o System’s prediction
o Gold label (the human-defined correct label)

Example 2: Social Media Monitoring for a Brand


• Scenario: CEO of Delicious Pie Company wants to track mentions on social media.
• Build a system to detect tweets about Delicious Pie.
• Positive class: Tweets about the company.
• Negative class: All other tweets.
Why we need metrics:

• To evaluate how well a system (e.g., spam detector or pie-tweet detector) performs.

• Confusion Matrix:
o A table that compares system predictions vs. gold (human) labels.
o Each cell represents a type of outcome:
▪ True Positive (TP): Correctly predicted positives (e.g., actual spam labeled as spam).
▪ False Negative (FN): Actual positives incorrectly labeled as negative (e.g., spam labeled as non-spam).
▪ False Positive (FP): Actual negatives incorrectly labeled as positive (e.g., non-spam labeled as spam).
▪ True Negative (TN): Correctly predicted negatives (e.g., non-spam labeled as non-spam).

• Accuracy:
o Formula: (Correct predictions) / (Total predictions).
o Appears useful but misleading for unbalanced classes.

• Why accuracy can fail:


o Real-world data is often skewed (e.g., most tweets are not about pie).
o Example:

▪ 1,000,000 tweets → only 100 about pie.

▪ A naive classifier labels all tweets as "not about pie".

▪ Result: 99.99% accuracy, but 0 useful results.


o Conclusion: Accuracy is not a reliable metric when the positive class is rare.
That’s why, instead of relying on accuracy, we often use two more informative metrics:
precision and recall (as shown in Fig).

• Precision measures the percentage of items labeled as positive by the system that are
actually positive (according to human-annotated “gold” labels).

Precision = true positives / (true positives + false positives)

• Recall measures the percentage of actual positive items that were correctly identified by
the system.

Recall = true positives / (true positives + false negatives)

These metrics address the issue with the “nothing is pie” classifier. Despite its seemingly
excellent 99.99% accuracy, it has a recall of 0: it misses all 100 true positive cases (0 true
positives and 100 false negatives, so recall is 0/100). Its precision is meaningless, since it
never labels anything as positive.

Unlike accuracy, precision and recall focus on true positives, helping us measure how well the
system finds the things it’s actually supposed to detect.

To combine both precision and recall into a single metric, we use the F-measure (van Rijsbergen,
1975):

F_β = (β² + 1) · P · R / (β² · P + R)

The β parameter differentially weights the importance of recall and precision, based perhaps on
the needs of an application. Values of β > 1 favor recall, while values of β < 1 favor precision.
When β = 1, precision and recall are equally balanced; this is the most frequently used version,
and is called Fβ=1 or just F1:

F1 = 2 · P · R / (P + R)        (16)
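
A minimal sketch of computing precision, recall, and F_β from confusion-matrix counts, applied to
the “nothing is pie” classifier above; the function name is illustrative.

def precision_recall_f(tp, fp, fn, beta=1.0):
    """Precision, recall, and F_beta from confusion-matrix counts (Eq. 16 when beta = 1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_beta = (beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f_beta

# The "nothing is pie" classifier: 0 true positives, 0 false positives, 100 false negatives.
print(precision_recall_f(tp=0, fp=0, fn=100))   # (0.0, 0.0, 0.0)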

3.7.1 Evaluating with more than two classes

For sentiment analysis we generally have 3 classes (positive, negative, neutral) and even
more classes are common for tasks like part-of-speech tagging, word sense disambiguation,
semantic role labeling, emotion detection, and so on. Luckily the naive Bayes algorithm is
already a multi-class classification algorithm.

Consider the sample confusion matrix for a hypothetical 3-way one-of email
categorization decision (urgent, normal, spam) shown in Fig. The matrix shows, for example,
that the system mistakenly labeled one spam document as urgent, and we have shown how to
compute a distinct precision and recall value for each class.

Confusion matrix for a three-class categorization task, showing for each pair of classes
(c1,c2), how many documents from c1 were (in)correctly assigned to c2.

In order to derive a single metric that tells us how well the system is doing, we can combine these
values in two ways.

1. In macroaveraging, we compute the performance for each class, and then average over
classes.
2. In microaveraging, we collect the decisions for all classes into a single confusion matrix,
and then compute precision and recall from that table.

Fig. shows the confusion matrix for each class separately, and shows the computation of
microaveraged and macroaveraged precision.

As the figure shows, a microaverage is dominated by the more frequent class (in this case spam),
since the counts are pooled. The macroaverage better reflects the statistics of the smaller classes,
and so is more appropriate when performance on all the classes is equally important.
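
A minimal sketch of the difference between the two averages, using hypothetical per-class (tp, fp)
counts for the three email classes:

def macro_micro_precision(per_class):
    """per_class: dict class -> (tp, fp) from that class's one-vs-rest confusion matrix."""
    precisions = [tp / (tp + fp) for tp, fp in per_class.values()]
    macro = sum(precisions) / len(precisions)         # average the per-class precisions
    tp_sum = sum(tp for tp, _ in per_class.values())  # pool the counts, then compute once
    fp_sum = sum(fp for _, fp in per_class.values())
    micro = tp_sum / (tp_sum + fp_sum)
    return macro, micro

# Hypothetical counts for the 3-way task (urgent, normal, spam); spam dominates the microaverage.
print(macro_micro_precision({"urgent": (8, 11), "normal": (60, 55), "spam": (200, 33)}))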
3.8 Test sets and Cross-validation

Training & Testing for Text Classification:

1. Standard Procedure:
o Train the model on the training set.
o Use the development set (devset) to tune parameters and choose the best model.
o Evaluate the final model on a separate test set.

2. Issue with Fixed Splits:


o Fixed training/dev/test sets may lead to small dev/test sets.

o Smaller test sets might not be representative of overall performance.

3. Solution – Cross-Validation (as shown in Fig; see the sketch after this list):
o Cross-validation allows use of all data for training and testing.
o Process:
▪ Split the data into k folds.
▪ For each fold, train on the other k-1 folds and test on the remaining fold.
▪ Repeat k times and average the test errors.
o Example: 10-fold cross-validation (train on 90%, test on 10%, repeated 10 times).

4. Limitation of Cross-Validation:
o All the data is used for testing, so we can't analyze the data in advance (to avoid "peeking" at the test set).
o Looking at the data is important for feature design in NLP systems.

5. Common Compromise:
o Split off a fixed test set.
o Do 10-fold cross-validation on the training set.
o Use the test set only for final evaluation.
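
A minimal sketch of k-fold cross-validation; train_and_eval is a placeholder for whatever training
and scoring routine is being evaluated.

import random

def kfold_cross_validation(data, k, train_and_eval):
    """Split data into k folds; for each fold, train on the other k-1 folds and test on it,
    then return the average of the k test scores. train_and_eval(train, test) -> score."""
    data = list(data)
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        scores.append(train_and_eval(train, test))
    return sum(scores) / k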
3.9 Statistical Significance Testing
• When building NLP systems, we often need to compare performance between two
systems (e.g., a new model vs. an existing one).
• Simply observing different scores (e.g., accuracy, F1) isn't enough — we need to know if
the difference is statistically significant.
• This is where statistical hypothesis testing comes in.
• Inspired by Dror et al. (2020) and Berg-Kirkpatrick et al. (2012), these tests help
determine if the observed improvement is real or due to chance.
• Example:
o Classifier A (e.g., logistic regression) vs. Classifier B (e.g., naive Bayes).
o Metric M (e.g., F1-score), tested on dataset x.
o Let M(A, x) be the score for A, and δ(x) be the difference in performance between A and B:

δ(x) = M(A, x) − M(B, x)        (19)
Understanding Effect Size and Significance
• We want to know if δ(x) > 0, meaning A (logistic regression) performs better than B
(naive Bayes).
• δ(x) is the effect size — larger δ means a bigger performance gap.
• But a positive δ alone isn’t enough.
o Example: A has 0.04 higher F1 than B — is that meaningful?
• Problem: The difference might be due to chance on this specific test set.
• What we really want to know:

o Would A still outperform B on another test set or under different conditions?
• That’s why we need statistical testing, not just raw differences.
Statistical Hypothesis Testing Paradigm
• We compare models by setting up two formal hypotheses:

H₀: δ(x) ≤ 0        H₁: δ(x) > 0        (20)

o Null hypothesis (H₀): There's no real difference between A and B; any observed difference is due to chance.
o Alternative hypothesis (H₁): There is a real performance difference between A and B.
• Statistical tests help us decide whether to reject H₀ in favor of H₁ based on the data.
Null Hypothesis and p-value

• Null hypothesis (H₀): Assumes δ(x) ≤ 0 — A is not better than B.


• We want to see if we can reject H₀ and support H₁ (that A is better).

• We imagine δ(x) over many possible test sets.

• The p-value measures how likely we are to observe our δ(x), or a larger one, if H₀ were true:

p-value(x) = P(δ(X) ≥ δ(x) | H₀ is true)        (21)
• A low p-value suggests our result is unlikely due to chance, supporting H₁.
Interpreting p-values and Statistical Testing in NLP
• The p-value is the probability of observing a performance difference δ(x) (or larger),
assuming A is not better than B (null hypothesis H₀).
• If δ(x) is large (e.g., A’s F1 = 0.9 vs. B’s = 0.2), it's unlikely under H₀ → low p-value →
we reject H₀.
• If δ(x) is small, it's more plausible under H₀ → higher p-value → we may fail to reject
H₀.
What Counts as “Small”?
o Common p-value thresholds: 0.05 or 0.01.
• If p < threshold, the result is considered statistically significant (we reject H₀ and
conclude A is likely better than B).
How Do We Compute the p-value in NLP?
• NLP avoids parametric tests (like t-tests or ANOVAs) because they assume certain
distributions that often don't apply.
• Instead, we use non-parametric tests that rely on sampling methods.
Key Idea:

• Simulate many variations of the experiment (e.g., using different test sets x′).

• Compute δ(x′) for each → this gives a distribution of δ values.

• If the observed δ(x) is in the top 1% (i.e., p-value < 0.01), it's unlikely under H₀ → reject
H₀.
Common Non-Parametric Tests in NLP:

1. Approximate Randomization (Noreen, 1989)

2. Bootstrap Test (paired version is most common)

o Compares aligned outputs from two systems (e.g., A vs. B on the same inputs xi).
o Measures how consistently one system outperforms the other across samples.
3.9.1 The Paired Bootstrap Test
The bootstrap test is a flexible, non-parametric method that can be applied to any evaluation
metric—like precision, recall, F1, or BLEU.
What is bootstrapping?
It involves repeatedly sampling with replacement from an original dataset to create many
"bootstrap samples" or virtual test sets. The key assumption is that the original sample is
representative of the larger population.
Example
Imagine a small classification task with 10 test documents. Two classifiers, A and B, are
evaluated:

• Each document outcome falls into one of four categories:


o Both A and B correct
o Both A and B incorrect
o A correct, B wrong
o A wrong, B correct
• If A has 70% accuracy and B has 50%, then the performance difference δ(x) = 0.20.
How bootstrap works:

1. Generate a large number (e.g., 100,000) of new test sets by sampling 10 documents with
replacement from the original set.

2. For each virtual test set, recalculate the accuracy difference between A and B.

3. Use the distribution of these differences to estimate a p-value, telling us how likely the
observed δ(x) is under the null hypothesis (that A is not better than B).

This helps determine whether the observed performance difference is statistically significant or
just due to random chance.

Figure: The paired bootstrap test: examples of b pseudo test sets x^(i) being created from an initial true test
set x. Each pseudo test set is created by sampling n = 10 times with replacement; thus an individual sample
is a single cell, a document with its gold label and the correct or incorrect performance of classifiers A and
B.
With the b bootstrap test sets, we now have a sampling distribution to analyze whether
A’s advantage is due to chance. Following Berg-Kirkpatrick et al. (2012), we assume the null
hypothesis (H₀)—that A is not better than B—so the average δ(x) should be zero or negative. If
our observed δ(x) is much higher, it would be surprising under H₀. To measure this, we
calculate the p-value by checking how often the sampled δ(xᵢ) values exceed the observed δ(x).

We use the notation 1(x) to mean “1 if x is true, and 0 otherwise.” Although the expected value
of δ(X) over many test sets is 0, this isn't true for the bootstrapped test sets because of the bias
in the original test set, so we compute the p-value by counting how often δ(x^(i)) exceeds the
observed δ(x) by δ(x) or more:

p-value(x) = (1/b) Σ_{i=1}^{b} 1( δ(x^(i)) − δ(x) ≥ δ(x) )        (22)

If we have 10,000 bootstrap test sets and a threshold of 0.01, and in 47 test sets we find
δ(x^(i)) ≥ 2δ(x), the p-value of 0.0047 is smaller than 0.01. This suggests the result is surprising,
allowing us to reject the null hypothesis and conclude that A is better than B.

Fig. A version of the paired bootstrap algorithm

The full algorithm for the bootstrap is shown in Fig. It is given a test set x and a number of samples
b, and it counts the percentage of the b bootstrap test sets in which δ(x^(i)) > 2δ(x). This percentage
then acts as a one-sided empirical p-value.
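
Since the algorithm figure did not survive extraction, here is a minimal sketch of a paired bootstrap
test on accuracy; the input format (parallel 0/1 correctness vectors for A and B) is an assumption
for illustration.

import random

def paired_bootstrap_test(correct_a, correct_b, b=10_000):
    """correct_a[i], correct_b[i]: 1 if the system classified test item i correctly, else 0.
    Returns a one-sided empirical p-value for H0: A is not better than B (Eq. 22)."""
    n = len(correct_a)
    delta_x = (sum(correct_a) - sum(correct_b)) / n        # observed accuracy difference
    exceed = 0
    for _ in range(b):
        idx = [random.randrange(n) for _ in range(n)]      # sample n items with replacement
        delta_i = sum(correct_a[j] - correct_b[j] for j in idx) / n
        if delta_i - delta_x >= delta_x:                   # i.e. delta_i >= 2 * delta_x
            exceed += 1
    return exceed / b

# Toy 10-document example from above: A correct on 7 documents, B correct on 5.
a = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
b_vec = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
print(paired_bootstrap_test(a, b_vec))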

3.10 Avoiding Harms in Classification (Summary)

• Classifiers can cause harm, including representational harms (e.g., reinforcing stereotypes).
o Example: Sentiment analysis systems rated sentences with African American
names more negatively than identical ones with European American names.
• Toxicity classifiers may falsely label non-toxic content as toxic, especially when it
references marginalized groups or dialects (e.g., AAVE), leading to silencing.
• Harms can arise from:
o Biased training data
o Biased labels or resources (e.g., lexicons, embeddings)
o Model design choices

• No universal fix exists, so transparency is key.


• A proposed solution: release model cards (Mitchell et al., 2019), which include:
o Training algorithms and parameters
o Training data sources, motivation, and preprocessing
o Evaluation data sources, motivation, and preprocessing
o Intended use and users
o Model performance across different demographic or other groups and environmental situations

