
CMPS242 Machine Learning Final Project Report

Eriq Augustine (EAUGUSTI@UCSC.EDU), Student ID: 1116667
Varun Embar (VEMBAR@UCSC.EDU), Student ID: 1566148
Dhawal Joharapurkar (DJOHARAP@UCSC.EDU), Student ID: 1566168
Xiao Li (XLI111@UCSC.EDU), Student ID: 1461332
Team 0: Para-normal Distributions - https://github.com/eriq-augustine/242-2016

1. Problem Statement

Our project focuses on helping business owners find potential competitors if they choose to open a branch in a new location. Using this information, business owners can not only find potential competitors but also get an idea of how well their business will be received in the new locality.

We say that a "competitor" is any similar business, with the idea that a similar business caters to a similar clientele and is therefore a competitor. Our task therefore involves finding similar businesses in other regions. We do this by first clustering similar businesses. Once we have the clusters, to find similar businesses in a location we look at the cluster to which a given restaurant belongs and then display the restaurants in that cluster that are close to the location.

We use the k-medoids algorithm to cluster the businesses, as not all attributes are numeric in nature and we need a richer set of dissimilarity scores. We run experiments with various parameter settings, dissimilarity scores, and features, and report the Rand index on a "gold standard" data set.

2. Algorithm Formulation

We use the k-medoids clustering algorithm to cluster businesses. K-medoids is a clustering technique that tries to minimize the pairwise dissimilarity between the data points assigned to a cluster and the medoid of that cluster. The medoid is a data point in the data set that best represents the cluster center; it is analogous to the centroid in the K-Means algorithm.

The K-Medoids algorithm has the following advantages over K-Means:

• Since we are only using pairwise comparisons, we do not need to be able to calculate the average of a feature. This is better for non-numeric features like strings or sets.

• Since we are only using pairwise comparisons, once we compute the dissimilarity between the data points we can do away with the actual data points, which reduces memory usage and makes the computation more efficient.

• Since our medoid is a real data point (as opposed to a centroid), we can use this data point as a "representative" when visualizing our data (or perhaps even when sampling it).

Our stopping condition is either when we have run through 10 iterations of clustering (experimentally found to be sufficient for this dataset), or when the current set of clusters is the same as either the last run or the run before it. We compare against more than just the most recent run so that we can avoid unnecessary iterations when the clusters are jittering, or when outlier points are constantly shifting their membership back and forth between two or more clusters.
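To make the procedure concrete, the following is a minimal sketch of this clustering loop in Python. It assumes a precomputed pairwise dissimilarity function dist(p, q) over point identifiers; it illustrates the algorithm as described above rather than reproducing our exact implementation.

    import random

    def k_medoids(points, dist, k, max_iters=10):
        # points: list of point identifiers; dist(p, q): pairwise dissimilarity.
        medoids = random.sample(points, k)
        history = []  # clusterings from previous iterations

        for _ in range(max_iters):
            # Assign every point to its nearest medoid.
            clusters = {m: [] for m in medoids}
            for p in points:
                clusters[min(medoids, key=lambda m: dist(p, m))].append(p)

            # The new medoid of each cluster is the member with the smallest
            # total dissimilarity to the other members.
            medoids = [min(members, key=lambda c: sum(dist(c, p) for p in members))
                       for members in clusters.values() if members]

            # Stop if this clustering matches either of the two previous runs.
            assignment = frozenset(frozenset(members) for members in clusters.values())
            if assignment in history[-2:]:
                break
            history.append(assignment)

        return clusters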
3. Distance Metrics

To compare our features, we use several different distance metrics. In this section we discuss only the metrics that were used in our final experiments; the other metrics we implemented and explored are covered in Appendix B.

Our distance metrics fall into two categories: numeric and set-based. For numeric features, we implemented the L1 and L2 distances. For set-based features, we implemented the Jaccard and Dice distances.

To make it easier to combine the distances of different features, we also attempt to normalize the output of each distance to ideally lie in the range 0 to 1.

3.1. L1 (Manhattan) Distance

The Manhattan distance of two vectors x and y is calculated by:

    d = \sum_{i=1}^{n} |x_i - y_i|

3.2. L2 (Euclidean) Distance

The Euclidean distance of two vectors x and y is calculated by:

    d = \sqrt{ \sum_{i=1}^{n} (x_i - y_i)^2 }

3.3. Jaccard Distance

The Jaccard distance of two sets A and B is calculated by:

    d_j(A, B) = 1 - |A ∩ B| / |A ∪ B|

3.4. Dice Distance

The Dice distance of two sets A and B is calculated by:

    d_d(A, B) = 1 - 2|A ∩ B| / (|A| + |B|)

3.5. Normalizations

Because different distance metrics work with different output ranges, we need to normalize the distance metrics to something consistent. For example, the range of the Jaccard distance is between 0 and 1, while the range of the Manhattan distance is 0 to infinity.

To figure out what works best, we try three different normalization methods:

1. Raw - No normalization is applied.

2. Logarithmic - Use a logarithm to squash the value down closer to zero. Since we did not want values less than 1 to grow, we put a hinge at 1 and let all values less than 1 go to zero:

    distance(d) = ln(max(1, d))

3. Logistic - Use the sigmoid function to squash the distance down to a 0-1 range. Since all our distance metrics return non-negative values, the following is guaranteed to return a value in the range 0 to 1:

    distance(d) = (1 / (1 + exp(-d)) - 0.5) * 2
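As a compact illustration, the four metrics and the three normalizations above can be written as follows. This is a sketch that follows the definitions in this section, not code taken from our implementation; the set-based functions assume at least one of the two sets is non-empty.

    import math

    def l1(x, y):
        return sum(abs(a - b) for a, b in zip(x, y))

    def l2(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    def jaccard_distance(a, b):
        return 1.0 - len(a & b) / len(a | b)

    def dice_distance(a, b):
        return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))

    # Normalizations applied to a non-negative distance d.
    def raw(d):
        return d

    def logarithmic(d):
        return math.log(max(1.0, d))                      # hinge at 1; values below 1 map to 0

    def logistic(d):
        return (1.0 / (1.0 + math.exp(-d)) - 0.5) * 2.0   # squashes [0, inf) into [0, 1)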
4. Features

Our set of features can be divided into three types:

• Numeric Features

  – Star Rating - The star rating of a business, rounded to half-stars.
  – Total Review Count - The total number of reviews present for the business.
  – Available Review Count - The number of reviews available for the business in the data set.
  – Mean Review Length - The average length of a review for the business.
  – Mean Word Length - The average length of the words in the reviews for the business.
  – Number of Words - The total number of words in the reviews of the business.
  – Mean Words per Review - The average number of words per review available for the business.

• Descriptive Features

  The dataset also contains textual data, such as attributes, categories, and review text, which describes the businesses. We construct features from these texts, but to improve the efficiency of our model we encode the string data as sets of unique integers by building maps over all possible values that these features can take. So each output feature is a set of identifiers for the strings present for the business (a brief sketch of this encoding appears after the feature list below).

  – Attribute Features - The dataset contains information about attributes of businesses which describe the operations of the business. They exist as key-value pairs in the data, for example (WiFi, no), (Delivery, false), (Attire, casual). We squash the key-value pairs together to create a set of attributes that a business has, which we then use as a feature.
  – Category Features - The dataset also contains some categorical information about the business, for example whether the business is a restaurant, cafe, food place, burger place, etc. We construct a feature which is the set of categories that the business has been assigned.

  – Key Words - These are words that Yelp has defined to help users filter the businesses that appear in search results. They are words that delineate businesses, as they are mostly categorical words such as restaurant, cafe, etc. We look for occurrences of these key words in the reviews of each business and return the set of key words it contains.
  – Top Words - This set contains the most frequently occurring words in the reviews of a business, after removing stop words. We used a general English stop-word list containing 562 stop words.

• Temporal Features

  We have two time-related features pertaining to the functioning hours of businesses.

  – Total Hours - The total number of hours the business is open during the week.
  – Open Hours - This feature encodes information about the functioning hours of the business over the week. We divided the hours of a day in the following way to help us attribute functioning times to a business:

    ∗ Open between 6AM - 12PM: the restaurant functions in the morning, or serves breakfast.
    ∗ Open between 12PM - 3PM: the restaurant functions in the afternoon, or serves lunch.
    ∗ Open between 5PM - 9PM: the restaurant functions in the evening, or serves dinner.
    ∗ Open between 9PM - 2AM: the restaurant functions post dinner, or late night.

    To make the feature more robust, we require that a business be open at least 4 days a week during one of the time-spans above before we encode that it functions during that time-span. The feature is the set of time-spans during which a business operates.
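The descriptive-feature encoding mentioned above can be sketched as follows. The helper names here are ours and purely illustrative; the real feature extraction pulls its values from the database rather than from in-memory dictionaries.

    def build_vocabulary(all_values):
        # Map every distinct string value to a unique integer identifier.
        return {value: idx for idx, value in enumerate(sorted(set(all_values)))}

    def encode_attributes(attributes, vocabulary):
        # Squash (key, value) pairs into single strings and return the set of their ids.
        squashed = {"%s:%s" % (key, value) for key, value in attributes.items()}
        return {vocabulary[s] for s in squashed if s in vocabulary}

    vocab = build_vocabulary(["WiFi:no", "Delivery:false", "Attire:casual"])
    feature = encode_attributes({"WiFi": "no", "Attire": "casual"}, vocab)   # {0, 2}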
5. Evaluation

In order to evaluate the correctness of the generated clusters, we create sets of restaurants that are similar. Using these sets as the "gold standard" clusters, we report the Rand index.

5.1. Rand Index

The Rand index is used in data clustering to measure the similarity between two cluster assignments. Given a set of elements S and two partitions of S, X = {X_1, X_2, ..., X_n} and Y = {Y_1, Y_2, ..., Y_m}, we compute the following:

• a - the number of pairs that are assigned to the same cluster in both X and Y
• b - the number of pairs that are assigned to different clusters in both X and Y
• c - the number of pairs that are assigned to the same cluster in X but to different clusters in Y
• d - the number of pairs that are assigned to different clusters in X but to the same cluster in Y

The Rand index is then given by R = (a + b) / (a + b + c + d).
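As an illustration of this computation over a set of evaluation pairs, a minimal sketch follows (the function and argument names are ours):

    def rand_index(pairs, cluster_x, cluster_y):
        # pairs: iterable of (p, q) point pairs to evaluate.
        # cluster_x / cluster_y: dicts mapping a point to its cluster id in X and Y.
        a = b = c = d = 0
        for p, q in pairs:
            same_x = cluster_x[p] == cluster_x[q]
            same_y = cluster_y[p] == cluster_y[q]
            if same_x and same_y:
                a += 1          # same cluster in both X and Y
            elif not same_x and not same_y:
                b += 1          # different clusters in both X and Y
            elif same_x:
                c += 1          # same in X, different in Y
            else:
                d += 1          # different in X, same in Y
        return (a + b) / (a + b + c + d)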
5.2. Generating Gold Standard Clusters

Since we do not have gold standard clusters, we cluster a subset of the data and evaluate the algorithm on these data points.

We look at various restaurant chains, such as "Taco Bell" and "Starbucks", that are present in the data and assign all the stores belonging to the same chain to a new cluster. We extracted the top 15 restaurant chains; their details are given in Table 1. We only generate pairs within a restaurant chain. Since it is not clear whether a McDonald's and a Burger King should be in the same or different clusters, we do not look at pairs across chains.

However, if we only have positive pairings, then the Rand index can be trivially maximized by assigning all data points to the same cluster. To counteract this, we also need pairs of restaurants that should not be in the same cluster. We collected a list of 285 "fine dining" restaurants (restaurants in the highest price range) and create pairs such that the first restaurant comes from the "fast food" list (e.g., "McDonald's" and "Taco Bell") and the other from the fine-dining list. These pairs should not be in the same cluster.

6. Experiments

We use all 3069 restaurants present in the gold standard dataset for our experiments.

There are four different parameters that we can tune in our algorithm:

1. D - the set of features
2. K - the number of clusters
3. F - the function used to normalize the various distance metrics
4. S - the distance measure used to measure set similarity

We run multiple experiments, keeping a few of these parameters fixed and altering the others, to better understand the sensitivity of each parameter.

    Restaurants              No. of branches
    Starbucks                527
    Subway                   408
    McDonald's               365
    Taco Bell                193
    Burger King              167
    Pizza Hut                159
    Wendy's                  149
    Panda Express            122
    Dunkin' Donuts           122
    Domino's Pizza           107
    KFC                       99
    Chipotle Mexican Grill    97
    Dairy Queen               96
    Papa John's Pizza         92
    Jack in the Box           81
    Fine Dining              100

Table 1. List of restaurant chains

In the first experiment, we modify the feature set, keeping the other parameters fixed. The possible features are N (numeric features), A (attribute descriptive features: attributes and categories), and W (word descriptive features: key words and top words) [1]. We set K = 10, F to logistic normalization, and S to Dice. The results are shown in Table 2.

[1] Because of time constraints, temporal features were not included in these experiments. Partial results for temporal features can be found in Appendix A.

    Features   Rand Index
    A          0.9070
    W          0.8523
    N          0.6498
    AW         0.9295
    WN         0.6948
    AN         0.7269
    NAW        0.8267

Table 2. K = 10, F = logistic, S = Dice

We observe that the combination of Attribute and Word gives the best performance, followed by Attribute alone. We also observe that numeric features lead to a deterioration in performance. The full results can be seen in Appendix D.

In the second experiment, we keep the number of clusters and the feature set fixed and alter the normalization function F and the set distance S. We use the setting K = 10 and use all the features (Numeric, Attribute, and Word). The results are shown in Table 3.

    Normalization   Set Distance   Rand Index
    Log             Dice           0.685091
    Log             Jaccard        0.672223
    Logistic        Dice           0.826786
    Logistic        Jaccard        0.785894
    None            Dice           0.667862
    None            Jaccard        0.667858

Table 3. K = 10, D = NAW

We observe that the best performance is achieved when we use Jaccard similarity. We also observe that using normalization does not make any difference to the metrics.

In our third experiment, we modify the number of clusters K, keeping the other parameters fixed. We set the set distance to Jaccard, the normalization to logistic, and use all the features (Numeric, Attribute, and Word). The metrics are given in Table 4.

    k    Rand Index
    8    0.7872
    10   0.8267
    12   0.7780
    14   0.7652
    16   0.7486
    18   0.7464

Table 4. D = NAW, F = logistic, S = Jaccard

We observe that the best setting for K is 10. We also observe that as we increase the number of clusters beyond 10, the Rand index decreases. This could be due to good clusters being split into many clusters, resulting in a decreasing Rand index. Another reason could be that the number of "same cluster" pairs in the gold standard dataset is much higher than the number of "across cluster" pairs; this could result in the Rand index preferring fewer, larger clusters over many small clusters.

7. Conclusion

Clustering works fairly well for this dataset. The signals that most strongly indicate that two businesses are the same come from the reviews rather than from more structured data like the number of stars. Our work with the textual features was limited to just a few instances of "low-hanging fruit", but if the success of those features is any indication of the richness of the reviews, then the reviews should be the focus of future work.

Appendices

A. Temporal Feature Experiments

Time constraints prevented us from completing the experiments with temporal features; however, we do have partial results.

It does not look like temporal features are stronger than Attribute or Word features. However, we see temporal features performing better than most combinations which include numeric features. We already know that the numeric features are detrimental, but it is very interesting that Temporal beats them when we consider what each feature represents: numeric features contain information such as a restaurant's star rating, while temporal features only have information about when a place is open.

    Feature Set   K    Scalar Normalization   Set Distance   Rand Index
    NAWT          8    Log                    Dice           0.694153
    NAWT          8    Log                    Jaccard        0.686108
    NAWT          8    Logistic               Dice           0.774644
    NAWT          8    Logistic               Jaccard        0.764851
    NAWT          10   Log                    Dice           0.700142
    NAWT          10   Log                    Jaccard        0.695933
    NAWT          10   Logistic               Dice           0.761750
    NAWT          10   Logistic               Jaccard        0.740966
    NAWT          12   Log                    Dice           0.694220
    NAWT          12   Log                    Jaccard        0.698511
    NAWT          12   Logistic               Dice           0.787857
    NAWT          12   Logistic               Jaccard        0.749919
    NAWT          14   Log                    Dice           0.704690
    NAWT          14   Log                    Jaccard        0.708275
    NAWT          14   Logistic               Dice           0.765841
    NAWT          14   Logistic               Jaccard        0.747909
    AWT           10   N/A                    Jaccard        0.718515
    NAT           10   Logistic               Jaccard        0.742525
    NWT           10   Logistic               Jaccard        0.703385
    T             10   N/A                    Jaccard        0.779835

B. Additional Distance Metrics

We implemented and explored using string similarity metrics to compute distances. However, we were not able to find suitable features to use string distances with.

B.1. Levenshtein Distance

The distance between two strings a and b is evaluated by the Levenshtein distance:

    lev_{a,b}(i, j) = max(i, j)                                            if min(i, j) = 0
    lev_{a,b}(i, j) = min( lev_{a,b}(i - 1, j) + 1,
                           lev_{a,b}(i, j - 1) + 1,
                           lev_{a,b}(i - 1, j - 1) + 1_(a_i ≠ b_j) )       otherwise        (1)

We then normalize the Levenshtein distance lev_{a,b} to the range [0, 1], where l_a and l_b are the lengths of a and b:

    dis = lev_{a,b} / max(l_a, l_b)
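A short sketch of this computation using the standard dynamic-programming formulation (not necessarily our exact code):

    def levenshtein(a, b):
        prev = list(range(len(b) + 1))                      # distances from "" to prefixes of b
        for i, ca in enumerate(a, start=1):
            cur = [i]                                       # distance from a[:i] to ""
            for j, cb in enumerate(b, start=1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def normalized_levenshtein(a, b):
        if not a and not b:
            return 0.0
        return levenshtein(a, b) / max(len(a), len(b))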
B.2. Needleman-Wunsch Distance

The distance between two strings a and b is calculated by the Needleman-Wunsch algorithm. We calculate the distance by comparing the letters of the two strings. The scoring is as follows:

• Match = 1: the two letters a_i and b_j are the same
• Mismatch = -1: the two letters a_i and b_j are different
• Indel = -1:
  – Delete: one letter in string a aligns to a gap in string b.
  – Insert: one letter in string b aligns to a gap in string a.

The pseudo-code for the Needleman-Wunsch algorithm is as follows:

    for i = 0 to length(a) do
        F(i, 0) ← −i
    end for
    for j = 0 to length(b) do
        F(0, j) ← −j
    end for
    for i = 1 to length(a) do
        for j = 1 to length(b) do
            if a[i] == b[j] then
                Diagonal ← F(i − 1, j − 1) + 1      (match)
            else
                Diagonal ← F(i − 1, j − 1) − 1      (mismatch)
            end if
            Insert ← F(i, j − 1) − 1
            Delete ← F(i − 1, j) − 1
            F(i, j) ← max(Diagonal, Insert, Delete)
        end for
    end for

We then normalize the Needleman-Wunsch distance NW_{a,b} to the range [0, 1], where l_a and l_b are the lengths of a and b:

    dis = (NW_{a,b} − max(l_a, l_b)) / (−2 · max(l_a, l_b))

C. Implementation Details

C.1. Code

Our code is hosted on GitHub: https://github.com/eriq-augustine/242-2016. Since we have implemented all our methods ourselves, numpy is the only dependency required to run our code.

C.2. Data

To better handle and analyze the Yelp data, we first put it into a relational database; we are using PostgreSQL. Our resulting schema is 12 tables in Boyce-Codd normal form. The create-table statements can be found in our repository at data/sql/create.sql.

The script that parses the Yelp data and converts it to SQL insert statements can be found at data/sql/parse.rb.

Having the data in a relational database also gives us the advantage of indexes and precomputations. Non-trivial information, like the term frequencies over all reviews, can be precomputed and stored in tables for use in more complex features. Additionally, we can tune our feature queries with indexes targeted at our specific features. The SQL file that handles precomputations and optimizations can be found at data/sql/optimize.sql.

C.3. Optimizations

Clustering may be a fairly simple task, but it can quickly get very resource intensive.

C.3.1. Memory

The Yelp dataset contains approximately 40000 restaurants (based on the given categories). Since the resulting distance matrix is symmetric and we do not need to keep the values on the diagonal (every point is at distance 0 from itself), we need to compute exactly n · (n − 1)/2 = 40000 · 39999/2 = 799,980,000 distances.

A naive method would be to keep these values in a nested map structure (a map of maps to floats). Assuming a float takes up 64 bits and an int takes up 64 bits, a naive representation of the distance matrix would require a float for the value and two integers for the keys, taking up at least 799,980,000 ∗ (3 ∗ 64) = 153,596,160,000 bits = 17.88 GB.

This is a bit unwieldy for most laptops. We can improve this by using a single array instead of a map; then we only need to store the actual distance values and not the keys. This reduces our size cost to 799,980,000 ∗ 64 = 51,198,720,000 bits = 5.96 GB.

However, we can further reduce our memory cost by choosing a more cost-efficient data type. Since we normalize our distance functions to return a value in the range 0 to 1, we do not expect our distances to grow too large. Therefore, we can feel safe using a smaller data type such as a 16-bit float. This further reduces the size to 799,980,000 ∗ 16 = 12,799,680,000 bits = 1.49 GB, 1/12 of our original cost.
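As a sketch of this layout, the flat array can be indexed with the standard condensed upper-triangular arithmetic shown below. The variable names are ours, and a smaller n is used here to keep the example light.

    import numpy as np

    def condensed_index(i, j, n):
        # Flat index of the pair (i, j), i < j, in an upper triangle stored without the diagonal.
        return i * n - i * (i + 1) // 2 + (j - i - 1)

    n = 5000                                                    # 40000 in the full dataset
    distances = np.zeros(n * (n - 1) // 2, dtype=np.float16)    # one 16-bit float per pair

    distances[condensed_index(3, 17, n)] = 0.25                 # distance between points 3 and 17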
C.3.2. Multiprocessing

Just the precomputation of the distance matrix for our ground truth involves calculating 4,707,846 similarities (which is still nothing compared to the 799,980,000 required for all restaurants). To speed this up, we took advantage of multiprocessing and shared memory.
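One way to arrange this with only the standard library and numpy is sketched below. It parallelizes a row-wise fill of the condensed matrix from the previous sketch; the Jaccard distance on set features, the sample data, and the process count are placeholders, and ctypes shared arrays have no 16-bit float type, so this sketch stores 32-bit values.

    import multiprocessing as mp
    import numpy as np

    def _init_worker(shared, feats):
        global _shared, _feats
        _shared, _feats = shared, feats

    def _fill_row(i):
        n = len(_feats)
        base = i * n - i * (i + 1) // 2                 # start of row i in the condensed layout
        for j in range(i + 1, n):
            # Jaccard distance between the set features of points i and j.
            _shared[base + (j - i - 1)] = 1.0 - len(_feats[i] & _feats[j]) / len(_feats[i] | _feats[j])

    def precompute(features, processes=8):
        n = len(features)
        shared = mp.Array('f', n * (n - 1) // 2, lock=False)    # shared memory; one writer per cell
        with mp.Pool(processes, initializer=_init_worker, initargs=(shared, features)) as pool:
            pool.map(_fill_row, range(n - 1))
        return np.frombuffer(shared, dtype=np.float32)

    if __name__ == "__main__":
        print(precompute([{1, 2, 3}, {2, 3}, {3, 4}, {5}], processes=2))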
D. Full Results

Below are the full results from our experiments on the different parameters. The "Feature Set" column describes the features that were included in that run:

• N - Numeric Features
• A - "Attribute" Descriptive Features (attributes and categories)
• W - "Word" Descriptive Features (key words and top words)

Note that scalar normalizations only make sense in the presence of numeric features ('N'), while set distance only makes sense in the presence of non-numeric features ('A' & 'W').

    Feature Set   K    Scalar Normalization   Set Distance   Rand Index
    NAW           8    Log                    Dice           0.660370
    NAW           8    Log                    Jaccard        0.661637
    NAW           8    Logistic               Dice           0.787228
    NAW           8    Logistic               Jaccard        0.790557
    NAW           8    None                   Dice           0.659138
    NAW           8    None                   Jaccard        0.659138
    NAW           10   Log                    Dice           0.685091
    NAW           10   Log                    Jaccard        0.672223
    NAW           10   Logistic               Dice           0.826786
    NAW           10   Logistic               Jaccard        0.785894
    NAW           10   None                   Dice           0.667862
    NAW           10   None                   Jaccard        0.667858
    NAW           12   Log                    Dice           0.680260
    NAW           12   Log                    Jaccard        0.677921
    NAW           12   Logistic               Dice           0.778079
    NAW           12   Logistic               Jaccard        0.763160
    NAW           12   None                   Dice           0.669829
    NAW           12   None                   Jaccard        0.669829
    NAW           14   Log                    Dice           0.681115
    NAW           14   Log                    Jaccard        0.674080
    NAW           14   Logistic               Dice           0.765209
    NAW           14   Logistic               Jaccard        0.757906
    NAW           14   None                   Dice           0.670226
    NAW           14   None                   Jaccard        0.670226
    NAW           16   Log                    Dice           0.675980
    NAW           16   Log                    Jaccard        0.674917
    NAW           16   Logistic               Dice           0.748661
    NAW           16   Logistic               Jaccard        0.760910
    NAW           16   None                   Dice           0.670803
    NAW           16   None                   Jaccard        0.670918
    NAW           18   Log                    Dice           0.676348
    NAW           18   Log                    Jaccard        0.686562
    NAW           18   Logistic               Dice           0.746411
    NAW           18   Logistic               Jaccard        0.757685
    NAW           18   None                   Dice           0.669489

    Feature Set   K    Scalar Normalization   Set Distance   Rand Index
    NAW           18   None                   Jaccard        0.669458
    AW            10   N/A                    Dice           0.929555
    AW            10   N/A                    Jaccard        0.855210
    AW            12   N/A                    Dice           0.870459
    AW            12   N/A                    Jaccard        0.864035
    AW            14   N/A                    Dice           0.851381
    AW            14   N/A                    Jaccard        0.842278
    AW            16   N/A                    Dice           0.850255
    AW            16   N/A                    Jaccard        0.844351
    AW            18   N/A                    Dice           0.848773
    AW            18   N/A                    Jaccard        0.851001
    NA            8    Log                    Dice           0.659055
    NA            8    Log                    Jaccard        0.658976
    NA            8    Logistic               Dice           0.742567
    NA            8    Logistic               Jaccard        0.770920
    NA            10   Log                    Dice           0.670728
    NA            10   Log                    Jaccard        0.671137
    NA            10   Logistic               Dice           0.726959
    NA            10   Logistic               Jaccard        0.737731
    NA            10   None                   Dice           0.667858
    NA            10   None                   Jaccard        0.667858
    NA            12   Log                    Dice           0.671808
    NA            12   Log                    Jaccard        0.673073
    NA            12   Logistic               Dice           0.723300
    NA            12   Logistic               Jaccard        0.729915
    NA            12   None                   Dice           0.669829
    NA            12   None                   Jaccard        0.669829
    NA            14   Log                    Dice           0.667990
    NA            14   Log                    Jaccard        0.669195
    NA            14   Logistic               Dice           0.714399
    NA            14   Logistic               Jaccard        0.722061
    NA            14   None                   Dice           0.670182
    NA            14   None                   Jaccard        0.670232
    NA            16   Log                    Dice           0.678210
    NA            16   Log                    Jaccard        0.676306
    NA            16   Logistic               Dice           0.712950
    NA            16   Logistic               Jaccard        0.726035
    NA            16   None                   Dice           0.670561
    NA            16   None                   Jaccard        0.670614
    NA            18   Log                    Dice           0.678099
    NA            18   Log                    Jaccard        0.676272
    NA            18   Logistic               Dice           0.708987
    NA            18   Logistic               Jaccard        0.719783
    NA            18   None                   Dice           0.669363
    NA            18   None                   Jaccard        0.669419
    NW            10   Log                    Dice           0.669719
    NW            10   Log                    Jaccard        0.684106
    NW            10   Logistic               Dice           0.694825
    NW            10   Logistic               Jaccard        0.669269
    NW            10   None                   Dice           0.667752
    NW            10   None                   Jaccard        0.667750
    NW            12   Log                    Dice           0.671185
    NW            12   Log                    Jaccard        0.684292
    NW            12   Logistic               Dice           0.694171
    NW            12   Logistic               Jaccard        0.674548
    NW            12   None                   Dice           0.669814
    NW            12   None                   Jaccard        0.669810
    NW            14   Log                    Dice           0.670793
    NW            14   Log                    Jaccard        0.677972
    NW            14   Logistic               Dice           0.694405
    NW            14   Logistic               Jaccard        0.675936
    NW            14   None                   Dice           0.670697
    NW            14   None                   Jaccard        0.670578
    NW            16   Log                    Dice           0.672472
    NW            16   Log                    Jaccard        0.676489
    NW            16   Logistic               Dice           0.692137
    NW            16   Logistic               Jaccard        0.681405
    NW            16   None                   Dice           0.670888
    NW            16   None                   Jaccard        0.670706
    NW            18   Logistic               Dice           0.698928
    NW            18   Logistic               Jaccard        0.681717
    A             10   N/A                    Dice           0.907089
    A             10   N/A                    Jaccard        0.969376
    A             12   N/A                    Dice           0.898305
    A             12   N/A                    Jaccard        0.910638
    A             14   N/A                    Dice           0.869441
    A             14   N/A                    Jaccard        0.911598
    A             16   N/A                    Dice           0.826885
    A             16   N/A                    Jaccard        0.907550
    N             8    Log                    N/A            0.657650
    N             8    Logistic               N/A            0.644067
    N             8    None                   N/A            0.659098
    N             10   Log                    N/A            0.657301
    N             10   Logistic               N/A            0.649856
    N             10   None                   N/A            0.667856
    N             12   Log                    N/A            0.669908
    N             12   Logistic               N/A            0.665688
    N             12   None                   N/A            0.669810
    N             14   Log                    N/A            0.673658
    N             14   Logistic               N/A            0.661820
    N             14   None                   N/A            0.670462
    N             16   Log                    N/A            0.669040
    N             16   Logistic               N/A            0.666306
    N             16   None                   N/A            0.670721
    N             18   Log                    N/A            0.669534
    N             18   Logistic               N/A            0.661045
    N             18   None                   N/A            0.669127
    W             10   N/A                    Dice           0.852379
    W             10   N/A                    Jaccard        0.816539
    W             12   N/A                    Dice           0.860230
    W             12   N/A                    Jaccard        0.815057
    W             14   N/A                    Dice           0.859987
    W             14   N/A                    Jaccard        0.814819
    W             16   N/A                    Dice           0.819333
    W             16   N/A                    Jaccard        0.798225
    W             18   N/A                    Dice           0.773908
    W             18   N/A                    Jaccard        0.769316
