confusion matrix metrics #285

Merged: msokoloff1 merged 26 commits into develop on Sep 15, 2021
Conversation

msokoloff1 (Contributor):

No description provided.

@review-notebook-app (bot):

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

It first computes the confusion matrix for each metric and then sums across all classes

Args:
    ground_truth : Label containing human annotations or annotations known to be correct
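
As a rough illustration of the aggregation described in this docstring, the per-class [tp, fp, tn, fn] counts are summed element-wise. The helper name and example values below are illustrative only, not the PR's actual API:

from typing import Dict, List

def sum_confusion_matrices(per_class: Dict[str, List[int]]) -> List[int]:
    # Element-wise sum of per-class [tp, fp, tn, fn] counts into one total.
    total = [0, 0, 0, 0]
    for counts in per_class.values():
        for idx in range(4):
            total[idx] += counts[idx]
    return total

# Example with two made-up classes:
print(sum_confusion_matrices({"cat": [3, 1, 0, 2], "dog": [5, 0, 0, 1]}))  # [8, 1, 0, 3]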
GrantEating (Contributor):

I think we avoided using this name in the label import project because it didn't exist anywhere else in the application. Can you double-check that we want to bake this into a public-facing method? Maybe we can just call them label?

msokoloff1 (Contributor, Author):

Great point, we need to come up with some consistent language. The problem here is that Label means "data and annotations", not "human-made annotations", so there needs to be some distinction between human-generated and machine-generated labels. I am open to changing the name, but I already exposed this concept with the IoU metric functions for our first MEA release :). Since these are positional args, we probably can still change it if we want to, but we should do it soon.

GrantEating (Contributor):

I honestly have no stake in it; I just wanted to point it out. I think Gareth asked if we could change the name in label imports, so he might be a person to run it by.
I definitely don't have a problem with it, and honestly I think "ground truth" is a good name, since that's the term used in the industry, so I don't know why Labelbox wouldn't go that direction.

f"Found `{type(prediction)}` and `{type(ground_truth)}`")

if isinstance(prediction.value, Radio):
    return radio_confusion_matrix(ground_truth.value, prediction.value)
GrantEating (Contributor) commented on Sep 13, 2021:

I don't follow why we're returning the confusion matrix for just the first prediction/ground truth values. Don't we need to find an average across all of the values?

msokoloff1 (Contributor, Author):

Because at this point we have already grouped the annotations by class, there can only be one or zero instances of a given classification in an image. I updated the docstring to explain this.
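
For context, the upstream grouping works along the lines of the _create_feature_lookup helper shown in the next hunk; this is a minimal sketch, and the attribute and function names are illustrative:

from collections import defaultdict

def group_by_name(annotations):
    # Bucket annotations by feature name so each class is scored separately.
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.name].append(annotation)
    return grouped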

@@ -158,3 +157,19 @@ def _create_feature_lookup(features: List[FeatureSchema],
    for feature in features:
        grouped_features[getattr(feature, key)].append(feature)
    return grouped_features


def has_no_matching_annotations(ground_truths: List[ObjectAnnotation],
GrantEating (Contributor):

What is a matching annotation? Wouldn't a matching prediction and label have the same schemaId? Would it make sense to check for that in this function, or is it not that simple?
Would it also make sense to make sure the lengths are the same?

msokoloff1 (Contributor, Author) commented on Sep 14, 2021:

All annotations passed to this function already have the same schema id, so this function just checks whether we even need to run the metric calculation code: if there is no ground truth or no prediction, there is nothing to compute.
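
A minimal sketch of the early-exit check being described; the body here is an assumption based on this explanation, not the PR's exact code:

def has_no_matching_annotations(ground_truths, predictions):
    # If either side is empty there is nothing to pair up,
    # so the metric calculation can be skipped.
    return len(ground_truths) == 0 or len(predictions) == 0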

Comment on lines +22 to +25
expected = [0, 0, 0, 0]
for expected_values in example.expected.values():
    for idx in range(4):
        expected[idx] += expected_values[idx]
GrantEating (Contributor):

Isn't this whole block just regenerating a list, expected, that is the same as example.expected.values()?
Do we need this block at all? Can we just do

            assert score[0].value == tuple(
                example.expected.values()), f"{example.predictions},{example.ground_truths}"

?

msokoloff1 (Contributor, Author) commented on Sep 14, 2021:

We are actually accumulating all expected values.

expected = [0, 0, 0, 0]
for expected_values in example.expected.values():
    for idx in range(4):
        expected[idx] += expected_values[idx]
assert ...  # Each of the values is added to expected before this assert statement is reached.

If the assert were inside the outer for loop, then what you are suggesting would be true.
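
For what it's worth, the same accumulation could also be written in one line (an equivalent sketch, not part of the PR); zip(*...) transposes the per-class [tp, fp, tn, fn] lists so each column can be summed:

expected = [sum(column) for column in zip(*example.expected.values())]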

Comment on lines +33 to +41
def test_custom_confusion_matrix_metric():
    with open('tests/data/assets/ndjson/custom_confusion_matrix_import.json',
              'r') as file:
        data = json.load(file)

    label_list = NDJsonConverter.deserialize(data).as_list()
    reserialized = list(NDJsonConverter.serialize(label_list))
    assert json.dumps(reserialized,
                      sort_keys=True) == json.dumps(data, sort_keys=True)
GrantEating (Contributor):

How does this test the custom confusion matrix metric? It seems like it just deserializes and re-serializes a JSON file, then asserts that the result is equal to the file data.

msokoloff1 (Contributor, Author):

This tests the serialization and deserialization of the custom metric, not the actual calculation.

GrantEating (Contributor):

Got it. I might recommend naming it something about serialization then, but that's just a nit!

label = project.create_label(data_row=data_row, label=rand_gen(str))
time.sleep(10)
GrantEating (Contributor):

That seems like a lot of time. Is there any way to avoid this? I assume things were not being created in time. Is there a way to "await" (for lack of a more Pythonic word) these function calls?

msokoloff1 (Contributor, Author) commented on Sep 14, 2021:

Everything is technically awaited in Python (unless you are using asyncio or some other async abstraction). The problem is the async work on the backend that can't be awaited, like items that are added to a queue to be processed.
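
One common alternative to a fixed sleep is to poll with a timeout. This is only a sketch; the label_is_available check is hypothetical, since the point above is precisely that the backend queue exposes no such readiness signal here:

import time

def wait_for(predicate, timeout=30, interval=0.5):
    # Poll predicate until it returns True or timeout seconds elapse.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    raise TimeoutError("condition not met before timeout")

# Hypothetical usage:
# wait_for(lambda: label_is_available(project, data_row))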

msokoloff1 (Contributor, Author):

Also, once label imports are complete, I can throw out this garbage hack for uploading labels in tests.

GrantEating (Contributor):

Ah, yeah, dang. 10s is a while to add to a test, but if the processing takes that long, it makes sense!

metric_name: str
aggregation: ConfusionMatrixAggregation

def to_common(self) -> ConfusionMatrixMetric:
GrantEating (Contributor):

What is "common"?

msokoloff1 (Contributor, Author) commented on Sep 14, 2021:

It is the standard internal format for annotations in the repo; this terminology is used elsewhere. I am not a huge fan of the naming convention, but I didn't have a great alternative. The type hints should make it a little less ambiguous.
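
For illustration, to_common is an adapter from a serialization-layer class into the shared in-repo ("common") type. Below is a self-contained sketch with stand-in class names; only metric_name and aggregation are visible in the diff, and everything else is assumed:

from dataclasses import dataclass

@dataclass
class CommonConfusionMatrixMetric:
    # Stand-in for the shared, repo-wide ("common") metric type.
    metric_name: str
    aggregation: str

@dataclass
class NDConfusionMatrixMetric:
    # Stand-in for the NDJSON serialization-layer class from the diff above.
    metric_name: str
    aggregation: str

    def to_common(self) -> CommonConfusionMatrixMetric:
        # Adapter from the serialization format to the common in-repo format.
        return CommonConfusionMatrixMetric(metric_name=self.metric_name,
                                           aggregation=self.aggregation)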

return [tps, fps, 0, fns]


def mask_confusion_matrix(ground_truths: List[ObjectAnnotation],
GrantEating (Contributor):

I don't feel like I was able to review these functions at all, FYI.
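
For orientation, here is a rough pixel-wise sketch of what a mask confusion matrix can look like. It is an assumption for illustration, not necessarily the PR's implementation, but it matches the [tps, fps, 0, fns] return convention visible in the diff above:

import numpy as np

def pixelwise_confusion_counts(ground_truth_mask, prediction_mask):
    # Both inputs are assumed to be boolean arrays of the same shape.
    gt = ground_truth_mask.astype(bool)
    pred = prediction_mask.astype(bool)
    tps = int(np.sum(gt & pred))
    fps = int(np.sum(~gt & pred))
    fns = int(np.sum(gt & ~pred))
    # True negatives are not meaningful for object annotations, hence the 0.
    return [tps, fps, 0, fns]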

GrantEating (Contributor):

As badly as I want to say LGTM, I don't feel confident in my review here. I left some comments, but most aren't very useful.
Happy to stamp, but it's more of an "LGTM?" haha.

This PR is worth being proud of, though. It really shows an accumulation of a lot of knowledge and hard work. I am excited to see people use this; you've really owned custom metrics.

@msokoloff1 msokoloff1 changed the base branch from develop to ms/expand-metrics September 14, 2021 14:17
@msokoloff1 msokoloff1 changed the base branch from ms/expand-metrics to develop September 14, 2021 14:24
@msokoloff1 msokoloff1 changed the base branch from develop to examples September 14, 2021 14:27
@msokoloff1 msokoloff1 changed the base branch from examples to develop September 14, 2021 14:27
@msokoloff1 msokoloff1 merged commit 3bb2aaf into develop Sep 15, 2021
@msokoloff1 msokoloff1 deleted the ms/compute-conf-matrix branch September 15, 2021 09:40
msokoloff1 added a commit that referenced this pull request Feb 6, 2022