Add Contrast Profile Tutorial #800

ken-maeda · 2023-03-02T10:03:45Z

Pull Request Checklist

#465: reproducing the paper as a tutorial.

review-notebook-app · 2023-03-02T10:03:49Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

codecov-commenter · 2023-03-02T10:26:42Z

Codecov Report

Patch coverage has no change and project coverage change: -0.13 ⚠️

Comparison is base (a4bb1e1) 99.25% compared to head (5407968) 99.12%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #800      +/-   ##
==========================================
- Coverage   99.25%   99.12%   -0.13%     
==========================================
  Files          82       83       +1     
  Lines       13101    13898     +797     
==========================================
+ Hits        13003    13776     +773     
- Misses         98      122      +24

see 50 files with indirect coverage changes

seanlaw · 2023-03-03T13:47:09Z

@ken-maeda Thank you for this contribution. Please allow me some time to review

review-notebook-app · 2023-03-07T11:21:35Z

View / edit / reply to this conversation on ReviewNB

seanlaw commented on 2023-03-07T11:21:35Z
----------------------------------------------------------------

T(+) requires at least two behaviors.

Can you explain why it requires at least two behaviors?

Maybe "behaviors" isn't the right word and you mean "at least two instances of the positive case"?

review-notebook-app · 2023-03-07T11:21:36Z

seanlaw commented on 2023-03-07T11:21:36Z
----------------------------------------------------------------

Line #1.    ecg_df = pd.read_csv("14172m.csv", index_col=0)

Instead of repeating astype(float) so many times later, you should just do:

ecg_df = pd.read_csv("14172m.csv", index_col=0, usecols=[1]).astype(float)

Also, it would be nice just to show or print out what is in ecg_df.head() . What does the dataframe look like?

review-notebook-app · 2023-03-07T11:21:38Z

seanlaw commented on 2023-03-07T11:21:37Z
----------------------------------------------------------------

Can we add some comments on what we are looking at? Why does the bottom look so much more regular and with a repeated pattern?

review-notebook-app · 2023-03-07T11:21:40Z

seanlaw commented on 2023-03-07T11:21:39Z
----------------------------------------------------------------

Line #1.    v_query = ecg_df.iloc[5930:5930+127, 1].values.astype(float)

Why is the window size 127 and not 128?

review-notebook-app · 2023-03-07T11:21:41Z

seanlaw commented on 2023-03-07T11:21:40Z
----------------------------------------------------------------

I don't understand the significance of this. Your point isn't clear. Where did v_query come from? Why should the reader care about that?

It would be useful to discuss the bottom plot (distance profile) and how to interpret it.

Why is it useful/important to show distance profile?

ken-maeda commented on 2023-03-07T12:08:54Z
----------------------------------------------------------------

v_query is a typical normal ECG query that can be seen everywhere in the dataset, so one was picked at random.

The purpose of this distance profile is to find the desired behavior by simply comparing v_query (typical ECG signal) with the desired behavior (rare signal). The assumption was that the desired behavior would be the highest point in the distance profile, but that didn't happen.

review-notebook-app · 2023-03-07T11:21:42Z

seanlaw commented on 2023-03-07T11:21:41Z
----------------------------------------------------------------

What is "plato"?

ken-maeda commented on 2023-03-07T12:10:04Z
----------------------------------------------------------------

I added more description to the Contrast Profile section:

The subsequence in 𝐓(+) corresponding to the highest point in the Contrast Profile is called the Plato.

ken-maeda · 2023-03-07T12:52:44Z

I appreciate your feedback; I fixed those.

NimaSarajpoor · 2023-03-08T02:29:45Z

@ken-maeda
I have a suggestion for you.

I think it is better to develop the notebook section by section. So, for each section that you add, you can wait to get some feedback and then apply those, and then after getting the green light, you can move forward. Right now, you may know what is going on in your notebook, however, the main goal is to make sure the reader can understand what is going on! You can keep the current notebook somewhere in your local pc. Then, you can start again by just providing the first section or a couple of sections. So, your notebook should only contain a couple of parts in the beginning. Then, you can add sections to it step by step.

Currently, if I do not understand a part of your notebook, I try to read other parts to better understand the concept. However, this is not desirable. I think this is a red flag. There should be a flow in your tutorial and I believe each segment should be understandable on its own.

Also, the text is as important as the code. In fact, I think it is more important particularly in tutorials. So, try to be extra careful when you explain a concept. You want to be crystal clear in every single step as much as possible.

NimaSarajpoor · 2023-03-08T03:42:36Z

Regarding contrast profile, this is how I see it:

We can see each subsequence of length m as a data point in $\mathbb{R}^{m}$ space. For the sake of visualization, let's illustrate the problem in 2D space.

First, let's review the definitions of T(+) and T(-).

T(+) : contains at least two instances that are unique to the phenomena of interest.
T(-) : contains no instances of interest (and instead, I think we should say it contains the regular, obvious patterns in `T(+)`)

Before we talk about T(-) and T(+), it is better to just talk about T. Let's assume the figure below shows the subsequences of T.

If I look at this data, I can see that the regular, obvious pattern is where the crowded part is. However, the motif we might be interested in can be the motif pair (A, B). Note that this motif pair may not be easily captured as their distance is greater than the distance of any other point and its nearest neighbour.

So, what can we do? We can create T(-) which just contains the regular behaviour of our Data. We then use T(+) to denote the remaining ones.

Now, we can see that d = dist(A, A_nn_in_Tneg) - dist(A, A_nn_in_Tpos) has a high value. Let's call it the contrast distance. The contrast profile, cp, is an array where cp[i] is the contrast distance that corresponds to the i-th subsequence in T(+). The peak of this contrast profile can reveal the motif pair (A, B).

Question: Can we see it as a twin-freak problem? In other words, this might be an anomaly that appears more than once. So, we can easily find the motif pair (A, B) by finding the subsequence that has the greatest distance to its 2nd nearest neighbour.

Answer: I do not know! I think a good way to investigate this is to get some data and find a pair of subsets using each of these two approaches: (1) twin-freak (2) contrast profile, and see if they result in different outcomes.
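
To make the comparison above concrete, here is a minimal, dependency-free sketch on made-up 2D points. None of the data or helper names come from the PR: `sorted_nn_dists`, the point coordinates, and the choice to run the twin-freak search over T(+) only are all assumptions made for illustration. It computes the contrast distance as defined above and the distance to the 2nd nearest neighbour, then reports where each one peaks.

```python
import math

def sorted_nn_dists(p, points, skip_index=None):
    """Ascending distances from p to every point in `points`, optionally skipping one index."""
    return sorted(
        math.dist(p, q) for j, q in enumerate(points) if j != skip_index
    )

# Hypothetical 2D "subsequences": a crowded regular cluster plus a rare pair (A, B).
T_neg = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
A, B = (5.0, 5.0), (5.0, 6.0)
T_pos = [(0.02, 0.02), (0.08, 0.03), (0.03, 0.09), A, B]

# Contrast distance: distance to the nearest neighbour in T(-)
# minus distance to the nearest neighbour in T(+) (excluding the point itself).
cp = []
for i, p in enumerate(T_pos):
    d_neg = sorted_nn_dists(p, T_neg)[0]
    d_pos = sorted_nn_dists(p, T_pos, skip_index=i)[0]
    cp.append(d_neg - d_pos)
cp_peak = max(range(len(cp)), key=cp.__getitem__)

# Twin-freak (assumption: searched within T(+) only):
# distance from each point to its 2nd nearest neighbour.
second_nn = [sorted_nn_dists(p, T_pos, skip_index=i)[1] for i, p in enumerate(T_pos)]
tf_peak = max(range(len(second_nn)), key=second_nn.__getitem__)

print("contrast profile:", [round(v, 2) for v in cp])
print("contrast peak:", T_pos[cp_peak], "| twin-freak peak:", T_pos[tf_peak])
```

On this toy data both peaks fall on the rare pair, so it does not settle the question; a dataset where the two approaches disagree would be more informative.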

ken-maeda · 2023-03-08T04:43:49Z

@NimaSarajpoor
I'm sorry for causing trouble, and I'm grateful for your kind guidance. I uploaded the first section of a new notebook, Tutorial_Contrast_Profile2.ipynb. I should have considered the contrast profile concept more.

I added a scatter plot to the notebook, which should be the best way to understand the "contrast" concept.

T(+) : contains at least two instances that are unique to the phenomena of interest.
T(-) : contains no instances of interest (and instead, I think we should say it contains the regular, obvious patterns in T(+))

This might be a tricky precondition, and the robustness to this precondition is also discussed. I thought it should be enough for the notebook to explain only the contrast profile concept.

NimaSarajpoor · 2023-03-09T04:19:14Z

docs/Tutorial_Contrast_Profile2.ipynb

@@ -0,0 +1,249 @@
+{


Small values in the Matrix Profile are called motifs, and large values are called discords.

I can see that this sentence is from the introduction part of the paper; however, I think it is not meaningful to call small value in matrix profile, a motif. Because a motif is a subsequence (NOT a value) whose distance to its nearest neighbour is small.

I think the authors provided a better definition in the abstract of the paper:
"Time series motifs refer to two particularly close subsequences, whereas time series discords indicate subsequences that are far from their nearest neighbors."

It may be usefull to score subsequences with
I think the paper is clearer here. In the paper, it says: "....score subsequences with a meta-data that reflects that ..."

Also, I think it would be a good idea to introduce T(+) and T(-) here.

This is exactly the property we desire.

Why? I mean how can we get benefit from it? According to my understanding and what I read in the paper, this property can be used to find a subsequence that can uniquely identify a class. In other words, this can be used in classification. Is that correct? I think it would be nice to provide an example to explain the significance of contrast profile. I know there is an example in paper (see Fig. 1), but I think it is a little bit complicated. So, let's try and see if we can come up with a good example to show the importance of contrast profile. Think about it.

NimaSarajpoor · 2023-03-09T04:19:14Z

Let's walk through by running example of a noisy electrocardiogram(ECG).

Maybe remove this line.

We proposed to compute the Contrast Profile only when we belive that the two following assumptions are likely to be true:

Before we start talking about the contrast profile, I think it would be a nice idea to show the data and talk about the problem. What is the problem? What are we trying to find here?

Before we start talking about the contrast profile, I think it would be a nice idea to show the data and talk about the problem. What is the problem? What are we trying to find here?
Add simple example in the beginning

NimaSarajpoor · 2023-03-09T04:19:14Z

Do we use column 0 later? If not, then it might be a good idea to just read the csv for that column only. So, we can do:
T = pd.read_csv(..., usecols=["1"]).to_numpy(np.float64)

I modified the dataset itself; it was originally an extracted dataset.

NimaSarajpoor · 2023-03-09T04:19:14Z

Is there a particular reason behind using the name v here? Can we use T instead? Like... T_pos and T_neg ?

ALSO: How do we know the indices 63630, 68129, etc? This may create confusion. Reading those indices made me think that using contrast profile requires us having some prior knowledge about those indices. If that is not the case, then can you provide a brief explanation on how one can create the datasets T(+) and T(-) in a real-world problem?
If you think that needs to have its own section, then you might explain that you are considering those indices to just show an example here.

Is there a particular reason behind using the name v here? Can we use T instead? Like... T_pos and T_neg
Fixed

NimaSarajpoor · 2023-03-09T04:19:15Z

(1) Why are they desirable? Can we add some explanation as why one might be interested in finding those patterns? What do these patterns mean? In other words, what is the target class here?

(2) The code says label="desired instances"; however, the figure's legend says "desired behaviour".

(3) Right after this block, it might be useful to compute matrix profile, and discover motif and discord, and show that the discovered motifs/discords do not reveal the patterns we are looking for.

1) add description
2) fixed
3) add matrix profile section

NimaSarajpoor · 2023-03-09T04:19:15Z

T(+) looks noisy, 𝐓(+) has to include contains at least two instances

Please rewrite it as follows: T(+) contains at least two instances that are of our interest.

NimaSarajpoor · 2023-03-09T05:09:50Z

@ken-maeda

I'm sorry for causing trouble, and I'm grateful for your kind guidance.

No need to be sorry. I provided a few comments. Let's start with those. Please do not add any new section. Let's take care of the current sections first. Please feel free to discuss something if you feel there is a need for that.

seanlaw · 2023-03-10T01:12:46Z

@ken-maeda Instead of uploading a .png file, I would prefer if you could add the code that could create/recreate the image and have it inside the notebook

NimaSarajpoor · 2023-03-24T00:20:36Z

docs/Tutorial_Contrast_Profile2.ipynb

@@ -0,0 +1,511 @@
+{


... patterns(PVCs) ...

... patterns (PVCs) ... [Notice the space I added]

Also, what is PVC? Should the reader know about this? If not, then you can/should remove it. If yes, then you should avoid using the abbreviation unless you are going to use it again later. Even in that case, you should first provide the full phrase.

Those are target we try to find.

Those are the targets we try to find.

I removed the abbreviation.

NimaSarajpoor · 2023-03-24T00:20:37Z

Great job in providing a clear plot!

NimaSarajpoor · 2023-03-24T00:20:37Z

We can find clealy the discord by Matrix profile. It indicate the discord in Matrix profile by far.

Maybe we should say: "In this case, the discord, indicated by matrix profile, is what we are looking for."

What do you think? Because any matrix profile with a unique global maximum can reveal a discord. Whether or not that discord is what we are looking for is a different story. Here, we use matrix profile to find the discord, and we are lucky that the discovered discord is what we are looking for.

I agree with you

NimaSarajpoor · 2023-03-24T00:20:37Z

Matrix profile indicates slided wrong place as discords.

As a follow-up to what I explained in my comment above, I think we should rephrase it. We may say: "The discord discovered by matrix profile is not what we are looking for."

I am just trying to avoid using the word "wrong" here. In my head, contrast matrix profile is another tool that can be used to reveal something. Whether that thing is what we are looking for or not depends on our perspective and our goal. In this case, we will probably see that contrast matrix profile can reveal something that is not captured by matrix profile. But, is that always useful? Well, that depends on the domain. In other words, a user does not know if the discord discovered by matrix profile is useful. The same goes for contrast matrix profile. A user can compute both and then investigate their outcome to see which reveals better information about the data.

Most important thing is Matrix profile also doesn't indicate remarkable distance for the discord.

How can one know if it is a remarkable distance for discord or not? I think this sentence is a little bit biased. This is because we already saw the matrix profile computed for the first half of T and we now feel the maximum distance shown in the figure above is not large enough. So, I think we should remove this sentence as we prefer not to be subjective in our analysis.

=> Because similar anomalies are nearest neighbor relationship.

Maybe we should say this first:

"Because the targets we are looking for are similar to each other, they can be nearest neighbor of each other and hence their corresponding distance in matrix profile will be small. Therefore, none of them will be detected via matrix profile"

We find those similar anomalies from the dataset with those following specification.

Now, I think this is the place where we should do our best and provide a clear explanation. We can say: "To find the desirable anomalies, we should first understand what properties distinguish them from the other patterns. Note that these two anomalies are similar to each other but they are dissimilar to other (regular) patterns. This is the main concept behind the contrast matrix profile."

This is exactly the property we desire. In other words, we need to prepare two following data.

Then you can create a section here and name it "Contrast matrix profile". Then, we say:
"To compute the contrast matrix profile, two time series data are needed as follows: "

Take a look at the plot below to confirm the condictions

Are we showing any important information here? If not, then we can just remove this line and the code you have provided for plotting the two time series T_p and T_n

I replaced the sentences with your suggested content.

NimaSarajpoor · 2023-03-24T00:25:24Z

@ken-maeda
I reviewed up to the section Loading the ECG data for Contrast Profile. Whenever you address a comment, you can go to ReviewNB (see the top of this PR) and then click on "Resolve Conversation" whenever you are done with that comment.

I think you have done a great job so far, and the notebook becomes more and more clear.

…to contrast_abs

ken-maeda · 2023-03-27T11:54:21Z

@NimaSarajpoor I appreciate your kind feedback. I updated the notebook markdown.

NimaSarajpoor · 2023-04-12T04:57:17Z

@ken-maeda
From my point of view, things are good so far. My only comment is about the last figure, as I think it is a little bit crowded (and I am not sure if there is a better way to break it down or not). Also, you may want to revise the last sentence. I think the last sentence talks about the desirable patterns, but what you should have discussed is that the discords discovered via matrix profile (shown in "orange") are not desirable, or something like that.

ken-maeda · 2023-04-20T03:03:18Z

@NimaSarajpoor I split the last plot into separate plots (maybe redundant?). As you mentioned, it was crowded and it was hard to recognize what was being indicated. I also fixed the last sentence.

NimaSarajpoor · 2023-04-23T00:01:10Z

docs/Tutorial_Contrast_Profile2.ipynb

@@ -0,0 +1,498 @@
+{


Please change

If a dataset has similar more than one discords, we should know how those discords are calculated in Matrix Profile.

to:

If a dataset has two similar subsequences that are far from the rest of subsequences, they may not be discovered as motif or discord by matrixprofile.

NimaSarajpoor · 2023-04-23T00:01:10Z

Let's see the entire dataset first.

NimaSarajpoor · 2023-04-23T00:01:10Z

Maybe remove the line Define index of desired instances on the purpose?

NimaSarajpoor · 2023-04-23T00:01:10Z

Line #5. desired0_idx, desired1_idx = 550, 2030
Please move this line to the top of the next code cell, where you show the desired instances. Also, please add one blank line after it (in the next code cell) for sake of readability.

NimaSarajpoor · 2023-04-23T00:01:11Z

(1) Is there extra space between the first two sentences?
(2) According to..., note that I replaced the dot with a comma
(3) please modify the second sentence as follows:
.... of premature ventricular contractions, with start index of 580 and 2030 (see figure below).

(4) please modify the third sentence as follows:
Those are the two instances we would like to discover.

NimaSarajpoor · 2023-04-23T00:01:11Z

please replace upper with top

NimaSarajpoor · 2023-04-23T00:01:11Z

I think you should replace the window with a vertical line in the matrix profile figure. For example, see the section Find potential anomalies (discords) using stump in this page: https://stumpy.readthedocs.io/en/latest/Tutorial_STUMPY_Basics.html

NimaSarajpoor · 2023-04-23T00:01:11Z

Please replace this sentence with:

"As shown in this figure, the motifs discovered by the matrix profile (of T_p) do not reveal our two desirable subsequences, the ones that were shown in the previous figure."

I think this can help readers understand why we need to use the contrast profile technique later to discover the two desirable subsequences.

NimaSarajpoor · 2023-04-23T00:01:11Z

Cool. I can better understand this figure. I think what you are trying to say is that the discord discovered by matrix profile is not what we are looking for. I suggest to do the following instead:

1 - compute the discord index and its nearest neighbor using matrix profile.
idx = np.argmax(MP_PP[:, 0])  # start index of discord
nn_idx = MP_PP[idx, 1]  # start index of nearest neighbor of discord

2 - show two figures only, and both of them should be T(+). In one, you plot T(+) and show the discord and its nearest neighbor subsequences, according to the computed idx and nn_idx. In another figure, you plot T(+) again but you show the desirable subsequences instead.

Please choose a proper title for each figure. For the first figure, we may say: "the discord and its nearest neighbor discovered by the matrix profile", and for the second figure, we may say: "the desirable subsequences of our interest"

Now, instead of four figures, we only show two figures, and I think it should be easily understandable. Note that the figure of matrix profile itself (the last two figures shown above) does not help that much. The third figure only shows where the matrix profile is maximum, and the fourth figure might be confusing. So, I think using two lines of code is better than showing two more figures here.

NimaSarajpoor · 2023-04-23T00:03:50Z

@ken-maeda
I provided my final touch on the notebook. After addressing the comments, we should see what @seanlaw thinks about this notebook.

If everything is okay, we can then move forward and add the part where you compute the contrast profile and show that it can discover the subsequences of our interest.

@ken-maeda
How do you feel about the progress so far?

ken-maeda · 2023-05-14T13:28:13Z

@NimaSarajpoor
I'm sorry for the delay. I have fixed the notebook at the points you mentioned. I hope the notebook I created is fine now.

NimaSarajpoor · 2023-05-15T01:40:40Z

@ken-maeda
Thank you for addressing the comments. While there is still some room for improvement, we can do it later. I think @seanlaw can take a look at the notebook now and see if he has any opinion / suggestion.

@seanlaw
Do you have any comment on the second notebook, docs/Tutorial_Contrast_Profile2.ipynb ?

seanlaw · 2023-05-15T10:55:29Z

Let me find some time to provide some comments

seanlaw · 2023-05-15T15:31:15Z

docs/Tutorial_Contrast_Profile2.ipynb

@@ -0,0 +1,485 @@
+{


There seems to be an extra space at the beginning of the title " Novel Time Series...". Please remove the space.
"matrix profile" are two separate words, not one. Please add a space.
"If a dataset has two similar subsequences that are far from the rest of subsequences": Can you please elaborate on this sentence? I am not getting your point. If they are similar enough then they may not be the top motif, but it is possible for them to be, say, in the top 50 motifs?
Usually, in the opening paragraph, we want to try and explain what the problem is that contrast profiles can help solve. Can you clearly explain the problem? It is okay to borrow from the published paper.

seanlaw · 2023-05-15T15:31:15Z

Can you please place the ECG data into one of the comments of the PR and then link to that uploaded CSV?

seanlaw · 2023-05-15T15:31:15Z

I don't think it is correct to call the grey areas "anomalies". Are they ever referred to as anomalies in the paper?

Edit: Below, you use the term "phenomena" and I think this is reasonable and suitable instead of "anomaly".

seanlaw · 2023-05-15T15:31:16Z

Why do we set normalize=False? Did they do that in the original paper?

Why are we looking for the discord index first? It seems like the wrong order of steps. I feel like the first thing that somebody would do is to:

1. Apply stump to the entire time series
2. Then they realize that the "interesting" subsequence pair isn't in their top motif(s)
3. Finally, we explain why a naive approach is insufficient to find the "interesting" subsequence pair

Then, we can motivate the question, "How would/could we find the "interesting" subsequence pair?" and then provide clear and concrete steps on how to do it. Otherwise, this section seems out of place or presented too soon.

I think we have to think about "what is the simplest thing that the user would have tried?" and then, when that fails, we build upon that knowledge and move past it with some better suggestion(s)

seanlaw · 2023-05-15T15:31:16Z

Why are we only showing the first 2 rows? What is the point?

ken-maeda · 2023-05-21T11:32:59Z

Apply stump to the entire time series
what is the simplest thing that the user would have tried?

Regarding the point whether it is natural to calculate the Matrix Profile for the entire series first, the characteristic we are trying to find this time is neither motif nor discord, so I feel there is no motivation to calculate stump directly. Therefore, even if you simply calculate it for the entire series, I don't think there is much to say from the event itself, so I compared when there is a single discord and when there are multiple discords, and built on what can be said from there.
It might be better to make it easy to understand what I'm trying to do from the very beginning.

seanlaw · 2023-05-21T17:31:32Z

Regarding the point whether it is natural to calculate the Matrix Profile for the entire series first, the characteristic we are trying to find this time is neither motif nor discord, so I feel there is no motivation to calculate stump directly. Therefore, even if you simply calculate it for the entire series, I don't think there is much to say from the event itself, so I compared when there is a single discord and when there are multiple discords, and built on what can be said from there.
It might be better to make it easy to understand what I'm trying to do from the very beginning.

I think the point is that stump will not be able to help you here precisely because the subsequence is neither a top motif nor a top discord. Certainly, if you traverse down the top-N motifs then you might eventually find it. I think it is important to motivate "why" computing the full matrix profile is not enough and also demonstrate its ineffectiveness for this particular problem (i.e., when the subsequence of interest does not have a nearest neighbor that is as close as other motifs)

ken-maeda · 2023-05-22T16:26:27Z

I think it is important to motivate "why" computing the full matrix profile is not enough and also demonstrate its ineffectiveness for this particular problem

1. It is challenging to set a goal prior to calculating the Matrix Profile across the whole data.
   When we have a signal pattern we want to find, searching for motifs or discords with the Matrix Profile may not seem natural. Users might question what to do in such cases and may find it unnatural to take action using the naive Matrix Profile.
2. Analyzing the result of applying the Matrix Profile to the whole data is difficult.
   As you mentioned, whether we can find what we're looking for largely depends on parts of the signal other than the current characteristic. Therefore, it's hard to say from the results what would be better from the perspective of the current characteristic.
3. Elements that should be explained in the introduction and elements that can be explained.
   I want to determine that.
   Currently, the overall flow is:
   3-1. If a discord is included once, it can be found.
   3-2. If a discord is included twice, it cannot be found.
   3-3. So, what should we do?
Regarding this flow, I thought that it would be better to write more concisely at the beginning about what the Contrast Profile brings. However, what do you think should be written in the introduction?

seanlaw · 2023-05-23T15:24:40Z

3-1. If a discord is included once, it can be found.
3-2. If a discord is included twice, it cannot be found.
3-3. So, what should we do?
Regarding this flow, I thought that it would be better to write more concisely at the beginning about what the Contrast Profile brings. However, what do you think should be written in the introduction?

If you look at the comments, I don't think "discord" or "anomaly" is the right word here as a discord is referring to "a subsequence that has a one-nearest neighbor that is very far away". In our example, the one nearest neighbor may not necessarily be very far away and, instead, it is that the subsequence of interest isn't discovered in the first few motifs. The paper refers to them as "phenomena of interest" and not "discord"

When I look at the contrast profile paper, it presents the problem as:

1. Imagine that you have a time series that usually has a reasonably well defined set of one or more repeating patterns and other subsequences are not expected.
2. Then, something occurs that induces a new subsequence (a phenomena) that has never been observed before AND it isn't quite a discord because the phenomena has repeated itself at least once more and while this nearest neighbor might be "close" to the first manifestation, it isn't as "close" to the nearest neighbor of previously known motifs (i.e., the repeating patterns in 1.)

So, you have some historical data where everything is well known and then you encounter an event that causes a new, never seen before subsequence AND it happens a second time. And so it becomes obvious to ask the question, "given how rare the phenomena is, how might we go about detecting it in a systematic way?"

The first (naive) thing that you might try is to compute the matrix profile for the entire time series, iterate through the top-N motifs via stumpy.motifs, and then consider how similar the ith motif is to the first i-1 motifs. However, there is a "better" way... Then introduce the contrast profile (including its assumptions, what it aims to do, and what it isn't/can't do).

ken-maeda · 2023-05-28T13:17:31Z

What I'm trying to say is that it doesn't make sense to me why we're calculating the matrix profile for the entire series when there's not enough reason to expect that the phenomena we're trying to discover will be detected as motifs. It feels like we're just saying, 'We tried calculating it and this is what happened.'

So, you have some historical data where everything is well known and then you encounter an event that causes a new, never seen before subsequence AND it happens a second time. And so it becomes obvious to ask the question, "given how rare the phenomena is, how might we go about detecting it in a systematic way?"

In this way, the phenomenon we wish to find this time is mentioned, but I believe no mention is made of the relationship with other intervals or motifs that occur in other intervals. Under these circumstances, would a user try to find it as a motif?

Particularly in this case, where other parts clearly have repetitive subsequence, it feels unnatural to calculate motifs to find the phenomena.

seanlaw · 2023-05-30T00:40:31Z

Particularly in this case, where other parts clearly have repetitive subsequence, it feels unnatural to calculate motifs to find the phenomena.

@ken-maeda If this is the case, you really need to explain this fact about the data at the beginning when you present the data. I don't think it is enough to assume that the reader will notice the repetition. Instead, you must point out the facts even if it is obvious. Then, you should explicitly explain that computing the full matrix profile won't really help you find the phenomena and why that is. Better yet, I was advocating that you just compute the matrix profile and show its limitations in that the phenomena won't be captured easily by the matrix profile (or using the motifs function).

ken-maeda · 2023-06-07T01:06:32Z

As you suggested, I tried it. Does this seem to be okay?

[Image: top-motif location by motifs]

We were able to find something close to the motif we want to find as the top 10th motif.

seanlaw · 2023-06-07T11:05:15Z

As you suggested, I tried it. Does this seem to be okay? We were able to find something close to the motif we want to find as the top 10th motif.

Yes! What code did you use to get that information? Was it stumpy.stump followed by stumpy.motifs?

So, I think it's important to explicitly point out/motivate that, in this example, we only know to search to the 10th motif because we are lucky enough to see/visualize where the phenomena is but we have no idea if it will be the 10th motif or the 100th motif. In the real world, the phenomena might be much harder to spot with the naked eye because the repetition within data might be much longer/noisier. And so we need a different way to solve this problem.

Maybe a poor-man's version would be to compare the ith motif with all of the motifs that were discovered before it and ask how different/similar it is? But this is painful and more art than science.
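To make the "poor-man's version" concrete, here is a minimal sketch of that comparison. It assumes the motifs have already been extracted and ranked (for example with stumpy.stump followed by stumpy.motifs) and are passed in as plain lists. The helper names (`znorm`, `znorm_dist`, `novelty_scores`) and the toy motifs are hypothetical and only for illustration.

```python
import math

def znorm(x):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]

def znorm_dist(a, b):
    """Z-normalized Euclidean distance between two equal-length sequences."""
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))

def novelty_scores(ranked_motifs):
    """For the ith motif, return the distance to its closest earlier motif.

    The first motif has nothing to be compared with, so its score is None.
    """
    scores = [None]
    for i in range(1, len(ranked_motifs)):
        scores.append(
            min(znorm_dist(ranked_motifs[i], ranked_motifs[j]) for j in range(i))
        )
    return scores

# Hypothetical ranked motifs (toy data, not taken from the tutorial)
ranked = [
    [0, 1, 0, -1, 0, 1],  # top motif: the repeating pattern
    [0, 2, 0, -2, 0, 2],  # same shape, only scaled
    [0, 0, 3, 3, 0, 0],   # a different shape
]
scores = novelty_scores(ranked)
print(scores)
```

A score near zero means the ith motif is just another copy of something already seen, while a large score flags a candidate phenomenon. Deciding what counts as "large" is still a manual judgment call, which is the "more art than science" part.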

ken-maeda · 2023-06-12T13:25:09Z

mp = stumpy.stump(T, m, normalize=False)
motifs = stumpy.motifs(T, mp[:, 0])

I simply computed the example above from these motifs.
Considering the following flow for the introduction, how far do you think it's appropriate to explain?

1. "By naively using stump, you can discover motifs and discords.
2. In this case, the signals we want to find are similar to each other, so it should be possible to find them as one of the top motifs when we compute stump.
3. The signals we want to find are clearly characterized by their size (amplitude), so we compute stump with normalize=False rather than the default normalize=True.
4. They turn up as the 10th top motif.
5. However, in reality, we don't know at which rank they will turn up. There is a more reliable way to find signals with these characteristics. This can be achieved with a concept called the Contrast Profile, which I would like to explain in detail."
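For readers who want to see the idea behind step 5, the following is a brute-force sketch using only the standard library. It assumes the contrast profile is the difference between the AB-join matrix profile of T(+) against T(-) and the self-join matrix profile of T(+). Any scaling or clipping applied in the paper is left out. The function names, the toy series, and the exclusion zone of ceil(m / 4) are assumptions made for this illustration; a real notebook would compute both matrix profiles with stumpy.stump instead.

```python
import math

def znorm(x):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]

def dist(a, b):
    """Z-normalized Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(znorm(a), znorm(b))))

def subsequences(T, m):
    return [T[i:i + m] for i in range(len(T) - m + 1)]

def self_join_mp(T, m):
    """Brute-force self-join matrix profile, skipping trivial matches."""
    subs = subsequences(T, m)
    excl = math.ceil(m / 4)  # assumed exclusion zone around each index
    return [
        min(dist(a, b) for j, b in enumerate(subs) if abs(i - j) > excl)
        for i, a in enumerate(subs)
    ]

def ab_join_mp(T_A, T_B, m):
    """For each subsequence of T_A, distance to its nearest neighbor in T_B."""
    subs_B = subsequences(T_B, m)
    return [min(dist(a, b) for b in subs_B) for a in subsequences(T_A, m)]

def contrast_profile(T_pos, T_neg, m):
    """Raw difference MP(+-) minus MP(++), without scaling or clipping."""
    mp_pp = self_join_mp(T_pos, m)
    mp_pn = ab_join_mp(T_pos, T_neg, m)
    return [pn - pp for pn, pp in zip(mp_pn, mp_pp)]

# Toy data: a repeating background pattern plus a phenomenon seen twice in T(+)
m = 4
background = [0, 1, 2, 1]
phenomenon = [0, 4, 4, 0]
T_neg = background * 6
T_pos = background * 3 + phenomenon + background * 3 + phenomenon + background * 3
p1, p2 = 12, 28  # start indices of the two phenomenon occurrences in T_pos

cp = contrast_profile(T_pos, T_neg, m)
top = max(range(len(cp)), key=cp.__getitem__)
print(top, round(cp[p1], 3))
```

In this toy case the background subsequences score zero because they have an exact match in both T(+) and T(-), while the subsequences overlapping the phenomenon score high: they repeat within T(+) but have no close match in T(-). No guess about the motif rank is needed.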

seanlaw · 2023-06-12T13:42:34Z

@ken-maeda I think this level of detail is more than fine. However, I think it is important to point out that there is repetition in the data and so finding the "phenomena" (only repeated once) is challenging since the majority of the top motifs will look like the repeated pattern.

ken-maeda · 2023-06-12T13:51:48Z

@seanlaw OK, I'm going to create another notebook from the introduction, following your new advice.

seanlaw · 2023-06-14T10:24:46Z

docs/Tutorial_Contrast_Profile3_introduction.ipynb

@@ -0,0 +1,226 @@
+{


In all of our tutorials, we try to reproduce specific figures from the original paper (without alteration). Which figure is this reproducing from the paper? This looks different from Figure 4. Did you alter the time series? If yes, can you please use the same time series as provided in the original paper? The goal here is reproducibility of the work.

Also, the original paper does not say anything about setting normalize=False and so we probably shouldn't confuse the reader by doing it here either.

The introduction I'm currently creating is a reproduction of Figure 2, not Figure 4. We have discussed up to this point that the original figure is abstract and hard to understand, so it's necessary to explain what it means with an actual signal. Therefore, I'm using parts of the original data that were not used in the paper but are suitable for this explanation. So, the normalization issue is not something that can be referenced from the paper. If that is a problem, I think it could have been pointed out before I created the notebook. I checked the flow of the introduction above precisely to prevent this kind of confusion, and this feedback is inconsistent with our past discussion.

As for Figure 4 and the other examples, I have created them according to the paper in the notebooks so far.

ken-maeda and others added 8 commits September 6, 2022 00:12

Matrix Profile Top Ten draft

d692b1a

Merge branch 'TDAmeritrade:main' into main

92ed8cb

Merge branch 'TDAmeritrade:main' into main

9a33699

Merge branch 'TDAmeritrade:main' into main

8f7bf2e

Merge branch 'TDAmeritrade:main' into main

4ab7a31

init

51e73f2

Merge branch 'TDAmeritrade:main' into contrast_abs

a71dc69

change file name

a71dcdf

sync

05f5d80

fix md

fcf15a2

recreate book

1565ebb

NimaSarajpoor reviewed Mar 9, 2023

fix docs

c3bd216

Merge branch 'TDAmeritrade:main' into contrast_abs

f1b5331

ken-maeda requested a review from NimaSarajpoor March 23, 2023 15:38

NimaSarajpoor reviewed Mar 24, 2023

ken-maeda added 2 commits March 27, 2023 20:49

fix md

d209d1c

Merge branch 'contrast_abs' of https://github.com/ken-maeda/stumpy into contrast_abs

8ca2bdb

fix last fig, docs

0c06f45

NimaSarajpoor reviewed Apr 23, 2023

fix docs, plots

033ae5b

seanlaw reviewed May 15, 2023

[ADD] notebook version3

5407968

seanlaw reviewed Jun 14, 2023
