Content-Length: 449425 | pFad | https://github.com/numpy/numpy/issues/10448

81 BUG interp wrong if input xp isn't sorted · Issue #10448 · numpy/numpy · GitHub
Skip to content

BUG interp wrong if input xp isn't sorted #10448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
nschloe opened this issue Jan 21, 2018 · 20 comments
Open

BUG interp wrong if input xp isn't sorted #10448

nschloe opened this issue Jan 21, 2018 · 20 comments
Labels
57 - Close? Issues which may be closable unless discussion continued component: numpy.lib

Comments

@nschloe
Copy link
Contributor

nschloe commented Jan 21, 2018

MWE:

import numpy

xp = [0.0, 1.0]
fp = [0.2, 0.3]
print(numpy.interp(0.5, xp, fp))  # 0.25, correct

xp = [1.0, 0.0]
fp = [0.3, 0.2]
print(numpy.interp(0.5, xp, fp))  # 0.2, WRONG
@mhvk
Copy link
Contributor

mhvk commented Jan 21, 2018

The docstring explicitly states:

xp : 1-D sequence of floats
    The x-coordinates of the data points, must be increasing if argument
    `period` is not specified. Otherwise, `xp` is internally sorted after
    normalizing the periodic boundaries with ``xp = xp % period``.

The description is accurate: sorting does happen if a period is specified:

np.interp(0.5, xp, fp, period=np.inf)
# 0.25

But perhaps we should make np.interp better... At least, it would not be unreasonable to expect it to work for monotonously increasing or decreasing values. Sorting should definitely be optional, as it is quite slow. And I guess even now one could add to the docstring that if sorting is wanted for non-periodic data, passing period=np.inf works.

@eric-wieser
Copy link
Member

At least, it would not be unreasonable to expect it to work for monotonously increasing or decreasing values.

I'm not a huge fan of this, as it adds runtime to interp to check that an already-sorted sequence is sorted. For similar reasons, I'd avoid using np.digitize(x, bins), and use np.searchsorted(bins, x, side='right') instead, which is the same but without the redundant check.

Essentially, either your data is:

  • In a random order, which is unlikely to be meaningful to interp
  • Sorted in a known increasing/decreasing order, in which case requiring a manual reverse would be valuable
  • Sorted in one unknown order - in this case, the caller can still do a faster job of detecting this by just comparing a[0] with a[-1], taking advantage of the knowledge that it is sorted.

Is it possible to trivially check in searchsorted that the array is actually sorted during the iteration?

@mhvk
Copy link
Contributor

mhvk commented Jan 21, 2018

@eric-wieser - the only reason I suggested it is that it would be a very cheap addition (to searchsorted I guess). But I agree with your point that it is equally trivial for the caller to determine it.

@nschloe
Copy link
Contributor Author

nschloe commented Jan 21, 2018

Essentially, either your data is: [...]

Good points. Another option would be to raise an exception is the input data is not sorted as expected. The check

numpy.all(a[1:] > a[:-1])

is pretty cheap.

@pwolfram
Copy link
Contributor

pwolfram commented Aug 3, 2018

I'd concur that this is a big problem. I don't recall that np.interp always worked this way. It can easily introduce bugs in code if x is monotonically decreasing instead of increasing:

In [113]: np.interp(xp, x, y)
Out[113]: -1e+34

In [114]: np.interp(-xp, -x, y)
Out[114]: 5.274509803921655

Thankfully this isn't a subtle bug... this time. Throughout this process I probably read the documentation in ipython 2-4 times via np.interp? There was nothing to suggest that monotonicity was required although in hindsight I should have been suspicous from

xp : 1-D sequence of floats
    The x-coordinates of the data points, must be increasing if argument
    `period` is not specified. Otherwise, `xp` is internally sorted after
    normalizing the periodic boundaries with ``xp = xp % period``.

Even a simple assert that a[0] < a[-1] would be a good idea to prevent this type of behavior from happening but I believe the doc string probably needs editing for clarity.

I don't believe many people think of an interpolant as requiring a monotonic increase, e.g., https://www.mathworks.com/help/matlab/ref/interp1.html, so requiring this implicitly without any warning when there is an error (which could produce a very, very nasty bug) is dangerous to the community at large using this function.

If runtime is this important, perhaps the function should be renamed to np.incrinterp and there could be a similar np.decrinterp function. But, I suppose that https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html is anticipated to the be scientific workhorse.

Perhaps something an edit to the docstring would be good from the current

One-dimensional linear interpolation.

Returns the one-dimensional piecewise linear interpolant to a function
with given values at discrete data-points.

to

One-dimensional linear interpolation, requiring monotonically increasing data.

Returns the one-dimensional piecewise linear interpolant to a function
with given values at discrete data-points that are monotonically increasing in
the independent variable.

@pwolfram
Copy link
Contributor

pwolfram commented Aug 3, 2018

@eric-wieser, thanks in advance for your consideration of this request to clarify np.interp in some way to help others in the future avoid bugs and save time.

@pwolfram
Copy link
Contributor

pwolfram commented Aug 3, 2018

Most simply, I'd recommend moving

Does not check that the x-coordinate sequence `xp` is increasing.

from the notes to the main doc string so this is one of the first things folks see. This won't affect runtime and will get the key message across clearly.

@euronion
Copy link
Contributor

So I ran into (nearly) this problem and it caused me headaches to find out what was causing the unexpected behaviour. It turned out I provided a monotonically increasing (but not strictly) xp to np.interp:

MWE

import numpy as np
x = np.array([0, 1, 1])
y = np.array([0, 1, 0])

print(np.interp(x, x, y))

Output

array([0., 0., 0.])

I honestly would have expected to get an error on this from numpy before I found the discussion. I had skimmed the documentation beforehand but did assume increasing not meaning strictly increasing .
Would it be possible to include the strictly in the documentation as well and a warning that this is not beeing checked for? :)

@mhvk
Copy link
Contributor

mhvk commented Feb 28, 2019

@euronion - it is not surprising the code doesn't know what to do if you try to interpolate through a bi-valued function... Still, since equal-valued is such a corner case, adding "strictly" to the documentation is not a bad idea - could you perhaps make a quick PR? The github interface should suffice for this (relevant file is numpy/lib/funcion_base.py)

euronion added a commit to euronion/numpy that referenced this issue Feb 28, 2019
Adding 'strictly' to make the conditions to be met by the `xp` argument even more clear.
Following the suggestion in numpy#10448 (comment) .
@euronion
Copy link
Contributor

@mhvk - Sure thing. I hope I did not violate to many guidelines for commits and PRs .

Regarding the topic:
I would personally prefer to have numpy check for this, even with the performance penalty.

@mhvk
Copy link
Contributor

mhvk commented Feb 28, 2019

THanks, note the PR must be to the numpy branch!

On topic: I quite strongly prefer not checking (the check is not cheap compared to the operation), but can see an argument for at least having an auto-sort option. Indeed, period=np.inf will do that for you (though it gives a warning).

@euronion
Copy link
Contributor

THanks, note the PR must be to the numpy branch!

My mistake, sorry.

Indeed, period=np.inf will do that for you (though it gives a warning).

Will keep this in mind, thanks for pointing it out.

@coroa
Copy link

coroa commented Mar 1, 2019

import numpy as np
x = np.array([0, 1, 1])
y = np.array([0, 1, 0])

print(np.interp(x, x, y))

Output

array([0., 0., 0.])

Note that the result is correct under the assumption that you are trying to interpolate a right-continuous function, see also my comment at scipy/scipy#9886 (comment).

@akiss-ic
Copy link

It seems that there is hesitation to provide checking for this very insidious behavior because of impact on runtime. Could checking for increasing monotonicity not be enabled by an argument? I feel that the vast majority of use cases, the bugs produced by this behavior is more detrimental to the runtime hit, but if runtime is of critical importance to the use case, one could disable checking by passing in a simple check_monotonicty argument boolean.

@jorenham
Copy link
Member

jorenham commented May 29, 2025

The docs are pretty clear about this now. The first line reads:

One-dimensional linear interpolation for monotonically increasing sample points.

And the xp parameter is described as

The x-coordinates of the data points, must be increasing if argument period is not specified. Otherwise, xp is internally sorted after normalizing the periodic boundaries with xp = xp % period.

Then there's also an explicit warning:

The x-coordinate sequence is expected to be increasing, but this is not explicitly enforced. However, if the sequence xp is non-increasing, interpolation results are meaningless.

So this behavior is explicitly mentioned in three different places, making it pretty hard to miss.


Could checking for increasing monotonicity not be enabled by an argument? I feel that the vast majority of use cases, the bugs produced by this behavior is more detrimental to the runtime hit, but if runtime is of critical importance to the use case, one could disable checking by passing in a simple check_monotonicty argument boolean.

If you don't know if your data is sorted, then why not just use something like np.sort(xp, stable=True)? I doubt that a check_monotonicty=True would be faster for sorted arrays, and will crash your program otherwise. So I don't see any benefit in adding it.

@jorenham jorenham added the 57 - Close? Issues which may be closable unless discussion continued label May 29, 2025
@akiss-ic
Copy link

akiss-ic commented May 29, 2025

Yes I am aware that the docs call it out, but users are not looking at the docs every single time they use the function.

A frequent use of interp is in routine data analysis and one-off small jobs, not just as part of larger codebases. Very frequently you are getting data from various sources which may or may not be monotonic and/or may or may not be monotonically increasing. When doing data analysis, you're thinking about the transformations and analysis, not the mechanics of numpy. It is very easy to forget that interp demands a strictly monotonic increasing x when you're focused on the analysis. The fact that it gives you often reasonable, but very wrong values is very deceiving. At best this causes frustration and takes debugging time until you realize that it is an issue with the data and interp and go and rectify it, at worst you believe the data and that causes a host of downstream issues. I have hit this issues countless times when doing data analysis.

The fact that other interpolating methods that require strict increasing monotonicity do this check by default, e.g.scipy.interpolate.CubicSpline and throw a ValueError when it is not met shows that many communities and users feel this is an important feature.

@jorenham
Copy link
Member

Yes I am aware that the docs call it out, but users are not looking at the docs every single time they use the function.

Adding an optional keyword argument requires you to be aware of monotonicity requirement too, so that won't solve this problem.


But like I said before, if your data might not be sorted, then you should probably just sort it.

But if you don't want to that for whatever reason, then I'd suggest using an assert instead:

assert np.all(np.sort(xp, stable=True) == xp)

I would consider this to be more idiomatic than the keyword argument you suggested.

@mhvk
Copy link
Contributor

mhvk commented May 29, 2025

Agreed with @jorenham - also because there are more than a few cases where sorting the data just gives a different wrong answer (e.g., if xp, yp are smooth but non-monotonic - sorting then definitely won't help, you would actually have to specify the branch).

@akiss-ic
Copy link

akiss-ic commented May 29, 2025

Well I would've suggested the check occur by default as it is done with other packages, but I know that would be a breaking change.

Fine, just wanted to voice my opinion and agree with the others earlier in this thread who voiced similar sentiments. I see that this is not a popular one among the maintainers of numpy.

@akiss-ic
Copy link

By the way, I do think this issue has become more relevant as Scipy has moved its interp1d function to legacy status and now explicitly directs users to numpy interp. interp1d has no requirement on monotonicity whatsoever, so former users of that functionality will not be expecting such a requirement. Yes they should read the docs first before using a new function, but we all know that many will look at the call signature and use it blindly.

Furthermore their new page on 1D interpolation with an example of using np.interp for 1D linear interpolation does not mention the monotonicity requirement at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
57 - Close? Issues which may be closable unless discussion continued component: numpy.lib
Projects
None yet
Development

No branches or pull requests

9 participants








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: https://github.com/numpy/numpy/issues/10448

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy