`copilot-theorem`: Produce a counterexample when the What4 backend falsifies a property #589

RyanGlScott · 2025-01-29T19:43:13Z

Description

Currently, the Copilot.Theorem.What4.prove function returns a list of results, where each result contains a SatResult that describes whether a property is Valid, Invalid, or Unknown. The Invalid result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean.

It would be helpful if Copilot.Theorem.What4 could offer an API to prove or disprove a property such that disproven properties come with a concrete counterexample. This counterexample information could then be interpreted by users.

Type

Feature: Add counterexample capabilities to the What4 backend in copilot-theorem.

Additional context

None.

Requester

Ryan Scott (Galois).

Method to check presence of bug

Not applicable (not a bug).

Expected result

Introduce a new function to Copilot.Theorem.What4 that mirrors the type signature of prove, except that it returns a variant of SatResult where the Invalid equivalent encodes counterexample information. copilot-theorem users can then interpret the results of the counterexample in Copilot specifications.

Desired result

Introduce a new function to Copilot.Theorem.What4 that mirrors the type signature of prove, except that it returns a variant of SatResult where the Invalid equivalent encodes counterexample information. copilot-theorem users can then interpret the results of the counterexample in Copilot specifications.

Proposed solution

Introduce a new prove' :: Solver -> Spec -> IO [(Name, SatResult')] function (names subject to change during review), where SatResult' is defined to be:

data SatResult' = Valid' | Invalid' CounterExample | Unknown'

And CounterExample records enough information about a concrete counterexample such that a Copilot user could display it.
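
To make the proposal concrete, here is a minimal, self-contained sketch of how a caller might consume such a result. It deliberately does not import copilot-theorem. The `cexInputs` field and the `describe` helper are hypothetical, since the issue leaves the contents of `CounterExample` open.

```haskell
-- Standalone model of the proposed result type; this is NOT the
-- copilot-theorem API. The contents of 'CounterExample' are an assumption.
type Name = String

-- Hypothetical counterexample: for each external input, the values it takes
-- at successive time steps in a trace that falsifies the property.
newtype CounterExample = CounterExample
  { cexInputs :: [(Name, [Integer])] }

-- The result type from the proposal above.
data SatResult' = Valid' | Invalid' CounterExample | Unknown'

-- Render one named result the way a user-facing report might.
describe :: (Name, SatResult') -> String
describe (nm, res) = nm ++ ": " ++ verdict
  where
    verdict = case res of
      Valid'       -> "valid"
      Unknown'     -> "unknown"
      Invalid' cex -> "invalid; falsified by " ++ show (cexInputs cex)
```

Loaded in GHCi, `map describe results` turns a list of named results into report lines. The point of the design is that only the invalid case carries extra data, so existing code that only cares about validity can ignore it.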

Further notes

None.

ivanperez-keera · 2025-02-13T16:09:04Z

Change Manager: Confirmed that the issue exists.

ivanperez-keera · 2025-02-13T16:09:19Z

Technical Lead: Confirmed that the issue should be addressed.

ivanperez-keera · 2025-02-13T16:10:32Z

Technical Lead: Issue scheduled for fixing in Copilot 4.3.

Fix assigned to: @RyanGlScott.

Lacking a `Show` instance makes it difficult to display type information in panic messages. This commit derives a basic `Show` instance so that `copilot-theorem` can display `Type`s whenever an internal invariant is violated.

…Copilot-Language#589. Previously, the `XEmptyArray` and `XArray` data constructors did not record evidence that their array element types were instances of the `Typed` class, which made it impossible to use in contexts where `Typed` is required. This commit adds the necessary constraints to each data constructor to make their array elements `Typed`.

Copilot-Language#589. Previously, the `valFromExpr` function was lacking cases for `XEmptyArray` and `XArray`, so it would fail if the function was called on these values. This commit adds these missing cases, which make use of the `Typed` evidence added to `XEmptyArray` and `XArray` in a previous commit. We do not yet add a case for structs, which prove more challenging. This is an internal-only refactoring that should not have any changes in user-facing behavior.

…t-Language#589. `CounterExample` was not exported from `Copilot.Theorem.What4`, and moreover, its current definition was insufficient to capture all of the information that would be desirable from a counterexample. This commit removes `CounterExample` in preparation for a follow-up commit that adds an improved version of `CounterExample`. As `CounterExample` was not exported, this change is purely an internal refactoring.

…nguage#589. This is an internal refactoring where the body of the `prove` function is split out into a separate `proveInternal` function. In a subsequent commit, we will use `proveInternal` to implement a variant of `prove` that returns more information.

…opilot-Language#589. Previously, the `CopilotValue` data type lacked `Show` and `ShowF` instances, which made it impractical to display its values. This commit adds a `Show` instance so that a value of type `CopilotValue a` can be shown, and it also adds a `ShowF` instance so that a value of type `Some CopilotValue` can be shown.
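
The reason a separate `ShowF` instance is needed for `Some CopilotValue` can be illustrated with a small standalone model. The `Some` and `ShowF` definitions below are simplified stand-ins for the ones in the parameterized-utils library, and `Val` is a hypothetical type-indexed value, not the real `CopilotValue`.

```haskell
{-# LANGUAGE GADTs #-}

-- Simplified stand-in for 'Some': hides the type index of its payload.
data Some f where
  Some :: f a -> Some f

-- Simplified stand-in for 'ShowF': show a value for every possible index.
class ShowF f where
  showF :: f a -> String

-- Once the index is hidden, an ordinary 'Show (f a)' instance cannot be
-- selected, so showing 'Some f' goes through 'ShowF' instead.
instance ShowF f => Show (Some f) where
  show (Some x) = showF x

-- Hypothetical type-indexed value (an assumption for illustration).
data Val a where
  ValBool :: Bool -> Val Bool
  ValInt  :: Int  -> Val Int

instance ShowF Val where
  showF (ValBool b) = "ValBool " ++ show b
  showF (ValInt n)  = "ValInt " ++ show n

-- A heterogeneous collection, as a counterexample might contain.
mixed :: [Some Val]
mixed = [Some (ValBool True), Some (ValInt 3)]
```

Evaluating `show mixed` in GHCi displays both values even though they have different type indices.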

Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. This introduces a new `prove'` function to `Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that it returns a variant of `SatResult` where the `Invalid` equivalent encodes counterexample information. `copilot-theorem` users can then interpret the results of the counterexample in Copilot specifications.

…#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. To define a companion function to `prove` that also returns counterexample information upon a failed proof, it is convenient to be able to display `Type` information in panic messages. This commit derives a basic `Show` instance for `Type` so that `copilot-theorem` can display them whenever an internal invariant is violated.

…Copilot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. To update the `valFromExpr` function in order to produce concrete array values for counterexample purposes, we need to call the `Array` data constructor, which has a `Typed` constraint. However, the `XEmptyArray` and `XArray` data constructors do not record evidence that their array element types were instances of the `Typed` class, which makes it impossible to use them in `valFromExpr`. This commit adds the necessary constraints to each data constructor to make their array elements `Typed`.

Copilot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. The `valFromExpr` function (which produces concrete values when making a counterexample) was lacking cases for `XEmptyArray` and `XArray`, so it would fail if the function was called on these values. This commit adds these missing cases, which make use of the `Typed` evidence added to `XEmptyArray` and `XArray` in a previous commit. We do not yet add a case for structs, which prove more challenging.
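
The technique described in these two commits, storing class evidence in a data constructor so that a later pattern match can use it, can be sketched in isolation. This is a simplified model under stated assumptions: `TypedLike`, `Sym`, `Concrete`, and `concretize` are hypothetical stand-ins for `Typed`, the `XArray`-style constructors, the `Array` constructor, and `valFromExpr`, and they do not reproduce the real definitions.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical stand-in for the 'Typed' class.
class Show a => TypedLike a where
  typeName :: proxy a -> String

instance TypedLike Bool where
  typeName _ = "Bool"

instance TypedLike Int where
  typeName _ = "Int"

-- Stand-in for the solver-side expression type. The constraint on
-- 'SymArray' is the evidence that the commit adds to the constructor.
data Sym where
  SymBool  :: Bool -> Sym
  SymArray :: TypedLike a => [a] -> Sym

-- Stand-in for the concrete value type, whose array constructor
-- demands the class instance.
data Concrete where
  ConBool  :: Bool -> Concrete
  ConArray :: TypedLike a => [a] -> Concrete

-- Matching on 'SymArray' brings the stored instance into scope, which is
-- what makes the call to 'ConArray' type-check.
concretize :: Sym -> Concrete
concretize (SymBool b)   = ConBool b
concretize (SymArray xs) = ConArray xs

render :: Concrete -> String
render (ConBool b)   = show b
render (ConArray xs) = "Array of " ++ typeName xs ++ ": " ++ show xs
```

If the `TypedLike a` constraint were dropped from `SymArray`, the second equation of `concretize` would be rejected, because nothing would be known about the element type at that point.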

…properties. Refs Copilot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. This introduces a new `proveWithCounterExample` function to `Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that it returns a variant of `SatResult` (`SatResultCex`) where the `Invalid` equivalent (`InvalidCex`) encodes counterexample information. `copilot-theorem` users can then interpret the results of the counterexample in Copilot specifications. As part of this commit, we change the definition of the `CounterExample` data type. This is safe to do, as `CounterExample` was neither used nor exported prior to this commit.

Copilot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. The `CounterExample`, `SatResultCex`, and `CopilotValue` data types lack `Show` and `ShowF` instances, which makes it impractical for users to display them. This commit adds `Show` and `ShowF` instances for all three data types so that they can be shown.

…ilot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. A prior commit has introduced a `proveWithCounterExample` function, which provides a counterexample when a property is proven invalid. This commit updates the test suite to ensure that basic uses of `proveWithCounterExample` work as intended.

…lot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. To demonstrate how to effectively use the newly added `proveWithCounterExample` function, this commit adds a new `examples/what4/ArithmeticCounterExamples.hs` example that behaves like `examples/what4/Arithmetic.hs`, but uses `proveWithCounterExample` instead of `prove`.

…#589.

RyanGlScott · 2025-02-24T22:43:40Z

Implementor: Solution implemented, review requested.

ivanperez-keera · 2025-02-28T03:14:33Z

Change Manager: Verified that:

Solution is implemented:

The code proposed compiles and passes all tests. Details:
Build log: https://github.com/Copilot-Language/copilot/runs/37925988037

The solution proposed produces the expected result. Details:
The following Dockerfile runs an example included in the PR that demonstrates the new feature by printing the results of proving some combination of valid and invalid properties, for which counterexamples are produced when applicable. I have checked by visual inspection that the results are produced, and that the message is well formatted and informative enough:

FROM ubuntu:focal

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update

RUN apt-get install --yes \
      libz-dev \
      git \
      curl \
      gcc \
      g++ \
      make \
      libgmp3-dev  \
      pkg-config \
      z3

RUN mkdir -p $HOME/.ghcup/bin
RUN curl https://downloads.haskell.org/~ghcup/0.1.19.2/x86_64-linux-ghcup-0.1.19.2 -o $HOME/.ghcup/bin/ghcup
RUN chmod a+x $HOME/.ghcup/bin/ghcup
ENV PATH=$PATH:/root/.ghcup/bin/
ENV PATH=$PATH:/root/.cabal/bin/

SHELL ["/bin/bash", "-c"]

RUN ghcup install ghc 9.10
RUN ghcup install cabal 3.2
RUN ghcup set ghc 9.10
RUN cabal update

SHELL ["/bin/bash", "-c"]
CMD git clone $REPO && cd $NAME && git checkout $COMMIT && cd .. \
  && cabal v1-sandbox init \
  && cabal v1-install alex happy --constraint='happy <= 2' \
  && cabal v1-install $NAME/copilot**/ \
  && cabal v1-exec -- runhaskell $NAME/copilot/examples/what4/ArithmeticCounterExamples.hs \
  && echo "Success"

Command (substitute variables based on new path after merge):

$ docker run -e "REPO=https://github.com/GaloisInc/copilot-1" -e "NAME=copilot-1" -e "COMMIT=9861071718850fb8ef85e533a6c0d6d083f69094" -it copilot-verify-589

Implementation is documented. Details:
All new top-level definitions include Haddock documentation, as do the internals of the code and the examples.
Change history is clear.
Commit messages are clear.
Changelogs are updated.
Examples are updated. Details:
A new example is introduced to demonstrate the feature.
Required version bumps are evaluated. Details:
Bump required; the new feature changes the types of some value constructors in copilot-theorem, thus affecting the public API.

ivanperez-keera · 2025-02-28T03:14:40Z

Change Manager: Implementation ready to be merged.

ivanperez-keera added CR:Type:Feature Admin only: Change request pertaining to new features requested CR:Status:Initiated Admin only: Change request that has been initiated labels Feb 13, 2025

ivanperez-keera added CR:Status:Confirmed Admin only: Change request that has been acknowledged by the change manager and removed CR:Status:Initiated Admin only: Change request that has been initiated labels Feb 13, 2025

ivanperez-keera added CR:Status:Accepted Admin only: Change request accepted by technical lead and removed CR:Status:Confirmed Admin only: Change request that has been acknowledged by the change manager labels Feb 13, 2025

ivanperez-keera added CR:Status:Scheduled Admin only: Change requested scheduled and removed CR:Status:Accepted Admin only: Change request accepted by technical lead labels Feb 13, 2025

ivanperez-keera added this to the 4.3 milestone Feb 13, 2025

ivanperez-keera assigned RyanGlScott Feb 13, 2025

ivanperez-keera added CR:Status:Implementation Admin only: Change request that is currently being implemented and removed CR:Status:Scheduled Admin only: Change requested scheduled labels Feb 13, 2025

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 14, 2025

copilot-core: Document changes in CHANGELOG. Refs Copilot-Language#589.

4d974ea

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 14, 2025

copilot-theorem: Document changes in CHANGELOG. Refs Copilot-Language…

76e3773

…#589.

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025

copilot-core: Document changes in CHANGELOG. Refs Copilot-Language#589.

5a7a14d

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025

copilot-theorem: Document changes in CHANGELOG. Refs Copilot-Language…

0b308e1

…#589.

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025

copilot: Document changes in CHANGELOG. Refs Copilot-Language#589.

2321709

RyanGlScott mentioned this issue Feb 24, 2025

copilot-theorem: Add function to produce counterexamples for invalid properties. Refs #589. #595

Merged

ivanperez-keera added CR:Status:Verification Admin only: Change request that is currently being verified and removed CR:Status:Implementation Admin only: Change request that is currently being implemented labels Feb 25, 2025

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 25, 2025

copilot-core: Document changes in CHANGELOG. Refs Copilot-Language#589.

3f26523

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 25, 2025

copilot-theorem: Document changes in CHANGELOG. Refs Copilot-Language…

c3a9be4

…#589.

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 25, 2025

copilot: Document changes in CHANGELOG. Refs Copilot-Language#589.

2aaeea1

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 27, 2025

copilot-core: Document changes in CHANGELOG. Refs Copilot-Language#589.

a3d2310

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 27, 2025

copilot-theorem: Document changes in CHANGELOG. Refs Copilot-Language…

134bbdc

…#589.

RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 27, 2025

copilot: Document changes in CHANGELOG. Refs Copilot-Language#589.

9861071

ivanperez-keera closed this as completed in f39b9da Feb 28, 2025

ivanperez-keera added CR:Status:Closed Admin only: Change request that has been completed and removed CR:Status:Verification Admin only: Change request that is currently being verified labels Feb 28, 2025
