copilot-theorem: Produce a counterexample when the What4 backend falsifies a property #589

Closed
RyanGlScott opened this issue Jan 29, 2025 · 6 comments · Fixed by #595
Labels
CR:Status:Closed Admin only: Change request that has been completed CR:Type:Feature Admin only: Change request pertaining to new features requested

@RyanGlScott
Collaborator

RyanGlScott commented Jan 29, 2025

Description

Currently, the Copilot.Theorem.What4.prove function returns a list of results, where each result contains a SatResult that describes whether a property is Valid, Invalid, or Unknown. However, the Invalid result does not give any information about a specific counterexample that could drive Copilot into falsifying the property. This makes it challenging to interpret what the results of prove mean.

It would be helpful if Copilot.Theorem.What4 could offer an API to prove or disprove a property such that disproven properties come with a concrete counterexample. This counterexample information could then be interpreted by users.

Type

  • Feature: Add counterexample capabilities to the What4 backend in copilot-theorem.

Additional context

None.

Requester

  • Ryan Scott (Galois).

Method to check presence of bug

Not applicable (not a bug).

Expected result

Introduce a new function to Copilot.Theorem.What4 that mirrors the type signature of prove, except that it returns a variant of SatResult where the Invalid equivalent encodes counterexample information. copilot-theorem users can then interpret the results of the counterexample in Copilot specifications.

Desired result

Introduce a new function to Copilot.Theorem.What4 that mirrors the type signature of prove, except that it returns a variant of SatResult where the Invalid equivalent encodes counterexample information. copilot-theorem users can then interpret the results of the counterexample in Copilot specifications.

Proposed solution

Introduce a new prove' :: Solver -> Spec -> IO [(Name, SatResult')] function (names subject to change during review), where SatResult' is defined to be:

data SatResult' = Valid' | Invalid' CounterExample | Unknown'

Here, CounterExample records enough information about a concrete counterexample for a Copilot user to display it.
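
As a rough illustration of how such a result type could be consumed, here is a minimal, self-contained sketch. Only the three-constructor shape of SatResult' comes from the proposal; the fields of CounterExample (cexStep, cexInputs) and the describeResult helper are hypothetical placeholders, since the issue deliberately leaves the contents of CounterExample open.

```haskell
-- Sketch only: the fields of 'CounterExample' are assumptions made for
-- illustration; the proposal does not specify them.
data CounterExample = CounterExample
  { cexStep   :: Int                -- hypothetical: time step of the violation
  , cexInputs :: [(String, String)] -- hypothetical: (extern name, rendered value)
  } deriving (Eq, Show)

-- Result type with the shape given in the proposal.
data SatResult' = Valid' | Invalid' CounterExample | Unknown'
  deriving (Eq, Show)

-- Render one named result as a line of text a user could read.
describeResult :: (String, SatResult') -> String
describeResult (name, res) = name ++ ": " ++ case res of
  Valid'       -> "valid"
  Unknown'     -> "unknown"
  Invalid' cex ->
    "invalid at step " ++ show (cexStep cex) ++ " with "
      ++ unwords [ n ++ "=" ++ v | (n, v) <- cexInputs cex ]
```

The point of the design is that callers who only care about validity can ignore the payload of Invalid', while callers who want to debug a failed property get the data they need from the same result.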

Further notes

None.

@ivanperez-keera ivanperez-keera added CR:Type:Feature Admin only: Change request pertaining to new features requested CR:Status:Initiated Admin only: Change request that has been initiated labels Feb 13, 2025
@ivanperez-keera
Member

Change Manager: Confirmed that the issue exists.

@ivanperez-keera ivanperez-keera added CR:Status:Confirmed Admin only: Change request that has been acknowledged by the change manager and removed CR:Status:Initiated Admin only: Change request that has been initiated labels Feb 13, 2025
@ivanperez-keera
Member

Technical Lead: Confirmed that the issue should be addressed.

@ivanperez-keera ivanperez-keera added CR:Status:Accepted Admin only: Change request accepted by technical lead and removed CR:Status:Confirmed Admin only: Change request that has been acknowledged by the change manager labels Feb 13, 2025
@ivanperez-keera
Member

Technical Lead: Issue scheduled for fixing in Copilot 4.3.

Fix assigned to: @RyanGlScott .

@ivanperez-keera ivanperez-keera added CR:Status:Scheduled Admin only: Change requested scheduled and removed CR:Status:Accepted Admin only: Change request accepted by technical lead labels Feb 13, 2025
@ivanperez-keera ivanperez-keera added this to the 4.3 milestone Feb 13, 2025
@ivanperez-keera ivanperez-keera added CR:Status:Implementation Admin only: Change request that is currently being implemented and removed CR:Status:Scheduled Admin only: Change requested scheduled labels Feb 13, 2025
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 14, 2025
Lacking a `Show` instance makes it difficult to display type information in
panic messages.

This commit derives a basic `Show` instance so that `copilot-theorem` can
display `Type`s whenever an internal invariant is violated.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 14, 2025
…Copilot-Language#589.

Previously, the `XEmptyArray` and `XArray` data constructors did not record
evidence that their array element types were instances of the `Typed` class,
which made it impossible to use in contexts where `Typed` is required.

This commit adds the necessary constraints to each data constructor to make
their array elements `Typed`.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 14, 2025
Copilot-Language#589.

Previously, the `valFromExpr` function was lacking cases for `XEmptyArray` and
`XArray`, so it would fail if the function was called on these values.

This commit adds these missing cases, which make use of the `Typed` evidence
added to `XEmptyArray` and `XArray` in a previous commit. We do not yet add a
case for structs, which prove more challenging.

This is an internal-only refactoring that should not have any changes in
user-facing behavior.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 14, 2025
…t-Language#589.

`CounterExample` was not exported from `Copilot.Theorem.What4`, and moreover,
its current definition was insufficient to capture all of the information that
would be desirable from a counterexample.

This commit removes `CounterExample` in preparation for a follow-up commit that
adds an improved version of `CounterExample`. As `CounterExample` was not
exported, this change is purely an internal refactoring.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 14, 2025
…nguage#589.

This is an internal refactoring where the body of the `prove` function is split
out into a separate `proveInternal` function. In a subsequent commit, we will
use `proveInternal` to implement a variant of `prove` that returns more
information.
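
The commit message does not say how `proveInternal` is parameterized. One plausible shape for this kind of refactoring, shown here as a pure toy and not as the actual `copilot-theorem` code, is a shared worker that takes a function deciding how much of the raw solver outcome to keep. All names other than `prove` and `proveInternal` (`Outcome`, `Model`, `proveWithModel`, and the fake even/odd "solver") are assumptions made for illustration.

```haskell
-- Sketch only: 'Outcome', 'Model' and the fake "solver" are stand-ins, not
-- the copilot-theorem internals (which run in IO against a real solver).
type Model = [(String, Integer)]

data Outcome = Proved | Refuted Model | GaveUp

data SatResult = Valid | Invalid | Unknown
  deriving (Eq, Show)

-- Shared worker: runs the (fake) check once per property and lets the caller
-- decide how much of the outcome to keep.
proveInternal :: (Outcome -> r) -> [(String, Integer)] -> [(String, r)]
proveInternal k props = [ (name, k (check name n)) | (name, n) <- props ]
  where
    -- Toy decision procedure: a property "holds" when its number is even.
    check name n
      | n < 0     = GaveUp
      | even n    = Proved
      | otherwise = Refuted [(name, n)]

-- Original entry point: discards the model.
prove :: [(String, Integer)] -> [(String, SatResult)]
prove = proveInternal forget
  where
    forget Proved      = Valid
    forget (Refuted _) = Invalid
    forget GaveUp      = Unknown

-- Variant entry point: keeps the model when there is one.
proveWithModel :: [(String, Integer)] -> [(String, Maybe Model)]
proveWithModel = proveInternal keep
  where
    keep (Refuted m) = Just m
    keep _           = Nothing
```

With this split, the existing function keeps its type and behavior, and the richer variant reuses the same proof-running code instead of duplicating it.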
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 14, 2025
…opilot-Language#589.

Previously, the `CopilotValue` data type lacked `Show` and `ShowF` instances,
which made it impractical to display its values.

This commit adds a `Show` instance so that a value of type `CopilotValue a` can
be shown, and it also adds a `ShowF` instance so that a value of type `Some
CopilotValue` can be shown.
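
To illustrate why both instances are useful, the following self-contained sketch uses simplified local stand-ins: `Value` plays the role of `CopilotValue`, and the `ShowF` class and `Some` wrapper are cut-down versions of the definitions of the same names in the parameterized-utils library. None of this is the actual `copilot-theorem` code.

```haskell
{-# LANGUAGE GADTs #-}

-- Sketch only: 'Value', 'ShowF' and 'Some' are simplified local stand-ins for
-- 'CopilotValue' and the parameterized-utils definitions of the same names.

-- A value indexed by the type it carries.
data Value a where
  BoolV :: Bool -> Value Bool
  IntV  :: Int  -> Value Int

-- Showing is possible for every index, so the instance needs no constraint.
instance Show (Value a) where
  show (BoolV b) = "BoolV " ++ show b
  show (IntV n)  = "IntV " ++ show n

-- Class of type constructors that can be shown at any index.
class ShowF f where
  showF :: f a -> String

instance ShowF Value where
  showF = show

-- Existential wrapper that hides the index.
data Some f where
  Some :: f a -> Some f

-- Once the index is hidden, only an index-independent way of showing works.
instance ShowF f => Show (Some f) where
  show (Some x) = showF x

-- A heterogeneous list of values, as might appear in a counterexample.
exampleValues :: [Some Value]
exampleValues = [Some (BoolV True), Some (IntV 3)]
```

The `Show` instance covers values whose type index is known, while the `ShowF` instance is what lets a collection of differently typed values be printed after the index has been hidden behind `Some`.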
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 14, 2025
Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

This introduces a new `prove'` function to `Copilot.Theorem.What4` that mirrors
the type signature of `prove`, except that it returns a variant of `SatResult`
where the `Invalid` equivalent encodes counterexample information.
`copilot-theorem` users can then interpret the results of the counterexample in
Copilot specifications.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025
Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

To define a companion function to `prove` that also returns counterexample
information upon a failed proof, it is convenient to be able to display `Type`
information in panic messages.

This commit derives a basic `Show` instance for `Type` so that
`copilot-theorem` can display them whenever an internal invariant is violated.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025
…Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

To update the `valFromExpr` function in order to produce concrete array values
for counterexample purposes, we need to call the `Array` data constructor,
which has a `Typed` constraint. However, the `XEmptyArray` and `XArray` data
constructors do not record evidence that their array element types were
instances of the `Typed` class, which makes it impossible to use them in
`valFromExpr`.

This commit adds the necessary constraints to each data constructor to make
their array elements `Typed`.
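
The technique described here, storing class evidence in a data constructor so that pattern matching recovers it, can be sketched in isolation. The `Typed`, `Expr`, and `Array` definitions below are simplified local stand-ins that only borrow the names from the commit message; `toArray` and `describe` are hypothetical helpers, and the real definitions in Copilot differ.

```haskell
{-# LANGUAGE GADTs #-}

-- Sketch only: 'Typed', 'Expr' and 'Array' are simplified local stand-ins,
-- not the copilot-core / copilot-theorem definitions.

-- Stand-in for a class describing types with a printable name.
class Typed a where
  typeName :: a -> String

instance Typed Bool where
  typeName _ = "Bool"

-- Target type whose constructor demands a 'Typed' element type.
data Array a where
  Array :: Typed a => [a] -> Array a

-- Source type. Because the constructor carries the 'Typed a' constraint,
-- pattern matching on it brings that instance into scope.
data Expr a where
  XArray :: Typed a => [a] -> Expr [a]

-- Without 'Typed a' on 'XArray', the call to 'Array' would not type-check.
toArray :: Expr [a] -> Array a
toArray (XArray xs) = Array xs

-- Uses the instance stored inside the 'Array' constructor.
describe :: Array a -> String
describe (Array xs) = case xs of
  []      -> "empty array"
  (x : _) -> show (length xs) ++ " x " ++ typeName x
```

Note that `toArray` has no `Typed a` constraint in its signature: the evidence travels inside the value, which is what makes it usable from a function that receives an expression of unknown element type.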
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025
Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

The `valFromExpr` function (which produces concrete values when making a
counterexample) was lacking cases for `XEmptyArray` and `XArray`, so it would
fail if the function was called on these values.

This commit adds these missing cases, which make use of the `Typed` evidence
added to `XEmptyArray` and `XArray` in a previous commit. We do not yet add a
case for structs, which prove more challenging.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025
…properties. Refs Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

This introduces a new `proveWithCounterExample` function to
`Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that
it returns a variant of `SatResult` (`SatResultCex`) where the `Invalid`
equivalent (`InvalidCex`) encodes counterexample information. `copilot-theorem`
users can then interpret the results of the counterexample in Copilot
specifications.

As part of this commit, we change the definition of the `CounterExample` data
type. This is safe to do, as `CounterExample` was completely unused prior to
this commit, nor was it exported.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025
Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

The `CounterExample`, `SatResultCex`, and `CopilotValue` data types lack `Show`
and `ShowF` instances, which makes it impractical for users to display them.

This commit adds `Show` and `ShowF` instances for all three data types so that
they can be shown.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025
…ilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

A prior commit has introduced a `proveWithCounterExample` function, which
provides a counterexample when a property is proven invalid.

This commit updates the test suite to ensure that basic uses of
`proveWithCounterExample` work as intended.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025
…lot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

To demonstrate how to effectively use the newly added `proveWithCounterExample`
function, this commit adds a new `examples/what4/ArithmeticCounterExamples.hs`
example that behaves like `examples/what4/Arithmetic.hs`, but uses
`proveWithCounterExample` instead of `prove`.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 24, 2025
@RyanGlScott
Collaborator Author

Implementor: Solution implemented, review requested.

@ivanperez-keera ivanperez-keera added CR:Status:Verification Admin only: Change request that is currently being verified and removed CR:Status:Implementation Admin only: Change request that is currently being implemented labels Feb 25, 2025
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 25, 2025
Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

The `valFromExpr` function (which produces concrete values when making a
counterexample) was lacking cases for `XEmptyArray` and `XArray`, so it would
fail if the function was called on these values.

This commit adds these missing cases, which make use of the `Typed` evidence
added to `XEmptyArray` and `XArray` in a previous commit. We do not yet add a
case for structs, which prove more challenging.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 25, 2025
…properties. Refs Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

This introduces a new `proveWithCounterExample` function to
`Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that
it returns a variant of `SatResult` (`SatResultCex`) where the `Invalid`
equivalent (`InvalidCex`) encodes counterexample information. `copilot-theorem`
users can then interpret the results of the counterexample in Copilot
specifications.

As part of this commit, we change the definition of the `CounterExample` data
type. This is safe to do, as `CounterExample` was completely unused prior to
this commit, nor was it exported.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 25, 2025
Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

The `CounterExample`, `SatResultCex`, and `CopilotValue` data types lack `Show`
and `ShowF` instances, which makes it impractical for users to display them.

This commit adds `Show` and `ShowF` instances for all three data types so that
they can be shown.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 25, 2025
…ilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

A prior commit has introduced a `proveWithCounterExample` function, which
provides a counterexample when a property is proven invalid.

This commit updates the test suite to ensure that basic uses of
`proveWithCounterExample` work as intended.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 25, 2025
…lot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

To demonstrate how to effectively use the newly added `proveWithCounterExample`
function, this commit adds a new `examples/what4/ArithmeticCounterExamples.hs`
example that behaves like `examples/what4/Arithmetic.hs`, but uses
`proveWithCounterExample` instead of `prove`.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 25, 2025
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 25, 2025
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 27, 2025
Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

The `valFromExpr` function (which produces concrete values when making a
counterexample) was lacking cases for `XEmptyArray` and `XArray`, so it would
fail if the function was called on these values.

This commit adds these missing cases, which make use of the `Typed` evidence
added to `XEmptyArray` and `XArray` in a previous commit. We do not yet add a
case for structs, which prove more challenging.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 27, 2025
…properties. Refs Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

This introduces a new `proveWithCounterExample` function to
`Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that
it returns a variant of `SatResult` (`SatResultCex`) where the `Invalid`
equivalent (`InvalidCex`) encodes counterexample information. `copilot-theorem`
users can then interpret the results of the counterexample in Copilot
specifications.

As part of this commit, we change the definition of the `CounterExample` data
type. This is safe to do, as `CounterExample` was completely unused prior to
this commit, nor was it exported.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 27, 2025
Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

The `CounterExample`, `SatResultCex`, and `CopilotValue` data types lack `Show`
and `ShowF` instances, which makes it impractical for users to display them.

This commit adds `Show` and `ShowF` instances for all three data types so that
they can be shown.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 27, 2025
…ilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

A prior commit has introduced a `proveWithCounterExample` function, which
provides a counterexample when a property is proven invalid.

This commit updates the test suite to ensure that basic uses of
`proveWithCounterExample` work as intended.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 27, 2025
…lot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

To demonstrate how to effectively use the newly added `proveWithCounterExample`
function, this commit adds a new `examples/what4/ArithmeticCounterExamples.hs`
example that behaves like `examples/what4/Arithmetic.hs`, but uses
`proveWithCounterExample` instead of `prove`.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 27, 2025
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue Feb 27, 2025
@ivanperez-keera
Member

Change Manager: Verified that:

  • Solution is implemented:
    • The code proposed compiles and passes all tests. Details:
      Build log: https://github.com/Copilot-Language/copilot/runs/37925988037
    • The solution proposed produces the expected result. Details:
      The following Dockerfile runs an example included in the PR that demonstrates the new feature by printing the results of proving some combination of valid and invalid properties, for which counterexamples are produced when applicable. I have checked by visual inspection that the results are produced, and that the message is well formatted and informative enough:
      FROM ubuntu:focal
      
      ENV DEBIAN_FRONTEND=noninteractive
      RUN apt-get update
      
      RUN apt-get install --yes \
            libz-dev \
            git \
            curl \
            gcc \
            g++ \
            make \
            libgmp3-dev  \
            pkg-config \
            z3
      
      RUN mkdir -p $HOME/.ghcup/bin
      RUN curl https://downloads.haskell.org/~ghcup/0.1.19.2/x86_64-linux-ghcup-0.1.19.2 -o $HOME/.ghcup/bin/ghcup
      RUN chmod a+x $HOME/.ghcup/bin/ghcup
      ENV PATH=$PATH:/root/.ghcup/bin/
      ENV PATH=$PATH:/root/.cabal/bin/
      
      SHELL ["/bin/bash", "-c"]
      
      RUN ghcup install ghc 9.10
      RUN ghcup install cabal 3.2
      RUN ghcup set ghc 9.10
      RUN cabal update
      
      SHELL ["/bin/bash", "-c"]
      CMD git clone $REPO && cd $NAME && git checkout $COMMIT && cd .. \
        && cabal v1-sandbox init \
        && cabal v1-install alex happy --constraint='happy <= 2' \
        && cabal v1-install $NAME/copilot**/ \
        && cabal v1-exec -- runhaskell $NAME/copilot/examples/what4/ArithmeticCounterExamples.hs \
        && echo "Success"
      Command (substitute variables based on new path after merge):
      $ docker run -e "REPO=https://github.com/GaloisInc/copilot-1" -e "NAME=copilot-1" -e "COMMIT=9861071718850fb8ef85e533a6c0d6d083f69094" -it copilot-verify-589
      
  • Implementation is documented. Details:
    All new top-level definitions include Haddock documentation, as do the internals of the code and the examples.
  • Change history is clear.
  • Commit messages are clear.
  • Changelogs are updated.
  • Examples are updated. Details:
    A new example is introduced to demonstrate the feature.
  • Required version bumps are evaluated. Details:
    Bump required; the new feature changes the types of some value constructors in copilot-theorem, thus affecting the public API.

@ivanperez-keera
Member

Change Manager: Implementation ready to be merged.

@ivanperez-keera ivanperez-keera added CR:Status:Closed Admin only: Change request that has been completed and removed CR:Status:Verification Admin only: Change request that is currently being verified labels Feb 28, 2025