copilot-theorem: Produce a counterexample when the What4 backend falsifies a property #589
Labels
CR:Status:Closed
Admin only: Change request that has been completed
CR:Type:Feature
Admin only: Change request pertaining to new features requested
Milestone
Comments
Change Manager: Confirmed that the issue exists.
Technical Lead: Confirmed that the issue should be addressed.
Technical Lead: Issue scheduled for fixing in Copilot 4.3. Fix assigned to: @RyanGlScott.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 14, 2025
Lacking a `Show` instance makes it difficult to display type information in panic messages. This commit derives a basic `Show` instance so that `copilot-theorem` can display `Type`s whenever an internal invariant is violated.
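As a minimal sketch of what deriving such an instance can look like, the block below uses a hypothetical, cut-down stand-in for Copilot's `Type` (the constructors and the `unexpectedType` helper are assumptions for illustration, not the library's actual definitions). A type with refined constructor result types like this one does not accept an ordinary `deriving` clause, so the sketch uses standalone deriving.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE StandaloneDeriving #-}
module Main (main) where

import Data.Int (Int8)

-- Hypothetical, cut-down stand-in for a GADT of Copilot types.
-- The real `Type` has many more constructors.
data Type a where
  Bool  :: Type Bool
  Int8  :: Type Int8
  Array :: Type t -> Type [t]

-- Standalone deriving gives a basic `Show` instance for every index `a`.
deriving instance Show (Type a)

-- Hypothetical helper: build the text of an internal-error message
-- that mentions the offending type.
unexpectedType :: Type a -> String
unexpectedType ty = "Unexpected type in internal invariant: " ++ show ty

main :: IO ()
main = putStrLn (unexpectedType (Array Int8))
```

With the instance in place, any code path that detects a violated invariant can include `show ty` in its message instead of omitting the type.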
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 14, 2025
…Copilot-Language#589. Previously, the `XEmptyArray` and `XArray` data constructors did not record evidence that their array element types were instances of the `Typed` class, which made it impossible to use them in contexts where `Typed` is required. This commit adds the necessary constraints to each data constructor to make their array elements `Typed`.
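The idea of storing class evidence in a data constructor can be sketched as follows. The `Typed` class, the `XExpr` type, and the `describe` function here are simplified, hypothetical stand-ins rather than the definitions used in `copilot-theorem`; only the technique (a constraint in a constructor's signature that pattern matching brings back into scope) is the point.

```haskell
{-# LANGUAGE GADTs #-}
module Main (main) where

-- Hypothetical stand-in for Copilot's `Typed` class.
class Typed a where
  typeName :: a -> String

instance Typed Bool where
  typeName _ = "Bool"

-- Hypothetical stand-in for an expression type with an array constructor.
data XExpr where
  XBool  :: Bool -> XExpr
  -- The `Typed t` constraint is stored alongside the elements.
  XArray :: Typed t => [t] -> XExpr

-- Matching on `XArray` makes the stored `Typed t` evidence available,
-- so `typeName` can be called on the hidden element type.
describe :: XExpr -> String
describe (XBool _)   = "Bool"
describe (XArray xs) = case xs of
  []      -> "empty array"
  (x : _) -> "array of " ++ typeName x

main :: IO ()
main = putStrLn (describe (XArray [True, False]))
```

Without the constraint on `XArray`, the call to `typeName` in `describe` would be rejected, because nothing would be known about the hidden element type.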
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 14, 2025
Copilot-Language#589. Previously, the `valFromExpr` function was lacking cases for `XEmptyArray` and `XArray`, so it would fail if the function was called on these values. This commit adds these missing cases, which make use of the `Typed` evidence added to `XEmptyArray` and `XArray` in a previous commit. We do not yet add a case for structs, which prove more challenging. This is an internal-only refactoring that should not have any changes in user-facing behavior.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 14, 2025
…t-Language#589. `CounterExample` was not exported from `Copilot.Theorem.What4`, and moreover, its current definition was insufficient to capture all of the information that would be desirable from a counterexample. This commit removes `CounterExample` in preparation for a follow-up commit that adds an improved version of `CounterExample`. As `CounterExample` was not exported, this change is purely an internal refactoring.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 14, 2025
…nguage#589. This is an internal refactoring where the body of the `prove` function is split out into a separate `proveInternal` function. In a subsequent commit, we will use `proveInternal` to implement a variant of `prove` that returns more information.
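The refactoring pattern described here can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Outcome` type, the toy `check` function (which stands in for a real solver call), the simplified signatures, and the `proveDetailed` name are not taken from `copilot-theorem`.

```haskell
module Main (main) where

-- Simplified stand-ins for the real types.
type Name = String

data SatResult = Valid | Invalid | Unknown
  deriving Show

-- Hypothetical richer outcome produced by the shared worker.
data Outcome = Proved | Refuted [Int] | Undecided
  deriving Show

-- Shared worker: does the work once and lets the caller decide how
-- much of each outcome to keep.
proveInternal :: (Outcome -> r) -> [(Name, Int)] -> IO [(Name, r)]
proveInternal k props = pure [ (n, k (check v)) | (n, v) <- props ]
  where
    -- Toy placeholder for a solver query.
    check v
      | v > 0     = Proved
      | v < 0     = Refuted [v]
      | otherwise = Undecided

-- Existing entry point: same results as before, extra detail discarded.
prove :: [(Name, Int)] -> IO [(Name, SatResult)]
prove = proveInternal toSat
  where
    toSat Proved      = Valid
    toSat (Refuted _) = Invalid
    toSat Undecided   = Unknown

-- Hypothetical variant that keeps the extra information.
proveDetailed :: [(Name, Int)] -> IO [(Name, Outcome)]
proveDetailed = proveInternal id

main :: IO ()
main = do
  prove [("p1", 1), ("p2", -1)] >>= mapM_ print
  proveDetailed [("p1", 1), ("p2", -1)] >>= mapM_ print
```

Because both entry points share one worker, the existing function keeps its behavior while the new variant can expose more detail.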
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 14, 2025
…opilot-Language#589. Previously, the `CopilotValue` data type lacked `Show` and `ShowF` instances, which made it impractical to display its values. This commit adds a `Show` instance so that a value of type `CopilotValue a` can be shown, and it also adds a `ShowF` instance so that a value of type `Some CopilotValue` can be shown.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 14, 2025
Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of `prove` mean. This commit introduces a new `prove'` function to `Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that it returns a variant of `SatResult` where the `Invalid` equivalent encodes counterexample information. `copilot-theorem` users can then interpret the results of the counterexample in Copilot specifications.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 14, 2025
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 14, 2025
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 24, 2025
Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. To define a companion function to `prove` that also returns counterexample information upon a failed proof, it is convenient to be able to display `Type` information in panic messages. This commit derives a basic `Show` instance for `Type` so that `copilot-theorem` can display them whenever an internal invariant is violated.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 24, 2025
…Copilot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. To update the `valFromExpr` function in order to produce concrete array values for counterexample purposes, we need to call the `Array` data constructor, which has a `Typed` constraint. However, the `XEmptyArray` and `XArray` data constructors do not record evidence that their array element types were instances of the `Typed` class, which makes it impossible to use them in `valFromExpr`. This commit adds the necessary constraints to each data constructor to make their array elements `Typed`.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 24, 2025
Copilot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. The `valFromExpr` function (which produces concrete values when making a counterexample) was lacking cases for `XEmptyArray` and `XArray`, so it would fail if the function was called on these values. This commit adds these missing cases, which make use of the `Typed` evidence added to `XEmptyArray` and `XArray` in a previous commit. We do not yet add a case for structs, which prove more challenging.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 24, 2025
…properties. Refs Copilot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. This introduces a new `proveWithCounterExample` function to `Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that it returns a variant of `SatResult` (`SatResultCex`) where the `Invalid` equivalent (`InvalidCex`) encodes counterexample information. `copilot-theorem` users can then interpret the results of the counterexample in Copilot specifications. As part of this commit, we change the definition of the `CounterExample` data type. This is safe to do, as `CounterExample` was completely unused prior to this commit, nor was it exported.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 24, 2025
Copilot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. The `CounterExample`, `SatResultCex`, and `CopilotValue` data types lack `Show` and `ShowF` instances, which makes it impractical for users to display them. This commit adds `Show` and `ShowF` instances for all three data types so that they can be shown.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 24, 2025
…ilot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean. A prior commit has introduced a `proveWithCounterExample` function, which provides a counterexample when a property is proven invalid. This commit updates the test suite to ensure that basic uses of `proveWithCounterExample` work as intended.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 24, 2025
…lot-Language#589. Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of `prove` mean. To demonstrate how to effectively use the newly added `proveWithCounterExample` function, this commit adds a new `examples/what4/ArithmeticCounterExamples.hs` example that behaves like `examples/what4/Arithmetic.hs`, but uses `proveWithCounterExample` instead of `prove`.
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 24, 2025
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 24, 2025
RyanGlScott added a commit to GaloisInc/copilot-1 that referenced this issue on Feb 24, 2025
Implementor: Solution implemented, review requested.
RyanGlScott added eight commits to GaloisInc/copilot-1 that referenced this issue on Feb 25, 2025, five of them with the same messages as the `valFromExpr`, `proveWithCounterExample`, `Show`/`ShowF`, test suite, and `ArithmeticCounterExamples.hs` commits of Feb 24, 2025.
RyanGlScott added eight commits to GaloisInc/copilot-1 that referenced this issue on Feb 27, 2025, five of them with the same messages as the `valFromExpr`, `proveWithCounterExample`, `Show`/`ShowF`, test suite, and `ArithmeticCounterExamples.hs` commits of Feb 24, 2025.
Change Manager: Verified that:
Change Manager: Implementation ready to be merged.
Description
Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, where each result contains a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of `prove` mean.

It would be helpful if `Copilot.Theorem.What4` could offer an API to prove or disprove a property such that disproven properties come with a concrete counterexample. This counterexample information could then be interpreted by users.

Type
`copilot-theorem`.

Additional context
None.
Requester
Method to check presence of bug
Not applicable (not a bug).
Expected result
Introduce a new function to `Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that it returns a variant of `SatResult` where the `Invalid` equivalent encodes counterexample information. `copilot-theorem` users can then interpret the results of the counterexample in Copilot specifications.

Desired result
Introduce a new function to `Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that it returns a variant of `SatResult` where the `Invalid` equivalent encodes counterexample information. `copilot-theorem` users can then interpret the results of the counterexample in Copilot specifications.

Proposed solution
Introduce a new `prove' :: Solver -> Spec -> IO [(Name, SatResult' CounterExample)]` function (names subject to change during review), where `SatResult'` is defined to be:

And `CounterExample` records enough information about a concrete counterexample such that a Copilot user could display it.

Further notes
None.
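The definition of `SatResult'` referred to in the proposed solution is not preserved in this copy of the issue. As a rough sketch only, one shape consistent with the signature `IO [(Name, SatResult' CounterExample)]` is a variant of `SatResult` whose invalid case carries a payload. The constructor names, the placeholder `CounterExample` contents, and the `report` helper below are all assumptions for illustration, not the definitions from the issue or from `copilot-theorem`.

```haskell
module Main (main) where

type Name = String

-- Assumed shape: like `SatResult`, but the invalid case carries data.
data SatResult' a = Valid' | Invalid' a | Unknown'
  deriving Show

-- Hypothetical placeholder: input names paired with displayed values.
newtype CounterExample = CounterExample [(Name, String)]
  deriving Show

-- Hypothetical helper showing how a user might display each result.
report :: (Name, SatResult' CounterExample) -> String
report (name, Valid')   = name ++ ": valid"
report (name, Unknown') = name ++ ": unknown"
report (name, Invalid' (CounterExample inputs)) =
  name ++ ": invalid when " ++ unwords [ v ++ "=" ++ x | (v, x) <- inputs ]

main :: IO ()
main = mapM_ (putStrLn . report)
  [ ("prop1", Valid')
  , ("prop2", Invalid' (CounterExample [("x", "0")]))
  ]
```

Parameterizing the result type over the payload keeps the three-way outcome familiar while letting the invalid case explain itself.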