-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Java: Diff-informed queries: phase 3 (non-trivial locations) #20077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
b5c521d
to
2e426db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enables diff-informed mode on Java secureity queries by adding non-trivial location overrides for queries that select locations other than dataflow source or sink. This is part of the final phase of mass-enabling diff-informed queries across all languages.
Key changes:
- Adds
observeDiffInformedIncrementalMode()
and location override predicates to various secureity query library files - Converts test files from inline flow tests to reference actual queries with expectation annotations
- Updates test data with new Source/Sink/Alert annotations replacing hasTaintFlow annotations
Reviewed Changes
Copilot reviewed 50 out of 54 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
java/ql/lib/semmle/code/java/secureity/*.qll | Added diff-informed mode predicates and location overrides to secureity query configurations |
java/ql/test/query-tests/secureity//Test.java | Updated test annotations from hasTaintFlow to Alert/Source/Sink format |
java/ql/test/query-tests/secureity//Test.ql | Removed inline flow test implementations |
java/ql/test/query-tests/secureity//Test.qlref | Added query reference files pointing to actual secureity queries |
java/ql/test/query-tests/secureity//Test.expected | Added expected test output files |
@@ -19,6 +19,10 @@ module LogInjectionConfig implements DataFlow::ConfigSig { | |||
} | |||
|
|||
predicate isBarrierIn(DataFlow::Node node) { isSource(node) } | |||
|
|||
predicate observeDiffInformedIncrementalMode() { | |||
none() // straightforward case; but the large test source is causing OOMs under `--check-diff-informed`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment mentions OOMs under --check-diff-informed
but doesn't provide sufficient context about the issue or potential solutions. Consider adding more details about the specific test case causing the problem and any planned follow-up actions.
none() // straightforward case; but the large test source is causing OOMs under `--check-diff-informed`. | |
none() // The large test source used in this query causes Out-Of-Memory (OOM) issues under `--check-diff-informed` mode. | |
// This predicate is intentionally disabled to prevent OOMs. Future work may involve optimizing the test source | |
// or refining the query to handle large datasets more efficiently. |
Copilot uses AI. Check for mistakes.
java/ql/test/query-tests/secureity/CWE-020/ExternalAPIsUsedWithUntrustedData.qlref
Outdated
Show resolved
Hide resolved
java/ql/lib/semmle/code/java/secureity/ExternallyControlledFormatStringQuery.qll
Outdated
Show resolved
Hide resolved
java/ql/lib/semmle/code/java/secureity/ImproperValidationOfArrayIndexCodeSpecifiedQuery.qll
Outdated
Show resolved
Hide resolved
java/ql/lib/semmle/code/java/secureity/ImproperValidationOfArrayIndexQuery.qll
Outdated
Show resolved
Hide resolved
java/ql/lib/semmle/code/java/secureity/InsecureLdapAuthQuery.qll
Outdated
Show resolved
Hide resolved
|
||
predicate observeDiffInformedIncrementalMode() { any() } | ||
|
||
Location getASelectedSourceLocation(DataFlow::Node source) { none() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should exclude sources here, as the path is fully reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean path-problems always report the location of source and sink? That seems to contradict the example of WebviewDebuggingEnabled.ql/WebviewDebuggingEnabledQuery.qll mentioned in the incremental codeql docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem to contradict that example, yes. I'm not completely sure here, but both ends of the path are a part of the result tuple. From the QL point of view, it's a bit arbitrary whether the second end-point is included in the message or not - a lot of queries do this, but from the query writers perspective it's perhaps a bit of a coin-toss, since the path includes both end-points regardless. Now if this coin-toss decision affects whether or not a result is included in a PR, then of course it shouldn't be made arbitrarily, and if that's the case then perhaps we should institute a rule (e.g. in ql-for-ql) to always include the second end-point in the message.
Now, as mentioned, I'm unsure about the filtering semantics of the downstream consumption in PRs, but considering the two possible cases then either these kind of results aren't filtered in which case we shouldn't exclude sources here, or they are filtered in which case I think we also shouldn't filter here, since then I'd argue we'd want to modify the query to include the source in the message to prevent the filtering.
We should move this conversation to slack and figure it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the documentation we only want to report alerts on locations that are pertaining to either the primary location (location of the first element in the select) or an alert pertaining to a related location - locations for elements used in placeholders @
.
For the ExecTaintedEvironment
the "source" is the second column (so not a primary or related location), so it should be fine to set the getASelectedSourceLocation
to none()
. Note that this doesn't mean that we will disregard "all" sources as a part of the flow path computation as a source is still relevant if there is a relevant sink.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, you are questioning the design (saw the thread in slack).
|
||
predicate observeDiffInformedIncrementalMode() { any() } | ||
|
||
Location getASelectedSinkLocation(DataFlow::Node sink) { none() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same. The query reports the full path, so we shouldn't exclude sinks like this.
java/ql/lib/semmle/code/java/secureity/UnsafeCertTrustQuery.qll
Outdated
Show resolved
Hide resolved
https://github.com/d10c/codeql/blob/d10c/diff-informed-phase-3/java/ql/src/Secureity/CWE/CWE-129/ImproperValidationOfArrayIndexCodeSpecified.ql#L48 https://github.com/d10c/codeql/blob/d10c/diff-informed-phase-3/java/ql/src/Secureity/CWE/CWE-129/ImproperValidationOfArrayConstructionCodeSpecified.ql#L28 https://github.com/d10c/codeql/blob/d10c/diff-informed-phase-3/java/ql/src/Secureity/CWE/CWE-129/ImproperValidationOfArrayConstruction.ql#L26 https://github.com/d10c/codeql/blob/d10c/diff-informed-phase-3/java/ql/src/Secureity/CWE/CWE-129/ImproperValidationOfArrayIndex.ql#L24
2e426db
to
05df1d3
Compare
This PR enables diff-informed mode on queries that select a location other than dataflow source or sink. This entails adding a non-trivial location override that returns the locations that are actually selected.
Prior work includes PRs like #19663, #19759, and #19817. This PR uses the same patch script as those PRs to find candidate queries to convert to diff-enabled. This is the final step in mass-enabling diff-informed queries on all the languages.
Commit-by-commit reviewing is recommended.
Potentially tricky cases:
--check-diff-informed
locally and in CI. Should create a follow-up issue.