Store layer-specific attachments #8598
Conversation
📝 Walkthrough
This update introduces a system for managing dataset layer attachments, including schema changes, data models, and logic to scan, store, and serialize attachment metadata. It enables explicit listing and handling of attachment files within datasets.
Actionable comments posted: 3
🧹 Nitpick comments (5)
conf/evolutions/131-special-files.sql (2)
5-10: Add `dataset_layer_special_files` table to store special file metadata.
The table captures `_dataset`, `layerName`, `path`, and `type`. To prevent duplicate entries and improve query performance, consider adding a composite primary key or unique constraint, for example on (`_dataset`, `layerName`, `path`), and indexing fields that are queried frequently (e.g., `type`).

```diff
 CREATE TABLE webknossos.dataset_layer_special_files(
-  _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
-  layerName TEXT NOT NULL,
-  path TEXT NOT NULL,
-  type TEXT NOT NULL
-);
+  _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
+  layerName TEXT NOT NULL,
+  path TEXT NOT NULL,
+  type TEXT NOT NULL,
+  PRIMARY KEY (_dataset, layerName, path)
+);
+-- Optionally, add an index for faster lookups by type
+CREATE INDEX ON webknossos.dataset_layer_special_files(type);
```

3-4: Validate schema version before migration.
Using an `ASSERT` guard is good, but consider transforming it into a `RAISE EXCEPTION` for clearer failure semantics and consistent transaction rollback behavior.

tools/postgres/schema.sql (1)
172-178: Define `dataset_layer_special_files` table in core schema.
Including the new table here is essential for fresh installs. To enforce data integrity and speed up lookups, add a composite primary key and consider indexing `type`.

```diff
 CREATE TABLE webknossos.dataset_layer_special_files(
-  _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
-  layerName TEXT NOT NULL,
-  path TEXT NOT NULL,
-  type TEXT NOT NULL
-);
+  _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
+  layerName TEXT NOT NULL,
+  path TEXT NOT NULL,
+  type TEXT NOT NULL,
+  PRIMARY KEY (_dataset, layerName, path)
+);
+CREATE INDEX ON webknossos.dataset_layer_special_files(type);
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)
336-337: Prefer named parameter in `copy` for robustness.
`dataSourceWithSpecialFiles.copy(id)` relies on the current parameter order of the `DataSource` case class. If the constructor order ever changes, the wrong field may be overwritten without a compiler error.

```diff
- dataSourceWithSpecialFiles.copy(id)
+ dataSourceWithSpecialFiles.copy(id = id)
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
278-309: `withSpecialFiles` duplicates per-format logic and misses deduplication edge cases.
- The large `match` block must be manually kept in sync with every new `*DataLayer` subtype ➜ high maintenance burden.
- `mergeSpecialFiles` prevents duplicates only across existing vs. new files, but duplicates within `newSpecialFiles` remain.

Refactor suggestion (sketch):

```scala
trait SpecialFilesMixin { self: Product =>
  def withSpecialFiles(files: Seq[SpecialFile]): self.type
}

def withSpecialFiles(newFiles: List[SpecialFile]): DataLayer = {
  if (newFiles.isEmpty) return this
  val dedupedNew = newFiles.distinctBy(_.source.toString)
  this match {
    case sf: SpecialFilesMixin => sf.withSpecialFiles(dedupedNew).asInstanceOf[DataLayer]
    case _                     => this
  }
}
```

This removes the exhaustive `match`, automatically supports future layer types, and performs full de-duplication with `distinctBy`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- MIGRATIONS.unreleased.md (1 hunks)
- app/models/dataset/Dataset.scala (3 hunks)
- app/models/dataset/DatasetService.scala (1 hunks)
- conf/evolutions/131-special-files.sql (1 hunks)
- conf/evolutions/reversions/131-special-files.sql (1 hunks)
- tools/postgres/schema.sql (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (6 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/SpecialFile.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (7)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/SpecialFile.scala (2): SpecialFile (20-23), SpecialFile (25-43)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1): specialFiles (108-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (1)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2): additionalAxes (106-106), specialFiles (108-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (1)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2): additionalAxes (106-106), specialFiles (108-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (1)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2): additionalAxes (106-106), specialFiles (108-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (1)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1): specialFiles (108-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (1)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2): additionalAxes (106-106), specialFiles (108-108)
🔇 Additional comments (22)
MIGRATIONS.unreleased.md (1)
12-12: LGTM: Migration for special files added.
The addition of the new migration script for special files is correctly referenced in the unreleased migrations list.
app/models/dataset/DatasetService.scala (1)
39-39: LGTM: New DatasetLayerSpecialFilesDAO dependency added.
The service constructor has been updated to include the new DAO for special files, which is necessary for managing the persistence of special files associated with dataset layers.
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1)
87-87: LGTM: Added specialFiles field to VolumeTracingLayer.
The optional `specialFiles` field is correctly added to the `VolumeTracingLayer` case class with a default value of `None`. This is consistent with the data layer model extensions throughout the codebase to support special files.

conf/evolutions/reversions/131-special-files.sql (1)
1-9: LGTM: Reversion script follows best practices.
The reversion script properly:
- Uses a transaction for atomicity
- Verifies the current schema version before proceeding
- Drops the `dataset_layer_special_files` table
- Updates the schema version back to 130
- Commits the transaction
This provides a clean rollback path if needed.
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2)
19-20: LGTM: Added import for SpecialFile.
The import statement is correctly added to support the new special files functionality.
108-109: LGTM: Added specialFiles method implementation.
The `specialFiles` method is correctly implemented with a default return value of `None`. This maintains consistency with the special files functionality being added throughout the system.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (2)
38-39: Ensure backward compatibility with default None.
The new `specialFiles: Option[Seq[SpecialFile]] = None` field is optional and defaults to `None`, preserving compatibility with existing JSON schemas and clients that do not expect this field. The `Json.format` macro will automatically include it when present.
57-58: Add `specialFiles` to segmentation layer model.
Mirroring the data layer, `N5SegmentationLayer` now includes `specialFiles: Option[Seq[SpecialFile]] = None`. The default `None` ensures old payloads remain valid and the JSON formatter seamlessly handles the new field.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (2)
47-48: Support `specialFiles` in `WKWDataLayer`.
Appending `additionalAxes` and `specialFiles` (defaulting to `None`) maintains backward compatibility for JSON serialization and downstream consumers that are unaware of special files.
65-66: Enable `specialFiles` on `WKWSegmentationLayer`.
Adding `specialFiles: Option[Seq[SpecialFile]] = None` aligns segmentation layers with data layers and ensures JSON macros include this metadata only when detected.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (2)
38-38: Integrate `specialFiles` into `ZarrDataLayer`.
With the default `None`, this addition preserves existing client behavior and is automatically picked up by the `Json.format` macro.
58-58: Add `specialFiles` to `ZarrSegmentationLayer`.
Defaulting to `None` ensures that JSON payloads without this field continue to parse correctly, while new payloads can carry the special files metadata.

conf/evolutions/131-special-files.sql (1)
12-14: Advance schema version to 131.
Updating the `releaseInformation` record and committing finalizes the migration. Ensure that the corresponding reversion script reverts the version to 130.

tools/postgres/schema.sql (1)
24-25: Bootstrap schema version to 131.
This initial `INSERT` sets the schema to v131. Verify that the migration scripts and bootstrapping logic do not attempt to insert the same version twice, which could lead to duplicate-key or version mismatches.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2)
39-40: Appropriate addition of specialFiles field.
The new `specialFiles` parameter correctly follows the same pattern as `additionalAxes`, maintaining consistency with the rest of the codebase.
58-59: Appropriate addition of specialFiles field.
The new `specialFiles` parameter correctly follows the same pattern as `additionalAxes`, maintaining consistency with the rest of the codebase.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (2)
38-39: Appropriate addition of specialFiles field.
The new `specialFiles` parameter correctly follows the same pattern as `additionalAxes`, maintaining consistency with the rest of the codebase.
57-58: Appropriate addition of specialFiles field.
The new `specialFiles` parameter correctly follows the same pattern as `additionalAxes`, maintaining consistency with the rest of the codebase.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/SpecialFile.scala (1)
1-44: Well-structured implementation of special file model.
The new SpecialFile model is well-designed with proper enumeration types, case class structure, and JSON serialization support. The companion object appropriately encapsulates file extensions and directory names while exposing them through the `types` method.
A few observations:
- All special file types currently use the same file extension (`hdf5`)
- The URI scheme documentation correctly indicates `file://` for local files
- The model structure aligns well with the PR objectives for special file detection and storage
app/models/dataset/Dataset.scala (3)
825-829: Appropriate constructor parameter addition.
The addition of `datasetLayerSpecialFilesDAO` to the constructor follows the dependency injection pattern used throughout the codebase.
946-946: Correct integration of special files update.
The new call to `datasetLayerSpecialFilesDAO.updateSpecialFiles` is appropriately placed alongside other layer property updates, ensuring consistent transaction handling.
987-1005: Well-implemented DAO for special files.
The `DatasetLayerSpecialFilesDAO` implementation follows the same patterns as other DAO classes in the codebase. The `updateSpecialFiles` method properly handles clearing existing entries and inserting new ones within a transaction.
The implementation correctly:
- Deletes existing special files for the dataset
- Inserts new entries for each special file in the provided data layers
- Uses the `replaceSequentiallyAsTransaction` helper for proper transaction handling
Actionable comments posted: 0
🧹 Nitpick comments (1)
conf/evolutions/reversions/133-special-files.sql (1)
1-3: Enhance schema version assertion for clarity.
While the `ASSERT` ensures the schema version is 133, using an explicit conditional with `RAISE EXCEPTION` can produce more informative error messages (including the actual current version), improving debuggability if the check fails.
Example diff:

```diff
-do $$ begin ASSERT (select schemaVersion from webknossos.releaseInformation) = 133, 'Previous schema version mismatch'; end; $$ LANGUAGE plpgsql;
+DO $$
+DECLARE
+  current_version INTEGER := (SELECT schemaVersion FROM webknossos.releaseInformation);
+BEGIN
+  IF current_version <> 133 THEN
+    RAISE EXCEPTION 'Schema version mismatch: expected 133, found %', current_version;
+  END IF;
+END
+$$ LANGUAGE plpgsql;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- MIGRATIONS.unreleased.md (1 hunks)
- conf/evolutions/133-special-files.sql (1 hunks)
- conf/evolutions/reversions/133-special-files.sql (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- conf/evolutions/133-special-files.sql
🚧 Files skipped from review as they are similar to previous changes (1)
- MIGRATIONS.unreleased.md
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
🔇 Additional comments (2)
conf/evolutions/reversions/133-special-files.sql (2)
9-9: Transaction handling looks correct.
Using `START TRANSACTION;` and `COMMIT TRANSACTION;` properly scopes the revert.
5-7: ⚠️ Potential issue: Complete the rollback by dropping associated sequences and indexes.
Dropping just the table leaves behind the `serial`-backing sequence and any indexes created in the forward migration. To avoid leaving orphaned objects, explicitly drop them and consider `CASCADE` for dependent objects.
Suggested diff:

```diff
-DROP TABLE IF EXISTS webknossos.dataset_layer_special_files;
+DROP TABLE IF EXISTS webknossos.dataset_layer_special_files CASCADE;
+-- Remove the sequence created for the serial primary key
+DROP SEQUENCE IF EXISTS webknossos.dataset_layer_special_files_id_seq;
+-- If any custom indexes were added in the forward migration, drop them here:
+-- DROP INDEX IF EXISTS webknossos.idx_dataset_layer_special_files_<index_name>;
```

Likely an incorrect or invalid review comment.
Thanks for taking care of this! It does indeed raise some important questions.
Yes, I think we do want the special files also in the datasource-properties.json, compare the discussion in #8567
However, I think we might not want them grouped under a “special files” key there; I think Norman’s suggestion works better. Also, we need to take care of the paths (it should be possible to have them relative, absolute, or remote…).
On top of this, the hdf5 format will soon no longer be the only option for those files, as several of them are currently rewritten as zarr. I’m not sure how to represent that here, maybe you have a good idea.
I also don’t think that the datasource-properties.jsons should be automatically rewritten all the time. In fact, this may fail for some datastores that don’t have write access on their own filesystems.
As a general rule, whenever we rewrite the json file, we should back up its old contents, there should already be code for that.
@normanrz What do you think, should WK at all try to detect these files and then propagate that into the DB/json? Or should the DB just mirror what is already in the json? We could possibly write a migration script for all existing datasource-properties.jsons to add explicit links to these files. It could then become user (vx) responsibility to list these files in the json if they want them to be accessible in WK.
tools/postgres/schema.sql (Outdated)

```sql
  _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
  layerName TEXT NOT NULL,
  path TEXT NOT NULL,
  type TEXT NOT NULL
```
Maybe type should be an enum?
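If the column does become an enum (as a later schema revision does with ATTACHMENT_FILE_TYPE), the application side could mirror it with a closed enumeration. A rough, hypothetical sketch, not the PR's actual code; the value names follow the enum values that later appear in the schema, everything else is illustrative:

```scala
// Hypothetical Scala-side counterpart to a Postgres ENUM column for the attachment "type".
sealed abstract class LayerAttachmentType(val name: String)

object LayerAttachmentType {
  case object Agglomerate  extends LayerAttachmentType("agglomerate")
  case object Connectome   extends LayerAttachmentType("connectome")
  case object SegmentIndex extends LayerAttachmentType("segmentIndex")
  case object Mesh         extends LayerAttachmentType("mesh")
  case object Cumsum       extends LayerAttachmentType("cumsum")

  val all: Seq[LayerAttachmentType] = Seq(Agglomerate, Connectome, SegmentIndex, Mesh, Cumsum)

  // Parse the string stored in (or read from) the enum-typed DB column.
  def fromString(s: String): Option[LayerAttachmentType] = all.find(_.name == s)
}
```

Compared to a free-form TEXT column, both the DB enum and the sealed trait reject unknown values at write time rather than surfacing them later at read time.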
I think WK should detect the special files and store them in the DB, without rewriting the datasource-properties.json.
Fair enough. I think it still makes sense to adapt our case classes (and possibly db schema) to match the schema discussed in #8567.
Maybe it even makes sense to already do #8567 as part of this PR? Because if the fields are available in the json, the datastore doesn't need to autodetect anymore.
If I understand it correctly, in that case autodetection would only happen once and then never again, since the fields already exist and further scanning is stopped even if new files were to be added. Is this intended?
As I understand it, yes. Users would have to edit the json if they want new files to be registered. Right, @normanrz?
In my earlier comment I mentioned that the json wouldn't be rewritten. So, I guess the autodetection would need to be done every time?
Actionable comments posted: 1
🧹 Nitpick comments (2)
conf/evolutions/reversions/134-special-files.sql (2)
5-7: Consider `CASCADE` for type drops, or verify no dependencies.
`DROP TABLE IF EXISTS` followed by `DROP TYPE IF EXISTS` is correctly ordered, but if any dependent objects still reference these enums, the type drops will fail. If you intend to remove all dependents, append `CASCADE`; otherwise, validate that no residual dependencies exist before running this script.
Suggested diff:

```diff
-DROP TYPE IF EXISTS webknossos.SPECIAL_FILE_TYPE;
-DROP TYPE IF EXISTS webknossos.SPECIAL_FILE_DATAFORMAT;
+DROP TYPE IF EXISTS webknossos.SPECIAL_FILE_TYPE CASCADE;
+DROP TYPE IF EXISTS webknossos.SPECIAL_FILE_DATAFORMAT CASCADE;
```
9-9: Safeguard the version update with a `WHERE` clause.
To avoid accidentally updating the schema version if it has already changed, narrow the update to only the expected row:

```diff
-UPDATE webknossos.releaseInformation SET schemaVersion = 133;
+UPDATE webknossos.releaseInformation
+SET schemaVersion = 133
+WHERE schemaVersion = 134;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- MIGRATIONS.unreleased.md (1 hunks)
- app/models/dataset/Dataset.scala (4 hunks)
- conf/evolutions/134-special-files.sql (1 hunks)
- conf/evolutions/reversions/134-special-files.sql (1 hunks)
- tools/postgres/schema.sql (3 hunks)
- util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (6 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/SpecialFile.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala
- conf/evolutions/134-special-files.sql
🚧 Files skipped from review as they are similar to previous changes (13)
- MIGRATIONS.unreleased.md
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala
- tools/postgres/schema.sql
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/SpecialFile.scala
- app/models/dataset/Dataset.scala
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (2)
conf/evolutions/reversions/134-special-files.sql (2)
1-2: Ensure atomic reversion with transaction.
Using `START TRANSACTION;` at the top is good: it guarantees the reversion is all-or-nothing.
11-11: Finalize the transaction.
`COMMIT TRANSACTION;` cleanly completes the atomic reversion.
Instead of "special files", what about calling them "attachments" or "attached files"?
Actionable comments posted: 4
♻️ Duplicate comments (1)
tools/postgres/schema.sql (1)
165-173: LGTM: Well-designed schema with proper enum types.
The new enum types and attachment table are well-structured. The implementation addresses the previous review feedback about using enum types for the `type` field, providing better type safety and data integrity. The enum values appropriately cover the expected attachment file types and data formats.
🧹 Nitpick comments (3)
conf/evolutions/134-dataset-layer-attachments.sql (1)
1-18: LGTM! Well-structured database migration with proper safety measures.
The migration script follows best practices with transaction management, schema version assertion, and atomic operations. The ENUM values appear complete based on the PR context.
Consider adding explicit constraints or documentation about the intended uniqueness semantics for the `dataset_layer_attachments` table. Currently, the table allows duplicate entries for the same dataset/layer/type combination, which may or may not be intentional.

```sql
-- Consider adding a unique constraint if duplicates should not be allowed:
-- ALTER TABLE webknossos.dataset_layer_attachments
--   ADD CONSTRAINT unique_dataset_layer_attachment
--   UNIQUE (_dataset, layerName, type, path);
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
330-368: Well-implemented attachment merging logic.
The `withAttachments` method provides sophisticated merging behavior.
Strengths:
- Handles merging of existing attachments with new ones intelligently
- Deduplicates list-based attachments (meshes, agglomerates, connectomes)
- Prioritizes new single-value attachments (segmentIndex, cumsum) appropriately
- Comprehensive pattern matching covers all concrete layer types
Minor consideration:
The merging logic always prioritizes new values for `segmentIndex` and `cumsum`. Consider whether this behavior aligns with business requirements: should there be validation or warnings when overwriting existing values?
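One way to act on that consideration, sketched here purely as an illustration (helper and field names are assumptions, not code from this PR): log before a newly scanned single-value attachment replaces a different existing one.

```scala
// Illustrative guard: prefer the incoming value, but surface when it silently replaces an existing one.
def pickSingleValue[A](existing: Option[A], incoming: Option[A], what: String)(warn: String => Unit): Option[A] =
  (existing, incoming) match {
    case (Some(old), Some(fresh)) if old != fresh =>
      warn(s"Overwriting existing $what attachment $old with newly scanned $fresh")
      Some(fresh)
    case (old, fresh) => fresh.orElse(old)
  }

// Example usage during a merge (names hypothetical):
// val segmentIndex = pickSingleValue(existing.segmentIndex, scanned.segmentIndex, "segmentIndex")(msg => println(msg))
```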
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (1)
35-48: Consider refactoring duplicate scanning logic.
All scanning methods follow the same pattern with only minor variations. This duplication could lead to maintenance issues.
Consider extracting a common scanning method:

```scala
private def scanForFiles(layerDirectory: Path,
                         directoryName: String,
                         extension: String,
                         selectFirst: Boolean = false): Box[Seq[AttachedFile]] = {
  val dir = layerDirectory.resolve(directoryName)
  if (Files.exists(dir)) {
    PathUtils.listFiles(dir, silent = true, PathUtils.fileExtensionFilter(extension)) match {
      case Full(paths) if paths.nonEmpty =>
        val files = paths.map(path => AttachedFile(path.toUri, extension))
        Full(if (selectFirst) files.take(1) else files)
      case _ => Full(Seq.empty)
    }
  } else {
    Full(Seq.empty)
  }
}
```

Then each scan method could be simplified to:

```scala
def scanForMeshFiles(layerDirectory: Path): Seq[AttachedFile] =
  scanForFiles(layerDirectory, directoryName, scanExtension).openOr(Seq.empty)
```

Also applies to: 57-71, 80-94, 103-116, 125-139
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- MIGRATIONS.unreleased.md (1 hunks)
- app/models/dataset/Dataset.scala (4 hunks)
- app/models/dataset/DatasetService.scala (1 hunks)
- conf/evolutions/134-dataset-layer-attachments.sql (1 hunks)
- conf/evolutions/reversions/134-dataset-layer-attachments.sql (1 hunks)
- tools/postgres/schema.sql (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (6 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- conf/evolutions/reversions/134-dataset-layer-attachments.sql
🚧 Files skipped from review as they are similar to previous changes (1)
- MIGRATIONS.unreleased.md
🔇 Additional comments (30)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1)
87-87: LGTM! Clean addition of attachment support.
The optional `attachments` parameter follows the established pattern and maintains backward compatibility with the `None` default value.

app/models/dataset/DatasetService.scala (1)
39-39: LGTM! Clean dependency injection addition.
The addition of the `datasetLayerAttachmentsDAO` parameter follows proper dependency injection patterns and enables the service to manage dataset attachments.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2)
20-20: LGTM! Import addition supports the new attachment functionality.
The import of `DatasetAttachments` is necessary for the new `attachments` method override.
108-108: LGTM! Consistent attachment support implementation.
The `attachments` override returning `None` is appropriate for `EditableMappingLayer` and follows the established pattern for adding attachment support across layer types.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (3)
38-39: LGTM: Consistent attachment field addition.
The addition of the optional `attachments` field to both N5 layer types is well-implemented:
- Uses `Option[DatasetAttachments]` for proper null safety
- Defaults to `None` for backward compatibility
- Maintains consistent parameter positioning across layer types
Also applies to: 57-58
38-39: LGTM: Consistent attachments field addition.
The addition of the `attachments` field to `N5DataLayer` follows the expected pattern with appropriate typing and default value.
57-58: LGTM: Consistent attachments field addition.
The addition of the `attachments` field to `N5SegmentationLayer` matches the pattern used in the data layer counterpart.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (7)
231-231: LGTM: Core trait extension for attachments.
Adding the `attachments` field to the `DataLayerLike` trait is the correct approach to ensure all layer implementations can support attachments uniformly.
525-525: LGTM: Consistent abstract layer updates.
The updates to `AbstractDataLayer` and `AbstractSegmentationLayer` maintain consistency:
- Added `attachments` field with proper default
- Updated companion object `from` methods to preserve attachments during conversion
- Maintains parameter ordering and type safety
Also applies to: 545-545, 567-567, 589-589
231-231: LGTM: Attachments field addition to trait.
The addition of the `attachments` method to the `DataLayerLike` trait is consistent with the overall design pattern.
525-525: LGTM: AbstractDataLayer attachments field.
The addition of the `attachments` field to `AbstractDataLayer` follows the established pattern.
545-545: LGTM: Updated companion object method.
The `from` method correctly copies the `attachments` field from the source layer.
567-567: LGTM: AbstractSegmentationLayer attachments field.
The addition of the `attachments` field to `AbstractSegmentationLayer` is consistent with the pattern.
589-589: LGTM: Updated companion object method.
The `from` method correctly copies the `attachments` field from the source layer.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (3)
48-48: LGTM: Consistent WKW layer attachment support.
The addition of the `attachments` field to WKW layer types follows the established pattern:
- Consistent with other layer type implementations
- Proper Optional typing and default value
- Maintains backward compatibility and clean parameter alignment
Also applies to: 66-66
48-48: LGTM: Consistent attachments field addition.
The addition of the `attachments` field to `WKWDataLayer` follows the expected pattern with appropriate typing and default value.
66-66: LGTM: Consistent attachments field addition.
The addition of the `attachments` field to `WKWSegmentationLayer` is consistent with the pattern used across other layer types.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (3)
39-39: LGTM: Completes consistent attachment support across all layer types.
The addition of the `attachments` field to Precomputed layer types maintains the established pattern across all data layer implementations:
- Consistent API design across N5, WKW, Zarr, and Precomputed formats
- Proper Optional typing ensures backward compatibility
- Clean integration with existing parameter structure
This completes the systematic addition of attachment support across all layer types in the codebase.
Also applies to: 58-58
39-39: LGTM: Consistent attachments field addition.
The addition of the `attachments` field to `PrecomputedDataLayer` follows the established pattern correctly.
58-58: LGTM: Consistent attachments field addition.
The addition of the `attachments` field to `PrecomputedSegmentationLayer` completes the consistent pattern across all layer types.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2)
29-41: LGTM: Clean addition of attachments support.
The addition of the optional `attachments` parameter with a default value of `None` maintains backward compatibility while enabling the new attachment functionality. The implementation follows consistent patterns with other data layer classes.
47-60: LGTM: Consistent implementation across layer types.
The `attachments` parameter addition mirrors the `Zarr3DataLayer` implementation, ensuring consistency across both data layer and segmentation layer types. Good adherence to the established pattern.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (2)
27-40: LGTM: Improved formatting and attachments support.
The reformatted parameter list significantly improves readability with proper alignment, and the addition of the `attachments` parameter maintains the consistent pattern established across other data layer classes. The default value of `None` ensures backward compatibility.
46-60: LGTM: Consistent implementation with improved readability.
The parameter list formatting and `attachments` field addition follow the same excellent pattern as `ZarrDataLayer`, ensuring consistency across both layer types while significantly improving code readability.

tools/postgres/schema.sql (1)
24-24: LGTM: Appropriate schema version bump.
The schema version update to 134 correctly reflects the addition of the new attachment-related database objects and follows proper migration versioning practices.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
327-332: LGTM: Clean integration of attachment scanning.
The modification elegantly integrates attachment scanning into the existing data source loading flow without disrupting the established logic. The implementation maintains backward compatibility while adding the new functionality seamlessly.
344-358: LGTM: Well-structured attachment scanning implementation.
The `scanForAttachedFiles` method demonstrates excellent separation of concerns by using dedicated scanning functions for each file type. The functional approach with `withAttachments` maintains immutability while cleanly integrating discovered attachments with existing data layers.

app/models/dataset/Dataset.scala (1)
1019-1069: LGTM! Well-structured DAO implementation.
The `DatasetLayerAttachmentsDAO` implementation correctly handles attachment updates with proper transaction semantics and type safety. The use of companion object constants for types ensures consistency with the scanning logic.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (2)
32-33: Hardcoded file extensions may be too restrictive.
The scanning logic only looks for specific extensions (mostly "hdf5"), but the database schema supports multiple formats ('hdf5', 'zarr3', 'json'). This might prevent detection of valid attachment files in other formats.
Should the scanning logic support multiple file formats? For example:
- Meshes: hdf5, zarr3
- Agglomerates: hdf5, zarr3
- Segment indices: hdf5, zarr3
- Connectomes: hdf5, zarr3
- Cumsum: json
If so, consider updating the scanning logic to check for multiple extensions (a rough sketch follows below).
Also applies to: 54-55, 77-78, 100-101, 122-123
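A minimal sketch of what multi-extension scanning could look like, using a stand-in AttachedFile class; the directory names, extensions, and the helper itself are assumptions based on this discussion, not the PR's implementation:

```scala
import java.nio.file.Path

// Stand-in for the PR's AttachedFile(path, dataFormat); only here to keep the sketch self-contained.
final case class AttachedFile(path: java.net.URI, dataFormat: String)

// Scan one attachment directory, accepting several file formats instead of only "hdf5".
def scanForFiles(layerDirectory: Path, directoryName: String, allowedExtensions: Seq[String]): Seq[AttachedFile] = {
  val dir     = layerDirectory.resolve(directoryName).toFile
  val entries = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
  entries.flatMap { file =>
    allowedExtensions
      .find(ext => file.getName.endsWith("." + ext))
      .map(ext => AttachedFile(file.toURI, ext))
  }
}

// e.g. meshes that may be stored as hdf5 today and zarr3 later:
// val meshes = scanForFiles(layerDir, "meshes", Seq("hdf5", "zarr3"))
```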
121-121: ❓ Verification inconclusive
Incorrect directory name for cumsum files.
`CumsumFileInfo` uses the same directory name as `AgglomerateFileInfo`, which appears to be incorrect. Cumsum files and agglomerate files should likely be in separate directories.
Please verify the correct directory name for cumsum files. If they should indeed share the same directory as agglomerates, consider adding a comment explaining this design decision. Otherwise, update to use a distinct directory name.
🏁 Script executed:

```bash
#!/bin/bash
set -e
# Locate CumsumFileInfo and AgglomerateFileInfo definitions
rg "CumsumFileInfo" -n
rg "AgglomerateFileInfo" -n
# Show directoryName assignments in DatasetAttachments.scala
rg "val directoryName" -n webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala
```

Length of output: 1393
🏁 Script executed:

```bash
#!/bin/bash
set -e
sed -n '1,200p' webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala
```

Length of output: 4371
Verify directory name for CumsumFileInfo.
`CumsumFileInfo.directoryName` is currently set to `"agglomerates"`, duplicating the `AgglomerateFileInfo` folder. Please confirm whether cumsum JSON files are indeed stored alongside the agglomerates HDF5 files. If they belong in their own folder (e.g. `"cumsum"`), update the directoryName; otherwise, add a comment explaining why both file types share the same directory.
• File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala
  – Line 118: `val directoryName = "agglomerates"` for CumsumFileInfo
app/models/dataset/Dataset.scala
Outdated
```diff
@@ -970,6 +975,7 @@ class DatasetLayerDAO @Inject()(
       _ <- datasetMagsDAO.updateMags(datasetId, source.toUsable.map(_.dataLayers))
       _ <- datasetCoordinateTransformationsDAO.updateCoordinateTransformations(datasetId,
                                                                                source.toUsable.map(_.dataLayers))
       - <- datasetLayerAttachmentsDAO.updateAttachments(datasetId, source.toUsable.map(_.dataLayers))
```
Fix syntax error in for-comprehension
There's a typo in the for-comprehension binding.
Apply this diff to fix the syntax error:
```diff
-      - <- datasetLayerAttachmentsDAO.updateAttachments(datasetId, source.toUsable.map(_.dataLayers))
+      _ <- datasetLayerAttachmentsDAO.updateAttachments(datasetId, source.toUsable.map(_.dataLayers))
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```scala
      _ <- datasetLayerAttachmentsDAO.updateAttachments(datasetId, source.toUsable.map(_.dataLayers))
```
🤖 Prompt for AI Agents
In app/models/dataset/Dataset.scala at line 978, there is a syntax error caused
by an incorrect or incomplete for-comprehension binding. Remove the hyphen and
ensure the line is correctly formatted as part of the for-comprehension. The
line should be properly integrated into the for-comprehension syntax without
extraneous characters.
```scala
      val dir = layerDirectory.resolve(directoryName)
      if (Files.exists(dir)) {
        val paths: Box[List[Path]] =
          PathUtils.listFiles(dir, silent = true, PathUtils.fileExtensionFilter(scanExtension))
```
🛠️ Refactor suggestion
Silent error handling may hide important issues
Using `silent = true` in `PathUtils.listFiles` suppresses all errors, which could hide permission issues, I/O errors, or other problems that should be logged or handled.
Consider logging errors instead of silently ignoring them:
```scala
val paths: Box[List[Path]] =
  PathUtils.listFiles(dir, silent = false, PathUtils.fileExtensionFilter(scanExtension))
paths match {
  case Full(p) => p.map(path => AttachedFile(path.toUri, scanExtension))
  case Failure(msg, _, _) =>
    logger.warn(s"Failed to scan $directoryName directory: $msg")
    Seq.empty
  case _ => Seq.empty
}
```
Also applies to: 61-61, 84-84, 107-107, 129-129
🤖 Prompt for AI Agents
In
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala
at line 39, the use of silent = true in PathUtils.listFiles suppresses errors,
potentially hiding important issues like permission or I/O errors. Change silent
to false to enable error logging and handling, and add a match statement to log
any failures, ensuring errors are visible and can be addressed appropriately.
```scala
def withAttachments(attachments: DatasetAttachments): DataLayer = {
  def mergeAttachments(existingAttachmentsOpt: Option[DatasetAttachments],
                       newAttachments: DatasetAttachments): Option[DatasetAttachments] =
    existingAttachmentsOpt match {
      case None => Some(newAttachments)
      case Some(existingFiles) =>
        val segmentIndex = newAttachments.segmentIndex.orElse(existingFiles.segmentIndex)
        val connectome = (newAttachments.connectomes ++ existingFiles.connectomes).distinct
        val agglomerateFiles =
          (newAttachments.agglomerates ++ existingFiles.agglomerates).distinct
        val meshFiles =
          (newAttachments.meshes ++ existingFiles.meshes).distinct
        val cumsumFile =
          newAttachments.cumsum.orElse(existingFiles.cumsum)

        Some(
          DatasetAttachments(
            meshes = meshFiles,
            agglomerates = agglomerateFiles,
            segmentIndex = segmentIndex,
            connectomes = connectome,
            cumsum = cumsumFile
          ))
    }

  this match {
    case l: N5DataLayer                  => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: N5SegmentationLayer          => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: PrecomputedDataLayer         => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: PrecomputedSegmentationLayer => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: Zarr3DataLayer               => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: Zarr3SegmentationLayer       => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: ZarrDataLayer                => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: ZarrSegmentationLayer        => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: WKWDataLayer                 => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: WKWSegmentationLayer         => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case _                               => this
  }
}
```
🛠️ Refactor suggestion
Review merging logic and pattern matching completeness
The `withAttachments` method implementation looks solid overall, but please verify the following:
- Merging Logic: The current logic prioritizes new attachments for `segmentIndex` and `cumsum` (single-value fields) and deduplicates lists for `connectomes`, `agglomerates`, and `meshes`. This seems reasonable.
Pattern Matching Coverage: Ensure all current layer types are covered. The fallback
case _
returnsthis
unchanged, which may silently ignore attachment updates for new layer types.
Consider making the pattern matching exhaustive by removing the wildcard case and explicitly handling each layer type, or add logging when the fallback case is hit:
case _ =>
+ // Log warning about unsupported layer type
+ logger.warn(s"withAttachments not implemented for layer type: ${this.getClass.getSimpleName}")
this
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```scala
def withAttachments(attachments: DatasetAttachments): DataLayer = {
  def mergeAttachments(existingAttachmentsOpt: Option[DatasetAttachments],
                       newAttachments: DatasetAttachments): Option[DatasetAttachments] =
    existingAttachmentsOpt match {
      case None => Some(newAttachments)
      case Some(existingFiles) =>
        val segmentIndex = newAttachments.segmentIndex.orElse(existingFiles.segmentIndex)
        val connectome = (newAttachments.connectomes ++ existingFiles.connectomes).distinct
        val agglomerateFiles =
          (newAttachments.agglomerates ++ existingFiles.agglomerates).distinct
        val meshFiles =
          (newAttachments.meshes ++ existingFiles.meshes).distinct
        val cumsumFile =
          newAttachments.cumsum.orElse(existingFiles.cumsum)
        Some(
          DatasetAttachments(
            meshes = meshFiles,
            agglomerates = agglomerateFiles,
            segmentIndex = segmentIndex,
            connectomes = connectome,
            cumsum = cumsumFile
          ))
    }
  this match {
    case l: N5DataLayer                  => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: N5SegmentationLayer          => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: PrecomputedDataLayer         => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: PrecomputedSegmentationLayer => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: Zarr3DataLayer               => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: Zarr3SegmentationLayer       => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: ZarrDataLayer                => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: ZarrSegmentationLayer        => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: WKWDataLayer                 => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: WKWSegmentationLayer         => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case _ =>
      // Log warning about unsupported layer type
      logger.warn(s"withAttachments not implemented for layer type: ${this.getClass.getSimpleName}")
      this
  }
}
```
🤖 Prompt for AI Agents
In
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala
lines 330 to 368, the pattern matching for different layer types includes a
fallback case that silently ignores unsupported types. To improve robustness,
either make the pattern matching exhaustive by explicitly listing all layer
types or add logging in the fallback case to alert when an unsupported layer
type is encountered. This ensures no layer types are unintentionally ignored
without notice.
Actionable comments posted: 1
♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (2)
27-39: Address silent error handling as flagged in previous reviews.
The use of `silent = true` suppresses all errors, potentially hiding important issues like permission problems or I/O errors that should be logged.
Consider this improvement:

```diff
-    val paths: Box[List[Path]] =
-      PathUtils.listFiles(dir, silent = true, PathUtils.fileExtensionFilter(scanExtension))
+    val paths: Box[List[Path]] =
+      PathUtils.listFiles(dir, silent = false, PathUtils.fileExtensionFilter(scanExtension))
     paths match {
-      case Full(p) => p.map(path => AttachedFile(path.toUri, scanExtension))
-      case _ => Seq.empty
+      case Full(p) => p.map(path => AttachedFile(path.toUri, scanExtension))
+      case Failure(msg, _, _) =>
+        // Consider adding logging here
+        Seq.empty
+      case _ => Seq.empty
     }
```
41-55: Address silent error handling in scanForSingleFile method.
Same issue with silent error handling as in the previous method.
Apply the same fix:

```diff
-    val paths: Box[List[Path]] =
-      PathUtils.listFiles(dir, silent = true, PathUtils.fileExtensionFilter(scanExtension))
+    val paths: Box[List[Path]] =
+      PathUtils.listFiles(dir, silent = false, PathUtils.fileExtensionFilter(scanExtension))
```
- val paths: Box[List[Path]] = - PathUtils.listFiles(dir, silent = true, PathUtils.fileExtensionFilter(scanExtension)) + val paths: Box[List[Path]] = + PathUtils.listFiles(dir, silent = false, PathUtils.fileExtensionFilter(scanExtension))
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (1)
22-22
: Consider using a more descriptive field name for data format.The
dataFormat
field stores file extensions (like "hdf5", "json") rather than actual data formats. Consider renaming tofileExtension
orformat
for clarity.-case class AttachedFile(path: URI, dataFormat: String) +case class AttachedFile(path: URI, fileExtension: String)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- MIGRATIONS.unreleased.md (1 hunks)
- app/models/dataset/Dataset.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1 hunks)
🔇 Additional comments (24)
MIGRATIONS.unreleased.md (1)
16-16
: LGTM!The migration guide entry is properly formatted and correctly documents the new database schema evolution for dataset layer attachments.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (4)
38-38
: LGTM!The addition of the optional
attachments
parameter is well-implemented with proper default value and maintains backward compatibility. This change is consistent with the schema extension pattern applied across all data layer classes.
58-58
: LGTM!The
attachments
parameter addition follows the same pattern asZarrDataLayer
, ensuring consistency across segmentation layers.
38-38
: LGTM: Clean addition of optional attachments fieldThe addition of the
attachments
parameter is well-structured with proper positioning, appropriate optional typing, and backward compatibility through the default value.
58-58
: LGTM: Consistent implementation across Zarr layer typesThe attachment field addition follows the same pattern as
ZarrDataLayer
, maintaining consistency across the codebase.webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (4)
38-39
: LGTM!The
attachments
parameter addition is properly implemented with backward compatibility and follows the consistent schema extension pattern used across all data layer classes.
57-58
: LGTM!The segmentation layer attachment parameter follows the same pattern, ensuring consistency across the codebase.
38-39
: LGTM: Consistent attachment support implementationThe addition follows the established pattern across all data layer types, ensuring architectural consistency and backward compatibility.
57-58
: LGTM: Maintains consistency across precomputed layer typesThe segmentation layer implementation matches the data layer pattern, ensuring uniform attachment support across the codebase.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4)
327-332
: LGTM!The integration of attachment scanning into the data source loading process is well-implemented. The code maintains the existing logic flow while enriching data layers with discovered attachments, which aligns perfectly with the PR objectives for detecting special files during dataset scanning.
344-356
: LGTM!The
scanForAttachedFiles
method is cleanly implemented with systematic scanning for all attachment types. The approach of creating aDatasetAttachments
object for each layer and using the specialized scanning functions from the info objects is well-structured and maintainable.
327-332
: LGTM: Clean implementation of attachment scanning without JSON rewritingThe enhancement properly enriches data layers with attachment metadata while avoiding the automatic JSON rewriting concerns discussed in the PR. This aligns well with the approach of detecting special files and storing them in the database without modifying the source JSON files.
344-355
: LGTM: Well-structured attachment scanning implementationThe method provides a clean, functional approach to discovering and attaching special files to data layers. The use of specialized file info objects for different attachment types is well-organized and maintains good separation of concerns.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (2)
10-16
: Well-designed data model for dataset attachments.The
DatasetAttachments
case class provides a clean structure for organizing different types of attachment files, with appropriate use ofSeq
for multiple files andOption
for singular files.
57-65
: Clean implementation of MeshFileInfo.The singleton object follows a consistent pattern with appropriate constants and delegation to the utility methods.
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1)
87-87: Consistent addition of attachments parameter. The optional `attachments` parameter with default value `None` is consistent with the pattern established across other data layer classes in this PR.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (2)
47-48: Consistent pattern for WKWDataLayer attachments. The addition of the optional `attachments` parameter follows the established pattern across layer classes in this PR.
65-66: Consistent pattern for WKWSegmentationLayer attachments. The addition matches the pattern used in `WKWDataLayer` and other layer classes.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2)
39-40: Consistent pattern for Zarr3DataLayer attachments. The addition of the optional `attachments` parameter maintains consistency with the pattern established across all layer classes in this PR.
58-59: Consistent pattern for Zarr3SegmentationLayer attachments. The addition matches the pattern used in `Zarr3DataLayer` and maintains consistency across all layer types.
app/models/dataset/Dataset.scala (4)
17-17: LGTM! Necessary imports for attachment file types. The new imports are correctly added to support the attachment functionality in the `DatasetLayerAttachmentsDAO` class.
Also applies to: 20-20, 23-23, 26-27
850-854: LGTM! Proper dependency injection implementation. The constructor correctly extends the `DatasetLayerDAO` to include the new `DatasetLayerAttachmentsDAO` dependency, following the established pattern used for other DAO dependencies in this class.
979-979: LGTM! Proper integration with existing update flow. The call to `updateAttachments` is correctly placed alongside other layer-related updates and uses proper for-comprehension syntax. This ensures attachment data is kept in sync when layers are updated.
1020-1049: LGTM! Well-implemented DAO following established patterns. The `DatasetLayerAttachmentsDAO` class is excellently implemented with:
- Proper dependency injection and inheritance from `SimpleSQLDAO`
- Consistent clear-and-insert pattern used by other DAO methods in this file
- Comprehensive handling of all attachment types (agglomerates, connectomes, segment indices, meshes, cumsum files)
- Appropriate use of transactions via `replaceSequentiallyAsTransaction`
- Proper handling of edge cases with `getOrElse(List.empty)` for optional data structures
- Correct SQL query construction with type casting for enum values
The implementation integrates seamlessly with the existing data layer management workflow.
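The clear-and-insert pattern referenced above can be illustrated with a small, framework-free sketch (plain JDBC instead of the Slick helpers used in webknossos, with simplified columns), just to show the transactional shape:

```scala
import java.sql.Connection

case class AttachmentRow(datasetId: String, layerName: String, path: String, fileType: String, dataFormat: String)

object AttachmentsDaoSketch {
  // Delete all attachment rows for a layer, then insert the freshly scanned ones, in one transaction.
  def replaceAttachments(conn: Connection, datasetId: String, layerName: String, rows: Seq[AttachmentRow]): Unit = {
    val previousAutoCommit = conn.getAutoCommit
    conn.setAutoCommit(false)
    try {
      val delete = conn.prepareStatement(
        "DELETE FROM webknossos.dataset_layer_attachments WHERE _dataset = ? AND layerName = ?")
      delete.setString(1, datasetId)
      delete.setString(2, layerName)
      delete.executeUpdate()

      val insert = conn.prepareStatement(
        """INSERT INTO webknossos.dataset_layer_attachments(_dataset, layerName, path, type, dataFormat)
          |VALUES (?, ?, ?, ?::webknossos.ATTACHMENT_FILE_TYPE, ?::webknossos.ATTACHMENT_DATAFORMAT)""".stripMargin)
      rows.foreach { row =>
        insert.setString(1, row.datasetId)
        insert.setString(2, row.layerName)
        insert.setString(3, row.path)
        insert.setString(4, row.fileType)
        insert.setString(5, row.dataFormat)
        insert.addBatch()
      }
      insert.executeBatch()
      conn.commit()
    } catch {
      case e: Exception =>
        conn.rollback()
        throw e
    } finally conn.setAutoCommit(previousAutoCommit)
  }
}
```

The actual DAO uses the existing `replaceSequentiallyAsTransaction` helper instead of raw JDBC, but the delete-then-insert-within-one-transaction shape is the same.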
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tools/postgres/schema.sql (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (1)
tools/postgres/schema.sql (1)
24-24: The insertion of the initial `schemaVersion` (134) in `webknossos.releaseInformation` aligns with the migration index and is correct. No changes needed here.
CREATE TYPE webknossos.ATTACHMENT_FILE_TYPE AS ENUM ('agglomerate', 'connectome', 'segmentIndex', 'mesh', 'cumsum');
CREATE TYPE webknossos.ATTACHMENT_DATAFORMAT AS ENUM ('hdf5', 'zarr3', 'json');
CREATE TABLE webknossos.dataset_layer_attachments(
  _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
  layerName TEXT NOT NULL,
  path TEXT NOT NULL,
  type webknossos.ATTACHMENT_FILE_TYPE NOT NULL,
  dataFormat webknossos.ATTACHMENT_DATAFORMAT NOT NULL
);
🛠️ Refactor suggestion
Add primary key and foreign key constraints to dataset_layer_attachments
Currently, the new table lacks a primary key and foreign key relationships, which can lead to duplicate rows and orphaned attachments when datasets or layers are removed. Please define a composite primary key and enforce referential integrity.
Proposed diff:
CREATE TABLE webknossos.dataset_layer_attachments(
_dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
layerName TEXT NOT NULL,
path TEXT NOT NULL,
- type webknossos.ATTACHMENT_FILE_TYPE NOT NULL,
- dataFormat webknossos.ATTACHMENT_DATAFORMAT NOT NULL
+ type webknossos.ATTACHMENT_FILE_TYPE NOT NULL,
+ dataFormat webknossos.ATTACHMENT_DATAFORMAT NOT NULL,
+ PRIMARY KEY (_dataset, layerName, path),
+ FOREIGN KEY (_dataset, layerName)
+ REFERENCES webknossos.dataset_layers(_dataset, name)
+ ON DELETE CASCADE
);
+CREATE INDEX ON webknossos.dataset_layer_attachments(_dataset);
This ensures uniqueness, accelerates queries by dataset, and cascades deletions when a layer is dropped.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In tools/postgres/schema.sql between lines 165 and 173, the
dataset_layer_attachments table lacks primary key and foreign key constraints,
which can cause duplicate entries and orphaned records. Add a composite primary
key, likely on _dataset and layerName, and define foreign key constraints
referencing the datasets and layers tables to enforce referential integrity and
enable cascading deletes. This will improve data consistency and query
performance.
Very cool, works for me! I added some suggestions to the code.
I wonder if we should really store absolute paths for the attachments or rather paths relative to the dataset. I’m also unsure what exactly the plan is for how to use this info later on when requesting data from these files. Should the full path be exposed to the frontend and then passed to the requests (e.g. for requesting data with an agglomerate mapping applied)? Or should we introduce some kind of unique name key to pass around as an identifier? Maybe @normanrz can comment on these questions or maybe you already talked about that.
Also, the scanning currently works only for files with .json or .hdf5 endings respectively. Maybe we should support that for zarr directories too (or at least have a plan for how to do this once #8633 is complete).
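On the relative-vs-absolute question, a small standalone sketch of how a discovered file could be expressed relative to the dataset directory (plain java.nio, not project code):

```scala
import java.nio.file.{Path, Paths}

object RelativeAttachmentPath {
  // Express an attachment path relative to the dataset directory, if it lives inside it.
  def relativeToDataset(datasetDir: Path, attachment: Path): Option[Path] = {
    val normalizedDataset = datasetDir.toAbsolutePath.normalize
    val normalizedAttachment = attachment.toAbsolutePath.normalize
    if (normalizedAttachment.startsWith(normalizedDataset))
      Some(normalizedDataset.relativize(normalizedAttachment))
    else
      None // out-of-tree or remote files would keep their absolute form
  }
}

object RelativeAttachmentPathExample extends App {
  // Hypothetical local paths, only for illustration.
  val dataset = Paths.get("/srv/binaryData/org/my_dataset")
  val file = Paths.get("/srv/binaryData/org/my_dataset/color/agglomerates/agglomerate_view.hdf5")
  println(RelativeAttachmentPath.relativeToDataset(dataset, file))
  // Some(color/agglomerates/agglomerate_view.hdf5)
}
```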
do $$ begin ASSERT (select schemaVersion from webknossos.releaseInformation) = 133, 'Previous schema version mismatch'; end; $$ LANGUAGE plpgsql;

CREATE TYPE webknossos.ATTACHMENT_FILE_TYPE AS ENUM ('agglomerate', 'connectome', 'segmentIndex', 'mesh', 'cumsum');
I think the naming could be more consistent. We now have attachment, layer_attachment, AttachedFile and here attachment_file_type.
How about table `webknossos.dataset_layer_attachments`, case class `LayerAttachment`, field `attachments` in Layer, sql types `LAYER_ATTACHMENT_TYPE`, `LAYER_ATTACHMENT_DATAFORMAT` with matching scala enums?
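A minimal sketch of how that naming proposal could look on the Scala side (purely hypothetical, just to make the suggestion concrete):

```scala
object LayerAttachmentType extends Enumeration {
  val agglomerate, connectome, segmentIndex, mesh, cumsum = Value
}

object LayerAttachmentDataFormat extends Enumeration {
  val hdf5, zarr3, json = Value
}

case class LayerAttachment(
    path: String,
    attachmentType: LayerAttachmentType.Value,
    dataFormat: LayerAttachmentDataFormat.Value
)

// A layer would then carry something like `attachments: Option[Seq[LayerAttachment]]`.
```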
  path TEXT NOT NULL,
  type webknossos.ATTACHMENT_FILE_TYPE NOT NULL,
  dataFormat webknossos.ATTACHMENT_DATAFORMAT NOT NULL
);
Maybe add a primary key relation for dataset, layerName and path combined?
Also, dataset and layerName should possibly be foreign keys to the respective tables (we have the foreign key constraints way down in the schema.sql file)
val agglomerateFiles =
  (newAttachments.agglomerates ++ existingFiles.agglomerates).distinct
Is there any way to remove existing attachments? I’m unsure what we want here, maybe this is fine for the moment.
      None
    }
  }
}
maybe scanForSingleFile could use scanForFiles and then use headOption, to save some code duplication? Or would we expect a performance impact from that?
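A rough sketch of that deduplication idea; the method names mirror the comment, the signatures are assumptions:

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object ScanSketch {
  // Generic scan: all regular files with a given extension in a directory.
  def scanForFiles(dir: Path, extension: String): Seq[Path] =
    if (!Files.isDirectory(dir)) Seq.empty
    else
      Files.list(dir).iterator().asScala.toSeq
        .filter(p => Files.isRegularFile(p) && p.getFileName.toString.endsWith(s".$extension"))

  // Single-file variant reuses the generic scan; the intermediate list is bounded by the directory
  // size, so the extra allocation should be negligible compared to the filesystem access itself.
  def scanForSingleFile(dir: Path, extension: String): Option[Path] =
    scanForFiles(dir, extension).headOption
}
```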
object AgglomerateFileInfo {

  val directoryName = "agglomerates"
  private val scanExtension = "hdf5"
Soon we will have zarr3 agglomerate “files” #8633 and after that also zarr3 meshfiles, segment stats, connectome files. They don’t necessarily have an extension at all, since they are directories. I wonder how we would express this here? Should everything that has no extension be assumed to be zarr3? Should we search for a zarr.json in the directory? Should we introduce the convention that these directories must be named *.zarr?
Alternatively, we could say there is no scanning for those and they have to be registered in the datasource-properties.json (or later the db directly).
cc @normanrz
That is a good question. We should not expose the paths to the users. Especially with S3 paths, we really don't want them to leave the backend. I think we should add a
I don't think we need to implement scanning for Zarr-based attachments. We can register them in the json from day 1. |
URL of deployed dev instance (used for testing):
https://___.webknossos.xyz
Attachments: files that are stored in the dataset directory and are not the dataset itself, so:
With this PR, during dataset inbox scanning, we scan the attachment directories and look for such special files. If files are found, these are then sent to WK and written into the db. Nothing is done with the data in the db for now.
Right now, the sources or paths for the files are URIs. File URIs are always absolute, so they are not relative to the dataset. So we could not use URIs for the files (but it would make matching later harder).
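To illustrate the point that file URIs are always absolute (a tiny standalone example, not project code):

```scala
import java.nio.file.Paths

object FileUriExample extends App {
  val relative = Paths.get("color/agglomerates/agglomerate_view.hdf5")
  // Path.toUri resolves against the current working directory, so the result is always absolute:
  println(relative.toUri)            // e.g. file:///home/user/color/agglomerates/agglomerate_view.hdf5
  println(relative.toUri.isAbsolute) // true
}
```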
Steps to test:
TODOs:
Issues:
(Please delete unneeded items, merge only when none are left open)
- Updated changelog -> No user facing changes
- Updated documentation if applicable
- Adapted wk-libs python client if relevant API parts change