Store layer-specific attachments #8598


Open · wants to merge 25 commits into master

Conversation


@frcroth frcroth commented May 5, 2025

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Attachments are files that are stored in the dataset directory but are not the dataset itself, i.e.:

  • precomputed mesh files
  • agglomerate files
  • segment index file
  • connectome files
  • cumsum file

With this PR, during dataset inbox scanning, we scan the attachment directories and look for such special files. If files are found, they are sent to WK and written into the DB. Nothing is done with the data in the DB for now.

Right now, the sources or paths for the files are URIs. File URIs are always absolute, so they are not relative to the dataset. We could choose not to use URIs for the files and store plain paths instead, but that would make matching them later harder.
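
For illustration, a minimal sketch of how such a detected file might be represented with a URI source (the names here are placeholders, not necessarily the exact case classes of this PR):

import java.net.URI

// Illustrative only: a detected attachment modeled with a URI source.
// Names and fields are placeholders; the actual model in this PR may differ.
case class AttachedFile(path: URI, dataFormat: String)

val meshFile = AttachedFile(
  path = new URI("file:///srv/datasets/sample_organization/sample_dataset/segmentation/meshes/meshfile_1.hdf5"),
  dataFormat = "hdf5"
)
// The file:// URI is absolute, so it is not expressed relative to the dataset directory.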

Steps to test:

  • Have datasets with mesh, agglomerate, segment index, cumsum, and connectome files. These should be detected.
  • Scan for datasets.
  • Check the database; the attachments should be present there.
  • Also test: have a dataset that specifies attachments in its datasource-properties.json; these should also be present in the DB.

TODOs:

  • Address suggestions from coderabbit

Issues:


(Please delete unneeded items, merge only when none are left open)


coderabbitai bot commented May 5, 2025

📝 Walkthrough

This update introduces a system for managing dataset layer attachments, including schema changes, data models, and logic to scan, store, and serialize attachment metadata. It enables explicit listing and handling of attachment files within datasets.

Changes

  • conf/evolutions/134-dataset-layer-attachments.sql, conf/evolutions/reversions/134-dataset-layer-attachments.sql, MIGRATIONS.unreleased.md: Add migration for new attachment enums and table; add reversion script; update migration documentation.
  • tools/postgres/schema.sql: Add enums and table for dataset layer attachments to the Postgres schema.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala: Introduce new data models for dataset attachments and utility functions for scanning and serializing attachment files.
  • app/models/dataset/Dataset.scala: Add DatasetLayerAttachmentsDAO and integrate it into layer update logic; update constructor signatures and imports.
  • app/models/dataset/DatasetService.scala: Add DatasetLayerAttachmentsDAO as a dependency to the service constructor.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala: Add attachments field and merging logic to data layer models and traits.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala, PrecomputedDataLayers.scala, WKWDataLayers.scala, Zarr3DataLayers.scala: Add optional attachments field to various data layer case classes.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala: Refactor constructor parameter lists and add attachments field to Zarr data layer classes.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala: Enhance data source loading to scan for and attach special files to data layers.
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala: Add override for attachments method returning None.
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala: Add optional attachments parameter to the constructor.
  • util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala: Add leading space to exception string formatting in stack traces.

Assessment against linked issues

  • Explicitly list mesh, agglomerate, segment index, connectome, and cumsum files in attachments within datasource-properties.json (#8567): The code now supports parsing and utilizing attachments data, aligning with the JSON schema.
  • Parse and utilize attachments information in code, avoiding filesystem probing if available (#8567): The code scans for attachment files and attaches metadata, avoiding probing when data is present.

Possibly related PRs

Suggested labels

backend, refactoring

Suggested reviewers

  • MichaelBuessemeyer
  • fm3

Poem

In the warren of code where attachments now dwell,
Each mesh and agglomerate has a story to tell.
With cumsum and connectome, the files align,
In JSON and tables, their paths intertwine.
🐇 Hopping through layers, the rabbit’s delight—
Attachments are tracked, and all is just right!



@frcroth frcroth requested a review from fm3 May 5, 2025 15:44
@frcroth frcroth marked this pull request as ready for review May 5, 2025 15:44

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
conf/evolutions/131-special-files.sql (2)

5-10: Add dataset_layer_special_files table to store special file metadata.
The table captures _dataset, layerName, path, and type. To prevent duplicate entries and improve query performance, consider adding a composite primary key or unique constraint, for example on (_dataset, layerName, path), and indexing fields frequently queried (e.g., type).

 CREATE TABLE webknossos.dataset_layer_special_files(
-   _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
-   layerName TEXT NOT NULL,
-   path TEXT NOT NULL,
-   type TEXT NOT NULL
-);
+   _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
+   layerName TEXT NOT NULL,
+   path TEXT NOT NULL,
+   type TEXT NOT NULL,
+   PRIMARY KEY (_dataset, layerName, path)
+);
 
 -- Optionally, add an index for faster lookups by type
+CREATE INDEX ON webknossos.dataset_layer_special_files(type);

3-4: Validate schema version before migration.
Using an ASSERT guard is good, but consider transforming it into a RAISE EXCEPTION for clearer failure semantics and consistent transaction rollback behavior.

tools/postgres/schema.sql (1)

172-178: Define dataset_layer_special_files table in core schema.
Including the new table here is essential for fresh installs. To enforce data integrity and speed up lookups, add a composite primary key and consider indexing type.

 CREATE TABLE webknossos.dataset_layer_special_files(
-  _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
-  layerName TEXT NOT NULL,
-  path TEXT NOT NULL,
-  type TEXT NOT NULL
-);
+  _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
+  layerName TEXT NOT NULL,
+  path TEXT NOT NULL,
+  type TEXT NOT NULL,
+  PRIMARY KEY (_dataset, layerName, path)
+);
+CREATE INDEX ON webknossos.dataset_layer_special_files(type);
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)

336-337: Prefer named parameter in copy for robustness

dataSourceWithSpecialFiles.copy(id) relies on the current parameter order of the DataSource case-class.
If the constructor order ever changes, the wrong field may be overwritten without a compiler error.

-            dataSourceWithSpecialFiles.copy(id)
+            dataSourceWithSpecialFiles.copy(id = id)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)

278-309: withSpecialFiles duplicates per-format logic and misses deduplication edge-cases

  1. The large match block must be manually kept in sync with every new *DataLayer subtype ➜ high maintenance burden.
  2. mergeSpecialFiles prevents duplicates only across existing vs new files, but duplicates within newSpecialFiles remain.

Refactor suggestion (sketch):

trait SpecialFilesMixin { self: Product =>
  def withSpecialFiles(files: Seq[SpecialFile]): self.type
}

def withSpecialFiles(newFiles: List[SpecialFile]): DataLayer = {
  if (newFiles.isEmpty) return this

  val dedupedNew = newFiles.distinctBy(_.source.toString)
  this match {
    case sf: SpecialFilesMixin =>
      sf.withSpecialFiles(dedupedNew).asInstanceOf[DataLayer]
    case _ => this
  }
}

This removes the exhaustive match, automatically supports future layer types, and performs full de-duplication with distinctBy.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 03c8396 and 15ecbc6.

📒 Files selected for processing (16)
  • MIGRATIONS.unreleased.md (1 hunks)
  • app/models/dataset/Dataset.scala (3 hunks)
  • app/models/dataset/DatasetService.scala (1 hunks)
  • conf/evolutions/131-special-files.sql (1 hunks)
  • conf/evolutions/reversions/131-special-files.sql (1 hunks)
  • tools/postgres/schema.sql (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (6 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/SpecialFile.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (7)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/SpecialFile.scala (2)
  • SpecialFile (20-23)
  • SpecialFile (25-43)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1)
  • specialFiles (108-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2)
  • additionalAxes (106-106)
  • specialFiles (108-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2)
  • additionalAxes (106-106)
  • specialFiles (108-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2)
  • additionalAxes (106-106)
  • specialFiles (108-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1)
  • specialFiles (108-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2)
  • additionalAxes (106-106)
  • specialFiles (108-108)
🔇 Additional comments (22)
MIGRATIONS.unreleased.md (1)

12-12: LGTM: Migration for special files added.

The addition of the new migration script for special files is correctly referenced in the unreleased migrations list.

app/models/dataset/DatasetService.scala (1)

39-39: LGTM: New DatasetLayerSpecialFilesDAO dependency added.

The service constructor has been updated to include the new DAO for special files, which is necessary for managing the persistence of special files associated with dataset layers.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1)

87-87: LGTM: Added specialFiles field to VolumeTracingLayer.

The optional specialFiles field is correctly added to the VolumeTracingLayer case class with a default value of None. This is consistent with the data layer model extensions throughout the codebase to support special files.

conf/evolutions/reversions/131-special-files.sql (1)

1-9: LGTM: Reversion script follows best practices.

The reversion script properly:

  1. Uses a transaction for atomicity
  2. Verifies the current schema version before proceeding
  3. Drops the dataset_layer_special_files table
  4. Updates the schema version back to 130
  5. Commits the transaction

This provides a clean rollback path if needed.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2)

19-20: LGTM: Added import for SpecialFile.

The import statement is correctly added to support the new special files functionality.


108-109: LGTM: Added specialFiles method implementation.

The specialFiles method is correctly implemented with a default return value of None. This maintains consistency with the special files functionality being added throughout the system.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (2)

38-39: Ensure backward compatibility with default None.
The new specialFiles: Option[Seq[SpecialFile]] = None field is optional and defaults to None, preserving compatibility with existing JSON schemas and clients that do not expect this field. The Json.format macro will automatically include it when present.


57-58: Add specialFiles to segmentation layer model.
Mirroring the data layer, N5SegmentationLayer now includes specialFiles: Option[Seq[SpecialFile]] = None. The default None ensures old payloads remain valid and the JSON formatter seamlessly handles the new field.
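
As a small, self-contained illustration (using a simplified stand-in case class rather than the real N5 layer), this is how Play JSON's Json.format macro treats such an optional field: a missing key reads as None, and None is omitted on write, which is what keeps old payloads valid:

import play.api.libs.json._

// Simplified stand-in for a layer case class with an optional attachments-like field.
case class LayerStub(name: String, specialFiles: Option[Seq[String]] = None)

object LayerStub {
  implicit val jsonFormat: OFormat[LayerStub] = Json.format[LayerStub]
}

object OptionalFieldDemo extends App {
  // A payload without the new key still parses; the field becomes None.
  println(Json.parse("""{ "name": "color" }""").as[LayerStub])
  // None is omitted when writing, Some(...) is included.
  println(Json.toJson(LayerStub("color")))
  println(Json.toJson(LayerStub("segmentation", Some(Seq("meshfile_1.hdf5")))))
}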

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (2)

47-48: Support specialFiles in WKWDataLayer.
Appending additionalAxes and specialFiles (defaulting to None) maintains backward compatibility for JSON serialization and downstream consumers that are unaware of special files.


65-66: Enable specialFiles on WKWSegmentationLayer.
Adding specialFiles: Option[Seq[SpecialFile]] = None aligns segmentation layers with data layers and ensures JSON macros include this metadata only when detected.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (2)

38-38: Integrate specialFiles into ZarrDataLayer.
With the default None, this addition preserves existing client behavior and is automatically picked up by the Json.format macro.


58-58: Add specialFiles to ZarrSegmentationLayer.
Defaulting to None ensures that JSON payloads without this field continue to parse correctly, while new payloads can carry the special files metadata.

conf/evolutions/131-special-files.sql (1)

12-14: Advance schema version to 131.
Updating the releaseInformation record and committing finalizes the migration. Ensure that the corresponding reversion script reverts the version to 130.

tools/postgres/schema.sql (1)

24-25: Bootstrap schema version to 131.
This initial INSERT sets the schema to v131. Verify that the migration scripts and bootstrapping logic do not attempt to insert the same version twice, which could lead to duplicate-key or version mismatches.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2)

39-40: Appropriate addition of specialFiles field.

The new specialFiles parameter correctly follows the same pattern as additionalAxes, maintaining consistency with the rest of the codebase.


58-59: Appropriate addition of specialFiles field.

The new specialFiles parameter correctly follows the same pattern as additionalAxes, maintaining consistency with the rest of the codebase.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (2)

38-39: Appropriate addition of specialFiles field.

The new specialFiles parameter correctly follows the same pattern as additionalAxes, maintaining consistency with the rest of the codebase.


57-58: Appropriate addition of specialFiles field.

The new specialFiles parameter correctly follows the same pattern as additionalAxes, maintaining consistency with the rest of the codebase.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/SpecialFile.scala (1)

1-44: Well-structured implementation of special file model.

The new SpecialFile model is well-designed with proper enumeration types, case class structure, and JSON serialization support. The companion object appropriately encapsulates file extensions and directory names while exposing them through the types method.

A few observations:

  1. All special file types currently use the same file extension (hdf5)
  2. The URI scheme documentation correctly indicates file:// for local files
  3. The model structure aligns well with the PR objectives for special file detection and storage
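
For readers without the diff open, a heavily simplified sketch of the shape such a model could take (the actual names, enum members, and JSON wiring in SpecialFile.scala may differ):

import java.net.URI
import play.api.libs.json._

// Illustrative sketch only; not a copy of the PR's SpecialFile.scala.
object SpecialFileType extends Enumeration {
  val mesh, agglomerate, segmentIndex, connectome, cumsum = Value
  implicit val jsonFormat: Format[SpecialFileType.Value] = Json.formatEnum(this)
}

case class SpecialFile(source: URI, category: SpecialFileType.Value)

object SpecialFile {
  // At the moment all special file types are scanned as HDF5 files.
  val scanExtension: String = "hdf5"

  implicit val uriFormat: Format[URI] =
    Format(Reads.StringReads.map(URI.create), Writes(uri => JsString(uri.toString)))
  implicit val jsonFormat: OFormat[SpecialFile] = Json.format[SpecialFile]
}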
app/models/dataset/Dataset.scala (3)

825-829: Appropriate constructor parameter addition.

The addition of datasetLayerSpecialFilesDAO to the constructor follows the dependency injection pattern used throughout the codebase.


946-946: Correct integration of special files update.

The new call to datasetLayerSpecialFilesDAO.updateSpecialFiles is appropriately placed alongside other layer property updates, ensuring consistent transaction handling.


987-1005: Well-implemented DAO for special files.

The DatasetLayerSpecialFilesDAO implementation follows the same patterns as other DAO classes in the codebase. The updateSpecialFiles method properly handles clearing existing entries and inserting new ones within a transaction.

The implementation correctly:

  1. Deletes existing special files for the dataset
  2. Inserts new entries for each special file in the provided data layers
  3. Uses the replaceSequentiallyAsTransaction helper for proper transaction handling


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
conf/evolutions/reversions/133-special-files.sql (1)

1-3: Enhance schema version assertion for clarity
While the ASSERT ensures the schema version is 133, using an explicit conditional with RAISE EXCEPTION can produce more informative error messages (including the actual current version), improving debuggability if the check fails.

Example diff:

-do $$ begin ASSERT (select schemaVersion from webknossos.releaseInformation) = 133, 'Previous schema version mismatch'; end; $$ LANGUAGE plpgsql;
+DO $$
+DECLARE
+  current_version INTEGER := (SELECT schemaVersion FROM webknossos.releaseInformation);
+BEGIN
+  IF current_version <> 133 THEN
+    RAISE EXCEPTION 'Schema version mismatch: expected 133, found %', current_version;
+  END IF;
+END
+$$ LANGUAGE plpgsql;
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between d37e282 and 1e07d02.

📒 Files selected for processing (3)
  • MIGRATIONS.unreleased.md (1 hunks)
  • conf/evolutions/133-special-files.sql (1 hunks)
  • conf/evolutions/reversions/133-special-files.sql (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • conf/evolutions/133-special-files.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • MIGRATIONS.unreleased.md
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (2)
conf/evolutions/reversions/133-special-files.sql (2)

9-9: Transaction handling looks correct
Using START TRANSACTION; and COMMIT TRANSACTION; properly scopes the revert.


5-7: ⚠️ Potential issue

Complete the rollback by dropping associated sequences and indexes
Dropping just the table leaves behind the serial‐backing sequence and any indexes created in the forward migration. To avoid leaving orphaned objects, explicitly drop them and consider CASCADE for dependent objects.

Suggested diff:

-DROP TABLE IF EXISTS webknossos.dataset_layer_special_files;
+DROP TABLE IF EXISTS webknossos.dataset_layer_special_files CASCADE;
+-- Remove the sequence created for the serial primary key
+DROP SEQUENCE IF EXISTS webknossos.dataset_layer_special_files_id_seq;
+-- If any custom indexes were added in the forward migration, drop them here:
+-- DROP INDEX IF EXISTS webknossos.idx_dataset_layer_special_files_<index_name>;

Likely an incorrect or invalid review comment.


@fm3 fm3 left a comment


Thanks for taking care of this! It does indeed raise some important questions.

Yes, I think we do want the special files also in the datasource-properties.json, compare the discussion in #8567

However, I think we might not want them grouped into a “special files” key there; I think Norman’s suggestion works better. Also, we need to take care of the paths (it should be possible to have them relative, absolute, remote, …).

On top of this, the hdf5 format will soon no longer be the only option for those files, as several of them are currently rewritten as zarr. I’m not sure how to represent that here, maybe you have a good idea.

I also don’t think that the datasource-properties.jsons should be automatically rewritten all the time. In fact, this may fail for some datastores that don’t have write access on their own filesystems.
As a general rule, whenever we rewrite the json file, we should back up its old contents; there should already be code for that.

@normanrz What do you think, should WK at all try to detect these files and then propagate that into the DB/json? Or should the DB just mirror what is already in the json? We could possibly write a migration script for all existing datasource-properties.jsons to add explicit links to these files. It could then become user (vx) responsibility to list these files in the json if they want them to be accessible in WK.

_dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
layerName TEXT NOT NULL,
path TEXT NOT NULL,
type TEXT NOT NULL

Maybe type should be an enum?

@normanrz

@normanrz What do you think, should WK at all try to detect these files and then propagate that into the DB/json? Or should the DB just mirror what is already in the json? We could possibly write a migration script for all existing datasource-properties.jsons to add explicit links to these files. It could then become user (vx) responsibility to list these files in the json if they want them to be accessible in WK.

I think WK should detect the special files and store them in the DB, without rewriting the datasource-properties.json. At least for now. Migrating all the jsons and modifying/releasing VX will take some time.

@fm3

fm3 commented May 15, 2025

Fair enough. I think it still makes sense to adapt our case classes (and possibly db schema) to match the schema discussed in #8567

@normanrz

Maybe it even makes sense to already do #8567 as part of this PR? Because if the fields are available in the json the datastore doesn't need to autodetect anymore.

@frcroth

frcroth commented May 19, 2025

Maybe it even makes sense to already do #8567 as part of this PR? Because if the fields are available in the json the datastore doesn't need to autodetect anymore.

If I understand it correctly in that case autodetection would only happen once and then never again, since the fields already exist and further scanning is stopped even if new files were to be added. Is this intended?

@fm3

fm3 commented May 19, 2025

As I understand, yes. Users would have to edit the json if they want new files to be registered. Right @normanrz ?

@normanrz

Maybe it even makes sense to already do #8567 as part of this PR? Because if the fields are available in the json the datastore doesn't need to autodetect anymore.

If I understand it correctly in that case autodetection would only happen once and then never again, since the fields already exist and further scanning is stopped even if new files were to be added. Is this intended?

In my earlier comment

I think WK should detect the special files and store them in the DB, without rewriting the datasource-properties.json. At least for now.

I mentioned that the json wouldn't be rewritten. So, I guess the autodetection would need to be done every time?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
conf/evolutions/reversions/134-special-files.sql (2)

5-7: Consider CASCADE for type drops or verify no dependencies
DROP TABLE IF EXISTS followed by DROP TYPE IF EXISTS is correctly ordered, but if any dependent objects still reference these enums, the type drops will fail. If you intend to remove all dependents, append CASCADE; otherwise, validate that no residual dependencies exist before running this script.

Suggested diff:

-DROP TYPE IF EXISTS webknossos.SPECIAL_FILE_TYPE;
-DROP TYPE IF EXISTS webknossos.SPECIAL_FILE_DATAFORMAT;
+DROP TYPE IF EXISTS webknossos.SPECIAL_FILE_TYPE CASCADE;
+DROP TYPE IF EXISTS webknossos.SPECIAL_FILE_DATAFORMAT CASCADE;

9-9: Safeguard the version update with a WHERE clause
To avoid accidentally updating the schema version if it has already changed, narrow the update to only the expected row:

-UPDATE webknossos.releaseInformation SET schemaVersion = 133;
+UPDATE webknossos.releaseInformation
+SET schemaVersion = 133
+WHERE schemaVersion = 134;
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 1e07d02 and 6f2c1b7.

📒 Files selected for processing (16)
  • MIGRATIONS.unreleased.md (1 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • conf/evolutions/134-special-files.sql (1 hunks)
  • conf/evolutions/reversions/134-special-files.sql (1 hunks)
  • tools/postgres/schema.sql (3 hunks)
  • util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (6 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/SpecialFile.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala
  • conf/evolutions/134-special-files.sql
🚧 Files skipped from review as they are similar to previous changes (13)
  • MIGRATIONS.unreleased.md
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala
  • tools/postgres/schema.sql
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/SpecialFile.scala
  • app/models/dataset/Dataset.scala
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (2)
conf/evolutions/reversions/134-special-files.sql (2)

1-2: Ensure atomic reversion with transaction
Using START TRANSACTION; at the top is good—it guarantees the reversion is all-or-nothing.


11-11: Finalize the transaction
COMMIT TRANSACTION; cleanly completes the atomic reversion.

@normanrz

Instead of "special files" what about calling them "attachments" or "attached files"?

@frcroth frcroth changed the title from Store layer-specific “special files” to Store layer-specific attachments on May 26, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
tools/postgres/schema.sql (1)

165-173: LGTM: Well-designed schema with proper enum types

The new enum types and attachment table are well-structured. The implementation addresses the previous review feedback about using enum types for the type field, providing better type safety and data integrity. The enum values appropriately cover the expected attachment file types and data formats.

🧹 Nitpick comments (3)
conf/evolutions/134-dataset-layer-attachments.sql (1)

1-18: LGTM! Well-structured database migration with proper safety measures.

The migration script follows best practices with transaction management, schema version assertion, and atomic operations. The ENUM values appear complete based on the PR context.

Consider adding explicit constraints or documentation about the intended uniqueness semantics for the dataset_layer_attachments table. Currently, the table allows duplicate entries for the same dataset/layer/type combination, which may or may not be intentional.

-- Consider adding a unique constraint if duplicates should not be allowed:
-- ALTER TABLE webknossos.dataset_layer_attachments 
-- ADD CONSTRAINT unique_dataset_layer_attachment 
-- UNIQUE (_dataset, layerName, type, path);
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)

330-368: Well-implemented attachment merging logic.

The withAttachments method provides sophisticated merging behavior:

Strengths:

  • Handles merging of existing attachments with new ones intelligently
  • Deduplicates list-based attachments (meshes, agglomerates, connectomes)
  • Prioritizes new single-value attachments (segmentIndex, cumsum) appropriately
  • Comprehensive pattern matching covers all concrete layer types

Minor consideration:
The merging logic always prioritizes new values for segmentIndex and cumsum. Consider if this behavior aligns with business requirements - should there be validation or warnings when overwriting existing values?

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (1)

35-48: Consider refactoring duplicate scanning logic

All scanning methods follow the same pattern with only minor variations. This duplication could lead to maintenance issues.

Consider extracting a common scanning method:

private def scanForFiles(layerDirectory: Path, directoryName: String, extension: String, selectFirst: Boolean = false): Box[Seq[AttachedFile]] = {
  val dir = layerDirectory.resolve(directoryName)
  if (Files.exists(dir)) {
    PathUtils.listFiles(dir, silent = true, PathUtils.fileExtensionFilter(extension)) match {
      case Full(paths) if paths.nonEmpty =>
        val files = paths.map(path => AttachedFile(path.toUri, extension))
        Full(if (selectFirst) files.take(1) else files)
      case _ => Full(Seq.empty)
    }
  } else {
    Full(Seq.empty)
  }
}

Then each scan method could be simplified to:

def scanForMeshFiles(layerDirectory: Path): Seq[AttachedFile] = 
  scanForFiles(layerDirectory, directoryName, scanExtension).openOr(Seq.empty)

Also applies to: 57-71, 80-94, 103-116, 125-139

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 6f2c1b7 and cdef9c6.

📒 Files selected for processing (16)
  • MIGRATIONS.unreleased.md (1 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (1 hunks)
  • conf/evolutions/134-dataset-layer-attachments.sql (1 hunks)
  • conf/evolutions/reversions/134-dataset-layer-attachments.sql (1 hunks)
  • tools/postgres/schema.sql (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (6 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • conf/evolutions/reversions/134-dataset-layer-attachments.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • MIGRATIONS.unreleased.md
🔇 Additional comments (30)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1)

87-87: LGTM! Clean addition of attachment support.

The optional attachments parameter follows the established pattern and maintains backward compatibility with the None default value.

app/models/dataset/DatasetService.scala (1)

39-39: LGTM! Clean dependency injection addition.

The addition of datasetLayerAttachmentsDAO parameter follows proper dependency injection patterns and enables the service to manage dataset attachments.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2)

20-20: LGTM! Import addition supports the new attachment functionality.

The import of DatasetAttachments is necessary for the new attachments method override.


108-108: LGTM! Consistent attachment support implementation.

The attachments override returning None is appropriate for EditableMappingLayer and follows the established pattern for adding attachment support across layer types.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (3)

38-39: LGTM: Consistent attachment field addition.

The addition of the optional attachments field to both N5 layer types is well-implemented:

  • Uses Option[DatasetAttachments] for proper null safety
  • Defaults to None for backward compatibility
  • Maintains consistent parameter positioning across layer types

Also applies to: 57-58


38-39: LGTM: Consistent attachments field addition

The addition of the attachments field to N5DataLayer follows the expected pattern with appropriate typing and default value.


57-58: LGTM: Consistent attachments field addition

The addition of the attachments field to N5SegmentationLayer matches the pattern used in the data layer counterpart.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (7)

231-231: LGTM: Core trait extension for attachments.

Adding the attachments field to the DataLayerLike trait is the correct approach to ensure all layer implementations can support attachments uniformly.


525-525: LGTM: Consistent abstract layer updates.

The updates to AbstractDataLayer and AbstractSegmentationLayer maintain consistency:

  • Added attachments field with proper default
  • Updated companion object from methods to preserve attachments during conversion
  • Maintains parameter ordering and type safety

Also applies to: 545-545, 567-567, 589-589


231-231: LGTM: Attachments field addition to trait

The addition of the attachments method to the DataLayerLike trait is consistent with the overall design pattern.


525-525: LGTM: AbstractDataLayer attachments field

The addition of the attachments field to AbstractDataLayer follows the established pattern.


545-545: LGTM: Updated companion object method

The from method correctly copies the attachments field from the source layer.


567-567: LGTM: AbstractSegmentationLayer attachments field

The addition of the attachments field to AbstractSegmentationLayer is consistent with the pattern.


589-589: LGTM: Updated companion object method

The from method correctly copies the attachments field from the source layer.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (3)

48-48: LGTM: Consistent WKW layer attachment support.

The addition of the attachments field to WKW layer types follows the established pattern:

  • Consistent with other layer type implementations
  • Proper Optional typing and default value
  • Maintains backward compatibility and clean parameter alignment

Also applies to: 66-66


48-48: LGTM: Consistent attachments field addition

The addition of the attachments field to WKWDataLayer follows the expected pattern with appropriate typing and default value.


66-66: LGTM: Consistent attachments field addition

The addition of the attachments field to WKWSegmentationLayer is consistent with the pattern used across other layer types.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (3)

39-39: LGTM: Completes consistent attachment support across all layer types.

The addition of the attachments field to Precomputed layer types maintains the established pattern across all data layer implementations:

  • Consistent API design across N5, WKW, Zarr, and Precomputed formats
  • Proper Optional typing ensures backward compatibility
  • Clean integration with existing parameter structure

This completes the systematic addition of attachment support across all layer types in the codebase.

Also applies to: 58-58


39-39: LGTM: Consistent attachments field addition

The addition of the attachments field to PrecomputedDataLayer follows the established pattern correctly.


58-58: LGTM: Consistent attachments field addition

The addition of the attachments field to PrecomputedSegmentationLayer completes the consistent pattern across all layer types.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2)

29-41: LGTM: Clean addition of attachments support

The addition of the optional attachments parameter with a default value of None maintains backward compatibility while enabling the new attachment functionality. The implementation follows consistent patterns with other data layer classes.


47-60: LGTM: Consistent implementation across layer types

The attachments parameter addition mirrors the Zarr3DataLayer implementation, ensuring consistency across both data layer and segmentation layer types. Good adherence to the established pattern.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (2)

27-40: LGTM: Improved formatting and attachments support

The reformatted parameter list significantly improves readability with proper alignment, and the addition of the attachments parameter maintains the consistent pattern established across other data layer classes. The default value of None ensures backward compatibility.


46-60: LGTM: Consistent implementation with improved readability

The parameter list formatting and attachments field addition follow the same excellent pattern as ZarrDataLayer, ensuring consistency across both layer types while significantly improving code readability.

tools/postgres/schema.sql (1)

24-24: LGTM: Appropriate schema version bump

The schema version update to 134 correctly reflects the addition of the new attachment-related database objects and follows proper migration versioning practices.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)

327-332: LGTM: Clean integration of attachment scanning

The modification elegantly integrates attachment scanning into the existing data source loading flow without disrupting the established logic. The implementation maintains backward compatibility while adding the new functionality seamlessly.


344-358: LGTM: Well-structured attachment scanning implementation

The scanForAttachedFiles method demonstrates excellent separation of concerns by using dedicated scanning functions for each file type. The functional approach with withAttachments maintains immutability while cleanly integrating discovered attachments with existing data layers.
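
A compact, self-contained sketch of that flow using stand-in types (the real names and signatures in DataSourceService differ):

import java.net.URI

// Stand-in types, not the PR's actual case classes.
case class AttachedFile(path: URI, dataFormat: String)
case class AttachmentsStub(meshes: Seq[AttachedFile] = Seq.empty,
                           agglomerates: Seq[AttachedFile] = Seq.empty)
case class DataLayerStub(name: String, attachments: Option[AttachmentsStub] = None)
case class DataSourceStub(dataLayers: List[DataLayerStub])

// The flow described above: scan per layer, then fold the results back into the
// data source immutably via copy, leaving layers without findings untouched.
def withScannedAttachments(dataSource: DataSourceStub,
                           scanLayer: String => AttachmentsStub): DataSourceStub =
  dataSource.copy(dataLayers = dataSource.dataLayers.map { layer =>
    val scanned = scanLayer(layer.name)
    if (scanned.meshes.isEmpty && scanned.agglomerates.isEmpty) layer
    else layer.copy(attachments = Some(scanned))
  })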

app/models/dataset/Dataset.scala (1)

1019-1069: LGTM! Well-structured DAO implementation

The DatasetLayerAttachmentsDAO implementation correctly handles attachment updates with proper transaction semantics and type safety. The use of companion object constants for types ensures consistency with the scanning logic.
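
As a rough, framework-agnostic illustration of that delete-then-insert-within-one-transaction pattern (plain JDBC rather than the project's SQL helpers, and using the draft TEXT columns quoted earlier in this review):

import java.sql.Connection

// Illustrative only; the actual DAO uses the project's SQL query helpers and
// transaction wrapper instead of raw JDBC.
final case class AttachmentRow(layerName: String, path: String, fileType: String)

def replaceAttachments(conn: Connection, datasetId: String, rows: Seq[AttachmentRow]): Unit = {
  conn.setAutoCommit(false) // one transaction: either the whole replace happens or nothing does
  try {
    val delete = conn.prepareStatement(
      "DELETE FROM webknossos.dataset_layer_attachments WHERE _dataset = ?")
    delete.setString(1, datasetId)
    delete.executeUpdate()

    val insert = conn.prepareStatement(
      "INSERT INTO webknossos.dataset_layer_attachments (_dataset, layerName, path, type) VALUES (?, ?, ?, ?)")
    rows.foreach { row =>
      insert.setString(1, datasetId)
      insert.setString(2, row.layerName)
      insert.setString(3, row.path)
      insert.setString(4, row.fileType)
      insert.addBatch()
    }
    insert.executeBatch()
    conn.commit()
  } catch {
    case e: Exception =>
      conn.rollback()
      throw e
  }
}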

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (2)

32-33: Hardcoded file extensions may be too restrictive

The scanning logic only looks for specific extensions (mostly "hdf5"), but the database schema supports multiple formats ('hdf5', 'zarr3', 'json'). This might prevent detection of valid attachment files in other formats.

Should the scanning logic support multiple file formats? For example:

  • Meshes: hdf5, zarr3
  • Agglomerates: hdf5, zarr3
  • Segment indices: hdf5, zarr3
  • Connectomes: hdf5, zarr3
  • Cumsum: json

If so, consider updating the scanning logic to check for multiple extensions.

Also applies to: 54-55, 77-78, 100-101, 122-123
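
If multiple formats were supported, a minimal sketch of the adjusted filter could look like this (the extension-to-format mapping is an assumption based on the list above):

import java.net.URI
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Illustrative only: accept several extensions per attachment kind instead of one.
case class AttachedFile(path: URI, dataFormat: String)

def scanForFiles(dir: Path, allowedExtensions: Set[String]): Seq[AttachedFile] =
  if (!Files.isDirectory(dir)) Seq.empty
  else
    Files.list(dir).iterator().asScala.flatMap { p =>
      val fileName = p.getFileName.toString
      allowedExtensions
        .find(ext => fileName.endsWith(s".$ext"))
        .map(ext => AttachedFile(p.toUri, ext))
    }.toSeq

// e.g. scanForFiles(layerDir.resolve("meshes"), Set("hdf5", "zarr3"))
//      scanForFiles(layerDir.resolve("agglomerates"), Set("hdf5", "json"))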


121-121:

❓ Verification inconclusive

Incorrect directory name for cumsum files

CumsumFileInfo uses the same directory name as AgglomerateFileInfo, which appears to be incorrect. Cumsum files and agglomerate files should likely be in separate directories.

Please verify the correct directory name for cumsum files. If they should indeed share the same directory as agglomerates, consider adding a comment explaining this design decision. Otherwise, update to use a distinct directory name.


🏁 Script executed:

#!/bin/bash
set -e
# Locate CumsumFileInfo and AgglomerateFileInfo definitions
rg "CumsumFileInfo" -n
rg "AgglomerateFileInfo" -n

# Show directoryName assignments in DatasetAttachments.scala
rg "val directoryName" -n webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala

Length of output: 1393


🏁 Script executed:

#!/bin/bash
set -e
sed -n '1,200p' webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala

Length of output: 4371


Verify directory name for CumsumFileInfo

CumsumFileInfo.directoryName is currently set to "agglomerates", duplicating the AgglomerateFileInfo folder.

Please confirm whether cumsum JSON files are indeed stored alongside the agglomerates HDF5 files. If they belong in their own folder (e.g. "cumsum"), update the directoryName; otherwise, add a comment explaining why both file types share the same directory.

• File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala
– Line 118: val directoryName = "agglomerates" for CumsumFileInfo

@@ -970,6 +975,7 @@ class DatasetLayerDAO @Inject()(
_ <- datasetMagsDAO.updateMags(datasetId, source.toUsable.map(_.dataLayers))
_ <- datasetCoordinateTransformationsDAO.updateCoordinateTransformations(datasetId,
source.toUsable.map(_.dataLayers))
- <- datasetLayerAttachmentsDAO.updateAttachments(datasetId, source.toUsable.map(_.dataLayers))

⚠️ Potential issue

Fix syntax error in for-comprehension

There's a typo in the for-comprehension binding.

Apply this diff to fix the syntax error:

-      - <- datasetLayerAttachmentsDAO.updateAttachments(datasetId, source.toUsable.map(_.dataLayers))
+      _ <- datasetLayerAttachmentsDAO.updateAttachments(datasetId, source.toUsable.map(_.dataLayers))

val dir = layerDirectory.resolve(directoryName)
if (Files.exists(dir)) {
val paths: Box[List[Path]] =
PathUtils.listFiles(dir, silent = true, PathUtils.fileExtensionFilter(scanExtension))

🛠️ Refactor suggestion

Silent error handling may hide important issues

Using silent = true in PathUtils.listFiles suppresses all errors, which could hide permission issues, I/O errors, or other problems that should be logged or handled.

Consider logging errors instead of silently ignoring them:

val paths: Box[List[Path]] = PathUtils.listFiles(dir, silent = false, PathUtils.fileExtensionFilter(scanExtension))
paths match {
  case Full(p) => p.map(path => AttachedFile(path.toUri, scanExtension))
  case Failure(msg, _, _) => 
    logger.warn(s"Failed to scan $directoryName directory: $msg")
    Seq.empty
  case _ => Seq.empty
}

Also applies to: 61-61, 84-84, 107-107, 129-129


Comment on lines +330 to +368
def withAttachments(attachments: DatasetAttachments): DataLayer = {
  def mergeAttachments(existingAttachmentsOpt: Option[DatasetAttachments],
                       newAttachments: DatasetAttachments): Option[DatasetAttachments] =
    existingAttachmentsOpt match {
      case None => Some(newAttachments)
      case Some(existingFiles) =>
        val segmentIndex = newAttachments.segmentIndex.orElse(existingFiles.segmentIndex)
        val connectome = (newAttachments.connectomes ++ existingFiles.connectomes).distinct
        val agglomerateFiles =
          (newAttachments.agglomerates ++ existingFiles.agglomerates).distinct
        val meshFiles =
          (newAttachments.meshes ++ existingFiles.meshes).distinct
        val cumsumFile =
          newAttachments.cumsum.orElse(existingFiles.cumsum)

        Some(
          DatasetAttachments(
            meshes = meshFiles,
            agglomerates = agglomerateFiles,
            segmentIndex = segmentIndex,
            connectomes = connectome,
            cumsum = cumsumFile
          ))
    }

  this match {
    case l: N5DataLayer                  => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: N5SegmentationLayer          => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: PrecomputedDataLayer         => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: PrecomputedSegmentationLayer => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: Zarr3DataLayer               => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: Zarr3SegmentationLayer       => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: ZarrDataLayer                => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: ZarrSegmentationLayer        => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: WKWDataLayer                 => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: WKWSegmentationLayer         => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case _                               => this
  }
}

🛠️ Refactor suggestion

Review merging logic and pattern matching completeness

The withAttachments method implementation looks solid overall, but please verify the following:

  1. Merging Logic: The current logic prioritizes new attachments for segmentIndex and cumsum (single-value fields) and deduplicates lists for connectomes, agglomerates, and meshes. This seems reasonable.

  2. Pattern Matching Coverage: Ensure all current layer types are covered. The fallback case _ returns this unchanged, which may silently ignore attachment updates for new layer types.

Consider making the pattern matching exhaustive by removing the wildcard case and explicitly handling each layer type, or add logging when the fallback case is hit:

  case _                               => 
+   // Log warning about unsupported layer type
+   logger.warn(s"withAttachments not implemented for layer type: ${this.getClass.getSimpleName}")
    this
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def withAttachments(attachments: DatasetAttachments): DataLayer = {
  def mergeAttachments(existingAttachmentsOpt: Option[DatasetAttachments],
                       newAttachments: DatasetAttachments): Option[DatasetAttachments] =
    existingAttachmentsOpt match {
      case None => Some(newAttachments)
      case Some(existingFiles) =>
        val segmentIndex = newAttachments.segmentIndex.orElse(existingFiles.segmentIndex)
        val connectome = (newAttachments.connectomes ++ existingFiles.connectomes).distinct
        val agglomerateFiles =
          (newAttachments.agglomerates ++ existingFiles.agglomerates).distinct
        val meshFiles =
          (newAttachments.meshes ++ existingFiles.meshes).distinct
        val cumsumFile =
          newAttachments.cumsum.orElse(existingFiles.cumsum)

        Some(
          DatasetAttachments(
            meshes = meshFiles,
            agglomerates = agglomerateFiles,
            segmentIndex = segmentIndex,
            connectomes = connectome,
            cumsum = cumsumFile
          ))
    }

  this match {
    case l: N5DataLayer                  => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: N5SegmentationLayer          => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: PrecomputedDataLayer         => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: PrecomputedSegmentationLayer => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: Zarr3DataLayer               => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: Zarr3SegmentationLayer       => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: ZarrDataLayer                => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: ZarrSegmentationLayer        => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: WKWDataLayer                 => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case l: WKWSegmentationLayer         => l.copy(attachments = mergeAttachments(l.attachments, attachments))
    case _                               =>
      // Log warning about unsupported layer type
      logger.warn(s"withAttachments not implemented for layer type: ${this.getClass.getSimpleName}")
      this
  }
}
🤖 Prompt for AI Agents
In
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala
lines 330 to 368, the pattern matching for different layer types includes a
fallback case that silently ignores unsupported types. To improve robustness,
either make the pattern matching exhaustive by explicitly listing all layer
types or add logging in the fallback case to alert when an unsupported layer
type is encountered. This ensures no layer types are unintentionally ignored
without notice.
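
For comparison, a minimal, self-contained sketch of the exhaustive-match alternative mentioned above: with a sealed hierarchy and no wildcard case, the compiler warns whenever a new layer type is added but not handled. The types below are illustrative stand-ins, not the actual webknossos layer classes.

sealed trait SketchLayer { def attachments: Option[String] }
case class SketchZarrLayer(attachments: Option[String] = None) extends SketchLayer
case class SketchWKWLayer(attachments: Option[String] = None) extends SketchLayer

object ExhaustiveMatchSketch {
  // No `case _`: if another subtype of SketchLayer is added later, this match
  // produces a non-exhaustive-match warning instead of silently ignoring it.
  def withAttachments(layer: SketchLayer, attachments: String): SketchLayer =
    layer match {
      case l: SketchZarrLayer => l.copy(attachments = Some(attachments))
      case l: SketchWKWLayer  => l.copy(attachments = Some(attachments))
    }

  def main(args: Array[String]): Unit =
    println(withAttachments(SketchZarrLayer(), "agglomerates"))
}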

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (2)

27-39: Address silent error handling as flagged in previous reviews.

The use of silent = true suppresses all errors, potentially hiding important issues like permission problems or I/O errors that should be logged.

Consider this improvement:

-      val paths: Box[List[Path]] =
-        PathUtils.listFiles(dir, silent = true, PathUtils.fileExtensionFilter(scanExtension))
+      val paths: Box[List[Path]] =
+        PathUtils.listFiles(dir, silent = false, PathUtils.fileExtensionFilter(scanExtension))
       paths match {
-        case Full(p) => p.map(path => AttachedFile(path.toUri, scanExtension))
-        case _       => Seq.empty
+        case Full(p) => p.map(path => AttachedFile(path.toUri, scanExtension))
+        case Failure(msg, _, _) => 
+          // Consider adding logging here
+          Seq.empty
+        case _ => Seq.empty
       }

41-55: Address silent error handling in scanForSingleFile method.

Same issue with silent error handling as in the previous method.

Apply the same fix:

-      val paths: Box[List[Path]] =
-        PathUtils.listFiles(dir, silent = true, PathUtils.fileExtensionFilter(scanExtension))
+      val paths: Box[List[Path]] =
+        PathUtils.listFiles(dir, silent = false, PathUtils.fileExtensionFilter(scanExtension))
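
For illustration, a self-contained sketch of the same idea using only the JDK and scala.util.Try; in the codebase this corresponds to calling PathUtils.listFiles with silent = false and handling the Failure case explicitly, as in the diff above.

import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._
import scala.util.{Failure, Success, Try}

object AttachmentScanSketch {
  // List files with a given extension, logging failures instead of silently
  // returning an empty result (which is what suppressing errors amounts to).
  def scanForFiles(dir: Path, extension: String): Seq[Path] =
    Try(Files.list(dir).iterator().asScala.toList) match {
      case Success(paths) =>
        paths.filter(_.getFileName.toString.endsWith(s".$extension"))
      case Failure(e) =>
        // Permission problems, missing directories and I/O errors become visible here.
        System.err.println(s"Could not scan $dir for .$extension files: ${e.getMessage}")
        Seq.empty
    }

  def main(args: Array[String]): Unit =
    scanForFiles(Paths.get("agglomerates"), "hdf5").foreach(println)
}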
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (1)

22-22: Consider using a more descriptive field name for data format.

The dataFormat field stores file extensions (like "hdf5", "json") rather than actual data formats. Consider renaming to fileExtension or format for clarity.

-case class AttachedFile(path: URI, dataFormat: String)
+case class AttachedFile(path: URI, fileExtension: String)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdef9c6 and aa1a642.

📒 Files selected for processing (9)
  • MIGRATIONS.unreleased.md (1 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1 hunks)
🔇 Additional comments (24)
MIGRATIONS.unreleased.md (1)

16-16: LGTM!

The migration guide entry is properly formatted and correctly documents the new database schema evolution for dataset layer attachments.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (4)

38-38: LGTM!

The addition of the optional attachments parameter is well-implemented with a proper default value and maintains backward compatibility. This change is consistent with the schema extension pattern applied across all data layer classes.


58-58: LGTM!

The attachments parameter addition follows the same pattern as ZarrDataLayer, ensuring consistency across segmentation layers.


38-38: LGTM: Clean addition of optional attachments field

The addition of the attachments parameter is well-structured with proper positioning, appropriate optional typing, and backward compatibility through the default value.


58-58: LGTM: Consistent implementation across Zarr layer types

The attachment field addition follows the same pattern as ZarrDataLayer, maintaining consistency across the codebase.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (4)

38-39: LGTM!

The attachments parameter addition is properly implemented with backward compatibility and follows the consistent schema extension pattern used across all data layer classes.


57-58: LGTM!

The segmentation layer attachment parameter follows the same pattern, ensuring consistency across the codebase.


38-39: LGTM: Consistent attachment support implementation

The addition follows the established pattern across all data layer types, ensuring architectural consistency and backward compatibility.


57-58: LGTM: Maintains consistency across precomputed layer types

The segmentation layer implementation matches the data layer pattern, ensuring uniform attachment support across the codebase.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4)

327-332: LGTM!

The integration of attachment scanning into the data source loading process is well-implemented. The code maintains the existing logic flow while enriching data layers with discovered attachments, which aligns perfectly with the PR objectives for detecting special files during dataset scanning.


344-356: LGTM!

The scanForAttachedFiles method is cleanly implemented with systematic scanning for all attachment types. The approach of creating a DatasetAttachments object for each layer and using the specialized scanning functions from the info objects is well-structured and maintainable.


327-332: LGTM: Clean implementation of attachment scanning without JSON rewriting

The enhancement properly enriches data layers with attachment metadata while avoiding the automatic JSON rewriting concerns discussed in the PR. This aligns well with the approach of detecting special files and storing them in the database without modifying the source JSON files.


344-355: LGTM: Well-structured attachment scanning implementation

The method provides a clean, functional approach to discovering and attaching special files to data layers. The use of specialized file info objects for different attachment types is well-organized and maintains good separation of concerns.
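
The method body itself is not quoted in this review, so the following is only a rough, self-contained reconstruction of the shape described above: scan the conventional attachment subdirectories of a layer and collect the results into one DatasetAttachments. The stub types, signatures, and subdirectory names other than "agglomerates" are assumptions for illustration, not the actual webknossos API.

import java.net.URI
import java.nio.file.{Path, Paths}

// Stub types mirroring the names used in this PR; fields are simplified.
case class AttachedFile(path: URI, dataFormat: String)
case class DatasetAttachments(meshes: Seq[AttachedFile],
                              agglomerates: Seq[AttachedFile],
                              segmentIndex: Option[AttachedFile],
                              connectomes: Seq[AttachedFile],
                              cumsum: Option[AttachedFile])

object LayerScanSketch {
  // Placeholder scanners; the real ones live on the *FileInfo objects and use PathUtils.listFiles.
  def scanForFiles(layerDir: Path, subdir: String, extension: String): Seq[AttachedFile] = Seq.empty
  def scanForSingleFile(layerDir: Path, subdir: String, extension: String): Option[AttachedFile] = None

  // One DatasetAttachments per layer, assembled from the per-type scans.
  def scanForAttachedFiles(layerDir: Path): DatasetAttachments =
    DatasetAttachments(
      meshes = scanForFiles(layerDir, "meshes", "hdf5"),                    // assumed directory name
      agglomerates = scanForFiles(layerDir, "agglomerates", "hdf5"),
      segmentIndex = scanForSingleFile(layerDir, "segmentIndex", "hdf5"),   // assumed directory name
      connectomes = scanForFiles(layerDir, "connectomes", "hdf5"),          // assumed directory name
      cumsum = scanForSingleFile(layerDir, "agglomerates", "json")          // assumed location of the cumsum file
    )

  def main(args: Array[String]): Unit =
    println(scanForAttachedFiles(Paths.get("segmentation")))
}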

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetAttachments.scala (2)

10-16: Well-designed data model for dataset attachments.

The DatasetAttachments case class provides a clean structure for organizing different types of attachment files, with appropriate use of Seq for multiple files and Option for singular files.


57-65: Clean implementation of MeshFileInfo.

The singleton object follows a consistent pattern with appropriate constants and delegation to the utility methods.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1)

87-87: Consistent addition of attachments parameter.

The optional attachments parameter with default value None is consistent with the pattern established across other data layer classes in this PR.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (2)

47-48: Consistent pattern for WKWDataLayer attachments.

The addition of the optional attachments parameter follows the established pattern across layer classes in this PR.


65-66: Consistent pattern for WKWSegmentationLayer attachments.

The addition matches the pattern used in WKWDataLayer and other layer classes.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (2)

39-40: Consistent pattern for Zarr3DataLayer attachments.

The addition of the optional attachments parameter maintains consistency with the pattern established across all layer classes in this PR.


58-59: Consistent pattern for Zarr3SegmentationLayer attachments.

The addition matches the pattern used in Zarr3DataLayer and maintains consistency across all layer types.

app/models/dataset/Dataset.scala (4)

17-17: LGTM! Necessary imports for attachment file types.

The new imports are correctly added to support the attachment functionality in the DatasetLayerAttachmentsDAO class.

Also applies to: 20-20, 23-23, 26-27


850-854: LGTM! Proper dependency injection implementation.

The constructor correctly extends the DatasetLayerDAO to include the new DatasetLayerAttachmentsDAO dependency, following the established pattern used for other DAO dependencies in this class.


979-979: LGTM! Proper integration with existing update flow.

The call to updateAttachments is correctly placed alongside other layer-related updates and uses proper for-comprehension syntax. This ensures attachment data is kept in sync when layers are updated.


1020-1049: LGTM! Well-implemented DAO following established patterns.

The DatasetLayerAttachmentsDAO class is excellently implemented with:

  • Proper dependency injection and inheritance from SimpleSQLDAO
  • Consistent clear-and-insert pattern used by other DAO methods in this file
  • Comprehensive handling of all attachment types (agglomerates, connectomes, segment indices, meshes, cumsum files)
  • Appropriate use of transactions via replaceSequentiallyAsTransaction
  • Proper handling of edge cases with getOrElse(List.empty) for optional data structures
  • Correct SQL query construction with type casting for enum values

The implementation integrates seamlessly with the existing data layer management workflow.
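
As a rough illustration of the clear-and-insert pattern described above, here is a self-contained sketch of the statements such a DAO would issue per layer. The real implementation uses webknossos's parameterized SQL interpolation and replaceSequentiallyAsTransaction rather than raw string building; only the table, column, and enum names are taken from this PR, everything else is made up.

object AttachmentSqlSketch {
  final case class AttachmentRow(path: String, fileType: String, dataFormat: String)

  // Build the statements for one layer: clear existing rows, then re-insert the current set.
  // String concatenation is for illustration only; real code must use parameterized queries.
  def clearAndInsert(datasetId: String, layerName: String, rows: Seq[AttachmentRow]): Seq[String] = {
    val delete =
      s"DELETE FROM webknossos.dataset_layer_attachments WHERE _dataset = '$datasetId' AND layerName = '$layerName'"
    val inserts = rows.map { r =>
      s"INSERT INTO webknossos.dataset_layer_attachments(_dataset, layerName, path, type, dataFormat) " +
        s"VALUES ('$datasetId', '$layerName', '${r.path}', " +
        s"'${r.fileType}'::webknossos.ATTACHMENT_FILE_TYPE, '${r.dataFormat}'::webknossos.ATTACHMENT_DATAFORMAT)"
    }
    delete +: inserts // executed together in one transaction by the real DAO
  }

  def main(args: Array[String]): Unit =
    clearAndInsert(
      "0123456789abcdef01234567",
      "segmentation",
      Seq(AttachmentRow("file:///some/dataset/segmentation/agglomerates/example.hdf5", "agglomerate", "hdf5"))
    ).foreach(println)
}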

@frcroth frcroth requested a review from fm3 May 26, 2025 09:59
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa1a642 and 71d1ee6.

📒 Files selected for processing (1)
  • tools/postgres/schema.sql (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (1)
tools/postgres/schema.sql (1)

24-24: The insertion of the initial schemaVersion (134) in webknossos.releaseInformation aligns with the migration index and is correct. No changes needed here.

Comment on lines +165 to +173
CREATE TYPE webknossos.ATTACHMENT_FILE_TYPE AS ENUM ('agglomerate', 'connectome', 'segmentIndex', 'mesh', 'cumsum');
CREATE TYPE webknossos.ATTACHMENT_DATAFORMAT AS ENUM ('hdf5', 'zarr3', 'json');
CREATE TABLE webknossos.dataset_layer_attachments(
  _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
  layerName TEXT NOT NULL,
  path TEXT NOT NULL,
  type webknossos.ATTACHMENT_FILE_TYPE NOT NULL,
  dataFormat webknossos.ATTACHMENT_DATAFORMAT NOT NULL
);
Contributor

🛠️ Refactor suggestion

Add primary key and foreign key constraints to dataset_layer_attachments
Currently, the new table lacks a primary key and foreign key relationships, which can lead to duplicate rows and orphaned attachments when datasets or layers are removed. Please define a composite primary key and enforce referential integrity.

Proposed diff:

 CREATE TABLE webknossos.dataset_layer_attachments(
   _dataset TEXT CONSTRAINT _dataset_objectId CHECK (_dataset ~ '^[0-9a-f]{24}$') NOT NULL,
   layerName TEXT NOT NULL,
   path TEXT NOT NULL,
-  type webknossos.ATTACHMENT_FILE_TYPE NOT NULL,
-  dataFormat webknossos.ATTACHMENT_DATAFORMAT NOT NULL
+  type webknossos.ATTACHMENT_FILE_TYPE NOT NULL,
+  dataFormat webknossos.ATTACHMENT_DATAFORMAT NOT NULL,
+  PRIMARY KEY (_dataset, layerName, path),
+  FOREIGN KEY (_dataset, layerName)
+    REFERENCES webknossos.dataset_layers(_dataset, name)
+    ON DELETE CASCADE
 );
+CREATE INDEX ON webknossos.dataset_layer_attachments(_dataset);

This ensures uniqueness, accelerates queries by dataset, and cascades deletions when a layer is dropped.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tools/postgres/schema.sql between lines 165 and 173, the
dataset_layer_attachments table lacks primary key and foreign key constraints,
which can cause duplicate entries and orphaned records. Add a composite primary
key, likely on _dataset and layerName, and define foreign key constraints
referencing the datasets and layers tables to enforce referential integrity and
enable cascading deletes. This will improve data consistency and query
performance.

Member

@fm3 fm3 left a comment

Very cool, works for me! I added some suggestions to the code.

I wonder if we should really store absolute paths for the attachments or rather paths relative to the dataset. I’m also unsure what exactly the plan is for how to use this info later on when requesting data from these files. Should the full path be exposed to the frontend and then passed to the requests (e.g. for requesting data with an agglomerate mapping applied)? Or should we introduce some kind of unique name key to pass around as an identifier? Maybe @normanrz can comment on these questions or maybe you already talked about that.

Also, the scanning currently works only for files with .json or .hdf5 endings respectively. Maybe we should support that for zarr directories too (or at least have a plan for how to do this once #8633 is complete).


do $$ begin ASSERT (select schemaVersion from webknossos.releaseInformation) = 133, 'Previous schema version mismatch'; end; $$ LANGUAGE plpgsql;

CREATE TYPE webknossos.ATTACHMENT_FILE_TYPE AS ENUM ('agglomerate', 'connectome', 'segmentIndex', 'mesh', 'cumsum');
Member

I think the naming could be more consistent. We now have attachment, layer_attachment, AttachedFile and here attachment_file_type.

How about table webknossos.dataset_layer_attachments, case class LayerAttachment, field attachments in Layer, SQL types LAYER_ATTACHMENT_TYPE and LAYER_ATTACHMENT_DATAFORMAT with matching Scala enums?

path TEXT NOT NULL,
type webknossos.ATTACHMENT_FILE_TYPE NOT NULL,
dataFormat webknossos.ATTACHMENT_DATAFORMAT NOT NULL
);
Member

Maybe add a primary key relation for dataset, layerName and path combined?

Also, dataset and layerName should possibly be foreign keys to the respective tables (we have the foreign key constraints way down in the schema.sql file)

Comment on lines +338 to +339
val agglomerateFiles =
(newAttachments.agglomerates ++ existingFiles.agglomerates).distinct
Member

Is there any way to remove existing attachments? I’m unsure what we want here, maybe this is fine for the moment.

None
}
}
}
Member

maybe scanForSingleFile could use scanForFiles and then use headOption, to save some code duplication? Or would we expect a performance impact from that?
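
A tiny sketch of that suggestion with hypothetical signatures (the real methods take the layer directory and live on the *FileInfo objects); listing an attachment directory is cheap, so delegating via headOption should not cost anything noticeable.

import java.nio.file.Path

object SingleFileScanSketch {
  // Hypothetical stand-in for the existing list-based scanner.
  def scanForFiles(layerDir: Path): Seq[Path] = Seq.empty

  // scanForSingleFile expressed in terms of scanForFiles, removing the duplicated listing code.
  def scanForSingleFile(layerDir: Path): Option[Path] =
    scanForFiles(layerDir).headOption
}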

object AgglomerateFileInfo {

  val directoryName = "agglomerates"
  private val scanExtension = "hdf5"
Member

Soon we will have zarr3 agglomerate “files” #8633 and after that also zarr3 meshfiles, segment stats, connectome files. They don’t necessarily have an extension at all, since they are directories. I wonder how we would express this here? Should everything that has no extension be assumed to be zarr3? Should we search for a zarr.json in the directory? Should we introduce the convention that these directories must be named *.zarr?
Alternatively, we could say there is no scanning for those and they have to be registered in the datasource-properties.json (or later the db directly).
cc @normanrz
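
If directory-based attachments were ever scanned after all, one of the options above (looking for a zarr.json inside candidate directories) could look roughly like this; purely a sketch under that assumption.

import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._
import scala.util.Try

object ZarrAttachmentScanSketch {
  // Treat a subdirectory as a zarr3 attachment if it contains a zarr.json.
  def looksLikeZarr3(dir: Path): Boolean =
    Files.isDirectory(dir) && Files.exists(dir.resolve("zarr.json"))

  def scanForZarrDirectories(attachmentDir: Path): Seq[Path] =
    Try(Files.list(attachmentDir).iterator().asScala.toList)
      .getOrElse(List.empty)
      .filter(looksLikeZarr3)
}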

@normanrz
Member

I wonder if we should really store absolute paths for the attachments or rather paths relative to the dataset. I’m also unsure what exactly the plan is for how to use this info later on when requesting data from these files. Should the full path be exposed to the frontend and then passed to the requests (e.g. for requesting data with an agglomerate mapping applied)? Or should we introduce some kind of unique name key to pass around as an identifier? Maybe @normanrz can comment on these questions or maybe you already talked about that.

That is a good question. We should not expose the paths to the users. Especially with S3 paths, we really don't want them to leave the backend. I think we should add a name field to each attachment object. Alternatively, we could use a hash of the path.
Otherwise, absolute paths are fine.

Also, the scanning currently works only for files with .json or .hdf5 endings respectively. Maybe we should support that for zarr directories too (or at least have a plan for how to do this once #8633 is complete).

I don't think we need to implement scanning for Zarr-based attachments. We can register them in the json from day 1.
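
A small sketch of the hash variant mentioned above: derive a stable, opaque identifier from the stored path so that only the identifier, never the path, is exposed to the frontend. SHA-256 and the truncation to 16 hex characters are arbitrary illustrative choices.

import java.net.URI
import java.security.MessageDigest

object AttachmentNameKeySketch {
  // Opaque key derived from the attachment path; the path itself stays on the backend.
  def nameKey(path: URI): String = {
    val digest = MessageDigest.getInstance("SHA-256").digest(path.toString.getBytes("UTF-8"))
    digest.map(b => f"$b%02x").mkString.take(16)
  }

  def main(args: Array[String]): Unit =
    println(nameKey(new URI("s3://bucket/datasets/example/segmentation/agglomerates/example.hdf5")))
}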
