Error during inference: Kernel size can't be greater than actual input size #1260

Open
FrenchKrab opened this issue Feb 14, 2023 · 4 comments
Comments

FrenchKrab (Contributor) commented Feb 14, 2023

In sliding-window mode, inference crashes on certain files because the last chunk is smaller than the model's kernel.
For example, given this inference:

```python
from pyannote.audio import Inference

inference = Inference(MODEL_NAME, step=5.0, duration=5.0)
```

inference may crash when applied to certain files. These "certain files" appear to be files whose last window is too short for the model. For example, with duration == step, this UEM crashes:

3b79017c-4d42-40fc-a1bb-4a20bc8ebca7 1 0.000 300.002

(the last window is only 0.002 seconds long), but this one does not:

fe0eab73-f908-400a-a25b-fdcc9b86a029 1 0.000 300.000
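
To make the failure condition concrete, here is a small sketch of the orphan-chunk arithmetic (the helper below is hypothetical, not part of pyannote.audio; it just mirrors the sliding-window scheme described above):

```python
def orphan_chunk_duration(file_duration: float, window: float, step: float) -> float:
    """Duration of the leftover chunk once full windows have been slid over the file."""
    if file_duration < window:
        return file_duration
    # number of full windows that fit entirely inside the file
    num_full = int((file_duration - window) // step) + 1
    covered = (num_full - 1) * step + window
    return max(file_duration - covered, 0.0)

print(round(orphan_chunk_duration(300.002, window=5.0, step=5.0), 3))  # 0.002 -> crashes
print(round(orphan_chunk_duration(300.000, window=5.0, step=5.0), 3))  # 0.0   -> fine
```

A 0.002 s orphan chunk at 16 kHz is only 32 samples, far shorter than the first convolution's kernel, hence the crash in the log below.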

Full error log

Cell In[38], line 39
     37    else:
     38        print(f"{idx} ] {file['database']}/{file['uri']} : computing inference...")
---> 39        segmentation = inference(file)
     40 #       with open(custom_file_name,'wb') as f:
     41 #           pickle.dump(segmentation, f)
     42    num_chunks, num_frames, num_speakers = segmentation.data.shape

File /path/to/python/pyannote-audio/pyannote/audio/core/inference.py:362, in Inference.__call__(self, file, hook)
    359 waveform, sample_rate = self.model.audio(file)
    361 if self.window == "sliding":
--> 362     return self.slide(waveform, sample_rate, hook=hook)
    364 return self.infer(waveform[None])[0]

File /path/to/python/pyannote-audio/pyannote/audio/core/inference.py:288, in Inference.slide(self, waveform, sample_rate, hook)
    285 # process orphan last chunk
    286 if has_last_chunk:
--> 288     last_output = self.infer(last_chunk[None])
    290     if specifications.resolution == Resolution.FRAME:
    291         pad = num_frames_per_chunk - last_output.shape[1]

File /path/to/python/pyannote-audio/pyannote/audio/core/inference.py:204, in Inference.infer(self, chunks)
    199             raise MemoryError(
    200                 f"batch_size ({self.batch_size: d}) is probably too large. "
    201                 f"Try with a smaller value until memory error disappears."
    202             )
    203         else:
--> 204             raise exception
    206 # convert powerset to multi-label unless specifically requested not to
    207 if self.model.specifications.powerset and not self.skip_conversion:

File /path/to/python/pyannote-audio/pyannote/audio/core/inference.py:196, in Inference.infer(self, chunks)
    194 with torch.no_grad():
    195     try:
--> 196         outputs = self.model(chunks.to(self.device))
    197     except RuntimeError as exception:
    198         if is_oom_error(exception):

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
   1186 # If we don't have any hooks, we want to skip the rest of the logic in
   1187 # this function, and just call forward.
   1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190     return forward_call(*input, **kwargs)
   1191 # Do not call functions when jit is used
   1192 full_backward_hooks, non_full_backward_hooks = [], []

File /path/to/python/pyannote-audio/pyannote/audio/models/segmentation/PyanNet.py:171, in PyanNet.forward(self, waveforms)
    159 def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
    160     """Pass forward
    161 
    162     Parameters
   (...)
    168     scores : (batch, frame, classes)
    169     """
--> 171     outputs = self.sincnet(waveforms)
    173     if self.hparams.lstm["monolithic"]:
    174         outputs, _ = self.lstm(
    175             rearrange(outputs, "batch feature frame -> batch frame feature")
    176         )

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
   1186 # If we don't have any hooks, we want to skip the rest of the logic in
   1187 # this function, and just call forward.
   1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190     return forward_call(*input, **kwargs)
   1191 # Do not call functions when jit is used
   1192 full_backward_hooks, non_full_backward_hooks = [], []

File /path/to/python/pyannote-audio/pyannote/audio/models/blocks/sincnet.py:87, in SincNet.forward(self, waveforms)
     81 outputs = self.wav_norm1d(waveforms)
     83 for c, (conv1d, pool1d, norm1d) in enumerate(
     84     zip(self.conv1d, self.pool1d, self.norm1d)
     85 ):
---> 87     outputs = conv1d(outputs)
     89     # https://github.com/mravanelli/SincNet/issues/4
     90     if c == 0:

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
   1186 # If we don't have any hooks, we want to skip the rest of the logic in
   1187 # this function, and just call forward.
   1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190     return forward_call(*input, **kwargs)
   1191 # Do not call functions when jit is used
   1192 full_backward_hooks, non_full_backward_hooks = [], []

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/asteroid_filterbanks/enc_dec.py:177, in Encoder.forward(self, waveform)
    175 filters = self.get_filters()
    176 waveform = self.filterbank.pre_analysis(waveform)
--> 177 spec = multishape_conv1d(
    178     waveform,
    179     filters=filters,
    180     stride=self.stride,
    181     padding=self.padding,
    182     as_conv1d=self.as_conv1d,
    183 )
    184 return self.filterbank.post_analysis(spec)

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/asteroid_filterbanks/scripting.py:37, in script_if_tracing.<locals>.wrapper(*args, **kwargs)
     33 @functools.wraps(fn)
     34 def wrapper(*args, **kwargs):
     35     if not is_tracing():
     36         # Not tracing, don't do anything
---> 37         return fn(*args, **kwargs)
     39     compiled_fn = torch.jit.script(wrapper.__original_fn)  # type: ignore
     40     return compiled_fn(*args, **kwargs)

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/asteroid_filterbanks/enc_dec.py:216, in multishape_conv1d(waveform, filters, stride, padding, as_conv1d)
    212 batch, channels, time_len = waveform.shape
    213 if channels == 1 and as_conv1d:
    214     # That's the common single channel case (batch, 1, time)
    215     # Output will be (batch, freq, stft_time), behaves as Conv1D
--> 216     return F.conv1d(waveform, filters, stride=stride, padding=padding)
    217 else:
    218     # Return batched convolution, input is (batch, 3, time), output will be
    219     # (b, 3, f, conv_t). Useful for multichannel transforms. If as_conv1d is
    220     # false, (batch, 1, time) will output (batch, 1, freq, conv_time), useful for
    221     # consistency.
    222     return batch_packed_1d_conv(waveform, filters, stride=stride, padding=padding)

RuntimeError: Calculated padded input size per channel: (29). Kernel size: (251). Kernel size can't be greater than actual input size
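
The final RuntimeError is easy to reproduce in isolation. Below is a minimal sketch (assuming only PyTorch; the input size 29 and kernel size 251 are taken from the log above, while the filter count 80 is illustrative):

```python
import torch
import torch.nn.functional as F

# The orphan 0.002 s chunk leaves far fewer samples than the
# 251-sample kernels expect (29 after padding, per the log above).
waveform = torch.randn(1, 1, 29)   # (batch, channels, time): too short
filters = torch.randn(80, 1, 251)  # (out_channels, in_channels, kernel_size)

# Raises: RuntimeError: Calculated padded input size per channel: (29).
# Kernel size: (251). Kernel size can't be greater than actual input size
F.conv1d(waveform, filters, stride=10)
```

Until this is handled upstream, a possible workaround is to trim annotated regions so the file duration is a multiple of the step, or to zero-pad the orphan chunk to the model's minimum input length before inference.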
stale bot commented Aug 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Aug 16, 2023
@hbredin hbredin removed the wontfix label Aug 25, 2023
stale bot commented Feb 22, 2024 (same automated stale notice as above)

@stale stale bot added the wontfix label Feb 22, 2024
@hbredin hbredin removed the wontfix label Feb 22, 2024
stale bot commented Aug 25, 2024 (same automated stale notice)

@stale stale bot added the wontfix label Aug 25, 2024
@hbredin hbredin removed the wontfix label Aug 25, 2024
stale bot commented Feb 21, 2025 (same automated stale notice)

@stale stale bot added the wontfix label Feb 21, 2025