BUG: `vectorize` truncates string outputs to 1 character, even with explicitly-specified `otypes` #23442

cgobat · 2023-03-23T20:53:27Z

Describe the issue:

When a user creates a vectorize object from a function that returns a string and specifies the function's output type(s) with the otypes argument, string typecode specifiers of any length (e.g. "U10" for a 10-character string) cause the returned strings to be truncated to 1 character (i.e. np.dtype("<U1")). The same thing happens with bytes (typecode "S", with any length specified). To make it work, one must either omit the otypes argument entirely or use "O" to get a generic object dtype.

This issue may be related to #2485 and/or the StackOverflow question "How to explicitly specify the output's string length in numpy.vectorize", but I can't say for sure. It seems odd that otypes ignores explicit length declarations.

Reproduce the code example:

import numpy as np

def make_10char_str(n: int) -> str:
    """Returns a string version of the input integer, with spaces to the
       right to left-justify it and pad the string out to 10 characters"""
    return f"{n:<10d}"

vector_str_func = np.vectorize(make_10char_str, signature="()->()", otypes=["<U10"]) # "<U10" should correspond to a 10-character unicode str

print(repr(vector_str_func([1, 24, 365, 4096]))) # expected output is array(['1         ', '24        ', '365       ', '4096      '], dtype='<U10')

Output:

array(['1', '2', '3', '4'], dtype='<U1')
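For reference, the otypes=["O"] workaround mentioned above can be combined with an explicit cast once vectorize has returned. This is a minimal sketch, assuming an extra astype pass over an object array is acceptable; because the cast to "<U10" happens outside vectorize, the truncating code path is never involved.

```python
import numpy as np

def make_10char_str(n: int) -> str:
    """Left-justify n in a 10-character field (same helper as the report)."""
    return f"{n:<10d}"

# Request object outputs so vectorize stores the full Python strings...
vector_str_func = np.vectorize(make_10char_str, signature="()->()", otypes=["O"])
obj_result = vector_str_func([1, 24, 365, 4096])

# ...then convert to the string dtype that was actually wanted.
str_result = obj_result.astype("<U10")
print(repr(str_result))
```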

Runtime information:

np.__version__ is 1.24.2. sys.version is

3.8.16 | packaged by conda-forge | (default, Feb  1 2023, 16:01:55) 
[GCC 11.3.0]

Output of np.show_runtime() is:

[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'AVX2'],
                      'not_found': ['F16C',
                                    'FMA3',
                                    'AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': '/home/cgobat/miniconda3/envs/.../lib/libopenblasp-r0.3.21.so',
  'internal_api': 'openblas',
  'num_threads': 4,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.21'}]

Context for the issue:

This issue can cause problems: users who want to specify their function's otypes explicitly are forced to use "O", and the resulting object-dtype arrays may not be handled by other operations that expect string dtype outputs without additional processing.


WarrenWeckesser · 2023-03-24T01:22:15Z

Thanks for reporting the issue, @cgobat.

Edit: I removed my previous misguided comment.

The sizes of any string types specified in otypes are always ignored by vectorize. The otypes argument is converted to a sequence of single-character type codes. For example,

In [15]: def foo(a, b):
    ...:     pass
    ...: 

In [16]: vfoo = np.vectorize(foo, otypes=['<U16', '<U32'])

In [17]: vfoo.otypes
Out[17]: 'UU'

The lengths 16 and 32 have been discarded, and only the type code U is saved.
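The reduction described here can be mimicked with dtype.char. This is a minimal sketch assuming the conversion amounts to taking each dtype's single-character code; the exact internal code may differ between NumPy versions.

```python
import numpy as np

# Normalize each requested otype to a dtype, then keep only its type code.
requested = ["<U16", "<U32"]
codes = "".join(np.dtype(t).char for t in requested)
print(codes)

# The length is not part of the code, so turning the code back into a
# dtype gives an unsized string type (itemsize 0).
print(np.dtype(requested[0]).itemsize, np.dtype(codes[0]).itemsize)
```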

The actual output size of the strings will depend on the code path taken internally. If signature is not specified, the length of the output string type will be the maximum of the lengths of the computed values, e.g.

In [35]: vstr = np.vectorize(str, otypes=['U32'])  # The '32' is ignored.

In [36]: vstr.otypes
Out[36]: 'U'

In [37]: vstr(123)
Out[37]: array('123', dtype='<U3')

In [38]: vstr([[123, 99999],[-1, 0]])
Out[38]: 
array([['123', '99999'],
       ['-1', '0']], dtype='<U5')
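A rough sketch of why the signature-less path gets the length right, assuming it behaves like calling the function through an object-dtype ufunc (np.frompyfunc here) and then converting the object results with the unsized "U" code. This is an illustration, not the actual vectorize implementation.

```python
import numpy as np

# Call str element-wise; the results come back in an object-dtype array.
str_ufunc = np.frompyfunc(str, 1, 1)
objects = str_ufunc(np.array([[123, 99999], [-1, 0]]))

# Converting with the unsized 'U' code lets NumPy size the result from the
# longest string it actually sees ('99999', 5 characters).
result = np.asanyarray(objects, dtype="U")
print(repr(result))
```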

If signature is given, a different code path is followed internally, and the output length is always 1:

In [40]: vstr = np.vectorize(str, signature='()->()', otypes=['U32'])

In [41]: vstr.otypes
Out[41]: 'U'

In [42]: vstr(123)
Out[42]: array('1', dtype='<U1')

In [43]: vstr([[123, 99999],[-1, 0]])
Out[43]: 
array([['1', '9'],
       ['-', '0']], dtype='<U1')
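The truncation on this path can be reproduced directly, assuming (as ganesh-k13's debugger session in this thread shows) that the output is allocated with np.empty_like and the unsized "U" code before any results are written. The allocation yields a 1-character dtype, and NumPy silently truncates longer strings on assignment.

```python
import numpy as np

# Allocate a 0-d output the way the debugger session shows: unsized 'U'.
out = np.empty_like(123, shape=(), dtype="U")
print(out.dtype)

# Writing a longer string keeps only what fits in one character.
out[()] = "123"
print(repr(out))
```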

cgobat · 2023-03-24T15:29:40Z

Thanks for looking into this, @WarrenWeckesser. Any ideas on how to proceed towards a resolution? Is a documentation update called for in the meantime?

lsaavedr · 2024-01-10T04:11:12Z

Is there any news about this?

ganesh-k13 · 2025-05-13T05:26:35Z

The issue arises when signature is given, because this particular code path is chosen:

https://github.com/numpy/numpy/blob/c458e69d8794d4d25549761e12b40bfeafa1a4e3/numpy/lib/_function_base_impl.py#L2257C24-L2259

As part of this, np.empty_like creates a result array with dtype <U1:

> /Users/gakathir/Documents/os/numpy/build-install/usr/lib/python3.13/site-packages/numpy/lib/_function_base_impl.py(2261)_create_arrays()
-> return arrays
(Pdb) p arrays
(array('', dtype='<U1'),)
(Pdb) for a, b, c in zip(results, shapes, dtypes): print(a,b,c)
123 () U
(Pdb) p np.empty_like(a, shape=b, dtype=c)
array('', dtype='<U1')
(Pdb)

[EDIT]

Coupled with the change in #26270, the size of U is dropped, which seems correct. Hence, we need to fix the _create_arrays function for this.
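One possible direction, sketched as a standalone wrapper rather than a patch to _create_arrays: collect results in an object array, then size unsized string outputs from all values at the end, as the signature-less path does. vectorize_scalar_sig is a hypothetical name; it only handles the "()->()" signature with a single output and is neither NumPy's API nor the eventual fix.

```python
import numpy as np

def vectorize_scalar_sig(func, otype):
    """Hypothetical stand-in for np.vectorize(func, signature="()->()",
    otypes=[otype]) that does not truncate string outputs.

    Only the scalar "()->()" case with a single output is handled.
    """
    dtype = np.dtype(otype)

    def wrapper(x):
        x = np.asarray(x)
        # Collect raw results in an object array so no value is cut short.
        out = np.empty(x.shape, dtype=object)
        for idx in np.ndindex(x.shape):
            out[idx] = func(x[idx])
        if dtype.kind in "SU" and dtype.itemsize == 0:
            # Unsized 'U'/'S': let NumPy size the result from the longest value.
            return np.asanyarray(out, dtype=dtype.char)
        # Sized (e.g. '<U10') or non-string types: honor the request as given.
        return np.asanyarray(out, dtype=dtype)

    return wrapper
```

For example, vectorize_scalar_sig(str, "U")([[123, 99999], [-1, 0]]) yields a '<U5' array, matching the signature-less output shown earlier in the thread. Sizing the output from the first result alone would avoid the object-array pass but would still truncate later, longer values (e.g. '99999' after '123').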

cgobat added the 00 - Bug label Mar 23, 2023

cgobat changed the title ~~BUG: vectorize truncates string outputs to 1 character, even with explicitly-specified otypes~~ BUG: vectorize truncates string outputs to 1 character, even with explicitly-specified otypes Mar 23, 2023

WarrenWeckesser added the component: numpy.lib label Mar 24, 2023
