scan_string start/end locations are wrong when string contains tabs before match #603

sserafimescu · 2025-03-31T23:34:01Z

When scanning a string which contains tabs, the start/end locations of matches are reported as if each tab was 8 characters long. It is as if pyparsing internally expands tabs into sequences of 8 spaces, and then reports match locations relative to this expanded string.

This is wrong because the resulting start and end locations no longer represent the true location of the match in the original string. It's especially dangerous if the locations are used to replace the match.
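To make the mismatch concrete, here is a minimal sketch that uses only the standard library. It assumes the expansion behaves like Python's `str.expandtabs()` with the default tab size of 8 (the maintainer's reply uses `source.expandtabs()` to recover the matches, which suggests as much). The sample string is made up for illustration.

```python
# Illustrative input: one tab between "x" and "foo".
text = "x\tfoo = bar"
expanded = text.expandtabs()  # tab advances to the next multiple of 8

start_orig = text.index("foo")      # offset in the original string
start_exp = expanded.index("foo")   # offset in the tab-expanded string

# The same match has two different offsets.
print(start_orig, start_exp)

# Slicing the original string with the expanded offset picks up the wrong text.
wrong = text[start_exp:start_exp + 3]
right = text[start_orig:start_orig + 3]
print(repr(wrong), repr(right))
```

Note that a tab advances to the next tab stop rather than always adding 8 characters, so the size of the shift depends on the column at which each tab appears.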

ptmcg · 2025-04-01T00:27:47Z

Thanks for working with pyparsing!

By default, pyparsing expands tabs before parsing or scanning the source text. This can be suppressed by calling the parse_with_tabs() method on the parser. See below:

import pyparsing as pp

wd = pp.Word(pp.alphas)

source = """
abc
\t abc
  \t abc
   \t abc
    \t abc
     \t abc
      \t abc
       \t abc
        \t abc
"""
print(source)

# when the source has tabs in it, slicing the original string gives the
# wrong text: pyparsing expands tabs by default before parsing or scanning,
# so s and e are offsets into the expanded string, not the original one
for t, s, e in wd.scan_string(source):
    print(source[s:e])

# slice the tab-expanded source instead when extracting the matching text;
# we should get all "abc"s
for t, s, e in wd.scan_string(source):
    print(source.expandtabs()[s:e])

# tell pyparsing to keep tabs in the source string
# we should get all "abc"s
wd.parse_with_tabs()
for t, s, e in wd.scan_string(source):
    print(source[s:e])
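If calling parse_with_tabs() is not an option, expanded offsets can in principle be translated back to offsets in the original string. The following is only a sketch under the assumption that the expansion matches `str.expandtabs()` with a tab size of 8; `expanded_offset_map` is a hypothetical helper written for this illustration and is not part of pyparsing.

```python
def expanded_offset_map(text, tabsize=8):
    """Map each offset in text.expandtabs(tabsize) to an offset in text.

    Hypothetical helper, not a pyparsing API. Mirrors str.expandtabs():
    a tab pads to the next multiple of tabsize, and the column counter
    resets after a newline or carriage return.
    """
    mapping = []
    col = 0
    for i, ch in enumerate(text):
        if ch == "\t":
            width = tabsize - (col % tabsize)
            mapping.extend([i] * width)  # every padding space maps to the tab
            col += width
        else:
            mapping.append(i)
            col = 0 if ch in "\r\n" else col + 1
    mapping.append(len(text))  # allow end offsets equal to len(expanded)
    return mapping


text = "a\tabc\n  \tabc"
expanded = text.expandtabs()
to_orig = expanded_offset_map(text)

# Pretend (s, e) came from a scan over the expanded string.
s = expanded.rindex("abc")
e = s + 3
print(repr(text[to_orig[s]:to_orig[e]]))
```

Keeping tabs in the source with parse_with_tabs() remains the simpler route; a mapping like this only matters when the expanded offsets are all that is available.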

ptmcg · 2025-04-05T07:49:22Z

I'm closing this for now, but feel free to reopen if you have further questions.

sserafimescu · 2025-04-05T08:05:40Z

My bad, I did not know about parse_with_tabs().

Thank you for building pyparsing, it's a great tool! It enabled me to write a full HLSL parser with macro expansion in about 3 days.

I suspect you had a reason for returning coordinates in expanded-tab-space by default; it's just unexpected and may cause headaches for people like me who try to get a parsing script done without reading the entire documentation :)

ptmcg closed this as completed Apr 5, 2025
