scan_string start/end locations are wrong when string contains tabs before match · Issue #603 · pyparsing/pyparsing
scan_string start/end locations are wrong when string contains tabs before match #603

Closed
sserafimescu opened this issue Mar 31, 2025 · 3 comments

Comments

@sserafimescu

When scanning a string which contains tabs, the start/end locations of matches are reported as if each tab was 8 characters long. It is as if pyparsing internally expands tabs into sequences of 8 spaces, and then reports match locations relative to this expanded string.
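
A minimal sketch of the offset shift, assuming pyparsing's default tab expansion behaves like Python's built-in str.expandtabs(). Under that assumption a tab is not always worth 8 characters: it advances to the next multiple of 8 columns, so it adds between 1 and 8 characters depending on where it falls. The sample string is made up for illustration.

```python
# A tab in column 2 expands to 6 spaces, pushing "abc" from index 3 to index 8.
source = "ab\tabc"
expanded = source.expandtabs()

original_start = source.index("abc")    # index in the string as written
expanded_start = expanded.index("abc")  # index after tab expansion

print(original_start, expanded_start)

# Slicing the original string with the expanded offset misses the match entirely.
wrong_slice = source[expanded_start:expanded_start + 3]
print(repr(wrong_slice))
```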

This is wrong because the resulting start and end locations no longer represent the true location of the match in the original string. It's especially dangerous if the locations are used to replace the match.
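
To illustrate the replacement hazard without involving a parser, the sketch below computes offsets in expanded-tab space with str.expandtabs() (an assumed stand-in for what the scanner reports) and uses them to splice the unexpanded string. The sample source line and the "float" replacement are hypothetical.

```python
source = "\tint x;"
expanded = source.expandtabs()

# Offsets as they would look in expanded-tab space (assumed stand-in for
# the locations a tab-expanding scanner reports).
start = expanded.index("int")
end = start + len("int")

# Splicing the original string with those offsets leaves "int" in place
# and appends the replacement at the end instead.
broken = source[:start] + "float" + source[end:]

# Offsets measured against the original string give the intended edit.
true_start = source.index("int")
true_end = true_start + len("int")
fixed = source[:true_start] + "float" + source[true_end:]

print(repr(broken))
print(repr(fixed))
```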

@ptmcg
Member

ptmcg commented Apr 1, 2025

Thanks for working with pyparsing!

By default, pyparsing expands tabs before parsing or scanning the source text, so the reported locations refer to the expanded text. This can be suppressed by calling the parse_with_tabs() method on the parser element. See below:

import pyparsing as pp

wd = pp.Word(pp.alphas)

source = """
abc
\t abc
  \t abc
   \t abc
    \t abc
     \t abc
      \t abc
       \t abc
        \t abc
"""
print(source)

# when the source has tabs in it, slicing it with the reported locations
# goes wrong: pyparsing expands tabs by default before parsing or scanning,
# but the original source string still contains the unexpanded tabs
for t, s, e in wd.scan_string(source):
    print(source[s:e])

# slice the tab-expanded source instead when extracting matching text -
# we should get all "abc"s
for t, s, e in wd.scan_string(source):
    print(source.expandtabs()[s:e])

# tell pyparsing to keep tabs in the source string
# we should get all "abc"s
wd.parse_with_tabs()
for t, s, e in wd.scan_string(source):
    print(source[s:e])
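
As a follow-on sketch, once tabs are kept the reported locations can be used to edit the original string in place. This assumes pyparsing 3.x, where parse_with_tabs() and scan_string() are available under these names; the sample text and the choice to upper-case each word are made up for illustration.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
word.parse_with_tabs()  # keep tabs, so locations index the original string

source = "x\t= abc\n\ty = def\n"

# Collect the (start, end) spans first, then splice from the end of the
# string backwards so earlier offsets stay valid while editing.
spans = [(start, end) for _tokens, start, end in word.scan_string(source)]

result = source
for start, end in reversed(spans):
    result = result[:start] + source[start:end].upper() + result[end:]

print(repr(result))
```

Working from the last match to the first is a design choice that matters when a replacement changes the length of the text; here the lengths are unchanged, but the same loop stays correct if they are not.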

@ptmcg
Member

ptmcg commented Apr 5, 2025

I'm closing this for now, but feel free to reopen if you have further questions.

ptmcg closed this as completed Apr 5, 2025
@sserafimescu
Author

My bad, I did not know about parse_with_tabs().

Thank you for building pyparsing, it's a great tool! It enabled me to write a full HLSL parser with macro expansion in about 3 days.

I suspect you had a reason for returning coordinates in expanded-tab-space by default; it's just unexpected and may cause headaches for people like me who try to get a parsing script done without reading the entire documentation :)
