Python: wrong file format causes an unpack error
When I use the Python code below to process right.txt and wrong.txt, right.txt works, but wrong.txt, though it looks exactly the same, fails to run. Is it an indentation problem?
My code is here:
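    import re

    if __name__ == '__main__':
        with open('wrong.txt') as fin:
            text = fin.read()
        # one block per sentence; drop empty pieces
        l = [p for p in text.split('\nsentence #') if p]
        for p in l:
            # first two blank-line-separated parts of each block
            lines, deps = tuple(p.split('\n\n')[:2])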
right.txt:
sentence #1 (33 tokens): introduction. [text=. characteroffsetbegin=208 characteroffsetend=209 partofspeech=. lemma=.] (root (. .))) root(root-0, stored-18)
wrong.txt:
sentence #1 (33 tokens): introduction. [text=. characteroffsetbegin=208 characteroffsetend=209 partofspeech=. lemma=.] (root (. .))) root(root-0, stored-18)
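I compared the two txt files and found (4111) differences, all of them in the line endings (newlines). right.txt uses LF (0x0a, '\n'); wrong.txt uses CRLF (0x0d0a, '\r\n'). With CRLF endings, a blank line appears as '\r\n\r\n', so p.split('\n\n') finds no separator and returns a single piece, and unpacking that one piece into lines, deps raises the error.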
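The failure can be reproduced without the files. The following is a minimal sketch; the two sample strings and the helper name split_block are made up for illustration and are not taken from right.txt or wrong.txt.

```python
def split_block(block):
    """Split one sentence block on a blank line, the way the question's code does."""
    return block.split('\n\n')[:2]

# Hypothetical sample data: same content, different line endings.
lf_block = "tokens line\n\ndeps line\n"
crlf_block = "tokens line\r\n\r\ndeps line\r\n"

lf_parts = split_block(lf_block)      # blank line found: two pieces
crlf_parts = split_block(crlf_block)  # '\n\n' never occurs: one piece

try:
    lines, deps = tuple(crlf_parts)   # one value, two names
    unpack_failed = False
except ValueError:
    unpack_failed = True
```

Here the LF block splits into two pieces, while the CRLF block stays in one piece and the two-name unpacking raises ValueError.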
With the above in mind, change the code to this:
    import re

    if __name__ == '__main__':
        with open('wrong.txt') as fin:
            text = fin.read()
        # use whichever line ending the file actually contains
        ending = '\r\n' if '\r\n' in text else '\n'
        l = [p for p in text.split(ending + 'sentence #') if p]
        for p in l:
            lines, deps = tuple(p.split(ending * 2)[:2])
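An alternative, shown here only as a sketch, is to normalize the line endings once and keep the original '\n'-based splitting. The function name parse_sentences and the sample text are assumptions made for illustration; the real files contain more fields per sentence.

```python
def parse_sentences(text):
    """Return (lines, deps) pairs; accepts LF or CRLF input."""
    # Normalize CRLF to LF so the '\n'-based splits below always apply.
    text = text.replace('\r\n', '\n')
    blocks = [p for p in text.split('\nsentence #') if p]
    pairs = []
    for p in blocks:
        lines, deps = p.split('\n\n')[:2]
        pairs.append((lines, deps))
    return pairs

# Hypothetical sample with two sentence blocks (not the real file contents).
unix_text = (
    "sentence #1 (2 tokens):\nhello world\n\nroot(root-0, hello-1)\n\n"
    "sentence #2 (1 tokens):\nbye\n\nroot(root-0, bye-1)\n"
)
windows_text = unix_text.replace('\n', '\r\n')

unix_pairs = parse_sentences(unix_text)
windows_pairs = parse_sentences(windows_text)
```

Both inputs give the same pairs, so the rest of the code never needs to know which ending the file used. Note also that in Python 3, open() in text mode with the default newline setting already translates '\r\n' to '\n' on read, so this kind of mismatch is more likely when reading in binary mode or under Python 2.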