regex - python regular expression with utf8 issue

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a file containing many lines of plain UTF-8 text, such as the line below (it is Chinese, by the way).

PROCESS：类型：关爱积分[NOTIFY]   交易号：2012022900000109   订单号：W12022910079166    交易金额：0.01元    交易状态：true 2012-2-29 10:13:08

The file itself is saved in UTF-8 encoding, and its name is xx.txt.

Here is my Python code; the environment is Python 2.7.

#coding: utf-8
import re
pattern = re.compile(r'交易金额：(\d+)元')
for line in open('xx.txt'):
    match = pattern.match(line.decode('utf-8'))
    if match:
        print match.group()

The problem is that I get no results.

I want to extract the decimal string from 交易金额：0.01元, which in this case is 0.01.

Why doesn't this code work? Can anyone explain it to me? I have no clue.

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:58:55+0000

There are several issues with your code. First, the pattern should be a unicode string: use re.compile(ur'<unicode string>'). It is also good practice to add the re.UNICODE flag (I'm not sure it is really needed here, though). Second, pattern.match only matches at the beginning of the line, so use pattern.search to find the amount in the middle of it. Finally, you still won't get a match, because \d+ only handles a run of digits, not decimals; use \d+\.?\d+ instead (a number, possibly a dot, then a number). Example code:

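#coding: utf-8
import re

# Sample line from the file, as a unicode string.
text = u"PROCESS：类型：关爱积分[NOTIFY]   交易号：2012022900000109   订单号：W12022910079166    交易金额：0.01元    交易状态：true 2012-2-29 10:13:08"

# Unicode pattern: digits, an optional dot, then more digits.
pattern = re.compile(ur'交易金额：(\d+\.?\d+)元', re.UNICODE)

# search() scans the whole line; match() would only try the start.
print pattern.search(text).group(1)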
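
For anyone reading this on Python 3, the same idea might look roughly like the sketch below. It is not the code from the answer above: it assumes Python 3 (where str is already unicode, so no ur prefix or .decode() call is needed), the names AMOUNT_RE, find_amount and amounts_in_file are made up for illustration, and the pattern is a slight variant, \d+(?:\.\d+)?, so that whole amounts such as 5 would match as well.

```python
import re

# Assumed variant of the answer's pattern: digits, then an optional
# ".digits" part, so both "0.01" and "5" are accepted.
AMOUNT_RE = re.compile(r'交易金额：(\d+(?:\.\d+)?)元')


def find_amount(line):
    """Return the amount string found in one line, or None if there is none."""
    # search() looks anywhere in the line; match() would only try position 0.
    m = AMOUNT_RE.search(line)
    return m.group(1) if m else None


def amounts_in_file(path):
    """Return the amounts found in a UTF-8 text file, one per matching line."""
    amounts = []
    # Opening with an explicit encoding yields already-decoded str lines.
    with open(path, encoding='utf-8') as f:
        for line in f:
            amount = find_amount(line)
            if amount is not None:
                amounts.append(amount)
    return amounts
```

With the file from the question, this would be called as amounts_in_file('xx.txt'). The amounts come back as strings; converting them with decimal.Decimal rather than float is probably the safer choice for money values.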
