Filtering nodes do not support multi-language character sets


hwhwhw

January 04, 2008 00:00

Answered

In filtering nodes, regular-expression character classes such as \w and \s, as well as string comparisons, do not appear to support multi-language (non-Latin) character sets.

For example:
$Field1 ~= "^[0-9]\w.*"
substring($Field2,0,2) == "黄威"

Can this be supported?

Comments 9

dpavlis

January 06, 2008 11:26
Can you try using Unicode escape sequences in place of the characters (both in the regex and in the substring comparison)?

Also, I am not sure what problem you are describing. Is it that 黄威 is not recognized as \w?

Sorry, I am not familiar with Asian alphabets and need a hint here.
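
For reference, here is a minimal Java sketch of how `\w` treats Chinese characters in `java.util.regex`, which the filter eventually delegates to according to this thread. The class and method names are made up for illustration, and `UNICODE_CHARACTER_CLASS` only exists in Java 7 and later, so it would not have been available on the JDK 1.5 mentioned further down.

```java
import java.util.regex.Pattern;

class WordClassDemo {
    // Default behaviour: \w means [a-zA-Z_0-9] only.
    static boolean asciiWord(String s) {
        return Pattern.compile("\\w+").matcher(s).matches();
    }

    // With UNICODE_CHARACTER_CLASS (Java 7+), \w also covers Unicode letters and digits.
    static boolean unicodeWord(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    // \p{L} matches any Unicode letter and needs no flag.
    static boolean anyLetter(String s) {
        return Pattern.compile("\\p{L}+").matcher(s).matches();
    }

    public static void main(String[] args) {
        String beijing = "\u5317\u4eac"; // two Chinese characters, written as Unicode escapes
        System.out.println(asciiWord(beijing));   // false
        System.out.println(unicodeWord(beijing)); // true
        System.out.println(anyLetter(beijing));   // true
    }
}
```

So in plain Java the default `\w` does not match these characters, while `\p{L}` does.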
hwhwhw

January 07, 2008 07:50
table structure:
create table t1 (f1 varchar(50), f2 varchar(50));

record content:
黄威 20071976北京
huangwei 20071976beijing

extFilter node expression:
$f2 ~= '^[0-9]{8}[a-z]*'

outPort (0) output record
huangwei 20071976beijing

outPort (1) output record
黄威 20071976北京
----------------------------------------------------------
I want outPort (0) to output the record below instead:
黄威 20071976北京
----------------------------------------------------------
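
The result above is consistent with the filter requiring the whole field value to match the pattern. A minimal Java sketch of that behaviour follows; it assumes, without confirmation from the Clover source, that `~=` behaves like `Matcher.matches()`. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class FullMatchDemo {
    static final Pattern FILTER = Pattern.compile("^[0-9]{8}[a-z]*");

    // Whole-string match: every character must be consumed by the pattern.
    static boolean fullMatch(String s) {
        return FILTER.matcher(s).matches();
    }

    // Prefix-style match: succeeds as soon as some part of the input matches.
    static boolean prefixMatch(String s) {
        return FILTER.matcher(s).find();
    }

    public static void main(String[] args) {
        String latin = "20071976beijing";
        String han = "20071976\u5317\u4eac"; // eight digits followed by two Chinese characters
        System.out.println(fullMatch(latin));  // true
        System.out.println(fullMatch(han));    // false: [a-z] cannot consume the Chinese characters
        System.out.println(prefixMatch(han));  // true: the digits alone satisfy the pattern
    }
}
```

Under that assumption, the record with Chinese characters goes to port 1 simply because `[a-z]` does not cover them.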

extFilter node expression:
$f2 ~= '^[0-9]{8}\p{InHanzi}*'
Error output:
ERROR [WatchDog] - EXT_FILTER_0 ...FAILED !
Parser error when parsing expression: Encountered "\'^[0-9]{8}" at line 1, column 8.
Was expecting:
<STRING_LITERAL> ...

extFilter node expression:
substring($f3,8,2)=='北京'

outPort (0) output: 0 records
outPort (1) output: 2 records
黄威 20071976北京
huangwei 20071976beijing
dpavlis

January 07, 2008 08:46
If you use a backslash (\) in a regex string in the transform language, you have to escape it, like this:
```
$f2 ~= '^[0-9]{8}\\\\p{InHanzi}*'
```
The reason is that the backslash gets preprocessed twice: first when the expression is read from XML, where \\ is reduced to \, and then again when the TL parser reduces \\ to \. Only then does the string reach Java's regex evaluator.

We will try to fix this nuisance (present in 2.3.x and earlier) in the next release of Clover.

I will check the rest of the problem too, but please try the updated expression above.
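
To make the two-pass idea concrete, here is a small Java sketch. `unescapeOnce` is a hypothetical stand-in for a single preprocessing pass; it is not Clover's actual code and only models the "\\ becomes \" step described above.

```java
class DoubleEscapeDemo {
    // Hypothetical model of one preprocessing pass: each pair of backslashes becomes one.
    static String unescapeOnce(String s) {
        return s.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        // Java literal for the expression text that contains FOUR backslashes.
        String written = "^[0-9]{8}\\\\\\\\p{InHanzi}*";
        String afterFirstPass = unescapeOnce(written);         // two backslashes left
        String afterSecondPass = unescapeOnce(afterFirstPass); // one backslash left
        System.out.println(written);
        System.out.println(afterFirstPass);
        System.out.println(afterSecondPass);
    }
}
```

If only one such pass actually runs, two backslashes survive and the regex engine sees an escaped backslash followed by a literal `p`.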
hwhwhw

January 07, 2008 09:19
ERROR [WatchDog] - EXT_FILTER_0 ...FAILED !
Error when parsing expression: Illegal repetition near index 11
^[0-9]{8}\\p{InHanzi}*

--------------------------------

substring($f2,8,2)=='北京'

Why does the substring function not support "北京"?
dpavlis

January 07, 2008 09:30
Well, that is an interesting problem with the regex. I will look into it.

As for the substring: try using Unicode escapes (\uxxxx) in place of the two characters. You will have to find their Unicode code points.
avackova

January 07, 2008 12:03
I've found that the following regex does not throw an exception:
"^[0-9]{8}[\\p{InHanzi}]*"
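
A word of caution, sketched in plain Java: if the pattern reaches `java.util.regex` with two backslashes (as the text printed in the "Illegal repetition" error above suggests), the bracket expression is just a set of literal characters (a backslash, `p`, `{`, the letters of `InHanzi`, and `}`), so it compiles but does not match Chinese characters. Also, as far as Java is concerned, `InHanzi` is not a Unicode block name; `InCJKUnifiedIdeographs` is. Class and method names below are illustrative only.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CharClassDemo {
    // Pattern text with two literal backslashes inside the brackets.
    static final Pattern LITERAL_CLASS = Pattern.compile("^[0-9]{8}[\\\\p{InHanzi}]*");
    // Pattern using a block name that Java recognises.
    static final Pattern HAN_BLOCK = Pattern.compile("^[0-9]{8}\\p{InCJKUnifiedIdeographs}*");

    // Returns true if the regex text compiles without a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String han = "20071976\u5317\u4eac";
        System.out.println(LITERAL_CLASS.matcher(han).matches());             // false
        System.out.println(LITERAL_CLASS.matcher("20071976Hanzi").matches()); // true: literal letters
        System.out.println(HAN_BLOCK.matcher(han).matches());                 // true
        System.out.println(compiles("\\p{InHanzi}"));                         // false: unknown block
    }
}
```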
hwhwhw

January 08, 2008 02:47
Thank you for your response. The substring function issue has been resolved.
dpavlis

January 08, 2008 09:35
Cool,
can I ask how you solved it?
hwhwhw

January 09, 2008 00:48
The solution is a bit inconvenient. The process is as follows.

1)
D:\javasoft\Jdk1.5.0_04\bin>native2ascii
北京
\u5317\u4eac

2)
extFilter node expression:
substring($f2,8,2)=='\u5317\u4eac'
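
The conversion in step 1 can also be done without the `native2ascii` tool. Below is a minimal Java sketch that produces the same kind of escapes; `toUnicodeEscapes` is a hypothetical helper written for this example, not part of Clover or the JDK.

```java
class UnicodeEscapeDemo {
    // Replaces every non-ASCII UTF-16 unit with a backslash-u escape of four hex digits;
    // ASCII characters are copied unchanged.
    static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is the same two characters converted in step 1 above.
        System.out.println(toUnicodeEscapes("\u5317\u4eac"));
    }
}
```

The printed text can then be pasted into the filter expression in place of the original characters.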

====================================
extFilter node expression:
$f2 ~= '^[0-9]{8}[\u4e00-\u9fa5]*'

[\u4e00-\u9fa5] stands for the range of Chinese (CJK) characters. This approach is somewhat cumbersome.
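
As a possible alternative to the hard-coded range, plain Java (version 7 and later) offers the script class `\p{IsHan}`. The sketch below compares the two in `java.util.regex` only; whether the filter component accepts `\p{IsHan}` depends on the Clover and Java versions in use, which is an assumption to verify. Names are illustrative.

```java
import java.util.regex.Pattern;

class HanRangeDemo {
    // Explicit code point range, as used in the expression above.
    static final Pattern RANGE = Pattern.compile("^[0-9]{8}[\\u4e00-\\u9fa5]*");
    // Unicode script class (Java 7+), which also covers Han characters outside that range.
    static final Pattern SCRIPT = Pattern.compile("^[0-9]{8}\\p{IsHan}*");

    static boolean byRange(String s) {
        return RANGE.matcher(s).matches();
    }

    static boolean byScript(String s) {
        return SCRIPT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String beijing = "20071976\u5317\u4eac"; // characters inside the range
        String extensionA = "20071976\u3400";    // a Han character outside the range
        System.out.println(byRange(beijing));     // true
        System.out.println(byScript(beijing));    // true
        System.out.println(byRange(extensionA));  // false
        System.out.println(byScript(extensionA)); // true
    }
}
```

The range is easier to reason about on older JDKs, while the script class avoids having to maintain code point boundaries by hand.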
