Flex & UTF-8
“Once upon a time, it seems, last Friday,” I needed a lexical analyzer that could work with Unicode data.
The lexer generator I wanted to use was Flex, and that turned out to be a real problem.
Flex itself cannot work with Unicode data: when it builds the automaton, it assumes that characters are 7 or 8 bits wide.
I did come across flex-2.5.4a-unicode-patch, but it handles only 16-bit characters and is tied to one specific version, with all that this implies.
Meanwhile, there is a simple and quite workable solution that does not require reaching with dirty hands into the holy of holies and rebuilding the tool.
Declare the following:
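%option 8bit
%option c++
...
/* ASCII letters */
alpha [A-Za-z]
/* UTF-8 continuation byte */
U1 [\x80-\xbf]
/* Lead bytes of 2-, 3- and 4-byte UTF-8 sequences */
U2 [\xc2-\xdf]
U3 [\xe0-\xef]
U4 [\xf0-\xf4]
/* One letter: an ASCII letter or any multi-byte UTF-8 character */
ualpha {alpha}|{U2}{U1}|{U3}{U1}{U1}|{U4}{U1}{U1}{U1}
/* A name: one or more letters or underscores (+ rather than *, so it never matches the empty string) */
uname ({ualpha}|_)+
...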
And voilà, it can be used:
%%
...
{uname} {
...
yylval.str_ = std::string(yytext);
return XyzParser::ttName;
}
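
To make it clearer what the byte classes above actually accept, here is a minimal C++ sketch that mirrors them outside of Flex. The function names (is_u1, ualpha_length, is_uname) are hypothetical and exist only for this illustration; they are not part of Flex or of the scanner. The sketch treats an empty string as not a name, which is a choice made here. Like the patterns, it checks only the byte ranges, so it also accepts some sequences that strict UTF-8 forbids (overlong forms such as E0 80 80, or encoded surrogates such as ED A0 80).

```cpp
#include <cstddef>
#include <string>

// Continuation byte, same range as the U1 class.
static bool is_u1(unsigned char c) { return c >= 0x80 && c <= 0xbf; }

// ASCII letter, same range as the alpha class.
static bool is_alpha(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// Number of bytes in the character that starts with byte c, as ualpha sees it:
// 1 for an ASCII letter, 2..4 for a U2/U3/U4 lead byte, 0 if c cannot start one.
static std::size_t ualpha_length(unsigned char c) {
    if (is_alpha(c)) return 1;
    if (c >= 0xc2 && c <= 0xdf) return 2;  // U2
    if (c >= 0xe0 && c <= 0xef) return 3;  // U3
    if (c >= 0xf0 && c <= 0xf4) return 4;  // U4
    return 0;
}

// True if s is a non-empty run of ualpha characters and underscores.
bool is_uname(const std::string& s) {
    if (s.empty()) return false;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '_') { ++i; continue; }
        std::size_t n = ualpha_length(c);
        // Reject bytes that cannot start a letter and truncated sequences.
        if (n == 0 || i + n > s.size()) return false;
        // Every byte after the lead byte must be a continuation byte.
        for (std::size_t k = 1; k < n; ++k) {
            if (!is_u1(static_cast<unsigned char>(s[i + k]))) return false;
        }
        i += n;
    }
    return true;
}
```

The point of the trick is visible in ualpha_length: the scanner never needs to know about code points. It only has to recognize a lead byte and consume the right number of continuation bytes, which an 8-bit automaton can do.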