Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF-8 configuration parse errors - exposed after #5767 #5840

Closed
petrovr opened this issue Apr 2, 2018 · 10 comments
Closed

UTF-8 configuration parse errors - exposed after #5767 #5840

petrovr opened this issue Apr 2, 2018 · 10 comments

Comments

@petrovr
Copy link

petrovr commented Apr 2, 2018

e_nss regression test start to fail with master (last commit Documentation typo fix in EVP_EncryptInit.pod - 1238caa )

git bisect point to commit 9256510 with following comment:
"...After this commit the various IS_() macros in the auto-generated file
conf_def.h may incorrectly return true if the supplied character has its
most significant bit set. The IS_
() macros should be able to correctly
handle 8-bit characters. Note that UTF-8 support is not a requirement...."

e_nss regression test creates X.509 certificates with cyrillic and greek letters in distinguished name.
With above commit error is:
Country Name (2 letter code) [XX]:State or Province Name (full name) [World]:Locality Name (eg, city) [Somewhere cyrillic-АБВ-Яабв-я greek-ΑΒΓ-Ωαβγ-<<<>>>> :problems making Certificate Request
140215077046016:error:0D07A086:asn1 encoding routines:ASN1_mbstring_ncopy:invalid utf8string:crypto/asn1/a_mbstr.c:85:

<<<>>>> as is visible from https://gitlab.com/e_nss/e_nss/blob/master/tests/ca/catest.config#L50 is small greek letter omega.

I can not understand what is relation between commit and above failure.
For instance after replacement of "αβγ-ω" with "ω-αβγ" certificate is created!
Perhaps before CONF_HIGHBIT mask another defect.

@richsalz
Copy link
Contributor

richsalz commented Apr 2, 2018

This is not surprising. We had to remove some code because we could not track down the author and get them to agree to changing the license. We need a new implementation.

@mspncp
Copy link
Contributor

mspncp commented Apr 2, 2018

You can revert that commit for yourself, until someone comes up with a fix.

@petrovr
Copy link
Author

petrovr commented Apr 2, 2018

I will change title s/regression/exposed/

For some reasons config parser fail if last character is greek small omega
It could be reproduced within openssl build tree:
(a) go to buid tree top
(b) open test/CAssrsa.cnf and set organizationName_value to ω, i.e. s/Hermanos Locos/ω/
(c) run command: ./util/opensslwrap.sh req -new -utf8 -config test/CAssrsa.cnf -keyform PEM -key test/certs/root-key.pem

Command pass with values like "ωψ" (omega psi), i..e omega is not last character.
Command fail with more then one omega for instance "ωωωωω" - last character cannot be read.
Unicode is of omega is 03c9 and a number of tests shows the same error if unicode is ??c9.

@petrovr petrovr changed the title UTF-8 configuration parse errors - regression after #5767 UTF-8 configuration parse errors - exposed after #5767 Apr 2, 2018
@mspncp
Copy link
Contributor

mspncp commented Apr 2, 2018

@petrovr, I can have a look at this in the next days if it‘s not urgent (I‘m on vacation). Thanks for the hint about how to reproduce the error without e_nss.

mspncp added a commit to mspncp/openssl that referenced this issue Apr 2, 2018
Fixes openssl#5778, openssl#5840

The various IS_*() macros did not work correctly for 8-bit ASCII
characters with the high bit set, because the CVT(a) preprocessor
macro and'ed the given ASCII value with 0x7F, effectively folding
the high value range 128-255 over the low value range 0-127.
As a consequence, some of the IS_*() erroneously returned TRUE.

This commit fixes the issue by mapping CVT(a) to 127 for all values
a >= 127. This works because the lookup tables CONF_type_default
and CONF_type_win32 have all bits cleared at entry 127, whence
IS_*(a) returns FALSE for all values outside the range 0-127.

The IS_*() macros were also changed to return TRUE or FALSE (1 or 0)
instead of a nonzero or zero value.

Note that with this change, the macro IS_*(c,a) evaluates the 'a'
argument twice, unless CHARSET_EBCDIC is defined. This prohibits the
use of arguments with side effects like IS_*(c, *p++).
@mspncp
Copy link
Contributor

mspncp commented Apr 2, 2018

Roumen, could you please be so kind and check whether #5844 fixes the e_nss test? I already verified that it fixes your openssl example:

msp@msppc:~/src/openssl$ util/opensslwrap.sh req -new -utf8 -config test/CAssrsa.cnf -keyform PEM -key test/certs/root-key.pem
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [ES]:ES
Organization Name (eg, company) []:ω
Common Name (eg, YOUR name) []:Hermanos Locos CA
-----BEGIN CERTIFICATE REQUEST-----
MIICezCCAWMCAQAwNjELMAkGA1UEBhMCRVMxCzAJBgNVBAoMAs+JMRowGAYDVQQD
DBFIZXJtYW5vcyBMb2NvcyBDQTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoC
ggEBAOHmAPUGvKBGOHkPPx5xGRNtAt8rm3Zr/KywIe3WkQhCO6VjNexSW6CiSsXW
AJQDl1o9uWco0n3jIVyk7cY8jY6E0Z1Uwz3ZdKKWdmdx+cYaUHez/XjuW+DjjIkj
wpoi7D7UN54HzcArVREXOjRCHGkNOhiw7RWUXsb9nofGHOeUGpLAXwXBc0PlA94J
kckkztiOi34u4DFI0YYqalUmeugLNk6XseCkydpcaUsDgAhWg6Mfsiq4wUz+xbFN
1MABqu2+ziW97mmt9gfNbiuhiVT1aOuYCe3JYGbLM2JKA7Bo1g6rX8E1VX79Ru66
69y2oqPthX9337VoIkN+ZiQjr8UCAwEAAaAAMA0GCSqGSIb3DQEBCwUAA4IBAQAE
abipofYXHvrTs+dX1qGGY6BN5tmIgO+jkpHRP3u7e5qZBpfnSY1ciluQ2L/b4sTf
btMWaahb2dF+lSUrghAIj/Gz++bsYf0QRQw1wCbXAZqV8S5D15ylyhB3sAK4umNH
V7gO0/ojdUVKXIk6ydvtcnmC3fP1iIHMtGDtXySLG8AN2yY/Lm6QKfkj7JHyJP7D
aAUFFq3Hip/nrqc8/EIIwB3RPiRZhwXPVq+rNCBgjbLtGadV229AYh4Vfq/XR2cR
mKTE+cRtJeGvvclNiVGxp89HqwqpyAxudzs8ACCmZn+mNtZON9Ulm6f+XgXlEDHm
F2DNvtspS5tMzgVMnitz
-----END CERTIFICATE REQUEST-----

@mspncp
Copy link
Contributor

mspncp commented Apr 2, 2018

I can not understand what is relation between commit and above failure.

I did not undertake the effort to find out how exactly the false positives of the IS_*() macros caused your reported issue. I simply fixed the macros and then the certificate request succeeded. So I can't tell you why the request fails in particular for unicode characters of the form ??c9. If you are curious and manage to find an explanation, I'll be interested to hear about it.

@petrovr
Copy link
Author

petrovr commented Apr 4, 2018

So issue was discovered with omega as last character in line in UTF-8 encoded file. Line is in format key = value. On such line is expected spaces around '=' to be removed.
Omega is encoded in UTF-8 as 0xcf 0x89. Output of command shows byte 0xcf, i.e. 0x89 is removed.
So code removes trailing byte.
Let review CONF load method - function def_load_bio(). After error case CONF_R_MISSING_EQUAL_SIGN is expected processing of "value" part. As this point we could see 2 lines:
start = eat_ws(conf, p);
trim_ws(conf, start);
First look like left trim so lets review directly trim_ws: more pointer to the right if !IS_EOF and return back if IS_WS. Error shows that 0x89 is considered as space.
Review of IS_ macros lead to CVT(a) ((a) & 0x7F) Oops!
0x89 & 0x7F is 0x09 (TAB) , i.e. space. Now is easy to construct more failures. keyset.pl consider " \t\r\n" as whitespaces, so failure should be for all characters that in UTF-8 encoding ends with:

  • 0xA0 from space (space & 0x80)
  • 0x89 from tab
  • 0x8D from carriage return
  • 0x8A from line feed

Now I understand what was reason of CONF_HIGHBIT - inefficient way to prevent IS_* macros to return non-zero result outside 7-bit ASCII code.

@mspncp
Copy link
Contributor

mspncp commented Apr 4, 2018

Thanks for the analysis, @petrovr. Were you also able to verify that #5844 fixes the e_nss test?

@mspncp
Copy link
Contributor

mspncp commented Apr 4, 2018

Now I finally appreciate how wise it was from the UTF-8 designers to ensure that 7-bit bytes (bytes where the most significant bit is 0) never appear in a multi-byte sequence. Otherwise we would get plagued by false positives of the IS_*() macros all the time.

mspncp added a commit to mspncp/openssl that referenced this issue Apr 7, 2018
Fixes openssl#5778, openssl#5840

The various IS_*() macros did not work correctly for 8-bit ASCII
characters with the high bit set, because the CVT(a) preprocessor
macro and'ed the given ASCII value with 0x7F, effectively folding
the high value range 128-255 over the low value range 0-127.
As a consequence, some of the IS_*() erroneously returned TRUE.

This commit fixes the issue by adding range checks instead of
cutting off high order bits using a mask. In order avoid multiple
evaluation of macro arguments, most of the implementation was moved
from macros into a static function is_keytype().

Thanks to Румен Петров for reporting and analyzing the UTF-8 parsing
issue openssl#5840.
levitte pushed a commit that referenced this issue Apr 8, 2018
Fixes #5778, #5840

The various IS_*() macros did not work correctly for 8-bit ASCII
characters with the high bit set, because the CVT(a) preprocessor
macro and'ed the given ASCII value with 0x7F, effectively folding
the high value range 128-255 over the low value range 0-127.
As a consequence, some of the IS_*() erroneously returned TRUE.

This commit fixes the issue by adding range checks instead of
cutting off high order bits using a mask. In order avoid multiple
evaluation of macro arguments, most of the implementation was moved
from macros into a static function is_keytype().

Thanks to Румен Петров for reporting and analyzing the UTF-8 parsing
issue #5840.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from #5903)
@mspncp
Copy link
Contributor

mspncp commented Apr 8, 2018

Fixed by #5903.

@mspncp mspncp closed this as completed Apr 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants