Hello Andrew,
It's difficult to say it better than Gary did, but anyway I'll try to add my two cents.
Two files with the same contents should undoubtedly have an identical hash (BTW, the reverse is not always true, even on x86). Now let's see what the file contents really are.
In a certain sense, most files are supposed to be 'consumed' by application programs, so the contents are what an application would read from the file. In USS with automatic conversion enabled, this is not necessarily identical to the bytes stored in the file system blocks. If a piece of (text) data is encoded as IBM-1047 and stored in a file tagged as IBM-1047, and the same piece of data is stored in another file in ISO8859-1 (and tagged as ISO8859-1), an application reading these two files with auto-conversion won't be able to distinguish one from the other. Think of auto-conversion as an additional layer in the already many-layered storage system. Who in the world would want to compute SHA256 for physical sectors on the hard disk, or for TCP packets with all the headers and control data they include? Most users are interested in the 'user-level' data, or, more appropriately, application-level data - which on USS sits on top of auto-conversion.
One might argue that computing a SHA256 digest should read the 'real' data from the file, without any fancy 'post-processing' like character conversion. There's some data in the file system, give me the SHA sum. I don't quite agree. A file is not just a piece of data; it is data plus metadata, and the latter includes, among other things, the tag information - on USS, that is. The tag tells the system how to interpret the data coming from the filesystem blocks. To me, an attempt to ignore this transformational layer is like ignoring the decompression layer on a compressed filesystem. Or like computing SHA256 for clusters occupied by a sparse file on NTFS, without 'inflating' it first.
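The compression analogy can be sketched in a few lines of Python. This is only an illustration and not anything from the z/OS tooling: the names `stored_a`, `stored_b` and `sha256_hex` are made up, and zlib and bz2 merely stand in for "a storage layer that changes the bytes on disk". The same content is stored in two ways; digests of the stored bytes differ, while digests of what an application would read agree.

```python
import bz2
import hashlib
import zlib

# Application-level data: what a program reading the file would see.
content = b"hello\n" * 100

# Two different storage layers holding the same content.
stored_a = zlib.compress(content)
stored_b = bz2.compress(content)


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hashing the stored bytes compares the storage layers, not the content.
raw_digests_match = sha256_hex(stored_a) == sha256_hex(stored_b)

# Hashing after undoing the storage layer compares the content itself.
content_digests_match = (
    sha256_hex(zlib.decompress(stored_a)) == sha256_hex(bz2.decompress(stored_b))
)

print("stored bytes match:", raw_digests_match)
print("content matches:", content_digests_match)
```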
Please note that all this is my personal opinion on how things work or should work. Sorry if some parts sounded a bit harsh, this was not intentional.
Regards,
Vladimir
------------------------------
Vladimir Ein
Software Engineer
Rocket Software
------------------------------
Original Message:
Sent: 01-12-2022 07:33
From: Gary Freestone
Subject: File tagging and openssl dgst
The whole ASCII/EBCDIC conversion is a pain, but something we all have to live with for the foreseeable future.
I agree with you, Andrew, that data is just a sequence of bits and the SHA of two different files should only match if the sequence of bits matches exactly.
However, in the world of ASCII/EBCDIC conversions I can understand the requirement of having the SHA of an ASCII text file match the SHA of an EBCDIC text file if the sequence of characters matches exactly.
Where things go really crazy is knowing what file is binary and what file is text, because there are two places where a USS file can be marked as binary. A CCSID of X'FFFF' means it's a binary file, but there is also the file format field in the USS directory called FILEFMT, where X'01' also means binary file. The fields can be modified independently, so it is possible to have a file with a CCSID of ISO8859-1 (ASCII) and a FILEFMT of X'01', making it binary.
I don't know which field takes precedence. Is it ASCII or binary?
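To make the ambiguity concrete, here is a small Python sketch of the two markers. It is a toy model and not a z/OS API: `FileMarkers` and `describe` are hypothetical names, and the CCSID numbers 819 (ISO8859-1) and 1047 (IBM-1047) are added for the example. The sketch deliberately does not pick a winner when the markers disagree, since the precedence is exactly the open question.

```python
from dataclasses import dataclass

CCSID_BINARY = 0xFFFF    # from the message above: CCSID X'FFFF' marks binary
FILEFMT_BINARY = 0x01    # from the message above: FILEFMT X'01' marks binary
CCSID_ISO8859_1 = 819    # assumption added for the example
CCSID_IBM_1047 = 1047    # assumption added for the example


@dataclass(frozen=True)
class FileMarkers:
    """Hypothetical container for the two independent markers."""
    ccsid: int
    filefmt: int


def describe(markers: FileMarkers) -> str:
    """Report whether the two markers agree, without guessing precedence."""
    tag_binary = markers.ccsid == CCSID_BINARY
    fmt_binary = markers.filefmt == FILEFMT_BINARY
    if tag_binary and fmt_binary:
        return "binary (both markers agree)"
    if not tag_binary and not fmt_binary:
        return "not binary (both markers agree)"
    return "conflict: markers disagree, precedence unknown"


# The case described above: tagged ISO8859-1 but FILEFMT says binary.
print(describe(FileMarkers(ccsid=CCSID_ISO8859_1, filefmt=FILEFMT_BINARY)))
```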
------------------------------
Gary Freestone
Systems Programmer
Kyndryl Inc
Mt Helen AU
Original Message:
Sent: 01-11-2022 16:40
From: Andrew Rowley
Subject: File tagging and openssl dgst
"We also assume that 2 tagged text files with different binary content that seems identical to user should have the same hash"
Surely this is wrong? 2 files should have the same hash only if they have exactly the same content - not just if they seem to be identical to the user.
------------------------------
Andrew Rowley
Self Registered
Ballarat AU
Original Message:
Sent: 01-11-2022 11:04
From: Tatiana Balaburkina
Subject: File tagging and openssl dgst
Hi Jeff,
There is no good way to determine whether the content of a file is binary or text. On USS we rely on file tags and the default encoding for files. Most of our tools work on the assumption that an untagged file is a text file with EBCDIC (IBM-1047 encoded) content.
We also assume that 2 tagged text files with different binary content that seems identical to user should have the same hash:
> ls -T *
t ISO8859-1 T=on -rw-r--r-- 1 TS5505 PDUSER 6 Jan 11 10:32 ascii
t IBM-1047 T=on -rw-r--r-- 1 TS5505 PDUSER 6 Jan 11 10:33 ebcdic
> cat ascii
hello
> od -h ascii
0000000000 68 65 6C 6C 6F 0A
0000000006
> cat ebcdic
hello
> od -h ebcdic
0000000000 88 85 93 93 96 15
0000000006
> openssl dgst -sha256 ascii
SHA256(ascii)= 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
> openssl dgst -sha256 ebcdic
SHA256(ebcdic)= 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
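The effect shown in this session can be approximated off-platform with a short Python sketch. It is not how the z/OS port is implemented. The byte values are copied from the od output, and `EBCDIC_TO_ASCII` is a hand-written partial table covering only those five byte values, because Python's standard library has no IBM-1047 codec. Hashing the EBCDIC bytes after conversion gives the same digest as hashing the ASCII bytes, while hashing the raw EBCDIC bytes does not.

```python
import hashlib

# Byte values copied from the od -h output in the session.
ascii_bytes = bytes.fromhex("68 65 6C 6C 6F 0A")
ebcdic_bytes = bytes.fromhex("88 85 93 93 96 15")

# Partial IBM-1047 -> ISO8859-1 table, limited to the bytes used here.
EBCDIC_TO_ASCII = {0x88: 0x68, 0x85: 0x65, 0x93: 0x6C, 0x96: 0x6F, 0x15: 0x0A}


def to_ascii(data: bytes) -> bytes:
    """Convert data byte by byte; raises KeyError for unmapped bytes."""
    return bytes(EBCDIC_TO_ASCII[b] for b in data)


digest_ascii = hashlib.sha256(ascii_bytes).hexdigest()
digest_converted = hashlib.sha256(to_ascii(ebcdic_bytes)).hexdigest()
digest_raw_ebcdic = hashlib.sha256(ebcdic_bytes).hexdigest()

print("ascii          :", digest_ascii)
print("ebcdic (conv.) :", digest_converted)
print("ebcdic (raw)   :", digest_raw_ebcdic)
```

The first two digests should agree with each other and with the value openssl printed; the third is what a tool would report if it ignored the tag and hashed the stored bytes.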
In order to align with these 2 assumptions openssl will internally convert untagged files from IBM-1047 to ISO8859-1 to calculate the hash.
If you work with binary files, you should avoid auto conversion, which means that binary files should be tagged as binary.
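The rules described in this message can be summarised as a small decision function. This is a Python model of the description only, not code from the openssl port; the function name `needs_conversion` and the way a tag is represented (None for untagged, "binary", or an encoding name) are assumptions made for the sketch.

```python
from typing import Optional


def needs_conversion(tag: Optional[str]) -> bool:
    """Model of the behaviour described above.

    tag is None for an untagged file, "binary" for a file tagged as
    binary, or the name of a text encoding such as "IBM-1047".
    """
    if tag == "binary":
        return False          # binary files are hashed as stored
    if tag is None:
        return True           # untagged: assumed to be IBM-1047 text
    return tag != "ISO8859-1"  # text in another encoding is converted


for tag in (None, "binary", "IBM-1047", "ISO8859-1"):
    print(tag, "->", "convert" if needs_conversion(tag) else "hash as stored")
```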
Unfortunately, there was a bug in openssl 1.1.1h that caused conversion of binary files as well. The problem was solved in OpenSSL 1.1.1k, which was released in September 2021. Due to the 6-month gap between private and public channels, this version will be available on anaconda.org at the end of March 2022.
------------------------------
Tatiana Balaburkina
Rocket Internal - All Brands
Original Message:
Sent: 01-10-2022 09:32
From: Jeff Mierzejewski
Subject: File tagging and openssl dgst
"OpenSSL 1.1.1h 22 Sep 2020 " installed with Miniconda from the Rocket open source (nonsecured) channel this past Thursday, January 6.
------------------------------
Jeff Mierzejewski
Advisory Software Engineer
IBM Global Services
Austin TX US
Original Message:
Sent: 01-10-2022 09:25
From: Tatiana Balaburkina
Subject: File tagging and openssl dgst
Hi Jeff,
Could you clarify what the build number is and how you got this build? I am assuming it came from Izoda.
------------------------------
Tatiana Balaburkina
Rocket Internal - All Brands
Original Message:
Sent: 01-10-2022 09:02
From: Jeff Mierzejewski
Subject: File tagging and openssl dgst
Tagging the file as binary (chtag -b) still yields the wrong hash (or, if I am including -verify, verification fails).
Tagging the file as ASCII text (chtag -c ISO8859-1) results in the correct hash (and if including -verify, verification succeeds).
------------------------------
Jeff Mierzejewski
Advisory Software Engineer
IBM Global Services
Austin TX US
Original Message:
Sent: 01-10-2022 02:43
From: Vladimir Ein
Subject: File tagging and openssl dgst
Hello Jeff,
Does it work correctly for you if you tag the original file as binary?
chtag -b <filename>
Regards,
Vladimir
------------------------------
Vladimir Ein
Software Engineer
Rocket Software
Original Message:
Sent: 01-07-2022 12:58
From: Jeff Mierzejewski
Subject: File tagging and openssl dgst
As part of a code-signing initiative, I am working with SHA256 digests on z/OS. In the past, I would download files to a Linux or Windows machine and verify that the hashes matched expected values. But for some reason, Rocket openssl (OpenSSL 1.1.1h 22 Sep 2020) yields different digest values (openssl dgst -sha256) than Linux or Windows.
The only way to get the same hash is to tag the file on z/OS before creating the hash:
chtag -tc ISO8859-1 <filename>
This feels really odd, since the original file is binary. Is this working as designed?
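For reference, this is roughly how the expected value can be computed on the Linux or Windows side in Python, reading the file strictly as bytes. It is a minimal sketch; `sha256_of_file` is a name invented here. Whether opening a file in binary mode bypasses auto-conversion on z/OS is not something I have verified, so treat it as a cross-check for the non-z/OS side only.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of the file's raw bytes."""
    digest = hashlib.sha256()
    # "rb" avoids any newline or text decoding by Python itself.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("<filename>")` returns the same kind of hex string that `openssl dgst -sha256` prints after the `=` sign.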
------------------------------
Jeff Mierzejewski
Advisory Software Engineer
IBM Global Services
Austin TX US
------------------------------