Something changed in git in terms of file encoding.

These files were created using this code:

cmd = 'cd' localrep'/'zigirep ,
   '&& git show' hcommit':'element ,
   '>' file

where localrep/zigirep, hcommit, element and file are all variables.

This is what I have on my work system for this use case using git version 2.26.2-78

t IBM-1047 T=on -rw-rw-rw- 1 SPLBD SYS1 1875 Feb 1 05:03 CHGSTR

But with git version 2.26.2-84 I get this:

t ISO8859-1 T=on -rw-rw-rw- 1 LBDYCK ZOWEDEV 1875 Feb 1 08:06 CHGSTR

To summarize this - with git 2.26.2-27 the git show generated file is tagged as IBM-1047 and then with git 2.26.2-84 it is now tagged as ISO8859-1.

Why the change and how can I control it?

Note that issuing 'unset _TAG_REDIR_OUT' resolves this, but why?

Lionel,

This may have something to do with an issue I raised back in June last year. See https://community.rocketsoftware.com/forums/forum-home/digestviewer/viewthread?GroupId=79&MessageKey=856a6223-d5ee-4c19-9d2b-5c95ba5f4c07&CommunityKey=1e694975-142d-4f2d-9b52-0e37e225db41

At the time Rocket admitted it was a problem but I haven't been able to get an answer out of them about what the fix was going to be. 

Maybe we now know.

Gary - I ran into this with ZIGI and needed to find a solution for all ZIGI users. Fortunately I was talking with the IBM z/OS Open Tools team and they (Igor) suggested this as a fix.

Ok.  Now it's getting weird.   

If I issue "git show xxxxx:file", use REXX's OUTTRAP to capture the output, and display it with a series of SAYs, then it arrives back in EBCDIC. I know because I can read it.

If I redirect the output to a file, thus "git show xxxxx:file >/tmp/output", then the output file is ASCII encoded, as Lionel has already indicated.

I don't understand.

Gary - you nailed it.  So why the change in git show processing? For the non-Z users it probably isn't an issue but we expect Rocket to look after the Z users here.

Hey Lionel, sorry you are having problems with the updated git. I just tried to recreate the behavior you described, and indeed if you write the output of 'git show' to a new file, it will automatically be tagged as ISO8859-1. However, if you direct the output to a file that already has a tag, then that tag will be used. Perhaps you could prepend 'touch file && chtag -tc 1047 file' to your command?

Also, is it possible that the CHGSTR file was tagged beforehand when you used the old version of git? This may have changed if you updated from git 2.26.2-27 to git 2.26.2-84 and then cloned the repo you are working on anew.

I've got things working. I recreated the problem on a new repo, so it wasn't a historical artifact. After the 'git show' I'm now adding a chtag.

Hey @Benjamin Maloy I've tried what you suggested as can be seen in this code:

tmpfile='/tmp/U700215.TEMPFILE.8499'
sha='627afb4'
file='PANELS/WGTIPASM'

call Issue_USS 'touch 'tmpfile
call Issue_USS 'chtag -tc IBM-1047 'tmpfile
call Issue_USS 'git show 'sha':'file '>'tmpfile

You can see below the temporary file did get tagged as IBM-1047 and the show did put data into the file.


But the data did not come back as expected.

This first image is the file as it stands in USS after the clone from GitHub 


and this second image is the file after it was brought using the Git Show


You can see that some of the EBCDIC characters have come back as 2 bytes instead of one. I'm guessing the EBCDIC characters were translated to 2-byte UTF-8 characters when saved in Git, but when brought back by the Git Show command the 2-byte characters were not converted back to single EBCDIC characters. Note that a Git Checkout does not have this problem: the 2-byte UTF-8 character is brought back as a single EBCDIC character.

I tried @Lionel Dyck 's suggestion to clone the repository fresh from GitHub, but I get exactly the same result.

@Benjamin Meier can you try it with my "crazy" EBCDIC characters and my REXX code?

@Lionel Dyck you seem to imply you did the CHTAG after the show. Is that true? It makes more sense to tag the file before you do the git show.


Based on @Benjamin Maloy 's comment that the file gets tagged as ASCII (ISO8859-1), I have experimented further, and I finally know what the problem is with Git.

During normal Git processing, when a file is added to the Git repo, the equivalent of "iconv  -f IBM-1047   -t UTF-8   ebcdic.filename  >  file.for.git.to.store" is performed, taking my EBCDIC encoded file, converting it to UTF-8, and adding it to the Git database.

When a Checkout or Clone is performed, the opposite happens: "iconv   -f UTF-8   -t IBM-1047   file.for.git.to.store  >  ebcdic.filename", taking the UTF-8 version of my file and converting it back to EBCDIC.

But this is not what "git show" is doing.

For some reason a "git show" does the equivalent of "iconv   -f ISO8859-1    -t IBM-1047   file.for.git.to.store  >  show.ebcdic.filename", taking the UTF-8 encoded file, assuming it's an ASCII file, and converting it to EBCDIC.

You would never see this issue if your EBCDIC files only contained characters that translate to single-byte UTF-8 characters. But as soon as you have EBCDIC characters that translate to multi-byte UTF-8 characters, you get the problem, and my file has these EBCDIC characters.

This can be reproduced as follows:

Create an EBCDIC tagged file that contains EBCDIC characters that translate to 2-byte UTF-8 characters, e.g. in my example X'4A', X'B3', X'AC' and X'BA'.

iconv this file using  -f IBM-1047   -t UTF-8

iconv the new UTF-8 file using  -f ISO8859-1   -t IBM-1047

The resulting file exactly matches the file created if the original EBCDIC file is added to a git repo and extracted to an EBCDIC tagged file using "git show " > EBCDIC.tagged.file
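For anyone who wants to see the effect away from z/OS, the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's standard library has no IBM-1047 codec as far as I know, so cp037 (another EBCDIC code page, which also places the cent sign at X'4A') stands in for it, and the two function names are made up for this example. It models the conversions described in this thread, not Rocket's actual implementation.

```python
# Stand-in codec (assumption): cp037 is used in place of IBM-1047.
EBCDIC = "cp037"


def add_conversion(ebcdic_bytes: bytes) -> bytes:
    """Model of 'iconv -f IBM-1047 -t UTF-8', as described for git add."""
    return ebcdic_bytes.decode(EBCDIC).encode("utf-8")


def suspected_show_conversion(stored: bytes) -> bytes:
    """Model of 'iconv -f ISO8859-1 -t IBM-1047', the suspected git show behaviour."""
    return stored.decode("iso8859-1").encode(EBCDIC)


# 'A', cent sign (X'4A'), 'B' in EBCDIC
original = bytes([0xC1, 0x4A, 0xC2])

stored = add_conversion(original)          # cent sign becomes two UTF-8 bytes
shown = suspected_show_conversion(stored)  # each of those bytes becomes its own EBCDIC byte

print("original:", original.hex())
print("stored:  ", stored.hex())
print("shown:   ", shown.hex())
```

The "shown" value is one byte longer than the original, which matches the "2 bytes instead of one" symptom.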

Could Rocket please fix this problem? The git show command hasn't been working correctly since the upgrade to 2.26.


Just in case other people are reading this post and are having the same issue, there is a workaround that I have tested.

Using my original code example 

tmpfile='/tmp/U700215.TEMPFILE.8499'
sha='627afb4'
file='PANELS/WGTIPASM'

call Issue_USS 'touch 'tmpfile
call Issue_USS 'chtag -tc IBM-1047 'tmpfile
call Issue_USS 'git show 'sha':'file '>'tmpfile

Add the following two lines:

call Issue_USS 'iconv -f IBM1047 -t ISO8859-1 'tmpfile' >'tmpfile'.ascii'
call Issue_USS 'iconv -f UTF-8 -t IBM1047 'tmpfile'.ascii >'tmpfile'.ebcdic'

The first line undoes the ASCII to EBCDIC that git show performed incorrectly, and the second line performs the correct UTF-8 to EBCDIC translation.
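The same two-step reversal can be checked off-platform with a small Python sketch. The assumptions are the same as for any off-z/OS illustration: cp037 stands in for IBM-1047 (Python has no IBM-1047 codec in its standard library, as far as I know), and the function name undo_wrong_show is hypothetical.

```python
# Stand-in codec (assumption): cp037 is used in place of IBM-1047.
EBCDIC = "cp037"


def undo_wrong_show(shown: bytes) -> bytes:
    """Reverse an ISO8859-1 -> EBCDIC conversion that was applied to UTF-8 data."""
    # Step 1: model of 'iconv -f IBM1047 -t ISO8859-1', which recovers the stored bytes.
    stored = shown.decode(EBCDIC).encode("iso8859-1")
    # Step 2: model of 'iconv -f UTF-8 -t IBM1047', the conversion that should have happened.
    return stored.decode("utf-8").encode(EBCDIC)


# Build a wrongly converted sample: 'A', cent sign, 'B'.
original = "A\u00a2B".encode(EBCDIC)
stored_utf8 = original.decode(EBCDIC).encode("utf-8")
shown = stored_utf8.decode("iso8859-1").encode(EBCDIC)

repaired = undo_wrong_show(shown)
print(repaired == original)
```

Step 1 only works because every byte survives the wrong conversion unchanged in meaning; if the stored data was not valid UTF-8 to begin with, step 2 will fail rather than silently produce bad text.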


I thought I finally understood this issue, but alas it keeps on keeping on.

Everything I said in my previous post is true, BUT...

Only for files committed using Git 2.26 onward. Files committed before 2.26 were committed as ASCII, meaning that the current "iconv -f ISO8859-1 -t IBM1047" works fine for them because these files are not in UTF-8 format. It's only files committed to Git in UTF-8 format that have the issue.

Question for Rocket: is there any way of telling whether the files committed in the Git repository are encoded in ASCII or UTF-8?


Hey @Gary Freestone

Thanks for bringing this back up. Unfortunately, there is no metadata or anything like that to indicate the encoding of the files in the index. You can check out the repo with no .gitattributes and you will get all of the content of the index 'as is' upon cloning.

To that end, what are the contents of your .gitattributes file?

Some background info that may be useful: If you have a zos-working-tree-encoding specified for a file, then it will be converted to UTF-8 when it is uploaded to the index. If you do not have a zos-working-tree-encoding specified, the file will be uploaded to the index without any conversion.
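With no metadata available, the only option left is a guess based on content. The Python sketch below is a heuristic of that kind, not something git provides: the function name and the returned labels are made up for this example, and it assumes you already have the raw stored bytes of the file in hand. Pure 7-bit content is identical in ASCII, ISO8859-1 and UTF-8, so it is only the bytes at X'80' and above that carry any information.

```python
def guess_stored_encoding(blob: bytes) -> str:
    """Heuristic guess at how stored bytes are encoded (hypothetical helper)."""
    if all(b < 0x80 for b in blob):
        # 7-bit data reads the same as ASCII, ISO8859-1 or UTF-8.
        return "ascii (no difference)"
    try:
        blob.decode("utf-8")
    except UnicodeDecodeError:
        # High bytes that do not form valid UTF-8 sequences.
        return "probably iso8859-1"
    # High bytes that all form valid UTF-8 sequences.
    return "probably utf-8"


print(guess_stored_encoding(b"plain text"))
print(guess_stored_encoding(b"A\xc2\xa2B"))  # cent sign stored as UTF-8
print(guess_stored_encoding(b"A\xa2B"))      # cent sign stored as ISO8859-1
```

It can be fooled: an ISO8859-1 file whose high bytes happen to form valid UTF-8 sequences will be reported as UTF-8, so treat the answer as a hint only.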


@Benjamin Maloy thanks for replying. 

I'm not talking about "git checkout" or "git clone". Both of these commands bring the source code from Git to the working directory correctly. It doesn't matter to these commands whether the code is encoded in Git as UTF-8 (Git 2.26 and above) or as ASCII (releases of Git prior to 2.26); the working directory files always end up with the correct characters.

The issue is with "Git Show" and the fact that, when bringing back source code, it always assumes the code was encoded in Git as ASCII. This was true prior to Rocket's Git 2.26 release, but since 2.26 it's been encoded in UTF-8.

So why do Checkout and Clone work correctly and Show does not?

I don't think my .gitattributes has anything to do with it. But you asked so here it is:

#                                                                         
# Default zGit .gitattributes file specifying the attributes of some      
# known file names                                                        
#                                                                         
#                                                                         
# This file is copied to .git/info/attributes when a zGit repo is created 
#                                                                         
#                                                                         
                                                                          
*              zos-working-tree-encoding=ibm-1047                         
.gitattributes zos-working-tree-encoding=iso8859-1                        
.gitignore     zos-working-tree-encoding=iso8859-1                        
                                                                          
*.mpg          binary                                                     
*.mov          binary                                                     
*.mp3          binary                                                     
*.mp4          binary                                                     
*.wav          binary                                                     
*.png1         binary                                                     
*.whl          binary                                                     
                                                                          
# Archives                                                                
                                                                          
*.7z           binary                                                     
*.br           binary                                                     
*.gz           binary                                                     
*.tar          binary                                                     
*.zip          binary                                                     
                                                                          
# Documents                                                               
                                                                          
*.pdf          binary                                                     
                                                                          
# Images                                                                  
                                                                          
*.gif          binary                                                     
*.ico          binary                                                     
*.jpeg         binary                                                     
*.jpg          binary                                                     
*.pdf          binary                                                     
*.png          binary                                                     
*.psd          binary                                                     
*.webp         binary                                                     
                                                                          
# Fonts                                                                   
                                                                          
*.woff2        binary                                                     
                                                                          
# Other                                                                   
                                                                          
*.exe          binary                                                     
*.md           zos-working-tree-encoding=UTF-8                            


Hey @Gary Freestone

In response to "The issue is with "Git Show" and the fact that when bringing back source code it always assumes it was encoded in Git as ASCII.  This was true prior to Rocket's Git 2.26 release. But since 2.26 its been encoded in UTF-8.":

The .gitattributes file will dictate the encoding of a file in the index, which is what the 'git show' command will display. I see from your .gitattributes file that all files have a zos-working-tree-encoding specified. This means that in the index, the files are stored as UTF-8. Hence, when you use the 'git show' command to check the contents of the index, the files will be encoded as UTF-8. 

You mentioned that you are comparing this behavior to git before 2.26. As you mentioned, before git for z/OS version 2.26, you could upload files to the index as ISO8859-1, in which case the 'git show' command would display an ISO8859-1 file.

Does this explain the difference in the behavior of the 'git show' command?

Indeed, if the files are UTF-8 in the index, you may have trouble converting to EBCDIC if you have multibyte characters in the file.