Gidday all,
If the first and only release of Git for z/OS you installed on your system is 2.26 or higher, you can ignore this post and exit now. But if you installed releases prior to this, you COULD have problems when moving to 2.26.
Prior to 2.26 the z/OS version of Git encoded its database in ISO8859-1 (often loosely called ASCII), but from 2.26 onwards the database is encoded in UTF-8. This means repos created with the older releases have to be re-encoded. Rocket has updated Git to do the re-encoding automatically, but depending on what characters are used in your repo, the automatic re-encoding can produce false results, as seen here.
Below is the output of a "git diff" between the master branch and a selected commit, e45b11. This was using Git for z/OS release 2.14.
Below is the output of the same command run after the master branch has been re-encoded as recommended by Rocket's Git for z/OS 2.26 migration guide. (Note that the output has been truncated; it's actually 805 lines.)
The output should be the same, but because commit e45b11 has not been re-encoded, the "git diff" also shows the differences between the newly encoded master branch and the ISO8859-1 encoded commit e45b11.
I believe the only way to guarantee consistency with your repo is to re-encode every commit! But...
When converting from IBM-1047 (EBCDIC) to ISO8859-1 (ASCII) and from IBM-1047 to UTF-8, there are 128 characters that convert to exactly the same bytes. For example, an EBCDIC "A" (X'C1') converts to ASCII "A" (X'41') and UTF-8 "A" (X'41'). If your repo consists only of these 128 characters then re-encoding will produce the same result. In other words, you don't have to re-encode your repo.
If, however, your repo uses one or more of the other 128 characters, the mapping of IBM-1047 to ISO8859-1 and IBM-1047 to UTF-8 is very different. For example, an EBCDIC X'04' converts to ASCII X'9C', but in UTF-8 it converts to X'C29C' (yes, 2 bytes), and this is why your repo needs to be re-encoded.
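To make the one-byte versus two-byte difference concrete, here is a small Python sketch. It only looks at the ISO8859-1 to UTF-8 side of the story (Python's standard library has no IBM-1047 codec that I'm aware of), and the function names are my own, not anything from Git or Rocket.

```python
def latin1_to_utf8(raw):
    """Re-encode bytes stored as ISO8859-1 into UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def changes_when_reencoded(raw):
    """True if any byte is X'80' or above, i.e. outside the 128 'safe' characters."""
    return any(b >= 0x80 for b in raw)


# "A" is X'41' in both encodings, so nothing changes.
print(latin1_to_utf8(b"A").hex())      # 41
# X'9C' becomes the two-byte sequence X'C29C' in UTF-8.
print(latin1_to_utf8(b"\x9c").hex())   # c29c

print(changes_when_reencoded(b"plain text"))    # False
print(changes_when_reencoded(b"odd \x9c byte"))  # True
```

Any content for which changes_when_reencoded returns True is content whose stored bytes differ between the old and new database encodings.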
The best way to see if your repo is affected is to search for the characters that map differently in UTF-8. I have attached a file consisting of 128 "srchfor" statements that can be used in the "Statements Dsn" parameter in ISPF 3.15 to search your repo for these characters. If any of these characters are found then I recommend a total re-encoding of your repository.
So how do you re-encode the entire repo? Based on my research and experimentation, I'm recommending the use of the "git fast-export"/"git fast-import" commands.
The fast-export unloads the entire repo into a single file.
The fast-import reads the file recreating the entire repo encoded in UTF-8.
As long as the single file doesn't change, the commit IDs will be the same as the IDs in the original repo.
The steps required are:
1) CD to the repo that needs re-encoding ("cd /u/user/currdir")
2) Do the git fast-export ("git fast-export --all >/tmp/fastexport-output")
3) Create a new directory ("mkdir newdir")
4) CD to the new directory ("cd newdir")
5) Initialize a git repo ("git init")
6) Perform the fast-import ("git fast-import </tmp/fastexport-output")
7) Create the working directory ("git checkout master")
8) Reconnect any remote connections ("git remote add origin <REMOTE_URL>")
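If you have several repos to convert, the steps above could be scripted. The following is only a rough Python sketch of the same sequence, assuming Python and git are both available on your path; the function name, its parameters and the "run" hook are my own inventions for illustration. It does not add the remote, since that URL is specific to each repo.

```python
import subprocess
from pathlib import Path


def reencode_repo(old_repo, new_repo, dump_file, branch="master", run=subprocess.run):
    """Export old_repo with fast-export and rebuild it in new_repo with fast-import."""
    old_repo, new_repo, dump_file = Path(old_repo), Path(new_repo), Path(dump_file)

    # Steps 1-2: unload the whole repo into a single file.
    with open(dump_file, "wb") as out:
        run(["git", "fast-export", "--all"], cwd=old_repo, stdout=out, check=True)

    # Steps 3-5: create the new directory and initialise an empty repo in it.
    new_repo.mkdir(parents=True, exist_ok=False)
    run(["git", "init"], cwd=new_repo, check=True)

    # Step 6: reload the file into the new repo.
    with open(dump_file, "rb") as inp:
        run(["git", "fast-import"], cwd=new_repo, stdin=inp, check=True)

    # Step 7: populate the working directory.
    run(["git", "checkout", branch], cwd=new_repo, check=True)
```

A call would look something like reencode_repo("/u/user/currdir", "/u/user/newdir", "/tmp/fastexport-output"), after which you reconnect the remote by hand as in the last step. The paths here are just examples; point the new directory somewhere outside the old repo.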
I did this on my largest repo processed by Git for z/OS, which has been going since May 2017, and the unload file was 750MB. So make sure you have plenty of room available for the temporary file that's created by fast-export.
I performed a "git log" on both repos after the import and compared the result. Got a 100% match.
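For anyone wanting to repeat that check without eyeballing two long listings, here is a minimal Python sketch of one way to do it. It assumes git is on your path, and the helper names are hypothetical; it simply reports the first line where the two "git log" outputs disagree.

```python
import subprocess


def git_log_lines(repo):
    """Return the output of 'git log' for the given repo as a list of lines."""
    result = subprocess.run(["git", "log"], cwd=repo, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace").splitlines()


def first_difference(old_lines, new_lines):
    """Return the 1-based number of the first differing line, or None if identical."""
    for number, (old, new) in enumerate(zip(old_lines, new_lines), start=1):
        if old != new:
            return number
    if len(old_lines) != len(new_lines):
        # One log is a strict prefix of the other.
        return min(len(old_lines), len(new_lines)) + 1
    return None
```

Calling first_difference(git_log_lines(old_dir), git_log_lines(new_dir)) and getting None back corresponds to the 100% match described above.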
I hope this is a help to anyone who has yet to do the conversion.
------------------------------
Gary Freestone
Systems Programmer
Kyndryl Inc
Mt Helen Australia
------------------------------