I'm trying to read a sequential dataset (FB, LRECL(1000)) with this simple code:

# ==================================================
count = 0
thefilepath = "//'sa8sb22.resultfb'"
count = len(open(thefilepath, 'r', encoding="cp500").readlines())
print("There are ", count, " lines in the file")
# ==================================================

I have 133 records in the file, but the above code delivers 1 (one), so I suppose no read is actually executed.
I suspect it is related to the encoding, but I haven't found anything better than cp500.
The file is definitely EBCDIC (x'81' for 'a').

Any hints, please?

------------------------------
Gabor Markon
Mainframe Architect
Self Registered
Budapest HU
------------------------------
Addendum: with Java / jzos I can read the file without any problem.

------------------------------
Gabor Markon
Mainframe Architect
Self Registered
Budapest HU
------------------------------
This is how I read an EBCDIC file. In the read() call you need to provide the length of the record; in this case it was 87. That length might include a CR or LF, since I copied the file to USS before reading it.

import struct
from collections import namedtuple

fieldwidths = (8, 8, 4, -1, 1, -2, 7, 46, 8, 1, -2)   # negative widths represent ignored padding fields
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                     for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)
Record = namedtuple('Record', 'F1 F2 F3 F4 F5 F6 F7 F8')  # one name per kept (positive-width) field
parse = fieldstruct.unpack_from

with open('file.txt', 'r', encoding="cp1140") as f:   # source file encoding
    read_data = bytes(f.read(87), 'utf-8')            # destination encoding
    while read_data:
        fields = parse(read_data)
        f1 = Record._make(fields)
        read_data = bytes(f.read(87), 'utf-8')        # advance to the next record

I also used struct to parse the fields.

This only works for character data; no packed decimal or binary fields.
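The struct/namedtuple technique above can be seen in isolation with a minimal, self-contained sketch; the field layout and data here are made up for illustration:

```python
import struct
from collections import namedtuple

# Hypothetical toy layout: 4-char name, 2 ignored pad bytes, 3-char code.
fmt = struct.Struct('4s 2x 3s')
Rec = namedtuple('Rec', 'name code')

rec = Rec._make(fmt.unpack(b'JOHN  ABC'))
print(rec)        # Rec(name=b'JOHN', code=b'ABC')
print(rec.name)   # b'JOHN'
```

The `x` format code consumes bytes without producing a field, which is why the namedtuple only needs names for the positive-width fields.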

Chris Hanks

------------------------------
Christopher Hanks
Architect
Citigroup Technology Inc
Tampa FL US
------------------------------
Hello Christopher,

thanks for the detailed answer.
cp1140 is an extra package, isn't it? In that case, I'll try to download and install it.
Unfortunately, I do not have access to my system at the moment because of an administration error.
I will try it and give feedback anyway.

------------------------------
Gabor Markon
Mainframe Architect
Self Registered
Budapest HU
------------------------------
Hello Gabor,

What encoding does your data set really use? Are there any characters that look different in the IBM-500 encoding than in IBM-1047? If it's IBM-1047, you can use the solution below.

Internally, Python first converts your data from the encoding you specify, and then splits it into lines on the linefeed characters. Unfortunately, the 'native' character mapping in the community version of Python (and therefore in ours, too) doesn't work well for the EBCDIC newline character (0x15) used on z/OS. To get the proper mapping, you need to specify the encoding as 'cp1047_oe':

count = len(open(thefilepath, 'r', encoding="cp1047_oe").readlines())

This way Python will split the text into lines correctly, and you should get 133 lines in the count. Please note that cp1047_oe is Rocket's extension to the original community version of Python and is not available on other systems (e.g. Windows). You do not need to download any extra packages to use cp1047_oe.

Christopher's solution will work if you have fixed-width records inside your data set; but even in this case I'd highly recommend splitting the data on newlines explicitly (e.g. using .splitlines()) before converting it to structs, just in case you read records of different lengths, which is easily possible when you open files in text mode.
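The 0x15 behavior described above can be reproduced on any stock Python, without z/OS; a small sketch using standard EBCDIC byte values (0x81-0x84 are 'a'-'d'):

```python
# Two EBCDIC records, 'ab' and 'cd', each terminated by the EBCDIC NL byte 0x15.
raw = bytes([0x81, 0x82, 0x15, 0x83, 0x84, 0x15])

text = raw.decode('cp500')     # cp500 maps 0x15 to U+0085 (NEL), not to '\n'
print(repr(text))              # 'ab\x85cd\x85'

# Splitting on '\n' finds no line breaks, so the whole file looks like one line ...
print(len(text.split('\n')))   # 1

# ... whereas str.splitlines() does treat U+0085 as a line boundary.
print(text.splitlines())       # ['ab', 'cd']
```

This is exactly why the original code counted one line: readlines() only recognizes '\n' as a line break, and the stock EBCDIC codecs never produce one for 0x15.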

Regards,
Vladimir

------------------------------
Vladimir Ein
Software Engineer
Rocket Software
------------------------------
I like this cp1047_oe encoding.

Gabor, be aware that readlines() returns a list.

With the stock encoding you get a single item in the list, therefore count=1.
With the nifty Rocket encoding you get a list with n items, n being the number of records.

Compare

lines = open(thefilepath, 'r', encoding="cp500").readlines()
print(lines)

with the result you get with cp1047_oe.

As a recent Python experimenter myself, the object structure and conceptual class stuff takes a while to get used to.
docs.python.org is your friend, as well as Google :-)
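The list behavior can also be seen without a mainframe, using an in-memory stream in place of the data set:

```python
import io

# readlines() returns a list with one entry per detected line.
buf = io.StringIO('rec1\nrec2\nrec3\n')
lines = buf.readlines()
print(lines)                  # ['rec1\n', 'rec2\n', 'rec3\n']
print(len(lines))             # 3

# If the decoded text contains no '\n' at all, everything is one "line".
buf = io.StringIO('rec1rec2rec3')
print(len(buf.readlines()))   # 1
```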

------------------------------
hank oerlemans
Lead Software Engineer
HCL America Inc.
North Sydney NSW AU
------------------------------
Hello Vladimir,

thank you very much; now I understand the whole background.
I will try your solution as soon as I am able to use the affected mainframe again.

Regards
Gábor

------------------------------
Gabor Markon
Mainframe Architect
Self Registered
Budapest HU
------------------------------

Hello Hank,

sorry, I have not read the messages lately. My work notebook was completely destroyed by a system error, and I have by now managed to restore it (say, to 80%).

Thank you for your suggestion; I will try it as soon as possible. First, I must also struggle to get my lost mainframe credentials back.

Best regards

Gabor
------------------------------
Gabor Markon
Mainframe Architect
Self Registered
Budapest HU
------------------------------



Hi Hank,

I've successfully restored my mainframe access today. My first move was to try the cp1047_oe setting, and it works perfectly!
Thank you, and thanks to everyone for the competent information!
Regards
Gabor


------------------------------
Gabor Markon
Mainframe Architect
Self Registered
Budapest HU
------------------------------