I have a challenge to automate the creation of a backup file for an existing data file. The data files being backed up will vary, and they can be hashed or dynamic files.
It has been reasonably straightforward until I came to supporting an existing dynamic file.
All my previous posts about dynamic files have been attempts to better understand how I can assess the existing file so that I can create the backup file with well-optimised specifications, making the item copy process as quick/efficient as possible.
I was first thinking that ANALYZE.FILE might help determine both the minimum modulus and the group size needed to properly cater for the data, but now I'm unsure if that is the best approach.
My alternative method is to simply count the number of items in the existing file and calculate an average item size, then set MINIMUM.MODULUS to the item count, set RECORD.SIZE to the average item size, and let UV's dynamic file magic do its work.
I have come up with this TCL command to count the items and calculate the average record size:
SELECT COUNT(*), AVG(CAST(EVAL "LEN(@RECORD)+LEN(@ID)+1" AS INT)) FROM TEMP_DYN COUNT.SUP COL.SUP MARGIN 0;
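To make that concrete, the sort of pass I'm planning to automate in BASIC is roughly the following (an untested sketch of my own; the variable names are placeholders):
* Untested sketch: count the items and work out an average item size.
OPEN '', 'TEMP_DYN' TO F.SRC ELSE STOP 'Cannot open TEMP_DYN'
SELECT F.SRC
ITEM.COUNT = 0
TOTAL.BYTES = 0
LOOP
   READNEXT ID ELSE EXIT
   READ REC FROM F.SRC, ID THEN
      ITEM.COUNT = ITEM.COUNT + 1
      TOTAL.BYTES = TOTAL.BYTES + LEN(REC) + LEN(ID) + 1
   END
REPEAT
IF ITEM.COUNT THEN
   AVG.SIZE = INT(TOTAL.BYTES / ITEM.COUNT)
   CRT 'Items: ':ITEM.COUNT:', average size: ':AVG.SIZE:' bytes'
END
The idea is then to feed ITEM.COUNT into MINIMUM.MODULUS and AVG.SIZE into RECORD.SIZE.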
Are these approaches valid, or are there other methods/options available to achieve the results I need to deliver?
Many thanks for any guidance on this.
------------------------------
Gregor Scott
Software Architect
Pentana Solutions Pty Ltd
Mount Waverley VIC AU
------------------------------
I tried the SELECT statement on a test file (it is dynamic, and has some VERY large items in it):
>SELECT COUNT(*), AVG(CAST(EVAL "LEN(@RECORD)+LEN(@ID)+1" AS INT)) FROM TEMP_DYN COUNT.SUP COL.SUP MARGIN 0;
LEN ( @RECORD ) + LEN ( @ID ) + 1
2455 34746
>
So I then created a file using the following commands:
ICREATE.FILE DICT TEMP_DYN$_311023 18 53 2
ICREATE.FILE DATA TEMP_DYN$_311023 DYNAMIC GENERAL MINIMUM.MODULUS 2456 RECORD.SIZE 34746
And copied the source data into the new backup file using
COPYI FROM TEMP_DYN TO GDS.TEMP_DYN$_311023 ALL OVERWRITING
This all worked OK:
>ICREATE.FILE DICT TEMP_DYN$_311023 18 53 2
Creating file "D_TEMP_DYN$_311023" as Type 18, Modulo 53, Separation 2.
Added "@ID", the default record for RetrieVe, to "D_TEMP_DYN$_311023".
>ICREATE.FILE DATA TEMP_DYN$_311023 DYNAMIC GENERAL MINIMUM.MODULUS 2456 RECORD.SIZE 34746
Creating file "TEMP_DYN$_311023" as Type 30.
>COPYI FROM TEMP_DYN TO TEMP_DYN$_311023 ALL OVERWRITING
2455 records copied.
>
This is what ANALYZE.FILE thinks:
>ANALYZE.FILE TEMP_DYN$_311023 STATS NO.PAGE
File name = TEMP_DYN$_311023
File has 2456 groups (each * represents 10 groups analyzed).
********************************************************************************
********************************************************************************
********************************************************************************
*****
File name .................. TEMP_DYN$_311023
Pathname ................... TEMP_DYN$_311023
File type .................. DYNAMIC
File style and revision .... 64BIT Revision 12
NLS Character Set Mapping .. NONE
Hashing Algorithm .......... GENERAL
No. of groups (modulus) .... 2456 current ( minimum 2456, 934 empty,
1285 overflowed, 1263 badly )
Number of records .......... 2455
Large record size .......... 3257 bytes
Number of large records .... 1867
Group size ................. 4096 bytes
Load factors ............... 80% (split), 50% (merge) and 6% (actual)
Total size ................. 99160064 bytes
Total size of record data .. 89682841 bytes
Total size of record IDs ... 78591 bytes
Unused space ............... 9390440 bytes
Total space for records .... 99151872 bytes
File name .................. TEMP_DYN$_311023
Number per group ( total of 2456 groups )
Average Minimum Maximum StdDev
Group buffers .............. 9.86 1 148 15.37
Records .................... 1.00 1 5 1.02
Large records .............. 0.76 1 5 0.89
Data bytes ................. 6515.81 27 602257 2967.32
Record ID bytes ............ 32.00 2 175 33.24
Unused bytes ............... 3823.47 88 4096 558.45
Total bytes ................ 10371.28 4096 606208 0.00
Number per record ( total of 2455 records )
Average Minimum Maximum StdDev
Data bytes ................. 6530.69 38 516090 0615.47
Record ID bytes ............ 32.01 2 62 7.40
Total bytes ................ 6562.70 40 516152 0616.95
File name .................. TEMP_DYN$_311023
Histogram of record and ID lengths
63.8%
Bytes ---------------------------------------------------------------------
up to 4|
up to 8|
up to 16|
up to 32|
up to 64|
up to 128|
up to 256|
up to 512| >>>>>>>>>
up to 1K| >>>>>>
up to 2K| >>>>>>>
up to 4K| >
up to 8K| >
up to 16K| >>>>>>>>>>>
More| >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
---------------------------------------------------------------------
>
From reading the manual, the RECORD.SIZE option for ICREATE.FILE should result in the group size and large record values being calculated to suit the value specified.
The resulting group size is 4096 bytes - aka GROUP.SIZE 2 - and the large record value of 3257 is approximately 80% of the calculated group size, but neither seems related to the RECORD.SIZE provided on the command line.
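(As a side note - just my own arithmetic, so it may be nothing - 80% of the full 4096 would be 3276.8, so the 3257 reported presumably has some small per-group overhead netted off before the 80% is taken.)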
Does this indicate that the RECORD.SIZE parameter is ignored entirely, or does it have an upper limit that is not documented?
------------------------------
Gregor Scott
Software Architect
Pentana Solutions Pty Ltd
Mount Waverley VIC AU
------------------------------
Hi Gregor
Dynamic files treat large items effectively the same as static hashed files do - they are handled as out-of-line (oversized) records in the OVER.30 file. That is not necessarily a bad thing: the ID is hashed in primary space and points directly to the first block of the record body, along with a chain count, so it's only one additional fetch. If you only have a relatively small number of those, it isn't worth getting hung up about, so long as the group size accommodates the vast majority well. As to the RECORD.SIZE parameter, it does seem to have some effect on the calculation, but I haven't spent time working out exactly what (!), so I always just go off the group size.
Unless someone else has some insight into RECORD.SIZE ...
BTW, just to be pedantic (never!): when it comes to determining group size, your calculation of the average size doesn't take account of the record header.
------------------------------
Brian Leach
Director
Brian Leach Consulting
Chipping Norton GB
------------------------------
All my previous posts about dynamic files have been attempts to better understand how I can assess the existing file so that I can create the backup file with well-optimised specifications, making the item copy process as quick/efficient as possible.
Wouldn't the quickest way to copy the file be to do an OS copy of the file and then create a new VOC pointer?
------------------------------
Martin Shields
Senior Technical Consultant
Meier Business Systems PTY LTD
Carnegie VIC AU
------------------------------
Hi Martin.
An OS-level copy is one fast method, but it does not cater for a couple of scenarios:
- Sometimes not all the items in the source file are going to be adjusted. There are workloads where only a known subset need changing, and those are the items that need backing up. The backup file specs need to reflect the smaller quantity. In these cases the item IDs are known, so it is possible to determine average record sizes (a rough sizing sketch for this case follows the list).
- Files which are involved in ADE cannot be safely duplicated via OS-level copy commands. For such files, the backup file also needs to have ADE protection applied to ensure the backed-up contents are kept safe.
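For the subset case, the sizing pass I have in mind is roughly this (untested sketch; ID.LIST and SRC.NAME are placeholder names for the known ID list and the source file):
* Untested sketch: size only the known subset of items.
* ID.LIST is a field-mark-delimited list of the item IDs to be backed up.
OPEN '', SRC.NAME TO F.SRC ELSE STOP 'Cannot open ':SRC.NAME
ITEM.COUNT = 0
TOTAL.BYTES = 0
LOOP
   REMOVE ID FROM ID.LIST SETTING MORE
   READ REC FROM F.SRC, ID THEN
      ITEM.COUNT = ITEM.COUNT + 1
      TOTAL.BYTES = TOTAL.BYTES + LEN(REC) + LEN(ID) + 1
   END
WHILE MORE DO REPEAT
IF ITEM.COUNT THEN AVG.SIZE = INT(TOTAL.BYTES / ITEM.COUNT) ELSE AVG.SIZE = 0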
------------------------------
Gregor Scott
Software Architect
Pentana Solutions Pty Ltd
Mount Waverley VIC AU
------------------------------
Thanks Brian.
I have raised a support case regarding the RECORD.SIZE parameter. I suspect that the GROUP.SIZE calculated from the RECORD.SIZE parameter value is being capped at 2 (i.e. 4096 bytes), so while the RECORD.SIZE parameter might be expected to help with larger records, it does not - the GROUP.SIZE approach seems the best for all-round consistent outcomes.
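For completeness, what I plan to automate once the sizing numbers are known is simply building and executing the create command along these lines (rough, untested fragment; the GROUP.SIZE value would ultimately be derived from the measured average):
* Untested fragment: build and execute the create command from the measured numbers.
BAK.NAME = 'TEMP_DYN$_311023'   ;* backup file name (example)
MIN.MOD = ITEM.COUNT + 1        ;* from the sizing pass
CMD = 'ICREATE.FILE DATA ':BAK.NAME:' DYNAMIC GENERAL'
CMD := ' MINIMUM.MODULUS ':MIN.MOD:' GROUP.SIZE 2'
EXECUTE CMD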
Regarding the record header in Type 30 files - how large is it, and does it change with different GROUP.SIZE settings?
------------------------------
Gregor Scott
Software Architect
Pentana Solutions Pty Ltd
Mount Waverley VIC AU
------------------------------
Looking at https://www.slideshare.net/rocketsoftware/universe-files (thanks for posting this, @Neil Morris!), slide 101 seems to indicate there is a 12-byte header per group.
I presume that is a fixed size and does not grow with different GROUP.SIZE settings.
------------------------------
Gregor Scott
Software Architect
Pentana Solutions Pty Ltd
Mount Waverley VIC AU
------------------------------
Hi Gregor
That is true for 32-bit files. For 64-bit files the addresses are double the size (DWORD), so 20 bytes (IIRC the flags are still a WORD?). It is independent of group size.
Brian
Hi Gregor,
The group header size is fixed. The only variation, as Brian states, is that the size is larger for a 64-bit file: it is 12 bytes for a 32-bit file and 24 bytes for a 64-bit file.
Neil
------------------------------
Neil Morris
Universe Advanced Technical Support
Rocket Software
------------------------------
From what I can tell, the RECORD.SIZE parameter will determine whether to set the GROUP.SIZE to 1 (2048) or 2 (4096), and the LARGE.RECORD will be set to 80% of the group size. When originally implemented way back when, the GROUP.SIZE for dynamic files was limited to 1 or 2. In the relatively recent past (11.3.1?), the restriction to group sizes of 2K or 4K was lifted: the GROUP.SIZE parameter now accepts values of 3, 4, 5, etc., where the actual group size is the specified value multiplied by 2K. The RECORD.SIZE parameter, however, still appears to limit the group size to 1 (2K) or 2 (4K).
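For example, specifying GROUP.SIZE 4 would give a 4 x 2K = 8192-byte group, with the large record threshold then set to roughly 80% of that, i.e. around 6553 bytes.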
Neil
------------------------------
Neil Morris
Universe Advanced Technical Support
Rocket Software
------------------------------
Hi Gregor,
This series may be of some interest...
------------------------------
Justin Gledhill
Business Analyst
Ecolab Incorporated
Saint Paul MN US
------------------------------