Is ANALYZE.FILE meant to give different averages for the same items in different spec'd Dynamic Files?

I expected the record size averages to stay the same regardless of how the dynamic file was set up. Apparently I was wrong.

This is my initial test file:

>ANALYZE.FILE TEST_DYN STATS NO.PAGE
File name                               = TEST_DYN
File has 301 groups (each * represents 10 groups analyzed).
******************************
File name ..................   TEST_DYN
Pathname ...................   TEST_DYN
File type ..................   DYNAMIC
File style and revision ....   64BIT Revision 12
NLS Character Set Mapping ..   NONE
Hashing Algorithm ..........   GENERAL
No. of groups (modulus) ....   301 current ( minimum 1, 0 empty,
                                            300 overflowed, 300 badly )
Number of records ..........   2455
Large record size ..........   1619 bytes
Number of large records ....   1951
Group size .................   2048 bytes
Load factors ...............   80% (split), 50% (merge) and 80% (actual)
Total size .................   88696832 bytes
Total size of record data ..   88260487 bytes
Total size of record IDs ...   79401 bytes
Unused space ...............   352848 bytes
Total space for records ....   88692736 bytes

File name ..................   TEST_DYN
                               Number per group ( total of 301 groups )
                               Average    Minimum    Maximum     StdDev
Group buffers ..............    143.88          1        482      93.56
Records ....................      8.16          1         18       3.46
Large records ..............      6.48          1         15       3.01
Data bytes .................   3224.21       1325     985231    1552.70
Record ID bytes ............    263.79         36        589     112.10
Unused bytes ...............   1172.25         56       3028     606.56
Total bytes ................   4660.25       2048     987136       0.00

                               Number per record ( total of 2455 records )
                               Average    Minimum    Maximum     StdDev
Data bytes .................   5951.32         38     518138    0649.61
Record ID bytes ............     32.34          2         62       7.32
Total bytes ................   5983.66         40     518200    0650.80

File name ..................   TEST_DYN
                         Histogram of record and ID lengths

                                                                          60.7%
    Bytes ---------------------------------------------------------------------

  up to 4|
  up to 8|
 up to 16|
 up to 32|
 up to 64|
up to 128|
up to 256|
up to 512| >>>>>>>>>
 up to 1K| >>>>>>>
 up to 2K| >>>>>
 up to 4K| >>
 up to 8K| >>>>>
up to 16K| >>>>>>>>>>>>
     More| >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
          ---------------------------------------------------------------------
>

I wanted to test the impact of a different GROUP.SIZE setting, thinking it would/should influence the oversize group results.

It did, but it made them larger, not smaller.

The other strange result was a change in the average record size - what's with that?

>ICREATE.FILE DATA TEST_DYN$_301023 DYNAMIC GENERAL GROUP.SIZE 2 MINIMUM.MODULUS 2456
Creating file "TEST_DYN$_301023" as Type 30.
>

>COPYI FROM TEST_DYN TO TEST_DYN$_301023 ALL OVERWRITING

2455 records copied.
>ANALYZE.FILE TEST_DYN$_301023 STATS NO.PAGE
File name                               = TEST_DYN$_301023
File has 2456 groups (each * represents 10 groups analyzed).
********************************************************************************
********************************************************************************
********************************************************************************
*****
File name ..................   TEST_DYN$_301023
Pathname ...................   TEST_DYN$_301023
File type ..................   DYNAMIC
File style and revision ....   64BIT Revision 12
NLS Character Set Mapping ..   NONE
Hashing Algorithm ..........   GENERAL
No. of groups (modulus) ....   2456 current ( minimum 2456, 934 empty,
                                            1285 overflowed, 1263 badly )
Number of records ..........   2455
Large record size ..........   3257 bytes
Number of large records ....   1867
Group size .................   4096 bytes
Load factors ...............   80% (split), 50% (merge) and 6% (actual)
Total size .................   99160064 bytes
Total size of record data ..   89682841 bytes
Total size of record IDs ...   78591 bytes
Unused space ...............   9390440 bytes
Total space for records ....   99151872 bytes

File name ..................   TEST_DYN$_301023
                               Number per group ( total of 2456 groups )
                               Average    Minimum    Maximum     StdDev
Group buffers ..............      9.86          1        148      15.37
Records ....................      1.00          1          5       1.02
Large records ..............      0.76          1          5       0.89
Data bytes .................   6515.81         27     602257    2967.32
Record ID bytes ............     32.00          2        175      33.24
Unused bytes ...............   3823.47         88       4096     558.45
Total bytes ................   0371.28       4096     606208       0.00

                               Number per record ( total of 2455 records )
                               Average    Minimum    Maximum     StdDev
Data bytes .................   6530.69         38     516090    0615.47
Record ID bytes ............     32.01          2         62       7.40
Total bytes ................   6562.70         40     516152    0616.95

File name ..................   TEST_DYN$_301023
                         Histogram of record and ID lengths

                                                                          63.8%
    Bytes ---------------------------------------------------------------------

  up to 4|
  up to 8|
 up to 16|
 up to 32|
 up to 64|
up to 128|
up to 256|
up to 512| >>>>>>>>>
 up to 1K| >>>>>>
 up to 2K| >>>>>>>
 up to 4K| >
 up to 8K| >
up to 16K| >>>>>>>>>>>
     More| >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
          ---------------------------------------------------------------------
>



------------------------------
Gregor Scott
Software Architect
Pentana Solutions Pty Ltd
Mount Waverley VIC AU
------------------------------

Hi Gregor,

It looks like the average data bytes per record does include some overhead associated with each record/group. It appears to be calculated by dividing the "Total size of record data" by the number of records: the first file reports that total as 88260487 bytes and the second as 89682841 bytes. Given the same number of records, I'm guessing that changing the minimum modulus from 301 to 2456, in addition to changing the group size, was a factor, although this is just a guess at this point. I would be interested to see how changing the minimum modulus back to 301 on the TEST_DYN$_301023 file affects the results.

On a related note, the averages appear to be restricted to four digits, with the leading digit truncated. I was initially confused because multiplying the average by the number of records did not match up with the total.
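
Putting those two things together, the figures do seem to line up if the average is "Total size of record data" divided by "Number of records" and the display then keeps only the last four digits before the decimal point. A minimal sketch of that arithmetic in Python, with the figures copied from the two listings above (the four-digit cut-off is just my guess at the display behavior, not confirmed):

# Sanity check of the per-record data averages reported by ANALYZE.FILE.
# Figures are copied from the two listings above; the "% 10000" step is an
# assumption about the display truncation, not how ANALYZE.FILE formats it.
files = {
    "TEST_DYN":         {"data_bytes": 88260487, "records": 2455, "shown": 5951.32},
    "TEST_DYN$_301023": {"data_bytes": 89682841, "records": 2455, "shown": 6530.69},
}
for name, f in files.items():
    true_avg = f["data_bytes"] / f["records"]   # 35951.32 and 36530.69
    last_four = true_avg % 10000                # keep last four digits before the point
    print(f"{name}: true {true_avg:.2f}, truncated {last_four:.2f}, shown {f['shown']}")

Both truncated values match what the report shows (5951.32 and 6530.69), so the averages themselves look correct - it is the display that is losing the leading digit, and the underlying totals differ between the two files presumably because of the per-record/group overhead mentioned above.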

Thanks,

Neil



------------------------------
Neil Morris
Universe Advanced Technical Support
Rocket Software
------------------------------

Interesting feedback, @Neil Morris.

It seems to me that both the number truncation and the average size calculation are bugs, unless they are both "by design".



------------------------------
Gregor Scott
Software Architect
Pentana Solutions Pty Ltd
Mount Waverley VIC AU
------------------------------

Hi Gregor,

The truncation of the average size is something which should be looked at. Either no one has run into a situation where the average record size is 10,000 bytes or more, or no one has looked closely enough to notice - which is more likely. I'm sure the current behavior has been this way for years.

At this point, I'm not sure about the average record size calculation. I just did some tests, separately changing the minimum modulus and the large record size parameters on an existing dynamic file, and neither change altered the average record size. Note that I did not create a new file and copy records - I just changed the settings on an existing file. I will have to try with a new file to see if that changes the behavior.

Thanks,

Neil



------------------------------
Neil Morris
Universe Advanced Technical Support
Rocket Software
------------------------------

FYI

I have raised a support case (#00929012) to look at the truncation of output in ANALYZE.FILE.



------------------------------
Gregor Scott
Software Architect
Pentana Solutions Pty Ltd
Mount Waverley VIC AU
------------------------------

Gregor,

With OS read-ahead buffers pulling in 64 Kbytes or more, just set your blocksize much, much higher, if not to the maximum permitted in UniVerse. This will increase your large record size as well. If a record is larger than the large-record size (which most of yours are in this example), it is automatically put into the overflow group. Overflow disrupts the orderly reads from the disk, making the read-ahead less effective.
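
A rough sketch of that rule (the two thresholds are the "Large record size" values reported in the listings above; the sample record sizes are made up for illustration):

# Toy illustration of the large-record rule described above - not UniVerse code.
# A record whose data exceeds the file's large-record size is stored outside
# the primary group buffer, i.e. as a "large"/overflow record.
def stored_as_large(record_bytes, large_record_size):
    return record_bytes > large_record_size

sample_sizes = [500, 1800, 3000, 5000, 16000]   # made-up record sizes in bytes
for threshold in (1619, 3257):                  # thresholds from the two listings
    n_large = sum(stored_as_large(s, threshold) for s in sample_sizes)
    print(f"large record size {threshold}: {n_large} of {len(sample_sizes)} stored as large")

With the bigger group size (and so the bigger threshold), fewer records spill into large-record storage, which matches the drop from 1951 to 1867 large records between the two listings.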

With all that said, we are a UniData shop and welcomed the much larger blocksize available in UniData 8.2. On one very large file, setting the blocksize to 1 Mbyte improved select performance dramatically!

Neil, correct me if I am wrong, as my strong suit is UniData, but the principles are the same.

-John Zagnoli

Trinity Petroleum Management



------------------------------
John Zagnoli
IT Director
Trinity Petroleum Management
DENVER CO US
------------------------------