Rocket U2 | UniVerse & UniData


one index for multiple files?

  • 1.  one index for multiple files?

    Posted 11 days ago

    Our UniVerse application consists of 8 accounts for 8 branches. In every account there is a file for the items. The item file is identical across all accounts for the master data; only the transaction data differs. When master data in the file is changed, the change is written to all accounts to keep them consistent.

    Although this approach is certainly not optimal, we do not want to change it because the programming is not easy to modify.

    As a test, I used SET.INDEX ITEM TO H:\ALPHA\INDEXES\I_ITEM to point the index of all 8 item files to the same destination.
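
    In practice that just means running the same command from each of the 8 accounts, roughly like this (the account names below are only placeholders for our real branch accounts):

        LOGTO ALPHA
        SET.INDEX ITEM TO H:\ALPHA\INDEXES\I_ITEM
        LOGTO BRANCH2
        SET.INDEX ITEM TO H:\ALPHA\INDEXES\I_ITEM
        ... and so on for the remaining 6 accounts ...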

    During a bulk import into the item file, where all 8 files are updated, switching to a single index file roughly doubles the speed.

    My question now is: Is there anything that speaks against using a single index file for multiple data files, as long as all indexed fields are identical for all data files?

    greetings

    Thomas



    ------------------------------
    Thomas Ludwig
    System Builder Developer
    Rocket Forum Shared Account
    ------------------------------


  • 2.  RE: one index for multiple files?

    Posted 11 days ago

    Could you clarify this a bit? What do you mean by "Master Data" vs "item files"? In UniVerse you can have one copy of the file with Q-pointers in all 8 accounts pointing to that one copy. If there's data specific to the account, each of your 8 accounts can have its own copy of that file.
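
    For example, the Q-pointer VOC entry in a branch account might look roughly like this (ALPHA here is just a stand-in for whichever account holds the real file):

        >ED VOC ITEM
        0001: Q
        0002: ALPHA
        0003: ITEM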

    Having 8 files with one index sounds like you're going to have a lot of mismatches. I must not be understanding the question.



    ------------------------------
    Joe Goldthwaite
    Consultant
    Phoenix AZ US
    ------------------------------



  • 3.  RE: one index for multiple files?

    Posted 11 days ago
    Edited by Thomas Ludwig 11 days ago

    Thanks, Joe, for thinking about it.

    There are 8 physical data files with the same dictionary in 8 accounts.

    About 60% of the data is identical in all 8 data files.
    The keys and the number of entries in all 8 data files are also identical.

    Only fields that are identical in all 8 files are in the index.

    best regards

    Thomas



    ------------------------------
    Thomas Ludwig
    System Builder Developer
    Rocket Forum Shared Account
    ------------------------------



  • 4.  RE: one index for multiple files?

    Posted 11 days ago

    Ok, but I don't understand what you're trying to gain by having one index shared across multiple files. Let's say your 8 files have a widget date and you want to index it to save time selecting on that date. Even if each of the 60%-identical items has the same widget date associated with the same item ID, you could still have issues if that date changes in one of the files. The index will get updated for that item with the new widget date. Now the other 7 files will be wrong unless you also update each of them with the correct widget date.

    What happens when you synchronize the files? As each file's widget date gets updated, it's going to try to update the index. That ends up updating it 7 more times with the same value.

    Normally you'd just give each of the 8 files its own index and let UniVerse keep them updated. I'm not sure what you're trying to gain by sharing it, and it seems like you could run into a lot of potential issues.



    ------------------------------
    Joe Goldthwaite
    Consultant
    Phoenix AZ US
    ------------------------------



  • 5.  RE: one index for multiple files?

    ROCKETEER
    Posted 11 days ago

    Thomas,

    I would worry about how you are handling the writes to the 8 files, if/when locks are involved. What would happen if one of the files did not get updated? I can see a situation where the index would return an item based on data that is not in the version of the file that failed to write.

    As for the performance, I think I see why it is faster.

    While I am not 100% sure, I expect that the index would only be updated by the first file that was updated. So, 9 writes for each update (1 write for each of the 8 files plus one index write) is roughly half the 16 writes you would have if each file had its own index.

    I understand your concern with changing code that works, and the effort may not be worth the gain, but maintaining one file for the 60% that is identical, plus a secondary file that uses the same ID for the data unique to each of the 8 accounts, may provide better performance. The bulk import of the common data would then update 2 files, not 9 or 16.
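
    Just to illustrate the shape of that split, a read would look roughly like this in BASIC (ITEM.COMMON and ITEM.LOCAL are hypothetical file names; the common file would be shared by all accounts, the local one kept per account):

        OPEN 'ITEM.COMMON' TO F.COMMON ELSE STOP 'Cannot open ITEM.COMMON'
        OPEN 'ITEM.LOCAL' TO F.LOCAL ELSE STOP 'Cannot open ITEM.LOCAL'
        ITEM.ID = '12345'                                              ;* example key only
        READ COMMON.REC FROM F.COMMON, ITEM.ID ELSE COMMON.REC = ''    ;* shared master data
        READ LOCAL.REC FROM F.LOCAL, ITEM.ID ELSE LOCAL.REC = ''       ;* account-specific data
        * The program would then merge the two records by field position as needed.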



    ------------------------------
    Mike Rajkowski
    MultiValue Product Evangelist
    Rocket Internal - All Brands
    US
    ------------------------------



  • 6.  RE: one index for multiple files?

    PARTNER
    Posted 11 days ago

    From a database theory point of view, this would be a NO-GO!

    The index points to where in the file a record resides.

    For one index to work for multiple files,


    * Each record would have to be the same size across all the files.

    * Every record would have to be in every file.

    * Every record would have to be updated in each file in the same order every time.

    Otherwise, the data record could be in different places in each file. If the record just before the record you read is larger in one file than another, the location within the physical record would be different. One could even be in overflow while the other is in the main record. If you update in differing orders, the records get placed in the physical record in differing orders.

    I do not know the details of the UV indexes. If the index stores only the “bucket,” and the location of the record within the bucket is kept in the bucket itself, my explanation would not apply.

    If the records include a file indicator in the index key, this could work. Essentially, each file would have its own unique index within the same single index file. This can be done with client data by including the client ID in the data and putting all clients in one file.

    My question is, “With the files existing, how would you create and build the index?” Build and create it for one file, then read and write every record in the other files?

    This just sends shivers up my spine. For you older folks, I hear the robot saying, “Danger, Will Robinson. Danger!” And for the younger generations, it makes my “spidey sense” tingle.

    BTW, I have been an IT professional for 50 years, most of them working with various database systems. So, I may be a bit set in my ways. But watch out for unexplained data corruption if you do this.

    ------------------------------
    Sam Beard
    Senior Programmer Analyst
    MicahTek
    Broken Arrow OK US
    ------------------------------




  • 7.  RE: one index for multiple files?

    Posted 11 days ago
    Edited by Thomas Ludwig 11 days ago

    Thanks to all for your thoughts.

    @Joe Goldthwaite
    I think I should clarify this. Currently there is an index on 4 fields in this file, and the data for these fields is identical in all accounts.

    There are "central" fields in this file for which the data is consistent across all accounts, and "local" fields for which the data differs between the accounts.

    @Mike Rajkowski

    You are right, the consistency of the data becomes even more important if there is only one index file, but the application has been growing for about 25 years now and the data in these files is always consistent for the "central" fields. This is the reason why I'm considering it.

    The long growth of the application is also the reason why I flinch at changing the programming to "really" split the "central" part of the file from the "local" one. I'm sure this would be a big project with a long testing period, and it would still carry a lot of risk of introducing new bugs...

    Changing the index is a fast way to get a lot more performance without having to change the source code.
    I only got the idea because I had just created a new account and initially forgot to change the location of the index file. After that worked without any issues, it gave me the idea to test the performance differences...

    @Sam Beard

    Is it really the case that all records have to be the same size across the accounts? Technically I don't understand this. Does the index return the "position" in the file instead of a key?

    I ran a test yesterday on my test system and set one index file for all data files; after that I did a larger import to check the performance, and made some changes specifically to the indexed fields to check that everything works.

    Everything seems to work perfectly, as it should; I can select through the indexed fields in every account without problems.


    This screen demonstrates the result. I get the same result in every account, and all items are reachable even though the size of the items is certainly not identical. In the example, the field "Herst. Art. Nr." is used for the selection and is part of the index.
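
    The check in each account was essentially just the following (D.MFR.ART.NR stands in for the real dictionary name of "Herst. Art. Nr.", and the value is only an example):

        LIST.INDEX ITEM ALL
        SELECT ITEM WITH D.MFR.ART.NR = "4711-XYZ"
        LIST ITEM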



    ------------------------------
    Thomas Ludwig
    System Builder Developer
    Rocket Forum Shared Account
    ------------------------------



  • 8.  RE: one index for multiple files?

    Posted 10 days ago

    Hi Thomas,

    Have you had a look at creating a distributed file, where the 8 files would be part files of the distributed file? Once you create the index on the header file, you can use it on the header file (across all 8) or on the individual part files.
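
    Very roughly, the shape of it is something like the following, with the remaining branches added the same way (the file names are placeholders, and the exact DEFINE.DF argument syntax and choice of partitioning algorithm should be checked against the UniVerse documentation):

        DEFINE.DF ITEM.ALL ADDING ITEM.BRANCH1 1 ADDING ITEM.BRANCH2 2
        CREATE.INDEX ITEM.ALL MFR.ART.NR
        BUILD.INDEX ITEM.ALL ALL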

    Regards,

    Greg



    ------------------------------
    Greg Livingston
    Universe Practice Lead
    OUTsurance Insurance
    Centurion GA ZA
    ------------------------------



  • 9.  RE: one index for multiple files?

    Posted 10 days ago

    Hi Greg,

    This is a very good point. I had never heard of or thought about distributed files. Reading the UV documentation, it turns out it could be worth investigating further, although I think the algorithm could be a bit tricky, because I cannot split on a complete record. Instead, I would have to split the record itself based on the field positions.

    But I will definitely investigate this and try to create one in a test environment.

    best regards



    ------------------------------
    Thomas Ludwig
    System Builder Developer
    Rocket Forum Shared Account
    ------------------------------



  • 10.  RE: one index for multiple files?

    Posted 7 days ago

    My first thought, like Mr. Livingston's, was distributed files. The indexes are maintained individually at the part-file level but can be referenced from the distributed-file level. Quite beautiful.

    The tricky bit for you might lie in the fact that the ID determines which part file the data record actually resides in, according to an algorithm you write.

    The easiest case for DFs is that keys like A12345 and B12345 distribute to the 1st and 2nd part file, but one can get quite creative, even to the point of having some sort of external control array or table. A sketch of the simple case is below.
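
    If the leading character of the key really did encode the branch, the user-written subroutine behind an INTERNAL partitioning algorithm could be as simple as something like this (the subroutine name is hypothetical; the exact calling convention for DEFINE.DF ... ALGORITHM INTERNAL, and the requirement to globally catalog the subroutine, should be confirmed in the docs):

        SUBROUTINE GET.PART.NUM(PART.NUM, RECORD.ID)
        * Map the leading letter of the record ID to a part-file number:
        * A... -> part 1, B... -> part 2, and so on.
        PART.NUM = SEQ(UPCASE(RECORD.ID[1,1])) - SEQ('A') + 1
        RETURN
        END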

    All of that will have a performance hit, but this approach seems safer than jury-rigging a shared index across separate files. With the speed of today's, let alone tomorrow's, systems, performance may not even be an issue, i.e., only "Danger, Will Robinson. Danger!", with nary a tingle of any "spidey sense".



    ------------------------------
    Chuck Stevenson
    DBA / SW Developer
    Pomeroy
    US
    ------------------------------



  • 11.  RE: one index for multiple files?

    Posted 7 days ago

    Remember that distributed files can only partition based upon information in the record ID. The easy way to think about this is that, while you have the whole record to examine for partitioning when you write the record, you have only the ID when you go looking for it.



    ------------------------------
    Mark A Baldridge
    Principal Consultant
    Thought Mirror
    Nacogdoches, Texas United States
    ------------------------------



  • 12.  RE: one index for multiple files?

    Posted 10 days ago
    Edited by Mike Bojaczko 10 days ago

    If the accounts are on the same machine, a pointer from each account to one item file with one index might work. 

    If speed is all you want, file resizing could give you benefits as well. Optimized file parameters can change the world.
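
    For a static hashed file that would mean something along the lines of ANALYZE.FILE followed by RESIZE; for a type 30 (dynamic) file, CONFIGURE.FILE can be used to adjust its parameters instead. Very roughly (the type, modulo, and separation numbers below are placeholders only, to be chosen from the analysis of your own data):

        ANALYZE.FILE ITEM
        RESIZE ITEM 18 40009 4
        CONFIGURE.FILE ITEM MINIMUM.MODULUS 40009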



    ------------------------------
    Mike Bojaczko
    PROGRAMMER ANALYST
    US
    ------------------------------



  • 13.  RE: one index for multiple files?

    Posted 10 days ago

    Thank you Mike, 

    Speed was not the initial reason for investigating this. The initial reason was that, because of an error during an account duplication, there was one index for two files for a while, and it worked without problems.

    The item file currently has about 1,700,000 items and works well with file type 30.

    But my thought was: "if it is possible to speed things up without too much effort -> why not."

    I think I'll keep the system as it is for the moment, but I learned a lot from this thread, especially about distributed files.

    best regards



    ------------------------------
    Thomas Ludwig
    System Builder Developer
    Rocket Forum Shared Account
    ------------------------------



  • 14.  RE: one index for multiple files?

    Posted 10 days ago
    Edited by Mike Bojaczko 10 days ago
    Thomas,
     
    Disclaimer: I don't use Windows OS for production.
     
    To the first question, it sounds unpredictable. I wouldn't do it.
     
    To the question of file tuning: the secondary indexes should be very helpful already, and I would not change anything without testing and analysis. Also, there is system tuning that can be explored.
     
    I only use index files in extreme big-data cases, mostly because it's new to me and I don't want to roll my own as others have since back in the golden years of Pick. Along with that, I only use dynamic files with a UV index because I don't want to worry about a static hashed file. I run with a one-DB-file-to-one-index-file configuration and have had no problems so far.
     
    Static files can fly fast, but they should be monitored. If they need to be resized, then you are looking at downtime for the files. I've fallen into the trap of oversizing static files to avoid overflow, but it seems wasteful. In the past, I built a process that selects a percentage of sample records from a target file, creates multiple static hashed files with different parameters, runs TCL and BASIC tests based on CPU time and elapsed time, then writes the test results to a file for reporting.

    For my big-data processes that are built into front-end user programs, I use default dynamic / type 30 64-bit files with a UV secondary index on some DB attributes and dictionary I-descriptors. That is the only way to go in some of my cases. The secondary indexes speed up the TCL and BASIC operations, so it's a win-win. It is possible to tune dynamic files, but I can't help much with that. I can't say much about distributed files other than that I've tested with them but never put them into production.
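
    The core of those timing tests is nothing fancy, just wrapping the operation under test in CPU and elapsed-time readings, roughly like this (SYSTEM(9) is assumed here to return CPU milliseconds, and MFR.ART.NR is a placeholder dictionary name; both should be adjusted for your system):

        CPU.START = SYSTEM(9)        ;* CPU ms used so far by this session
        ELAPSED.START = TIME()       ;* seconds since midnight
        EXECUTE 'SELECT ITEM WITH MFR.ART.NR = "4711-XYZ"'   ;* the operation under test
        CRT 'CPU ms      : ' : SYSTEM(9) - CPU.START
        CRT 'Elapsed secs: ' : TIME() - ELAPSED.START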
     
    The file and system tuning rabbit holes run deep. There are a ton of IQ points around on this forum (not me), and I've gained priceless info from some awesome people. You could get some file and/or system tuning forum threads going for a good price ;-). I would say there are people here who can help for fun and profit, and some who feel compelled to for the sake of knowledge and learning. Also on this forum are people who have published technical articles that are not easily found on the web unless they share them, as Pick DB forums, publications, and DB maintenance utilities have come and gone over the years. It's not a job if it's something you love, it's a hobby! And, if you're lucky, you get paid for it! Hope this helps.



    ------------------------------
    Mike Bojaczko
    PROGRAMMER ANALYST
    US
    ------------------------------



  • 15.  RE: one index for multiple files?

    Posted 7 days ago

    Hi all,

    Thanks for your thoughts; it's impressive how many great answers there are and how much effort has been spent!


    In summary of this thread: I will keep the system as it is for now, especially because we are not facing performance issues at the moment. But I made a to-do to examine distributed files in the near future.

    @Mike Bojaczko I also would like to "NOT" use Windows in a production system, but the Linux knowledge in our team (which includes me :-)) is too weak to run a Linux machine for UniVerse, at least for now...

    best regards



    ------------------------------
    Thomas Ludwig
    System Builder Developer
    Rocket Forum Shared Account
    ------------------------------



  • 16.  RE: one index for multiple files?

    Posted 7 days ago
    Edited by Mike Bojaczko 7 days ago

    Sorry Mark, I accidentally replied privately...

    I wonder if @WHO could be used in the DF algorithm, or one of the attributes returned by the BASIC STATUS statement, like the inode or the absolute pathname.



    ------------------------------
    Mike Bojaczko
    PROGRAMMER ANALYST
    US
    ------------------------------



  • 17.  RE: one index for multiple files?

    Posted 7 days ago
    Edited by Chuck Stevenson 7 days ago

    Just remember that the algorithm has to consistently retrieve existing records from the correct part file, not just decide where to write a new record. In other words, a given record ID must resolve to the same part file regardless of what account a user is currently in, and regardless of the current value of @WHO.



    ------------------------------
    Chuck Stevenson
    DBA / SW Developer
    Pomeroy
    US
    ------------------------------



  • 18.  RE: one index for multiple files?

    Posted 6 days ago

    The inode changes upon RESIZE. IAM can override the account name.

    Administrators can move files and confound the path, but that would probably be you, so it is all within your control. All it does is complicate the admin work. By default, the full path to the index directory lives in the file header. We already need to copy production files with care so that a test-file update will not change a production file's index.

    One project added a new month file to a distributed file, as well as combining and integrating 12 monthly files into a year file. It was worth building a set of programs to take the old distributed file apart, and reassemble it using parameters for the number of year and month files. At least for the dozen or so sets of these files, when we messed them up, they would all be consistent!



    ------------------------------
    Mark A Baldridge
    Principal Consultant
    Thought Mirror
    Nacogdoches, Texas United States
    ------------------------------