Open-source Languages & Tools for z/OS

  • 1.  Fread() does not work for large csv file

    Posted 11-20-2017 07:53

    z/OS V2.1
    R 3.3.2

    I used the fread() function provided by the data.table package to read a CSV file.
    I got the following error when I tried to read a large CSV file (about 5 GB).

    Error in fread(fileName, header = FALSE, data.table = FALSE) :
      Opened file ok, obtained its size on disk (0.0MB), but couldn't memory map it. This is a 64bit machine so this is surprising. Please report to datatable-help.
    

    It works when I use a smaller file (about 1 GB).

    Do you have any idea how to solve this issue?

    Regards.
    Tomohiro Taguchi



  • 2.  RE: Fread() does not work for large csv file

    Posted 11-20-2017 10:22

    The z/OS implementation of mmap can only use storage “below the bar”, i.e., in the 31-bit portion of the address space. I recently learned this in the context of a similar problem in another piece of software. See:

    https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.bpxb100/mmp.htm

    Note 14 says:

    The mmap service is not enabled to map storage above the 2-gigabyte addressing range.

    I’m not all that familiar with R; does its fread function have an option to tell it to use ordinary I/O instead of mmap?



  • 3.  RE: Fread() does not work for large csv file

    Posted 11-21-2017 19:13

    Thank you for your response.

    Unfortunately, the fread() function has no option to switch its internal I/O method.

    I found the following description in R_README.ZOS.

    Everything is built 64 bit and in ASCII mode.
    

    The fread() function is included in a library that is packaged with R 3.3.2 for z/OS.
    Can't it use addresses above 2 GB even though it is built in 64-bit mode?

    Regards.
    Tomohiro Taguchi



  • 4.  RE: Fread() does not work for large csv file

    Posted 11-22-2017 08:59

    As noted in the IBM documentation mentioned above, the mmap system call in z/OS does not support the use of memory over the 2GB limit imposed by 31-bit addressing, regardless of whether or not the program making the call is a 64-bit program.

    The solution is to modify the fread() function to use an alternate means of loading data (meaning, allocate memory above the 2GB bar and fill it with ordinary read() calls) if the mmap fails. However, there will not be any changes to R in the near term.
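
    For anyone who wants to experiment, the fallback described above might look roughly like the C sketch below. It is not the data.table source and has not been tried on z/OS. The names file_buf, load_file, free_file, read_into_heap, READ_CHUNK and the allow_mmap switch are made up for illustration, and it assumes that malloc() in a 64-bit build can return storage above the bar.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type: where the bytes live and how to release them. */
typedef struct {
    char  *data;      /* file contents, NULL on failure */
    size_t len;       /* number of bytes in data */
    int    is_mapped; /* 1 if data came from mmap, 0 if from malloc */
} file_buf;

/* Illustrative cap on a single read() request (64 MiB). */
#define READ_CHUNK ((size_t)64 * 1024 * 1024)

/* Fallback path: fill a heap buffer with ordinary read() calls. */
static char *read_into_heap(int fd, size_t len)
{
    char *buf = malloc(len ? len : 1);
    size_t done = 0;

    if (buf == NULL)
        return NULL;
    while (done < len) {
        size_t want = len - done;
        ssize_t n;

        if (want > READ_CHUNK)
            want = READ_CHUNK;
        n = read(fd, buf + done, want);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, try again */
        if (n <= 0) {               /* error or unexpected end of file */
            free(buf);
            return NULL;
        }
        done += (size_t)n;
    }
    return buf;
}

/* Try mmap first; if it is disallowed or fails, fall back to read(). */
file_buf load_file(const char *path, int allow_mmap)
{
    file_buf fb = { NULL, 0, 0 };
    struct stat st;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return fb;
    if (fstat(fd, &st) != 0 || st.st_size < 0) {
        close(fd);
        return fb;
    }
    fb.len = (size_t)st.st_size;

    if (allow_mmap && fb.len > 0) {
        void *p = mmap(NULL, fb.len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            fb.data = p;
            fb.is_mapped = 1;
        }
    }
    if (fb.data == NULL)            /* mmap skipped or failed */
        fb.data = read_into_heap(fd, fb.len);

    close(fd);
    if (fb.data == NULL)
        fb.len = 0;
    return fb;
}

/* Release the buffer with the call that matches how it was obtained. */
void free_file(file_buf *fb)
{
    if (fb->data == NULL)
        return;
    if (fb->is_mapped)
        munmap(fb->data, fb->len);
    else
        free(fb->data);
    fb->data = NULL;
    fb->len = 0;
    fb->is_mapped = 0;
}
```

    A caller would use load_file(path, 1) for normal operation and release the result with free_file(). Passing 0 for allow_mmap skips the mmap attempt, which is only there so the read() path can be exercised on a system where mmap succeeds.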