ibmi-brunch-learn

Announcement

Collapse
No announcement yet.

Varlen Value To Use In File With Mostly Blank Fields

Collapse
This topic is closed.
X
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Varlen Value To Use In File With Mostly Blank Fields

    I have a file that has 20 pairs of fields, a Header (30 chars) and Text (256 chars), and I am trying to determine the best use of VARLEN to reduce the size of the file.
    Although there are 20 pairs of fields in each record, most are empty, and in 90% of cases there is only 1 pair that has a value, the rest will be blank.
    What would you recommend for the VARLEN value?

    Below is an extract of the DDS:
    Code:
    A          R RPRDFLDS                                       
    A            DIMITM         5S 0       COLHDG('Item' 'No')  
    A            DIMH01        30          COLHDG('Header' '01')
    A            DIMT01       256          COLHDG('Text'   '01')
    A                                      VARLEN               
    A            DIMH02        30          COLHDG('Header' '02')
    A            DIMT02       256          COLHDG('Text'   '02')
    A                                      VARLEN
    I didn't create this, if I did I would have used multiple records with 1 pair per record. I am just trying to improve the file storage requirements.

    The file is populated by CPYFRMIMPF and does have trailing blanks stripped.

    Thanks for your advice on this.
    Poddys Rambles On

  • #2
    Re: Varlen Value To Use In File With Mostly Blank Fields

    I seem to recall reading a few years ago that if you do not force allocation of space with the keyword (e.g. VARLEN(25) ) then all of the storage space is allocated outside of the physical file object. This means that none of the data for the field is stored in the actual PF which forces the compiler to perform two reads - one for the PF and one for the additional storage area.

    Supposedly a good strategy if the majority of your fields are empty...because it reduces the size of the PF.

    Other folks who have experience will likely chime in here...

    Good Luck

    Comment


    • #3
      Re: Varlen Value To Use In File With Mostly Blank Fields

      Thanks Terry. That makes sense, if that is the way that the data is stored. If so, I would be better to set the VARLEN size of the first of the Text values to a length that covers maybe 80% of the data, and to leave the rest of the values (19 of them) to default outside. Probably 75-80% of the records in the file only have 1 value (the first 1), the rest are blank.
      Poddys Rambles On

      Comment


      • #4
        Re: Varlen Value To Use In File With Mostly Blank Fields

        FWIW...

        You might run one or more queries to list the SQL LENGTH() values. Sorting those might give you a curve that lets you determine a "sweet spot".

        I can't recall all of the guaranteed physical details, but a variable-length field consists of a couple parts out on disk. The first part is in the fixed-length area of each record. It consists of either (1) a 16-byte pointer plus a minimum of 13 bytes up to something like maybe 18 bytes of padding to ensure proper pointer alignment in the buffer, or (2) the field data if it fits in the allocated size plus two or four length bytes. The second part exists only if the column data is larger than the allocated size, and it would be placed out in the "overflow" area. That data would be what the pointer points to if the first part contains a pointer. (I once worked out why the "padding" had to range from the odd sounding "13 to 18 bytes", but I've mostly let that slip from memory. The padding can vary column by column because alignment varies depending on where the column appears in the buffer. IIRC, it might be possible to reduce physical record size by rearranging the order of the columns in the buffer in order to maximize alignments, if you know the details of the rules used by DB2.)

        Anyway, one point is that, in terms of actual physical size on disk, there will be some amount of bytes that DB2 will use as a minimum in the fixed-length portion for each varying-length column no matter what you specify for an allocation.

        Also, space in the overflow area is automatically reused. There is no attribute like "REUSEDLT()" for that like there is for the fixed-length portion. It happens no matter what. However, it only happens when a new data value fits entirely within a previously deleted value space. So if you add a row that has a 50-byte value and later delete that row, there is a 50-byte 'hole' in the overflow area until a new column is inserted or updated that is 50-bytes or less. E.g., a 45-byte value can fit in it fine. But that now leaves a 5-byte 'hole' that possibly will never be filled.

        The point there is that the overflow area can become very fragmented with lots of small 'holes' throughout its space. If your file has a large number of rows and the rows are very volatile, a large amount of wasted space can result. There is no automatic 'defrag' function that keeps it cleaned up. The overflow is defragged when you run RGZPFM through to completion. (The recent ALWCANCEL() parm is not helpful for this.) I *think* some useful defragging might also happen if you run ALTER TABLE with a new allocate-clause for the columns, but I've actually seen at least one table increase in size when doing that. I kind of recall that one difference was that specifying a new ALLOCATE() didn't truncate the date space where RGZPFM would, or something like that. Maybe not.

        Anyway, with all of that said, I'll caution you not to take any detail as authoritative fact. It's all from memory from a few years ago when working with IBM Support (Partner World) on a very specific problem. It's mostly probably close to correct up to i 7.1, but technology refreshes or PTFs might have altered or eliminated one or more details. And maybe more important for this forum, somebody else might see this and offer corrections or new data. That's a big reason I decided to write all of this.

        If this is a very large volatile file, you might contact IBM Support for any guidance they can supply. Bear in mind that a reduction in space might be your current primary objective, but performance also might become a significant concern at some point. With 20 varying-length columns, it's conceivable that it could take 21 disk accesses to read a single record. For your file, it'd be infrequent for more than two apparently; and if the value lengths usually fit within the minimum that DB2 always uses, maybe those values don't get into the overflow area anyway.

        If this file mostly has null varying-length values yet is enough of a problem to cause storage space concerns, it would seem to usually have a large number of records. If it only has a few records, there might never be a storage problem. But maybe it simply had a large number at one point and never had RGZPFM run against it.
        Tom

        There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.

        Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?

        Comment


        • #5
          Re: Varlen Value To Use In File With Mostly Blank Fields

          Thanks very much Tom, that information was invaluable and included a few things that I was not aware of.

          Given the relatively small size of this file and the fact that it changes infrequently I don't think I have to worry about it, however there are some files on the system which have variable length fields and which are a bit more volatile. Again in the scheme of things they are not huge (30,000 records not 30 million) so frequent RGZPFM's are not necessary.

          Appreciate the help.
          Poddys Rambles On

          Comment

          Working...
          X