Exclude deleted records in my COUNT(*)

  • #1

    Is there a way to exclude records that are "marked for delete" in a physical file from my SQL COUNT(*)? The following SQL results in a count of 7 million. I would expect the 16 million number but that is not happening. I believe this is because the physical file is reusing deleted records (REUSEDLT). I have tried IS NOT NULL and other field comparisons, but always get the same number.

    SELECT COUNT(*) FROM GANCMFILES/CCEKCPP
    WHERE RRN(CCEKCPP) >= 1 AND RRN(CCEKCPP) <= 16164260

  • #2
    When you say "marked for delete", do you mean marked as deleted by the OS or do you have a deleted flag in the record? If you mean marked as deleted by the OS, those records are already excluded from COUNT(*). COUNT(*) should return the same value as the current number of records for the member in DSPFD.
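
    A sketch of one way to cross-check this, using the QSYS2.SYSPARTITIONSTAT catalog view (the library and file names below are assumed from the earlier posts): NUMBER_ROWS should match what COUNT(*) returns, while NUMBER_DELETED_ROWS shows how many deleted records the member still holds.

    SELECT NUMBER_ROWS, NUMBER_DELETED_ROWS
    FROM QSYS2.SYSPARTITIONSTAT
    WHERE TABLE_SCHEMA = 'GANCMFILES' AND TABLE_NAME = 'CCEKCPP'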

    • tdavis commented
      Yes I know they are excluded. Is there a way to include them in the count?

    • Brian Rusch commented
      Sorry, I was confused since in your original post you asked "Is there a way to exclude records that are "marked for delete" in a physical file from my SQL COUNT(*)?" There is not a way that I know of to include them in the COUNT(*) function.

  • #3
    I agree with Brian - but I also find myself wondering why you are constraining the search by RRN?

    • tdavis commented
      Our replication software, for initial loads only, is not efficient. We found that copying a view, or multiple views simultaneously, saved quite a bit of time (20 minutes versus 7 minutes for a file that has 200 million records). So, to create those views I am splitting up the total record count and then creating those views using the RRN range. The example below would be the first in a number of CREATE VIEW statements run. When it runs, it does not select the 16.1 million rows, it selects far less than that, which leads me to believe that the deleted records (marked by OS/400, not a field value) are being included in that range comparison.

      Create VIEW GGINITLOAD/GANCMFILES_CCEKCPP_GG_VW_1 for system name GGGAN00002 as
      Select * From GANCMFILES/CCEKCPP Where RRN(CCEKCPP) >= 1 and RRN(CCEKCPP) <= 16136473

  • #4
    When it runs, it does not select the 16.1 million rows, it selects far less than that, which leads me to believe that the deleted records (marked by OS/400, not a field value) are being included in that range comparison.
    Yes, they will be. The RRN is a physical record number, i.e. the relative position in the file. Think houses - just because number 24 is currently unoccupied (deleted record) does not mean that house 26 has to be renumbered.

    Hopefully, your software doesn't actually use the RRN, because in the replica you are creating the RRNs of the records will not match those of the original.

    Another thought. Why not reorganize the files to remove the deleted records before you attempt to replicate them?

    • tdavis commented
      Yes, reorganizing the files should resolve the problem, but that is not possible during business hours, and I am not sure when they want to run this utility. Perhaps that is the only solution. I do understand what a deleted record is. We have utilities that can "undelete" them if that is ever necessary. I was hoping there was some way to identify them in order to compensate for this predicament.

    • Brian Rusch commented
      You don't have to have an exclusive lock on a file to reorganize it anymore. Take a look at the ALWCANCEL and LOCK parameters of the RGZPFM command to see if that may meet your needs.
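
      A sketch of what that could look like from an SQL session, using QSYS2.QCMDEXC to run the command (library and file names assumed from the thread; the same RGZPFM command could also be run directly from a command line):

      CALL QSYS2.QCMDEXC('RGZPFM FILE(GANCMFILES/CCEKCPP) ALWCANCEL(*YES) LOCK(*SHRUPD)')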

  • #5
    I agree with JonBoy. What you might try is to first reorganize the physical file to remove the deleted records and then run your replication software to see if that improves the time for the initial load. If it does, then there may be no need to split up the file into chunks with multiple views.

    • #6

      ...We have utilities that can "undelete" them if that is ever necessary. ...
      That probably only works if the deleted record has not been re-used. Sounds like a tool of limited value, compared to recovering from the journal.

      • #7
        I am not sure if I understand this correctly.
        But to me it looks like the replication software works something like this:
        read a record, write it to the replication file, delete the record just read.
        This means that more and more records at the beginning of the file are deleted.

        When reading the file, it starts from record number 1, searching for the first record that is not deleted.
        This means that all deleted records are read one by one, and this takes a long time. Every time!
        I have experienced this in an application more than 25 years ago.

        If it is possible to build an index over the physical file, you could define a logical file using the FIFO (first in, first out)
        keyword. That way the arrival sequence within the key is kept.

        If it is a physical file you are accessing by relative record number using RPG SETLL and READ, then you could save
        the first record number of an active record and use that for the SETLL.
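
        For example, a rough sketch of finding that first active relative record number with SQL (file and library names taken from the thread):

        SELECT MIN(RRN(CCEKCPP)) FROM GANCMFILES/CCEKCPP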

        If the physical file is defined with a unique key, then you could specify it with reuse deleted records (REUSEDLT).

        • #8
          Using the relative record number when accessing with SQL is not a good idea. Isn't there a unique key you can use for splitting your data?
          When using the relative record number, (IMHO) a table scan may be performed and the deleted rows (which are still there) will be read but not copied.
          When using a unique key (in combination with an index), only the selected rows will be read.
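
          A rough sketch of that idea, splitting the views on a unique key instead of RRN. The column ORDNUM and the range bounds below are placeholders, not taken from the thread; the real key would come from the file's definition.

          -- ORDNUM and the range bounds are placeholders for the file's actual unique key
          Create VIEW GGINITLOAD/GANCMFILES_CCEKCPP_GG_VW_1 for system name GGGAN00002 as
          Select * From GANCMFILES/CCEKCPP Where ORDNUM >= 1 and ORDNUM <= 16000000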

          • #9
            I am now adding the current number of records to the deleted number of records and using that total to get the from and to range numbers. Still using RRN, but the view will include those record numbers; that is, instead of from 1 to 16,136,476 (for the first view) I am using 1 to 26,630,814. This is for a file that has 258,848,929 current records and 167,244,091 deleted records. I want to create 16 views.
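
            (As a worked check of that arithmetic: 258,848,929 current + 167,244,091 deleted = 426,093,020 RRN positions in total, and 426,093,020 / 16 views ≈ 26,630,814 positions per view, which is the upper bound used in the first view below.)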

            Create VIEW GGINITLOAD/GANCMFILES_CCEKCPP_GG_VW_1 for system name GGGAN00002 as
            Select * From GANCMFILES/CCEKCPP Where RRN(CCEKCPP) >= 1 and RRN(CCEKCPP) <= 26630814
