ibmi-brunch-learn

Announcement

Collapse
No announcement yet.

Fetch Duplicate Records based on Multiple fields

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fetch Duplicate Records based on Multiple fields

    Hi,
    I have a table/file which has duplicate records (Lets say there are 10 fields in a record. Out of which, if 5 fields put together are same across more than 1 record, then it is considered as a duplicate). I need to delete those duplicate records alone using a SQL statement embedded in a SQLRPGLE program.

    1) I am able to successfully fetch the Total duplicate records using
    SELECT Fld1, Fld2.. Fld5 From Table Group By Fld1.. Fld5 Having Count(*) > 1
    Note: It returns only 1 record per set of duplicate records.
    2) I want to delete all of those duplicate records returned by the above SELECT statement. I tried with DELETE statement. It is not working.

    Basically I need to write an SQL statement which returns the result of all the duplicate records alone individually(Not 1 record per set of duplicates). And also need to delete the fetched duplicate records.

    PLEASE HELP ME WITH THE SOLUTION. Heard that it is possible with SQL CURSORS. Haven't tried it yet. Any other way ?

  • #2
    If you want to retrieve all rows, you then have to join the result of the SQL-Statement with the Table again (using the duplicate keys).
    If you want then to delete all duplicate rows, I'd either exclude the lowest or highest RRN of the duplicate keys ... and then delete the rows.

    For determining the relative record no of the rows to be deleted:
    Code:
    With x as (Select Key1, Key2, Key3, Key4, Key5, Min(RRN(a)) MinRRN
                                    From MyTable a
                                    Group By Key1, Key2, Key3, Key4, Key5
                                    Having Count(*) > 1)
    Select RRN(b) RRNDlt
      from x join MyTable b on     x.Key1 = b.Key1
                               and x.Key2 = b.Key2
                               and X.Key3 = b.Key3
                               and x.Key4 = b.Key4
                               and x.Key5 = b.Key5
                               and RRN(b) <> MinRRN;​
    If you want to include this SELECT-Statement into a DELETE statement, you will learn, Common Table Expressions are not considered ... but you still can use nested sub-selects.
    The following DELETE Statement will remove all duplicates (note I'll keep the row with the lowest relative record no:
    Code:
    Delete from MyTAble c
      Where rrn(c) in (Select RRN(b) rrnDlt
                         from (Select  Key1, Key2, Key3, Key4, Key5,
                                       Min(RRN(a)) MinRRN
                                 from MyTable a
                                 Group By Key1, Key2, Key3) x
                              join MyTable b on     x.Key1 = b.Key1
                                                and x.Key2 = b.Key2
                                                and X.Key3 = b.Key3
                                                and x.Key4 = b.Key4
                                                and x.Key5 = b.Key5
                                                and RRN(b) <> MinRRN);​

    Comment


    • #3
      Thank you very much B.Hauser for your response.

      I will try this code. However, I don't want to keep even a single record of the duplicate record sets. So, RRN based selection not required, I believe. Also curious to know, is there any simple method exists to solve this problem ?

      How about doing this using Cursor ? Because I need to fetch each duplicate record and print it in an error report and delete it completely from the Table.

      Comment

      Working...
      X