Announcement

**Vectorspace** · July 27, 2021, 01:57 AM

1) To delete duplicate records from a file.
How would you define a duplicate? Two records with the same key, or two records where every column is the same?

2) To show distinct count of a customers if he has more than one department id
Do you mean that if the customer as multiple department IDs, then there is one record where Dept ID = "a,b,c,d,e,f" - i.e. it is a comma separated list of dept IDs?
If so, then you can use

Code:

select CustomerId,
  regexp_count(DeptId,'[^,]*[A-Za-z0-9]+[^,]*') as DeptIdCount
  from myTable;

regexp_count returns the number of locations in the string in parameter 1, that match the regexp pattern in parameter 2
The regexp pattern matches any string that does not include a comma, and contains at least 1 character in the ranges A-Z, a-z, and 0-9 (I assume that your department IDs should be alphanumeric)

3)
Your example is not sorting rows, it is sorting columns?
E.g. row 2, in example 1 is fld1 = 25, fld2 = 60, fld3 = 18
And in example 2 is fld1 = 18, fld2 = 25, fld3 = 60
That isn't sorting query results, that's changing column values. I'm not sure about that. Are you sure about what you're asking for here? Please confirm what you want to achieve

**Vectorspace** · July 29, 2021, 02:15 AM

#1 is fairly easy if matching on a key, or a subset of the columns. You just have to pick a way to order the rows such that the record you want to keep is first. If there is no particular way, you can use the relative record number. My example below keeps the record with the oldest (smallest) rrn.
(note I have not tested this, but the structure should be about right. And there may be more efficient ways)

Code:

delete from myTable a
  where exists (
    select 1 from myTable b
      where (a.fld1, a.fld2,a.fld3) = (b.fld1, b.fld2,b.fld3)
        and rrn(b) < rrn(a)
        limit 1);

This will delete a row only if there exists another row with the same fld1/2/3, and has a lower rrn (i.e. if there is a duplicate with a lower rrn). The record with the lowest rrn will stay as there will be no record with as lower rrn to join to

Matching on all columns is very difficult. Offhand I'm not sure how, except by using the above query and matching on all columns, which would be very inefficient.

#3. So when you say you want to sort rows, you mean you want to sort the columns in each row. Is this just for display purposes, or do you want to actually update the table?
If it's for display purposes, then you might be able to accomplish it by "unpivoting" the table (convert columns to rows), then sorting the results, then "pivoting" those results (convert rows to columns). See here for info on both pivot and unpivot: http://www.dbatodba.com/db2/how-to-d...ws-to-columns/
If you want to update the table, then I would read each row into an SQLRPGLE program, read each column into an array, sort the array, then update the row with the sorted values

**Vectorspace** · July 29, 2021, 02:48 AM

#1 is fairly easy if matching on a key, or a subset of the columns. You just have to pick a way to order the rows such that the record you want to keep is first. If there is no particular way, you can use the relative record number. My example below keeps the record with the oldest (smallest) rrn.
(note I have not tested this, but the structure should be about right. And there may be more efficient ways)

Code:

delete from myTable a
  where exists (
    select 1 from myTable b
      where (a.fld1, a.fld2,a.fld3) = (b.fld1, b.fld2,b.fld3)
        and rrn(b) < rrn(a)
        limit 1);

This will delete a row only if there exists another row with the same fld1/2/3, and has a lower rrn (i.e. if there is a duplicate with a lower rrn). The record with the lowest rrn will stay as there will be no record with as lower rrn to join to

Matching on all columns is very difficult. Offhand I'm not sure how, except by using the above query and matching on all columns, which would be very inefficient.

#3. So when you say you want to sort rows, you mean you want to sort the columns in each row. Is this just for display purposes, or do you want to actually update the table?
If it's for display purposes, then you might be able to accomplish it by "unpivoting" the table (convert columns to rows), then sorting the results, then "pivoting" those results (convert rows to columns). See here for info on both pivot and unpivot: http://www.dbatodba.com/db2/how-to-d...ws-to-columns/
If you want to update the table, then I would read each row into an SQLRPGLE program, read each column into an array, sort the array, then update the row with the sorted values

Announcement

SQL query

SQL query

Comment

Comment

Comment