ibmi-brunch-learn

Optimize code


  • Optimize code

    Hi All,
    I have a program that reads more than 3 million (30 lakh) records from a file A in a loop (i.e. it picks up each customer ID), and within this loop it reads another file B for each customer. Because of this nested loop the program takes more than 12 hours.
    Is there a way to optimize this program, for example by using SQLRPGLE?

    Thanks

  • #2
    Join the files. Build indexes over the join columns.
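
    As a rough illustration (the file, column, and field names below are placeholders, not taken from the original program), an SQLRPGLE cursor joining the two files might look something like this:

    Code:
    // Hypothetical files FILEA/FILEB joined on a CUSTID column
    dcl-ds row qualified;
      custId char(10);
      colA   char(20);    // whatever you need from file A
      colB   char(20);    // whatever you need from file B
    end-ds;

    exec sql declare joinCsr cursor for
      select a.CUSTID, a.COLA, b.COLB
        from FILEA a
        join FILEB b on b.CUSTID = a.CUSTID;

    exec sql open joinCsr;
    exec sql fetch joinCsr into :row;
    dow sqlcode = 0;
      // process one joined row here
      exec sql fetch joinCsr into :row;
    enddo;
    exec sql close joinCsr;

    With indexes over the join columns, the database does the join itself instead of the program doing millions of random reads one record at a time.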



    • #3
      Hi

      It's not entirely clear what's going on there.
      Do I understand correctly that you take an ID from file A and then use it as a key to read a record from file B?
      And then what? Do you just read it, or is there some additional processing?
      It seems to me that the real problem is not the reads; the time is spent on the processing. If so, then even switching to SQLRPGLE, creating indexes, and joining the files won't improve things much.
      As for the small optimizations: file A should be opened in blocked-read mode (BLOCK(*YES) on the file declaration, provided you read it with SETLL + READ rather than CHAIN or READE).
      With SQLRPGLE it is also faster to read in blocks: declare a DS array to receive the result and use a construction like

      Code:
      EXEC SQL FETCH currACY1 FOR :items2Read ROWS INTO :sqlData;
      where sqlData is the DS array that receives the result and items2Read is the size of that array (the number of rows the query returns per call)
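
      For context, here is a fuller hypothetical sketch of that blocked fetch, with the cursor declaration, the DS array and the fetch loop spelled out (currACY1 is kept from the snippet above; FILEA and the CUS/CLC columns are placeholders, and error handling is omitted):

      Code:
      dcl-c BLOCK_SIZE 1000;

      dcl-ds sqlData qualified dim(BLOCK_SIZE);   // one element per fetched row
        cus char(6);
        clc char(3);
      end-ds;

      dcl-s items2Read int(10) inz(BLOCK_SIZE);
      dcl-s rowsRead   int(10);
      dcl-s done       ind inz(*off);
      dcl-s i          int(10);

      exec sql declare currACY1 cursor for
        select CUS, CLC from FILEA;

      exec sql open currACY1;
      dou done;
        exec sql fetch currACY1 for :items2Read rows into :sqlData;
        done = (sqlcode <> 0);                             // 100 = end of data
        exec sql get diagnostics :rowsRead = ROW_COUNT;    // rows actually returned
        for i = 1 to rowsRead;
          // process sqlData(i) here
        endfor;
      enddo;
      exec sql close currACY1;

      Each FETCH call brings back up to BLOCK_SIZE rows in one trip, which is considerably cheaper than fetching them one at a time.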

      But when it comes to processing really large amounts of data, the only way to get a big speed-up is parallelization.
      This requires:
      • a head job
      • a conveyor
      • several processor jobs
      It works as follows.
      The head job is started and creates the conveyor. As the conveyor you can use a queue - *DTAQ or *USRQ (the first is easier to work with, the second is 4-5 times faster and roughly as much cheaper in processor usage).
      Next, the head job launches the required number of processor jobs (the same program, with several instances running in separate batch jobs). They are started with SBMJOB (or, if you are writing in C, with the spawn function).

      Once the conveyor is created and the processors are running, the head job selects the data (in your case, reads the IDs from file A), forms packages (for example, 100 IDs per package) and puts them on the conveyor.
      The processors take packages from the conveyor and process the IDs they contain (reading file B and doing whatever has to be done with that data).

      When the data runs out, the head job puts special empty "terminator" packets onto the conveyor, one per processor job. When a processor receives such a packet, it knows there is no more data and exits.

      This approach speeds up processing roughly in proportion to the number of processor instances you run.

      This is only a general description. If you are interested, I can go into the details and the subtle points of this method.
      We use this approach quite often and have ready-made building blocks and templates.
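
      To make that more concrete, here is a minimal hypothetical sketch of the head-job side using a plain *DTAQ and the QSNDDTAQ API. The queue name ETLDTAQ, library MYLIB, file FILEA and its CUSID/CLCID fields are assumptions, the *DTAQ is assumed to already exist (CRTDTAQ with a MAXLEN at least the packet size), and error handling plus the SBMJOBs for the workers are left out:

      Code:
      dcl-f FILEA usage(*input) block(*yes);   // assumed input file

      dcl-ds dsDataT qualified template;       // one customer ID
        cus char(6);
        clc char(3);
      end-ds;

      dcl-ds dsPacketT qualified template;     // one conveyor packet
        count int(5);                          // entries used in data(); 0 = terminator
        data  likeds(dsDataT) dim(100);
      end-ds;

      dcl-pr QSNDDTAQ extpgm;                  // IBM "Send Data Queue" API
        qName   char(10) const;
        qLib    char(10) const;
        dataLen packed(5:0) const;
        data    likeds(dsPacketT) const;
      end-pr;

      dcl-ds packet likeds(dsPacketT) inz;
      dcl-s workers int(10) inz(10);           // how many worker jobs were submitted
      dcl-s i       int(10);

      read FILEA;
      dow not %eof(FILEA);
        packet.count += 1;
        packet.data(packet.count).cus = CUSID;     // assumed field names in FILEA
        packet.data(packet.count).clc = CLCID;
        if packet.count = %elem(packet.data);      // packet full - put it on the conveyor
          QSNDDTAQ('ETLDTAQ' : 'MYLIB' : %size(packet) : packet);
          clear packet;
        endif;
        read FILEA;
      enddo;

      if packet.count > 0;                         // ship the last, partly filled packet
        QSNDDTAQ('ETLDTAQ' : 'MYLIB' : %size(packet) : packet);
      endif;

      for i = 1 to workers;                        // one empty terminator packet per worker
        clear packet;
        QSNDDTAQ('ETLDTAQ' : 'MYLIB' : %size(packet) : packet);
      endfor;

      *inlr = *on;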



      • Iceberg
        Yeah, I get your point, but how do you "pack" them into a package, i.e. what exactly is a package?

      • Victor Pomortseff
        In my particular case, the package has the following structure:

        // Client data (its unique identifier)
        Dcl-Ds dsDataT Qualified Template;
          cus Char(6) inz(*blanks);
          clc Char(3) inz(*blanks);
        End-Ds;

        // The packet exchanged between the head job and the handler jobs
        Dcl-Ds dsPacketT Qualified Template;
          count int(5) inz(*zero);       // Number of entries filled in the data array
          data LikeDs(dsDataT) Dim(100); // Data array
        End-Ds;

        We read data from file A (cus + clc), put it in the next element of the dsPacket.data array and increment the dsPacket.count counter. As soon as the array is completely filled, we send dsPacket to the queue, clear it, and start filling it again - and so on until the data runs out.

        The handler reads a packet from the queue and, in a loop from 1 to dsPacket.count, calls the processing function for each client, passing it dsPacket.data(i).

        Since there are several handlers (for example, 10), ten clients are being processed at the same time instead of one.

        But, I repeat, all this only makes sense when you need to process many elements of the same type (each one is processed by the same algorithm and the elements are independent of each other, so processing order doesn't matter) and processing a single element takes a relatively long time.
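
        For illustration, a bare-bones hypothetical handler written directly against the QRCVDTAQ API could look like this (the queue name ETLDTAQ, library MYLIB and the packet layout mirror the structures above and are assumptions; error handling is omitted):

        Code:
        dcl-ds dsDataT qualified template;
          cus char(6);
          clc char(3);
        end-ds;

        dcl-ds dsPacketT qualified template;
          count int(5);                        // 0 = terminator packet
          data  likeds(dsDataT) dim(100);
        end-ds;

        dcl-pr QRCVDTAQ extpgm;                // IBM "Receive Data Queue" API
          qName   char(10) const;
          qLib    char(10) const;
          dataLen packed(5:0);                 // returned data length
          data    likeds(dsPacketT);           // returned packet
          wait    packed(5:0) const;           // wait time; negative = wait forever
        end-pr;

        dcl-ds packet likeds(dsPacketT) inz;
        dcl-s rcvLen packed(5:0);
        dcl-s i      int(10);

        dou packet.count = 0;                  // leave when a terminator packet arrives
          QRCVDTAQ('ETLDTAQ' : 'MYLIB' : rcvLen : packet : -1);
          for i = 1 to packet.count;
            // process one client here: read file B
            // using packet.data(i).cus and packet.data(i).clc
          endfor;
        enddo;

        *inlr = *on;

        The Batch_* module described below wraps this kind of plumbing (plus the monitoring described next) behind a cleaner interface.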

        The catch is that all of this comes with some technical issues:
        • The head job must make sure the queue does not overflow - it has to monitor how full the queue is and, when it passes a threshold (I usually use 75% of the maximum capacity), pause handing out packets.
        • The head job must make sure all the handlers stay active - that none of them has gone into *MSGW because of an error.
        • After all the data has been queued, the head job waits until every handler has finished (usually I queue several empty packets with dsPacket.count = 0, one per handler; on receiving such a packet a handler knows there is no more data and exits), then deletes the queue and ends.

        I have a ready-made module, written in C++, that implements all the necessary service operations (working with the queue, monitoring the status of the handlers, and so on). I just bind it into my programs and use it.

        In the head job it looks like this:

        // Create a queue, run the required number of handlers
        Master = Batch_CreateMaster('*LIBL' : 'ELB07S' : '' : 'ETLPROC' :
        DtaQLib : 'DQELBMAST' : 'DQELBWORK' :
        WorkersCount :
        (%size(t_qdsClient) * SendBlockSize) :
        (%size(t_qdsClient) * SendBlockSize) :
        PingTime : ResendCount : ReceiveTimeout :
        strerror);

        Then, in the loop:

        Batch_MasterSendData(Master : pData : DataLen : ErrStr);

        When all the data has been handed out, we wait until it has been processed (the queue empties):

        dow MsgCount > 0;
        Wait_Time(1000);
        MsgCount = Batch_GetWorkerDtaQMsgCount(Master);
        enddo;

        After that we wait until all the handlers have finished:

        Batch_WaitWorkersEnded(Master : PingTime : strError);

        and finish the job

        Batch_DeleteMaster(Master);

        The handler side is simpler:

        connect to the queue

        Batch_CreateWorker(ResendCount : RecieveTimeout : strError);

        then, in a loop, as long as data keeps arriving, we call

        Batch_WorkerRecvData(pBuffer : DataLen : strError);

        If data was received, we process it. When the data runs out -

        Batch_DeleteWorker();

        and finish the job

      • Iceberg
        Thanks a lot, Victor/Ted.

    • #4
      I appreciate Victor's post. There's a lot of good information there. Certainly more than what I put in my previous post in this thread.

      This reminds me of some COBOL programs I rewrote a few years ago. Each program had three loops nested inside one another and implemented as SQL cursors. The programs had been converted from a CICS COBOL program that was doing get-next-within-parent operations in the two inner loops. Each program took about half an hour to run.

      I replaced the nested loops with a join and each program ran in about a minute or so.

      My data sets were not nearly as large. I was not dealing with 3 million rows. But going from a half hour to a minute was good enough for me. There was no need to optimize further.

      So, Victor's parallelization idea is good, but before I went to all that trouble, I would join the files, build the indexes, and see if the new run time was good enough.



      • Iceberg
        Thanks Ted. Though we have a large amount of data, around 3 million (30 lakh) records, I will try joining these two files and selecting only the fields I need. But can you please clarify the index part for me - how do I build one?

      • Victor Pomortseff
        We use parallel processing very widely because our data volumes are quite large.
        Not long ago I had to parallelize one process; its throughput is about 800,000 elements per hour. It used to handle 50,000 - 100,000 elements at a time, and then suddenly we were talking about processing 5,000,000 elements... Naturally, we had to abandon single-threaded processing and switch to multi-threaded processing.
        I have also built jobs for parallel processing of 20,000,000 - 40,000,000 elements.

    • #5
      Iceberg: Use CREATE INDEX to build indexes. Use Visual Explain and the Index Advisor in Navigator to help you decide which indexes to build.
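
      Purely as an illustration (the library, file, and column names are placeholders), an index over the join column could be created with something like:

      Code:
      CREATE INDEX MYLIB.FILEB_CUST_IX
          ON MYLIB.FILEB (CUSTID);

      Run it once (from ACS Run SQL Scripts, STRSQL, or RUNSQLSTM); the Index Advisor will then show whether the optimizer wants any additional indexes for your actual query.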

      Victor Pomortseff I want you to understand I'm not arguing with you. I appreciate your sharing your experience. It may be just what Iceberg needs.



      • Victor Pomortseff
        I'm not arguing either. I just want to say that a stable and safe implementation of this approach involves certain technical difficulties and is only justified where it is really necessary.
        Naturally, you should start with the simpler optimization methods - get rid of unnecessary nested loops, use indexes, use blocked reading, and so on.
        Only when all of that does not give the desired result should you think about more complex solutions. And parallelization is still a difficult path to take.

    • #6
      OK, so what about *USRQ? What exactly is a *USRQ, how is it different from a *DTAQ, and how can we use it to handle a large amount of data?

      Thanks



      • Victor Pomortseff
        *USRQ is a separate topic. Just recently I dealt with it and built a SRVPGM for it (based on a C++ class, with a C wrapper on top and RPG interfaces to it).
        Briefly:
        • *USRQ is 4-5 times faster and about as many times cheaper in CPU usage.
        • *USRQ can only be local.
        • *USRQ is never stored on disk - only the object description is on disk; the contents live only in memory (*DTAQ is stored on disk, and when it holds too many messages its performance drops significantly).
        • *USRQ is not journaled.
        • *USRQ can be created in either the *SYSTEM domain or the *USER domain (and since *USRQ can only be accessed through MI instructions, it must be created in the *USER domain if your system runs at security level 40 or higher - at that level MI code cannot access objects in the *SYSTEM domain).
        In general, as a transport for multi-threaded processing, *USRQ would be my preference, although it is harder to work with.

      • Victor Pomortseff
        By the way, the maximum *USRQ size is the same as for *DTAQ - 2 GB.
        You can read the history of my experiments with *USRQ here: https://stackoverflow.com/questions/...ssages-in-usrq