ibmi-brunch-learn

Optimize code


  • Optimize code

    Hi All,
    I have a program that reads more than 3 million (30 lakh) records from a file A in a loop (i.e. it picks up each customer ID), and within this loop it reads another file B for each customer. Because of this nested loop the program takes more than 12 hours.
    Is there a way to optimize this program, for example by using SQLRPGLE?

    Thanks

  • #2
    Join the files. Build indexes over the join columns.
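
    As a rough illustration (the file, column, and field names below are placeholders, not taken from the original program), an SQLRPGLE cursor joining the two files might look something like this:

    Code:
    // Hypothetical files FILEA/FILEB joined on a CUSTID column
    dcl-ds row qualified;
      custId char(10);
      colA   char(20);    // whatever you need from file A
      colB   char(20);    // whatever you need from file B
    end-ds;

    exec sql declare joinCsr cursor for
      select a.CUSTID, a.COLA, b.COLB
        from FILEA a
        join FILEB b on b.CUSTID = a.CUSTID;

    exec sql open joinCsr;
    exec sql fetch joinCsr into :row;
    dow sqlcode = 0;
      // process one joined row here
      exec sql fetch joinCsr into :row;
    enddo;
    exec sql close joinCsr;

    With indexes over the join columns, the database does the join itself instead of the program doing millions of random reads one record at a time.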



    • #3
      Hi

      It's not entirely clear what's going on there.
      Do I understand correctly that you take an ID from file A and then use it as a key to read a record from file B?
      And then what? Do you just read it, or is there some additional processing?
      It seems to me that the real problem is not the reads; the time is spent on the processing. If so, then even switching to SQLRPGLE, creating indexes, and joining the files won't improve things much.
      As for the small optimizations: file A should be opened in blocked-read mode (BLOCK(*YES) on the file declaration, provided you read it with SETLL + READ rather than CHAIN or READE).
      With SQLRPGLE it is also faster to read in blocks: declare a DS array to receive the result and use a construction like

      Code:
      EXEC SQL FETCH currACY1 FOR :items2Read ROWS INTO :sqlData;
      where sqlData is the DS array that receives the result and items2Read is the size of that array (the number of rows the query returns per call)
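
      For context, here is a fuller hypothetical sketch of that blocked fetch, with the cursor declaration, the DS array and the fetch loop spelled out (currACY1 is kept from the snippet above; FILEA and the CUS/CLC columns are placeholders, and error handling is omitted):

      Code:
      dcl-c BLOCK_SIZE 1000;

      dcl-ds sqlData qualified dim(BLOCK_SIZE);   // one element per fetched row
        cus char(6);
        clc char(3);
      end-ds;

      dcl-s items2Read int(10) inz(BLOCK_SIZE);
      dcl-s rowsRead   int(10);
      dcl-s done       ind inz(*off);
      dcl-s i          int(10);

      exec sql declare currACY1 cursor for
        select CUS, CLC from FILEA;

      exec sql open currACY1;
      dou done;
        exec sql fetch currACY1 for :items2Read rows into :sqlData;
        done = (sqlcode <> 0);                             // 100 = end of data
        exec sql get diagnostics :rowsRead = ROW_COUNT;    // rows actually returned
        for i = 1 to rowsRead;
          // process sqlData(i) here
        endfor;
      enddo;
      exec sql close currACY1;

      Each FETCH call brings back up to BLOCK_SIZE rows in one trip, which is considerably cheaper than fetching them one at a time.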

      But when it comes to processing really large amounts of data, the only way to get a big speed-up is parallelization.
      This requires:
      • a head job
      • a conveyor
      • several processor jobs
      It works as follows.
      The head job is started and creates the conveyor. As the conveyor you can use a queue - *DTAQ or *USRQ (the first is easier to work with, the second is 4-5 times faster and roughly as much cheaper in processor usage).
      Next, the head job launches the required number of processor jobs (the same program, with several instances running in separate batch jobs). They are started with SBMJOB (or, if you are writing in C, with the spawn function).

      Once the conveyor is created and the processors are running, the head job selects the data (in your case, reads the IDs from file A), forms packages (for example, 100 IDs per package) and puts them on the conveyor.
      The processors take packages from the conveyor and process the IDs they contain (reading file B and doing whatever has to be done with that data).

      When the data runs out, the head job puts special empty "terminator" packets onto the conveyor, one per processor job. When a processor receives such a packet, it knows there is no more data and exits.

      This approach speeds up processing roughly in proportion to the number of processor instances you run.

      This is only a general description. If you are interested, I can go into the details and the subtle points of this method.
      We use this approach quite often and have ready-made building blocks and templates.
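
      To make that more concrete, here is a minimal hypothetical sketch of the head-job side using a plain *DTAQ and the QSNDDTAQ API. The queue name ETLDTAQ, library MYLIB, file FILEA and its CUSID/CLCID fields are assumptions, the *DTAQ is assumed to already exist (CRTDTAQ with a MAXLEN at least the packet size), and error handling plus the SBMJOBs for the workers are left out:

      Code:
      dcl-f FILEA usage(*input) block(*yes);   // assumed input file

      dcl-ds dsDataT qualified template;       // one customer ID
        cus char(6);
        clc char(3);
      end-ds;

      dcl-ds dsPacketT qualified template;     // one conveyor packet
        count int(5);                          // entries used in data(); 0 = terminator
        data  likeds(dsDataT) dim(100);
      end-ds;

      dcl-pr QSNDDTAQ extpgm;                  // IBM "Send Data Queue" API
        qName   char(10) const;
        qLib    char(10) const;
        dataLen packed(5:0) const;
        data    likeds(dsPacketT) const;
      end-pr;

      dcl-ds packet likeds(dsPacketT) inz;
      dcl-s workers int(10) inz(10);           // how many worker jobs were submitted
      dcl-s i       int(10);

      read FILEA;
      dow not %eof(FILEA);
        packet.count += 1;
        packet.data(packet.count).cus = CUSID;     // assumed field names in FILEA
        packet.data(packet.count).clc = CLCID;
        if packet.count = %elem(packet.data);      // packet full - put it on the conveyor
          QSNDDTAQ('ETLDTAQ' : 'MYLIB' : %size(packet) : packet);
          clear packet;
        endif;
        read FILEA;
      enddo;

      if packet.count > 0;                         // ship the last, partly filled packet
        QSNDDTAQ('ETLDTAQ' : 'MYLIB' : %size(packet) : packet);
      endif;

      for i = 1 to workers;                        // one empty terminator packet per worker
        clear packet;
        QSNDDTAQ('ETLDTAQ' : 'MYLIB' : %size(packet) : packet);
      endfor;

      *inlr = *on;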



      • Iceberg
        Yeah, I get your point, but how do you "pack" them into a package, i.e. what exactly is a package?

      • Victor Pomortseff
        In my particular case, the package has the following structure:

        // Client data (its unique identifier)
        Dcl-Ds dsDataT Qualified Template;
          cus Char(6) inz(*blanks);
          clc Char(3) inz(*blanks);
        End-Ds;

        // The packet exchanged between the head job and the handler jobs
        Dcl-Ds dsPacketT Qualified Template;
          count int(5) inz(*zero);       // Number of entries filled in the data array
          data LikeDs(dsDataT) Dim(100); // Data array
        End-Ds;

        We read data from file A (cus + clc), put it in the next element of the dsPacket.data array and increment the dsPacket.count counter. As soon as the array is completely filled, we send dsPacket to the queue, clear it, and start filling it again - and so on until the data runs out.

        The handler reads a packet from the queue and, in a loop from 1 to dsPacket.count, calls the processing function for each client, passing it dsPacket.data(i).

        Since there are several handlers (for example, 10), ten clients are being processed at the same time instead of one.

        But, I repeat, all this only makes sense when you need to process many elements of the same type (each one is processed by the same algorithm and the elements are independent of each other, so processing order doesn't matter) and processing a single element takes a relatively long time.
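
        For illustration, a bare-bones hypothetical handler written directly against the QRCVDTAQ API could look like this (the queue name ETLDTAQ, library MYLIB and the packet layout mirror the structures above and are assumptions; error handling is omitted):

        Code:
        dcl-ds dsDataT qualified template;
          cus char(6);
          clc char(3);
        end-ds;

        dcl-ds dsPacketT qualified template;
          count int(5);                        // 0 = terminator packet
          data  likeds(dsDataT) dim(100);
        end-ds;

        dcl-pr QRCVDTAQ extpgm;                // IBM "Receive Data Queue" API
          qName   char(10) const;
          qLib    char(10) const;
          dataLen packed(5:0);                 // returned data length
          data    likeds(dsPacketT);           // returned packet
          wait    packed(5:0) const;           // wait time; negative = wait forever
        end-pr;

        dcl-ds packet likeds(dsPacketT) inz;
        dcl-s rcvLen packed(5:0);
        dcl-s i      int(10);

        dou packet.count = 0;                  // leave when a terminator packet arrives
          QRCVDTAQ('ETLDTAQ' : 'MYLIB' : rcvLen : packet : -1);
          for i = 1 to packet.count;
            // process one client here: read file B
            // using packet.data(i).cus and packet.data(i).clc
          endfor;
        enddo;

        *inlr = *on;

        The Batch_* module described below wraps this kind of plumbing (plus the monitoring described next) behind a cleaner interface.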

        The catch is that all of this comes with some technical issues:
        • The head job must make sure the queue does not overflow - it has to monitor how full the queue is and, when it passes a threshold (I usually use 75% of the maximum capacity), pause handing out packets.
        • The head job must make sure all the handlers stay active - that none of them has gone into *MSGW because of an error.
        • After all the data has been queued, the head job waits until every handler has finished (usually I queue several empty packets with dsPacket.count = 0, one per handler; on receiving such a packet a handler knows there is no more data and exits), then deletes the queue and ends.

        I have a ready-made module, written in C++, that implements all the necessary service operations (working with the queue, monitoring the status of the handlers, and so on). I just bind it into my programs and use it.

        In the head job it looks like this:

        // Create a queue, run the required number of handlers
        Master = Batch_CreateMaster('*LIBL' : 'ELB07S' : '' : 'ETLPROC' :
        DtaQLib : 'DQELBMAST' : 'DQELBWORK' :
        WorkersCount :
        (%size(t_qdsClient) * SendBlockSize) :
        (%size(t_qdsClient) * SendBlockSize) :
        PingTime : ResendCount : ReceiveTimeout :
        strerror);

        Then, in the loop:

        Batch_MasterSendData(Master : pData : DataLen : ErrStr);

        When all the data has been handed out, we wait until it has been processed (the queue empties):

        dow MsgCount > 0;
        Wait_Time(1000);
        MsgCount = Batch_GetWorkerDtaQMsgCount(Master);
        enddo;

        After that we wait until all the handlers have finished:

        Batch_WaitWorkersEnded(Master : PingTime : strError);

        and finish the job

        Batch_DeleteMaster(Master);

        The handler side is simpler:

        connect to the queue

        Batch_CreateWorker(ResendCount : RecieveTimeout : strError);

        then, in a loop, as long as data keeps arriving, we call

        Batch_WorkerRecvData(pBuffer : DataLen : strError);

        If data was received, we process it. When the data runs out -

        Batch_DeleteWorker();

        and finish the job

      • Iceberg
        Thanks a lot, Victor/Ted.

    • #4
      I appreciate Victor's post. There's a lot of good information there. Certainly more than what I put in my previous post in this thread.

      This reminds me of some COBOL programs I rewrote a few years ago. Each program had three loops nested inside one another and implemented as SQL cursors. The programs had been converted from a CICS COBOL program that was doing get-next-within-parent operations in the two inner loops. Each program took about half an hour to run.

      I replaced the nested loops with a join and each program ran in about a minute or so.

      My data sets were not nearly as large. I was not dealing with 3 million rows. But going from a half hour to a minute was good enough for me. There was no need to optimize further.

      So, Victor's parallelization idea is good, but before I went to all that trouble, I would join the files, build the indexes, and see if the new run time was good enough.



      • Iceberg
        Thanks Ted. Though we have a large amount of data, around 3 million (30 lakh) records, I will try joining these two files and selecting only the fields I need. But can you please clarify the index part for me - how do I build one?

      • Victor Pomortseff
        We use parallel processing very widely because our data volumes are quite large.
        Not long ago I had to parallelize one process; its throughput is about 800,000 elements per hour. It used to handle 50,000 - 100,000 elements at a time, and then suddenly we were talking about processing 5,000,000 elements... Naturally, we had to abandon single-threaded processing and switch to multi-threaded processing.
        I have also built jobs for parallel processing of 20,000,000 - 40,000,000 elements.

    • #5
      Iceberg: Use CREATE INDEX to build indexes. Use Visual Explain and the Index Advisor in Navigator to help you decide which indexes to build.
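
      Purely as an illustration (the library, file, and column names are placeholders), an index over the join column could be created with something like:

      Code:
      CREATE INDEX MYLIB.FILEB_CUST_IX
          ON MYLIB.FILEB (CUSTID);

      Run it once (from ACS Run SQL Scripts, STRSQL, or RUNSQLSTM); the Index Advisor will then show whether the optimizer wants any additional indexes for your actual query.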

      Victor Pomortseff I want you to understand I'm not arguing with you. I appreciate your sharing your experience. It may be just what Iceberg needs.



      • Victor Pomortseff
        I'm not arguing either. I just want to say that a stable and safe implementation of this approach involves certain technical difficulties and is only justified where it is really necessary.
        Naturally, you should start with the simpler optimization methods - get rid of unnecessary nested loops, use indexes, use blocked reading, and so on.
        Only when all of that does not give the desired result should you think about more complex solutions. And parallelization is still a difficult path to take.

    • #6
      OK, so what about *USRQ? What exactly is a *USRQ, how is it different from a *DTAQ, and how can we use it to handle a large amount of data?

      Thanks



      • Victor Pomortseff
        *USRQ is a separate topic. Just recently I dealt with it and built a SRVPGM for it (based on a C++ class, with a C wrapper on top and RPG interfaces to it).
        Briefly:
        • *USRQ is 4-5 times faster and about as many times cheaper in CPU usage.
        • *USRQ can only be local.
        • *USRQ is never stored on disk - only the object description is on disk; the contents live only in memory (*DTAQ is stored on disk, and when it holds too many messages its performance drops significantly).
        • *USRQ is not journaled.
        • *USRQ can be created in either the *SYSTEM domain or the *USER domain (and since *USRQ can only be accessed through MI instructions, it must be created in the *USER domain if your system runs at security level 40 or higher - at that level MI code cannot access objects in the *SYSTEM domain).
        In general, as a transport for multi-threaded processing, *USRQ would be my preference, although it is harder to work with.

      • Victor Pomortseff
        By the way, the maximum *USRQ size is the same as for *DTAQ - 2 GB.
        You can read the history of my experiments with *USRQ here: https://stackoverflow.com/questions/...ssages-in-usrq