ibmi-brunch-learn

Long batch CL - need to have a better way to recover

    Hi, all,

    Twice a day we run a user-submitted batch program (CLP) that calls multiple programs, clears files, sorts files (FMTDTA) etc. I've found that some of the old COBOL programs that get called don't do error handling very well. I'm working on fixing those, but it made me start thinking.

    On the mainframe, JCL lets us restart at a specific STEP. I'd like a way to do that in CL. Originally, I thought I could clone the monster program and use parameters and labels, but GOTO won't accept a variable as its label, and that approach seems wrong and clunky anyway.
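    One common workaround for GOTO's literal-label restriction is to pass the restart step in as a parameter and map it to a fixed label with SELECT/WHEN (a chain of IF ... THEN(GOTO ...) works the same way). This is only a sketch: the program, label, and file names (STEP01, STEP1PGM, WORKFILE, and so on) are hypothetical placeholders, not taken from the real CL.

```cl
/* Hypothetical restart-capable driver (a sketch, not the real CL).  */
/* &STEP names the step to resume at; '01' means run everything.     */
             PGM        PARM(&STEP)
             DCL        VAR(&STEP) TYPE(*CHAR) LEN(2)

/* GOTO needs a literal label, so map the parameter to one.          */
             SELECT
             WHEN       COND(&STEP *EQ '01') THEN(GOTO CMDLBL(STEP01))
             WHEN       COND(&STEP *EQ '02') THEN(GOTO CMDLBL(STEP02))
             WHEN       COND(&STEP *EQ '03') THEN(GOTO CMDLBL(STEP03))
             OTHERWISE  CMD(SNDPGMMSG MSGID(CPF9898) MSGF(QCPFMSG) +
                          MSGDTA('Invalid restart step' *BCAT &STEP) +
                          MSGTYPE(*ESCAPE))
             ENDSELECT

/* Steps fall through, so resuming at STEP02 also runs STEP03.       */
 STEP01:     CALL       PGM(STEP1PGM)
 STEP02:     CLRPFM     FILE(WORKFILE)
 STEP03:     CALL       PGM(STEP3PGM)

             ENDPGM
```

    A normal run would be something like CALL PGM(MYLIB/DRIVER) PARM('01'), and a restart after step 1 failed would pass '02' instead (MYLIB/DRIVER is also a placeholder). Because every step falls through to the next one, this mimics a JCL-style "restart from this step" without editing the source.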

    The last time we had problems, I ended up entering each command from the CL manually - a long, tedious, and dangerous process. I suppose that I could clone the source and remove the parts that I don't want to run, but I was trying to avoid having to update the program on the fly. I'd rather have a repeatable process that we can easily follow.

    The time before last, a key program wasn't handling duplicate records. It would just write an error report message and go on its way - our plants didn't recover for days - and that cannot happen! I'm working on fixing the old COBOL program by doing this:

    CL code:
                 CALL       PGM(AA0055) PARM(&RETURN) /* ASSIGN SCHED ID TO SA */

                 IF         COND(&RETURN *GT '0000') THEN(DO)
                 SNDUSRMSG  MSG('AA0055 COULD NOT ASSIGN SCHEDULE IDS.  +
                              STOP AND CONTACT APP SUPPORT.  Do NOT +
                              continue.  Job will End.')
                 GOTO       CMDLBL(ENDCL)
                 ENDDO

                 CALL       PGM(AA0240) /* UPD QTY ON SA HDR */

    In the snippet above, I call AA0055, and if that program hits a "duplicate record" error, control comes back here and the CL ends; if everything is OK, we go on to AA0240. AA0055 doesn't hit a duplicate record error easily, but if it does, it can bring the plant down.
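    A possible hardening of the snippet above, assuming AA0055 keeps setting a four-character return code: add a MONMSG so an unhandled error escaping from the COBOL program is treated as a failure too, and end with an escape message instead of GOTO ENDCL so the failure is recorded in the job log rather than looking like a normal finish. Which message IDs actually reach the CL depends on the COBOL runtime and the job's inquiry-reply settings, so the CPF0000/MCH0000 list here is an assumption to adjust.

```cl
             PGM
/* Start at '0000' so a program that never sets it reads as success. */
             DCL        VAR(&RETURN) TYPE(*CHAR) LEN(4) VALUE('0000')

             CALL       PGM(AA0055) PARM(&RETURN) /* ASSIGN SCHED ID TO SA */
/* Treat an error escaping from AA0055 like a bad return code.       */
             MONMSG     MSGID(CPF0000 MCH0000) EXEC(CHGVAR +
                          VAR(&RETURN) VALUE('9999'))

             IF         COND(&RETURN *GT '0000') THEN(DO)
/* Sending an escape message ends this program here, so AA0240      */
/* never runs and the failure is left in the job log.                */
             SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG) +
                          MSGDTA('AA0055 could not assign schedule +
                          IDs, return code' *BCAT &RETURN) +
                          MSGTYPE(*ESCAPE)
             ENDDO

             CALL       PGM(AA0240) /* UPD QTY ON SA HDR */
             ENDPGM
```

    The '9999' value is an arbitrary marker chosen for this sketch; any non-zero code that AA0055 itself never returns would do.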

    One crazy idea that just crossed my mind is to have the user "submit" the job from the Halcyon scheduler. If the job were already submitted and the user just did something to release it, Halcyon could handle running each step, rather than it all being in one giant CL. I need to think on that one.

    If anyone has encountered something similar, or has any ideas or keywords I should try on Google, I'd appreciate it. I hope everyone is doing OK. Thank you for reading my post!

  • #2
    Ugh! I used to have to maintain old procedures like this at a former job. When something crashed, I'd make a custom copy of the CL that skipped the earlier steps (the ones that succeeded) and just did the later steps needed to pick up where it left off. Then I'd run that to get things straightened out.

    Eventually I rewrote the whole process to handle problems better. It no longer relied on legacy stuff like FMTDTA, and the files no longer allowed duplicate keys, so it couldn't crash on that, and so forth. In other words, I made it "dummy-proof". Also, before doing each step, it would check whether it had already done it (by looking for the output of that step) and, if so, skip it, so you could always restart it. It was quite a lot of work to rewrite, but honestly, over the next year or two it more than paid for itself in the time I no longer spent "putting out fires."
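    As a rough illustration of "check for the step's output before running it", the sketch below skips a sort step when its output member already has records. MYLIB/INFILE, MYLIB/SORTOUT, and the SORTSPEC source member are hypothetical names, and an empty member is only a safe "not done yet" signal if the step writes its output in one go and nothing later in the run clears that file.

```cl
             PGM
             DCL        VAR(&NBRRCD) TYPE(*DEC) LEN(10 0)

/* Hypothetical step: sort MYLIB/INFILE into MYLIB/SORTOUT, but     */
/* only if SORTOUT is still empty, i.e. this step has not run yet.  */
             RTVMBRD    FILE(MYLIB/SORTOUT) NBRCURRCD(&NBRRCD)
             IF         COND(&NBRRCD *EQ 0) THEN(DO)
             FMTDTA     INFILE((MYLIB/INFILE)) OUTFILE(MYLIB/SORTOUT) +
                          SRCFILE(MYLIB/QFMTSRC) SRCMBR(SORTSPEC)
             ENDDO
             ENDPGM
```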

    Anyway... simply use IF statements to skip each part if that part has already been done. How you check whether it has been done will depend on how the code works...
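    When a step leaves nothing easy to test for, another option is to record progress explicitly. This sketch assumes a hypothetical decimal data area, MYLIB/NIGHTCKPT, that stores the number of the last step that finished; STEP1PGM and STEP2PGM are placeholders. It also assumes a failing step ends the CL before its CHGDTAARA runs (for example via an escape message), otherwise the checkpoint would record a step that did not really succeed.

```cl
/* Assumes a checkpoint data area created once with:                */
/*   CRTDTAARA DTAARA(MYLIB/NIGHTCKPT) TYPE(*DEC) LEN(3 0) VALUE(0) */
             PGM
             DCL        VAR(&DONE) TYPE(*DEC) LEN(3 0)

/* &DONE = number of the last step that completed successfully.     */
             RTVDTAARA  DTAARA(MYLIB/NIGHTCKPT) RTNVAR(&DONE)

             IF         COND(&DONE *LT 1) THEN(DO)
             CALL       PGM(STEP1PGM)
             CHGDTAARA  DTAARA(MYLIB/NIGHTCKPT) VALUE(1)
             ENDDO

             IF         COND(&DONE *LT 2) THEN(DO)
             CALL       PGM(STEP2PGM)
             CHGDTAARA  DTAARA(MYLIB/NIGHTCKPT) VALUE(2)
             ENDDO

/* ...remaining steps follow the same pattern...                    */

/* Whole run finished: reset so the next run starts from step 1.    */
             CHGDTAARA  DTAARA(MYLIB/NIGHTCKPT) VALUE(0)
             ENDPGM
```

    With this pattern, recovery is just resubmitting the same program: steps already recorded in the data area are skipped, and the run picks up at the first step that has not completed.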
