Customer Portal

How does the System Execute Command work related to connections?

  • Lukas Cholasta
    Hi pintail,

    I've seen that you have started two very similar threads, so I will use only this one to, hopefully, help you with this issue.

    The script is indeed saved to a file in a temp folder. The exact path can be seen in the run log of the graph when you switch the logging level to DEBUG; there might be other useful information there as well. The logging level can be changed in the CloverETL Server console -> Sandboxes -> <sandbox_name> -> Config properties.

    When the connection is made from a Python script, it does not use any Java-related connection, so I cannot see how it could be affected by CloverETL Server or WebLogic. I'd suggest using strace or a similar tool to debug the Python process.
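Alongside strace, it may help to make the script itself report exactly how a connection attempt fails. The following is only a minimal sketch, assuming the script opens a plain TCP connection; the function name `probe` and the default timeout are hypothetical and not taken from the original script. It separates a refused connection (nothing accepting on that port, or a reset from something in between) from a timeout and from other OS-level errors.

```python
import errno
import socket


def probe(host, port, timeout=5.0):
    """Try one TCP connection and classify the outcome (hypothetical helper)."""
    try:
        # create_connection applies the timeout to the connect attempt itself.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # ECONNREFUSED: the remote end actively rejected the connection.
        return "refused"
    except socket.timeout:
        # No answer within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or unreachable network.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
```

Logging the returned value together with a timestamp on every call would show whether the failures are really refusals and whether they start at a particular point in the run.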

    I'm also not sure whether I understand your current setup correctly. Is it a SystemExecute component running the Python script on a CloverETL Server, while the script fetches data from a remote web service running on a non-CloverETL server?

    Best regards,
  • pintail
    OK, that's what I thought about the file getting written out and executed separately. Thanks.

    What's happening is a little odd. At a high level, here is the process:

    1) A jobflow is executed which reads a file containing about 500 input parameters. For each parameter, the jobflow calls a single graph and passes the parameter to the SystemExecute component, which executes a Python script.

    2) When we run this job (every hour or so), it works fine for a while. Eventually we get an error that says "connection refused". This error is generated in the Python script.

    3) We've tried running it synchronously and asynchronously, and it doesn't make a difference.

    4) The architecture is Linux, CloverETL Server, and WebLogic.

    5) Restarting the Clover service fixes all issues, so it looks like stuck threads to me, which would be consistent with the "connection refused" error message.

    6) When I look at the WebLogic logs, the Clover logs, and the output of several Linux commands (number of used/available threads, stuck files, etc.), nothing really jumps out. There appear to be more than enough free threads, available files, etc. That said, it's definitely a networking issue, and a restart of the service (Clover and WebLogic, I should say) fixes it.

    So, for now, we've stabilized it by writing a cron job that restarts the service every night. It's a sledgehammer approach, but it is working for us. It seems to me that the Python script/Clover/WebLogic combination is not shutting down properly, possibly after an error, and that something is getting caught up somewhere (although I don't see it in the logs of any of the architecture components). That's my best guess. We gave up hunting it down once the brute-force restart fixed the issue and things stabilized. It would be good to know if you have any other thoughts, though.

    Thanks!
  • Lukas Cholasta
    Hi pintail,

    I've got two ideas on how to avoid this behavior.

      1. Set the timeout property on the SystemExecute component. The command is executed as a separate process, so if it keeps waiting for a response, for example, the timeout should kill the process after the selected time.
      2. Try using Execute Script instead of SystemExecute. It is a jobflow component designed specifically for running scripts, as the name implies. It is also much more configurable via its input port.
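The idea behind the first suggestion can be illustrated outside CloverETL. This is only a sketch of the general technique in Python, not how the SystemExecute component is implemented; the helper name `run_with_timeout` is hypothetical. It runs a command as a separate process and kills it if it does not finish within the given number of seconds, so a hung script cannot linger indefinitely.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd as a child process; kill it if it exceeds timeout_s seconds."""
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        return done.returncode, done.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so no process is left behind at this point.
        return None, ""


if __name__ == "__main__":
    # A child that sleeps longer than the allowed time is terminated.
    code, _ = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"], 1
    )
    print("timed out" if code is None else "exit code %d" % code)
```

A return code of `None` signals that the timeout fired, which makes it easy to count how often the script hangs rather than fails outright.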


    Please let me know about your findings.

    Best regards,
