I have an Amazon EMR job flow that performs three tasks: the output of the first is the input to the second, and the output of the second is the input to the third. The second task also uses a file from the distributed cache. I created the full job flow in the EMR web console, but the cluster fails immediately because it cannot find the distributed cache file — it hasn't been produced by step #1 yet.
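For context, a setup like the one described could be sketched with the modern AWS CLI as below (a hypothetical sketch, not my actual commands — the bucket, jar, and class names are placeholders, and the same dependency problem occurs however the steps are defined):

```shell
# Hypothetical three-step job flow. Step 2's -files argument points at
# output that step 1 has not produced yet, which is what fails.
aws emr create-cluster \
  --name "three-step-flow" \
  --release-label emr-5.36.0 \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --steps \
    Type=CUSTOM_JAR,Name=Step1,Jar=s3://my-bucket/step1.jar,Args=[s3://my-bucket/input,s3://my-bucket/out1] \
    Type=CUSTOM_JAR,Name=Step2,Jar=s3://my-bucket/step2.jar,Args=[-files,s3://my-bucket/out1/cache.txt,s3://my-bucket/out1,s3://my-bucket/out2] \
    Type=CUSTOM_JAR,Name=Step3,Jar=s3://my-bucket/step3.jar,Args=[s3://my-bucket/out2,s3://my-bucket/final]
```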
Is my only option to create these steps from the CLI, using a bootstrap action and steps that wait for each other? It seems strange that I can't run a multi-step job flow where the input of one task depends on the output of another.

In the end, I worked around it: I created an Amazon EMR cluster with a bootstrap action but no steps, and kept it alive after startup. Then I SSH'd into the master node and ran the hadoop jobs from the console there.
Now I have the flexibility to set configuration options individually for each job, including the distributed cache files.
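The workaround above can be sketched roughly as follows (a hedged sketch: the key file, jar names, class names, and S3 paths are hypothetical placeholders; `-files` is Hadoop's standard generic option for adding files to the distributed cache):

```shell
# 1. From the workstation: SSH into the cluster's master node.
ssh -i mykey.pem hadoop@<master-public-dns>

# 2. On the master: run step 1, which produces the file that step 2
#    needs in its distributed cache.
hadoop jar step1.jar com.example.Step1 \
  s3://my-bucket/input s3://my-bucket/out1

# 3. Run step 2 now that the file exists. Generic options such as
#    -files go right after the main class, before job arguments.
hadoop jar step2.jar com.example.Step2 \
  -files s3://my-bucket/out1/cache.txt \
  s3://my-bucket/out1 s3://my-bucket/out2

# 4. Run step 3 on step 2's output.
hadoop jar step3.jar com.example.Step3 \
  s3://my-bucket/out2 s3://my-bucket/final
```

Running the jobs by hand this way means each `hadoop jar` invocation only starts after the previous one has finished, so the dependency between steps is satisfied naturally.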