Amazon Elastic MapReduce: Job flow fails because output file is not yet generated


I have an Amazon EMR job flow that runs three jobs: the input of the second is the output of the first, and the input of the third is the output of the second. The second job also needs a file in the distributed cache.

I created the complete job flow in the EMR web console, but the cluster fails immediately because it cannot find the distributed cache file: that file is not produced until step #1 has actually run.

Is my only option to create these steps from the CLI, using a bootstrap action and the --wait-for-steps option? It seems strange that I cannot run a multi-step job flow where the input of one job depends on the output of another.

In the end I worked around it by creating an Amazon EMR cluster with only a bootstrap action and no steps, so that it stays alive. Then I SSHed into the master node and ran the Hadoop jobs from the shell there.
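For reference, a rough sketch of that workaround. It assumes the classic elastic-mapreduce Ruby CLI and driver classes that use ToolRunner (so generic options like -files are honored); the bucket, key pair, JAR, and class names below are placeholders:

```
# Start a long-running ("alive") cluster with a bootstrap action and no steps.
elastic-mapreduce --create --alive \
  --name "three-step-pipeline" \
  --num-instances 3 --instance-type m1.large \
  --key-pair my-keypair \
  --bootstrap-action s3://my-bucket/bootstrap/setup.sh

# SSH into the master node (user "hadoop" on EMR AMIs) with that key pair.
ssh -i my-keypair.pem hadoop@<master-public-dns>

# Run the jobs by hand, so step 2 only starts once step 1's output exists.
hadoop jar pipeline.jar com.example.Step1 s3://my-bucket/input s3://my-bucket/out1
hadoop jar pipeline.jar com.example.Step2 \
  -files s3://my-bucket/out1/part-r-00000#cachefile \
  s3://my-bucket/out1 s3://my-bucket/out2
hadoop jar pipeline.jar com.example.Step3 s3://my-bucket/out2 s3://my-bucket/out3
```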

This also gives me the flexibility to add individual configuration options to each job.
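For example (again assuming ToolRunner-based drivers; the property values are only illustrative), the second job can be given more reducers and a larger child-JVM heap without touching the other two:

```
hadoop jar pipeline.jar com.example.Step2 \
  -D mapred.reduce.tasks=20 \
  -D mapred.child.java.opts=-Xmx1024m \
  -files s3://my-bucket/out1/part-r-00000#cachefile \
  s3://my-bucket/out1 s3://my-bucket/out2
```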

