Pigfall #2

Today I want to compress some 10 hours of debugging stress into 5 minutes of writing. The subject is, of course, the infamous Pig again, for which relatively few resources exist on the web. The specific issue is loading input files. Consider the following script:

f1 = LOAD '$file' USING ETLProjector('color,weight');

f2 = LOAD '$file' USING ETLProjector('color,volume');

f3 = JOIN f1 BY color, f2 BY color;

STORE f3 INTO '$OUT' USING JsonStorage();

Unfortunately, this produces a rather mysterious error message:

java.lang.Exception: java.lang.RuntimeException: java.io.EOFException: Unexpected end of input stream

I never dug into the Pig source code, but presumably this is caused by loading the same file twice, which results in some kind of conflict. Real experts are welcome to help out here. To work around it, make sure to load the file only once, with all the fields you need:

f3 = LOAD '$file' USING ETLProjector('color,volume,weight');
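
If the two separate projections are still needed downstream, one possible way to keep them is to derive both from the single load. This is only a sketch, untested against the setup above: the alias raw is a name I made up, and it assumes that ETLProjector (the custom loader from the script) exposes the fields by name. If it does not, positional references such as $0, $1 and $2 would be needed instead.

```pig
-- Load the file once, with every field either projection needs.
-- (raw is a hypothetical alias; ETLProjector is the custom loader from the script above.)
raw = LOAD '$file' USING ETLProjector('color,volume,weight');

-- Derive the two projections from the single load instead of loading twice.
-- Assumes the loader provides a schema with these field names.
f1 = FOREACH raw GENERATE color, weight;
f2 = FOREACH raw GENERATE color, volume;

-- The join and store are unchanged from the original script.
f3 = JOIN f1 BY color, f2 BY color;
STORE f3 INTO '$OUT' USING JsonStorage();
```

Since the single load already holds color, volume and weight in every row, the join is only worth keeping if the later steps really depend on having two relations.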

Of course, loading the file twice with narrower projections can sometimes save some bandwidth, but sorry, Pig doesn't seem to like it.

About aquazorcarson

math PhD at Stanford, studying probability