GNU Parallel is a great utility to parallelize any computation through the command line. The book Data Science at the Command Line discusses, amongst several other things, how to use GNU Parallel to distribute your data over different machines.

The toy example/tutorial in the book makes three assumptions:

(1) all machines you are using are running Ubuntu or some variant of Linux
(2) you are using a bunch of Amazon EC2 instances to do your parallelization (and hence need to find out the IPs of all your instances in a non-straightforward way)
(3) you are using GNU paste, which comes pre-installed on all Ubuntu systems

I am presenting a tutorial that works with the premise that

(1) you are primarily using OS X and might have some Ubuntu machines as some of your instances
(2) all your machines are local (as in connected through a LAN)
(3) you are using the OS X variant of paste (which has a nuance compared to the Ubuntu version)

Here is a walkthrough that basically replicates the toy example in the book, but highlights the differences you'll need to incorporate in an OS X environment.

First, you can install GNU Parallel on OS X through Homebrew.

Next, create your instances file (named 'instances'), and add the hostnames of your local machines as shown in the screenshot.

[Screenshot: the instances file]

In my case, Cadmius happens to be Ubuntu 14.04, and macusers-Macbook is running OS X Mavericks. The main machine through which I am parallelizing things is also running OS X Mavericks.

Notice the additional '-' after the arguments to paste. The book doesn't have it because you do not need it on Linux.
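The steps above can be sketched roughly as follows. This is a minimal sketch and not necessarily the exact commands from the original walkthrough: the Homebrew formula name `parallel` is the standard one, but the distributed command in the comments is an assumption modelled on the book's toy example, and `join_lines` is a hypothetical helper name used only to isolate the paste call. The sample input stands in for what `hostname; wc -l` would print on one machine.

```shell
#!/bin/sh
# Install GNU Parallel with Homebrew (run once on the OS X machine):
#   brew install parallel

# Hypothetical helper that joins all lines of standard input with ':'.
# The trailing '-' tells paste to read standard input explicitly.
# OS X (BSD) paste requires it; GNU paste on Ubuntu accepts it as well,
# so this form is portable across both kinds of machines.
join_lines() {
  paste -sd: -
}

# Stand-in for the output of "hostname; wc -l" on a single machine.
printf 'cadmius\n100\n' | join_lines   # prints "cadmius:100"

# Assumed shape of the distributed command (needs GNU Parallel, SSH access
# to every host, and an 'instances' file with one hostname per line):
#   seq 1000 | parallel -N100 --pipe --slf instances \
#     "(hostname; wc -l) | paste -sd: -"
```

Because the form with the trailing '-' works on both GNU and BSD paste, the same command string can be sent to a mix of Ubuntu and OS X machines without special-casing either.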