Aaron Ardiri - IoT Blog: Anatomy of a parallel application (MPI basics)

RSS/XML feed
198 entries available (show all)


PLEASE TAKE A MOMENT TO FOLLOW MY NEW VENTURE: RIoT Secure AB ALL SECURITY RELATED TOPICS ON IoT wILL BE POSTED THERE

2016-11-30	>> ANATOMY OF A PARALLEL APPLICATION (MPI BASICS) It is one thing to just run an application; but another to understand how it works internally. In an earlier blog we demonstrated the execution of the Pokémon GO brute-force hash generator algorithm running as a parallel application on our Raspberry Pi cluster. While we were quick to jump into seeing the performance of our cluster; one thing we overlooked was explaining how parallel applications are written and walk through the code to understand how it works. Up until a few days ago; the only remnants of parallel programming I had in my head was dating way back to my university years when we had one course on the subject and we had access to a 64-CPU 1Mhz computer offering vector processing (run the same command across multiple nodes). The first step was to study the tutorials over at mpitutorial.com to get acquainted with MPI. Thankfully; I am a quick learner - here are my findings and how I designed the parallel application. DECLARATIONS AND CONSTANTS First of all; we need to define that we wish to use the MPI library - a simple `#include` does this. #include <mpi.h> #define NODE_MASTER 0 #define MIN_NODES 2 #define MAX_NODES 16 #define TAG_SEED 0x01 // master to slave - are you alive? #define TAG_WORK 0x02 // master to slave - here is a job for you #define TAG_DIE 0x03 // master to slave - ok, you can shutdown #define TAG_RESULT 0x04 // slave to master - here is the result We also need to define a few constants to help with the flow of the application. MPI messaging allows for the tagging of messages, which are great for logic decisions; we have created a few in our case - the first to seed the slaves; the second to do some work and a third to tell the slaves they can shutdown. We also need another to send results from the slave back to the master. MPI INITIALIZATION MPI is more than just a library; there is a lot that happens behind the scenes in regards to inter process communication between different nodes (utilizing `ssh`) and the spawning of the various processes. The good news is that initialization is a one liner; we then need to identify our position in the MPI space and determine what type of node we need to act as. int rank, size; char name[MPI_MAX_PROCESSOR_NAME]; int name_len; // initialize MPI MPI_Init(&argc, &argv); // get the environment information from MPI MPI_Comm_size(MPI_COMM_WORLD, &size); MPI_Comm_rank(MPI_COMM_WORLD, &rank); // get the name of the worker MPI_Get_processor_name(name, &name_len); fprintf(stdout, ":: '%s' - process %d of %d\n", name, rank, size); // do we have enough processes: we need at least two nodes if ((size < MIN_NODES) \|\| (size > MAX_NODES)) exit(0); At this point, we have two variables `rank` and `size`. While it is common practice (as we will do) to make the master the node that has the rank of `0` - it doesn't have to be this way. You may wish to utilize a specific machine for the master and you could check the processor name to do this. switch (rank) { // MASTER case NODE_MASTER: // put the master logic here break; // SLAVE default: // put the slave logic here break; } Based on the `rank` variable - we switch the application into acting as a master or slave. MPI deploys the same application across all the nodes where it spawns the process, so we need to manage this check ourselves and act appropriately, a simple `switch` or `if-then-else` statement will be sufficient to separate the logic between being a master and a slave. LOGIC: MASTER NODE The master node is the brains of the application as it is responsible for issuing jobs to all the slave nodes until a result it found. The master will issue three types of messages to the slaves - the first to see if the slave is alive; the second to issue a job to perform and the last to shutdown the slave. int node, job, result; // seed the slaves for (node=1; node<size; node++) MPI_Send(NULL, 0, MPI_INT, node, TAG_SEED, MPI_COMM_WORLD); // keep passing out tasks to slaves until a solution is found for (;;) { // let's receive the status of the slave computation MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status); if (result == 1) break; // if we didn't get a result, issue a new tasks to the same slave node = status.MPI_SOURCE; MPI_Send(&job, 1, MPI_INT, node, TAG_WORK, MPI_COMM_WORLD); } // inform all slaves to shutdown for (node=1; node<size; node++) MPI_Send(NULL, 0, MPI_INT, node, TAG_DIE, MPI_COMM_WORLD); Seeing the slaves isn't technically required; but it does make the flow of main processing loop simpler. The basic idea of the master is to wait until a response is received by a slave and then issue the slave that responded with a new task. This ensures maximum use of all the slaves, the last thing we want is slaves waiting for their turn - as I can try to explain below with two slaves: {node} {execution time} ---------------------------------------------------------- slave 1: xxxxxxxxxx slave 2: xxxxxx slave 2: xxxxxxxxxx slave 1: xxxxxxxxxxx salve 2: xxxxxxxxxxxxxxxxxxx slave 1: xxxxxxxxxxx slave 1: xxxxxxxxxxx salve 2: xxxxxxxxxxxxxx In the above example; we have two slave nodes executing in parallel. We issue a new task to the one that responds first, as not all tasks may take the same amount of time. If we were to assume linear assignment - we would have a number of dead spots, for example (same data): {node} {execution time} ---------------------------------------------------------- slave 1: xxxxxxxxxx slave 2: xxxxxx.... slave 1: xxxxxxxxxxx slave 2: xxxxxxxxx. slave 1: xxxxxxxxxxx....... salve 2: xxxxxxxxxxxxxxxxxxx slave 1: xxxxxxxxx salve 2: xxxxxxxxx It should be clear here that points where a dot ('`.`') exist are times where the slave will sit idle; waiting for its turn to be issued a new command, it is great that we know the id of the node that responded so we can avoid this type of dead time in the parallel execution process. LOGIC: SLAVE NODE The slave node just needs to execute the job it has been asked to do do and send back a result. // the slaves run until told to shutdown for (;;) { MPI_Status status; int job, result; // receive the work request from the master MPI_Recv(&job, 1, MPI_INT, NODE_MASTER, MPI_ANY_TAG, MPI_COMM_WORLD, &status); // attempt to break the code switch (status.MPI_TAG) { case TAG_SEED: result = 0; // no result, let the master know we are here break; case TAG_WORK: result = / do the job, based on 'job' params passed / break; case TAG_DIE: goto SLAVE_DONE; } // send the result of the work request back to the master MPI_Send(&result, 1, MPI_INT, NODE_MASTER, TAG_RESULT, MPI_COMM_WORLD); } SLAVE_DONE:; The core logic is to run in an infinite loop until a `TAG_WORK` or `TAG_DIE` message is received from the master. 99.9% of the time, the slave will be working way doing some form of complex processing based on the parameters that have been sent from the master. When it received the `TAG_DIE` message, it should break from the infinite loop and shutdown nicely. MPI TERMINATION The final step of the parallel application is to tell the MPI environment that it can shut itself down. // shutdown MPI MPI_Finalize(); In turn, this will clean up all the resources that were utilized to run the application in parallel across the various nodes that were defined. It will also bring down any slaves that may not have shutdown correctly and collate any error messages that may have incurred as a result of the shutdown. Hopefully; with this knowledge available - you can go back to the previous blog entry and download the source code and get a better understanding of how the single process was parallelized using MPI, in the `mpi-sources.zip` I have included a number of versions, to do some time comparisons. >> max=50/min=12 split=4_4 >> max=50/min=12 split=5_3 mpi_call: 30958 mpi_call: 323521 attempts: 1435441080 attempts: 1428129840 real 2m24.406s real 2m59.222s If we run the numbers, we can approximate that the overhead of the MPI messaging framework compared to the time it takes to calculate a hash lets us know that we can do approximately 1200 hash calculations in the same time a single MPI request is made. It makes sense to identify this ratio to ensure you can maximize the benefit of making the application parallel in the first place.
advertisement (self plug): need assistance in an IoT project? contact us for a free consultation.
comments powered by Disqus

µTLS - defining lightweight security for IoT (part 1)

Parallel applications on a Raspberry Pi cluster

DISCLAIMER:
All content provided on this blog is for informational purposes only.
All comments are generated by users and moderated for inappropriateness periodically.
The owner will not be liable for any losses, injuries, or damages from the display or use of this information.