`yapapi` - Golem's high-level API - and its dependencies in the requestor agent, of course).
`worker.py` file from the `workshop` branch, you'll see the following boilerplate:
`RESULT_PATH` - those are the paths to the locations within the Docker image that contain the hash to be cracked, the slice of the dictionary we want this node to process, and finally the path to the result file, in case a result is found within the processed slice of the dictionary.
`sha256` function from the `hashlib` library (bundled with Python), we need to import it by adding a line to our imports at the top of the file:
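For instance, hashing a candidate word then comes down to a single call (a minimal illustration, not the full worker code):

```python
from hashlib import sha256  # bundled with Python, no extra dependency

# Compute the hex digest of a candidate word, encoded as UTF-8 bytes.
digest = sha256("test".encode("utf-8")).hexdigest()
print(digest)
```

The sample hash shipped with the example was derived in exactly this way.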
`data/words-short.json`), which is also included in our example alongside a sample hash derived from one of the words in that shorter list (`data/hash-short.json`). The hash should match the word `test` from that list.
`worker.py`'s input paths. Let's replace the constants at the beginning of the file to point to our shorter lists:
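Assuming the constant names from the boilerplate, the change could look like this (the `RESULT_PATH` value here is just an illustrative local path, not necessarily the one from the example):

```python
# Point the worker at the shortened sample inputs for a quick local test:
HASH_PATH = "data/hash-short.json"    # sample hash of the word "test"
WORDS_PATH = "data/words-short.json"  # the shortened dictionary
RESULT_PATH = "result.json"           # hypothetical local output location
```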
`worker.py` script (it needs to be executed from the project's root directory):
"test"which matches the expected password as mentioned above.
`worker.py` script ready, it's time to take a look at the VM image which will be used to run our code on providers.
`python` image since we want it to run our `worker.py` script, and choose the `slim` variant to reduce the image's size.
`/golem/output`. Volumes are directories that can be shared with the host machine and, more importantly, through which the execution environment supervisor (the process on the provider's host machine) will be able to transfer data into and out of the VM. For a Golem VM, the image must define at least one volume.
`worker.py` script to the path `/golem/entrypoint` within the image. Later on, we'll see how the requestor code uses this path to run our script.
`/golem/entrypoint` as the working directory of the image. It will be the default location for commands executed by this image.
`ENTRYPOINT` statement - if present in your Dockerfile - is effectively ignored and replaced with the exeunit's own entrypoint.
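Putting those pieces together, the Dockerfile described above amounts to something like the following sketch (the exact base-image tag is an assumption and may differ in the example repository):

```dockerfile
# Use the slim Python base image to keep the resulting VM image small.
FROM python:3.8-slim

# A Golem VM image must define at least one volume; results go here.
VOLUME /golem/output

# Ship our script and make its directory the default working directory.
COPY worker.py /golem/entrypoint/worker.py
WORKDIR /golem/entrypoint
```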
`.gvmi` file to Golem's image repository.
`gvmkit-build` is included in `requirements.txt`, so it should already be installed in the virtual environment used for this example.
`yagna` daemon and handled by our high-level API via the daemon's REST API.
`Task` objects that directly represent the individual jobs given to provider nodes.
`workshop` branch, you'll see the following boilerplate:
`words`. These are paths to the files containing the hash we're looking for and the dictionary in which we hope to find it, respectively. Otherwise, it's just a regular Python argument parser invocation using `argparse`.
`steps`, the filling of which will be our main task in this section.
`main`, which we will also need to supplement with a proper call to our API's `Golem` class to bind the previous two together.
`main` routine and does some rudimentary error handling, just in case something goes amiss and we're forced to abort our task with a somewhat rude Ctrl-C.
`data` function. It accepts the `words_file` path and the `chunk_size`, which is the size of each dictionary slice, defined by its line count.
`data` function produces a generator yielding `Task` objects that describe each task fragment.
`chunk`), which it fills with the lines from said file, stripping them of any leading or trailing whitespace or newline characters (using `strip()`).
`chunk_size` - or once all lines have been read from the input file - it then yields the respective `Task` object with its `data` set to the just-constructed list.
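The slicing behaviour described above can be sketched without any Golem-specific code (in the real function, each yielded list becomes the `data` of a `yapapi` `Task`, and the lines come from the words file rather than an in-memory iterable):

```python
from typing import Iterable, Iterator, List

def chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive slices of the word list, chunk_size entries each."""
    chunk: List[str] = []
    for line in lines:
        chunk.append(line.strip())  # drop surrounding whitespace/newlines
        if len(chunk) == chunk_size:
            yield chunk             # in the real code: yield Task(data=chunk)
            chunk = []
    if chunk:
        yield chunk                 # the final, possibly shorter, slice
```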
`steps` in our example. It accepts `context`, which is a `WorkContext` instance, and `tasks` - an iterable of `Task`s which will be filled with task fragments coming from our `data` function that we defined in the previous step.
`WorkContext` gives us a simple interface to construct a script that translates directly to commands interacting with the execution unit on the provider's end. Each such work context refers to one activity started on one provider node. While constructing such a script, we can define the steps that need to happen once per worker run (in other words, once per provider node) - those are placed outside of the loop iterating over `tasks`.
`.send_file()` invocation. It transfers the file containing the hash we'd like to crack and instructs the execution unit to store it under `worker.HASH_PATH`, which is a location within the VM container that we had previously defined in our `worker.py` script. We perform this step just once here because that piece of task input doesn't change.
`.send_json()`, which tells the exe-unit to store the given subset of words as a JSON-serialized file in another path within the VM that we had defined in `worker.WORDS_PATH` (note that in this function the destination comes first, followed by the object to be serialized),
`.run()` call, which is the one that actually executes the `worker.py` script inside the provider's VM, which in turn produces output (as you remember, this may be empty or may contain our solution),
`.download_file()` call, which transfers that solution file back to a temporary file on the requestor's end,
`.commit()` on our work context and yield that to the calling code (the processing inside the `Golem` class), which takes our script and orchestrates its execution on the provider's end.
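Assembled in order, the script-construction loop described above takes roughly this shape. This is a sketch written against the `WorkContext`-style calls described in this section; the local and in-VM paths and the interpreter invocation in `run()` are illustrative assumptions, not the exact values from the example code:

```python
# In-VM paths -- in the real code these are the constants from worker.py.
HASH_PATH = "/golem/input/hash.json"
WORDS_PATH = "/golem/input/words.json"
RESULT_PATH = "/golem/output/result.json"

async def steps(context, tasks):
    """Build and yield one command script per incoming task fragment.

    `context` is assumed to expose the calls described above:
    send_file, send_json, run, download_file and commit."""
    # Done once per provider node: the hash to crack never changes.
    context.send_file("hash.json", HASH_PATH)
    async for task in tasks:
        # Per-task input: this node's slice of the dictionary.
        context.send_json(WORDS_PATH, task.data)
        # Execute our script inside the provider's VM (hypothetical invocation).
        context.run("python", "worker.py")
        # Transfer whatever worker.py produced (possibly an empty result).
        context.download_file(RESULT_PATH, "result.json")
        # Hand the batch over to the engine for execution on the provider.
        yield context.commit()
        # Once we're resumed, the work is done and we can accept the result.
        task.accept_result(result="result.json")
```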
`task` has already been completed. Now we only need to call `Task.accept_result()` with the result coming from the temporary file transferred from the provider. This ensures that the result is what's yielded from `Golem` to the final loop in our `main` function, which we'll define next.
`main` function in the boilerplate.
`vm.repo()` invocation with the one you noted down:
`Golem` engine. It is given our GLM budget and the `subnet_tag` - a subnet identifier for the collection of nodes that we want to utilize to run our tasks. Unless you know what you're doing, you're better off leaving this at the value defined as the default parameter in our boilerplate code.
`golem` is used with `async with` as an asynchronous context manager. This guarantees that all internal mechanisms the engine needs for computing our tasks are started before the code in the body of the `async with` is executed, and are properly shut down afterwards.
`golem` started, we are ready to call its `execute_tasks` method. Here we instruct `golem` to use the `steps` function for producing commands for each task, and the iterator produced by `data(args.words)` to provide the tasks themselves. We also tell it that the provider nodes need to use the `payload` specified by the `package` we defined above. And finally, there's the `timeout` within which we expect the whole processing on all nodes to have finished.
`async for`, we iterate over tasks computed by `execute_tasks` and check their results. As soon as we encounter a task with `task.result` set to a non-empty string, we can `break` from the loop instead of waiting until the remaining tasks are computed.
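Put together, the flow described in this section has roughly the following shape. This is a sketch rather than the example's actual `main`: the engine object is only assumed to expose the `async with` / `execute_tasks` interface described above, and `steps` and `data` are passed in as parameters here purely to keep the snippet self-contained (in the example they are the module-level functions, fed with `args.words`):

```python
from datetime import timedelta

async def main(golem, package, words_path, steps, data,
               timeout=timedelta(minutes=10)):
    """Run all dictionary slices through the engine, stopping at the first hit."""
    async with golem:  # starts the engine's machinery, shuts it down on exit
        completed = golem.execute_tasks(
            steps,             # builds the per-provider command script
            data(words_path),  # yields one Task per dictionary slice
            payload=package,   # the VM image we built earlier
            timeout=timeout,   # overall deadline for the whole run
        )
        async for task in completed:
            if task.result:    # non-empty string => the password was found
                print(f"Found: {task.result}")
                return task.result  # no need to wait for the remaining tasks
    return None
```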
`result` should contain our solution, and the solution is printed to your console. (Unless, of course, the hash we're trying to break is not found within the dictionary we initially assumed it would come from - which, we assure you, is not the case for our example hash ;) ).
`yagna` daemon is running and is properly funded and initialized as a requestor.
`requestor.py` script within the checked-out repo and run: