Dispatching compute functions to a worker and receiving data via zmq #866

Open
@kushalkolar

Description

So I got this idea because I often run something like backprop that takes a few minutes to converge, and then view the results. But it would be nice to see the results as the algorithm iterates, so I know whether to stop it and re-initialize because it's clearly not going to converge. This comes up a lot when prototyping, so I don't want the full overhead of writing another process to run the compute and setting up ZMQ push-pull. It would be nice if I could just decorate the function and even have expressions like graphic.data = new_data within the function that gets dispatched to a worker process! I think this can be done.

Idea of user-API:

import numpy as np
import fastplotlib as fpl

worker = Worker(port=12345)

fig = fpl.Figure(shape=(1, 2))

image1 = fig[0, 0].add_image(np.random.rand(10, 10))
image2 = fig[0, 1].add_image(np.random.rand(10, 10))

fig.show()

@worker(graphics=[image1, image2])
def run_long_algo(algo_param1, algo_param2):
    image1.data = np.random.rand(10, 10) * algo_param1
    image2.data = np.random.rand(10, 10) * algo_param2

The decorator takes the real graphics in the main (viz) process and creates "pseudographics" from them for use in the run_long_algo() worker process. These pseudographics only allow assignment to graphic features and expose no other graphic functionality. What actually happens is that each of those assignments in the decorated function runs a hook that sends the new data via zmq!

class PseudoGraphic:
    def __init__(self, graphic_cls: type, socket):
        # bypass our own __setattr__ for internal attributes,
        # otherwise assigning them here would hit the feature check below
        object.__setattr__(self, "_graphic_cls", graphic_cls)
        object.__setattr__(self, "_socket", socket)

    def __setattr__(self, feature, value):
        # only allow assignment to valid features of this graphic type
        if feature not in self._graphic_cls._features:
            raise AttributeError(f"{feature} is not a feature of {self._graphic_cls.__name__}")
        # send (feature_name, buffer) as a multipart message
        self._socket.send_multipart([feature.encode(), value.tobytes()])
For now just support simple full assignment (=) operators; we can think about data[:, 1] = new_y_vals etc. later.

Similarly, in the main viz process, when the @worker decorator gets the list of graphics used by the dispatched run_long_algo() function, it creates hooks that automatically add an animation function to the plot area that does this whole song and dance:

def get_bytes(graphic, sub):
    """
    Receive the latest (feature_name, bytes) message from the
    publisher, if one is available, and update the graphic.
    """
    try:
        feature_name, b = sub.recv_multipart(zmq.NOBLOCK)
    except zmq.Again:
        # no new message yet
        pass
    else:
        # example for data, need to think about if str is passed
        new_data = np.frombuffer(b, dtype=graphic.data.value.dtype).reshape(graphic.data.value.shape)
        graphic.data = new_data
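
A rough sketch of how the decorator could wire this up per graphic, assuming a port-per-graphic layout and fastplotlib's add_animations() hook (make_updater and the port assignment are hypothetical names, not an existing API):

import zmq

def make_updater(graphic, port: int):
    # one SUB socket per graphic, connected to the worker's PUB socket
    context = zmq.Context.instance()
    sub = context.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sub.connect(f"tcp://localhost:{port}")

    def update():
        get_bytes(graphic, sub)

    return update

# inside the @worker decorator, something like:
# for graphic, port in zip(graphics, ports):
#     fig.add_animations(make_updater(graphic, port))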

@clewis7, some questions since you're the most experienced with zmq from improv:

  • What do you think? This type of thing would be very useful for me, and I think it has lots of use cases. A lot of the time, setting up the boilerplate for multiprocessing and ZMQ, or using improv, is a lot of work just for prototyping a quick computation. I also first thought of using python multiprocessing, but the equivalent of zmq.CONFLATE in python multiprocessing seems to be much less efficient. We know from tests a few years ago that we can get 5,000 Hz for 512x512 uint8 data with zmq.CONFLATE.
  • Using zmq rather than python multiprocessing also has the advantage that you can dispatch the compute to another computer over the network and visualize the results as they come in on the fly (@apasarkar, you might find this really cool 😄 )
  • The easiest implementation is to set up an individual socket for each graphic; do you know if this is fine? Could we use one context per decorated worker and one socket (port) for each individual graphic, or is it required to have one-context-one-socket? (A sketch of the one-context-many-sockets pattern is below.)
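
For what it's worth, one context with many sockets is the usual zmq pattern: a context is per-process and each socket can bind its own port. A minimal sketch of the worker side, with hypothetical ports:

import zmq

context = zmq.Context.instance()  # one context for the whole worker process

# one PUB socket per graphic, each bound to its own port
sockets = []
for port in (12345, 12346):
    pub = context.socket(zmq.PUB)
    pub.bind(f"tcp://*:{port}")
    sockets.append(pub)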

Things to figure out:

  • I think we can simplify the "PseudoGraphic" by just having one __setattr__() method and it'll know the valid graphic features based on the graphic type.
  • zmq supports multipart messages via socket.send_multipart() / recv_multipart(), so we can send (feature_name, new_value). One caveat: zmq.CONFLATE does not support multipart messages, so if we want conflation we would need to pack both into a single frame.
  • How much of the scope from the main process should be pickled and sent to the worker? There are a few approaches we could use:
    1. explicitly define the scope, like @worker(graphics=..., scope=[np, data_matrix, scipy])
    2. filter globals() and send only imported modules and classes? Not sure how robust this would be.
    3. pickle and send all of globals() from the main thread; a naive and probably error-prone approach, since not all objects are pickleable, and if there are large objects (large arrays) in the global scope that the function does not need, this can be a huge pickle to send to workers.
    4. do not send any scope, user must explicitly define all imports and any data within the decorated function:
      @worker(graphics=...)
      def run_long_algo():
          import numpy as np
          ...

I think I like (1): if scope is None then fully explicit imports etc. are required; otherwise anything in the scope list is pickled and made available to the worker process. If an object is not pickleable it will raise in the main process. (A rough sketch of this is below.)
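
A rough sketch of option (1), with a hypothetical serialize_scope helper. Plain pickle cannot serialize module objects, so modules are sent by import name and re-imported in the worker; everything else is pickled under the name it has in the caller's globals:

import pickle
import types

def serialize_scope(scope: list, caller_globals: dict) -> bytes:
    modules = {}  # binding name -> import name
    objects = {}  # binding name -> object
    for obj in scope:
        # find the name this object is bound to in the caller's globals
        name = next(n for n, v in caller_globals.items() if v is obj)
        if isinstance(obj, types.ModuleType):
            modules[name] = obj.__name__
        else:
            objects[name] = obj
    # raises here, in the main process, if anything is unpickleable
    return pickle.dumps({"modules": modules, "objects": objects})

# worker side, roughly:
# payload = pickle.loads(msg)
# namespace = {name: importlib.import_module(mod) for name, mod in payload["modules"].items()}
# namespace.update(payload["objects"])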

EDIT: more sophisticated idea: parse the function and determine whether it uses any objects from the global scope; if so, pickle those objects: https://stackoverflow.com/questions/32382963/how-to-parse-python-code-to-identify-global-variable-uses
This would be very nice; it would basically be 1:1 with regular in-main-process code.
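
A minimal sketch of that idea; the compiled function already records the global names it references, so no source parsing is strictly needed:

def used_globals(func) -> dict:
    """
    co_names holds every global (and attribute) name referenced by the
    function's bytecode; intersecting it with the function's globals
    gives the objects that would need to be pickled. Attribute names
    can cause harmless over-inclusion.
    """
    return {
        name: func.__globals__[name]
        for name in func.__code__.co_names
        if name in func.__globals__
    }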
