Dispatching compute functions to a worker and receiving data via zmq #866

Open
@kushalkolar

Description

So I got this idea because I often run something like backprop that takes a few minutes to converge, and then view the results. But it would be nice to see the results as the algorithm iterates, so I know whether to stop it and re-initialize because it's clearly not going to converge. This comes up a lot when prototyping, so I don't want the full overhead of writing another process to run the compute and setting up ZMQ push-pull. It would be nice if I could just decorate the function and even have expressions like graphic.data = new_data within the function that gets dispatched to a worker process! I think this can be done.

Idea of user-API:

import numpy as np
import fastplotlib as fpl

worker = Worker(port=12345)

fig = fpl.Figure(shape=(1, 2))

image1 = fig[0, 0].add_image(np.random.rand(10, 10))
image2 = fig[0, 1].add_image(np.random.rand(10, 10))

fig.show()

@worker(graphics=[image1, image2])
def run_long_algo(algo_param1, algo_param2):
    image1.data = np.random.rand(10, 10) * algo_param1
    image2.data = np.random.rand(10, 10) * algo_param2

The decorator takes the real graphics in the main (viz) process and creates "pseudographics" from them for use in the run_long_algo() worker process. These pseudographics only allow assignment to graphic features and expose no other graphic functionality. What actually happens is that each of those assignments in the decorated function runs a hook that sends the new data via zmq!

class PseudoGraphic:
    def __init__(self, graphic_cls: type, socket):
        # bypass our own __setattr__ for internal attributes,
        # otherwise assigning them here would hit the feature check below
        object.__setattr__(self, "_graphic_cls", graphic_cls)
        object.__setattr__(self, "_socket", socket)

    def __setattr__(self, feature, value):
        # only allow assignment to valid features of this graphic type
        if feature not in self._graphic_cls._features:
            raise AttributeError(f"{feature} is not a feature of {self._graphic_cls.__name__}")
        # send (feature_name, buffer) as a multipart message
        self._socket.send_multipart([feature.encode(), value.tobytes()])
For now just support simple full assignment (=) operators; we can think about data[:, 1] = new_y_vals etc. later.

Similarly, in the main viz process, when the @worker decorator gets the list of graphics used by the dispatched run_long_algo() function, it creates hooks that automatically add an animation function to the plot area that does this whole song and dance:

def get_bytes(graphic, sub):
    """
    Receive the latest (feature_name, bytes) message from the
    publisher, if one is available, and update the graphic.
    """
    try:
        feature_name, b = sub.recv_multipart(zmq.NOBLOCK)
    except zmq.Again:
        # no new message yet
        pass
    else:
        # example for data, need to think about if str is passed
        new_data = np.frombuffer(b, dtype=graphic.data.value.dtype).reshape(graphic.data.value.shape)
        graphic.data = new_data
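
A rough sketch of how the decorator could wire this up per graphic, assuming a port-per-graphic layout and fastplotlib's add_animations() hook (make_updater and the port assignment are hypothetical names, not an existing API):

import zmq

def make_updater(graphic, port: int):
    # one SUB socket per graphic, connected to the worker's PUB socket
    context = zmq.Context.instance()
    sub = context.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sub.connect(f"tcp://localhost:{port}")

    def update():
        get_bytes(graphic, sub)

    return update

# inside the @worker decorator, something like:
# for graphic, port in zip(graphics, ports):
#     fig.add_animations(make_updater(graphic, port))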

@clewis7, some questions since you're the most experienced with zmq from improv:

  • What do you think? This type of thing would be very useful for me, and I think it has lots of use cases. A lot of the time, setting up the boilerplate for multiprocessing and ZMQ, or using improv, is a lot of work just for prototyping a quick computation. I also first thought of using python multiprocessing, but the equivalent of zmq.CONFLATE in python multiprocessing seems to be much less efficient. We know from tests a few years ago that we can get 5,000 Hz for 512x512 uint8 data with zmq.CONFLATE.
  • Using zmq rather than python multiprocessing also has the advantage that you can dispatch the compute to another computer over the network and visualize the results as they come in on the fly (@apasarkar, you might find this really cool 😄 )
  • The easiest implementation is to set up an individual socket for each graphic; do you know if this is fine? Could we use one context per decorated worker and one socket (port) for each individual graphic, or is it required to have one-context-one-socket? (A sketch of the one-context-many-sockets pattern is below.)
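
For what it's worth, one context with many sockets is the usual zmq pattern: a context is per-process and each socket can bind its own port. A minimal sketch of the worker side, with hypothetical ports:

import zmq

context = zmq.Context.instance()  # one context for the whole worker process

# one PUB socket per graphic, each bound to its own port
sockets = []
for port in (12345, 12346):
    pub = context.socket(zmq.PUB)
    pub.bind(f"tcp://*:{port}")
    sockets.append(pub)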

Things to figure out:

  • I think we can simplify the "PseudoGraphic" by just having one __setattr__() method and it'll know the valid graphic features based on the graphic type.
  • zmq supports multipart messages via socket.send_multipart() / recv_multipart(), so we can send (feature_name, new_value). One caveat: zmq.CONFLATE does not support multipart messages, so if we want conflation we would need to pack both into a single frame.
  • How much of the scope from the main process should be pickled and sent to the worker? There are a few approaches we could use:
    1. explicitly define the scope, like @worker(graphics=..., scope=[np, data_matrix, scipy])
    2. filter globals() and send only imported modules and classes? Not sure how robust this would be.
    3. pickle and send all of globals() from the main thread; a naive and probably error-prone approach, since not all objects are pickleable, and if there are large objects (large arrays) in the global scope that the function does not need, this can be a huge pickle to send to workers.
    4. do not send any scope, user must explicitly define all imports and any data within the decorated function:
      @worker(graphics=...)
      def run_long_algo():
          import numpy as np
          ...

I think I like (1): if scope is None then fully explicit imports etc. are required; otherwise anything in the scope list is pickled and made available to the worker process. If an object is not pickleable it will raise in the main process. (A rough sketch of this is below.)
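
A rough sketch of option (1), with a hypothetical serialize_scope helper. Plain pickle cannot serialize module objects, so modules are sent by import name and re-imported in the worker; everything else is pickled under the name it has in the caller's globals:

import pickle
import types

def serialize_scope(scope: list, caller_globals: dict) -> bytes:
    modules = {}  # binding name -> import name
    objects = {}  # binding name -> object
    for obj in scope:
        # find the name this object is bound to in the caller's globals
        name = next(n for n, v in caller_globals.items() if v is obj)
        if isinstance(obj, types.ModuleType):
            modules[name] = obj.__name__
        else:
            objects[name] = obj
    # raises here, in the main process, if anything is unpickleable
    return pickle.dumps({"modules": modules, "objects": objects})

# worker side, roughly:
# payload = pickle.loads(msg)
# namespace = {name: importlib.import_module(mod) for name, mod in payload["modules"].items()}
# namespace.update(payload["objects"])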

EDIT: more sophisticated idea: parse the function and determine whether it uses any objects from the global scope; if so, pickle those objects: https://stackoverflow.com/questions/32382963/how-to-parse-python-code-to-identify-global-variable-uses
This would be very nice; it would basically be 1:1 with regular in-main-process code.
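
A minimal sketch of that idea; the compiled function already records the global names it references, so no source parsing is strictly needed:

def used_globals(func) -> dict:
    """
    co_names holds every global (and attribute) name referenced by the
    function's bytecode; intersecting it with the function's globals
    gives the objects that would need to be pickled. Attribute names
    can cause harmless over-inclusion.
    """
    return {
        name: func.__globals__[name]
        for name in func.__code__.co_names
        if name in func.__globals__
    }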
