06_control.py =============== To start all the components of this example, run the environment and perform a evolutionary step, we will create a ``06_control.py``, which we can execute cell by cell with interactive python in vscode or a ``06_control.ipynb`` jupyter notebook with the same code. First we import all required packages. .. code-block:: python import uuid import socket import time from swergio import Client, MESSAGE_TYPE from swergio_toolbox.swarm_control import Swarm We will use the swarm class from the swergio toolbox to simplify the handling of multiple components at once. Of course we can also just start each component one by one by just running the according script. The swarm class requires a swarm.yaml file, that contains the specification about each component most importantly the path to each script. Once the YAML is defined we run each component at once by instantiating a swarm object and calling the ``swarm.start()`` method. .. code-block:: python swarm = Swarm() swarm.start() To start all 20 models we'll just run a for loop and add each model to our swarm and start it. All the models refer to the same file, but are separate processes. .. code-block:: python for i in range(20): swarm.add("model"+str(i), "02_model.py", terminal=False) swarm.start("model"+str(i)) After all the components are running, we can define another component that we will use a control unit by sending message in the *control* room. We will set up a swergio client with a name and the same information as before and also join the *control* .. code-block:: python COMPONENT_NAME = 'control' PORT = 8080 SERVER = socket.gethostbyname(socket.gethostname()) ADDR = (SERVER, PORT) FORMAT = 'utf-8' HEADER_LENGTH = 10 client = Client(COMPONENT_NAME,SERVER,PORT,FORMAT,HEADER_LENGTH) client.join_room('control') Now let's define a function we can use to run N number of episodes of the gym environment. This function will send the number of episodes as well as our deterministic flag to the *control* room. Once the environment component will receive the message it will send observations. We can wait until we receive a feedback from the environment component that all episode are done including our final score the current models were able to reach. .. code-block:: python def start(nr_of_episodes = 1, deterministic = False): msg = {'ROOT_ID':uuid.uuid4().hex, 'ID':uuid.uuid4().hex, 'TYPE': MESSAGE_TYPE.DATA.CUSTOM.id, 'NR_OF_EPISODE': nr_of_episodes, 'DETERMINISTIC': deterministic, 'TO_ROOM': 'control' } client.send(msg) ## WAIT FOR RESPONSE while True: message = client.receive() if message is not False: if 'AVG_SCORE' in message.keys() and 'STATUS' in message.keys(): if message['STATUS'] == 'ENV_DONE': return message['AVG_SCORE'] If we want to perform a evolutionary step, we will send a message including ``'EVOLUTION': True``. The evolutionary component will then start to perform the evolution and update the model weights accordingly. Again we will wait until we receive feedback that this step is done. .. code-block:: python def start_evo(): msg = {'ROOT_ID':uuid.uuid4().hex, 'ID':uuid.uuid4().hex, 'TYPE': MESSAGE_TYPE.DATA.CUSTOM.id, 'EVOLUTION': True, 'TO_ROOM': 'control' } client.send(msg) ## WAIT FOR RESPONSE while True: message = client.receive() if message is not False: if 'STATUS' in message.keys(): if message['STATUS'] == 'EVO_DONE': return True Let's run a few steps of episodes while training the models via RL and in between perform some evolutions to improve the model pool. In this example we will run 20 episodes none deterministic, meaning our actions are sampled and not necessary the best according to our policies. Then we can run 1 episode to check how well our models perform after RL training. Afterwards we start a evolutionary step and check again the performance of the new model pool. .. code-block:: python TRAIN_EVALS = 20 for i in range(5): print("ROUND " + str(i)) score = start(TRAIN_EVALS,deterministic=False) print(score) score = start(1,deterministic=True) print(score) evdone = start_evo() print(evdone) time.sleep(1) score = start(1,deterministic=True) print(score) Finally we can stop all the components (Server,ControlModel,Trebuchet and Trainer) all at once with the ``swarm.stop()`` method. .. code-block:: python swarm.stop()