.. _tutorial_distributed_raytracing:

``ARTIST`` Tutorial: Distributed Ray Tracing
============================================

.. note::

   You can find the corresponding ``Python`` script for this tutorial here:
   https://github.com/ARTIST-Association/ARTIST/blob/main/tutorials/02_heliostat_raytracing_distributed_tutorial.py

This tutorial demonstrates how to set up a distributed environment and perform distributed ray tracing in ``ARTIST``.

It is recommended that you are already familiar with the following processes in ``ARTIST``:

- loading a scenario,
- aligning heliostats, and
- performing heliostat ray tracing to generate a flux density image on a target area.

If you need help with these topics, check our tutorial on :ref:`heliostat raytracing `.

Initial Setup
-------------

``ARTIST`` is designed for parallel computation. To enable parallelization even when considering different types of
heliostats with different kinematics and actuator configurations, we use ``HeliostatGroups``. Detailed information on
heliostat groups and how ``ARTIST`` is structured can be found in the description of
:ref:`what is happening under the hood`.

Before proceeding, we need to determine how many heliostat groups are present in the scenario:

.. code-block:: python

   number_of_heliostat_groups = Scenario.get_number_of_heliostat_groups_from_hdf5(
       scenario_path=scenario_path
   )

During distributed ray tracing, the heliostat tracing process can be distributed and parallelized using
`distributed data parallelism in PyTorch `_ (DDP). When using DDP, not only can the heliostat groups be processed in
parallel, but the data samples within each group can also be handled in parallel. We will see how this works in more
detail later in the tutorial.

The Distributed Environment
---------------------------

Before we start the actual ray tracing, we need to set up the distributed environment. Based on the available devices,
the environment is initialized with an appropriate communication backend. For computation on GPUs, we use the ``nccl``
backend, which is optimized for NVIDIA GPUs. For computation on CPUs, the ``gloo`` backend is used. All of this setup is
handled automatically via:

.. code-block:: python

   with setup_distributed_environment(
       number_of_heliostat_groups=number_of_heliostat_groups,
       device=device,
   ) as ddp_setup:
       ...  # The rest of the tutorial runs inside this block.

.. note::

   The rest of the tutorial takes place within this ``with`` block. This ensures that the distributed environment
   remains active during execution and is automatically cleaned up afterwards.

The dictionary ``ddp_setup`` contains all parameters related to the distributed environment.

Mapping between Active Heliostats, Target Areas and Incident Ray Directions
---------------------------------------------------------------------------

``ARTIST`` offers the flexibility to activate and deactivate certain heliostats in the scenario. This makes it possible
to have some heliostats aim at one target area while others aim elsewhere, or to use different incident ray directions
for different heliostats in the same alignment and ray tracing process for calibration tasks. To map each heliostat to
its designated target area and incident ray direction, we use the following mapping structure:

.. code-block:: python

   heliostat_target_light_source_mapping = [
       ("heliostat_1", "target_name_2", incident_ray_direction_tensor_1),
       ("heliostat_2", "target_name_2", incident_ray_direction_tensor_2),
       (...)
   ]

As we want to consider all heliostats in this tutorial, we set our mapping to ``None``:

.. code-block:: python

   heliostat_target_light_source_mapping = None

It is still possible to set a specific default target area index and a default incident ray direction later. If these
are not provided, all heliostats are assigned to the first target area found in the scenario with an incident ray
direction of "north", i.e., the light source position is directly in the south.
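
The fallback behaviour described above can be illustrated with a small, dependency-free sketch. It is not the
implementation used by ``ARTIST``: the helper ``resolve_mapping``, its parameters, and the tuple ``NORTH`` are
hypothetical names introduced for illustration only. The sketch also assumes that "north" is written as a four-component
east-north-up direction with a trailing zero, and it uses plain tuples where the tutorial uses tensors.

```python
# Assumed representation of "north": (east, north, up, 0). Illustration only.
NORTH = (0.0, 1.0, 0.0, 0.0)


def resolve_mapping(
    heliostat_names,
    target_names,
    mapping=None,
    default_target_index=0,
    default_direction=NORTH,
):
    """Hypothetical helper mimicking the described defaults, not ARTIST code."""
    if mapping is None:
        # No mapping: every heliostat is active once and uses the defaults.
        count = len(heliostat_names)
        return (
            [1] * count,
            [default_target_index] * count,
            [default_direction] * count,
        )

    mask = [0] * len(heliostat_names)
    target_indices, directions = [], []
    # Walk the heliostats in field order so the outputs line up with the mask.
    for index, name in enumerate(heliostat_names):
        for mapped_name, target_name, direction in mapping:
            if mapped_name == name:
                mask[index] += 1  # A heliostat may be listed more than once.
                target_indices.append(target_names.index(target_name))
                directions.append(direction)
    return mask, target_indices, directions


heliostats = ["heliostat_1", "heliostat_2", "heliostat_3"]
targets = ["target_name_1", "target_name_2"]

# Mapping set to None: all heliostats active, first target, direction "north".
default_result = resolve_mapping(heliostats, targets)

# Explicit mapping: only the listed heliostats are active.
explicit_result = resolve_mapping(
    heliostats,
    targets,
    mapping=[
        ("heliostat_1", "target_name_2", (1.0, 0.0, 0.0, 0.0)),
        ("heliostat_2", "target_name_2", NORTH),
    ],
)
```

With the explicit mapping, ``heliostat_3`` gets a mask entry of 0 and contributes neither a target index nor a
direction, while setting the mapping to ``None`` activates every heliostat with the default values.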
Distributed Raytracing
----------------------

Before we can start distributed ray tracing, we need to set the resolution of the generated bitmap and create a tensor
to store the final result:

.. code-block:: python

   bitmap_resolution = torch.tensor([256, 256])

   combined_bitmaps_per_target = torch.zeros(
       (
           scenario.target_areas.number_of_target_areas,
           bitmap_resolution[indices.unbatched_bitmap_e],
           bitmap_resolution[indices.unbatched_bitmap_u],
       ),
       device=device,
   )

Now the heliostat groups come into play. Each heliostat group must be considered separately. In a distributed setting,
these groups can be computed in parallel; otherwise, they are processed sequentially. Therefore, the entire distributed
ray tracing process takes place within a ``for`` loop:

.. code-block:: python

   for heliostat_group_index in ddp_setup["groups_to_ranks_mapping"][
       ddp_setup["rank"]
   ]:
       heliostat_group = scenario.heliostat_field.heliostat_groups[
           heliostat_group_index
       ]

Within this loop, the first step is to determine which heliostats are activated and which target areas are used. This
is done using the ``heliostat_target_light_source_mapping`` defined earlier:

.. code-block:: python

   (
       active_heliostats_mask,
       target_area_indices,
       incident_ray_directions,
   ) = scenario.index_mapping(
       heliostat_group=heliostat_group,
       string_mapping=heliostat_target_light_source_mapping,
       device=device,
   )

We then activate the heliostats as in the :ref:`previous tutorial on single heliostat ray tracing`:

.. code-block:: python

   # For each index, 0 indicates a deactivated heliostat, 1 indicates an activated one.
   # An integer greater than 1 means the heliostat at this index is considered multiple times.
   heliostat_group.activate_heliostats(
       active_heliostats_mask=active_heliostats_mask, device=device
   )

and align the surfaces for all activated heliostats with the incident ray direction:

.. code-block:: python

   heliostat_group.align_surfaces_with_incident_ray_directions(
       aim_points=scenario.solar_tower.get_centers_of_target_areas(
           target_area_indices, device=device
       ),
       incident_ray_directions=incident_ray_directions,
       active_heliostats_mask=active_heliostats_mask,
       device=device,
   )

Now we are ready to create a distributed ``HeliostatRayTracer``. Here, it is important to provide the overall number of
processes ``world_size``, the individual process ID ``rank``, the ``batch_size``, and a ``random_seed``:

.. code-block:: python

   ray_tracer = HeliostatRayTracer(
       scenario=scenario,
       heliostat_group=heliostat_group,
       world_size=ddp_setup["heliostat_group_world_size"],
       rank=ddp_setup["heliostat_group_rank"],
       batch_size=heliostat_group.number_of_active_heliostats,
       random_seed=ddp_setup["heliostat_group_rank"],
       bitmap_resolution=bitmap_resolution,
   )

In this tutorial, the ``batch_size`` is equal to the number of active heliostats. It determines how many heliostats are
handled in parallel within a group's ray tracing process. If the number of active heliostats is high and your GPUs do
not have enough memory, reduce the ``batch_size`` to prevent ``CUDA out of memory`` errors at runtime. However, this
increases the runtime because the batches within each group are computed sequentially (while the heliostats within each
batch are handled in parallel).

We can now perform ray tracing per heliostat with ``trace_rays()``:

.. code-block:: python

   bitmaps_per_heliostat, _, _, _ = ray_tracer.trace_rays(
       incident_ray_directions=incident_ray_directions,
       active_heliostats_mask=active_heliostats_mask,
       target_area_indices=target_area_indices,
       device=device,
   )

Consider an example scenario of two heliostat groups with two heliostats each in a distributed environment with three
processes:

- Group 0: ``AA28``, ``AC43``
- Group 1: ``AA31``, ``AA39``

The ``world_size`` is 3, corresponding to ranks 0, 1, and 2. Ranks are distributed among groups in a round-robin
fashion: group 0 is computed on ranks 0 and 2, while group 1 is computed on rank 1. Since group 0 has two ranks
available, it can perform nested parallelization. Heliostat 0 of group 0, named ``AA28``, is handled by rank 0, and
heliostat 1 of group 0, named ``AC43``, is handled by rank 2. Group 1 has two heliostats but only one rank assigned, so
nested parallelization is not possible for this group. The ``trace_rays()`` method produces bitmaps per heliostat.

.. list-table:: Bitmaps per heliostat
   :widths: 33 33 33
   :header-rows: 0

   * - .. figure:: ./images/bitmap_of_heliostat_AA28_in_group_0_on_rank_0.png
          :scale: 30%

          Rank 0
     - .. figure:: ./images/bitmap_of_heliostat_AA31_in_group_1_on_rank_1.png
          :scale: 30%

          Rank 1
     - .. figure:: ./images/bitmap_of_heliostat_AA28_in_group_0_on_rank_2.png
          :scale: 30%

          Rank 2
   * - .. figure:: ./images/bitmap_of_heliostat_AC43_in_group_0_on_rank_0.png
          :scale: 30%

          Rank 0
     - .. figure:: ./images/bitmap_of_heliostat_AA39_in_group_1_on_rank_1.png
          :scale: 30%

          Rank 1
     - .. figure:: ./images/bitmap_of_heliostat_AC43_in_group_0_on_rank_2.png
          :scale: 30%

          Rank 2

When multiple heliostats in a scenario focus on the same target, we need to combine their flux images into one resulting
image with ``get_bitmaps_per_target()``:

.. code-block:: python

   bitmaps_per_target = ray_tracer.get_bitmaps_per_target(
       bitmaps_per_heliostat=bitmaps_per_heliostat,
       target_area_indices=target_area_indices,
       device=device,
   )

Since a rank may process more than one heliostat group in the ``for`` loop, we accumulate the results of each group in
the combined bitmap so that all heliostats are considered:

.. code-block:: python

   combined_bitmaps_per_target = combined_bitmaps_per_target + bitmaps_per_target

All heliostats in this example aim at the first target area in the scenario, called the ``multi_focus_tower``. As a
result, all bitmaps in the ``combined_bitmaps_per_target`` tensor are empty, except the ones at index 0 plotted below:

.. list-table:: Bitmaps per target area (on the ``multi_focus_tower``)
   :widths: 33 33 33
   :header-rows: 0

   * - .. figure:: ./images/combined_bitmap_on_multi_focus_tower_from_group_0_on_rank_0.png
          :scale: 30%

          Rank 0
     - .. figure:: ./images/combined_bitmap_on_multi_focus_tower_from_group_1_on_rank_1.png
          :scale: 30%

          Rank 1
     - .. figure:: ./images/combined_bitmap_on_multi_focus_tower_from_group_0_on_rank_2.png
          :scale: 30%

          Rank 2

Since the ranks have not been synchronized yet, each rank initially only has the results it computed locally. For
example, the bitmap on rank 1 is the combined flux of heliostats ``AA31`` and ``AA39`` because both were computed on
that rank. However, neither the ray tracing results within each group nor the combined results across groups are
available globally at this point. To obtain the final bitmap per target, we need to perform an ``all_reduce``. In
principle, one final ``all_reduce`` is sufficient, but for the purpose of this tutorial, it is interesting to look at
intermediate results using a nested ``all_reduce``:

.. code-block:: python

   if ddp_setup["is_nested"]:
       torch.distributed.all_reduce(
           combined_bitmaps_per_target,
           op=torch.distributed.ReduceOp.SUM,
           group=ddp_setup["process_subgroup"],
       )

This ``all_reduce`` is performed per process subgroup, meaning it only reduces the results of heliostats within the
respective group.

.. list-table:: Bitmaps per target area (on the ``multi_focus_tower``) after nested ``all_reduce``.
   :widths: 33 33 33
   :header-rows: 0

   * - .. figure:: ./images/reduced_bitmap_on_multi_focus_tower_on_rank_0.png
          :scale: 30%

          Rank 0
     - .. figure:: ./images/reduced_bitmap_on_multi_focus_tower_on_rank_1.png
          :scale: 30%

          Rank 1
     - .. figure:: ./images/reduced_bitmap_on_multi_focus_tower_on_rank_2.png
          :scale: 30%

          Rank 2

In practice, the global ``all_reduce`` is sufficient to obtain the final bitmap on each target:

.. code-block:: python

   if ddp_setup["is_distributed"]:
       torch.distributed.all_reduce(
           combined_bitmaps_per_target, op=torch.distributed.ReduceOp.SUM
       )

.. list-table:: Bitmaps per target area (on the ``multi_focus_tower``) after final ``all_reduce``.
   :widths: 33 33 33
   :header-rows: 0

   * - .. figure:: ./images/final_reduced_bitmap_on_multi_focus_tower_on_rank_0.png
          :scale: 30%

          Rank 0
     - .. figure:: ./images/final_reduced_bitmap_on_multi_focus_tower_on_rank_1.png
          :scale: 30%

          Rank 1
     - .. figure:: ./images/final_reduced_bitmap_on_multi_focus_tower_on_rank_2.png
          :scale: 30%

          Rank 2

Now all ranks are synchronized and the final image is available on each of them. With that, we have completed fully
distributed ray tracing in ``ARTIST``!

.. note::

   The images generated in this tutorial are for illustrative purposes, often with reduced resolution and without
   hyperparameter optimization. Therefore, they should not be taken as a measure of the quality of ``ARTIST``. Please
   see our publications for further information.
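
As a recap, the rank assignment in the example with two heliostat groups and three processes can be reproduced in plain
Python. This is a minimal sketch under stated assumptions rather than the logic used inside ``ARTIST``: the helpers
``assign_ranks_round_robin`` and ``split_heliostats`` are hypothetical, the strided split within a group is only one
scheme that matches the example, the ``world_size`` is assumed to be at least the number of groups, and one made-up
integer per heliostat stands in for its bitmap.

```python
def assign_ranks_round_robin(number_of_groups, world_size):
    """Map each group index to the ranks that compute it (hypothetical helper)."""
    ranks_per_group = {group: [] for group in range(number_of_groups)}
    for rank in range(world_size):
        # Rank r is assigned to group r modulo the number of groups.
        ranks_per_group[rank % number_of_groups].append(rank)
    return ranks_per_group


def split_heliostats(heliostats, ranks):
    """Deal the heliostats of one group out to the ranks of that group."""
    return {
        rank: heliostats[position :: len(ranks)]
        for position, rank in enumerate(ranks)
    }


groups = {0: ["AA28", "AC43"], 1: ["AA31", "AA39"]}
ranks_per_group = assign_ranks_round_robin(number_of_groups=2, world_size=3)

# Collect which heliostats each rank traces.
work_per_rank = {}
for group_index, ranks in ranks_per_group.items():
    work_per_rank.update(split_heliostats(groups[group_index], ranks))

# Stand-in for bitmaps: one made-up integer "flux" value per heliostat.
flux = {"AA28": 1, "AC43": 2, "AA31": 4, "AA39": 8}

# Before synchronization, each rank only holds what it computed locally.
local_sum = {
    rank: sum(flux[name] for name in names)
    for rank, names in work_per_rank.items()
}

# A single global sum reduction leaves the same total on every rank.
global_sum = sum(local_sum.values())
```

Here, ``ranks_per_group`` assigns ranks 0 and 2 to group 0 and rank 1 to group 1, and ``work_per_rank`` places
``AA28`` on rank 0, ``AC43`` on rank 2, and both ``AA31`` and ``AA39`` on rank 1, matching the example in the text.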