`ARTIST` Tutorial: Distributed Ray Tracing

Note

You can find the corresponding Python script for this tutorial here: https://github.com/ARTIST-Association/ARTIST/blob/main/tutorials/02_heliostat_raytracing_distributed_tutorial.py

This tutorial gives a brief introduction to how ARTIST sets up its distributed environment, using distributed ray tracing as the example.

It is best if you already know about the following processes in ARTIST:

How to load a scenario.
Aligning heliostats.
Performing heliostat ray tracing to generate a flux density image on a target area.

If you need help with any of these, have a look at our tutorial on heliostat raytracing.

Initial Setup

ARTIST is designed for parallel computation. To enable this (even when considering different types of heliostats with different kinematic and actuator configurations) we require HeliostatGroups. Detailed information on heliostat groups and how ARTIST is designed can be found in this description of what is happening under the hood in ARTIST.

Therefore, before we do anything we need to make sure we know how many heliostat groups are present. This can be achieved by calling the get_number_of_heliostat_groups_from_hdf5() function in the Scenario class:

number_of_heliostat_groups = Scenario.get_number_of_heliostat_groups_from_hdf5(
    scenario_path=scenario_path
)

In distributed ray tracing, the heliostat-tracing process is distributed and parallelized using Distributed Data Parallel (DDP). With DDP, not only are the heliostat groups computed in parallel, but the data samples within each group can also be computed in parallel. We will see exactly how this works later in the tutorial.

The Distributed Environment

Before we start raytracing, we need to set up the distributed environment. Based on the available devices, the environment is initialized with the appropriate communication backend. For computation on GPUs, the nccl backend, which is optimized for NVIDIA GPUs, is chosen. For computation on CPUs, gloo is used as the backend. If the program is run without the intention of being distributed, the world size is set to 1 and the only rank is therefore 0.

All of this setup is handled automatically via:

with setup_distributed_environment(
    number_of_heliostat_groups=number_of_heliostat_groups,
    device=device,
) as ddp_setup:

Note: The rest of the tutorial occurs within this with block. This ensures that the distributed environment is running during execution and will be automatically cleaned up afterwards. The dictionary ddp_setup contains all distributed environment parameters.
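The backend choice and the single-process fallback described above can be sketched in plain Python. This is only an illustration of the decision logic, not the implementation of setup_distributed_environment(): the helper names choose_backend and read_world_size_and_rank are hypothetical, and reading WORLD_SIZE and RANK from the environment assumes a launcher such as torchrun that exports these variables.

```python
def choose_backend(device_type: str) -> str:
    """Return a communication backend name for a device type (illustrative only)."""
    # NCCL targets NVIDIA GPUs; Gloo is used for CPU computation.
    return "nccl" if device_type == "cuda" else "gloo"


def read_world_size_and_rank(environment: dict) -> tuple:
    """Read world size and rank, falling back to a single-process run."""
    # Assumption: the launcher exports WORLD_SIZE and RANK. Without them,
    # the run is treated as non-distributed: world size 1, rank 0.
    world_size = int(environment.get("WORLD_SIZE", "1"))
    rank = int(environment.get("RANK", "0"))
    return world_size, rank


backend = choose_backend("cpu")
world_size, rank = read_world_size_and_rank({})
print(backend, world_size, rank)
```

In a real run the dictionary passed to the hypothetical read_world_size_and_rank would be os.environ; an empty dictionary is used here to show the non-distributed fallback.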

Mapping between active heliostats, target areas and incident ray directions

ARTIST offers the flexibility to activate and deactivate certain heliostats in the scenario, to have some heliostats aim at one target area while others aim elsewhere, and to use different incident ray directions for different heliostats in the same alignment and raytracing process. Differing incident ray directions for different heliostats may not make much sense in the usual operation of a power plant, but they are very useful for calibration tasks.

To map each heliostat with its designated target area and incident ray direction you can use the following mapping structure:

# heliostat_target_light_source_mapping = [
#     ("heliostat_1", "target_name_2", incident_ray_direction_tensor_1),
#     ("heliostat_2", "target_name_2", incident_ray_direction_tensor_2),
#     (...)
# ]

However, in this tutorial we want to consider all heliostats and therefore set our mapping to None:

heliostat_target_light_source_mapping = None

In this case it is still possible to set a specific default target area index and a default incident ray direction later. If these are not provided, all heliostats are assigned to the first target area found in the scenario with an incident ray direction of “north”, i.e., the light source is positioned directly in the south.

Distributed Raytracing

Now we are almost ready to start the distributed raytracing. First, however, we need to set the resolution of the generated bitmap and create a tensor to store the final result:

bitmap_resolution = torch.tensor([256, 256])

combined_bitmaps_per_target = torch.zeros(
    (
        scenario.target_areas.number_of_target_areas,
        bitmap_resolution[index_mapping.unbatched_bitmap_e],
        bitmap_resolution[index_mapping.unbatched_bitmap_u],
    ),
    device=device,
)

Now the heliostat groups come into play. We need to consider each heliostat group separately: in a distributed setting these groups can be computed in parallel, otherwise they are processed sequentially. Therefore, the entire distributed raytracing process takes place within a for loop:

for heliostat_group_index in ddp_setup[config_dictionary.groups_to_ranks_mapping][
    ddp_setup[config_dictionary.rank]
]:
    heliostat_group = scenario.heliostat_field.heliostat_groups[
        heliostat_group_index
    ]

Within this loop, the first step is to determine which heliostats are being considered (“activated”) and which target areas are being used – this is achieved using the heliostat_target_light_source_mapping that we defined earlier:

(
    active_heliostats_mask,
    target_area_mask,
    incident_ray_directions,
) = scenario.index_mapping(
    heliostat_group=heliostat_group,
    string_mapping=heliostat_target_light_source_mapping,
    device=device,
)

We can then activate the heliostats as in the previous tutorial on single heliostat raytracing:

# For each index 0 indicates a deactivated heliostat and 1 an activated one.
# An integer greater than 1 indicates that the heliostat in this index is regarded multiple times.
heliostat_group.activate_heliostats(
    active_heliostats_mask=active_heliostats_mask, device=device
)

and also align the surfaces for all activated heliostats with the incident ray direction:

heliostat_group.align_surfaces_with_incident_ray_directions(
    aim_points=scenario.target_areas.centers[target_area_mask],
    incident_ray_directions=incident_ray_directions,
    active_heliostats_mask=active_heliostats_mask,
    device=device,
)
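To make the meaning of the active heliostats mask more concrete, the following sketch derives such a mask from a list of heliostat names, following the convention from the comment above: 0 for a deactivated heliostat, 1 for an activated one, and a larger integer for a heliostat that is regarded multiple times. The function build_active_heliostats_mask is hypothetical and uses plain lists instead of tensors; it is not how scenario.index_mapping() is implemented.

```python
from typing import Optional


def build_active_heliostats_mask(
    group_heliostat_names: list,
    mapped_heliostat_names: Optional[list],
) -> list:
    """Count how often each heliostat of a group appears in a mapping."""
    if mapped_heliostat_names is None:
        # Assumption: without a mapping, every heliostat is active exactly once.
        return [1] * len(group_heliostat_names)
    # One entry per heliostat in the group: its number of occurrences in the mapping.
    return [mapped_heliostat_names.count(name) for name in group_heliostat_names]


group = ["AA28", "AC43"]
mask_all = build_active_heliostats_mask(group, None)
mask_repeated = build_active_heliostats_mask(group, ["AC43", "AC43"])
print(mask_all)       # [1, 1]
print(mask_repeated)  # [0, 2]
```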

Now we are ready to create a distributed HeliostatRayTracer. In this case it is important to provide the world_size, the rank, the batch_size, and a random_seed:

ray_tracer = HeliostatRayTracer(
    scenario=scenario,
    heliostat_group=heliostat_group,
    world_size=ddp_setup[config_dictionary.heliostat_group_world_size],
    rank=ddp_setup[config_dictionary.heliostat_group_rank],
    batch_size=heliostat_group.number_of_active_heliostats,
    random_seed=ddp_setup[config_dictionary.heliostat_group_rank],
    bitmap_resolution=bitmap_resolution,
)

In this tutorial the batch_size is equal to the number of active heliostats. The batch_size determines how many heliostats are parallelized within this group’s raytracing process. If the number of active heliostats is high and your GPUs do not have enough memory capacity, you can reduce the batch_size to prevent CUDA out of memory errors during runtime. However, this also means slightly longer runtimes, as the batches within each group are then also computed sequentially.
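The trade-off between batch_size and runtime can be illustrated with a small sketch that splits the active heliostats into sequential batches. The helper split_into_batches is hypothetical and only mirrors the arithmetic; the actual batching inside HeliostatRayTracer may be organized differently.

```python
import math


def split_into_batches(number_of_active_heliostats: int, batch_size: int) -> list:
    """Return index ranges of the batches that would be processed one after another."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        range(start, min(start + batch_size, number_of_active_heliostats))
        for start in range(0, number_of_active_heliostats, batch_size)
    ]


# One batch: all five heliostats are traced in parallel.
single_batch = split_into_batches(5, 5)
# Smaller batches: less memory per step, but three sequential steps.
small_batches = split_into_batches(5, 2)
print(len(single_batch), len(small_batches))  # 1 3
assert len(small_batches) == math.ceil(5 / 2)
```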

Now we are ready to perform raytracing! This is still performed on a per-heliostat basis with the function trace_rays():

bitmaps_per_heliostat = ray_tracer.trace_rays(
    incident_ray_directions=incident_ray_directions,
    active_heliostats_mask=active_heliostats_mask,
    target_area_mask=target_area_mask,
    device=device,
)

Consider an example scenario with two heliostat groups that have two heliostats each:

Group 0: AA28, AC43
Group 1: AA31, AA39

The world_size is three, which means there are three ranks: rank 0, rank 1 and rank 2. The ranks are distributed among the groups in a round-robin fashion, so Group 0 is computed on rank 0 and rank 2, while Group 1 is computed on rank 1. Since Group 0 has two ranks available, this group can perform nested parallelization: heliostat 0 of Group 0, named AA28, is handled by rank 0, and heliostat 1 of Group 0, named AC43, is handled by rank 2. Group 1 has two heliostats but only one rank assigned, so no nested parallelization is possible. The ray tracer method trace_rays() produces bitmaps per heliostat.

Figure: Bitmaps per heliostat, with one panel each for rank 0, rank 1 and rank 2.
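The round-robin assignment from the example scenario can be reproduced with a few lines of plain Python. This is a sketch under stated assumptions: assign_ranks_to_groups and assign_heliostats_to_ranks are hypothetical helpers, the sketch only covers the case of at least as many ranks as groups, and the round-robin distribution of heliostats within a group is an assumption chosen to match the example, not a description of ARTIST internals.

```python
def assign_ranks_to_groups(world_size: int, number_of_groups: int) -> dict:
    """Map each group index to the ranks that compute it, in round-robin order."""
    ranks_per_group = {group: [] for group in range(number_of_groups)}
    for rank in range(world_size):
        ranks_per_group[rank % number_of_groups].append(rank)
    return ranks_per_group


def assign_heliostats_to_ranks(heliostat_names: list, group_ranks: list) -> dict:
    """Distribute the heliostats of one group over the ranks of that group."""
    assignment = {rank: [] for rank in group_ranks}
    for index, name in enumerate(heliostat_names):
        assignment[group_ranks[index % len(group_ranks)]].append(name)
    return assignment


ranks_per_group = assign_ranks_to_groups(world_size=3, number_of_groups=2)
group_0 = assign_heliostats_to_ranks(["AA28", "AC43"], ranks_per_group[0])
group_1 = assign_heliostats_to_ranks(["AA31", "AA39"], ranks_per_group[1])
print(ranks_per_group)  # {0: [0, 2], 1: [1]}
print(group_0)          # {0: ['AA28'], 2: ['AC43']}
print(group_1)          # {1: ['AA31', 'AA39']}
```

Group 0 ends up with two ranks and one heliostat on each, while both heliostats of Group 1 share rank 1.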

However, now there may be multiple heliostats in the scenario all focusing on the same target. In this case, we need to determine the resulting flux image for that target, i.e., the combined result of all heliostats focusing on this target. This can be achieved with the get_bitmaps_per_target() function:

bitmaps_per_target = ray_tracer.get_bitmaps_per_target(
    bitmaps_per_heliostat=bitmaps_per_heliostat,
    target_area_mask=target_area_mask,
    device=device,
)

Since there may also be multiple heliostats in one group, we need to make sure the results from all heliostats are considered in this bitmap:

combined_bitmaps_per_target = combined_bitmaps_per_target + bitmaps_per_target
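Conceptually, combining bitmaps per target means summing the per-heliostat bitmaps that share a target index. The sketch below shows this idea with nested lists and tiny 2x2 bitmaps; sum_bitmaps_per_target is a hypothetical helper for illustration and does not reflect how get_bitmaps_per_target() is implemented.

```python
def sum_bitmaps_per_target(
    bitmaps_per_heliostat: list, target_indices: list, number_of_targets: int
) -> list:
    """Sum per-heliostat bitmaps into one bitmap per target area."""
    height = len(bitmaps_per_heliostat[0])
    width = len(bitmaps_per_heliostat[0][0])
    # Start with an empty (all-zero) bitmap for every target area.
    bitmaps_per_target = [
        [[0.0] * width for _ in range(height)] for _ in range(number_of_targets)
    ]
    for bitmap, target in zip(bitmaps_per_heliostat, target_indices):
        for row in range(height):
            for column in range(width):
                bitmaps_per_target[target][row][column] += bitmap[row][column]
    return bitmaps_per_target


# Two heliostats, both aimed at target area 0; target area 1 receives no flux.
bitmap_a = [[1.0, 0.0], [0.0, 0.0]]
bitmap_b = [[0.0, 2.0], [0.0, 0.0]]
per_target = sum_bitmaps_per_target([bitmap_a, bitmap_b], [0, 0], number_of_targets=2)
print(per_target[0])  # [[1.0, 2.0], [0.0, 0.0]]
print(per_target[1])  # [[0.0, 0.0], [0.0, 0.0]]
```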

All heliostats in this example are aimed at the same target area, called the multi_focus_tower, which is the first target area in this scenario. This means that all bitmaps in the combined_bitmaps_per_target tensor are empty except the one at index 0 (only this one will be plotted from now on).

Figure: Bitmaps per target area (on the `multi_focus_tower`), with one panel each for rank 0, rank 1 and rank 2.

Notice how only the bitmap on rank 1 is actually a combined bitmap of two individual fluxes. This is because both of those fluxes, from heliostats AA31 and AA39, were computed on the same rank. Since the ranks have not been synchronized yet, each rank only holds the information it computed itself: neither the ray tracing results within each group nor the combined results across groups have been synchronized. Therefore, to obtain the final bitmap per target we need to perform an all_reduce. One final all_reduce is sufficient, but for the purpose of this tutorial it is interesting to look at the intermediate results and the nested all_reduce.

if ddp_setup[config_dictionary.is_nested]:
    torch.distributed.all_reduce(
        combined_bitmaps_per_target,
        op=torch.distributed.ReduceOp.SUM,
        group=ddp_setup[config_dictionary.process_subgroup],
    )

Figure: Bitmaps per target area (on the `multi_focus_tower`) after nested reduce, with one panel each for rank 0, rank 1 and rank 2.

This all_reduce is performed per process subgroup, meaning it only reduces the results of heliostats within the respective group. It can be skipped because the global all_reduce would handle it as well. The final bitmap on each target is reduced by:

if ddp_setup[config_dictionary.is_distributed]:
    torch.distributed.all_reduce(
        combined_bitmaps_per_target, op=torch.distributed.ReduceOp.SUM
    )

Figure: Bitmaps per target area (on the `multi_focus_tower`) after final reduce, with one panel each for rank 0, rank 1 and rank 2.
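The effect of a sum all_reduce can be simulated without any distributed setup. In the sketch below, a single number per rank stands in for that rank's bitmap, and the hypothetical helper all_reduce_sum mimics what torch.distributed.all_reduce with ReduceOp.SUM does for the participating ranks. Both calls start from the same unsynchronized per-rank values so that the subgroup result and the global result can be compared side by side; the numbers themselves are made up for illustration.

```python
def all_reduce_sum(values_per_rank: dict, ranks: list) -> dict:
    """Give every participating rank the sum over all participating ranks."""
    total = sum(values_per_rank[rank] for rank in ranks)
    reduced = dict(values_per_rank)  # non-participating ranks keep their value
    for rank in ranks:
        reduced[rank] = total
    return reduced


# Unsynchronized state: rank 0 and rank 2 each traced one heliostat of Group 0,
# rank 1 already holds the sum of both heliostats of Group 1.
local_values = {0: 1.0, 1: 5.0, 2: 2.0}

# Reduce within the subgroup of Group 0 only (ranks 0 and 2).
after_subgroup = all_reduce_sum(local_values, [0, 2])
# Reduce across all ranks in a single global step.
after_global = all_reduce_sum(local_values, [0, 1, 2])

print(after_subgroup)  # {0: 3.0, 1: 5.0, 2: 3.0}
print(after_global)    # {0: 8.0, 1: 8.0, 2: 8.0}
```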

Now all ranks are synchronized and we have the final image shared across them. With that we have completed fully distributed raytracing in ARTIST!

Note

The images generated in this tutorial are for illustrative purposes, often with reduced resolution and without hyperparameter optimization. Therefore, they should not be taken as a measure of the quality of ARTIST. Please see our publications for further information.
