artist.util.env

Attributes

log

A logger for the environment.

Classes

DdpSetup

A typed dictionary describing the distributed setup.

Functions

initialize_ddp_environment(→ tuple[torch.device, bool, ...)

Set up the distributed environment.

create_subgroups_for_nested_ddp(→ tuple[int, int, ...)

Assign the current process (rank) to a subgroup based on a predefined group assignment map.

setup_distributed_environment(...)

Set up the distributed environment.

distribute_groups_among_ranks(→ tuple[dict[int, ...)

Distribute groups among ranks in round-robin fashion.

get_device(→ torch.device)

Get the correct GPU device type for common operating systems, defaulting to CPU if none is found.

Module Contents

class artist.util.env.DdpSetup

Bases: TypedDict

A typed dictionary describing the distributed setup.

device: torch.device
is_distributed: bool
is_nested: bool
rank: int
world_size: int
process_subgroup: torch.distributed.ProcessGroup | None
groups_to_ranks_mapping: dict[int, list[int]]
heliostat_group_rank: int
heliostat_group_world_size: int
ranks_to_groups_mapping: dict[int, list[int]]

artist.util.env.log

A logger for the environment.

artist.util.env.initialize_ddp_environment(device: torch.device | None = None) → tuple[torch.device, bool, int, int]

Set up the distributed environment.

The outer process group is initialized with a backend that matches the available devices. For computation on GPUs, the nccl backend, which is optimized for NVIDIA GPUs, is chosen. For computation on CPUs, gloo is used as the backend. If the program is not run in distributed mode, world_size is set to 1 and the only rank is 0.

Parameters

device : torch.device | None

The device on which to perform computations or load tensors and models (default is None). If None, ARTIST will automatically select the most appropriate device (CUDA or CPU) based on availability and OS.

Returns

torch.device

The device for each rank.

bool

Distributed mode enabled or disabled.

int

The rank of the current process.

int

The world size or total number of processes.
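The backend choice and the single-process fallback described above can be sketched without torch. This is a minimal illustration, not the ARTIST implementation: choose_backend and read_rank_and_world_size are hypothetical helpers, and reading RANK and WORLD_SIZE from the environment assumes a torchrun-style launcher. Treating a world size above 1 as "distributed" is likewise an assumption.

```python
import os
from collections.abc import Mapping


def choose_backend(device_type: str) -> str:
    """Return "nccl" for CUDA devices and "gloo" for everything else."""
    return "nccl" if device_type == "cuda" else "gloo"


def read_rank_and_world_size(
    env: Mapping[str, str] = os.environ,
) -> tuple[bool, int, int]:
    """Return (is_distributed, rank, world_size).

    Assumption: the launcher exports RANK and WORLD_SIZE. If they are
    missing, fall back to a single process with rank 0.
    """
    world_size = int(env.get("WORLD_SIZE", "1"))
    rank = int(env.get("RANK", "0"))
    return world_size > 1, rank, world_size
```

With an empty environment the helper reports a non-distributed run with rank 0 and world size 1, which matches the fallback described in the text.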

artist.util.env.create_subgroups_for_nested_ddp(rank: int, groups_to_ranks_mapping: dict[int, list[int]]) → tuple[int, int, torch.distributed.ProcessGroup | None, dict[int, list[int]]]

Assign the current process (rank) to a subgroup based on a predefined group assignment map.

Parameters

rank : int

The rank of the current process.

groups_to_ranks_mapping : dict[int, list[int]]

The mapping from heliostat groups to ranks.

Returns

int

The rank within the heliostat group.

int

The world size of the heliostat group.

torch.distributed.ProcessGroup | None

The distributed process group.

dict[int, list[int]]

The mapping from ranks to heliostat groups.
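As a rough illustration of the bookkeeping involved, the sketch below inverts a groups-to-ranks mapping and looks up a rank's position inside its group. Both helpers are hypothetical and leave out the creation of the torch.distributed process group. Using the list index as the rank within the group is an assumption, not something the reference states.

```python
def invert_mapping(groups_to_ranks: dict[int, list[int]]) -> dict[int, list[int]]:
    """Build the ranks-to-groups mapping from a groups-to-ranks mapping."""
    ranks_to_groups: dict[int, list[int]] = {}
    for group, ranks in groups_to_ranks.items():
        for rank in ranks:
            ranks_to_groups.setdefault(rank, []).append(group)
    return ranks_to_groups


def position_in_group(
    rank: int, groups_to_ranks: dict[int, list[int]]
) -> tuple[int, int]:
    """Return (rank within group, group size) for the first group holding rank.

    Assumption: the position in the rank list serves as the in-group rank.
    """
    for ranks in groups_to_ranks.values():
        if rank in ranks:
            return ranks.index(rank), len(ranks)
    raise ValueError(f"Rank {rank} is not assigned to any group.")
```

For the mapping {0: [0, 2], 1: [1, 3]}, global rank 2 is the second member of a two-rank group, so position_in_group returns (1, 2).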

artist.util.env.setup_distributed_environment(number_of_heliostat_groups: int, device: torch.device | None = None) → collections.abc.Generator[DdpSetup, None, None]

Set up the distributed environment.

Parameters

number_of_heliostat_groups : int

The number of distinct heliostat groups in the scenario.

device : torch.device | None

The device on which to perform computations or load tensors and models (default is None). If None, ARTIST will automatically select the most appropriate device (CUDA or CPU) based on availability and OS.

Yields

DdpSetup

A typed dictionary describing the full distributed setup, containing: device, is_distributed, is_nested, rank, world_size, process_subgroup, groups_to_ranks_mapping, heliostat_group_rank, heliostat_group_world_size, and ranks_to_groups_mapping.
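Because the function is annotated as a generator that yields a single DdpSetup, a caller needs to advance it once and close it when done. The sketch below shows that pattern with a stand-in generator so that it runs without torch or ARTIST. MiniSetup, fake_setup, run, and the events list are hypothetical, and the cleanup step after the yield is an assumption about what such a generator might do.

```python
from collections.abc import Generator
from typing import TypedDict


class MiniSetup(TypedDict):
    """Hypothetical stand-in holding a subset of the DdpSetup keys."""

    is_distributed: bool
    rank: int
    world_size: int


events: list[str] = []  # Records the order of setup and cleanup.


def fake_setup() -> Generator[MiniSetup, None, None]:
    """Stand-in generator: yield the setup once, then clean up."""
    events.append("init")
    try:
        yield {"is_distributed": False, "rank": 0, "world_size": 1}
    finally:
        events.append("cleanup")


def run() -> int:
    """Consume the generator and make sure its cleanup code runs."""
    generator = fake_setup()
    setup = next(generator)  # Runs the generator up to its yield.
    try:
        return setup["world_size"]
    finally:
        generator.close()  # Triggers the generator's finally block.
```

Closing the generator in a finally block keeps the cleanup reliable even if the work between next and close raises.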

artist.util.env.distribute_groups_among_ranks(world_size: int, number_of_heliostat_groups: int) → tuple[dict[int, list[int]], bool]

Distribute groups among ranks in round-robin fashion.

If there are fewer ranks than groups, some ranks receive multiple groups. If there are more ranks than groups, some groups are handled by multiple ranks, enabling nested distribution.

Parameters

world_size : int

Total number of processes in the global process group.

number_of_heliostat_groups : int

The number of heliostat groups.

Returns

dict[int, list[int]]

The dictionary mapping heliostat groups to ranks.

bool

Indicates whether the distributed setup is nested or not.
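The two cases described above can be sketched as a small pure-Python function. This is one possible reading of the round-robin rule, not the ARTIST source: distribute_round_robin is a hypothetical name, and treating an equal number of ranks and groups as non-nested is an assumption.

```python
def distribute_round_robin(
    world_size: int, number_of_groups: int
) -> tuple[dict[int, list[int]], bool]:
    """Map each group index to the ranks that handle it."""
    groups_to_ranks: dict[int, list[int]] = {
        group: [] for group in range(number_of_groups)
    }
    if world_size <= number_of_groups:
        # Fewer (or as many) ranks than groups: ranks take several groups.
        for group in range(number_of_groups):
            groups_to_ranks[group].append(group % world_size)
        is_nested = False
    else:
        # More ranks than groups: several ranks share one group.
        for rank in range(world_size):
            groups_to_ranks[rank % number_of_groups].append(rank)
        is_nested = True
    return groups_to_ranks, is_nested
```

With 2 ranks and 4 groups, rank 0 handles groups 0 and 2 while rank 1 handles groups 1 and 3. With 4 ranks and 2 groups, ranks 0 and 2 share group 0 and ranks 1 and 3 share group 1, so the setup is nested.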

artist.util.env.get_device(device: torch.device | None = None) → torch.device

Get the correct GPU device type for common operating systems, defaulting to CPU if none is found.

Parameters

device : torch.device | None

The device on which to perform computations or load tensors and models (default is None). If None, ARTIST will automatically select the most appropriate device (CUDA or CPU) based on availability and OS. MPS (for Mac) is not supported due to limitations in torch.

Returns

torch.device

The device.