artist.util.env
===============

.. py:module:: artist.util.env


Attributes
----------

.. autoapisummary::

   artist.util.env.log


Classes
-------

.. autoapisummary::

   artist.util.env.DdpSetup


Functions
---------

.. autoapisummary::

   artist.util.env.initialize_ddp_environment
   artist.util.env.create_subgroups_for_nested_ddp
   artist.util.env.setup_distributed_environment
   artist.util.env.distribute_groups_among_ranks
   artist.util.env.get_device


Module Contents
---------------

.. py:class:: DdpSetup

   Bases: :py:obj:`TypedDict`


   A typed dictionary describing the full distributed setup.


   .. py:attribute:: device
      :type:  torch.device


   .. py:attribute:: is_distributed
      :type:  bool


   .. py:attribute:: is_nested
      :type:  bool


   .. py:attribute:: rank
      :type:  int


   .. py:attribute:: world_size
      :type:  int


   .. py:attribute:: process_subgroup
      :type:  torch.distributed.ProcessGroup | None


   .. py:attribute:: groups_to_ranks_mapping
      :type:  dict[int, list[int]]


   .. py:attribute:: heliostat_group_rank
      :type:  int


   .. py:attribute:: heliostat_group_world_size
      :type:  int


   .. py:attribute:: ranks_to_groups_mapping
      :type:  dict[int, list[int]]


.. py:data:: log

   A logger for the environment.


.. py:function:: initialize_ddp_environment(device: torch.device | None = None) -> tuple[torch.device, bool, int, int]

   Set up the distributed environment.

   Based on the available devices, the outer process group is initialized with the
   appropriate backend. For computation on GPUs, the ``nccl`` backend, which is
   optimized for NVIDIA GPUs, is chosen. For computation on CPUs, ``gloo`` is used
   as the backend. If the program is not run in distributed mode, ``world_size``
   is set to 1 and the only rank is 0.

   Parameters
   ----------
   device : torch.device | None
       The device on which to perform computations or load tensors and models (default is None).
       If None, ``ARTIST`` will automatically select the most appropriate
       device (CUDA or CPU) based on availability and OS.

   Returns
   -------
   torch.device
       The device for each rank.
   bool
       Distributed mode enabled or disabled.
   int
       The rank of the current process.
   int
       The world size or total number of processes.
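
   The backend choice and the single-process defaults described above can be
   sketched in plain Python. This is a minimal illustration, not the ``ARTIST``
   implementation: ``choose_backend`` and ``read_rank_and_world_size`` are
   hypothetical helpers, and reading ``RANK`` and ``WORLD_SIZE`` from the
   environment is an assumption based on how launchers such as ``torchrun``
   export them.

   .. code-block:: python

      import os


      def choose_backend(device_type: str) -> str:
          """Pick the process-group backend for a device type (hypothetical helper)."""
          # nccl targets NVIDIA GPUs; gloo is the CPU fallback.
          return "nccl" if device_type == "cuda" else "gloo"


      def read_rank_and_world_size(environ=None) -> tuple[bool, int, int]:
          """Return (is_distributed, rank, world_size) from launcher variables.

          Assumption: the launcher exports RANK and WORLD_SIZE. If either is
          missing, the run is treated as a single process with rank 0.
          """
          env = os.environ if environ is None else environ
          if "RANK" not in env or "WORLD_SIZE" not in env:
              return False, 0, 1
          return True, int(env["RANK"]), int(env["WORLD_SIZE"])


      # Single-process run: no launcher variables are set.
      is_distributed, rank, world_size = read_rank_and_world_size({})
      backend = choose_backend("cpu")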


.. py:function:: create_subgroups_for_nested_ddp(rank: int, groups_to_ranks_mapping: dict[int, list[int]]) -> tuple[int, int, torch.distributed.ProcessGroup | None, dict[int, list[int]]]

   Assign the current process (rank) to a subgroup based on a predefined group assignment map.

   Parameters
   ----------
   rank : int
       The rank of the current process.
   groups_to_ranks_mapping : dict[int, list[int]]
       The mapping from heliostat groups to ranks.

   Returns
   -------
   int
       The rank within the heliostat group.
   int
       The world size of the heliostat group.
   torch.distributed.ProcessGroup | None
       The distributed process group.
   dict[int, list[int]]
       The mapping from ranks to heliostat groups.
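
   The bookkeeping behind this assignment can be sketched without
   ``torch.distributed``. The helper below is hypothetical and only mirrors the
   documented return values; it omits the creation of the actual process group,
   and the choice of the first matching group for a rank is an assumption.

   .. code-block:: python

      def locate_rank_in_groups(
          rank: int, groups_to_ranks_mapping: dict[int, list[int]]
      ) -> tuple[int, int, dict[int, list[int]]]:
          """Return (group_rank, group_world_size, ranks_to_groups_mapping)."""
          # Invert the mapping: for every rank, collect the groups it handles.
          ranks_to_groups: dict[int, list[int]] = {}
          for group, ranks in groups_to_ranks_mapping.items():
              for member in ranks:
                  ranks_to_groups.setdefault(member, []).append(group)

          # Assumption: the position of the rank within the first group that
          # contains it is its rank inside that heliostat group.
          for ranks in groups_to_ranks_mapping.values():
              if rank in ranks:
                  return ranks.index(rank), len(ranks), ranks_to_groups
          raise ValueError(f"Rank {rank} is not assigned to any group.")


      # Four ranks shared between two heliostat groups (nested case).
      example_mapping = {0: [0, 2], 1: [1, 3]}
      group_rank, group_world_size, inverse = locate_rank_in_groups(3, example_mapping)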


.. py:function:: setup_distributed_environment(number_of_heliostat_groups: int, device: torch.device | None = None) -> collections.abc.Generator[DdpSetup, None, None]

   Set up the distributed environment.

   Parameters
   ----------
   number_of_heliostat_groups : int
       The number of distinct heliostat groups in the scenario.
   device : torch.device | None
       The device on which to perform computations or load tensors and models (default is None).
       If None, ``ARTIST`` will automatically select the most appropriate
       device (CUDA or CPU) based on availability and OS.

   Yields
   ------
   DdpSetup
       A typed dictionary describing the full distributed setup, containing:
       ``device``, ``is_distributed``, ``is_nested``, ``rank``, ``world_size``,
       ``process_subgroup``, ``groups_to_ranks_mapping``, ``heliostat_group_rank``,
       ``heliostat_group_world_size``, and ``ranks_to_groups_mapping``.
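
   Because the function is a generator that yields a single setup dictionary,
   one plausible way to consume it is as a context manager. The sketch below is
   self-contained and does not import ``ARTIST``: ``SetupSketch`` and
   ``setup_sketch`` are hypothetical stand-ins with a reduced set of keys, and
   the use of ``contextlib.contextmanager`` is an assumption about how such a
   generator is typically wrapped.

   .. code-block:: python

      from contextlib import contextmanager
      from typing import TypedDict


      class SetupSketch(TypedDict):
          """Reduced stand-in for the documented setup dictionary."""

          is_distributed: bool
          is_nested: bool
          rank: int
          world_size: int
          groups_to_ranks_mapping: dict[int, list[int]]


      lifecycle_events: list[str] = []


      @contextmanager
      def setup_sketch(number_of_heliostat_groups: int):
          """Yield a single-process setup and run cleanup when the block exits."""
          setup: SetupSketch = {
              "is_distributed": False,
              "is_nested": False,
              "rank": 0,
              "world_size": 1,
              # With one process, rank 0 handles every heliostat group.
              "groups_to_ranks_mapping": {
                  group: [0] for group in range(number_of_heliostat_groups)
              },
          }
          lifecycle_events.append("setup")
          try:
              yield setup
          finally:
              # A real implementation would tear down the process group here.
              lifecycle_events.append("teardown")


      with setup_sketch(number_of_heliostat_groups=3) as ddp_setup:
          groups = ddp_setup["groups_to_ranks_mapping"]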


.. py:function:: distribute_groups_among_ranks(world_size: int, number_of_heliostat_groups: int) -> tuple[dict[int, list[int]], bool]

   Distribute groups among ranks in round-robin fashion.

   If there are fewer ranks than groups, some ranks receive multiple groups.
   If there are more ranks than groups, some groups are handled by multiple ranks, enabling nested distribution.

   Parameters
   ----------
   world_size : int
       Total number of processes in the global process group.
   number_of_heliostat_groups : int
       The number of heliostat groups.

   Returns
   -------
   dict[int, list[int]]
       The dictionary mapping heliostat groups to ranks.
   bool
       Indicates whether the distributed setup is nested or not.
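
   The two cases above can be illustrated with a small round-robin sketch.
   ``distribute_round_robin`` is a hypothetical helper written for this
   example; the exact assignment order used by ``ARTIST`` may differ.

   .. code-block:: python

      def distribute_round_robin(
          world_size: int, number_of_heliostat_groups: int
      ) -> tuple[dict[int, list[int]], bool]:
          """Return (groups_to_ranks_mapping, is_nested) for a round-robin split."""
          mapping: dict[int, list[int]] = {
              group: [] for group in range(number_of_heliostat_groups)
          }
          if world_size <= number_of_heliostat_groups:
              # Not more ranks than groups: each group gets exactly one rank,
              # so some ranks may end up handling several groups.
              for group in range(number_of_heliostat_groups):
                  mapping[group].append(group % world_size)
              return mapping, False
          # More ranks than groups: ranks are dealt out across the groups,
          # so some groups are shared by several ranks (nested distribution).
          for rank in range(world_size):
              mapping[rank % number_of_heliostat_groups].append(rank)
          return mapping, True


      fewer_ranks = distribute_round_robin(world_size=2, number_of_heliostat_groups=3)
      more_ranks = distribute_round_robin(world_size=4, number_of_heliostat_groups=2)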


.. py:function:: get_device(device: torch.device | None = None) -> torch.device

   Get the correct GPU device type for common operating systems, defaulting to CPU if no GPU is found.

   Parameters
   ----------
   device : torch.device | None
       The device on which to perform computations or load tensors and models (default is None).
       If None, ``ARTIST`` will automatically select the most appropriate
       device (CUDA or CPU) based on availability and OS. MPS (for Mac) is not supported due to
       limitations in torch.

   Returns
   -------
   torch.device
       The device.