/home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-composable-kernel/checkouts/3643/include/ck_tile/core/utility/persistent_async_input_scheduler.hpp Source File

Composable Kernel: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-composable-kernel/checkouts/3643/include/ck_tile/core/utility/persistent_async_input_scheduler.hpp Source File
persistent_async_input_scheduler.hpp
1 // Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
2 // SPDX-License-Identifier: MIT
3 
4 #pragma once
5 
6 #include <cstdint>
7 
8 namespace ck_tile {
9 
27 {
32 
36  uint32_t* chunk_signals = nullptr;
37 
42 
47 };
48 
49 } // namespace ck_tile
Scheduler for persistent GEMM kernels with asynchronous input streaming.
Definition: persistent_async_input_scheduler.hpp:27
uint32_t tiles_per_chunk_m
Number of M-dimension tiles grouped into each chunk. Grouping tiles balances synchronization overhead...
Definition: persistent_async_input_scheduler.hpp:31
int32_t tile_idx_pivot_m
Pivot offset for rotating the chunk assignment. Allows shifting which tiles map to which chunks,...
Definition: persistent_async_input_scheduler.hpp:41
uint32_t* chunk_signals
Device pointer to array of signal values (uint32_t), one per chunk. Producer sets signals to coordina...
Definition: persistent_async_input_scheduler.hpp:36
uint32_t num_chunks
Number of signal chunks allocated. Must equal ceil((tiles_m + tile_idx_pivot_m) / tiles_per_chunk_m)....
Definition: persistent_async_input_scheduler.hpp:46
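
The num_chunks requirement above, ceil((tiles_m + tile_idx_pivot_m) / tiles_per_chunk_m), can be sketched on the host as follows. Both helpers (compute_num_chunks and chunk_index_for_tile) are hypothetical names introduced here for illustration and are not part of ck_tile. The tile-to-chunk mapping in the second helper is an assumption inferred from the num_chunks formula; the member descriptions on this page are truncated and do not confirm it. The sketch also assumes that tiles_m plus the pivot is non-negative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of ck_tile): number of signal chunks needed,
// i.e. ceil((tiles_m + tile_idx_pivot_m) / tiles_per_chunk_m).
inline std::uint32_t compute_num_chunks(std::uint32_t tiles_m,
                                        std::int32_t tile_idx_pivot_m,
                                        std::uint32_t tiles_per_chunk_m)
{
    assert(tiles_per_chunk_m > 0);
    // Widen to 64 bits so the signed pivot can be added without overflow.
    const std::int64_t shifted = static_cast<std::int64_t>(tiles_m) + tile_idx_pivot_m;
    assert(shifted >= 0); // assumption: the pivot never shifts below zero
    const std::int64_t per_chunk = tiles_per_chunk_m;
    // Integer ceiling division.
    return static_cast<std::uint32_t>((shifted + per_chunk - 1) / per_chunk);
}

// ASSUMPTION (not confirmed by the source): an M-tile maps to the chunk
// obtained by shifting its index by the pivot and dividing by the chunk size.
// This is the mapping under which the num_chunks formula covers every tile.
inline std::uint32_t chunk_index_for_tile(std::uint32_t tile_idx_m,
                                          std::int32_t tile_idx_pivot_m,
                                          std::uint32_t tiles_per_chunk_m)
{
    assert(tiles_per_chunk_m > 0);
    const std::int64_t shifted = static_cast<std::int64_t>(tile_idx_m) + tile_idx_pivot_m;
    assert(shifted >= 0);
    return static_cast<std::uint32_t>(shifted / tiles_per_chunk_m);
}
```

For example, with tiles_m = 10, tile_idx_pivot_m = 2 and tiles_per_chunk_m = 4, compute_num_chunks yields 3, and under the assumed mapping the last tile (index 9) lands in chunk 2, the final entry of a three-element chunk_signals array.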