Adventures with TinyGrad
  1. Appendix A: helpers.py
  • Adventures with TinyGrad
  • 0 - Introduction
  • 1 - UOps
  • 2 - View and ShapeTracker
  • 3 - The Pattern Matcher
  • 4 - The .arange() insanity
  • Appendix A: helpers.py
  • Appendix B - UOp Summary
  • Misc 1 - elf.py and the ELF format
  • Misc 2 - CUDA Runtime library
  • Misc 3 - Bounty hunt

Appendix A: helpers.py

tinygrad/helpers.py contains a bunch of helper functions that are used all over TinyGrad.

Go over the list, it will make reading the code easier. When in doubt, always refer to the code.

dedup(x: Iterable[T]) -> list[T]
  • Description: Removes duplicate elements from an iterable while preserving the original order of the first appearance.
  • Input: An iterable x.
  • Output: A list containing unique elements from x in their original order.
  • Example: dedup([1, 3, 2, 3, 1]) returns [1, 3, 2].
  • Usage: Used throughout tinygrad to ensure uniqueness in lists where order matters, like kernel arguments, variable lists, or UOp sources.
  • Gotchas: Requires hashable elements.
argfix(*x) -> tuple
  • Description: Normalizes function arguments. It allows a function to accept arguments either as a sequence (list/tuple) or as individual arguments.
  • Input: Variable positional arguments *x.
  • Output: A tuple.
  • Example:
    • argfix(1, 2, 3) returns (1, 2, 3).
    • argfix((1, 2, 3)) or argfix([1, 2, 3]) returns (1, 2, 3).
  • Usage: Commonly used in Tensor methods (e.g., reshape, pad, permute, expand) to allow flexible input like t.reshape(2, 3) or t.reshape((2, 3)).
  • Gotchas: Raises a ValueError if the first argument is a list/tuple and there are more arguments (e.g., argfix((1, 2), 3) is invalid).
argsort(x: Union[list, tuple])
  • Description: Returns the indices that would sort the input iterable x. Mimics numpy.argsort.
  • Input: An iterable x.
  • Output: A sequence of the same type as x (e.g., list or tuple) containing integer indices.
  • Example: argsort([30, 10, 20]) returns [1, 2, 0].
  • Usage: Used internally, for example, in gradient.py to compute the inverse permutation needed for the gradient of permute. Also used in View.__add__ for sorting strides during view merging/inversion.
  • Gotchas: Returns a list or a tuple, depending on the input type. Not intended to be used on Tensors.
all_same(items: Union[tuple[T, ...], list[T]]) -> bool
  • Description: Checks if all elements in a non-empty list or tuple are identical (equal to the first element).
  • Input: A list or tuple.
  • Output: bool.
  • Example: all_same([1, 1, 1]) returns True. all_same((2, 3, 2)) returns False.
  • Usage: Used for assertions and checks, e.g., ensuring source UOps have the same shape (UOp.st), checking if all elements in a VCONST are the same (UOp.const), verifying multi-device tensors have consistent properties (tensor.py, multi.py), checking kernel arguments (Kernel.linearize).
  • Gotchas: Returns True for empty or single-element lists/tuples.
all_int(t: Sequence[Any]) -> TypeGuard[tuple[int, ...]]
  • Description: Checks if all elements in a sequence are of type int. Uses TypeGuard for static analysis benefits.
  • Input: A Sequence (like list or tuple).
  • Output: bool. If True, type checkers can infer the input is specifically a sequence of int.
  • Example: all_int([1, 2, 3]) returns True. all_int((1, 2.0)) returns False.
  • Usage: Crucial for distinguishing between concrete shapes (tuples of int) and symbolic shapes (tuples containing Variable or UOp). Used extensively in ShapeTracker, View, and Tensor methods to determine if operations like size calculation or certain reshapes are possible without symbolic evaluation.
colored(st, color: Optional[str], background=False) -> str
  • Description: Adds ANSI escape codes to a string st to color it for terminal output. Supports foreground and background colors. Bright colors can be selected by passing the color name in uppercase (e.g., “RED”).
  • Input: String st, optional color name color (e.g., “red”, “GREEN”), boolean background.
  • Output: A string, potentially with ANSI color codes.
  • Example: colored("Error", "RED") returns "\u001b[91mError\u001b[0m".
  • Usage: Used for enhancing debug prints, progress bars (tqdm), visualizing kernel shapes (Kernel.colored_shape), and highlighting warnings/errors.
  • Gotchas: Only effective in terminals that support ANSI escape codes.
colorize_float(x: float) -> str
  • Description: Formats a float x (intended as a ratio, often speedup/slowdown) and colors it: green if < 0.75, yellow if 0.75 <= x <= 1.15, red if > 1.15.
  • Input: A float x.
  • Output: A colored string representation (e.g., " 0.50x").
  • Example: colorize_float(0.5) returns a green string. colorize_float(1.2) returns a red string.
  • Usage: Used in profiling output (helpers.Profiling) to visually indicate performance ratios.
time_to_str(t: float, w=8) -> str
  • Description: Converts a time duration t (in seconds) into a human-readable string, automatically selecting units (s, ms, us) based on magnitude. Formats the output to a specified width w.
  • Input: Time t in seconds (float), optional field width w (int).
  • Output: A formatted string representing the time.
  • Example: time_to_str(0.0015) might return " 1.50ms". time_to_str(1.2) might return " 1.20s ".
  • Usage: Used extensively in debug output (DEBUG>=2), profiling (Profiling, Timing), and beam search logging to display kernel runtimes and compile times concisely.
ansistrip(s: str) -> str
  • Description: Removes ANSI escape codes (primarily color/formatting codes) from a string s.
  • Input: A string s.
  • Output: The string s with ANSI codes removed.
  • Example: ansistrip("\u001b[91mError\u001b[0m") returns "Error".
  • Usage: Used by ansilen to calculate visible length and by to_function_name to sanitize potentially colored strings before creating identifiers.
ansilen(s: str) -> int
  • Description: Calculates the visible length of a string s by first removing ANSI escape codes using ansistrip.
  • Input: A string s.
  • Output: The length of the string without ANSI codes (int).
  • Example: ansilen(colored("hello", "red")) returns 5.
  • Usage: Useful for formatting output containing colored text, ensuring correct alignment based on visible characters (e.g., in Kernel.colored_shape(pad=...)).
make_tuple(x: Union[int, Sequence[int]], cnt: int) -> tuple[int, ...]
  • Description: Creates a tuple of integers of length cnt. If x is an integer, it’s repeated cnt times. If x is a sequence, it’s converted to a tuple (its length should match cnt).
  • Input: An integer or sequence of integers x, and a target count cnt.
  • Output: A tuple of integers of length cnt.
  • Example: make_tuple(3, 2) returns (3, 3). make_tuple([1, 2], 2) returns (1, 2).
  • Usage: Normalizes arguments like stride, padding which can be a single value or a sequence.
  • Gotchas: Does not check if len(x) == cnt if x is not an int.
flatten(l: Iterable[Iterable[T]]) -> list[T]
  • Description: Flattens an iterable of iterables by one level.
  • Input: An iterable of iterables l.
  • Output: A flattened list.
  • Example: flatten([[1, 2], [3, 4]]) returns [1, 2, 3, 4].
  • Usage: Widespread utility function used throughout the codebase.
fully_flatten(l)
  • Description: Recursively flattens nested lists/tuples into a single list. Handles numpy 0-dim arrays.
  • Input: A potentially nested structure l.
  • Output: A completely flattened list.
  • Example: fully_flatten([1, [2, [3, 4]]]) returns [1, 2, 3, 4].
  • Usage: Primarily used by _frompy to prepare data for struct.pack.
fromimport(mod, frm)
  • Description: Dynamically imports frm from module mod. Equivalent to from mod import frm, returning frm.
  • Input: Module name mod (str), item name frm (str).
  • Output: The imported object.
  • Example: fromimport("os", "path") is equivalent to from os import path.
strip_parens(fst: str) -> str
  • Description: Heuristically removes the outermost matching parentheses from a string.
  • Input: A string fst.
  • Output: The string with outermost parentheses removed, if they match.
  • Example: strip_parens("(a+b)") returns "a+b". strip_parens("(a+b)*(c+d)") returns "(a+b)*(c+d)".
  • Usage: Used by C-style renderers to simplify generated code expressions.
ceildiv(num, amt) -> int
  • Description: Computes ceil(num / amt) using integer arithmetic.
  • Input: Numerator num, denominator amt.
  • Output: Ceiling of the division as an int.
  • Example: ceildiv(5, 2) returns 3.
  • Usage: Ubiquitous for calculating sizes, bounds, counts that require rounding up.
round_up(num: int, amt: int) -> int
  • Description: Rounds num up to the nearest multiple of amt.
  • Input: Integer num, integer amt.
  • Output: num rounded up to the nearest multiple of amt.
  • Example: round_up(5, 4) returns 8.
  • Usage: Used for alignment and ensuring sizes are multiples of factors (memory, block sizes).
lo32(x: Any) -> Any, hi32(x: Any) -> Any
  • Description: Extract the lower (lo32) or upper (hi32) 32 bits of a (presumably 64-bit) integer or symbolic integer (sint).
  • Input: A value x (typically an integer or symbolic integer).
  • Output: The lower or upper 32 bits of x.
  • Example: lo32(0x12345678ABCDEF) returns 0xABCDEF. hi32(0x12345678ABCDEF) returns 0x12345678.
  • Usage: Used in low-level code generation or runtime interactions needing 32-bit components of 64-bit values (e.g., specific hardware instructions, register packing).
  • Gotchas: Uses bitwise AND (&) and right shift (>>). The Any type hint means behavior with non-integer/sint types is undefined.
data64(data: Any) -> tuple[Any, Any], data64_le(data: Any) -> tuple[Any, Any]
  • Description: Split a 64-bit value into two 32-bit components. data64 returns (high_32, low_32). data64_le returns (low_32, high_32) (Little-Endian order).
  • Input: A 64-bit value data.
  • Output: A tuple of two 32-bit components.
  • Example: data64(0x12345678ABCDEF) returns (0x12345678, 0xABCDEF).
  • Usage: Similar to lo32/hi32, used for low-level interactions requiring split 64-bit values, potentially considering endianness.
getbits(value: int, start: int, end: int) -> int
  • Description: Extracts bits start through end (inclusive) from value.
  • Input: Integer value, bit positions start and end.
  • Output: The extracted bits as an integer.
  • Example: getbits(0b110101, 1, 3) returns 0b101 (5).
i2u(bits: int, value: int) -> int
  • Description: Converts a signed bits-bit integer value to its unsigned representation.
  • Input: Bit width bits, signed integer value.
  • Output: Unsigned integer representation.
  • Example: i2u(8, -1) returns 255.
merge_dicts(ds: Iterable[dict[T, U]]) -> dict[T, U]
  • Description: Merges dictionaries, asserting that shared keys have identical values.
  • Input: An iterable of dictionaries ds.
  • Output: A merged dictionary.
  • Example: merge_dicts([{"a": 1}, {"b": 2}]) returns {"a": 1, "b": 2}.
  • Usage: Consistently combines var_vals dictionaries during graph processing and JIT compilation.
partition(itr: Iterable[T], fxn: Callable[[T], bool]) -> tuple[list[T], list[T]]
  • Description: Splits iterable itr into two lists based on predicate fxn.
  • Input: An iterable itr and a predicate function fxn.
  • Output: A tuple of two lists: (items_where_fxn_is_true, items_where_fxn_is_false).
  • Example: partition([1, 2, 3, 4], lambda x: x % 2 == 0) returns ([2, 4], [1, 3]).
  • Usage: General utility for separating elements based on a condition, used in various places like separating UOps by type.
unwrap(x: Optional[T]) -> T
  • Description: Asserts x is not None and returns it, satisfying type checkers.
  • Input: A potentially None value x.
  • Output: The value x if it’s not None.
  • Example: unwrap(some_dict.get("key")) raises an assertion error if the key doesn’t exist.
  • Usage: Widely used for safety and to satisfy type checkers, especially when accessing optional values that should be present.
get_single_element(x: list[T]) -> T
  • Description: Asserts list x has exactly one element and returns it.
  • Input: A list x.
  • Output: The single element in the list.
  • Example: get_single_element([42]) returns 42.
  • Usage: Used when a list is expected to contain exactly one item, often in graph processing or optimization passes.
get_child(obj, key)
  • Description: Accesses nested attributes/items in obj using a dot-separated key string.
  • Input: An object obj and a string key with dot notation (e.g., “attr1.attr2.0”).
  • Output: The nested attribute or item.
  • Example: get_child({"a": {"b": [10, 20]}}, "a.b.1") returns 20.
  • Usage: Used for accessing deeply nested structures, especially in debugging and serialization.
  • Note: Handles both attribute access (for objects) and item access (for dicts/lists), converting numeric strings to integers for indexing.
word_wrap(x, wrap=80)
  • Description: Basic line wrapping at wrap characters.
  • Input: A string x and optional wrap length wrap.
  • Output: The string with newlines inserted every wrap characters.
  • Example: word_wrap("abcdefghijklmnopqrstuvwxyz", 10) returns "abcdefghij\nklmnopqrst\nuvwxyz".
  • Usage: Used for formatting debug output and error messages.
  • Note: Doesn’t respect word boundaries, simply inserts newlines every wrap characters.
pluralize(st: str, cnt: int) -> str
  • Description: Formats count cnt with noun st, adding ‘s’ if cnt != 1.
  • Input: A string st (noun) and an integer cnt.
  • Output: A formatted string with the count and properly pluralized noun.
  • Example: pluralize("item", 1) returns "1 item". pluralize("item", 2) returns "2 items".
  • Usage: Used for generating human-readable log messages and status reports.
  • Note: Uses simple ‘s’ pluralization, which doesn’t handle irregular plurals.
polyN(x: T, p: list[float]) -> T
  • Description: Evaluates polynomial p at x using Horner’s method.
  • Input: A value x and a list of coefficients p in descending order of power.
  • Output: The polynomial evaluation result.
  • Example: polyN(2, [1, 2, 3]) computes 1*2^2 + 2*2^1 + 3*2^0 = 1*4 + 2*2 + 3*1 = 11.
  • Usage: Used in transcendental.py for efficient function approximations where x is often a UOp.
  • Note: Coefficients are in descending order (highest power first).
to_function_name(s: str) -> str
  • Description: Converts string s to a valid C/Python function name by removing ANSI codes and hex-encoding invalid characters.
  • Input: A string s.
  • Output: A valid function name string.
  • Example: to_function_name("kernel(a+b)") might return "kernel_a_b_".
  • Usage: Generates valid, deterministic kernel function names for code generation.
  • Note: Results are cached for performance. Handles ANSI color codes by removing them.
getenv(key: str, default=0)
  • Description: Gets environment variable key, casting to type of default.
  • Input: Environment variable name key and optional default value.
  • Output: The environment variable value cast to the type of default.
  • Example: getenv("DEBUG", 0) returns the integer value of the DEBUG environment variable, or 0 if not set.
  • Usage: Pervasive configuration mechanism for tinygrad features and debug levels.
  • Gotchas: Caches results for performance, which means runtime changes to environment variables aren’t seen after first access.
temp(x: str, append_user: bool = False) -> str
  • Description: Creates a path string in the system’s temporary directory, optionally making it user-specific.
  • Input: A string x (path component) and optional flag append_user.
  • Output: A full path string in the temporary directory.
  • Example: temp("tinygrad/cache") might return /tmp/tinygrad/cache.
  • Usage: Locating caches, downloads, profiling outputs in a system-appropriate temporary location.
  • Gotchas: If append_user is True, appends the username to make the path user-specific.