List of AI News about machine learning data leakage
| Time | Details |
|---|---|
| 2025-12-09 03:40 | Python random.seed() Integer Sign Bug: Identical RNG Streams for Positive and Negative Seeds Exposed<br>According to Andrej Karpathy on Twitter, Python's random.seed() function produces identical random number generator (RNG) streams when seeded with integers of equal magnitude and opposite sign, such as 3 and -3. The cause lies in the CPython source code, which applies abs() to integer seeds, discarding the sign, so both values initialize the generator to the same internal state [Source: Karpathy Twitter, Python random docs, CPython GitHub]. This can lead to subtle but serious errors in AI and machine learning workflows, such as data leakage between train and test sets when the sign of a seed is used to differentiate splits, because both splits then draw from the same random sequence. The random module's documentation guarantees only that identical seeds yield identical sequences, not that different seeds produce distinct streams. The pitfall shows why understanding library implementations matters for avoiding reproducibility and data contamination issues in AI model training and evaluation. |
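
The behavior described in the item above can be checked with a short script, and one possible way around it is to derive per-split seeds from a label instead of from the sign. This is a minimal sketch, not a method taken from the cited sources: `first_draws` and `split_seed` are hypothetical helper names introduced here, and hashing the label with SHA-256 is an assumption chosen for illustration.

```python
import hashlib
import random


def first_draws(seed, n=5):
    """Return the first n floats from a fresh generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def split_seed(base_seed, split_name):
    """Hypothetical helper: derive a non-negative integer seed from a base
    seed and a split label, so the sign of the seed is never relied on."""
    payload = f"{base_seed}:{split_name}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big")


# Compare the streams for seeds of equal magnitude and opposite sign.
# On CPython builds that discard the sign of integer seeds, as the item
# reports, this comparison is True.
print("3 and -3 give the same stream:", first_draws(3) == first_draws(-3))

# Label-derived seeds differ, so the two splits draw from different streams.
train_seed = split_seed(3, "train")
test_seed = split_seed(3, "test")
print("train and test streams match:", first_draws(train_seed) == first_draws(test_seed))
```

Deriving seeds from a label keeps the runs reproducible, since the same base seed and label always map to the same integer, while removing any dependence on how the library treats negative values.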