machine learning data leakage AI News List | Blockchain.News

List of AI News about machine learning data leakage

Time: 2025-12-09 03:40

Python random.seed() Integer Sign Bug: Identical RNG Streams for Positive and Negative Seeds Exposed

According to Andrej Karpathy on Twitter, Python's random.seed() produces identical random number generator (RNG) streams when seeded with positive and negative integers of the same magnitude, such as 3 and -3. The cause is in the CPython source code, which takes the absolute value of an integer seed, discarding the sign, so both values initialize the generator to the same internal state [Source: Karpathy Twitter, Python random docs, CPython GitHub]. This can cause subtle but serious errors in AI and machine learning workflows: if the sign of a seed is used to differentiate splits, for example seed for the training set and -seed for the test set, both splits draw the same random sequence, which can leak data between train and test sets. The random module's documentation guarantees only that identical seeds yield identical sequences, not that different seeds produce distinct streams. The pitfall shows why it is worth checking how a library actually handles seeds before relying on them for reproducibility or for keeping training and evaluation data separate.
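The behavior described above can be checked in a few lines. The sketch below is illustrative only: the helper names stream and split_rng are hypothetical and do not come from Karpathy's post or the Python docs. The workaround shown, seeding each split with a distinct string tag rather than a signed integer, is one possible approach and is an assumption of this example, not a recommendation from the source. It relies on the documented fact that random.seed() accepts str seeds; it makes a collision between splits very unlikely but does not formally guarantee distinct streams.

```python
import random


def stream(seed, n=5):
    """Return the first n floats from a fresh generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# In CPython, integer seeds that differ only in sign yield the same stream.
sign_collision = stream(3) == stream(-3)
print("seed 3 and seed -3 give the same stream:", sign_collision)


def split_rng(base_seed, split_name):
    """Hypothetical helper: one generator per named split.

    The split is encoded in a string seed instead of the integer's sign,
    so "train" and "test" do not collapse onto the same seed value.
    """
    return random.Random(f"{base_seed}:{split_name}")


train_draws = [split_rng(3, "train").random() for _ in range(1)]
test_draws = [split_rng(3, "test").random() for _ in range(1)]
print("train and test draws differ:", train_draws != test_draws)
```

Calling split_rng(3, "train") twice returns generators that produce the same sequence, so runs stay reproducible while the two splits are seeded independently of integer sign.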
