Tests are flaky across runs, but the invocation is missing the blaze flag --runs_per_test_detects_flakes, so blaze is reporting the target as failed instead of flaky. Blaze flag description