https://github.com/LiveCodeBench/LiveCodeBench/blob/28fef95ea8c9f7a547c8329f2cd3d32b92c1fa24/lcb_runner/evaluation/compute_code_execution_metrics.py#L15
Why does this line assert output == generation instead of using the input or comparing the two directly? Am I misunderstanding something?