def predict_choice(model, tokenizer, prompt_fmt, max_new_tokens=8):
    pred = None
    # Stream tokens one at a time and stop at the first choice letter
    for t in generate_text_basic_stream_cache(
        model=model,
        token_ids=prompt_fmt,
        max_new_tokens=max_new_tokens,
        eos_token_id=tokenizer.eos_token_id,
    ):
        answer = tokenizer.decode(t.squeeze(0).tolist())
        for letter in answer:
            letter = letter.upper()
            if letter in "ABCD":
                pred = letter
                break
        if pred:
            break
    return pred
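The letter-scanning step in predict_choice can be tried without a model. The sketch below is a minimal stand-in, not part of the original code: first_choice_letter is a hypothetical helper that takes already-decoded text chunks (an assumption replacing the token stream and tokenizer) and applies the same character-by-character check.

```python
def first_choice_letter(chunks, choices="ABCD"):
    """Hypothetical helper: return the first character that,
    once uppercased, is one of `choices`; None if there is none."""
    for chunk in chunks:          # each chunk stands in for one decoded token
        for ch in chunk:
            ch = ch.upper()
            if ch in choices:
                return ch
    return None


print(first_choice_letter([" ", "b", ")"]))       # B
print(first_choice_letter(["The answer is B"]))   # A (the "a" in "answer")
print(first_choice_letter(["42", "!"]))           # None
```

The second call shows a limitation of this matching rule: any a, b, c, or d inside an ordinary word counts as a choice, so the approach is most reliable when the model replies with the letter first.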
def rubric_prompt(instruction, reference_answer, model_answer):
    rubric = (
        "You are a fair judge assistant. You will be "
        "given an instruction, a reference answer, and "
        "a candidate answer to evaluate, according to "
        "the following rubric:\n\n"
        "1: The response fails to address the "
        "instruction, providing irrelevant, incorrect, "
        "or excessively verbose content.\n"
        "2: The response partially addresses the "
        "instruction but contains major errors, "
        "omissions, or irrelevant details.\n"
        "3: The response addresses the instruction to "
        "some degree but is incomplete, partially "
        "correct, or unclear in places.\n"
        "4: The response mostly adheres to the "
        "instruction, with only minor errors, "
        "omissions, or lack of clarity.\n"
        "5: The response fully adheres to the "
        "instruction, providing a clear, accurate, and "
        "relevant answer in a concise and efficient "
        "manner.\n\n"
        "Now here is the instruction, the reference "
        "answer, and the response.\n"
    )
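The listing above stops after the rubric text. As a rough illustration of how a rubric and the three inputs might be joined into one judge prompt, here is a minimal sketch. The function name assemble_judge_prompt, the section labels, the closing "Score (1-5):" cue, and the sample inputs are all assumptions made for this example, not the original implementation.

```python
def assemble_judge_prompt(rubric, instruction, reference_answer, model_answer):
    """Hypothetical helper: append the three inputs to a rubric string.
    The section labels and the final score cue are assumed, not original."""
    return (
        f"{rubric}\n"
        f"Instruction:\n{instruction}\n\n"
        f"Reference answer:\n{reference_answer}\n\n"
        f"Candidate answer:\n{model_answer}\n\n"
        "Score (1-5):"
    )


# Placeholder inputs, for illustration only
demo = assemble_judge_prompt(
    rubric="Rate the candidate answer from 1 to 5.",
    instruction="What is 2 + 2?",
    reference_answer="4",
    model_answer="2 + 2 equals 4.",
)
print(demo)
```

Ending the prompt with an explicit score cue is one way to nudge a judge model toward leading with a number, which can make the reply easier to parse.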
The candidate answer directly addresses the question, correctly applies the given premises, and concisely states that a penguin would be able to fly. It is accurate, relevant, and clear.