Measure AI Answer Quality with a Small Evaluation Set | GAUAB