BEIR Evaluation Guideline
Data Download
Please download the latest BEIR dataset [link].
Data Preprocess
Of the 18 datasets in BEIR, 15 can be loaded directly via OpenMatch. However, a small preprocessing step is required before loading the hotpotqa, fever, and robust04 datasets:
hotpotqa and fever need to filter out the ‘metadata’ in the queries.jsonl file, otherwise an error will be reported when loading.
robust04 requires the following filtering operations in the corpus.jsonl file:
## robust04
text = re.sub(r"[^A-Za-z0-9=(),!?\'\`]", " ", data['text'])
text = " ".join(text.split())
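The two preprocessing steps above can be sketched in Python as follows. This is a minimal sketch, not the OpenMatch implementation: the helper names (`drop_metadata`, `clean_robust04_text`, `clean_robust04_record`, `rewrite_jsonl`) are hypothetical, and it assumes each line of `queries.jsonl` and `corpus.jsonl` is a standalone JSON object, with the document body stored under a `text` key as in the snippet above. Only the regular expression is taken from the guideline itself.

```python
import json
import re

# Character filter copied from the robust04 rule in this guideline.
_ROBUST04_DISALLOWED = re.compile(r"[^A-Za-z0-9=(),!?\'\`]")


def drop_metadata(record: dict) -> dict:
    """Return a copy of a queries.jsonl record without its 'metadata' key."""
    return {key: value for key, value in record.items() if key != "metadata"}


def clean_robust04_text(text: str) -> str:
    """Replace disallowed characters with spaces, then collapse whitespace."""
    text = _ROBUST04_DISALLOWED.sub(" ", text)
    return " ".join(text.split())


def clean_robust04_record(record: dict) -> dict:
    """Return a copy of a corpus.jsonl record with its 'text' field cleaned."""
    cleaned = dict(record)
    cleaned["text"] = clean_robust04_text(record["text"])
    return cleaned


def rewrite_jsonl(src_path: str, dst_path: str, transform) -> None:
    """Apply `transform` to every JSON line of src_path and write to dst_path.

    Writing to a separate file (an assumption of this sketch) keeps the
    original download untouched in case the filter needs to be rerun.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = transform(json.loads(line))
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
```

For example, `rewrite_jsonl("queries.jsonl", "queries.filtered.jsonl", drop_metadata)` would produce a filtered copy for hotpotqa or fever, and passing `clean_robust04_record` with `corpus.jsonl` would do the same for robust04. The output file name here is illustrative; the filtered file would then need to take the place of the original before loading.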
Evaluation
Here we use the trec-covid dataset as an example to describe the evaluation method. To evaluate all 18 datasets at once, run the shell script OpenMatch/scripts/BEIR/eval_beir.sh instead.
## *************************************
## Model
export OUTPUT_DIR=/data/private/experiments # Path to store evaluation results.
export train_job_name=cocodr-base-msmarco # Folder of model files (Placed under OUTPUT_DIR by default).
## Dataset
export DATA_DIR=/data/private/dataset/beir # Path to store dataset files.
export dataset_name=trec-covid # Folder of dataset files (Placed under DATA_DIR by default).
## *************************************
## GPU setup
TOT_CUDA="1,2,3,4"
CUDAs=(${TOT_CUDA//,/ })
CUDA_NUM=${#CUDAs[@]}
PORT="1234"
## *************************************
export q_max_len=64
export p_max_len=128
export infer_job_name=inference.${train_job_name}.${dataset_name}
## *************************************
CUDA_VISIBLE_DEVICES=${TOT_CUDA} OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${CUDA_NUM} --master_port=${PORT} -m openmatch.driver.beir_eval_pipeline \
--data_dir ${DATA_DIR}/${dataset_name} \
--model_name_or_path ${OUTPUT_DIR}/${train_job_name} \
--output_dir ${OUTPUT_DIR}/${infer_job_name} \
--query_template "<text>" \
--doc_template "<title> <text>" \
--q_max_len ${q_max_len} \
--p_max_len ${p_max_len} \
--per_device_eval_batch_size 4096 \
--dataloader_num_workers 1 \
--fp16 \
--use_gpu \
--overwrite_output_dir \
--use_split_search \
--max_inmem_docs 5000000 \