Dataset - 2023.2 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2023.2 English
  • Input file: a randomly generated CSV file of 10,000,000 lines (about 1 GB), similar to L2/demos/text/dup_match/data/test.csv, is used as the test input.
  • Demo execution time: 8,215.56 s.
  • Baseline (Dedupe Python: https://github.com/dedupeio/dedupe) execution time: 35,030.751 s.
  • Acceleration ratio: about 4.3X (35,030.751 s / 8,215.56 s ≈ 4.26).
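
A test input like the one described in the first bullet could be produced with a short script. The following is only a sketch: the column layout is an assumption (this page names only the Site name, Address, and Zip fields; Id and Phone are hypothetical additions), the word lists are made up, and the real test.csv may contain more columns and different value formats.

```python
import csv
import io
import random

# Assumed column layout: only "Site name", "Address", and "Zip" are named in
# this document; "Id" and "Phone" are hypothetical additions.
FIELDS = ["Id", "Site name", "Address", "Zip", "Phone"]

# Made-up vocabularies used to assemble plausible-looking values.
NAME_PARTS = ["Sunrise", "Lakeview", "Hope", "Maple", "Little Stars"]
SITE_KINDS = ["Child Care Center", "Head Start", "Academy", "Learning Center"]
STREETS = ["Main St", "Oak Ave", "Lake Shore Dr", "Western Ave"]


def random_row(row_id, rng):
    """Build one synthetic record as a dict keyed by FIELDS."""
    phone = f"{rng.randint(200, 999)}-{rng.randint(200, 999)}-{rng.randint(0, 9999):04d}"
    return {
        "Id": row_id,
        "Site name": f"{rng.choice(NAME_PARTS)} {rng.choice(SITE_KINDS)}",
        "Address": f"{rng.randint(1, 9999)} {rng.choice(STREETS)}",
        "Zip": str(rng.randint(60001, 62999)),
        "Phone": phone,
    }


def write_random_csv(out, num_rows, seed=0):
    """Write a header line plus num_rows synthetic rows to the text stream out."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row_id in range(num_rows):
        writer.writerow(random_row(row_id, rng))


def generate_csv_text(num_rows, seed=0):
    """Convenience wrapper that returns the CSV content as a string."""
    buf = io.StringIO()
    write_random_csv(buf, num_rows, seed)
    return buf.getvalue()
```

To write a full-size file, open the target with `open("random_input.csv", "w", newline="")` (the file name is a placeholder) and pass the handle to `write_random_csv(handle, 10_000_000)`. Rows are streamed one at a time, so memory use stays flat regardless of the row count.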

Note

1. The baseline version was run on an Intel(R) Xeon(R) CPU E5-2690 v4 clocked at 2.60GHz.
2. The training result of the baseline includes self.predicate=((TfidfNGramCanopyPredicate: (0.8, Site name), TfidfTextCanopyPredicate: (0.8, Address)), (SimplePredicate: (alphaNumericPredicate, Site name), TfidfTextCanopyPredicate: (0.8, Site name)), (SimplePredicate: (wholeFieldPredicate, Site name), SimplePredicate: (wholeFieldPredicate, Zip))).
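
The predicate in note 2 is a tuple of three rules, each pairing two predicates. One way to read such a structure is that two records become a candidate pair when they share a blocking key under at least one rule, and a rule's key combines the keys of both of its predicates. The sketch below illustrates that reading for the third rule only (whole-field match on Site name and on Zip). It is a simplified stand-in written for this page, not the Dedupe library's implementation: `whole_field`, `compound_keys`, and `candidate_pairs` are hypothetical names, and the TF-IDF canopy predicates are not modeled.

```python
from collections import defaultdict
from itertools import combinations


def whole_field(value):
    """Simplified stand-in: the entire field value is the only blocking key."""
    return (value,) if value else ()


def compound_keys(record, parts):
    """Combine the keys of every (predicate, field) part of one rule.

    If any part yields no key, the rule yields no key for this record.
    """
    if not parts:
        return []
    keys = [""]
    for predicate, field in parts:
        keys = [f"{k}|{p}" for k in keys for p in predicate(record[field])]
    return keys


def candidate_pairs(records, rules):
    """Return id pairs that share a key under at least one rule (logical OR)."""
    pairs = set()
    for parts in rules:
        blocks = defaultdict(set)
        for rec_id, record in records.items():
            for key in compound_keys(record, parts):
                blocks[key].add(rec_id)
        for ids in blocks.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs


# Third rule from note 2: whole-field Site name AND whole-field Zip.
RULES = [
    [(whole_field, "Site name"), (whole_field, "Zip")],
]

# Made-up records for illustration only.
records = {
    1: {"Site name": "Hope Academy", "Zip": "60601"},
    2: {"Site name": "Hope Academy", "Zip": "60601"},
    3: {"Site name": "Hope Academy", "Zip": "60622"},
}

print(candidate_pairs(records, RULES))
```

Records 1 and 2 agree on both fields and land in the same block, while record 3 differs in Zip and is left out. Only pairs inside a block are compared further, which is why blocking keeps the comparison count far below the all-pairs total.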