This post summarizes some features of the TPC-H data sets produced by the TPC-H data generator (http://www.tpc.org/information/current_specifications.asp).
I will keep updating it as I work with more of the tables in the TPC-H benchmark.
- The generator takes a scale factor (SF) as input (-s). Rough guidelines for choosing one (credit to Holger at MIT):
  - SF 10 (very small data set, but still more than 3 GB)
  - SF 30 (small, 12 GB)
  - SF 100 (normal)
  - SF 1000 (pretty big)
- Decimal values are generated in steps of 0.01, so every decimal column has exactly two decimal places.
- There are only four combinations of l_returnflag and l_linestatus: "A F", "N F", "N O", and "R F".
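One way to picture the 0.01 step size is to draw a whole number of cents and then shift the decimal point two places. The sketch below is an illustration of that idea, not dbgen's actual code. `random_money` is a hypothetical helper, and the -999.99 to 9999.99 range is an assumption based on how the TPC-H specification describes c_acctbal.

```python
import random
from decimal import Decimal


def random_money(rng: random.Random, low_cents: int, high_cents: int) -> Decimal:
    # Pick a uniform whole number of cents, then move the decimal point
    # two places left so the result always has exactly two decimal places.
    return Decimal(rng.randint(low_cents, high_cents)).scaleb(-2)


rng = random.Random(0)
# Assumed c_acctbal range: -999.99 .. 9999.99, expressed in cents.
balances = [random_money(rng, -99_999, 999_999) for _ in range(5)]
for balance in balances:
    print(balance)
```

When loading these columns into an application, keeping them as `Decimal` (or as integer cents) preserves the values exactly. Most multiples of 0.01 have no exact binary floating-point representation.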
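The reason there are only four flag combinations follows from the LINEITEM generation rules. The sketch below encodes those rules as I understand them from the TPC-H specification, so treat the constants as assumptions. CURRENTDATE is 1995-06-17. Order dates run from 1992-01-01 up to 151 days before 1998-12-31. Ship date is the order date plus 1 to 121 days, and receipt date is the ship date plus 1 to 30 days. l_returnflag is "R" or "A" (random) if the item was received by CURRENTDATE and "N" otherwise. l_linestatus is "O" if the item shipped after CURRENTDATE and "F" otherwise. `flag_options` and `all_combinations` are hypothetical helpers that enumerate every reachable pair instead of sampling randomly.

```python
from datetime import date, timedelta

# Assumed constants from the TPC-H spec's description of LINEITEM generation.
CURRENT_DATE = date(1995, 6, 17)
START_DATE = date(1992, 1, 1)
END_DATE = date(1998, 12, 31)


def flag_options(ship_date: date, receipt_date: date) -> set:
    """Every (l_returnflag, l_linestatus) pair the rules allow for one line item."""
    if receipt_date <= CURRENT_DATE:
        return_flags = ("R", "A")  # the generator picks one of these at random
    else:
        return_flags = ("N",)
    line_status = "O" if ship_date > CURRENT_DATE else "F"
    return {(flag, line_status) for flag in return_flags}


def all_combinations() -> set:
    # Ship dates span (first order date + 1) .. (last order date + 121).
    first_ship = START_DATE + timedelta(days=1)
    last_ship = END_DATE - timedelta(days=151) + timedelta(days=121)
    combos = set()
    ship_date = first_ship
    while ship_date <= last_ship:
        for receipt_lag in range(1, 31):
            receipt_date = ship_date + timedelta(days=receipt_lag)
            combos |= flag_options(ship_date, receipt_date)
        ship_date += timedelta(days=1)
    return combos


print(sorted(all_combinations()))
# [('A', 'F'), ('N', 'F'), ('N', 'O'), ('R', 'F')]
```

Because the receipt date is always later than the ship date, an item that was already received ("R" or "A") must have shipped before CURRENTDATE, so it can only be "F". "R O" and "A O" are therefore impossible, which leaves the four combinations listed above.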