TPC-H data set features

This post summarizes some of the features of the TPC-H data sets generated by the TPC-H data generator (

I will keep updating this post as I get to play with more tables in the TPC-H benchmark.


  • The generator takes a scale factor as input (-s). The following is a rough guide to the sizes (credit to Holger at MIT):
    • SF 10 (very small data set, still going to be > 3 GB)
    • SF 30 (small, 12 GB)
    • SF 100 (normal)
    • SF 1000 (pretty big)
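
To get a feel for what these scale factors mean in rows, here is a minimal sketch. The per-table base counts are my reading of the TPC-H specification, not something measured for this post, and the lineitem figure is only approximate. `estimated_rows` is a hypothetical helper name.

```python
# Rows per table at SF 1, taken from the TPC-H specification (an assumption
# of this sketch, not a measurement from this post).
# lineitem is approximate: roughly 6 million rows per unit of scale factor.
BASE_ROWS = {
    "supplier": 10_000,
    "part": 200_000,
    "partsupp": 800_000,
    "customer": 150_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,
}

# nation and region do not grow with the scale factor.
FIXED_ROWS = {"nation": 25, "region": 5}


def estimated_rows(scale_factor):
    """Return an estimated row count per table for the given scale factor."""
    rows = {table: count * scale_factor for table, count in BASE_ROWS.items()}
    rows.update(FIXED_ROWS)
    return rows


if __name__ == "__main__":
    for sf in (10, 30, 100, 1000):
        print(f"SF {sf}: about {estimated_rows(sf)['lineitem']:,} lineitem rows")
```

The on-disk size depends on which tables you generate and how you store them, so treat these as row-count estimates only.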

Lineitem table

  • Decimal values are incremented in steps of 0.01, so every decimal column has a precision of 2 decimal places.
  • There are 4 combinations of l_returnflag and l_linestatus: “A F”, “N F”, “N O”, and “R F”.
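
Both observations can be checked against the generated file. Below is a rough sketch that assumes the usual generator output: one row per line, pipe-delimited, with columns in schema order. `summarize_lineitem` is a hypothetical helper, and the sample rows are made up for illustration, not copied from a real data set.

```python
from decimal import Decimal

# 0-based column positions in a lineitem row, assuming TPC-H schema order.
L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX = 4, 5, 6, 7
L_RETURNFLAG, L_LINESTATUS = 8, 9


def summarize_lineitem(rows):
    """Return (set of flag/status pairs, max decimal places) for pipe-delimited rows."""
    pairs = set()
    max_places = 0
    for row in rows:
        fields = row.rstrip("\n").split("|")
        pairs.add((fields[L_RETURNFLAG], fields[L_LINESTATUS]))
        for idx in (L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX):
            # The exponent of a parsed Decimal is minus the number of decimal places.
            exponent = Decimal(fields[idx]).as_tuple().exponent
            max_places = max(max_places, -exponent)
    return pairs, max_places


# Made-up rows in the lineitem layout, for illustration only.
sample = [
    "1|10|20|1|17|2116.80|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|NONE|TRUCK|x|",
    "2|11|21|1|36|4559.76|0.09|0.06|R|F|1994-01-16|1993-11-22|1994-01-23|NONE|MAIL|y|",
]

if __name__ == "__main__":
    pairs, places = summarize_lineitem(sample)
    print(sorted(pairs), places)
```

To run it on real data, pass an open file instead of the sample list, for example `summarize_lineitem(open("lineitem.tbl"))`. The file name is an assumption, so adjust it to wherever your generator wrote its output.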