Optimized Load (eng)

In large applications, it is important to optimize data loading to shorten the time. One of the things you can use for this is optimized load which loads QVD file in a very efficient way.



It is very common for us to store information from various data sources in intermediate QVD files. A QVD file can contain a table with multiple columns and as many rows as needed. The purpose of a QVD file is both to customize the data source from multiple readings of the same information from several different apps and to be able to apply transformation logic in several steps.


When we then read data from a QVD file into an app, it can be read in two ways - optimized and non-optimized load. Optimized load is considerably faster but puts great demands on how the reading is done. In order for a data load to be optimized, we can only:

  • Filter data with a single exists() function

  • Rename columns

  • Read a subset of the columns

  • Keep/Join towards another table

  • Read a field several times

  • Read data distinct

Thus we can NOT:

  • Calculate a new column (e.g. ColumnA + 1 as Column3)

  • Use where statements in addition to a single exists() function

  • Use ApplyMap()

As an example here, we will test read a QVD file with 131 million lines & 13 columns (just about 4GB of data) for taxi journeys in New York. The version we are testing to load data into is QlikView Desktop April 2020 SR1.


To load data as optimized, we use:

Data:
LOAD VendorID, 
     tpep_pickup_datetime, 
     tpep_dropoff_datetime, 
     passenger_count, 
     trip_distance, 
     pickup_longitude, 
     pickup_latitude, 
     dropoff_longitude, 
     dropoff_latitude, 
     payment_type, 
     fare_amount, 
     tip_amount, 
     trip_time
FROM [F:\Temp\NY_ShortFull.qvd] (qvd);

You can see that the data load is optimized as it appears in the log for the loading:

Data << NY_ShortFull (row-based qvd optimized) 131,165,043 Lines fetched

To force the data load not to be optimized without adding any major complexity (which in itself impair the performance), we add a simple where statement:

Data:
LOAD VendorID, 
     tpep_pickup_datetime, 
     tpep_dropoff_datetime, 
     passenger_count, 
     trip_distance, 
     pickup_longitude, 
     pickup_latitude, 
     dropoff_longitude, 
     dropoff_latitude, 
     payment_type, 
     fare_amount, 
     tip_amount, 
     trip_time
FROM [F:\Temp\NY_ShortFull.qvd] (qvd) where 1=1;

For non-optimized load, nothing special appears in the log:

Data << NY_ShortFull 131,165,043 Lines fetched