FRONTBASE TECHNICAL NOTES
FRONTBASE TECHNICAL NOTES

Indexes and FrontBase™

By: Geert Clemmensen, Frontline Software,November 09, 2000

FrontBase™ offers two strategies for maintaining indexes:

  1. The default which is very space efficient and works well for tables with up to a few hundred thousand rows. This mode is called PRESERVE SPACE.
  2. The alternative method which is fast when searching through millions of rows with the trade-off being more disk space consumption. This mode is called PRESERVE TIME.

The PRESERVE SPACE mode is, as implied by its name, very disk space efficient. An index on a table, no matter the number of colums, costs less than 5 bytes per row. If you have a table with 100,000 rows, creating an index on this table will thus increase the disk footprint with less than 500 KB. The reason for this memory efficiency is that column values are not stored together with the index information (an optimized B-tree). The trade-off is that whenever an index is used, rows has to be loaded to get to the column values. This index mode works well in most cases and users like the low disk space footprint of a database.

The PRESERVE TIME mode is for tables with 500,000 or more rows. The mode scales very well and can easily handle tables with many millions of rows. Column values are copied into the B-tree, which increases the disk space footprint, but lookups are very fast. The actual rows are only loaded when needed, which in some cases are never. If a given SELECT only fetches column values that are part of an index, the actual rows are not loaded at all, i.e. this type of SELECTs becomes very fast.

Time for an example, so let's use the traditional indexing setup with a WORD table, a DOCUMENT table, and a RELation table:

CREATE TABLE WORD(
       WORDPK INT PRIMARY KEY,       -- Implies an index
       WORD   VARCHAR(64));
CREATE INDEX ON WORD(WORD);

CREATE TABLE DOCUMENT(
       DOCPK INT PRIMARY KEY,        -- Implies an index
       DOC   CLOB);

CREATE TABLE REL(
       WORDFK INT,
       DOCFK INT,
       PRIMARY KEY(WORDFK, DOCFK));  -- Implies an index
CREATE INDEX ON REL(DOCFK, WORDFK);
COMMIT;

To get a list of DOCFKs identifying the documents in which a given word is found:

SELECT DOCFK
       FROM  REL, WORD
      WHERE REL.WORDFK = WORD.WORDPK AND WORD.WORD = '<some word>';

To get a list of WORDFKs identifying the words found in a given document:

SELECT WORDFK
       FROM  REL
       WHERE DOCFK = <some document PK>;

To find a list of DOCFKs identifying the documents in which two given words are found (could/should be wrapped into a VIEW):

SELECT DOCFK
      FROM  REL, WORD
      WHERE REL.WORDFK = WORD.WORDPK AND WORD.WORD = '<word1>'

INTERSECT

SELECT DOCFK
      FROM  REL, WORD
      WHERE REL.WORDFK = WORD.WORDPK AND WORD.WORD = '<word2>';

In a reasonable setup with 100,000 documents and 100 words on average per document, the REL table holds 10,000,000 rows and we have the perfect candidate for PRESERVE TIME. In all of the above SELECT statements, the actual rows are not loaded as the indexes hold the fetched column values (assuming PRESERVE TIME).

The WORD table is a less likely candidate for PRESERVE TIME, here it will probably make more sense to use PRESERVE SPACE combined with a proper sized cache (not covered in this paper).

How to change the index mode

The indexes created on a given table will all have to be of the same mode which is set as follows:

	ALTER TABLE REL SET INDEX PRESERVE TIME;
		or
	ALTER TABLE REL SET INDEX PRESERVE SPACE;
	COMMIT;

Changing the index mode implies that all current indexes are dropped and then recreated. For large tables this operation will take some time to execute, but can be left unattended.

If a table is being bulk-loaded from e.g. flat files, it is in general a good idea to drop all constraints and indexes before the loading starts. Once loading is completed, set the index mode and then add the necessary constraints and indexes.

To increase the SELECT speed, all PRESERVE TIME indexes created on all tables in a given database can be rearranged for even better performance by applying the FBIndexOptimizer tool:

	FBIndexOptimizer -v <database name>

Please note that the database must be stopped before applying FBIndexOptimizer. The FBIndexOptimizer tool is currently not part of the standard distribution, but can be made available on request by sending an e-mail to: support@frontbase.com.