
Optimization of LIKE expression when using Sqlite in iOS application
Recently, I was faced with the task of optimizing a query to Sqlite in my iOS application.
The task was as follows. There was a list of documents (PDF files), a dictionary (a list of words), well, a relationship of documents and words from a dictionary (the entry of words in a document). It was necessary to implement a search and display a list of documents in which there is an entered word.
The base structure was as follows:
A query to retrieve all documents in the selected sections that contain the specified word:
With the size dicrionary ~ = 400K, document ~ = 1K and document_index ~ = 500K entries, the request ran for about 30 seconds on my iPad 2, which was unacceptable for my application.
In the process of finding a solution to speed up the query, I found out that in Sqlite3 when using the LIKE expression, the index is not taken into account, which was not at all good, given my number of records. I also could not refuse to use LIKE and replace it with the comparison operation for equality, because I needed a substring search. Then I stumbled upon this article , it suggested replacing the LIKE expression with the operations> = and <+ Yaya symbols (zzz if the word base is in English) in the second expression.
With this optimization, the index is taken into account during the search, and all words starting with 'abakan' are found. The execution time is only 0.5 seconds! Of course, there are limitations, you cannot implement a substring search from the beginning of a line, i.e. find words starting with any characters and ending with 'abakan', but in my particular case it was not necessary.
The task was as follows. There was a list of documents (PDF files), a dictionary (a list of words), well, a relationship of documents and words from a dictionary (the entry of words in a document). It was necessary to implement a search and display a list of documents in which there is an entered word.
The base structure was as follows:
CREATE TABLE document (
id Int PRIMARY KEY NOT NULL,
root_id Int,
name Varchar(100),
active Tinyint
);
CREATE INDEX IDX_documentId ON document (id);
CREATE INDEX IDX_documentName ON document (name);
CREATE TABLE dictionary (
id Int PRIMARY KEY NOT NULL,
word Varchar(100) NOT NULL
);
CREATE INDEX IDX_dictionaryId ON dictionary (id);
CREATE UNIQUE INDEX IDX_dictionaryWord ON dictionary (word ASC);
CREATE TABLE document_index (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
document_id Int NOT NULL,
word_id Int NOT NULL,
FOREIGN KEY(document_id) REFERENCES document(id),
FOREIGN KEY(word_id) REFERENCES dictionary(id)
);
CREATE INDEX IDX_documentIndexId ON document_index (id);
CREATE INDEX IDX_documentIndexDocId ON document_index (document_id);
CREATE INDEX IDX_documentIndexWordId ON document_index (word_id);
A query to retrieve all documents in the selected sections that contain the specified word:
SELECT document.id, document.name
FROM document
INNER JOIN document_index on document_index.document_id=document.id
INNER JOIN dictionary on dictionary.id=document_index.word_id
WHERE dictionary.word LIKE @pQuery
AND document.active = 1
AND document.root_id in (@pRoot1, @pRoot2, @pRoot3, @pRoot4, @pRoot5, @pRoot6, @pRoot7)
LIMIT @First, @Count
With the size dicrionary ~ = 400K, document ~ = 1K and document_index ~ = 500K entries, the request ran for about 30 seconds on my iPad 2, which was unacceptable for my application.
In the process of finding a solution to speed up the query, I found out that in Sqlite3 when using the LIKE expression, the index is not taken into account, which was not at all good, given my number of records. I also could not refuse to use LIKE and replace it with the comparison operation for equality, because I needed a substring search. Then I stumbled upon this article , it suggested replacing the LIKE expression with the operations> = and <+ Yaya symbols (zzz if the word base is in English) in the second expression.
/* Было */
dictionary.word LIKE 'абакан%'
/* Стало */
dictionary.word >= 'абакан' AND dictionary.word < 'абаканяяя'
With this optimization, the index is taken into account during the search, and all words starting with 'abakan' are found. The execution time is only 0.5 seconds! Of course, there are limitations, you cannot implement a substring search from the beginning of a line, i.e. find words starting with any characters and ending with 'abakan', but in my particular case it was not necessary.