Skip to content
Snippets Groups Projects
  • Jared Hancock's avatar
    60fcf00c
    Crazy performance penalty scanning blob tables · 60fcf00c
    Jared Hancock authored
    When scanning the file_chunk table for orphaned file chunks that can be
    deleted, apparently, MySQL will read (at least part of) the blob data from
    the disk. For databases with lots of large attachments, this can take
    considerable time. Considering that it is triggered from the autocron and
    will run everytime the cron is run, the database will spend considerable
    time scanning for rows to be cleaned.
    
    This patch changes the orphan cleanup into two phases. The first will search
    just for the pk's of file chunks to be deleted. If any are found, then the
    chunks are deleted by the file_id and chunk_id, which is the primary key of
    the table.
    
    The SELECT query seems to run at least 20 times faster than the delete
    statement, and DELETEing against the primary key of the blob table should
    be the fastest possible operation. Somehow, both queries required a full
    table scan; however, because the SELECT statement is explictly only
    interested in two fields, it is more clear to the query optimizer that the
    blob data should not be scanned.
    
    References:
    http://stackoverflow.com/q/9511476
    60fcf00c
    History
    Crazy performance penalty scanning blob tables
    Jared Hancock authored
    When scanning the file_chunk table for orphaned file chunks that can be
    deleted, apparently, MySQL will read (at least part of) the blob data from
    the disk. For databases with lots of large attachments, this can take
    considerable time. Considering that it is triggered from the autocron and
    will run everytime the cron is run, the database will spend considerable
    time scanning for rows to be cleaned.
    
    This patch changes the orphan cleanup into two phases. The first will search
    just for the pk's of file chunks to be deleted. If any are found, then the
    chunks are deleted by the file_id and chunk_id, which is the primary key of
    the table.
    
    The SELECT query seems to run at least 20 times faster than the delete
    statement, and DELETEing against the primary key of the blob table should
    be the fastest possible operation. Somehow, both queries required a full
    table scan; however, because the SELECT statement is explictly only
    interested in two fields, it is more clear to the query optimizer that the
    blob data should not be scanned.
    
    References:
    http://stackoverflow.com/q/9511476