# Cluster upgrade

Doris can be upgraded smoothly through rolling upgrades. The following steps are recommended for a safe upgrade.

Note:

  1. The following approach assumes a highly available deployment, that is, 3 data replicas and FE high availability.

# Preparation

  1. Turn off the replica repair and balance operation.

    Nodes are restarted during the upgrade, which may trigger unnecessary cluster balancing and replica repair. You can disable this logic first with the following commands:

    # Turn off the replica balance logic. After it is turned off, balancing of ordinary table replicas is no longer triggered.
    $ mysql-client> admin set frontend config("disable_balance" = "true");
    
    # Turn off the replica balance logic of the colocation table. After it is turned off, replica redistribution of colocation tables is no longer triggered.
    $ mysql-client> admin set frontend config("disable_colocate_balance" = "true");
    
    # Turn off the replica scheduling logic. After it is turned off, generated replica repair and balancing tasks are no longer scheduled.
    $ mysql-client> admin set frontend config("disable_tablet_scheduler" = "true");
    

    After the cluster is upgraded, use the same commands to set each configuration item back to its original value.
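The three statements above differ only in the configuration key and the value, so they can be generated rather than typed twice. The following is a minimal sketch, not an official tool: the function name `balance_config_sql` is hypothetical, and using `false` to restore assumes that was the value before the upgrade, which should be checked on your cluster.

```shell
# Print the three "admin set frontend config" statements for one value.
# Hypothetical helper: pass "true" before the upgrade; pass the original
# value (assumed here to be "false") after the upgrade.
balance_config_sql() {
    local value="$1"
    local key
    for key in disable_balance disable_colocate_balance disable_tablet_scheduler; do
        printf 'admin set frontend config("%s" = "%s");\n' "$key" "$value"
    done
}
```

The output can be reviewed and then piped into the MySQL client, for example `balance_config_sql false | mysql -h <fe_host> -P <query_port> -u <user>`, where the host, port, and user are placeholders for your own FE connection details.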

# Test the correctness of BE upgrade

  1. Arbitrarily select a BE node and deploy the latest palo_be binary file.
  2. Restart the BE node and check the BE log be.INFO to see whether the startup succeeded.
  3. If the startup fails, investigate the cause first. If the error cannot be recovered, you can remove the BE directly with DROP BACKEND, clean up its data, restart the BE using the previous version of palo_be, and then ADD BACKEND again. (This method loses one replica of the data. Make sure all three replicas are complete before performing this operation!)
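The DROP BACKEND and ADD BACKEND operations in step 3 are `ALTER SYSTEM` statements addressed to the BE by host and heartbeat port. The sketch below only prints the statements so they can be reviewed before running; `backend_sql` is a hypothetical helper name, and the host and port are values you supply. As an assumption to verify against your release: some versions reject a plain `DROP BACKEND` as a safeguard and expect the `DROPP` spelling instead.

```shell
# Print the SQL statement for removing or re-adding one BE.
# Hypothetical helper; host and heartbeat port come from the operator.
backend_sql() {
    local action="$1" host="$2" port="$3"
    case "$action" in
        # Assumption: some releases require DROPP instead of DROP here.
        drop) printf 'ALTER SYSTEM DROP BACKEND "%s:%s";\n' "$host" "$port" ;;
        add)  printf 'ALTER SYSTEM ADD BACKEND "%s:%s";\n' "$host" "$port" ;;
        *)    echo "usage: backend_sql drop|add <host> <heartbeat_port>" >&2
              return 1 ;;
    esac
}
```

The two statements are kept separate on purpose: the data cleanup and the restart with the previous palo_be happen between the drop and the add.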

# Testing FE Metadata Compatibility

  1. Important! Metadata incompatibility can make data unrecoverable!
  2. Deploy a separate test FE process using the new version (for example, on your own local development machine).
  3. Modify the test FE configuration file fe.conf and set all ports to values different from those used online.
  4. Add configuration in fe.conf: cluster_id=123456
  5. Add the configuration in fe.conf: metadata_failure_recovery=true
  6. Copy the metadata directory palo-meta from the online Master FE to the test environment.
  7. Modify the cluster_id in the palo-meta/image/VERSION file copied into the test environment to 123456 (that is, the same value as in Step 4).
  8. In the test environment, run sh bin/start_fe.sh to start the FE.
  9. Check the FE log fe.log to see whether the startup succeeded.
  10. If the startup is successful, run sh bin/stop_fe.sh to stop the FE process of the test environment.
  11. Steps 3 to 7 above prevent the test FE from mistakenly connecting to the online environment after it starts.
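The two fe.conf additions from steps 4 and 5 can be applied with a small script so that repeating the procedure does not duplicate lines. This is a minimal sketch under stated assumptions: `add_test_fe_settings` is a hypothetical helper, it uses the underscore spelling `metadata_failure_recovery`, and it should only ever be pointed at the test copy of fe.conf, never the online one.

```shell
# Append the test-only settings to a test fe.conf, skipping keys that
# are already present. Hypothetical helper; the argument is the path to
# the TEST environment's fe.conf.
add_test_fe_settings() {
    local conf="$1"
    local line key
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    for line in 'cluster_id=123456' 'metadata_failure_recovery=true'; do
        key="${line%%=*}"
        if ! grep -q "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
            printf '%s\n' "$line" >> "$conf"
        fi
    done
}
```

A call would look like `add_test_fe_settings conf/fe.conf`, where the path is a placeholder for wherever the test FE keeps its configuration. The port changes from step 3 and the VERSION file edit from step 7 are left as manual steps.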

# Upgrade preparation

  1. After data validation, distribute the new BE and FE binary files to their respective directories.
  2. For a minor version upgrade, BE usually only needs a new palo_be and FE only needs a new palo-fe.jar. For a major version upgrade, you may need to upgrade other files as well (including but not limited to bin, lib, and so on). If you are not sure whether other files need to be replaced, it is recommended to replace all of them.
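Because a failed BE start is handled by going back to the previous palo_be, it helps to keep the old file when distributing the new one. The following is a minimal sketch; `replace_with_backup` and the `.bak` suffix are hypothetical choices, and the paths you pass depend on your own deployment layout.

```shell
# Replace a deployed file with a new version, keeping the previous
# version next to it as <target>.bak. Hypothetical helper.
replace_with_backup() {
    local new_file="$1" target="$2"
    if [ -e "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    cp -p "$new_file" "$target"
}
```

The same function can be used for palo_be and for palo-fe.jar. Restoring the previous version is then a matter of copying the `.bak` file back over the target.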

# Rolling upgrade

  1. Confirm that the new version files are deployed, then restart the FE and BE instances one by one.
  2. It is recommended to restart the BEs one by one first, and then restart the FEs one by one. Doris usually guarantees backward compatibility from FE to BE, that is, an old version of FE can access a new version of BE. However, an old version of BE accessing a new version of FE may not be supported.
  3. Restart the next instance only after confirming that the previous one started successfully. Refer to the Installation and Deployment document for how to identify a successful instance startup.
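The one-by-one rule can be sketched as a loop that stops at the first instance that does not come back. This is an illustration only: `rolling_restart` is a hypothetical helper, and the restart and check commands are hooks you provide, since how an instance is restarted and how a successful startup is recognized depends on your deployment.

```shell
# Restart instances strictly one at a time and stop at the first failure.
# $1: command that restarts one instance (hypothetical hook)
# $2: command that succeeds once that instance is up (hypothetical hook)
# remaining arguments: the instances, in restart order
rolling_restart() {
    local restart_cmd="$1" check_cmd="$2"
    local host
    shift 2
    for host in "$@"; do
        "$restart_cmd" "$host" || return 1
        if ! "$check_cmd" "$host"; then
            echo "stopping: $host did not start" >&2
            return 1
        fi
        echo "ok: $host"
    done
}
```

Calling it once with the list of BE nodes and then once with the list of FE nodes keeps the restart order described above, and a failure in the BE pass leaves every FE untouched.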