We recommend switching to the latest versions of Edge, Firefox, Chrome or Safari. Using Internet Explorer will result in a loss of website functionality.

JDBC write performance to Azure Synapse Analytics (MPP database)

Comments

1 comment

  • Avatar
    Adrian Williams

    Inserting data into Azure Synapse database from an external source can be a relatively slow process. 

    Adding the EnableBulkLoad=true option in the connection string manually enables the bulk insert functionality that bypasses the data parsing that is normally done when data is inserted into the database, which results in the performance gain you experienced in your tests.

    You can also leverage the bulk insert functionality with the JDBC Execute by using parameterized inserts e.g. INSERT INTO MYTABLE (Cust_ID, Cust_First_Name, Cust_Last_Name) VALUES (?, ?, ?)

    In this case the SqlQueryFieldBindings property would specify the parameterized input fields to be inserted into the table.

     

    To optimize the loading speed you should mimimize the number of load jobs running concurrently. 

    You may be able to increase the loading speed by loading your data into a staging table and as a second step move the data from the staging table into the data warehouse table. The staging table should be defined to use use round-robin distribution rather than hash distribution.

    If you do not need to encrypt your data you could consider disabling ColumnEncryption, which can improve the overall performance of the data load (and query). Anecdotal comments also indicate the type of encryption used can also impact load performance. Inserts may also be slowed down by database triggers.

    As an alternative to loading large data sets directly, some online discussions suggest overall performance may be improved by first loading data into Azure blob storage and then copy the data into Azure Synapse Analytics.

    0
    Comment actions Permalink

Please sign in to leave a comment.