AWS DMS is a new type of service used to migrate data from different types of database and data-warehouse. I met some problems when trying to use it in production environment.

Problem 1. When using a MySQL server of AWS RDS as the source of a replication task. It reported errors after started the task:

Last failure message
Last Error Task error notification received from subtask 0, thread 0 [reptask/replicationtask.c:2673] [1020418] Error Code [10001] : Binary Logging must be enabled for MySQL server; Errors in MySQL server binary logging configuration. Follow all prerequisites for 'MySQL as a source in DMS' from https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html or'MySQL as a target in DMS' from https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.MySQL.html ; Failed while preparing stream component 'st_0_WBK5KGUWQAH6VKEP4I5LH2EFHE'.; Cannot initialize subtask; Stream component 'st_0_WBK5KGUWQAH6VKEP4I5LH2EFHE' terminated [reptask/replicationtask.c:2680] [1020418] Stop Reason FATAL_ERROR Error Level FATAL

The failure message looks terrible. But at least I can find this doc to follow. After changed the configurations as below:

binlog_format ROW
binlog_checksum NONE
binlog_row_image FULL

the error still existed.
The real answer is in here since I used RDS instead of self-managed MySQL. After I add one line Terraform code to enable “automatic backups”:

resource "aws_db_instance" "test_gaf" {
  ......
  backup_retention_period     = 10
}

the replication task began to work without the error.

Problem 2. Running replication task for a while to export data from MySQL to AWS Redshift. A new error log appeared in Redshift load logs:

019-10-29T04:41:27 [TARGET_LOAD ]E: RetCode: SQL_ERROR SqlState: XX000 NativeError: 30 Message: [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState XX000] ERROR: User arn:aws:redshift:us-east-1:262284277472:dbuser:analytics-20190902/masteruser is not authorized to assume IAM Role arn:aws:iam::262284277472:role/dms-access-for-endpoint DETAIL: ----------------------------------------------- error: User arn:aws:redshift:us-east-1:262284277472:dbuser:analytics-20190902/masteruser is not authorized to assume IAM Role arn:aws:iam::262284277472:role/dms-access-for-endpoint code: 8001 context: IAM Role=arn:aws:iam::262284277472:role/dms-access-for-endpoint query: 1799 location: xen_aws_credentials_mgr.cpp:321 process: padbmaster [pid=21755] ----------------------------------------------- [1022502] (ar_odbc_stmt.c:4622)

Why masteruser is not authorized? The answer is here. Below is the Terraform code:

data "aws_iam_policy_document" "dms_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      identifiers = ["dms.amazonaws.com"]
      type        = "Service"
    }
  }

  statement {
    actions = ["sts:AssumeRole"]

    # By https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Security.APIRole.html,     
    # we also need principal `redshift.amazonaws.com`
    principals {
      identifiers = ["redshift.amazonaws.com"]
      type        = "Service"
    }
  }
}

Then I had giiven “dms_assume_role” two Trusty Entities



Problem 3. There was still a error in Redshift load log (so many errors in AWS DMS…):

Error Type Raw Field Value
Invalid timestamp format or value [YYYY-MM-DD HH24:MI:SS] timestamp 0000-00-00 00:00:00

Seems the answer is here. Therefore I added “acceptanydate=true;timeformat=auto” into the “extra connection settings” in Redshift endpoint. But the error just changed to:

Error Type Raw Field Value
Invalid data timestamp 0000-00-00 00:00:00

After searching for almost two days, I found that the reason is in the schema of Redshift, which is automatically created by AWS DMS replication task.

CREATE TABLE my (
    ...
    mydate TIMESTAMP DEFAULT '0000-00-00 00:00:00' NOT NULL,
    ...
)

Since the schema doesn’t allow “mydate” column to be null but the “acceptanydate=true” is trying to transfer “0000-00-00 00:00:00 to null”, the final error is “Invalid data” for Redshift.
The solution for this problem is: create table of Redshift manually to let “mydate” column to be “nullable”, and change the working mode of replication task to “TRUNCATE_BEFORE_LOAD”.