When copying data from RDS to Redshift:
To avoid data loss, start the ‘Incremental copy template’ before the ‘Full copy’.
A sample timeline could be:
Incremental copy scheduled start time - 1:50 PM
Full copy start time - 2:00 PM
A DB Insert - 2:10 PM
Full copy end time - 4:00 PM
A DB Insert - 4:05 PM
Incremental copy First run - 4:10 PM
In the above example, the contents of the first DB insert at 2:10 PM may or may not be included in the full copy. The contents of the second insert at 4:05 PM will not be included in the full copy.
How can we ensure that these new inserts show up in the Redshift database?
Because the ‘Incremental copy template’ uses TIME SERIES scheduling, the actual ‘Incremental copy activity’ run won’t start at the scheduled start time (1:50 PM); it starts at the end of the first scheduled interval (4:10 PM). All DB changes between the scheduled start date/time and the first run of the actual copy activity will be copied to Redshift. So the first incremental copy run copies all new DB inserts between 1:50 PM and 4:10 PM to Redshift. This includes the contents of both DB inserts, which happen during and after the full copy activity.
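
The timeline above can be checked with a small sketch. This is only an illustration of the window logic, not the template's actual implementation: the function name, the arbitrary date, and the half-open window [scheduled start, first run) are assumptions made for the example.

```python
from datetime import datetime

# Arbitrary date; only the times of day come from the example above.
DAY = (2024, 1, 1)


def at(hour, minute):
    """Build a timestamp on the example day."""
    return datetime(*DAY, hour, minute)


scheduled_start = at(13, 50)  # incremental copy scheduled start time
first_run = at(16, 10)        # first actual incremental copy run

inserts = {
    "insert during full copy": at(14, 10),
    "insert after full copy": at(16, 5),
}


def in_first_incremental_window(ts, window_start, window_end):
    """Hypothetical check: True if a change at `ts` falls in the first run's window.

    Assumes the window is half-open: window_start <= ts < window_end.
    """
    return window_start <= ts < window_end


for label, ts in inserts.items():
    covered = in_first_incremental_window(ts, scheduled_start, first_run)
    print(f"{label} at {ts:%H:%M}: picked up by first incremental run = {covered}")
```

Under these assumptions both inserts fall inside the first window, while a change made before 1:50 PM would not and would have to come from the full copy.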