For objects with self-reference (aka "Hierarchy") fields, the best method is to load in two steps:
- Load all records, leaving out the ParentId field (remove it from the Talend schema).
- Update all records to set the ParentId values (map only the Id or External Id, plus the ParentId).
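To make the two passes concrete, here is a minimal Python sketch of how the source rows could be split before loading. The function name `split_two_pass`, the `ExtId` column, and the sample rows are hypothetical; in Talend the same split would be done by editing the schema and mappings rather than in code.

```python
def split_two_pass(records, key="ExtId", parent="ParentId"):
    """Split source rows into an insert pass and an update pass."""
    # Pass 1: every record, with the self-reference column removed.
    insert_rows = [{k: v for k, v in r.items() if k != parent} for r in records]
    # Pass 2: only records that have a parent, mapping just the key and the parent.
    update_rows = [{key: r[key], parent: r[parent]} for r in records if r.get(parent)]
    return insert_rows, update_rows


# Hypothetical 3-level hierarchy: Top > Middle > Leaf.
accounts = [
    {"ExtId": "A1", "Name": "Top", "ParentId": None},
    {"ExtId": "A2", "Name": "Middle", "ParentId": "A1"},
    {"ExtId": "A3", "Name": "Leaf", "ParentId": "A2"},
]
insert_rows, update_rows = split_two_pass(accounts)
```

Because the first pass never references a parent, the order of the rows does not matter, which is why this works for any depth.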
For two objects with cyclical references, the approach is similar:
- Load all records of Object-A, leaving out the Lookup-B field.
- Load all records of Object-B, including its Lookup-A field.
- Update all records of Object-A to set Lookup-B values.
Note: Your suggestion also works, but only for 2-level hierarchies. You assume that parent records have no ParentId, but in a 3-level hierarchy the "middle" record is both a parent and a child, so filtering on "where ParentId = null" is not sufficient. The first method I recommended above works for any number of levels.
An alternative method takes only one loading step, but I don't like it: you can't use the Bulk API in parallel mode, sorting gets hard for 3+ levels, and you are at the mercy of Salesforce honoring the order. But FYI, the method is:
- Sort the source data so parent records appear first, then the children.
- Load all data, including the ParentId.
Using the standard API (not Bulk/parallel), the data normally loads in the same order it appears in your source file, so the parents get created before the children that reference them.
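For the sorting step, one way to handle 3+ levels is to order rows by their depth in the hierarchy. This is only a sketch under assumptions: `order_parents_first` and the `ExtId` column are hypothetical names, and the whole source is assumed to fit in memory.

```python
def order_parents_first(records, key="ExtId", parent="ParentId"):
    """Return records sorted so every parent precedes its children."""
    by_key = {r[key]: r for r in records}

    def depth(record):
        # Count how many ancestors this record has inside the source file.
        # A parent that is not in the file is treated as already loaded.
        levels = 0
        seen = {record[key]}
        current = record.get(parent)
        while current in by_key:
            if current in seen:
                raise ValueError(f"cycle detected at {current}")
            seen.add(current)
            levels += 1
            current = by_key[current].get(parent)
        return levels

    # sorted() is stable, so rows at the same depth keep their source order.
    return sorted(records, key=depth)


rows = [
    {"ExtId": "A3", "ParentId": "A2"},
    {"ExtId": "A2", "ParentId": "A1"},
    {"ExtId": "A1", "ParentId": None},
]
ordered = order_parents_first(rows)
```

A parent always has a smaller depth than its children, so sorting by depth is enough. It does not remove the dependency on the load honoring that order.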
Your other questions belong in a separate question post. But regarding how to ensure referred-to records are included in partial-data migrations: I always approach these scenarios by first building a list of the distinct Ids (or External Ids) that are needed, and then loading only the records whose Id is in that list (using inner joins).
For example, say you want to migrate a sampling of Accounts, only those whose Name begins with 'A', while ensuring all of their parent Accounts (regardless of Name) are also included. I would create a list of the Ids of all Accounts whose Name begins with 'A', then append to that list the ParentIds of those Accounts that are not already in the list. Finally, when loading the Accounts, I would inner-join Accounts to that list. The overhead of building the list seems high for one object, but it pays off because the very same list can be inner-joined to drive which Contacts, Opportunities, Cases, etc. you load. Also note that this may be time-consuming to build in Talend, but it is super easy in a database with some SQL.
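The list-building approach can be sketched in Python as below. The helper names (`needed_account_ids`, `inner_join`) and the sample data are made up for illustration. As in the description, only immediate parents are added, so a deeper hierarchy would presumably need the parent step repeated.

```python
def needed_account_ids(accounts, wanted):
    """Ids of the wanted Accounts plus the Ids of their immediate parents."""
    selected = [a for a in accounts if wanted(a)]
    ids = {a["Id"] for a in selected}
    # Append parents of the selected Accounts; the set drops duplicates.
    ids |= {a["ParentId"] for a in selected if a.get("ParentId")}
    return ids


def inner_join(rows, id_field, ids):
    """Keep only rows whose id_field value is in the driving list."""
    return [r for r in rows if r.get(id_field) in ids]


# Hypothetical source data.
accounts = [
    {"Id": "001", "Name": "Acme", "ParentId": "003"},
    {"Id": "002", "Name": "Beta", "ParentId": None},
    {"Id": "003", "Name": "Zenith", "ParentId": None},
    {"Id": "004", "Name": "Apex", "ParentId": None},
]
contacts = [
    {"Id": "C1", "AccountId": "001"},
    {"Id": "C2", "AccountId": "002"},
    {"Id": "C3", "AccountId": "003"},
]

ids = needed_account_ids(accounts, lambda a: a["Name"].startswith("A"))
accounts_to_load = inner_join(accounts, "Id", ids)
# The same list drives which child records are loaded.
contacts_to_load = inner_join(contacts, "AccountId", ids)
```

Here Zenith is pulled in only because it is Acme's parent, and the same `ids` set then filters Contacts without any extra logic.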
Best Answer
There are many guides and points to consider when doing a data migration, but I shall try to keep this to testing and validation of the migration only. If you have a full-copy sandbox, that is the best option: run the migration on it first. Whether you do it on a sandbox or in production, here are a few points from my previous experience:
Audit fields (CreatedBy, CreatedDate, etc.) update: since Winter '16, once the feature is enabled, you are able to set these fields when creating new records.
Use the VLOOKUP function to match or reference External Ids or Salesforce Ids of objects from one sheet to another.
I prefer INDEX-MATCH over VLOOKUP (but VLOOKUP is more common, I guess).
For Enterprise Edition and above, you can use apps from the AppExchange to tell you how complete records are or how much a field is used: Field Trip and AddressTools Free V4 are both free.
Use SOQL effectively to validate data after conversion.