Re: Professional Data Engineer topic 1 question 21

A is the best choice. D doesn't make sense.

Re: Professional Data Engineer topic 1 question 21

A is incorrect. How can you find duplicates if you assign a unique ID to every record? The answer is either B or D. I first selected B, but after reading through the answers, D may be better.

Re: Professional Data Engineer topic 1 question 21

You cannot deduplicate data by adding a random GUID; with a GUID, every row becomes distinct from the others.
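
To illustrate the point, here is a minimal Python sketch. It assumes the GUID is generated on arrival for each record, not attached once by the sending system; the `sku` and `qty` fields are made up for the example.

```python
import uuid

# Two arrivals of the same payload; the second is a re-transmission.
rows = [
    {"sku": "A1", "qty": 5},
    {"sku": "A1", "qty": 5},
]

# Assumption: a fresh random GUID is assigned to each record as it arrives.
tagged = [{**row, "guid": str(uuid.uuid4())} for row in rows]

# Compare how many distinct records exist with and without the GUID column.
distinct_payloads = {tuple(sorted(row.items())) for row in rows}
distinct_with_guid = {tuple(sorted(row.items())) for row in tagged}

print(len(distinct_payloads))   # the payloads alone collapse to one record
print(len(distinct_with_guid))  # with a GUID per arrival, both rows look unique
```

If the sending system attached the GUID before any re-transmission, both copies would share it and the comparison would come out differently.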

Re: Professional Data Engineer topic 1 question 21

Hard question.
It's a *proprietary* system. Who guarantees we can even add a GUID?
But if you can, it's definitely more efficient than calculating hashes (ignoring timestamp).

Re: Professional Data Engineer topic 1 question 21

As Dg63 wrote.

Re: Professional Data Engineer topic 1 question 21

Just asked ChatGPT; it gave me option D.

Re: Professional Data Engineer topic 1 question 21

Answer: B.
Option A: GUIDs can deduplicate the data, but they are expensive and better suited to data that goes through multiple processing steps.
Option B: A hash function identifies the unique rows, and it can be applied directly in BigQuery.
Option D is more complex and more expensive.

```sql
-- Hash a string with FARM_FINGERPRINT and return the result as a STRING.
CREATE TEMP FUNCTION hashValue(input STRING) AS (
  CAST(FARM_FINGERPRINT(input) AS STRING)
);
```
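
The same idea outside BigQuery, as a rough Python sketch: hash only the payload fields, skip the transmission timestamp, and keep the first record seen for each hash. It uses SHA-256 from `hashlib` as a stand-in, not FARM_FINGERPRINT, and the helper names and the `timestamp` field name are assumptions for illustration.

```python
import hashlib
import json


def payload_hash(record, ignore=("timestamp",)):
    """Hash a record's fields, skipping the ones listed in `ignore`."""
    payload = {k: v for k, v in record.items() if k not in ignore}
    # Sort keys so that field order does not change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first record seen for each payload hash."""
    seen = set()
    unique = []
    for record in records:
        digest = payload_hash(record)
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


records = [
    {"sku": "A1", "qty": 5, "timestamp": "2024-01-01T00:00:00Z"},
    {"sku": "A1", "qty": 5, "timestamp": "2024-01-01T00:05:00Z"},  # re-sent
    {"sku": "B2", "qty": 1, "timestamp": "2024-01-01T00:05:00Z"},
]
unique_records = deduplicate(records)
```

The set of seen hashes plays the role of the historical data you compare against; at scale it would need to live in a table instead of in memory.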

Re: Professional Data Engineer topic 1 question 21

A is the preferred way to generate a unique identifier, compared to hashing/indexing.