Re: Professional Data Engineer topic 1 question 21
A is best choice. D doesn't make sense.
A is incorrect. How can you find duplicates if you assign a unique ID to every record? The answer is either B or D. I first selected B, but after reading through the answers, D may be better.
You cannot deduplicate data by adding a random GUID. With a GUID, every row becomes distinct from the others.
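To make that point concrete, here is a minimal Python sketch (not BigQuery). It assumes the GUID is generated on arrival rather than by the sender; the record fields and the `tag_on_arrival` helper are hypothetical and only serve the illustration.

```python
import uuid

# Hypothetical records: the second row is a retransmission of the first.
records = [
    {"order": "A-100", "amount": 25},
    {"order": "A-100", "amount": 25},
    {"order": "B-200", "amount": 40},
]

def tag_on_arrival(rows):
    """Attach a fresh random GUID to every row as it is received."""
    return [{**row, "guid": str(uuid.uuid4())} for row in rows]

tagged = tag_on_arrival(records)

# Distinct values when keyed on the GUID versus on the actual payload.
distinct_guids = {row["guid"] for row in tagged}
distinct_payloads = {(row["order"], row["amount"]) for row in tagged}

print(len(tagged), len(distinct_guids), len(distinct_payloads))
```

Under this assumption the retransmitted row gets its own GUID, so keying on the GUID removes nothing, while the payload itself still shows only two distinct entries. A GUID assigned by the sender before any retry would behave differently.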
Hard question.
It's a *proprietary* system. Who guarantees we can even add a GUID?
But if you can, it's definitely more efficient than calculating hashes (ignoring timestamp).
As Dg63 wrote.
I just asked ChatGPT, and it gave me option D.
Answer B:
Option A: GUIDs can deduplicate the data, but the approach is expensive and is mainly useful when the data goes through multiple processing steps.
Option B: Use a hash function to identify the unique rows. This function can be applied directly in BigQuery.
Option D is complex and more expensive.
```
-- Return the FARM_FINGERPRINT hash of the input, cast to a string.
CREATE TEMP FUNCTION hashValue(input STRING) AS (
  CAST(FARM_FINGERPRINT(input) AS STRING)
);
```
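The hashing idea, combined with the earlier remark about ignoring the timestamp, can also be sketched outside BigQuery. This is an illustration under stated assumptions, not the exam's reference solution: SHA-256 stands in for `FARM_FINGERPRINT`, and the field names and the `content_hash` and `deduplicate` helpers are hypothetical.

```python
import hashlib
import json

def content_hash(row, ignore=("timestamp",)):
    """Hash a row's content, skipping fields that change on every resend."""
    payload = {k: v for k, v in row.items() if k not in ignore}
    # sort_keys gives a stable serialization regardless of key order.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def deduplicate(rows):
    """Keep the first row seen for each content hash."""
    seen = set()
    unique = []
    for row in rows:
        digest = content_hash(row)
        if digest not in seen:
            seen.add(digest)
            unique.append(row)
    return unique

# Hypothetical input: the second row is a resend with a later timestamp.
rows = [
    {"order": "A-100", "amount": 25, "timestamp": "2024-01-01T10:00:00Z"},
    {"amount": 25, "order": "A-100", "timestamp": "2024-01-01T10:00:05Z"},
    {"order": "B-200", "amount": 40, "timestamp": "2024-01-01T10:01:00Z"},
]

unique_rows = deduplicate(rows)
print([r["order"] for r in unique_rows])
```

Because the timestamp is excluded from the hash, the resend collapses into the first row and two entries remain. Whether the set of seen hashes lives in memory, as here, or in a table is the kind of trade-off the thread is debating between options B and D.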
A is the preferred way to generate a unique identifier, compared to hashing or indexing.