Professional Data Engineer topic 1 question 21 (Page 2) — Google Certifications

Posts: 26 to 33 of 33

26 Reply by boca_2022 2024-03-28 15:25:04

A is the best choice. D doesn't make sense.

27 Reply by FP77 2024-03-28 16:41:30

A is incorrect. How can you find duplicates if you assign a unique ID to every record? The answer is either B or D. I first selected B, but after reading through the answers, D may be better.

28 Reply by Melampos 2024-03-28 16:51:26

You cannot deduplicate data by adding a random GUID: with a GUID, every row is distinct from the others.
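The objection can be illustrated with a minimal Python sketch. It assumes the GUID is generated on the receiving side, once per arriving message, which is one possible reading of option A and not something the question states. The record fields and the helper name `tag_on_arrival` are hypothetical.

```python
import uuid

def tag_on_arrival(messages):
    """Attach a fresh random GUID to every arriving message (hypothetical helper)."""
    return [{**msg, "guid": str(uuid.uuid4())} for msg in messages]

# The same logical record delivered twice (a retransmission).
arrivals = [
    {"order_id": 17, "amount": 99.5},
    {"order_id": 17, "amount": 99.5},
]

tagged = tag_on_arrival(arrivals)
distinct_guids = {row["guid"] for row in tagged}

# Each copy received its own GUID, so grouping by GUID cannot pair them up.
print(len(distinct_guids))
```

Under this assumption the GUID column carries no information about which rows are copies of each other.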

29 Reply by juliobs 2024-03-28 17:59:43

Hard question.
It's a *proprietary* system. Who guarantees we can even add a GUID?
But if you can, it's definitely more efficient than calculating hashes (ignoring timestamp).
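For comparison, the hash-based alternative mentioned here might look roughly like the following Python sketch. It assumes each entry is a flat record with a `timestamp` field that differs between retransmissions; the field names and the helpers `content_hash` and `dedupe_by_hash` are hypothetical, and SHA-256 is only a stand-in for whatever hash the system would actually use.

```python
import hashlib
import json

def content_hash(record, ignore=("timestamp",)):
    """Hash a record's fields, skipping the ones listed in `ignore`."""
    payload = {k: v for k, v in record.items() if k not in ignore}
    # sort_keys gives a stable serialization regardless of key order.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def dedupe_by_hash(records):
    """Keep the first record seen for each content hash."""
    seen = set()
    unique = []
    for record in records:
        digest = content_hash(record)
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique

records = [
    {"order_id": 17, "amount": 99.5, "timestamp": "2024-03-28T15:25:04Z"},
    # Retransmission: same payload, later timestamp.
    {"order_id": 17, "amount": 99.5, "timestamp": "2024-03-28T15:25:09Z"},
    {"order_id": 18, "amount": 12.0, "timestamp": "2024-03-28T15:26:00Z"},
]

unique_records = dedupe_by_hash(records)
print([r["order_id"] for r in unique_records])
```

One caveat with this approach: two genuinely separate events that happen to carry identical payloads would also collapse into one, which is the kind of ambiguity a source-assigned identifier avoids.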

30 Reply by tibuenoc 2024-03-28 20:09:33

As Dg63 wrote.

31 Reply by AshokPalle 2024-03-28 21:22:26

Just asked ChatGPT; it gave me option D.

32 Reply by musumusu 2024-03-28 23:46:30

Answer B:
Option A: GUIDs can deduplicate the data, but this is expensive and is good for multiple data processing steps.
Option B: Use a hash function to identify the unique rows; this function can be applied directly in BigQuery.
Option D is complex and more expensive.

```sql
CREATE TEMP FUNCTION hashValue(input STRING) AS (
  -- FARM_FINGERPRINT returns an INT64 hash; cast it to STRING
  CAST(FARM_FINGERPRINT(input) AS STRING)
);
```

33 Reply by techtitan 2024-03-29 01:26:21

A is the preferred way to generate a unique identifier, compared to hashing/indexing.
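One reading under which a GUID does support deduplication is that the source assigns it once, when the entry is created, so that any retransmission carries the same value. The Python sketch below illustrates that reading only; whether the proprietary system allows it is not stated in the question, and the helpers `publish` and `dedupe_by_guid` are hypothetical.

```python
import uuid

def publish(payload):
    """Hypothetical source-side step: assign the GUID once, when the entry is created."""
    return {"guid": str(uuid.uuid4()), **payload}

def dedupe_by_guid(messages):
    """Keep the first message seen for each GUID."""
    seen = set()
    unique = []
    for msg in messages:
        if msg["guid"] not in seen:
            seen.add(msg["guid"])
            unique.append(msg)
    return unique

first = publish({"order_id": 17, "amount": 99.5})
second = publish({"order_id": 18, "amount": 12.0})

# A retransmission resends the same entry, GUID included.
received = [first, dict(first), second]

unique_messages = dedupe_by_guid(received)
print([m["order_id"] for m in unique_messages])
```

Here the receiver only compares identifiers and never has to hash or inspect the payload.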
