Topic: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

A data engineer runs Amazon Athena queries on data that is in an Amazon S3 bucket. The Athena queries use AWS Glue Data Catalog as a metadata table.
The data engineer notices that the Athena query plans are experiencing a performance bottleneck. The data engineer determines that the cause of the performance bottleneck is the large number of partitions that are in the S3 bucket. The data engineer must resolve the performance bottleneck and reduce Athena query planning time.
Which solutions will meet these requirements? (Choose two.)

A.
Create an AWS Glue partition index. Enable partition filtering.
B.
Bucket the data based on a column that the data have in common in a WHERE clause of the user query.
C.
Use Athena partition projection based on the S3 bucket prefix.
D.
Transform the data that is in the S3 bucket to Apache Parquet format.
E.
Use the Amazon EMR S3DistCP utility to combine smaller objects in the S3 bucket into larger objects.

Re: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

I guess A / C, because we are facing a query planning performance bottleneck, so partition indexing should be improved.

Re: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

A. Creating an AWS Glue partition index and enabling partition filtering lets Athena retrieve only the metadata for the partitions that match the query's filter, instead of retrieving the metadata for every partition from the Data Catalog. With fewer partitions to process during planning, query planning time goes down.

C. Athena partition projection lets you define the partition scheme in table properties, for example a storage location template based on the S3 bucket prefix. Athena then calculates partition values and locations from that configuration instead of retrieving partition metadata from the Glue Data Catalog, which reduces query planning time for highly partitioned tables.

Re: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

The right answer is BD

Re: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

A. Create an AWS Glue partition index. Enable partition filtering.
Targeted Optimization: Partition indexes within the Glue Data Catalog help Athena efficiently identify the relevant partitions, significantly reducing query planning time. Partition filtering further refines the search during query execution.
D. Transform the data that is in the S3 bucket to Apache Parquet format.
Efficient Columnar Format: Parquet's columnar storage and built-in metadata often allow Athena to skip over large portions of data irrelevant to the query, leading to faster query planning and execution.

Re: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

Keyword: Athena query planning time

See explanation in the link:
https://www.myexamcollection.com/Data-Engineer-Associate-vce-questions.htm

B and D improve the performance of analytical query execution, not the performance of "query planning".

Re: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

Just finished the exam and I went with AD. I agree with GiorgioGss, but the reason I picked A over C was because the table is already using the Glue Data Catalog.
If we use the indexes, there's no reason to use C as we already have the partitions indexed.
No reason to pick B if we have C selected.
Thus I picked D with this to optimize the query e.g. if I'm only selecting a subset of the columns.

Re: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

Strange question... it could be ABCD.

Re: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

If your table stored in an AWS Glue Data Catalog has tens and hundreds of thousands and millions of partitions, you can enable partition indexes on the table. With partition indexes, only the metadata for the partition value in the query’s filter is retrieved from the catalog instead of retrieving all the partitions’ metadata. The result is faster queries for such highly partitioned tables. The following table compares query runtimes between a partitioned table with no partition indexing and with partition indexing. The table contains approximately 100,000 partitions and uncompressed text data. The orders table is partitioned by the o_custkey column.
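
For anyone who wants to try this, here is a minimal sketch of the two steps in option A, assuming the orders table above lives in a hypothetical database called sales_db and that the index name idx_o_custkey is free to use. It only builds the request for the Glue CreatePartitionIndex API and the Athena DDL that sets the partition_filtering.enabled table property, so it runs without AWS credentials. The helper names are my own, not part of any AWS library.

```python
def partition_index_request(database, table, keys, index_name):
    """Build keyword arguments for the Glue CreatePartitionIndex API call."""
    if not keys:
        raise ValueError("a partition index needs at least one partition key")
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionIndex": {"Keys": list(keys), "IndexName": index_name},
    }


def enable_partition_filtering_ddl(database, table):
    """Build the Athena DDL that turns on partition filtering for the table."""
    return (
        f"ALTER TABLE {database}.{table} "
        "SET TBLPROPERTIES ('partition_filtering.enabled' = 'true')"
    )


# Hypothetical names: sales_db and idx_o_custkey are assumptions for this sketch.
request = partition_index_request(
    "sales_db", "orders", ["o_custkey"], "idx_o_custkey"
)
ddl = enable_partition_filtering_ddl("sales_db", "orders")

print(request)
print(ddl)

# With credentials configured, the request could be sent with boto3:
#   import boto3
#   boto3.client("glue").create_partition_index(**request)
# The DDL would then be run as a normal query in Athena.
```

Queries only benefit when their WHERE clause filters on the indexed key (o_custkey here), since that filter is what lets the catalog return a subset of partitions.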

Re: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
Optimizing Partition Processing using partition projection
Processing partition information can be a bottleneck for Athena queries when you have a very large number of partitions and aren’t using AWS Glue partition indexing. You can use partition projection in Athena to speed up query processing of highly partitioned tables and automate partition management. Partition projection helps minimize this overhead by allowing you to query partitions by calculating partition information rather than retrieving it from a metastore. It eliminates the need to add partitions’ metadata to the AWS Glue table.
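
To make "calculating partition information rather than retrieving it from a metastore" concrete, here is a rough sketch, assuming a date partition column named dt and a hypothetical bucket layout s3://example-bucket/logs/yyyy/MM/dd/. The first two helpers build the Athena table properties for date-type projection and the matching ALTER TABLE statement. The third one only imitates what projection does conceptually: it derives partition locations from the template and a date range with no catalog lookup. The helper names and the table name are my own assumptions.

```python
from datetime import date, timedelta


def projection_properties(column, start, location_template, date_format="yyyy/MM/dd"):
    """Table properties that enable date-type partition projection in Athena."""
    return {
        "projection.enabled": "true",
        f"projection.{column}.type": "date",
        f"projection.{column}.range": f"{start},NOW",
        f"projection.{column}.format": date_format,
        "storage.location.template": location_template,
    }


def alter_table_ddl(table, properties):
    """Render the properties as an ALTER TABLE ... SET TBLPROPERTIES statement."""
    body = ", ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({body})"


def projected_locations(template, column, first_day, last_day):
    """Compute partition locations from the template alone (no metastore call)."""
    placeholder = "${" + column + "}"
    day_count = (last_day - first_day).days + 1
    return [
        template.replace(
            placeholder, (first_day + timedelta(days=i)).strftime("%Y/%m/%d")
        )
        for i in range(day_count)
    ]


# Hypothetical bucket, prefix, and table name for illustration only.
template = "s3://example-bucket/logs/${dt}/"
props = projection_properties("dt", "2020/01/01", template)
ddl = alter_table_ddl("logs_db.access_logs", props)
locations = projected_locations(template, "dt", date(2024, 1, 30), date(2024, 2, 1))

print(ddl)
print(locations)
```

The point of the last helper is that the list of partitions is a pure function of the configuration and the query's date filter, which is why planning does not slow down as the number of partitions in S3 grows.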

Re: AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36

https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/