DORSETRIGS

amazon-emr (25 posts)



EMR Cluster using boto3 - Service Role Insufficient Permissions

EMR Cluster Creation: Troubleshooting Service Role Insufficient Permissions with Boto3. Problem: You're trying to create an EMR cluster using Boto3 in Python but e

3 min read 04-10-2024 42

AWS EMR - Output from print statement in pyspark job not present in log files

Understanding AWS EMR: Why Print Statements in PySpark Jobs May Not Show Up in Log Files. When working with AWS EMR (Elastic MapReduce) and PySpark, many develope

3 min read 29-09-2024 42

Spark UI not showing details in EMR 7

Troubleshooting Spark UI Not Showing Details in EMR 7. Problem Overview: If you're utilizing Amazon EMR (Elastic MapReduce) 7 and find that the Spark UI is not dis

2 min read 26-09-2024 45

Spark aggregate on multiple columns or a hash

Understanding Spark Aggregate on Multiple Columns or a Hash. Apache Spark is a powerful open-source engine for big data processing, known for its speed and ease o

3 min read 24-09-2024 56

AWS EMR Jupyterhub notebook run fails with error: Session isn't active

Troubleshooting AWS EMR JupyterHub Notebook "Session Isn't Active" Error. When working with JupyterHub notebooks on AWS EMR (Elastic MapReduce), users may encounter

3 min read 18-09-2024 47

Spark EMR long running transformation job GC is taking more time

Optimizing Spark EMR Long-Running Transformation Jobs: Dealing with GC Overhead. When working with Amazon EMR (Elastic MapReduce) for big data processing, Spark is

3 min read 16-09-2024 48

How to enable "Use for Hive table metadata" in "AWS Glue Data Catalog settings" using Terraform?

How to Enable "Use for Hive Table Metadata" in AWS Glue Data Catalog Settings Using Terraform. When working with AWS Glue Data Catalog, one of the important setting

3 min read 15-09-2024 56

How to read large zip files in pyspark

Unzipping and Processing Large Zip Files in PySpark: A Practical Guide. Working with large zip files in PySpark can be a challenge, especially when dealing with

3 min read 05-09-2024 51

SPARK on EMR Container from a bad node

Troubleshooting Spark on EMR Container Errors: A Case Study. This article delves into a common challenge encountered when running Spark applications on AWS EMR cl

2 min read 05-09-2024 72

Hive "Show Tables" Fails with MetaException

Troubleshooting Hive "Show Tables" Errors: A Deep Dive. Encountering a MetaException when attempting to execute show tables in Hive can be frustrating. This error o

2 min read 04-09-2024 51

Apache Crunch Job On AWS EMR using Oozie

Running Apache Crunch Jobs on AWS EMR with Oozie: Troubleshooting Write Issues. This article explores a common issue encountered when running Apache Crunch jobs w

3 min read 02-09-2024 49

How to use EMR studio notebooks with EMR serverless

Mastering EMR Studio Notebooks with EMR Serverless: A Guide to Kernel Selection and Permissions. EMR Serverless offers a powerful and cost-effective way to run bi

2 min read 02-09-2024 46

DBT Spark on EMR using AWS Glue Data Catalog

Leveraging DBT Spark with AWS Glue Data Catalog: Building a Modern Lakehouse on EMR. The world of data warehousing is rapidly shifting towards lakehouse architect

2 min read 02-09-2024 88

Spark emr jobs: Is the number of task defined by AQE (adaptive.enabled)?

Understanding Spark AQE and Task Count on EMR. When working with Spark jobs on Amazon EMR, understanding how Spark's Adaptive Query Execution (AQE) impacts task exec

2 min read 02-09-2024 48

ClassCastException in Spark SQL Incremental Load with DBT

Troubleshooting ClassCastException in Spark SQL Incremental Load with DBT. This article explores a common issue encountered when implementing incremental loads

3 min read 01-09-2024 55

What does retry in SparkUI means?

Understanding "Retry" in Spark UI: A Deep Dive into Task Failures and Adaptive Query Execution. When analyzing your Spark application's performance in the Spark UI, y

3 min read 01-09-2024 51

EMR-Spark Job creating max 1000 partitions/task when AQE is enabled

EMR Spark Jobs: Understanding Partition Limits and Adaptive Query Execution (AQE). Adaptive Query Execution (AQE) is a powerful feature in Spark that optimizes query

3 min read 01-09-2024 48

Troubleshooting Kafka Integration with Spark Streaming on Amazon EMR Serverless

Troubleshooting Kafka Integration with Spark Streaming on Amazon EMR Serverless. This article will dive into the common challenges faced when integrating Kafka w

3 min read 31-08-2024 47

Spark EMR Shuffle Read Fetch Wait Time is in 4hrs

Decoding the Spark EMR Shuffle Read Fetch Wait Time Nightmare: A 4-Hour Delay Solved. Have you ever encountered a Spark job that inexplicably takes hours to compl

3 min read 31-08-2024 45

Spark Repartition/shuffle optimization

Optimizing Spark Repartition and Shuffle Operations: A Deep Dive. Repartitioning data in Apache Spark is a crucial step for parallel processing and efficient data

2 min read 30-08-2024 44

Airflow error while creating EMR cluster via DAG

Troubleshooting "Invalid Instance Profile" Error When Creating EMR Clusters with Airflow. Creating EMR clusters within your workflow is a powerful capability but s

3 min read 29-08-2024 43

AWS EMR - reading multiple "zip" files from S3 bucket returns Your key is too long

"Your key is too long": Debugging S3 File Access Issues in AWS EMR. When working with large datasets in AWS EMR, reading data from S3 buckets is a common operation. H

2 min read 28-08-2024 44

Apache oozie JA008 error - job state changed from SUCCEDED to FAILED

Decoding the Apache Oozie JA008 Error: Why Your Successful Job Suddenly Fails. Apache Oozie is a powerful workflow engine for managing complex data processing pi

3 min read 28-08-2024 48

Spark-Scala vs Pyspark Dag is different?

Spark Scala vs PySpark: DAG Differences and Performance Variations. This article delves into the differences between Spark Scala and PySpark DAGs (Directed Acycl

2 min read 27-08-2024 67

Does spark shuffle/exchange converts compress data to uncompress form?

Does Spark Shuffle Exchange Convert Compressed Data to Uncompressed Form? This article will explore the relationship between data compression in Apache Spa

4 min read 27-08-2024 47