Batch Processing Queue Resource Usage


Before using features of the Batch Data Processing service in the following scenarios, you need to request the Batch Processing - Queue resource on the EnOS Management Console.

  • Using HiveSQL or submitting Hadoop YARN jobs in Python or Shell task nodes of batch processing workflows.
  • Using data synchronization task nodes in batch processing workflows to synchronize structured data, where the data source or target of the synchronization task is HIVE.
  • Running existing structured data synchronization tasks, or batch processing workflows that contain structured data synchronization task nodes, without a specified batch processing queue. In this case the system uses the default queue, whose resource availability cannot be guaranteed. Update the configuration of the data synchronization tasks or batch processing workflows to specify the queue resource.
  • Running existing or new batch processing workflows that contain Python or Shell task nodes without a specified batch processing queue. In this case the system uses the default queue, whose resource availability cannot be guaranteed. Update the configuration of the batch processing workflows to specify the queue resource.


In EnOS 2.2.0, if no batch processing queue is specified in the above scenarios, the system uses the default queue resource provided by the platform, which is shared by all OUs. Tasks may fail with errors caused by insufficient queue resource or by the queue resource not being found, so business stability cannot be guaranteed. Before EnOS is upgraded to version 2.3.0, update the configuration of your batch processing workflows to manually specify a queue resource requested through the EnOS Resource Management service.


For information about how to request the Batch Processing - Queue resource and how to configure it in the EnOS Batch Processing and Data Synchronization services, see the following sections:

Requesting Batch Processing Queue Resource

Log in to the EnOS Management Console and click Resource Management > Resource List from the left navigation panel. Under the Enterprise Data Platform tab, click the Request Resource button for the Batch Processing - Queue resource. See the following screen capture:

../_images/requesting_queue_resource.png

Configuring Data Synchronization Tasks

In the Scheduling Config panel of data synchronization tasks, specify the requested queue resource. See the following screen capture:

../_images/configuring_data_sync_task.png

Configuring Batch Processing Workflows

In the Scheduling Config panel of batch processing workflows, specify the requested queue resource. See the following screen capture:

../_images/configuring_workflow.png

Using Queue Resource in Shell / Python Task Nodes

Example for Specifying Queue in Shell Script

canaanhive -str  "set mapred.job.queue.name=root.xxx;insert into tablename values() ......"

Example for Specifying Queue in Python Code

# Set the queue for the statements that follow; replace root.xxx with your requested queue.
hive.execute('''set mapreduce.job.queuename=root.xxx''')

rc = hive.execute('''create table t1 as select * from t2''')

rs = hive.executeQuery('''select * from t1''')
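
When a task node runs several statements, it can help to set the queue once, before any of them executes. The following is a minimal sketch and not part of the EnOS API: `run_on_queue` and `FakeHive` are hypothetical names introduced here for illustration, and the sketch assumes only that the `hive` object exposes the `execute` method used in the example above. The check on the queue name is likewise an assumption, intended to catch an empty or obviously malformed value early.

```python
def run_on_queue(hive, queue, statements):
    """Set the batch processing queue, then run each HiveSQL statement in order.

    `hive` is assumed to expose execute(sql), as in the example above.
    `run_on_queue` is a hypothetical helper, not an EnOS function.
    """
    # Reject empty names and names containing separators or whitespace.
    if not queue or any(ch in queue for ch in "; \t\n"):
        raise ValueError("invalid queue name: %r" % (queue,))
    # Issue the queue setting first, mirroring the order used in the example above.
    hive.execute("set mapreduce.job.queuename=%s" % queue)
    return [hive.execute(sql) for sql in statements]


class FakeHive:
    """Stand-in that records statements so the sketch runs outside EnOS."""

    def __init__(self):
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)
        return 0


if __name__ == "__main__":
    fake = FakeHive()
    # root.xxx is the placeholder from the example above; use your requested queue.
    run_on_queue(fake, "root.xxx", ["create table t1 as select * from t2"])
    print(fake.executed)
```

Inside a real Python task node, the platform-provided `hive` object would be passed in place of `FakeHive()`.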

Extra Notes

In EnOS 2.3.0, the queue resource configuration will be required for data synchronization task nodes in batch processing workflows that synchronize structured data when the data source or target of the synchronization task is HIVE. Meanwhile, the original configuration in the Scheduling Config panel will be removed. Workflows with batch processing queue resources specified in EnOS 2.2.0 will be seamlessly migrated when EnOS is upgraded to version 2.3.0. However, workflows with no batch processing queue resources specified will not be automatically migrated.


In summary:

  1. For existing batch processing workflows that contain data synchronization task nodes for synchronizing structured data, where the data source or target of the synchronization task is HIVE, specify the queue resource requested through EnOS Resource Management in the Scheduling Config panel of the workflows.
  2. For existing batch processing workflows with data synchronization task nodes, if the data source or target of the synchronization task is not HIVE, you do not need to use batch processing queue resources.
  3. When batch processing workflows run HiveSQL through Shell or Python task nodes, specify the queue resource requested through EnOS Resource Management in the HiveSQL statements.
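
The three rules above can be sketched as a small decision function, for example to audit a list of task nodes before the upgrade. This is an illustrative sketch only: `queue_requirement`, its parameters, and the returned labels are hypothetical names chosen here, not EnOS identifiers, and cases the list does not address return `"not_covered"`.

```python
def queue_requirement(node_type, source=None, target=None, runs_hive_sql=False):
    """Return where the queue resource should be specified for a task node.

    All names and return labels are hypothetical, for illustration only.
    """
    if node_type == "data_sync":
        # Rules 1 and 2: only synchronization to or from HIVE needs a queue.
        endpoints = (str(source).upper(), str(target).upper())
        if "HIVE" in endpoints:
            return "scheduling_config"
        return "not_needed"
    if node_type in ("shell", "python") and runs_hive_sql:
        # Rule 3: the queue is set inside the HiveSQL itself.
        return "in_hivesql"
    # Anything else is outside the three rules listed above.
    return "not_covered"


if __name__ == "__main__":
    print(queue_requirement("data_sync", source="MySQL", target="HIVE"))
    print(queue_requirement("data_sync", source="MySQL", target="Oracle"))
    print(queue_requirement("python", runs_hive_sql=True))
```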


For any questions or feedback, please contact the EnOS Product Team.