DataStax Help Center

Specifying the number of concurrent tasks per node

By default, each Hadoop (analytics) node runs 2 mapper tasks and 2 reducer tasks concurrently. Tune these values to the capability of your hardware and the type of workload you run. A reasonable starting point is about 1.5 total task slots per core, with 2/3 of the slots given to mappers and 1/3 to reducers. For example, a system with 6 cores would have 9 total slots: 6 for mappers and 3 for reducers.
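The starting-point arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not a DataStax tool: the function name `task_slots`, its parameters, and the rounding choices (round the total down, keep at least one slot of each kind) are assumptions made for this example.

```python
import math

def task_slots(cores, tasks_per_core=1.5, mapper_share=2 / 3):
    """Return (total, mappers, reducers) slots for a node with `cores` cores.

    Hypothetical helper: the rounding rules are assumptions, not part of
    the guidance itself.
    """
    # Round the total down so the node is not oversubscribed, but keep at
    # least two slots so both mappers and reducers get one.
    total = max(2, math.floor(cores * tasks_per_core))
    # Give roughly two thirds of the slots to mappers.
    mappers = max(1, round(total * mapper_share))
    # Whatever is left goes to reducers (at least one).
    reducers = max(1, total - mappers)
    return total, mappers, reducers

print(task_slots(6))  # (9, 6, 3)
```

Treat the result as a first guess and adjust it after observing CPU, memory, and I/O load under your real jobs.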

These properties are set on each analytics node, and changes take effect only after the DSE process on that node is restarted. They cannot be set or overridden per job. The properties are:

mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
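In stock Hadoop 1.x, these two properties are normally defined as `<property>` entries in `mapred-site.xml`; where that file lives in a given DSE installation is not stated here, so check your own install. As a rough sketch, the Python below renders the two entries for chosen slot counts. The function name `slot_properties` is hypothetical, and the values 6 and 3 are just the 6-core example from above.

```python
import xml.etree.ElementTree as ET

def slot_properties(mappers, reducers):
    """Build a <configuration> fragment with the two task-slot properties.

    Hypothetical helper for illustration; merge the resulting <property>
    entries into your existing mapred-site.xml rather than replacing it.
    """
    config = ET.Element("configuration")
    settings = (
        ("mapred.tasktracker.map.tasks.maximum", mappers),
        ("mapred.tasktracker.reduce.tasks.maximum", reducers),
    )
    for name, value in settings:
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # Return the fragment as a text string (not bytes).
    return ET.tostring(config, encoding="unicode")

print(slot_properties(6, 3))
```

After editing the configuration on a node, restart the DSE process on that node so the new slot counts take effect.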

For more information on tuning Hadoop, see the O'Reilly book Hadoop Operations, chapter 5: "MapReduce: Optimization and Tuning".
