Mock-Test Window

1 :-

You need to analyze 60,000,000 images stored in JPEG format, each approximately 25 KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to take the following actions:
1. Group the individual images into a set of larger files.
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming.
Which data serialization system gives you the flexibility to do this?

A. CSV
B. XML
C. HTML
D. Avro
E. SequenceFiles
F. JSON
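The grouping step can be illustrated with a minimal sketch. It uses a hypothetical length-prefixed record layout built only on Python's standard library. It is not the Avro or SequenceFile on-disk format: those add headers or schemas plus sync markers, which let a large file be split across map tasks. The names `pack_images` and `unpack_images` are illustrative only.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

# Hypothetical layout (NOT Avro or SequenceFile): for each image,
# a 4-byte big-endian name length, the UTF-8 name,
# a 4-byte big-endian payload length, then the raw JPEG bytes.


def pack_images(images: Iterable[Tuple[str, bytes]], out: BinaryIO) -> int:
    """Append (name, jpeg_bytes) records to one larger file; return the record count."""
    count = 0
    for name, data in images:
        key = name.encode("utf-8")
        out.write(struct.pack(">I", len(key)))
        out.write(key)
        out.write(struct.pack(">I", len(data)))
        out.write(data)
        count += 1
    return count


def unpack_images(inp: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Read the records back in the order they were written."""
    while True:
        header = inp.read(4)
        if not header:  # clean end of file
            return
        (key_len,) = struct.unpack(">I", header)
        name = inp.read(key_len).decode("utf-8")
        (data_len,) = struct.unpack(">I", inp.read(4))
        yield name, inp.read(data_len)


if __name__ == "__main__":
    buf = io.BytesIO()
    pack_images([("a.jpg", b"\xff\xd8abc"), ("b.jpg", b"\xff\xd8xyz")], buf)
    buf.seek(0)
    print([name for name, _ in unpack_images(buf)])  # ['a.jpg', 'b.jpg']
```

Because this layout has no sync markers, a reader must start at the beginning of the file, which is exactly the limitation that splittable container formats avoid.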

2 :-

During the execution of a MapReduce v2 (MRv2) job on YARN, where does the Mapper place the intermediate data of each Map Task?

A. The Mapper stores the intermediate data on the node running the job's ApplicationMaster so that it is available to the YARN ShuffleService before the data is presented to the Reducer.
B. The Mapper stores the intermediate data in HDFS on the node where the Map tasks ran, in the HDFS /usercache/${user}/appcache/application_${appid} directory for the user who ran the job.
C. The Mapper transfers the intermediate data immediately to the Reducers as it is generated by the Map task.
D. YARN holds the intermediate data in the NodeManager's memory (a container) until it is transferred to the Reducer.
E. The Mapper stores the intermediate data on the underlying filesystem of the local disk in the directories yarn.nodemanager.local-dirs.

3 :-

Your cluster has the following characteristics:
- A rack-aware topology is configured and on.
- Replication is set to 3.
- Cluster block size is set to 64 MB.
Which describes the file read process when a client application connects to the cluster and requests a 50 MB file?

A. The client queries the NameNode for the locations of the block and reads all three copies. The first copy to complete transfer to the client is the one the client reads, as part of Hadoop's speculative execution framework.
B. The client queries the NameNode for the locations of the block and reads from the first location in the list it receives.
C. The client queries the NameNode for the locations of the block and reads from a random location in the list it receives, to eliminate network I/O load by balancing which nodes it retrieves data from at any given time.
D. The client queries the NameNode, which retrieves the block from the nearest DataNode to the client and then passes that block back to the client.
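A 50 MB file is smaller than the 64 MB block size, so it occupies a single block with three replicas. The sketch below is a simplified model, not HDFS source code: it computes that block count and orders replica locations by network distance to the reader, in the spirit of the NameNode returning block locations sorted so the client can try the closest one first. The topology paths such as `/rack1/dn1` are hypothetical; the distance rule (0 on the same node, 2 within a rack, 4 across racks) follows Hadoop's two-level rack-aware model.

```python
import math
from typing import List


def block_count(file_mb: float, block_mb: float) -> int:
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_mb / block_mb)


def distance(a: str, b: str) -> int:
    """Hops from each topology path up to their closest common ancestor, summed."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def sort_by_proximity(reader: str, replicas: List[str]) -> List[str]:
    """Closest replica first; this sketch keeps ties in their original order."""
    return sorted(replicas, key=lambda r: distance(reader, r))


if __name__ == "__main__":
    print(block_count(50, 64))  # 1
    locations = sort_by_proximity("/rack1/dn1", ["/rack2/dn5", "/rack1/dn2", "/rack2/dn6"])
    print(locations[0])  # /rack1/dn2 (same rack, distance 2)
```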

4 :-

You suspect that your NameNode is incorrectly configured and is swapping memory to disk. Which Linux commands help you identify whether swapping is occurring? (Select all that apply.)

A. free
B. df
C. memcat
D. top
E. jps
F. vmstat
G. swapinfo
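`free` summarizes the swap counters that Linux exposes in /proc/meminfo, while `vmstat` and `top` show memory and swap behavior over time. A minimal sketch, assuming the standard /proc/meminfo `Key: value kB` layout, parses those counters; the sample text is invented for illustration. Nonzero swap usage alone does not prove that swapping is happening right now; the `si` and `so` columns of `vmstat` report memory swapped in from and out to disk per second.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo key to its numeric value (kB for the swap fields)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def swap_used_kb(text: str) -> int:
    """Swap currently in use: SwapTotal minus SwapFree."""
    info = parse_meminfo(text)
    return info["SwapTotal"] - info["SwapFree"]


if __name__ == "__main__":
    # On a live Linux host you could pass open("/proc/meminfo").read() instead.
    sample = "MemTotal:       65861200 kB\nSwapTotal:       8388604 kB\nSwapFree:        8126460 kB\n"
    print(swap_used_kb(sample))  # 262144
```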

5 :-

You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?

A. Sample the web server logs from the web servers and copy them into HDFS using curl.
B. Ingest the server web logs into HDFS using Flume.
C. Channel these clickstreams into Hadoop using Hadoop Streaming.
D. Import all user clicks from your OLTP databases into Hadoop using Sqoop.
E. Write a MapReduce job with the web servers for mappers and the Hadoop cluster nodes for reducers.

6 :-

You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture to a MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying map and reduce tasks (resource allocation) when they run jobs. A developer wants to know how to specify reduce tasks when a specific job runs. Which method should you tell that developer to implement?
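In MRv2 the number of reduce tasks is a per-job setting: the `mapreduce.job.reduces` property, or `Job.setNumReduceTasks(int)` from the Java API. For a streaming job launched from Python, the hypothetical helper below builds the command line; generic options such as `-D` must come before streaming options like `-input`. The jar path, HDFS paths, and script names are placeholders.

```python
from typing import List


def streaming_command(streaming_jar: str, reducers: int, input_path: str,
                      output_path: str, mapper: str, reducer: str) -> List[str]:
    """Build an argv list for a Hadoop Streaming job with an explicit reducer count."""
    if reducers < 0:
        raise ValueError("reducers must be >= 0 (0 means a map-only job)")
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options (-D) must precede the streaming-specific options.
        "-D", f"mapreduce.job.reduces={reducers}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


if __name__ == "__main__":
    argv = streaming_command("/path/to/hadoop-streaming.jar", 4,
                             "/data/in", "/data/out", "mapper.py", "reducer.py")
    print(" ".join(argv))
    # On a machine with Hadoop installed: subprocess.run(argv, check=True)
```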

7 :-

Each node in your Hadoop cluster, running YARN, has 64 GB of memory and 24 cores. Your yarn-site.xml has the following configuration:
yarn.nodemanager.resource.memory-mb: 32768
yarn.nodemanager.resource.cpu-vcores: 12
You want YARN to launch no more than 16 containers per node. What should you do?

A. Modify yarn-site.xml with the following property: yarn.scheduler.minimum-allocation-mb 2048
B. Modify yarn-site.xml with the following property: yarn.scheduler.minimum-allocation-mb 4096
C. Modify yarn-site.xml with the following property: yarn.nodemanager.resource.cpu-vcores
D. No action is needed: YARN's dynamic resource allocation automatically optimizes the node memory and cores.
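The arithmetic behind this question can be sketched as follows, assuming each container is granted exactly the scheduler's minimum allocation and that the scheduler counts memory only (as the CapacityScheduler's default resource calculator does). Under those assumptions the per-node container cap is the NodeManager memory divided by `yarn.scheduler.minimum-allocation-mb`. If the scheduler also accounted for vcores, the 12 configured vcores at one vcore per container would cap the node at 12 containers instead.

```python
def max_containers_by_memory(nodemanager_mb: int, min_allocation_mb: int) -> int:
    """Upper bound on containers per node when every container gets the minimum allocation."""
    if min_allocation_mb <= 0:
        raise ValueError("min_allocation_mb must be positive")
    return nodemanager_mb // min_allocation_mb


if __name__ == "__main__":
    node_mb = 32768  # yarn.nodemanager.resource.memory-mb from the question
    for min_mb in (1024, 2048, 4096):
        print(min_mb, max_containers_by_memory(node_mb, min_mb))
    # 1024 -> 32, 2048 -> 16, 4096 -> 8
```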

8 :-

Your Hadoop cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. Can you configure a worker node to run a NodeManager daemon but not a DataNode daemon and still have a functional cluster?

A. Yes. The daemon will receive data from the NameNode to run Map tasks.
B. Yes. The daemon will get data from another (non-local) DataNode to run Map tasks.
C. Yes. The daemon will receive Map tasks only.
D. The daemon will receive Reducer tasks only.

9 :-

You are working on a project where you need to chain together MapReduce and Pig jobs. You also need the ability to use forks, decision points, and path joins. Which ecosystem project should you use to perform these actions?

A. Oozie
B. ZooKeeper
C. HBase
D. Sqoop
E. Hue

10 :-

Which two features does Kerberos security add to a Hadoop cluster?

A. User authentication on all remote procedure calls (RPCs)
B. Encryption for data during transfer between the Mappers and Reducers
C. Encryption for data on disk ("at rest")
D. Authentication for user access to the cluster against a central server
E. Root access to the cluster for users hdfs and mapred but non-root access for clients

11 :-

Your cluster's mapred-site.xml includes the following parameters:
mapreduce.map.memory.mb: 4096
mapreduce.reduce.memory.mb: 8192
And your cluster's yarn-site.xml includes the following parameter:
yarn.nodemanager.vmem-pmem-ratio: 2.1
What is the maximum amount of virtual memory allocated for each map task before YARN will kill its container?

A. 4 GB
B. 17.2 GB
C. 8.9 GB
D. 8.2 GB
E. 24.6 GB
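The NodeManager's virtual-memory check multiplies a container's physical allocation by `yarn.nodemanager.vmem-pmem-ratio`. A minimal worked example for the map container in this question (4096 MB × 2.1) follows; the GB conversion assumes 1 GB = 1024 MB.

```python
def vmem_limit_mb(container_mb: int, vmem_pmem_ratio: float) -> float:
    """Virtual memory a container may use before the NodeManager kills it."""
    return container_mb * vmem_pmem_ratio


if __name__ == "__main__":
    limit = vmem_limit_mb(4096, 2.1)  # mapreduce.map.memory.mb x vmem-pmem-ratio
    print(f"{limit:.1f} MB = {limit / 1024:.1f} GB")  # 8601.6 MB = 8.4 GB
```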

12 :-

You are running a Hadoop cluster with a NameNode on host mynamenode. What are two ways to determine available HDFS space in your cluster?

A. Run hdfs fs -du / and locate the DFS Remaining value.
B. Run hdfs dfsadmin -report and locate the DFS Remaining value.
C. Run hdfs dfs / and subtract NDFS Used from Configured Capacity.
D. Connect to http://mynamenode:50070/dfshealth.jsp and locate the DFS Remaining value.
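`hdfs dfsadmin -report` prints a cluster summary that includes a `DFS Remaining:` line (a byte count followed by a human-readable size), and each DataNode section repeats the field. A sketch that extracts the first, cluster-wide value from the report text is shown below; the sample report is invented for illustration, and the exact layout may vary between Hadoop versions.

```python
import re


def dfs_remaining_bytes(report: str) -> int:
    """Return the first 'DFS Remaining' value, which belongs to the cluster summary."""
    match = re.search(r"^DFS Remaining:\s*(\d+)", report, re.MULTILINE)
    if match is None:
        raise ValueError("no 'DFS Remaining' line found in report")
    return int(match.group(1))


if __name__ == "__main__":
    # Invented sample; on a cluster, capture the output of `hdfs dfsadmin -report`.
    sample = (
        "Configured Capacity: 107374182400 (100 GB)\n"
        "Present Capacity: 96636764160 (90 GB)\n"
        "DFS Remaining: 85899345920 (80 GB)\n"
        "DFS Used: 10737418240 (10 GB)\n"
    )
    print(dfs_remaining_bytes(sample))  # 85899345920
```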

13 :-

You observe that the number of spilled records from Map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 1000 MB. How would you tune your io.sort.mb value to achieve the maximum memory-to-disk I/O ratio?

A. For a 1 GB child heap size, an io.sort.mb of 128 MB will always maximize memory-to-disk I/O.
B. Increase the io.sort.mb to 1 GB.
C. Decrease the io.sort.mb value to 0.
D. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.
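The "Spilled Records" and "Map output records" job counters can be compared directly. A ratio near 1.0 means each map output record reached disk roughly once (the final spill); larger values mean extra spill-and-merge passes. The helper name is hypothetical and the counter values are made up. In MRv2 the property is named `mapreduce.task.io.sort.mb`, and the sort buffer is allocated from the task's JVM heap, so it has to leave room for the task itself.

```python
def spill_ratio(spilled_records: int, map_output_records: int) -> float:
    """Spilled Records divided by Map output records; about 1.0 is the target."""
    if map_output_records <= 0:
        raise ValueError("map_output_records must be positive")
    return spilled_records / map_output_records


if __name__ == "__main__":
    print(spill_ratio(3_000_000, 1_000_000))  # 3.0: records were written to disk about three times
    print(spill_ratio(1_000_000, 1_000_000))  # 1.0: a single spill per record
```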

14 :-

Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a reasonable time without starving long-running jobs?

A. Complexity Fair Scheduler (CFS)
B. Capacity Scheduler
C. Fair Scheduler
D. FIFO Scheduler

15 :-

You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected?

A. HDFS is almost full
B. The NameNode goes down
C. A DataNode is disconnected from the cluster
D. Map or reduce tasks that are stuck in an infinite loop
E. MapReduce jobs are causing excessive memory swaps

16 :-

Which YARN daemon or service monitors a Container's per-application resource usage (e.g., memory, CPU)?

A. ApplicationMaster
B. NodeManager
C. ApplicationManagerService
D. ResourceManager

17 :-

Which YARN process runs as "container 0" of a submitted job and is responsible for resource requests?

A. ApplicationManager
B. JobTracker
C. ApplicationMaster
D. JobHistoryServer
E. ResourceManager
F. NodeManager

18 :-

Which command does Hadoop offer to discover missing or corrupt HDFS data?

A. hdfs fs -du
B. hdfs fsck
C. dskchk
D. The map-only checksum
E. Hadoop does not provide any tools to discover missing or corrupt data; there is no need because three replicas are kept for each data block.

19 :-

Table schemas in Hive are:

A. Stored as metadata on the NameNode
B. Stored along with the data in HDFS
C. Stored in the Metastore
D. Stored in ZooKeeper

20 :-

You have a Hadoop cluster with HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?

Hadoop admin Quiz
