Big Data Analytics Question Papers Collection – Sem 7

Paper 01

Duration: 3 Hours
Total Marks: 80


  1. Question 1 is compulsory
  2. Answer any three out of the remaining five questions
  3. Assume any suitable data wherever required and justify the same


Q1. (20 Marks)

  • a) Explain how big data problems are handled by the Hadoop system. [5]
  • b) Mention four characteristics of big data and explain each in detail. [5]
  • c) List and explain the core business drivers behind the NoSQL movement. [5]
  • d) Explain the concept of a Bloom filter with an example. [5]
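
Study note for Q1 (d): a minimal sketch of a Bloom filter in Python. The bit-array size (32), the number of hash functions (3), the use of salted SHA-256 digests as hash functions, and the class and method names are all assumptions made for illustration; none of them come from the question paper.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus several hash functions."""

    def __init__(self, num_bits=32, num_hashes=3):
        # num_bits and num_hashes are assumed values for this sketch.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Derive one bit position per hash function by salting SHA-256.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["hadoop", "spark", "hive"]:
    bf.add(word)

print(bf.might_contain("hadoop"))  # added items are never reported absent
print(bf.might_contain("pig"))     # usually False; True would be a false positive
```

The key property to bring out in an answer is the asymmetry: a Bloom filter can return false positives but never false negatives.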

Q2. (20 Marks)

  • a) What is a graph store? Give an example where a graph store can be used to effectively solve a particular business problem. [10]
  • b) Write MapReduce pseudocode for the word count problem. Illustrate with an example showing all the steps. [10]
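
Study note for Q2 (b): the map, shuffle, and reduce steps of word count can be simulated in plain Python on a single machine. This is only a sketch of the data flow, not Hadoop code; the function names and the two sample input lines are assumptions.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key (the word).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    # Reduce: sum the list of 1s collected for a single word.
    return word, sum(counts)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


lines = ["big data is big", "data is everywhere"]  # assumed sample input
result = word_count(lines)
print(result)
```

For the sample input, the mapper emits seven (word, 1) pairs, the shuffle groups them into four keys, and the reducer produces one count per key.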

Q3. (20 Marks)

  • a) Suppose the stream is S = {4, 2, 5, 9, 1, 6, 3, 7}. Let the hash function be h(x) = (3x + 7) mod 32, i.e. h(x) = (ax + b) mod 32 with a = 3 and b = 7, and treat the result as a 5-bit binary integer. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements in this stream. [10]
  • b) Describe applications of data visualization. [10]
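
Study note for Q3 (a): a rough Python sketch of the Flajolet-Martin steps for the stream and hash function given in the question. It assumes the common textbook formulation in which r is the number of trailing zeros of h(x) and the estimate is 2^R for the largest r seen. The function names, and the convention used when h(x) = 0, are assumptions.

```python
def trailing_zeros(value, width):
    # Number of trailing 0 bits in `value`, viewed as a `width`-bit integer.
    if value == 0:
        return width  # assumed convention; some texts treat this case as 0
    count = 0
    while value & 1 == 0:
        value >>= 1
        count += 1
    return count


def flajolet_martin(stream, a, b, width):
    # Hash each element with h(x) = (a*x + b) mod 2^width and track the
    # longest run of trailing zeros; the estimate is 2 ** (that maximum).
    modulus = 1 << width
    max_r = 0
    for x in stream:
        h = (a * x + b) % modulus
        r = trailing_zeros(h, width)
        print(f"x={x}  h(x)={h:2d}  binary={h:0{width}b}  r={r}")
        max_r = max(max_r, r)
    return 2 ** max_r


estimate = flajolet_martin([4, 2, 5, 9, 1, 6, 3, 7], a=3, b=7, width=5)
print("estimate =", estimate)
```

With this hash function, x = 3 maps to 16 (binary 10000), which has four trailing zeros, so the sketch reports an estimate of 2^4 = 16 for a stream with 8 distinct elements. A single hash function gives a coarse estimate; combining several hash functions is the usual refinement.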

Q4. (20 Marks)

  • a) Explain the selection and projection relational algebra operations using MapReduce. [10]
  • b) Explain the DGIM algorithm for counting ones in a stream with an example. [10]

Q5. (20 Marks)

  • a) Determine communities for the given social network graph using the Girvan-Newman algorithm. [10]
  • b) Consider the following data frame:
course  id  class  marks
1       11  1      56
2       12  2      75
3       13  1      48
4       14  2      69
5       15  1      84
6       16  2      53

i. Create a subset with course less than 5 by using [ ] brackets and demonstrate the output.
ii. Create a subset where the course column is less than 4 or the class equals 1 by using the subset() function and demonstrate the output. [10]

Q6. (20 Marks)

  • a) Write a script to create a dataset named data1 in R containing the following text:
    • Text: 2, 3, 4, 5, 6.7, 7, 8.1, 9
    • Explain the various functions provided by R to combine different sets of data. [10]
  • b) Describe collaborative filtering in a recommendation system. [10]

Paper 02

Duration: 3 Hours
Total Marks: 80


  1. Q.1 is compulsory
  2. Attempt any three from the remaining
  3. Assume suitable data


Q1. (20 Marks)

  • a) Explain Edit distance measure with an example. [5]
  • b) When it comes to big data, how does NoSQL score over RDBMS? [5]
  • c) Differentiate between the traditional data management and analytics approach and the big data approach. [5]
  • d) Give applications of social network mining. [5]
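
Study note for Q1 (a): a small Python sketch of edit distance using dynamic programming. It assumes the Levenshtein variant, in which insertion, deletion, and substitution each cost 1. Some textbooks count only insertions and deletions, which gives different values, so check which definition your syllabus uses. The function name and example strings are illustrative.

```python
def edit_distance(source, target):
    # prev[j] holds the distance between the processed prefix of `source`
    # and the first j characters of `target`.
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # turning i characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution_cost = 0 if s_char == t_char else 1
            curr.append(min(
                prev[j] + 1,                      # delete s_char
                curr[j - 1] + 1,                  # insert t_char
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

For "kitten" and "sitting" the three edits are: substitute k with s, substitute e with i, and insert g at the end.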

Q2. (20 Marks)

  • a) What is Hadoop? Describe HDFS architecture with diagram. [10]
  • b) Explain, with a block diagram, the architecture of a data stream management system. [10]

Q3. (20 Marks)

  • a) What is the use of a recommender system? How is a classification algorithm used in a recommendation system? [10]
  • b) Explain the following terms with a diagram: [10]
    1. Hubs and Authorities
    2. Structure of the Web

Q4. (20 Marks)

  • a) What do you mean by counting distinct elements in a stream? Illustrate with an example the working of the Flajolet-Martin algorithm used to count the number of distinct elements. [10]
  • b) Explain different ways by which big data problems are handled by NoSQL. [10]

Q5. (20 Marks)

  • a) Describe the Girvan-Newman algorithm. For the following graph, show how the Girvan-Newman algorithm finds the different communities. [10]
  • b) What is the role of the JobTracker and the TaskTracker in MapReduce? Illustrate the MapReduce execution pipeline with a word count example. [10]

Q6. (20 Marks)

  • a) Compute the PageRank of each page after running the PageRank algorithm for two iterations with teleportation factor β = 0.8. [10]
  • b) What are the challenges in clustering data streams? Explain a stream clustering algorithm in detail. [10]

Paper 03

Duration: 3 Hours
Total Marks: 80


  1. Question 1 is compulsory
  2. Answer any three out of the remaining five questions
  3. Assume any suitable data wherever required and justify the same


Q1. (20 Marks)

  • a) Distinguish between the NameNode and the DataNode. [5]
  • b) List and explain the core business drivers behind the NoSQL movement. [5]
  • c) Mention four characteristics of big data. Elaborate these characteristics with respect to social media websites. [5]
  • d) List and explain the different issues and challenges in data stream query processing. [5]

Q2. (20 Marks)

  • a) What is a key-value store? What are the benefits of using a key-value store? [10]
  • b) Write MapReduce pseudocode to multiply two matrices. Apply the MapReduce procedure to perform the following matrix multiplication. [10]
    | 1 2 |     | 6 7 |
    | 3 4 |  ×  | 8 9 |
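
Study note for Q2 (b): a single-machine Python sketch of one-pass MapReduce matrix multiplication applied to the two 2×2 matrices in the question. The names A, B, and C, the function names, and the one-pass formulation (keying every value by its output cell) are assumptions; a two-pass formulation is also commonly taught.

```python
from collections import defaultdict

A = [[1, 2], [3, 4]]  # left matrix from the question (name assumed)
B = [[6, 7], [8, 9]]  # right matrix from the question (name assumed)


def map_phase(A, B):
    # Map: send each element to every output cell (i, k) that needs it.
    n_rows, n_inner, n_cols = len(A), len(B), len(B[0])
    for i in range(n_rows):
        for j in range(n_inner):
            for k in range(n_cols):
                yield (i, k), ("A", j, A[i][j])
    for j in range(n_inner):
        for k in range(n_cols):
            for i in range(n_rows):
                yield (i, k), ("B", j, B[j][k])


def reduce_phase(values):
    # Reduce: pair A and B values sharing the inner index j, then sum products.
    a_vals = {j: v for tag, j, v in values if tag == "A"}
    b_vals = {j: v for tag, j, v in values if tag == "B"}
    return sum(a_vals[j] * b_vals[j] for j in a_vals)


def multiply(A, B):
    groups = defaultdict(list)  # stands in for the shuffle step
    for key, value in map_phase(A, B):
        groups[key].append(value)
    C = [[0] * len(B[0]) for _ in A]
    for (i, k), values in groups.items():
        C[i][k] = reduce_phase(values)
    return C


product = multiply(A, B)
print(product)  # [[22, 25], [50, 57]]
```

As a check by hand, the top-left entry is 1×6 + 2×8 = 22 and the bottom-right entry is 3×7 + 4×9 = 57.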

Q3. (20 Marks)

  • a) Suppose the stream is S = {2, 1, 6, 1, 5, 9, 2, 3, 5}. Let the hash function be of the form h(x) = (ax + b) mod 16 for some a and b, and treat the result as a 4-bit binary integer. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements in this stream if h(x) = (4x + 1) mod 16. [10]
  • b) Consider the following data frame: [10]
course  id  class  marks
1       11  1      56
2       12  2      75
3       13  1      48
4       14  2      69
5       15  1      84
6       16  2      53
    i. Create a subset with course less than 3 by using [ ] brackets and demonstrate the output.
    ii. Create a subset where the course column is less than 3 or the class equals 2 by using the subset() function and demonstrate the output.

Q4. (20 Marks)

  • a) Explain the natural join and the grouping and aggregation relational algebra operations using MapReduce. [10]
  • b) With a neat sketch, explain the architecture of the data-stream management system. [10]

Q5. (20 Marks)

  • a) Determine communities for the given social network graph using the Girvan-Newman algorithm. [10]
  • b) List and discuss various types of data structures in R. [10]

Q6. (20 Marks)

  • a) i. The following table shows the number of units of different products sold on different days: [10]
Product    Monday Tuesday Wednesday Thursday Friday
Bread      12     3       5         11       9
Milk       21     27      18        20       15
Cola Cans  10     1       33        6        12
Chocolate  6      7       4         13       12
Detergent  5      8       12        20       23
    Create five sample numeric vectors from this data.
    ii. Name and explain the operators used to form data subsets in R.
  • b) Define collaborative filtering. Using an example of an e-commerce site like Flipkart or Amazon, describe how it can be used to provide recommendations to users. [10]
