How to answer this question.

Answer in essay format. Your answer should include:

1) How you would answer the question. This should consider the following, but your answer should use sentences, not bullet points:

a) What data are you using (files and fields or calculated fields)?

b) What records are you including or excluding from your calculation? This should be data driven, so what profiling would you do to decide what records to use?

c) What transformations would you need to do?

d) What are the DataFrames you are going to create to answer the question (what they contain, not their names)? What joins are required, what aggregations, etc.?

If it’s easier to describe any of the above using Spark or SQL terminology, that’s OK, but do not write code. You need to be able to describe to a manager what you are doing and why.

e) Say how you would show your result. Draw a graph to illustrate a possible answer. This should be neatly drawn and described – what is it telling the reader?

f) Identify any issues specific to your question/answer and possible solutions to those issues (that you could implement). Discuss the limitations of your analysis. For this part, do NOT list issues such as: some reviews could be fake, we don’t know why users write reviews, businesses could be paying for reviews, people may not write what they think, etc. Those are generic issues about Yelp’s business model and you could not address them.

Question One:

There is research to suggest that reviewers (not just on Yelp) are influenced by the reviews posted by other reviewers. This suggests that as a business accumulates more reviews, later reviewers may tend to rate closer to the average for that business (e.g., a Yelp reviewer was going to rate a business as a “2”, but saw that the average for that business after 50 reviews is 4.5 and decides to rate it closer to that average). In this question you will focus on restaurants. As restaurants have more reviews, does the range of those reviews change? For example, are the first 10 reviews at a restaurant widely dispersed across the ratings 1‐5, but the next 10 narrower (e.g., 2‐3 stars), and the next 10 narrower still?

Your answer should consider as many restaurants as you can (if you are eliminating some restaurants, say why). However, it’s the range at a business that matters (e.g., a bad restaurant may have a range of 1‐2 stars and a great restaurant may have a range of 4‐5 stars. Both would have a 2‐star range).
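
Although the essay itself should describe the approach in words, the block-by-block comparison in Question One can be sketched for intuition. This is only an illustration under stated assumptions, not a prescribed method: reviews arrive as `(business_id, date, stars)` tuples (hypothetical field names), blocks are consecutive groups of 10 reviews in date order, a trailing partial block is dropped, and "range" is taken as maximum minus minimum stars within a block (so ratings of 1 and 2 give a value of 1; adding 1 would count the star levels spanned instead).

```python
from collections import defaultdict
from statistics import mean

BLOCK_SIZE = 10  # assumption: compare consecutive blocks of 10 reviews


def block_ranges(reviews):
    """Map each business to the star range (max - min) of each full block.

    `reviews` is an iterable of (business_id, date, stars) tuples; these
    field names are assumptions, not taken from the assignment.
    """
    by_business = defaultdict(list)
    for business_id, date, stars in reviews:
        by_business[business_id].append((date, stars))

    result = {}
    for business_id, rows in by_business.items():
        rows.sort()  # chronological order, assuming ISO-formatted dates
        stars = [s for _, s in rows]
        ranges = []
        # Only full blocks are kept; a trailing partial block is dropped.
        for start in range(0, len(stars) - BLOCK_SIZE + 1, BLOCK_SIZE):
            block = stars[start:start + BLOCK_SIZE]
            ranges.append(max(block) - min(block))
        result[business_id] = ranges
    return result


def mean_range_by_block(ranges_by_business):
    """Average the range across businesses for each block position."""
    by_position = defaultdict(list)
    for ranges in ranges_by_business.values():
        for position, value in enumerate(ranges):
            by_position[position].append(value)
    return {pos: mean(vals) for pos, vals in sorted(by_position.items())}
```

In Spark terms this would correspond roughly to numbering each restaurant's reviews by date with a window function, deriving a block number from that position, and aggregating the maximum and minimum stars per restaurant and block. Restaurants with fewer than 10 reviews contribute no blocks, which is one data-driven exclusion to justify.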

Question Two:

Any review can be voted on by other users as being funny, useful, or cool. Assume your teammate has a hypothesis that people find negative reviews funnier, so if a Yelper was going to review a business where they had a bad experience, they may try to be funnier than the prior negative reviews for that business. In this question you are going to focus on businesses in the shopping category, and in particular, you are exploring whether businesses that are rated lower have a higher share of reviews that are funny than businesses that are highly rated.

Your answer should consider as many businesses in the shopping category as you can (if you are eliminating some businesses, say why). Keep in mind that it’s the number of funny reviews or votes compared to total reviews for a business that matters. If one business had 10 reviews and another business had 100 reviews, there would likely be more funny reviews in total for the business with 100 reviews.
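
The normalization asked for in Question Two (funny reviews relative to total reviews, not raw counts) can be sketched as below. This is a minimal illustration under assumptions that are not in the assignment: reviews are `(business_id, funny_votes)` pairs, a review counts as "funny" when it has at least one funny vote, and `ratings` is a hypothetical mapping from business to its star rating.

```python
from collections import defaultdict


def funny_share(reviews):
    """Return {business_id: share of reviews with at least one funny vote}.

    `reviews` is an iterable of (business_id, funny_votes) pairs; the field
    names and the "at least one vote" threshold are assumptions.
    """
    total = defaultdict(int)
    funny = defaultdict(int)
    for business_id, funny_votes in reviews:
        total[business_id] += 1
        if funny_votes >= 1:
            funny[business_id] += 1
    return {b: funny[b] / total[b] for b in total}


def mean_share_by_rating(shares, ratings):
    """Average the funny share over businesses with the same star rating."""
    grouped = defaultdict(list)
    for business_id, share in shares.items():
        if business_id in ratings:  # skip businesses with no known rating
            grouped[ratings[business_id]].append(share)
    return {r: sum(v) / len(v) for r, v in sorted(grouped.items())}
```

Because each business contributes a share between 0 and 1, a business with 100 reviews does not outweigh one with 10 reviews simply by volume. A different threshold for "funny", or funny votes per review instead of a share of reviews, would be an equally defensible choice to discuss.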


Question Three:

Some users have lots of fans (followers), and others do not. Possibly this could be related to the number of “useful” votes a Yelper’s reviews have earned. Any Yelper can vote on whether a review by another Yelper was useful. In this question you are looking at whether the number of fans a user has is related more to (a) the number of useful votes on their most useful review (the one with the most useful votes), or (b) the average number of useful votes across their reviews.

Your answer needs to keep in mind what data we have. We have all of the reviews for 10 metro areas and all of the users who wrote those reviews. For many users we do NOT have every review that user wrote.
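
One possible way to compare options (a) and (b) in Question Three is to correlate fan counts with each measure. The sketch below is an assumption-laden illustration, not the required method: it uses Pearson correlation (one choice among several), takes reviews as `(user_id, useful_votes)` pairs, and takes `fans` as a hypothetical mapping from user to fan count. It only sees the reviews present in the data, so both measures inherit the limitation noted above for users whose reviews are incomplete.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fans_vs_useful(reviews, fans):
    """Correlate fans with (a) max and (b) mean useful votes per user.

    `reviews` is an iterable of (user_id, useful_votes) pairs and `fans`
    maps user_id to fan count; both layouts are assumptions.
    """
    votes = defaultdict(list)
    for user_id, useful in reviews:
        votes[user_id].append(useful)
    users = [u for u in votes if u in fans]  # users with reviews and a fan count
    fan_counts = [fans[u] for u in users]
    max_useful = [max(votes[u]) for u in users]
    mean_useful = [sum(votes[u]) / len(votes[u]) for u in users]
    return pearson(fan_counts, max_useful), pearson(fan_counts, mean_useful)
```

The function returns the two coefficients as a pair, (a) first and (b) second, so whichever is larger in absolute value indicates the stronger linear relationship under these assumptions.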
