PREPARE DATA FOR EXPLORATION WEEKLY CHALLENGE 1

1. A data analyst is working on an urgent traffic study. As a result of the short time frame, which type of data are they most likely to use?

Unclean
Theoretical
Personal
Historical

Explanation: Because the study is urgent, the analyst is unlikely to have time to collect new data. The most practical option is historical data, meaning data that has already been collected, such as past traffic counts or earlier sensor records. It can be analyzed right away, even though it may not reflect the very latest traffic conditions.

2. Which of the following is an example of continuous data?

Movie budget.
Movie run time.
Leading actors in movie.
Box office returns.

3. Nominal qualitative data has a set order or scale.

True
False

Explanation: In fact, nominal qualitative data does not have a predetermined order or scale. Nominal data is a form of categorical data in which the categories represent different groups or labels, but the categories have no intrinsic order or ranking.

For instance, colors (such as red, blue, and green) and types of fruit (such as apple, orange, and banana) are examples of nominal data. There is no logical order or scale among these categories; they are simply different names.

Ordinal data, on the other hand, is categorical data in which the categories have a meaningful order or ranking. For example, consider a survey question that lets respondents choose from the following options: "strongly disagree," "disagree," "neutral," "agree," and "strongly agree." In this case, the categories follow a logical order.

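A minimal sketch in Python, using the standard-library `enum` module as one illustrative way to model the difference (the class names are hypothetical). A plain `Enum` behaves like nominal data: its members can be compared for equality but not ranked. An `IntEnum` behaves like ordinal data: its members carry an order, so they can be compared and sorted.

```python
from enum import Enum, IntEnum


class FruitCategory(Enum):
    """Nominal data: labels with no intrinsic order (hypothetical example)."""
    APPLE = "apple"
    ORANGE = "orange"
    BANANA = "banana"


class Agreement(IntEnum):
    """Ordinal data: survey answers with a meaningful order."""
    STRONGLY_DISAGREE = 1
    DISAGREE = 2
    NEUTRAL = 3
    AGREE = 4
    STRONGLY_AGREE = 5


def can_rank(a, b):
    """Return True if a and b support an ordering comparison such as a < b."""
    try:
        _ = a < b
        return True
    except TypeError:
        return False


print(can_rank(FruitCategory.APPLE, FruitCategory.BANANA))  # False: nominal
print(can_rank(Agreement.DISAGREE, Agreement.AGREE))        # True: ordinal

answers = [Agreement.AGREE, Agreement.STRONGLY_DISAGREE, Agreement.NEUTRAL]
print([a.name for a in sorted(answers)])  # ['STRONGLY_DISAGREE', 'NEUTRAL', 'AGREE']
```

Trying to sort a list of `FruitCategory` members would raise a `TypeError`, which mirrors the point that nominal categories have no order to sort by.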
4. Which of the following is a benefit of internal data?

Internal data is less likely to need cleaning.
Internal data is less vulnerable to biased collection.
Internal data is the only data relevant to the problem.
Internal data is more reliable and easier to collect.

Explanation: Internal data is data that an organization generates within its own systems. Because the company controls how this data is collected and stored, and because it is specific to the company's own processes, activities, and objectives, internal data is generally more reliable and easier to collect than external data. It supports in-depth analysis and decision-making tailored to the business.

5. Structured data is likely to be found in which of the following formats? Select all that apply.

Audio file
Digital photo
Spreadsheet
Table

6. Which of the following values are examples of a Boolean data type? Select all that apply.

Yes, no, or unsure
Yes or no
One, two, or three
True or false

7. The following is a selection from a spreadsheet:

[Spreadsheet excerpt not reproduced]
Narrow
Wide
Long
Short

8. Data transformation can change the structure of the data. An example of this is taking data stored in one format and converting it to another.

True
False

Explanation: Correct. Changing the structure or format of data is data transformation, and converting data from one format to another is a common example. This can mean converting data from one type to another, such as turning numbers stored as text into a numeric type, or moving raw, unprocessed data into a more structured and usable form, such as loading a CSV file into a relational database table. Data transformation is an important step in preparing data, because it ensures the data is ready for analysis and reporting.
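As an illustration of the CSV-to-database example, here is a minimal Python sketch using the standard-library `csv` and `sqlite3` modules. The table name, columns, and movie rows are hypothetical. It also shows a type conversion: run times arrive as text in the CSV and are stored as integers in the table.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would be read from a .csv file.
raw_csv = """title,run_time_minutes
Movie A,101
Movie B,95
"""


def csv_to_table(csv_text, conn):
    """Load CSV text into a relational table, converting run time from text to INTEGER."""
    conn.execute("CREATE TABLE movies (title TEXT, run_time_minutes INTEGER)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["title"], int(r["run_time_minutes"])) for r in reader]
    conn.executemany("INSERT INTO movies VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, so nothing is written to disk
csv_to_table(raw_csv, conn)
for title, minutes in conn.execute(
    "SELECT title, run_time_minutes FROM movies ORDER BY run_time_minutes"
):
    print(title, minutes)
```

Once the data is in a table with typed columns, it can be sorted, filtered, and summed with SQL instead of being treated as plain text.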

9. Which of the following questions collect nominal qualitative data? Select all that apply.

Have you heard of our frequent diner program?
How likely are you to recommend this restaurant to a friend?
Is this your first time dining at this restaurant?
Did anyone recommend our restaurant to you today?

10. A social media post is an example of structured data.

True
False

Explanation: A social media post is a common example of unstructured data. Unstructured data has no predefined data model and is not organized in a way that computers can readily process.

Social media posts often combine text, photos, videos, emoticons, and other elements. Although a post has some structured elements, such as a date, a username, and text, the content as a whole is flexible and is not arranged in a tabular or highly structured form.

Structured data, by contrast, is organized in a fixed format, such as tables in a relational database or rows and columns in a spreadsheet. Examples of structured data include spreadsheets, relational databases, and CSV files.
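To make the distinction concrete, here is a small Python sketch with a hypothetical post. The metadata fields fit neatly into a table row, while the body is free text: getting any structure out of it, such as hashtags, requires parsing. The simple regular expression used here is an assumption for illustration, not how any particular platform stores or processes posts.

```python
import re

# A hypothetical social media post.
post = {
    "username": "data_fan",
    "timestamp": "2024-05-01T09:30:00",
    "body": "Loving the new #DataAnalytics course! Week 1 done #learning",
}

# Structured part: fixed fields that map directly to columns in one table row.
row = (post["username"], post["timestamp"])

# Unstructured part: free text with no predefined model; structure must be extracted.
hashtags = re.findall(r"#(\w+)", post["body"])

print(row)       # ('data_fan', '2024-05-01T09:30:00')
print(hashtags)  # ['DataAnalytics', 'learning']
```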

11. A Boolean data type must have a numeric value.

True
False

Explanation: This is false. A Boolean data type represents only two values: true and false. Some programming languages or systems represent these values internally as numbers (1 for true and 0 for false), but the defining characteristic of Boolean data is its binary nature: there are only two possible values.

Many programming languages let you use the keywords true and false directly, without assigning numeric values to them. The key point to remember is that Boolean data represents a binary condition: something is either true or false.
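Python is one example of a language that works this way. In this short sketch, `True` and `False` are keywords of type `bool`. Because `bool` is a subclass of `int` in Python, the values also behave as 1 and 0 in arithmetic, but you never have to write the numbers yourself. The variable names are hypothetical.

```python
# Boolean values are written with keywords, not numbers.
first_visit = True
heard_of_program = False

print(type(first_visit))               # <class 'bool'>
print(isinstance(first_visit, int))    # True: bool is a subclass of int
print(first_visit + heard_of_program)  # 1: True counts as 1, False as 0

# A common use of the numeric behavior: counting "yes" answers in a survey.
answers = [True, False, True, True]
print(sum(answers))  # 3
```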

12. In long data, separate columns contain the values and the context for the values, respectively. What does each column contain in wide data?

A specific data type
A unique data variable
A specific constraint
A unique format

Explanation: In wide data, each column typically represents a single variable or attribute, and each row holds the values of those variables for one subject. Wide data spreads information horizontally, with one column per variable. Long data, by contrast, uses separate columns for the values and for the context of those values (for example, variable names), so each subject takes up several rows.
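A minimal sketch of the difference in plain Python, assuming hypothetical country-by-year figures. The `wide_to_long` helper is illustrative (libraries such as pandas offer a similar operation called `melt`): in the wide version each year is its own column, and in the long version one column records the context (the year) and another holds the value.

```python
# Wide data: one row per country, one column per year (hypothetical figures).
wide_data = [
    {"country": "A", "2020": 10, "2021": 12},
    {"country": "B", "2020": 7, "2021": 9},
]


def wide_to_long(rows, id_col, var_name="year", value_name="value"):
    """Reshape wide rows into long rows: one column for context, one for the value."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue  # the identifier is repeated on every long row instead
            long_rows.append({id_col: row[id_col], var_name: col, value_name: value})
    return long_rows


long_data = wide_to_long(wide_data, "country")
for r in long_data:
    print(r)
# {'country': 'A', 'year': '2020', 'value': 10}
# {'country': 'A', 'year': '2021', 'value': 12}
# {'country': 'B', 'year': '2020', 'value': 7}
# {'country': 'B', 'year': '2021', 'value': 9}
```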

13. A data analyst is working in a spreadsheet application. They use Save As to change the file type from .XLS to .CSV. This is an example of a data transformation.

True
False

Explanation: Using the "Save As" command to convert a file from the Excel format (.XLS) to the Comma-Separated Values (.CSV) format is indeed an example of data transformation, because the data is converted from one file format to another.

During the conversion, the spreadsheet data is written out as plain text, with commas separating the values in each row. Because CSV is plain text, formatting, formulas, and any additional worksheets are not preserved; only the values are saved. This makes the format more lightweight than the Excel format and easier to transfer and open in the many applications that support CSV files, which is why CSV is widely used for data interchange.
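Code can't click Save As, but it can show what the CSV format keeps. In this minimal Python sketch using the standard-library `csv` module (the rows are hypothetical, and an in-memory buffer stands in for a .csv file), a number written to CSV comes back as text when it is read, because CSV stores only characters and separators, not data types.

```python
import csv
import io

rows = [["title", "budget"], ["Movie A", 1500000]]

# Write the rows as CSV text.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Read the text back: every field is now a string, including the budget.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1])  # ['Movie A', '1500000']
```

This is why numeric columns in a CSV file often need to be converted back to numbers after import.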

14. If you have a short time frame for data collection and need an answer immediately, you likely will have to use historical data.

True
False

Explanation: This is true. When you have a short time frame and need an answer immediately, there is usually no time to collect new data, so you will most likely have to work with historical data, meaning data that has already been collected.

Collecting new data through surveys, observation, or new tracking takes time. Historical data is already available, so it can be analyzed right away.

Historical data is also useful for analyzing trends and recognizing long-term patterns. It may not capture the very latest conditions, but when time is limited it is usually the most practical source.

15. Continuous data is measured and has a limited number of values.

True
False

Explanation: In fact, continuous data is measured and can take on any value within a range, so the number of possible values is not limited. Continuous measurements can be broken down into smaller units with increasing precision; in theory, they can be measured to any degree of accuracy.

For instance, height is an example of continuous data. It can be measured very precisely, and there is no limit to the number of possible values within a given range. Discrete data, on the other hand, is counted and has a limited number of separate values, such as the number of leading actors in a movie.
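One informal way to see the difference, sketched in Python with hypothetical values: halfway between two measured run times is still a valid run time, but halfway between two counts is not a valid count.

```python
def midpoint(a, b):
    """Return the value halfway between a and b."""
    return (a + b) / 2


# Continuous (measured): a value between two run times is still a valid run time.
run_time = midpoint(101.5, 102.0)
print(run_time)  # 101.75

# Discrete (counted): halfway between 2 and 3 leading actors is not a valid count.
actors = midpoint(2, 3)
print(actors, actors.is_integer())  # 2.5 False
```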

16. Internal data is more reliable because it’s clean.

True
False

Explanation: Internal data is generally considered more reliable and easier to collect because it lives within a company's own systems, not because it is clean. Internal data can still contain errors and may need cleaning. More broadly, the reliability of any data depends on several factors, such as how the data is collected, the data management procedures that are followed, and the context in which the data is used.

Internally collected data can be considered credible when appropriate data quality standards and a well-established data governance system are in place. Still, any data, internal or external, can contain mistakes, biases, or inconsistencies. The quality assurance procedures applied during data collection, storage, and analysis are what determine how dependable the data is.

When evaluating the trustworthiness of a data source, internal or external, check its correctness, completeness, and consistency. Data cleaning, verification, and validation are essential steps for ensuring that the data used for analysis or decision-making is dependable.
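As a minimal sketch of what such checks can look like, here is a Python function that flags incomplete rows (completeness), repeated IDs, and inconsistently formatted codes (consistency). The records, field names, and the uppercase-code rule are hypothetical assumptions chosen for illustration; real validation rules depend on the data and the business.

```python
# Hypothetical customer records from an internal system.
records = [
    {"id": 1, "email": "a@example.com", "state": "CA"},
    {"id": 2, "email": "", "state": "NY"},               # missing email
    {"id": 3, "email": "c@example.com", "state": "tx"},  # lowercase state code
    {"id": 1, "email": "a@example.com", "state": "CA"},  # duplicate of id 1
]


def quality_report(rows, required, code_field):
    """Flag incomplete rows, duplicate IDs, and inconsistently formatted codes."""
    # Completeness: any required field that is empty or absent.
    missing = [r["id"] for r in rows if any(not r.get(f) for f in required)]

    # Duplicates: IDs that appear more than once.
    seen, duplicates = set(), []
    for r in rows:
        if r["id"] in seen:
            duplicates.append(r["id"])
        seen.add(r["id"])

    # Consistency: codes assumed to be stored in uppercase.
    inconsistent = [r["id"] for r in rows if r[code_field] != r[code_field].upper()]

    return {"missing": missing, "duplicate_ids": duplicates, "inconsistent": inconsistent}


report = quality_report(records, required=["email", "state"], code_field="state")
print(report)  # {'missing': [2], 'duplicate_ids': [1], 'inconsistent': [3]}
```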

17. A social media post is an example of structured data.

True
False

Explanation: A social media post is not structured data; it is unstructured data. Unstructured data has no predefined data model and is not organized in a way that computers can readily access.

A social media post commonly includes a mix of multimedia elements, such as text, photos, videos, and emoticons. Although the post has some structure (for example, timestamps, usernames, and hashtags), the format as a whole is flexible and does not follow a rigid, predefined framework.

Structured data, by contrast, is organized in a defined, highly formatted way, such as tables in a relational database or rows and columns in a spreadsheet. Examples of structured data include spreadsheets, relational databases, and CSV files.

18. A data analyst at a book publisher is working on an urgent report for executives. They are using only historical data. What is the most likely reason for choosing to analyze only historical data?

The data is constantly changing
There is plenty of time to research historical data
The project has a very short time frame
The data is unknown

19. Which of the following is an example of continuous data?

Box office returns
Movie run time
Movie budget
Leading actors in movie

20. Why is internal data considered more reliable and easier to collect than external data?

Internal data circumvents privacy restrictions.
Internal data has much larger sample sizes.
Internal data lives within a company’s own systems.
Internal data comes from people you know.

21. Which of the following is an example of structured data?

Digital photo
Relational database
Audio file
Video file

22. In long data, separate columns contain the values and the context for the values, respectively. What does each column contain in wide data?

A specific data type
A unique format
A unique data variable
A specific constraint

Explanation: In wide data, each column typically holds a single variable or attribute, and each row holds the values of those variables. The difference from long data is that in wide data every variable gets its own column, whereas long data stacks the values in one column and uses a separate column to record the context of each value (for example, which variable it belongs to).

23. Which of the following questions collects nominal qualitative data?

On a scale of 1-10, how would you rate your service today?
Is this your first time dining at this restaurant?
How many times have you dined at this restaurant?
How many people do you usually dine with?

24. Nominal qualitative data has a set order or scale.

True
False

Explanation: Nominal qualitative data is divided into categories that have no predetermined order or ranking, so the statement is false. Each category is treated simply as its own label, such as red, blue, or green.