1. The data collected for an analysis
project has just been cleaned. What are the next steps for a data analyst?
Select all that apply.
- Certification
- Reporting
- Verification
- Validation
2. What is the first step in the
verification process?
- Compare the cleaned data with the original, uncleaned dataset
- Create a chronological list of modifications
made to the data
- Determine the quality of the data
- Inform others of your data-cleaning effort
Explanation: Verification begins by comparing the cleaned data with the
original, uncleaned dataset. Checking what was there before against what is
there now shows whether the problems found in the original data were actually
fixed. This comparison lays the foundation for the rest of the verification
process and guides the later steps.
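As a rough illustration, comparing a cleaned dataset with its uncleaned original might look like the Python sketch below. The sample names and the helper `compare_datasets` are hypothetical and only show the idea of checking "before" against "after".

```python
def compare_datasets(original, cleaned):
    """Summarize how a cleaned list of values differs from the original."""
    return {
        "rows_before": len(original),
        "rows_after": len(cleaned),
        "rows_removed": len(original) - len(cleaned),
        # Values present before cleaning that no longer appear afterwards.
        "values_dropped": sorted(set(original) - set(cleaned)),
    }

# Hypothetical sample data: one duplicate and one stray-space entry were cleaned.
original = ["Ana Ruiz", "Ana Ruiz", " Ben Cho", "Dee Park"]
cleaned = ["Ana Ruiz", "Ben Cho", "Dee Park"]

summary = compare_datasets(original, cleaned)
print(summary)
```

A summary like this makes it easy to confirm that the number of removed rows and the dropped values match what the analyst intended to change.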
3. Fill in the blank: TRIM is a function
that removes _____ spaces in data. Select all that apply.
- trailing
- leading
- repeated
- inner
Explanation: TRIM removes leading, trailing, and repeated spaces. It strips
the unnecessary spaces at the beginning (leading) and the end (trailing) of a
string and reduces runs of repeated spaces between words to a single space.
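The spreadsheet behavior of TRIM can be approximated in Python. The function name `spreadsheet_trim` is hypothetical, and the sketch assumes only ordinary space characters need handling.

```python
def spreadsheet_trim(text):
    """Approximate spreadsheet TRIM: strip the ends and collapse repeated spaces."""
    # Splitting on the space character and dropping empty pieces removes
    # leading, trailing, and repeated spaces in one pass.
    return " ".join(piece for piece in text.split(" ") if piece)

trimmed = spreadsheet_trim("  Maria   Lopez  ")
print(repr(trimmed))
```

Single spaces between words are kept, so "Maria Lopez" stays two words.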
4. While verifying cleaned data, a data
analyst encounters a misspelled name. Which function can they use to determine
if the error is repeated throughout the dataset?
- CHECK
- COUNTA
- COUNT
- CASE
Explanation: To check whether the misspelled name appears more than once in
the dataset, the analyst can use a counting function. COUNTIF counts how many
cells in a range match a given value or condition, so applying it to the
misspelled name shows whether the error is repeated and how often. COUNTIF is
not among the options listed; of those, COUNTA counts the non-empty cells in a
range, COUNT counts only numeric cells, and CASE is a SQL statement rather
than a counting function.
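To make the difference between these counting functions concrete, here is a small Python sketch. The helpers `countif` and `counta` are hypothetical stand-ins for the spreadsheet functions, and the names are made-up sample data. Unlike the spreadsheet COUNTIF, this version uses an exact, case-sensitive match.

```python
def countif(values, target):
    """Count cells equal to target, similar in spirit to COUNTIF(range, target)."""
    return sum(1 for v in values if v == target)

def counta(values):
    """Count non-empty cells, similar in spirit to COUNTA(range)."""
    return sum(1 for v in values if v not in (None, ""))

# Hypothetical column of names containing a repeated misspelling and a blank.
names = ["Jon Smith", "John Smith", "Jon Smith", "", "Ada Byron"]

misspelled_count = countif(names, "Jon Smith")
non_empty_count = counta(names)
print(misspelled_count, non_empty_count)
```

A count above 1 for the misspelled value signals that the error is repeated and should be fixed throughout the column, not only where it was first spotted.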
5. A WHEN statement considers one or more
conditions and returns a value as soon as that condition is met.
- True
- False
Explanation: False. The statement describes a CASE statement, which is used
in query languages such as SQL. CASE goes through its conditions in order and,
as soon as one is met, returns the corresponding value. WHEN is only the
keyword that introduces each condition inside a CASE statement. Evaluation
stops at the first true condition.
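A minimal sketch of CASE in action, run through Python's built-in `sqlite3` module. The `customers` table and its contents are hypothetical, and other SQL dialects may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [("Jon Smith",), ("John Smith",), ("Ada Byron",)],
)

# CASE checks each WHEN in order and returns the THEN value of the first match;
# ELSE supplies the value when no condition is met.
rows = conn.execute(
    """
    SELECT
        CASE
            WHEN name = 'Jon Smith' THEN 'John Smith'
            ELSE name
        END AS cleaned_name
    FROM customers
    ORDER BY rowid
    """
).fetchall()
conn.close()

cleaned_names = [row[0] for row in rows]
print(cleaned_names)
```

Here CASE corrects the misspelled name in the query output without changing the stored data.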
6. Fill in the blank: Documentation is
the process of tracking _____ during data cleaning. Select all that apply.
- inactivity
- deletions
- changes
- additions
Explanation: Documentation is the process of tracking changes, additions,
deletions, and errors during data cleaning. It keeps the cleaning process
transparent and repeatable. It can record how missing data was resolved, how
variables were transformed, how outliers were handled, and any other
alterations made to the original dataset.
7. Fill in the blank: While cleaning
data, a data analyst can use a changelog to keep a chronological list of
changes they make. They can refer to it during the _____ period if there are
errors or questions.
- verification
- visualization
- presenting
- documentation
Explanation: The blank is "verification". A changelog is a chronological
record of the changes an analyst makes while cleaning data, and the analyst
can refer to it during the verification period if errors or questions come up.
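A changelog can be as simple as a dated list of entries. The Python sketch below is one hypothetical way to keep such a list in chronological order; the helper `log_change`, the dates, and the entries are invented for illustration.

```python
from datetime import date

changelog = []

def log_change(when, change, reason):
    """Append one entry and keep the changelog in chronological order."""
    changelog.append({"date": when, "change": change, "reason": reason})
    changelog.sort(key=lambda entry: entry["date"])

# Entries may be logged out of order; sorting restores the chronology.
log_change(date(2024, 3, 2), "Removed 4 duplicate rows", "Same order ID appeared twice")
log_change(date(2024, 3, 1), "Trimmed spaces in name column", "Extra spaces broke lookups")

for entry in changelog:
    print(entry["date"].isoformat(), "|", entry["change"], "|", entry["reason"])
```

Recording the reason next to each change is what makes the list useful later, when someone asks why a value looks different from the original.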
8. Reviewing version history is an
effective way to view a changelog in SQL.
- True
- False
Explanation: False. Version history is a feature of tools such as
spreadsheets and version control systems like Git, not of the SQL language
itself. SQL provides no built-in changelog, so changes to queries are usually
recorded manually, for example in comments or a separate changelog file, or
tracked with a dedicated version control system. Such systems let people
monitor changes, collaborate, and maintain multiple versions of their code.
9. Fill in the blank: Once data is clean,
a data analyst moves on to _____ and verification.
- processing
- publishing
- reporting
- confirming
Explanation: The blank is "reporting": once the data is clean, the analyst
moves on to verification and reporting. Verification confirms that the cleaned
data is accurate and reliable before any further analysis. Reporting
communicates the cleaning effort and its results to others.
10. A data analyst is in the verification
step. They consider the business problem, the goal, and the data involved in
their analytics project. What scenario does this describe?
- Visualizing the data
- Seeing the big picture
- Reporting on the data
- Considering the stakeholders
Explanation: This scenario describes seeing the big picture. During the
verification step, the analyst steps back and considers the business problem,
the goal of the project, and the data involved. The aim is to confirm that the
cleaned data can actually address the problem the project set out to solve.
11. Which of the following functions
automatically removes extra spaces when cleaning data?
- SNIP
- REMOVE
- CLEAR
- TRIM
Explanation: TRIM is the function that automatically removes extra spaces
when data is cleaned. It eliminates leading, trailing, and repeated spaces in
a text string so that the data is consistent and free of extraneous spaces.
12. While verifying cleaned data, a data
analyst encounters a misspelled name. Which function can they use to determine
if the error is repeated throughout the dataset?
- COUNTA
- COUNT
- CHECK
- CASE
Explanation: A counting function shows whether a misspelled name is repeated
in the dataset. COUNTIF counts how many times a given value or condition
appears in a range of cells, so its result tells the analyst whether the
mistake is repeated and how widespread it is. COUNTIF is not among the options
listed; of those, COUNTA counts the non-empty cells in a range and COUNT
counts only numeric cells.
13. A data analyst uses a changelog while
cleaning data. What process does a changelog support?
- Documentation
- Illumination
- Disclosure
- Examination
Explanation: A changelog supports documentation. It is a chronological record
of the revisions, decisions, and actions an analyst took while cleaning and
processing the data. It aids transparency, repeatability, and troubleshooting
because the analyst can review the sequence of changes made to the dataset,
which is especially useful when errors or questions come up during
verification.
14. Verification and reporting come
directly before the data-cleaning process.
- True
- False
15. Which function removes leading,
trailing, and repeated spaces in data?
- TRIM
- CROP
- TIDY
- CUT
Explanation: TRIM is the function that removes leading, trailing, and
repeated spaces. It strips extra spaces at the beginning and end of a string
and reduces repeated spaces inside it, so the spacing in the data is clean and
consistent. Keep in mind that the exact behavior can differ between
spreadsheet tools, programming languages, and database management systems.
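As one example of such a difference, the TRIM function in SQLite strips only the ends of a string. The sketch below runs it through Python's built-in `sqlite3` module; the sample value is hypothetical, and other database systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite's TRIM strips spaces from both ends but leaves inner runs of spaces.
sql_trimmed = conn.execute("SELECT TRIM(?)", ("  New   York  ",)).fetchone()[0]
conn.close()

print(repr(sql_trimmed))
```

The repeated spaces between the two words survive, so collapsing them in this setting would need an extra step.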
16. Which SQL tool considers one or more
conditions, then returns a value as soon as a condition is met?
- CASE
- WHEN
- THEN
- ELSE
Explanation: The answer is CASE. A CASE statement evaluates one or more
conditions in order and returns a value as soon as a condition evaluates to
true. WHEN, THEN, and ELSE are keywords used inside a CASE statement. CASE
brings conditional logic into SQL queries, which makes it useful for producing
customized output based on the conditions given.
17. Fill in the blank: A changelog
contains a _____ list of modifications made to a project.
- approximate
- random
- synchronized
- chronological
Explanation: A changelog contains a chronological list of the modifications
made to a project.
18. A data analyst makes changes to SQL
queries and uses comments to create a changelog. This involves specifying
the changes they made and why they made them.
- True
- False
Explanation: True. Comments that state what changed in a query and why serve
as a changelog. They keep the work transparent, support collaboration, and
give a clear history of the modifications made to the SQL queries. This
documentation helps not only the original analyst but also any team member or
stakeholder who later needs to understand or reproduce the changes.
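A sketch of what such comments might look like, with the query run through Python's built-in `sqlite3` module to show that the comments do not affect the result. The `orders` table, the dates, and the comment wording are all hypothetical.

```python
import sqlite3

# The comment lines inside the query act as a lightweight changelog:
# each one states what changed and why. Dates and details are hypothetical.
query = """
-- 2024-03-01: Added TRIM on name; reason: stray spaces split one customer in two.
-- 2024-03-04: Excluded NULL names; reason: they cannot be matched to a customer.
SELECT TRIM(name) AS customer_name, COUNT(*) AS order_count
FROM orders
WHERE name IS NOT NULL
GROUP BY TRIM(name)
ORDER BY customer_name
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (name TEXT)")
conn.executemany(
    "INSERT INTO orders (name) VALUES (?)",
    [("Ada Byron",), (" Ada Byron",), (None,), ("Ben Cho",)],
)
result = conn.execute(query).fetchall()
conn.close()

print(result)
```

Keeping the "what" and the "why" directly above the query means the history travels with the code wherever it is copied.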
19. What is involved in seeing the big
picture when verifying data cleaning? Select all that apply.
- Consider the business problem
- Consider the data
- Consider the goal
- Consider the reporting
20. Fill in the blank: TRIM is a function
that removes _____ spaces in data. Select all that apply.
- leading
- repeated
- inner
- trailing
Explanation: TRIM removes leading, trailing, and repeated spaces from data.
21. What is the process of tracking
changes, additions, deletions, and errors during data cleaning?
- Documentation
- Cataloging
- Recording
- Observation
Explanation: This process is called documentation. A changelog is a common
form of it: a chronological record of the alterations made to a dataset while
it was being cleaned, including any errors encountered and any changes,
additions, or deletions. Documentation preserves transparency, repeatability,
and a clear history of the cleaning steps, which makes the process easier to
understand, debug, and reproduce.
22. At what point during the analysis
process does a data analyst use a changelog?
- While cleaning the data
- While visualizing the data
- While gathering the data
- While reporting the data
Explanation: A data analyst uses a changelog while cleaning the data,
including any related transformation steps. The changelog provides a
chronological record of the alterations made to the data: revisions,
additions, deletions, and errors encountered.