Python - Working with Strings
Problem Statement:
The problem we are dealing with here is a survey to find out which radish varieties your customers prefer the most.
The main objective of this study is to find out the following:
· What is the most popular radish variety?
· What are the least popular varieties?
· Did anyone vote twice?
A survey was carried out to collect the data, and it is available at the following link:
Save it to your system. It is better to view the contents of the file with a text editor (e.g., EditPlus, Sublime Text, etc.).
The data will look like this:
Analysis of the Data:
(I) Reading the Data:
and so on...
The above code goes through the file line by line. Each line looks like "Jin Li - White Icicle\n" (that is, "Name - Vote").
Since we need to strip off the trailing newline ("\n"), we use the strip() method.
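The reading step described above can be sketched roughly as follows. This is only an illustration: the survey file is not reproduced here, so the sample text, the voter names other than Jin Li, and the file name radishsurvey.txt mentioned in the comment are all assumptions.

```python
import io

# Hypothetical stand-in for the survey file, which is not reproduced here.
SAMPLE_DATA = "Jin Li - White Icicle\nEvie Sims - Daikon\nRed Thomas - White Icicle\n"


def read_stripped_lines(source):
    """Return every line of an open text file with surrounding whitespace
    (including the trailing newline) removed."""
    stripped = []
    for line in source:
        stripped.append(line.strip())
    return stripped


# With a real file you might write (file name is an assumption):
#     with open("radishsurvey.txt") as source:
#         lines = read_stripped_lines(source)
lines = read_stripped_lines(io.StringIO(SAMPLE_DATA))
for line in lines:
    print(line)
```

io.StringIO behaves like an open text file, so the same loop should work unchanged on a file object returned by open().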
Now we need to split each line into a name and a vote. For this we can use the split() method.
You can learn about the split function at this link:
and so on...
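As a small sketch of the splitting step, assuming every line separates the name from the vote with " - " (space, hyphen, space) as in the example line shown earlier:

```python
line = "Jin Li - White Icicle\n"

# strip() removes the trailing newline, split(" - ") cuts the line at the separator.
parts = line.strip().split(" - ")
name = parts[0]
vote = parts[1]

print(name)
print(vote)
```

Splitting on " - " rather than on "-" alone keeps the surrounding spaces out of the name and the vote.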
Now we have to inspect the votes and find out which people voted for White Icicle radishes.
and so on...
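One possible way to pick out the White Icicle voters is sketched below. The list of lines is made-up sample data (only Jin Li appears in the text above), and the variable name white_icicle_voters is introduced just for this illustration.

```python
# Hypothetical sample lines in the "Name - Vote" format.
lines = [
    "Jin Li - White Icicle",
    "Evie Sims - Daikon",
    "Red Thomas - White Icicle",
]

white_icicle_voters = []
for line in lines:
    name, vote = line.strip().split(" - ")
    if vote == "White Icicle":
        white_icicle_voters.append(name)
        print(name + " likes White Icicle!")
```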
(II) Counting the Votes:
Now we have to count the number of people who voted for White Icicle.
The above code tells us how many people have voted for White Icicle.
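A minimal sketch of such a count, again on made-up sample lines rather than the real survey file:

```python
# Hypothetical sample lines in the "Name - Vote" format.
lines = [
    "Jin Li - White Icicle",
    "Evie Sims - Daikon",
    "Red Thomas - White Icicle",
]

count = 0
for line in lines:
    name, vote = line.strip().split(" - ")
    if vote == "White Icicle":
        count = count + 1  # one more vote for White Icicle

print("White Icicle votes:", count)
```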
Now we can reuse the above code to calculate the number of votes for the other radish varieties as well.
The code below will do the job.
The above code calculates the vote count for White Icicle, Daikon and Sicily Giant radishes and prints each of them.
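One way to reuse the counting logic for several varieties is to wrap it in a function. The function name count_votes and the sample lines are assumptions made for this sketch; the original listing may have been organised differently.

```python
# Hypothetical sample lines in the "Name - Vote" format.
lines = [
    "Jin Li - White Icicle",
    "Evie Sims - Daikon",
    "Red Thomas - White Icicle",
    "Ana Ruiz - Sicily Giant",
]


def count_votes(lines, radish):
    """Count how many lines contain a vote for the given radish variety."""
    count = 0
    for line in lines:
        name, vote = line.strip().split(" - ")
        if vote == radish:
            count = count + 1
    return count


for radish in ["White Icicle", "Daikon", "Sicily Giant"]:
    print(radish + ": " + str(count_votes(lines, radish)))
```

Note that every call loops over all the lines again, which is the inefficiency the next section addresses.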
(III) Counting All the Votes:
Now we move to the next level: counting the number of votes for every variety. Writing code to do this with the previous approach is time-consuming, since you have to know all the names in advance and you have to loop through the file multiple times.
You'll need a data structure where you can associate a radish variety with the number of votes counted for it. A dictionary would be perfect.
First you need to create an empty dictionary: counts = {}
Remember that for dictionaries, counts[vote] means "the value in counts which is associated with the key vote". In this case, the key is a string (the radish name) and the value is a number (the vote count).
For better viewing of the result, we print the output line by line.
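The dictionary approach described above might look like the following sketch. The sample lines are invented for illustration and deliberately include inconsistent capitalization and a stray leading space, so the output shows the same variety split across several keys.

```python
# Hypothetical sample lines; note "red King" and the extra space before "Cherry Belle".
lines = [
    "Jin Li - White Icicle",
    "Evie Sims - Red King",
    "Red Thomas - red King",
    "Ana Ruiz -  Cherry Belle",
    "Tom Poe - Cherry Belle",
    "Mia Cho - White Icicle",
]

counts = {}
for line in lines:
    name, vote = line.strip().split(" - ")
    if vote not in counts:
        counts[vote] = 1  # first vote seen for this variety
    else:
        counts[vote] = counts[vote] + 1  # increment the existing count

# Print one variety per line for readability.
for variety in counts:
    print(variety + ": " + str(counts[variety]))
```

A single pass over the data is enough here, and no variety name has to be known in advance.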
(IV) Cleaning the Data:
Look at the above result. To a computer, "Red King" and "red King" look different because of the different capitalization. So do "Cherry Belle" and " Cherry Belle" because of the leading space.
We need to clean the data so that all spellings of the same name look alike. In the case of "Cherry Belle" we can use the strip() function. But what about "Red King" and "red King"? How will you tell the computer that the two are the same?
There are many string functions available to correct this. Some of them are:
str.lower() - converts all the names to lower case
str.upper() - converts them all to upper case
str.capitalize() - capitalizes the first letter only and converts the rest to lower case
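As a quick illustration of these three methods on the two spellings discussed above (the variable names are chosen just for this sketch):

```python
names = ["Red King", "red King"]

lowered = []
uppered = []
capitalized = []
for text in names:
    lowered.append(text.lower())
    uppered.append(text.upper())
    capitalized.append(text.capitalize())

print(lowered)
print(uppered)
print(capitalized)
```

Whichever method is chosen, both spellings end up as the same string, which is what matters for counting.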
You can learn about other string functions at the following link:
str.capitalize() is used along with the strip() function to clean the data.
We take the vote variable, first call strip() on it, then call capitalize() on the result, and store the cleaned value back in the variable vote.
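Putting the cleaning step into the counting loop could look like the sketch below. The sample lines are assumptions; the cleaning itself is the strip() followed by capitalize() described above.

```python
# Hypothetical sample lines with inconsistent capitalization and spacing.
lines = [
    "Jin Li - White Icicle",
    "Evie Sims - Red King",
    "Red Thomas - red King",
    "Ana Ruiz -  Cherry Belle",
    "Tom Poe - Cherry Belle",
    "Mia Cho - White Icicle",
]

counts = {}
for line in lines:
    name, vote = line.strip().split(" - ")
    # Clean the vote: drop stray spaces, then normalize the capitalization.
    vote = vote.strip().capitalize()
    if vote not in counts:
        counts[vote] = 1
    else:
        counts[vote] = counts[vote] + 1

for variety in counts:
    print(variety + ": " + str(counts[variety]))
```

After cleaning, the keys read "Red king" and "Cherry belle" rather than the original spellings, but each variety now has exactly one entry.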
We have cleaned the data. Now we have to see whether anyone has voted twice. The code below does that work.
Here we first create an empty list, voted, and use append to add new entries to the end. We also use the cleaning techniques to ensure there is no confusion between the names. For example, without cleaning, "Joanne Smith" and "joanne smith" would be considered different people by the computer.
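A sketch of the duplicate check, under the assumption that a repeated name (after cleaning) means a repeated voter and that only the first vote should count. The sample lines and the extra duplicates list are introduced here for illustration only.

```python
# Hypothetical sample lines; the third line repeats a voter with different capitalization.
lines = [
    "Joanne Smith - White Icicle",
    "Jin Li - Daikon",
    "joanne smith - Red King",
]

voted = []       # cleaned names of everyone who has voted so far
duplicates = []  # cleaned names of people caught voting again
counts = {}

for line in lines:
    name, vote = line.strip().split(" - ")
    name = name.strip().capitalize()
    vote = vote.strip().capitalize()

    if name in voted:
        print(name + " has already voted! Ignoring this vote.")
        duplicates.append(name)
        continue  # skip counting the repeated vote

    voted.append(name)
    if vote not in counts:
        counts[vote] = 1
    else:
        counts[vote] = counts[vote] + 1
```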
We started our study with data collection, followed by counting the votes, cleaning the data, and removing the duplicates. Now it is time to find the winner.
We can find the winner with the help of the code below.
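One simple way to find the winner is to walk through the dictionary and remember the highest count seen so far. The counts below are invented numbers, not survey results, and the names winner_name and winner_votes are assumptions for this sketch.

```python
# Hypothetical cleaned vote counts, not actual survey results.
counts = {"White icicle": 3, "Daikon": 5, "Red king": 2}

winner_name = "No winner"
winner_votes = 0
for variety in counts:
    if counts[variety] > winner_votes:
        winner_votes = counts[variety]
        winner_name = variety

print("The winner is " + winner_name + " with " + str(winner_votes) + " votes.")
```

With this comparison, a tie goes to whichever variety appears first in the dictionary; a real survey might want to report ties explicitly.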