Saturday, 8 August 2015


       Python - Working with Strings
Problem Statement:
The problem we are dealing with here is a survey to find out which radish varieties your customers prefer the most.

The main objective of this study is to find out the following:
·         What's the most popular radish variety?
·         What are the least popular?
·         Did anyone vote twice?

A survey was carried out to collect the data, and it is available at the following link:


Save it to your system. For a better view of its contents, open the file in a text editor app (e.g., EditPlus, Sublime Text).

The data will look like this:


Analysis of the Data:

(I) Reading the Data:
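A minimal sketch of reading the survey one line at a time, assuming the file is saved as radishsurvey.txt (a hypothetical name, since the download link is not reproduced here). To keep the example self-contained, io.StringIO with two hypothetical lines stands in for the file; a file object returned by open() is iterated the same way.

```python
import io

# Stand-in for the survey file (hypothetical data). With the real download,
# you would use: survey_file = open("radishsurvey.txt")  # assumed filename
survey_file = io.StringIO("Jin Li - White Icicle\nPerson A - Daikon\n")

lines_read = []
for line in survey_file:      # a file object yields one line per iteration
    lines_read.append(line)   # each line still ends with "\n"
    print(line, end="")       # end="" avoids printing an extra blank line
```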

The above code goes through the file line by line. Each line will look like this:

 "Jin Li - White Icicle\n"     ("Name - Vote")

We need to strip off the trailing newline ("\n"), so we use the strip() method.
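A short illustration of what strip() does to the example line; repr() is used here only to make the invisible newline visible when printing.

```python
raw_line = "Jin Li - White Icicle\n"
clean_line = raw_line.strip()  # removes leading/trailing whitespace, including "\n"

print(repr(raw_line))    # 'Jin Li - White Icicle\n'
print(repr(clean_line))  # 'Jin Li - White Icicle'
```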

Now we need to split each line into name and vote. For this we can use the split() method. You can learn about the split() method at this link: 


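A sketch of splitting the cleaned line into name and vote, assuming (from the example line above) that the two fields are always separated by " - ". Splitting on the whole separator, rather than on "-" alone, keeps the surrounding spaces out of the results. A line containing " - " more than once would produce more than two pieces, and the unpacking would then raise ValueError.

```python
line = "Jin Li - White Icicle\n".strip()

# split() returns a list of pieces; unpack the two pieces into two variables
name, vote = line.split(" - ")

print(name)  # Jin Li
print(vote)  # White Icicle
```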

Now we have to inspect the votes and find out the people who voted for White Icicle radishes.

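A sketch of picking out the White Icicle voters. The survey lines are hypothetical sample data in the "Name - Vote" format; with the real file you would loop over open("radishsurvey.txt") (assumed filename) instead.

```python
import io

# Hypothetical sample data in the "Name - Vote" format
survey_file = io.StringIO(
    "Jin Li - White Icicle\n"
    "Person A - Daikon\n"
    "Person B - White Icicle\n"
)

white_icicle_fans = []
for line in survey_file:
    name, vote = line.strip().split(" - ")
    if vote == "White Icicle":          # exact, case-sensitive comparison
        white_icicle_fans.append(name)
        print(name + " voted for White Icicle")
```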

(II) Counting the Votes:

Now we have to count the number of people who voted for White Icicle.
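One way to do this, sketched on hypothetical sample data: keep a counter that starts at zero and add one for every matching vote. With the real survey, the lines would come from open("radishsurvey.txt") (assumed filename).

```python
import io

# Hypothetical sample data in the "Name - Vote" format
survey_file = io.StringIO(
    "Jin Li - White Icicle\n"
    "Person A - Daikon\n"
    "Person B - White Icicle\n"
)

count = 0
for line in survey_file:
    name, vote = line.strip().split(" - ")
    if vote == "White Icicle":
        count += 1                      # one more vote for White Icicle

print("White Icicle votes:", count)
```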

The above code tells us how many people have voted for White Icicle.

We can reuse the above code to calculate the number of votes for the other radish varieties as well.

The code below will do the job.
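A minimal sketch that wraps the counting loop in a function so it can be called once per variety. The function name count_votes and the sample lines are hypothetical. The lines are kept in a list because a file object can only be read through once; with the real file you could use open("radishsurvey.txt").readlines() (assumed filename) or reopen the file on every call.

```python
# Hypothetical sample data in the "Name - Vote" format
survey_lines = [
    "Jin Li - White Icicle\n",
    "Person A - Daikon\n",
    "Person B - White Icicle\n",
    "Person C - Sicily Giant\n",
]


def count_votes(lines, radish):
    """Return the number of lines that record a vote for `radish`."""
    count = 0
    for line in lines:                  # one full pass over the data per call
        name, vote = line.strip().split(" - ")
        if vote == radish:
            count += 1
    return count


for radish in ["White Icicle", "Daikon", "Sicily Giant"]:
    print(radish + ":", count_votes(survey_lines, radish))
```

Each call walks the whole list again, and the variety names have to be typed in advance; these are the drawbacks that section (III) sets out to remove.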


The above code calculates the vote counts for the White Icicle, Daikon and Sicily Giant radishes and prints them.

(III) Counting all the votes:

Now we move on to counting the votes for every variety. Doing this with the approach above is time consuming, since you have to know all the names in advance and you have to loop through the file once for each variety.

You'll need a data structure that associates each radish variety with the number of votes counted for it. A dictionary is perfect for this.


First, you need to create an empty dictionary: counts = {}

Remember that for dictionaries, counts[vote] means "the value in counts which is associated with the key vote". In this case, the key is a string (the radish name) and the value is a number (the vote count).

For easier viewing, we print the result one variety per line.
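A sketch of the dictionary approach on hypothetical sample data. The sample deliberately includes "red King" and a vote with an extra space before "Cherry Belle", mimicking the kind of messy entries discussed in section (IV); the keys are used exactly as they appear, so those variants are counted separately. repr() is used when printing so that the leading space stays visible.

```python
# Hypothetical sample data; note the lowercase "red" and the extra space
survey_lines = [
    "Jin Li - White Icicle\n",
    "Person A - Red King\n",
    "Person B - red King\n",
    "Person C - Cherry Belle\n",
    "Person D -  Cherry Belle\n",
]

counts = {}                             # radish name -> number of votes
for line in survey_lines:
    name, vote = line.strip().split(" - ")
    if vote not in counts:
        counts[vote] = 0                # first vote seen for this variety
    counts[vote] += 1

for radish in counts:                   # one variety per line
    print(repr(radish) + ":", counts[radish])
```

counts[vote] = counts.get(vote, 0) + 1 is a shorter way to write the same two-step update.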


(IV) Cleaning the Data:


See the above result. To a computer, "Red King" and "red King" look different because of the different capitalization. So do "Cherry Belle" and " Cherry Belle" because of the leading space.

We need to clean the data so that equivalent entries look alike. In the case of " Cherry Belle" we can use the strip() method. But what about "Red King" and "red King"? How will you tell the computer that the two are the same?

Several string methods are available to fix this.

Some of them are:
·         str.lower() - converts the whole string to lower case
·         str.upper() - converts the whole string to upper case
·         str.capitalize() - converts the first character to upper case and the rest to lower case
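A quick comparison of the three methods on the problem names. Note that capitalize() also lowercases every character after the first, and that it cannot capitalise a string whose first character is a space, so " Cherry Belle" becomes " cherry belle". That is why strip() is still needed.

```python
samples = ["Red King", "red King", " Cherry Belle"]

for raw in samples:
    print(repr(raw), "->", repr(raw.lower()), repr(raw.upper()), repr(raw.capitalize()))

# capitalize() upper-cases the first character and lower-cases the rest
capitalized = [raw.capitalize() for raw in samples]
# capitalized == ["Red king", "Red king", " cherry belle"]
```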

You can learn about other string methods at the following link:


We use str.capitalize() together with the strip() method to clean the data. We take the vote variable, first call strip() on it, then call capitalize() on the result, and store the cleaned value back in vote.
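A sketch of the cleaned counting loop on hypothetical messy sample data. Calling strip() first matters: capitalize() alone would leave the leading space of " Cherry Belle" in place.

```python
# Hypothetical messy sample data
survey_lines = [
    "Jin Li - White Icicle\n",
    "Person A - Red King\n",
    "Person B - red King\n",
    "Person C - Cherry Belle\n",
    "Person D -  Cherry Belle\n",
]

counts = {}
for line in survey_lines:
    name, vote = line.strip().split(" - ")
    vote = vote.strip().capitalize()    # " Cherry Belle" -> "Cherry belle", "red King" -> "Red king"
    counts[vote] = counts.get(vote, 0) + 1

for radish, total in counts.items():
    print(radish + ":", total)
```

One side effect worth knowing: capitalize() also turns "White Icicle" into "White icicle". The variants now merge correctly, but the names are printed in that form.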


We have cleaned the data. Now we have to check whether anyone has voted twice. The code below does that.
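A minimal sketch of the duplicate check, on hypothetical sample data in which Joanne Smith appears twice with different capitalisation. Names are cleaned the same way as votes before they are compared.

```python
# Hypothetical sample data: the first and third lines come from the same person
survey_lines = [
    "Joanne Smith - Daikon\n",
    "Jin Li - White Icicle\n",
    "joanne smith - Red King\n",
]

voted = []        # cleaned names of everyone seen so far
duplicates = []   # names that showed up more than once
for line in survey_lines:
    name, vote = line.strip().split(" - ")
    name = name.strip().capitalize()    # "joanne smith" and "Joanne Smith" -> "Joanne smith"
    if name in voted:
        duplicates.append(name)
        print(name + " has already voted!")
    else:
        voted.append(name)              # append() adds the name to the end of the list
```

For a large survey, a set gives faster membership tests than a list; the list is used here because it matches the approach described in the text.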

Here we first create an empty list, voted, and use append() to add each new name to the end of it. We also apply the same cleaning techniques to the names, because the computer would otherwise treat "Joanne Smith" and "joanne smith" as different people.



We started our study with data collection, then counted the votes, cleaned the data and checked for duplicate votes. Now it is time to find the winner.
We can find the winner with the help of the code below.
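A sketch of finding the winner and the least popular varieties with the built-in max() and min(). The counts dictionary here is hypothetical; in practice it would be the cleaned dictionary built in section (IV).

```python
# Hypothetical vote counts (radish name -> votes)
counts = {"White icicle": 1, "Red king": 2, "Cherry belle": 2, "Daikon": 3}

# max() compares the keys by their vote counts; on a tie it returns the first key found
winner = max(counts, key=counts.get)
print("The winner is " + winner + " with " + str(counts[winner]) + " votes")

# Least popular: every variety that shares the lowest count
fewest = min(counts.values())
least_popular = [radish for radish, total in counts.items() if total == fewest]
print("Least popular:", least_popular)
```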





