Hi, Task : My code (which doesn’t work) : I am having a hard time writing code for the above task, can you please help me out with this.
I have a dataframe which has 27949 rows & 7 columns & the first few rows look like this https://i.stack.imgur.com/1Pipf.png
In the dataframe I have a 'title' column which has many duplicate titles which I want to remove (duplicate title:almost all the title is same except for 1 or 2 words).
Pseudo code :
I want to check the 1st row with all other rows & if any of these is a duplicate then I want to remove it.
Then I want to check the 2nd row with all other rows & if any of these is a duplicate I want to remove it - similarly with all rows i.e. i = 1st line to last line j = i+1 to last line.for i in range(0,27950):
for j in range(1,27950):
a = data_sorted['title'].iloc[i].split()
b = data_sorted['title'].iloc[j].split()
if len(a)-len(b)<=2:
data_sorted.drop(b)
j=j
else:
j+=1
i+=1