I have a dataframe of names that has already been parsed:

    FIRST_NAME
    John
    Colin
    William
    Todd
    JJ-Crew
    & O Inc
    123 Trust
I then create a column to flag whether a name is good or bad:

    df['bad'] = pd.Series(np.zeros(len(df), dtype=int), index=df.index)

    FIRST_NAME  bad
    John        0
    Colin       0
    William     0
    Todd        0
    JJ-Crew     0
    & O Inc     0
    123 Trust   0

If any FIRST_NAME contains punctuation, numbers, or whitespace, I want to update bad to 1:

    FIRST_NAME  bad
    John        0
    Colin       0
    William     0
    Todd        0
    JJ-Crew     1
    & O Inc     1
    123 Trust   1
Here is my code:

    # punctuation, a space, and the digits
    punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ 1234567890'
    i = 0
    while i < len(df):
        for p in punctuation:
            if df.loc[i, 'bad'] == 1:
                # already flagged, keep it
                df.loc[i, 'bad'] = 1
            elif p in df.loc[i, 'FIRST_NAME'] and df.loc[i, 'bad'] == 0:
                df.loc[i, 'bad'] = 1
            else:
                df.loc[i, 'bad'] = 0
        i = i + 1
Is there a way to speed this up?
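Another possibility: turn your punctuation string into a set with

    punctuation = set(punctuation)

Then you can do this:

    df['bad'] = df.FIRST_NAME.map(lambda v: bool(set(v) & punctuation))

This flags a name as soon as it shares at least one character with the set, and it replaces both the outer row loop and the inner per-character loop.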
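As a minimal, self-contained sketch of the set approach: the sample frame below is rebuilt from the names in the question, and the trailing `.astype(int)` is an addition that assumes you want the 0/1 flags shown in the question rather than booleans.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed for illustration).
df = pd.DataFrame({"FIRST_NAME": ["John", "Colin", "William", "Todd",
                                  "JJ-Crew", "& O Inc", "123 Trust"]})

# Characters that mark a name as bad: punctuation, a space, and digits.
punctuation = set('!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ 1234567890')

# A name is bad if it shares at least one character with the set.
df["bad"] = df.FIRST_NAME.map(lambda v: bool(set(v) & punctuation)).astype(int)

print(df)
```

The set intersection does the membership test once per name instead of once per name and per punctuation character, which is where most of the time in the original loop goes.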
In addition, if all you really want to know is whether every character in the string is a letter, you can use isalpha(). It returns True for an all-letter name, so negate it to get the bad flag:

    df['bad'] = df.FIRST_NAME.map(lambda v: not v.isalpha())
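The same check can probably be written without a lambda by using the pandas string accessor. This is only a sketch: the `names` series is made-up sample data, and it assumes there are no missing values in the column (with NaN present, `.str.isalpha()` returns NaN for those rows and the `~` inversion would need extra handling).

```python
import pandas as pd

# Hypothetical sample data based on the names in the question.
names = pd.Series(["John", "Todd", "JJ-Crew", "& O Inc", "123 Trust"],
                  name="FIRST_NAME")

# .str.isalpha() is True only when every character is a letter,
# so invert it with ~ to get the "bad" flag, then cast to 0/1.
bad = (~names.str.isalpha()).astype(int)

print(bad.tolist())
```

Note that isalpha() is Unicode-aware, so accented letters count as letters, and an empty string counts as bad because isalpha() returns False for it.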