python - Identify rows with punctuation in pandas data frame

I have a dataframe of names that have already been parsed:

  **FIRST_NAME**
  John
  Colin
  William
  Todd
  JJ-Crew
  & O Inc
  123 Trust

I then create a column to flag whether each name is good or bad:

  import numpy as np
  import pandas as pd

  # start with every name flagged as good (0)
  df['bad'] = pd.Series(np.zeros(len(df), dtype=int), index=df.index)

  **FIRST_NAME**  **bad**
  John            0
  Colin           0
  William         0
  Todd            0
  JJ-Crew         0
  & O Inc         0
  123 Trust       0

If any FIRST_NAME contains punctuation, digits, or whitespace, I want to update bad = 1.

  **FIRST_NAME**  **bad**
  John            0
  Colin           0
  William         0
  Todd            0
  JJ-Crew         1
  & O Inc         1
  123 Trust       1

Here is my code:

  # punctuation, a space, and the digits
  punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ 1234567890'
  i = 0
  # assumes df has the default integer index 0..len(df)-1
  while i < len(df):
      for p in punctuation:
          if df.loc[i, 'bad'] == 1:
              # already flagged, keep the flag
              df.loc[i, 'bad'] = 1
          elif p in df.loc[i, 'FIRST_NAME'] and df.loc[i, 'bad'] == 0:
              df.loc[i, 'bad'] = 1
          else:
              df.loc[i, 'bad'] = 0
      i = i + 1

Is there a way to speed this up?

Another possibility: make a set out of your punctuation with punctuation = set(punctuation). Then you can do this:

  df['bad'] = df.FIRST_NAME.map(lambda v: bool(set(v) & punctuation))
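As a minimal self-contained sketch of the set approach, here it is applied to the sample names from the question. The helper name has_bad_char is purely illustrative and not part of the original post, and the astype(int) step is an assumption added so the result uses the 0/1 flags from the question rather than True/False:

```python
import pandas as pd

# Sample names taken from the question (illustrative data).
df = pd.DataFrame({"FIRST_NAME": ["John", "Colin", "William", "Todd",
                                  "JJ-Crew", "& O Inc", "123 Trust"]})

# Punctuation, a space, and the digits, stored as a set of characters.
punctuation = set('!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ 1234567890')

def has_bad_char(name):
    # Hypothetical helper: True if the name shares any character with the set.
    return bool(set(name) & punctuation)

# astype(int) turns True/False into the 1/0 flags used in the question.
df["bad"] = df["FIRST_NAME"].map(has_bad_char).astype(int)
print(df)
```

The set intersection examines each name once, instead of looping over every punctuation character for every row as the original while loop does.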

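Also, if all you really want to know is whether every character in the string is a letter, you can use isalpha (negated here, so that bad is True for any name containing something other than letters):

  df['bad'] = df.FIRST_NAME.map(lambda v: not v.isalpha())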
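A short sketch of the isalpha check on the question's sample names follows. The result is negated so that bad is 1 when a name contains anything other than letters, and the astype(int) conversion is an assumption added to match the 0/1 flags in the question:

```python
import pandas as pd

# Sample names taken from the question (illustrative data).
df = pd.DataFrame({"FIRST_NAME": ["John", "Colin", "William", "Todd",
                                  "JJ-Crew", "& O Inc", "123 Trust"]})

# str.isalpha() is True only when every character is a letter and the
# string is non-empty, so negating it flags everything else as bad.
df["bad"] = df["FIRST_NAME"].map(lambda v: not v.isalpha()).astype(int)
print(df)
```

Note that str.isalpha() also accepts non-ASCII letters such as é, and returns False for an empty string, so an empty name would be flagged as bad.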

Brayer
