python - Remove "#" and "RT" from a tweet


I have millions of tweets and I want to remove "#" and "RT" from them. For example, the tweet "RT @ABC: Mustkken Erail Adelon #Paulis Mewood Murat Çetiner" should become "@ABC: Mustkken Erail Adelon Paulis Mewood Murat Çetiner". Here is the code that I have:

  # coding: UTF-8
  import sys
  import re

  x = "RT @zaman Comtr: Messelkten Eras Adelon Polly S Münter Murat Çetiner: Bana Takir Belgian Verne BM D Mile Parl? http://t.co/sd5N6yaZzv http: ..."
  # Replaces @mentions, every non-alphanumeric character, and URLs with a space
  y = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", x).split())
  print(y)

You can use the following code:

  z = lambda x: re.compile(r'\#').sub('', re.compile('RT @').sub('@', x, count=1).strip())
  print(z(x))

First, re.compile('RT @').sub('@', x, count=1) replaces the first occurrence of 'RT @' with '@', which drops the leading 'RT'. If you want to remove every retweet label inside the post, just leave out count=1. It is important to match the pattern 'RT @' rather than plain 'RT', so that an 'RT' appearing elsewhere in the tweet text is left alone.
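To make the effect of count=1 concrete, here is a minimal sketch. The tweet text and the names RT_PATTERN, first_only and everywhere are made up for illustration and are not from the question.

```python
import re

RT_PATTERN = re.compile('RT @')

# Hypothetical tweet that contains two retweet labels.
tweet = "RT @alice: agreed RT @bob: hello"

# count=1: only the first 'RT @' is rewritten to '@'.
first_only = RT_PATTERN.sub('@', tweet, count=1)

# No count: every 'RT @' in the text is rewritten.
everywhere = RT_PATTERN.sub('@', tweet)

print(first_only)   # @alice: agreed RT @bob: hello
print(everywhere)   # @alice: agreed @bob: hello
```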

By the way, re.compile(r'\#').sub('', ...) removes every '#' character from the tweet; the hashtag text itself is kept.
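Since the question mentions millions of tweets, it may be worth compiling the two patterns once and reusing them. The following is only a sketch of that idea under the same rules as the one-liner; the function names clean_tweet and clean_tweets and the sample strings are assumptions for illustration.

```python
import re

# Compiled once so the same pattern objects are reused for every tweet.
RT_PREFIX = re.compile(r'RT @')
HASH_SIGN = re.compile(r'#')

def clean_tweet(text):
    """Drop the first 'RT' label and every '#' character from one tweet."""
    without_rt = RT_PREFIX.sub('@', text, count=1).strip()
    return HASH_SIGN.sub('', without_rt)

def clean_tweets(tweets):
    """Lazily clean any iterable of tweet strings, one at a time."""
    for text in tweets:
        yield clean_tweet(text)

# Hypothetical sample input.
sample = ["RT @ABC: some text #Tag Murat Çetiner", "no labels here"]
for cleaned in clean_tweets(sample):
    print(cleaned)
```

Because clean_tweets is a generator, it can be fed a file object or database cursor without loading all tweets into memory first.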

