I am trying to scrape four years of hourly data from Wunderground.com with BeautifulSoup, but I have a problem: with the code below, each hourly record is not written on a single row; it is split across three lines (see the example output). This makes the data hard to filter and analyze. I need every record on a single row. Can you help me with this?
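For illustration, one common reason a record ends up spread over several lines is that the text taken from a table cell still contains embedded newlines. The sketch below is only a guess at that cause, not a confirmed diagnosis: `cells_to_row` and the sample cell values are hypothetical, and the strings stand in for whatever BeautifulSoup's `get_text()` would return for each cell.

```python
import csv
import io


def cells_to_row(cells):
    """Collapse all whitespace (including newlines) inside each cell
    and return the cells as a single CSV line."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining with one space removes embedded line breaks.
    cleaned = [" ".join(cell.split()) for cell in cells]
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(cleaned)
    return buffer.getvalue()


# Hypothetical cell texts, as they might come back from one scraped table row
sample = ["12:20 AM", "\n  7.0\n°C\n", "\n1015\nhPa\n"]
print(cells_to_row(sample), end="")
```

Writing the string returned by `cells_to_row` to the output file would then give one line per hourly record, as long as the cells of a record are collected before writing.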
My code:
import urllib.request as urllib2
import time
from bs4 import BeautifulSoup

f = open('weather.txt', 'w')

# Start the iteration over years
for y in range(2009, 2013):
    # Months to retrieve; use range(1, 3) to limit to January and February
    for m in range(1, 13):
        for d in range(1, 32):
            # Check for leap year
            if y % 400 == 0:
                leap = True
            elif y % 100 == 0:
                leap = False
            elif y % 4 == 0:
                leap = True
            else:
                leap = False

            # Skip days that do not exist in this month
            if m == 2 and leap and d > 29:
                continue
            elif m == 2 and not leap and d > 28:
                continue
            elif m in [4, 6, 9, 11] and d > 30:
                continue

            # Open the Wunderground page for this date
            url = "http://www.wunderground.com/history/airport/LTBA/" + str(y) + "/" + str(m) + "/" + str(d) + "/DailyHistory.html"
            page = urllib2.urlopen(url)

            # Parse the page with BeautifulSoup
            soup = BeautifulSoup(page, "html.parser")
            # paraglist = soup.find_all(id="Overview-Details")

            counter = 0
            counter_max = 0  # maximum number of columns

            # Pad single-digit numbers with a leading zero
            string = ''
            if len(str(m)) < 2:
                mStamp = '0' + str(m)
            else:
                mStamp = str(m)
            if len(str(d))
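The manual leap-year and month-length checks above could also be left to the standard `datetime` module. This is a minimal sketch of that alternative, not the code from the post: the helper name `daily_history_urls` and the `YYYYMMDD` timestamp format are assumptions, and the URL pattern is copied from the code above without being checked against the live site.

```python
from datetime import date, timedelta


def daily_history_urls(start, end, station="LTBA"):
    """Yield (timestamp, url) for every calendar day from start
    up to, but not including, end."""
    base = "http://www.wunderground.com/history/airport/"
    day = start
    while day < end:
        # strftime zero-pads month and day, replacing the mStamp logic
        stamp = day.strftime("%Y%m%d")
        url = f"{base}{station}/{day.year}/{day.month}/{day.day}/DailyHistory.html"
        yield stamp, url
        # Adding one day never produces an invalid date such as Feb 30
        day += timedelta(days=1)


# Short demonstration range that crosses a leap day
pairs = list(daily_history_urls(date(2012, 2, 28), date(2012, 3, 2)))
for stamp, url in pairs:
    print(stamp, url)
```

Calling `daily_history_urls(date(2009, 1, 1), date(2013, 1, 1))` would cover the same four years as the nested loops, with no impossible dates to skip.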
Example output: