[190116] pytrend

나민오 2019. 1. 16. 16:05

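Google Trends offers a wide range of filters and extra features, but Google does not provide an official API for it. Fortunately, developers who were not willing to let such valuable data sit unused have built an unofficial API and published it on GitHub.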

# GitHub

git clone https://github.com/GeneralMills/pytrends

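# Features

The library offers many features, such as Top Charts and Trending Searches, but for now I expect to use only the basic ones.

interest_over_time : shows how search interest in the given keywords changes over time. Note that Google Trends reports a relative index scaled from 0 to 100, not absolute search counts. The time range can also be configured. The result is returned as a pandas object, so a rough understanding of the pandas DataFrame is useful - https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html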

 
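# Interest Over Time
from pytrends.request import TrendReq
pytrend = TrendReq()
pytrend.build_payload(kw_list=['pizza', 'bagel'])  # placeholder keywords; must run before interest_over_time()
interest_over_time_df = pytrend.interest_over_time()
print(interest_over_time_df)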

pandas object - interest_over_time_df
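
The time range mentioned earlier is set when the payload is built, not when interest_over_time() is called. The following is a minimal sketch, assuming the `timeframe` argument of `build_payload` accepts a 'YYYY-MM-DD YYYY-MM-DD' string as described in the pytrends README. The helper names `make_timeframe` and `fetch_interest` are hypothetical and are not part of pytrends.

```python
from datetime import date


def make_timeframe(start, end):
    """Format two dates as 'YYYY-MM-DD YYYY-MM-DD'."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"{start.isoformat()} {end.isoformat()}"


def fetch_interest(keywords, start, end):
    """Query Google Trends for a date range (requires network access)."""
    # Imported here so make_timeframe works without pytrends installed.
    from pytrends.request import TrendReq

    pytrend = TrendReq()
    pytrend.build_payload(kw_list=keywords,
                          timeframe=make_timeframe(start, end))
    return pytrend.interest_over_time()


timeframe = make_timeframe(date(2018, 1, 1), date(2018, 12, 31))
print(timeframe)  # 2018-01-01 2018-12-31
```

Calling `fetch_interest(['pizza', 'bagel'], date(2018, 1, 1), date(2018, 12, 31))` would then return the DataFrame for that period, provided the request to Google succeeds.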

The source of interest_over_time is shown below.

 
    def interest_over_time(self):
        """Request data from Google's Interest Over Time section and return a dataframe"""

        over_time_payload = {
            # convert to string as requests will mangle
            'req': json.dumps(self.interest_over_time_widget['request']),
            'token': self.interest_over_time_widget['token'],
            'tz': self.tz
        }

        # make the request and parse the returned json
        req_json = self._get_data(
            url=TrendReq.INTEREST_OVER_TIME_URL,
            method=TrendReq.GET_METHOD,
            trim_chars=5,
            params=over_time_payload,
        )

        df = pd.DataFrame(req_json['default']['timelineData'])
        if (df.empty):
            return df

        df['date'] = pd.to_datetime(df['time'].astype(dtype='float64'), unit='s')
        df = df.set_index(['date']).sort_index()
        # split list columns into separate ones, remove brackets and split on comma
        result_df = df['value'].apply(lambda x: pd.Series(str(x).replace('[', '').replace(']', '').split(',')))
        # rename each column with its search term, relying on order that google provides...
        for idx, kw in enumerate(self.kw_list):
            # there is currently a bug with assigning columns that may be
            # parsed as a date in pandas: use explicit insert column method
            result_df.insert(len(result_df.columns), kw, result_df[idx].astype('int'))
            del result_df[idx]

        if 'isPartial' in df:
            # make other dataframe from isPartial key data
            # split list columns into separate ones, remove brackets and split on comma
            df = df.fillna(False)
            result_df2 = df['isPartial'].apply(lambda x: pd.Series(str(x).replace('[', '').replace(']', '').split(',')))
            result_df2.columns = ['isPartial']
            # concatenate the two dataframes
            final = pd.concat([result_df, result_df2], axis=1)
        else:
            final = result_df
            final['isPartial'] = False

        return final
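
To make the parsing steps in the function above easier to follow, here is a minimal offline sketch that applies the same idea to a hand-made response, using only the standard library instead of pandas. The sample payload, the keywords, and the helper name `parse_timeline` are all hypothetical. The five-character prefix is an assumption inferred from `trim_chars=5`. Unlike the original, this simplified version stores isPartial as a real boolean and returns timezone-aware dates.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw response: a 5-character prefix followed by JSON.
# Both the prefix and the values are made up for illustration.
RAW = ")]}'," + json.dumps({
    "default": {
        "timelineData": [
            {"time": "1546905600", "value": [100, 25], "isPartial": True},
            {"time": "1546300800", "value": [80, 20]},
        ]
    }
})


def parse_timeline(raw, kw_list, trim_chars=5):
    """Turn a raw response into rows of {date, <keyword>: int, isPartial}."""
    # Drop the non-JSON prefix before decoding.
    data = json.loads(raw[trim_chars:])
    rows = []
    for entry in data["default"]["timelineData"]:
        # 'time' holds epoch seconds; interpret them as UTC.
        row = {"date": datetime.fromtimestamp(float(entry["time"]),
                                              tz=timezone.utc)}
        # Values are assumed to arrive in the same order as the keywords.
        for kw, value in zip(kw_list, entry["value"]):
            row[kw] = int(value)
        # A missing isPartial key is treated as False.
        row["isPartial"] = bool(entry.get("isPartial", False))
        rows.append(row)
    # Sort by date, mirroring sort_index() in the function above.
    rows.sort(key=lambda r: r["date"])
    return rows


rows = parse_timeline(RAW, ["pizza", "bagel"])
for r in rows:
    print(r["date"].date(), r["pizza"], r["bagel"], r["isPartial"])
```

Each row corresponds to one line of the DataFrame that the real function returns: a date, one integer column per keyword, and the isPartial flag.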
