youtube Data API를 이용해 유튜브 댓글(라이브방송 포함) 수집

250x250

Notice

Recent Posts

Recent Comments

Link

« 2025/01 »
일	월	화	수	목	금	토
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Tags more

Archives

Today

Total

관리 메뉴

데이터과학 삼학년

youtube Data API를 이용해 유튜브 댓글(라이브방송 포함) 수집 본문

GCP

youtube Data API를 이용해 유튜브 댓글(라이브방송 포함) 수집

Dan-k 2021. 12. 9. 20:13

유튜브 Data API

Youtube Data API(v3)
유튜브와 관련된 기본적인 API로, 동영상을 업로드하거나 재생목록을 관리하는 등의 가장 기본적인 기능 제공
Youtube Analytics API
유튜브의 동영상 및 채널에 대한 시청 통계, 인기도 통계 등 검색, 동영상 수익 관련 정보
Youtube Live Streaming API

1. 유튜브 영상 정보 수집

youtube data api 에서 api key를 발급받아 사용 가능
pafy 라이브러리 사용
- dislike count로 인해 keyerror 발생
- package파일에 직접 들어가서 해당부분 주석 처리
title, author, published, likes, category, description, keywords, viewcount, rating 등의 영상 정보 수집

import pafy #유튜브 정보 수집 

api_key = '<api_key>' #gcp youtube data api 에서 api key 생성 
pafy.set_api_key(api_key) 
v = pafy.new(video_id) 
title, author, published, likes, category, description, keywords, viewcount, rating = v.title, v.author, v.published, v.likes, v.category, v.description, v.keywords, v.viewcount, v.rating 

print('title:', title) 
print('author:',author) 
print('published:',published) 
print('likes:',likes) 
print('category:',category) 
print('description:',description) 
print('keywords:',keywords) 
print('viewcount:',viewcount) 
print('rating:',rating)

2. 유튜브 영상 댓글 수집

댓글 단위 좋아요수 수집 가능
영향을 많이 미친 댓글에 대해 우선적으로 분석하여 결과 해석 가능 → 유저 공감도 확인
대댓글 : 댓글의 댓글도 수집 가능

from googleapiclient.discovery import build

video_id ='JpTqSzm4JOk'
comments = list()
api_obj = build('youtube', 'v3', developerKey=api_key)
response = api_obj.commentThreads().list(part='snippet,replies', videoId=video_id, maxResults=100).execute()
 
while response:
    for item in response['items']:
        comment = item['snippet']['topLevelComment']['snippet']
        comments.append([comment['textDisplay'], comment['authorDisplayName'], comment['publishedAt'], comment['likeCount']])
 
        if item['snippet']['totalReplyCount'] > 0:
            for reply_item in item['replies']['comments']:
                reply = reply_item['snippet']
                comments.append([reply['textDisplay'], reply['authorDisplayName'], reply['publishedAt'], reply['likeCount']])
 
    if 'nextPageToken' in response:
        response = api_obj.commentThreads().list(part='snippet,replies', videoId=video_id, pageToken=response['nextPageToken'], maxResults=100).execute()
    else:
        break
 
 df = pd.DataFrame(comments, columns=[['댓글','작성자','작성시간','좋아요수']])

3. 유튜브 실시간 (라이브 방송) 댓글 수집

pytchat 라이브러리를 통해 유튜브 라이브 방송 댓글 수집 가능

import pytchat #실시간 댓글 크롤링 
import pafy #유튜브 정보 
import pandas as pd 

api_key = '<api_key>' #gcp youtube data api 에서 api key 생성 
pafy.set_api_key(api_key) 

video_id = 'GoXPbGQl-uQ' # [LIVE] 대한민국 24시간 뉴스채널 YTN
file_path = './news_ytn_youtube.csv'

empty_frame = pd.DataFrame(columns=['제목', '채널 명', '스트리밍 시작 시간', '댓글 작성자', '댓글 내용', '댓글 작성 시간'])
chat = pytchat.create(video_id=video_id)

while chat.is_alive():
    cnt = 0
    try:
        data = chat.get()
        items = data.items
        for c in items:
            print(f"{c.datetime} [{c.author.name}]- {c.message}")
            data.tick()
            data2 = {'제목' : [title], '채널 명' : [author], '스트리밍 시작 시간' : [published], '댓글 작성자' : [c.author.name], '댓글 내용' : [c.datetime], '댓글 작성 시간' : [c.message]}
            result = pd.DataFrame(data2)
            result.to_csv(file_path, mode='a', header=False)
        cnt += 1
        if cnt == 5 : break
    except KeyboardInterrupt:
        chat.terminate()
        break

df = pd.read_csv(file_path, names=['제목' ,'채널명','스트리밍 시작 시간','댓글 작성자', '댓글 작성시간','댓글내용'])
df.head(30)

라이브 방송 댓글 수집

video_id 리스트를 이용해 댓글 수집 코드

video_lst = ['5Rc3u8UbdD8','YwafYHCAFaQ','pYxNSUDSFH4'] 

def adjust_df_size(info_df,size):
    temp = info_df.copy()
    for i in range(size):
        temp = temp.append([info_df])
    temp = temp.reset_index(drop=True)
    
    return temp
    
def collect_data(video_id):
    
    v = pafy.new(video_id)
    title, author, published, likes, category, description, keywords, viewcount, rating = v.title, v.author, v.published, v.likes, v.category, v.description, v.keywords, v.viewcount, v.rating
    keywords = ', '.join(keywords)
    info_df = pd.DataFrame({'title' : [title], 'channel' : [author], 'published_time' : [published], 'video_likes':[likes], 'category':[category],'viewcount':[viewcount], 'rating':[rating], 'keywords':[keywords]})
    comments = list()
    api_obj = build('youtube', 'v3', developerKey=api_key)
    response = api_obj.commentThreads().list(part='snippet,replies', videoId=video_id, maxResults=100).execute()

    while response:
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']
            comments.append([comment['textDisplay'], comment['authorDisplayName'], comment['publishedAt'], comment['likeCount']])

            if item['snippet']['totalReplyCount'] > 0:
                for reply_item in item['replies']['comments']:
                    reply = reply_item['snippet']
                    comments.append([reply['textDisplay'], reply['authorDisplayName'], reply['publishedAt'], reply['likeCount']])

        if 'nextPageToken' in response:
            response = api_obj.commentThreads().list(part='snippet,replies', videoId=video_id, pageToken=response['nextPageToken'], maxResults=100).execute()
        else:
            break
    
    comments_df = pd.DataFrame(comments, columns=['comments','writer','comment_time','comments_likes'])
    info_df = adjust_df_size(info_df,size=len(comments_df)-1)
    df = pd.concat([info_df, comments_df], axis=1)
    
    return df

def data_merge_load(video_lst, destination_table, project_id):
    is_first = True
    for video_id in video_lst:
        temp = collect_data(video_id)
        if is_first:
            new_df = temp.copy()
            is_first = False
        else:
            new_df = pd.concat([new_df,temp],axis=0)
    
    new_df = new_df.reset_index(drop=True)
    new_df = new_df.applymap(lambda x: str(x).replace("\r"," "))
    new_df = new_df.applymap(lambda x: str(x).replace("\n"," "))
    new_df.to_gbq(destination_table=destination_table, project_id=project_id, if_exists='replace')
    
    return new_df

https://velog.io/@ally0526/1.-Youtube-API%EB%A1%9C-%EB%8C%93%EA%B8%80-%EA%B0%80%EC%A0%B8%EC%98%A4%EA%B8%B0

728x90

LIST

'GCP' 카테고리의 다른 글

Bigquery array, unnest를 mysql에서는 recursive문을 활용 (0)	2022.01.01
Bigquery ML Explainable AI (XAI) (0)	2021.11.10
Bigquery procedure 를 이용하여 recursion 함수 만들기 (0)	2021.04.17
Bigquery Procedure 소개 (0)	2021.04.08
Cloud Scheduler로 Compute 인스턴스 예약 (feat. cloud function, cloud pub/sub) (0)	2020.10.29

'GCP' Related Articles

Comments

데이터과학 삼학년

youtube Data API를 이용해 유튜브 댓글(라이브방송 포함) 수집 본문

youtube Data API를 이용해 유튜브 댓글(라이브방송 포함) 수집

'GCP' 카테고리의 다른 글

티스토리툴바