Preface
As a heavy Douyin user, I laugh at its videos every day 😄 and enjoy them a lot. But, as everyone knows, videos downloaded from Douyin carry a watermark. For a programmer with obsessive-compulsive tendencies, that is simply unacceptable. There are many watermark-removal tools on the Internet. How do they work? Have their authors written some particularly powerful algorithm? Curiosity drove me to investigate.
Removing the watermark from short videos
Analysis
We start with Douyin's share link. The text copied from Douyin when sharing a video looks like this:
2.82 wsr:/ Happy birthday to Kobe.%Basketball %Mamba Mentality %Kobe Birthday https://v.douyin.com/d8LpxMQ/ Copy the link, search for it, and watch the video directly!
It contains the link https://v.douyin.com/d8LpxMQ/. Opening it in the browser shows that the link is redirected, and the redirected address is as follows:
https://www.iesdouyin.com/share/video/6999605370222054663
The redirected address does not seem useful by itself, so let's capture the network traffic to see whether there is an interface that requests the video. Looking carefully: ding! I found an interface with an item_ids parameter whose value is the last part of the redirected URL (6999605370222054663), which I judged to be the video ID. The interface address is as follows:
https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids=6999605370222054663
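The interface address above can be assembled from a video ID with the standard library. This is only a minimal sketch: the endpoint and the item_ids parameter come from the captured request, while the function name `build_iteminfo_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint observed in the captured traffic above.
ITEMINFO_ENDPOINT = "https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/"


def build_iteminfo_url(video_id: str) -> str:
    """Return the iteminfo request URL for a single video ID."""
    return f"{ITEMINFO_ENDPOINT}?{urlencode({'item_ids': video_id})}"


print(build_iteminfo_url("6999605370222054663"))
```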
Next, let's look at the data this interface returns. Wow: in the Preview tab I can immediately see the video's caption, author, music, thumbnail, address, and so on.
I took the video address out of the response and opened it in the browser. The video URL is as follows:
https://aweme.snssdk.com/aweme/v1/playwm/?video_id=v0d00fg10000c4hpfk3c77uar6l7cs90&ratio=720p&line=0
But after opening it, I found that the watermark in the upper left corner of the video was still there. Looking at playwm in this URL, I noticed that wm resembles my project name. Could it be the abbreviation of watermark? I removed wm and opened the new URL in the browser. A magical scene appeared: the watermark was gone. I was so excited. The address of the video without the watermark is as follows:
https://aweme.snssdk.com/aweme/v1/play/?video_id=v0d00fg10000c4hpfk3c77uar6l7cs90&ratio=720p&line=0
It turns out that removing the watermark from a Douyin video is this easy. I had been expecting video-processing algorithms, and instead a simple analysis did the job. Haha, the simplicity is almost touching 🤭.
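The playwm to play change can be sketched as a small function. The version below is one possible approach, not the only one: it rewrites only the path segment, so a query value that happened to contain "playwm" would be left alone. The name `strip_watermark` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_watermark(play_url: str) -> str:
    """Swap the 'playwm' path segment for 'play', leaving the query untouched."""
    parts = urlsplit(play_url)
    segments = ["play" if seg == "playwm" else seg for seg in parts.path.split("/")]
    return urlunsplit(parts._replace(path="/".join(segments)))


watermarked = (
    "https://aweme.snssdk.com/aweme/v1/playwm/"
    "?video_id=v0d00fg10000c4hpfk3c77uar6l7cs90&ratio=720p&line=0"
)
print(strip_watermark(watermarked))
```

A URL that has no playwm segment is returned unchanged.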
Now that the principle is understood, writing the code is easy and enjoyable.
Code
The link we copied is a short link mixed in with other text, so first we extract it.
# Find every URL in the share text and return the first one
urls = re.findall(r'[a-z]+://\S+', content, re.I | re.M)
if urls:
    return urls[0]
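As a self-contained check of the extraction step, here is a minimal sketch that runs the same pattern over the share text shown earlier. The names `URL_PATTERN` and `extract_first_url` are illustrative only.

```python
import re
from typing import Optional

# A scheme, "://", then everything up to the next whitespace character.
URL_PATTERN = re.compile(r"[a-z]+://\S+", re.IGNORECASE)


def extract_first_url(content: str) -> Optional[str]:
    """Return the first URL found in the share text, or None if there is none."""
    match = URL_PATTERN.search(content)
    return match.group(0) if match else None


share_text = (
    "2.82 wsr:/ Happy birthday to Kobe.%Basketball %Mamba Mentality %Kobe Birthday "
    "https://v.douyin.com/d8LpxMQ/ Copy the link, search for it, and watch the video directly!"
)
print(extract_first_url(share_text))  # https://v.douyin.com/d8LpxMQ/
```

The "wsr:/" token at the start of the share text is not matched because the pattern requires "://" after the scheme.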
Once the short link is extracted, we follow its redirect to obtain the video ID. The request is made with the requests library:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/92.0.4515.107 Safari/537.36'}
# url: the short link; requests follows the redirect, and response.url is the final address
response = requests.get(url, headers=headers)
return response.url
We need to extract the video ID from the redirected URL to use as the value of the item_ids parameter:
# realUrl: redirected url
# Drop the query string if there is one (index('?') would raise ValueError without it)
startUrl = realUrl.split('?')[0].rstrip('/')
id = startUrl[startUrl.rindex('/') + 1:]
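An alternative sketch using urllib.parse, which does not depend on whether the redirected URL carries a query string or a trailing slash. The function name `extract_video_id` and the `?region=CN` query in the second example are assumptions made for illustration, not something observed in the traffic above.

```python
from urllib.parse import urlsplit


def extract_video_id(redirected_url: str) -> str:
    """Return the last non-empty path segment of the redirected URL."""
    path = urlsplit(redirected_url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


# Redirected address from the analysis above (no query string).
print(extract_video_id("https://www.iesdouyin.com/share/video/6999605370222054663"))
# Same address with a hypothetical trailing slash and query string.
print(extract_video_id("https://www.iesdouyin.com/share/video/6999605370222054663/?region=CN"))
```

Both calls print 6999605370222054663.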
Now request the interface. It is a GET request with a single parameter, item_ids:
douyinUrl = 'https://www.iesdouyin.com/web/api/v2/aweme/iteminfo'
douyinParams = {
    'item_ids': id
}
douyinResponse = requests.get(url=douyinUrl, params=douyinParams, headers=headers)
body = douyinResponse.text
Parse the JSON to get the video caption and the video address. To get the watermark-free link, replace playwm with play:
data = json.loads(body)
# Video caption
videoTitle = data['item_list'][0]['desc']
# Video with watermark url
videoUrl = data['item_list'][0]['video']['play_addr']['url_list'][0]
# Video without watermark url
realVideoUrl = videoUrl.replace('playwm', 'play')
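The lookups above assume that every field is present. A slightly more defensive sketch is shown below. The field names (item_list, desc, video, play_addr, url_list) are the ones used above. The sample body is a hypothetical payload trimmed to those fields, not a real response, and `parse_item` is an illustrative name.

```python
import json
from typing import Optional, Tuple


def parse_item(body: str) -> Optional[Tuple[str, str]]:
    """Return (caption, watermark-free URL), or None if the expected fields are missing."""
    try:
        item = json.loads(body)["item_list"][0]
        url = item["video"]["play_addr"]["url_list"][0]
        return item["desc"], url.replace("playwm", "play")
    except (ValueError, KeyError, IndexError, TypeError):
        # Invalid JSON, an empty item_list, or an unexpected structure.
        return None


# Hypothetical response body, reduced to the fields the code reads.
sample_body = json.dumps({
    "item_list": [{
        "desc": "Happy birthday to Kobe.",
        "video": {"play_addr": {"url_list": [
            "https://aweme.snssdk.com/aweme/v1/playwm/?video_id=v0d00fg10000c4hpfk3c77uar6l7cs90&ratio=720p&line=0"
        ]}},
    }]
})
print(parse_item(sample_body))
print(parse_item('{"item_list": []}'))  # None
```

Returning None instead of raising lets the caller decide how to report a video that could not be resolved.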
Finally, we use the webbrowser library to open the video in the browser and enjoy it watermark-free:
webbrowser.open(realVideoUrl)
The complete code is as follows:
import json
import re
import webbrowser

import requests


def get_url(content):
    # Return the first URL found in the share text, or None
    urls = re.findall(r'[a-z]+://\S+', content, re.I | re.M)
    return urls[0] if urls else None


def get_redirect_url(url, header):
    # url: short link; requests follows the redirect, and response.url is the final address
    response = requests.get(url, headers=header)
    return response.url


if __name__ == '__main__':
    douyinUrl = 'https://www.iesdouyin.com/web/api/v2/aweme/iteminfo'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/92.0.4515.107 Safari/537.36'}
    inputContent = input('Please enter the video link:')
    shortUrl = get_url(inputContent)
    if shortUrl is not None:
        realUrl = get_redirect_url(shortUrl, headers)
        # realUrl: redirected url; drop the query string (if any) and take the last path segment
        startUrl = realUrl.split('?')[0].rstrip('/')
        id = startUrl[startUrl.rindex('/') + 1:]
        douyinParams = {
            'item_ids': id
        }
        if 'www.douyin.com/video' in realUrl:
            douyinResponse = requests.get(url=douyinUrl, params=douyinParams, headers=headers)
            body = douyinResponse.text
            print(douyinResponse.url)
            data = json.loads(body)
            print(data['item_list'][0]['desc'])
            # Video caption
            videoTitle = data['item_list'][0]['desc']
            # Video with watermark url
            videoUrl = data['item_list'][0]['video']['play_addr']['url_list'][0]
            # Video without watermark url
            realVideoUrl = videoUrl.replace('playwm', 'play')
            print(realVideoUrl)
            webbrowser.open(realVideoUrl)