TL;DR: The 8 TikTok JSON files are a split xz archive, so you need `cat` to concatenate them into a single .xz file; otherwise you can only extract part 001.
Then use jq to convert the single-line data file into a pretty-printed JSON file.
[1] cat tiktok.json.xz.00{1..8} > tiktok.json.xz.full; # bash
[2] Extract it in Nautilus. Be patient; the output keeps growing until it reaches 176 GB.
[3] jq '.' < data > tiktok.json # "data" = the file extracted in step [2]
How I researched this from scratch:
I first tried jq on the data file extracted from tiktok.json.xz.001 alone; it failed with an error at the end:
parse error: Unfinished string at EOF at line 1, column 301400064
I needed to understand why jq failed before proceeding to the other, huge file.
In the jq output, the last video id is 62077733563793408, and I used this id to compare against the original data file.
I used Python to read the file and split on that id, because a normal utility such as `less` (`-n` helps a bit) can't handle such a very long line effectively.
From that I could tell jq had failed only about 10 lines from the end.
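Roughly, that Python step looks like the sketch below (data_001 is just a placeholder name for the extracted 001 file; I read only the last few MB here instead of the whole ~26 GB line):

```python
# Sketch: inspect what follows the last video id that jq managed to parse.
# "data_001" is a placeholder for the data file extracted from tiktok.json.xz.001.
import os

last_id = '62077733563793408'      # last video id seen in the jq output
path = 'data_001'
tail_size = 8 * 1024 * 1024        # only read the final 8 MB, not the whole ~26 GB line

with open(path, 'rb') as f:
    f.seek(max(os.path.getsize(path) - tail_size, 0))
    tail = f.read().decode('utf-8', errors='replace')

remainder = tail.split(last_id, 1)[-1]   # text after the last parsed id
print(remainder)                         # the truncated remainder, with no closing "}"
```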
So the reason jq failed is that there is no closing "}" at the end, which makes sense because the archive is split mid-record.
Last characters of the 001 data file:
"playAddr": "https://v19-web-newkey.tiktokcdn.com/79eec1166f8ae077c86dd37a14d70288/5f84ddc0/video/s3/mp/s3-mp-v-0068/reg02/2016/02/08/08/15804438-6c1b-4a7a-8e35-f04687699854.mp4/?a=1988&br=0&bt
Before running jq on the huge file (the 188.6 GB concatenated data file), I wanted to prove that the parts really are a continuation of each other; otherwise I would waste my time on another jq parse error.
So the next thing to prove is that the 002 file really continues the 001 file.
I needed to extract the region where 002 begins from the concatenated, extracted data file in order to compare.
A normal command such as `cut` is too heavy for this, so I used the "low level" command `dd`.
The 001 data file is 26071203840 bytes (use `ls -la` to get the size; don't use jq_001.out, which is already parsed output).
On the full data file, round that offset down slightly (to within a 10 MB range) to 26070000000 bytes, then extract 10000000 bytes (10 MB) from there, so the chunk straddles the 001/002 boundary.
dd if=full_data bs=1 skip=26070000000 count=10000000 of=skip_data
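The offset arithmetic can be reproduced like this (a sketch; data_001 is a placeholder for the extracted 001 file, while full_data and skip_data are the names used in the dd command above):

```python
# Sketch: derive the dd offsets from the size of the extracted 001 file.
import os

size_001 = os.path.getsize('data_001')    # 26071203840 bytes in my case
skip = size_001 - size_001 % 10_000_000   # round down to 26070000000
count = 10_000_000                        # a 10 MB window, so it straddles the 001/002 boundary
print(f'dd if=full_data bs=1 skip={skip} count={count} of=skip_data')
```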
Again with Python: after `r = f.read()`, checking `'62077733563793408' in r` returns True.
Then simply split on '62077733563793408' (which yields only 2 pieces) and print the piece after the id.
There I could see `=0` continuing the `&bt` (the last 3 characters of the 001 data file), which proves that 002 starts with the correct bytes for jq to parse the concatenated full data file continuously.
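In code, that check looks roughly like this (a sketch, using the 10 MB skip_data chunk produced by dd above):

```python
# Sketch of the continuity check on the 10 MB chunk cut out by dd.
last_id = '62077733563793408'

with open('skip_data', 'rb') as f:
    raw = f.read()
r = raw.decode('utf-8', errors='replace')

print(last_id in r)            # True: the last id jq parsed sits inside this chunk
print(len(r.split(last_id)))   # 2 pieces, as in the original check

# The 001/002 boundary is at byte 26071203840 - 26070000000 = 1203840 of this chunk;
# the bytes around it show "...&br=0&bt" (end of 001) running straight into "=0..." (start of 002).
boundary = 26071203840 - 26070000000
print(raw[boundary - 80 : boundary + 80])
```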
That means it is safe to proceed with `cat`, extract, and `jq '.' < data > tiktok.json`.
Be patient because it takes time (the data file is 176 GB; jq took me 1 hour 8 minutes 40 seconds; you can run the ~2 GB 001 file first to get a feel for how long the full 14.1 GB of archives will take).
After it completed (a 195 GB output file), I used the same steps to compare and prove that the final JSON item of the parsed output matches the one in the data file.
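The same tail-comparison idea works for that final check, for example (a sketch; tiktok.json is the jq output, and "data" the extracted input as in step [3]):

```python
# Sketch: read just the tails of the raw data and of the jq output and
# check that they end on the same record (same last video id).
import os

def tail(path, n=1024 * 1024):
    """Return the last n bytes of a file, decoded as text."""
    with open(path, 'rb') as f:
        f.seek(max(os.path.getsize(path) - n, 0))
        return f.read().decode('utf-8', errors='replace')

print(tail('data')[-2000:])         # end of the raw single-line data file
print(tail('tiktok.json')[-2000:])  # end of the pretty-printed jq output
```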