Saturday, September 24th, 2022
Solved
Missing entries (tconst) in title.basics.tsv with regard to other files
When checking every tconst (tt....) found in the other files (akas, episode and principals) against the main file (title.basics.tsv), there appear to be inconsistencies: some movies and TV shows are missing.

The parent series of some TV show episodes found in title.episode.tsv (the parentTconst value) are absent from title.basics.tsv.

For instance, tt0086748 has 8 episodes in title.episode.tsv:

+-----------+--------------+--------------+---------------+
| tconst    | parentTconst | seasonNumber | episodeNumber |
+-----------+--------------+--------------+---------------+
| tt0630512 | tt0086748    | 1            | 6             |
| tt0630513 | tt0086748    | 1            | 8             |
| tt0630514 | tt0086748    | 1            | 7             |
| tt0630515 | tt0086748    | 1            | 4             |
| tt0630516 | tt0086748    | 1            | 2             |
| tt0630517 | tt0086748    | 1            | 5             |
| tt0630518 | tt0086748    | 1            | 1             |
| tt0630519 | tt0086748    | 1            | 3             |
+-----------+--------------+--------------+---------------+

but tt0086748 is not present in title.basics.tsv, even though the TV show exists on IMDb: https://www.imdb.com/title/tt0086748/

Even more surprising, in some cases the parent TV show exists neither in the file nor on the website:

+-----------+--------------+--------------+---------------+
| tconst    | parentTconst | seasonNumber | episodeNumber |
+-----------+--------------+--------------+---------------+
| tt7153894 | tt4824012    | 6            | 21            |
+-----------+--------------+--------------+---------------+

tt4824012 is also absent from title.basics.tsv and returns a dead page on the website, but tt6781204 shows the episode with a different ID for the series (tt2912216 instead of tt4824012).

And for a more extreme case:

+-----------+--------------+--------------+---------------+
| tconst    | parentTconst | seasonNumber | episodeNumber |
+-----------+--------------+--------------+---------------+
| tt4839832 | tt4820982    | 1            | 17            |
+-----------+--------------+--------------+---------------+

Here, both IDs (episode and parent series) are absent from title.basics.tsv and from the website:

tt4820982 -> 404 Not Found
tt4839832 -> 404 Not Found

In conclusion, from these 3 files:
- title.akas.tsv
- title.episode.tsv
- title.principals.tsv
when we extract titleId, parentTconst and tconst respectively and check them against the tconst values in title.basics.tsv, there are 6220 missing entries:
- 1086 seem to be valid pages on the IMDb website (although 48 are redirections -> HTTP/1.1 308 Redirect)
- 5134 are dead pages (see tt4820982 above)

The files title.ratings and title.crew don't seem to have the same problem.
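For anyone who wants to reproduce this kind of cross-check, here is a minimal sketch in Python. It is not the script that produced the counts in this post. The helper names read_ids and find_missing_ids are hypothetical, and the sketch assumes the uncompressed TSV files, each with a header row and with \N marking missing values.

```python
import csv


def read_ids(path, column):
    """Return the set of non-null IDs found in one column of a TSV file."""
    ids = set()
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat double quotes inside titles as ordinary characters.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            value = row[column]
            # Skip empty cells and the \N placeholder assumed for missing values.
            if value and value != r"\N":
                ids.add(value)
    return ids


def find_missing_ids(basics_path, references):
    """Map each (path, column) pair to the IDs that are absent from basics.

    `references` is a list of (path, column) pairs, for example
    ("title.episode.tsv", "parentTconst").
    """
    known = read_ids(basics_path, "tconst")
    missing = {}
    for path, column in references:
        # Set difference: IDs referenced in this file but unknown to basics.
        missing[(str(path), column)] = read_ids(path, column) - known
    return missing
```

A call could look like find_missing_ids("title.basics.tsv", [("title.akas.tsv", "titleId"), ("title.episode.tsv", "parentTconst"), ("title.principals.tsv", "tconst")]). Each file is read into a set, so the whole check needs enough memory to hold the ID columns, but it avoids loading the full rows.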