Saturday, September 24th, 2022
Solved
Missing entries (tconst) in title.basics.tsv with regard to other files
When checking whether all the tconst values (tt...) found in the other files (akas, episode and principals) are present in the main file (title.basics.tsv), there seem to be inconsistencies: some movies and TV shows are missing. The parent series of some TV episodes listed in title.episode.tsv (the parentTconst value) are absent from title.basics.tsv.

For instance, tt0086748 has 8 episodes in title.episode.tsv:

+-----------+--------------+--------------+---------------+
| tconst    | parentTconst | seasonNumber | episodeNumber |
+-----------+--------------+--------------+---------------+
| tt0630512 | tt0086748    | 1            | 6             |
| tt0630513 | tt0086748    | 1            | 8             |
| tt0630514 | tt0086748    | 1            | 7             |
| tt0630515 | tt0086748    | 1            | 4             |
| tt0630516 | tt0086748    | 1            | 2             |
| tt0630517 | tt0086748    | 1            | 5             |
| tt0630518 | tt0086748    | 1            | 1             |
| tt0630519 | tt0086748    | 1            | 3             |
+-----------+--------------+--------------+---------------+

but tt0086748 is not present in title.basics.tsv, even though the TV show exists on IMDb: https://www.imdb.com/title/tt0086748/

Even more surprising, in some cases the parent TV show exists neither in the file nor on the website:

+-----------+--------------+--------------+---------------+
| tconst    | parentTconst | seasonNumber | episodeNumber |
+-----------+--------------+--------------+---------------+
| tt7153894 | tt4824012    | 6            | 21            |
+-----------+--------------+--------------+---------------+

tt4824012 is also absent from title.basics.tsv and its page is dead, but tt6781204 shows the episode under a different series ID (tt2912216 instead of tt4824012).

And a more extreme case:

+-----------+--------------+--------------+---------------+
| tconst    | parentTconst | seasonNumber | episodeNumber |
+-----------+--------------+--------------+---------------+
| tt4839832 | tt4820982    | 1            | 17            |
+-----------+--------------+--------------+---------------+

Here both IDs (the episode and the parent series) are absent from title.basics.tsv and from the website:

tt4820982 -> 404 Not Found
tt4839832 -> 404 Not Found

In conclusion, from these 3 files:

- title.akas.tsv
- title.episode.tsv
- title.principals.tsv

when we extract the titleId, parentTconst and tconst columns respectively and check them against the tconst column of title.basics.tsv, there are 6220 missing entries:

- 1086 seem to be valid pages on the IMDb website (although 48 of them are redirects: HTTP/1.1 308 Redirect)
- 5134 are dead pages (see tt4820982 above)

The files title.ratings.tsv and title.crew.tsv do not seem to have the same problem.
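The cross-file check described above can be reproduced with a short script. This is a minimal Python sketch, assuming the decompressed dataset files (title.basics.tsv, title.akas.tsv, title.episode.tsv, title.principals.tsv) sit in the working directory; the helper names open_tsv, column_values, missing_ids and check_all are hypothetical and not part of any IMDb tooling. It builds the set of every tconst in title.basics.tsv and then reports, per file, which referenced IDs are not in that set.

```python
import csv
import gzip
from typing import IO, Dict, Iterable, Iterator, Set


def open_tsv(path: str) -> IO[str]:
    """Open a dataset file as text; .gz files are decompressed on the fly."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    return open(path, "rt", encoding="utf-8", newline="")


def column_values(lines: Iterable[str], column: str) -> Iterator[str]:
    """Yield one column's values from tab-separated lines (header line first)."""
    # The files are unquoted; QUOTE_NONE keeps stray '"' characters in
    # titles from being parsed as quote marks.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        value = row[column]
        if value != r"\N":  # \N marks a missing value
            yield value


def missing_ids(known: Set[str], lines: Iterable[str], column: str) -> Set[str]:
    """IDs referenced in `column` of `lines` that are absent from `known`."""
    return {v for v in column_values(lines, column) if v not in known}


# (file, column) pairs from the post. Assumption: decompressed files in the
# current directory; the .tsv.gz names also work thanks to open_tsv.
SOURCES: Dict[str, str] = {
    "title.akas.tsv": "titleId",
    "title.episode.tsv": "parentTconst",
    "title.principals.tsv": "tconst",
}


def check_all(basics_path: str = "title.basics.tsv") -> Dict[str, Set[str]]:
    """Map each file in SOURCES to the IDs it references but basics lacks."""
    with open_tsv(basics_path) as f:
        known = set(column_values(f, "tconst"))
    result: Dict[str, Set[str]] = {}
    for path, column in SOURCES.items():
        with open_tsv(path) as f:
            result[path] = missing_ids(known, f, column)
    return result
```

Run from the directory holding the files, `report = check_all()` gives one set of missing IDs per file, and `len(set().union(*report.values()))` gives the number of distinct missing IDs across all three. Only the tconst set from title.basics.tsv is held in memory; the other files are streamed row by row. Checking whether each missing ID still resolves on imdb.com (the 308/404 split above) would be a separate step.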
7 Responses

Bethanny (Employee), 3 years ago
Adren, 3 years ago
Bethanny (Employee), 2 years ago
Adren, 2 years ago
Adren, 2 years ago
Adren, 1 year ago
Michelle (Employee), 1 year ago