Language:
English
Pages:
1 online resource (1 video file (41 min.)), sound, color.
Edition:
[First edition].
DDC:
006.3/5
Keywords:
Natural language processing (Computer science); Machine learning; Natural Language Processing; Traitement automatique des langues naturelles; Apprentissage automatique; Instructional films; Nonfiction films; Internet videos; Films de formation; Films autres que de fiction; Vidéos sur Internet; Webcast
Abstract:
Hobson and his colleagues work out how to train word embeddings from scratch in PyTorch using the WikiText2 dataset. WikiText2 contains redacted words, but they were unable to find the "labels" that reveal the words masked with the `<unk>` symbol. If you try to use the `Wikipedia` package to retrieve Wikipedia pages directly, you may hit the `suggest` bug. The project has more than 100 unanswered issues, and the maintainer has not pushed any changes for many years. The Tangible AI fork on GitLab fixes this search suggestion bug, so we could easily crawl Wikipedia. Unfortunately, the Wikipedia-API package is not very useful for searching and crawling Wikipedia to retrieve text.
Note:
Online resource; title from title details screen (O'Reilly, viewed April 26, 2022)