ABSTRACT
We present the IR Anthology, a corpus of information retrieval publications accessible via a metadata browser and a full-text search engine. Following the example of the well-known ACL Anthology, the IR Anthology serves as a hub for researchers interested in information retrieval. Our search engine ChatNoir indexes the publications' full texts, enabling a focused search and linking users to the respective publisher's site for personal access. Listing more than 40,000 publications at the time of writing, the IR Anthology can be freely accessed at https://IR.webis.de.
Index Terms
- The Information Retrieval Anthology