Common Crawl - Open Repository of Web Crawl Data
Common Crawl offers free, open access to massive web crawl data for research, analysis, and AI development. Perfect for exploring large-scale web content.
Common Crawl is a nonprofit organization that provides a huge, openly accessible archive of web crawl data. You can explore and download datasets covering billions of web pages, making it a valuable resource for researchers, developers, and anyone interested in large-scale web analysis.
Whether you're building AI models, conducting academic research, or analyzing trends across the internet, Common Crawl gives you the tools and information you need. The site is designed to be approachable, offering guides, examples, and community resources to help you get started with extracting and analyzing web data. It's a go-to platform if you want to tap into the open web for your projects.
Discover websites similar to Commoncrawl.org.
Explore declassified documents, research, and publications on U.S. foreign policy and national security from this nonprofit archive and advocacy center.
Search and explore millions of historical newspapers from the 1700s to 2000s for research, genealogy, and discovering stories from the past.
Browse and search a large archive of geology articles, maps, and publications from the AAPG. Access full-text resources and featured digital publications.
Explore the history of British cartooning with access to over 200,000 editorial, political, and comic cartoons at the British Cartoon Archive.
Access a vast archive of public opinion polls and survey data from the United States and worldwide for research, teaching, or informed decision-making.
openICPSR lets you share and access behavioral health and social science research data for free, supporting open science and public research access.
Mendeley Data is a free, secure online repository for sharing, storing, and citing research data, helping you easily access and collaborate worldwide.
Browse and access a wide range of research datasets published by Elsevier, with search tools, citation options, and detailed data types for your studies.
Download genomic sequences and annotations for various species from the UCSC Genome Browser. Access data for research via web, API, or FTP.
Access and share genomic data on viruses like influenza and COVID-19. GISAID supports global research and public health collaboration.
PhysioNet offers free access to complex physiologic signal data and tools, supporting research and collaboration in biomedical and health science fields.
Access NASA's Planetary Data System for planetary science research, with streamlined tools and easy data downloads for scientists and enthusiasts.
DataONE connects you to a vast network of Earth and environmental data, offering tools and training to help researchers access, share, and manage data.
Explore human gene expression and regulation across tissues with open-access data, visualizations, and resources from the Genotype-Tissue Expression (GTEx) project.
YouGov offers real-time market research and audience insights with accurate data to help understand brands and consumer behavior.
Access cross-national microdata for research and analysis with remote tools from the LIS Data Center in Luxembourg. Ideal for social science studies.
Explore UK public research projects, publications, and funding with Gateway to Research—an easy way to find people, outcomes, and organizations in science.
Explore and download large-scale functional genomics data, browse experiments, and access research tools for studying human and mouse genomics.
Access Swiss National Science Foundation research data, statistics, and reports through an organized portal for easy exploration and download.
Explore and access 3D structures of proteins, nucleic acids, and complex assemblies in the global Protein Data Bank research archive.
Explore cancer genomics data, tools, and resources to support cancer research and discovery, provided by the NCI Genomic Data Commons.
Access a global archive of nucleotide sequence data, from raw reads to annotated sequences, for research and scientific collaboration worldwide.
IMGT offers expertly curated immunogenetics databases and online analysis tools for researchers studying antibodies, T cell receptors, and immune responses.
Explore and request access to clinical biospecimens and study datasets for heart, lung, and blood research through this NIH-supported repository.
Explore a curated database of ageing-related genes and longevity research, with tools to search genes, species, and scientific literature on human ageing.
Browse a huge collection of movie posters, from the latest releases to classics dating back to 1912, with annual awards and artist credits included.
Browse archived Los Angeles Times blogs from 2006 to 2013, featuring news, opinion, and commentary on a wide range of topics and issues.
Explore a searchable archive of Jeopardy! games, clues, and players, created by fans for trivia lovers and show enthusiasts.
Browse digital archives of City Pages, preserved by Hennepin County Library and the Minnesota Historical Society for news, stories, and local history.
Browse archived news, sports, and entertainment articles from The Salt Lake Tribune covering Salt Lake City and Utah's history and events.
Find and explore rat genomic, genetic, and disease data, plus analysis tools and resources for researchers in genetics and biomedical science.
Explore, browse, and download network datasets for scientific research and analysis on the UCI Network Data Repository website.
UbuWeb is a free online archive offering avant-garde art, film, and audio, featuring rare works from artists, poets, and filmmakers worldwide.
Browse a vast photo archive of U.S. Navy ships, including submarines, carriers, and more, with detailed images and historical information.
Browse historical news articles, features, and archives from the Honolulu Star-Bulletin. Discover Hawaii's stories by year and topic in one place.
Retro CDN is a central media archive hosting sprites, screenshots, scans, videos, and photography for Retro's projects, with over 37,000 files available.
Explore rare historical photos and the fascinating stories behind them, featuring iconic moments and people from history in an easy-to-browse archive.
Explore a vast archive of articles from "The Iranian," covering news, culture, and stories from 1995 to 2021 about Iran and the Iranian community.
Browse classic Sports Illustrated articles, photos, and features in this curated online archive for sports fans and history enthusiasts.
Browse hundreds of vintage Christmas catalogs and wish books from Sears, JCPenney, and Montgomery Ward, featuring thousands of nostalgic holiday pages.