Common Crawl offers free, open access to massive web crawl data for research, analysis, and AI development. It is well suited to exploring large-scale web content.
Access massive open web crawl data for free
Common Crawl is a nonprofit organization that provides a huge, openly accessible archive of web crawl data. You can explore and download datasets covering billions of web pages, making it a valuable resource for researchers, developers, and anyone interested in large-scale web analysis.
Whether you're building AI models, conducting academic research, or analyzing trends across the internet, Common Crawl provides the crawl data to work from. The site is designed to be approachable, offering guides, examples, and community resources to help you get started with extracting and analyzing web data. It's a go-to platform if you want to tap into the open web for your projects.
Discover websites similar to Commoncrawl.org based on shared categories, topics, and features.
Explore global development data, statistics, and analysis from the World Bank. Access open datasets and tools to support research and informed decisions.
CAIDA offers network research, curated datasets, and tools for scientists and academics studying internet infrastructure and data analysis.
Access global census and survey data for research, comparison, and analysis. IPUMS offers free, integrated datasets and tools for studying social change.
Explore evolutionary gene relationships across genomes with OMA, offering orthology predictions, data downloads, and analysis tools for researchers.
A collaborative research site sharing genetic study data and findings on coronary artery disease and heart attack risk. Resources for scientists and clinicians.
Explore an extensive archive of documents, analysis, and research related to major assassinations and their historical context in the United States.
Browse and download a wide range of World Bank development data, including microdata, finance, and energy datasets, all in one easy-to-access catalog.
Explore historical records on philanthropy at the Rockefeller Archive Center, a research hub offering broad access to preserved archival collections worldwide.
CAS offers scientific research platforms and data solutions to help researchers accelerate discoveries, manage information, and drive innovation across fields.
DBpedia lets you explore and use structured data from Wikipedia, offering tools, datasets, and knowledge graphs for research, analysis, and development.
Browse and share humanitarian crisis data from around the world to support relief efforts, with thousands of datasets from trusted organizations.
Explore and visualize open data from multiple sources on topics like health, economy, and environment. Find trends, charts, and insights in one place.
Genboree offers tools and databases for biomedical researchers to manage, analyze, and share genomics and biological data in a collaborative environment.
HYCOM offers ocean modeling tools, real-time data, and resources for researchers and institutions in the field of ocean science and prediction.
Synapse is a collaborative platform where scientists share, analyze, and access biomedical research data to advance open and reproducible science.
COCO offers a large, labeled image dataset for computer vision research, including object detection, segmentation, and captioning tasks. Free to access.
Access global economic and development data, analysis, and tools from the World Bank to explore trends and support research or policy decisions.
Access and share genomic data on viruses like influenza and COVID-19. GISAID supports global research and public health collaboration.
openICPSR lets you share and access behavioral health and social science research data for free, supporting open science and public research access.
PhysioNet offers free access to complex physiologic signal data and tools, supporting research and collaboration in biomedical and health science fields.
Russian-language site offering research, comparisons, and analysis of data compression algorithms, tools, and codecs for audio, video, and images.
Explore declassified documents, research, and publications on U.S. foreign policy and national security from this nonprofit archive and advocacy center.
Browse and access over 170,000 official U.S. Geological Survey publications, covering 150 years of scientific research and government reports.
Search and explore millions of historical newspapers from the 1700s to 2000s for research, genealogy, and discovering stories from the past.
Browse and search a large archive of geology articles, maps, and publications from the AAPG. Access full-text resources and featured digital publications.
Access and explore comprehensive datasets on terrestrial biogeochemistry and ecological dynamics, supporting research and education in Earth science.
Explore official U.S. population and economic data, interactive maps, and research tools from the Census Bureau for insights about people and communities.
Search patents and analyze intellectual property with a free online database offering powerful research tools and data analytics for innovations worldwide.
Academic Torrents helps researchers share and download huge datasets quickly and securely through a distributed file sharing platform.
Explore original research, theses, and academic papers from the University of Edinburgh in this free, searchable digital archive.
Access open, high-quality global data on income and wealth inequality, compare countries with interactive tools, and explore research from top experts.
Explore scientific research with Dimensions AI—find grants, publications, datasets, clinical trials, patents, and policy documents all in one place.
Access a vast archive of public opinion survey data from the US and worldwide to support research, teaching, and understanding of societal trends.
Explore and download hundreds of machine learning datasets or contribute your own to support research and learning worldwide.
Find benchmark datasets, data loaders, and evaluators for graph machine learning research, all designed to work with PyTorch models and tools.
Explore research, software, and resources from the UW Interactive Data Lab, focused on data visualization and interactive analysis tools.
Explore advanced research in computing sciences, data analysis, and mathematical modeling at Berkeley Lab, supporting breakthroughs in science and technology.
AMPLab at UC Berkeley shares research, software, and resources focused on machine learning, cloud computing, and big data analytics innovations.
Mendeley Data is a free, secure online repository for sharing, storing, and citing research data, making it easy to access datasets and collaborate worldwide.
Browse and access a wide range of research datasets published by Elsevier, with search tools, citation options, and detailed data types for your studies.
Access census data, reports, and analysis tools for Missouri and the U.S. Explore population trends, business stats, and more from the Missouri Census Data Center.
YouTube Research offers data and tools for researchers to study YouTube’s impact, helping advance public understanding of the platform and its effects.
Explore space weather models, run simulations, and access tools for ionosphere-thermosphere research at NASA's CCMC platform for the science community.
OBIS is a global, open-access database for marine biodiversity, offering data and resources to support ocean science, conservation, and sustainability.
Find and explore rat genomic, genetic, and disease data, plus analysis tools and resources for researchers in genetics and biomedical science.
Download genome sequences and annotations for humans, mice, and other species from the UCSC Genome Browser—ideal for research and bioinformatics.
DataONE connects you to a vast network of Earth and environmental data, offering tools and training to help researchers access, share, and manage data.
Browse and access hundreds of curated datasets from WRI, supporting research and informed decision-making on environmental and social topics.
Explore global crime, drug, and justice statistics with UNODC's data portal. Access country profiles, microdata, and topic-based reports from the UN.
Huma-Num supports digital research in humanities and social sciences by offering data management tools, resources, and collaborative services in French.
BioCyc offers integrated genome and metabolic pathway data for thousands of organisms, plus bioinformatics tools for research and analysis.
Access global health and population survey data to support research, policy, and program planning in areas like nutrition, mortality, and family planning.
Access cross-national microdata for research and analysis with remote tools from the LIS Data Center in Luxembourg. Ideal for social science studies.
Figshare lets you store, share, and discover research data and files of any format, making research more open and accessible for everyone.
Access, manage, and analyze seismological and earth science data with NSF SAGE Data Services, supporting the global geoscience and research community.
Explore engaging infographics and visualizations that turn complex data and ideas into easy-to-understand graphics across a range of topics.
Access a wide range of statistical software, datasets, and resources for research and learning, hosted by Carnegie Mellon University.
Explore and access genomics data, resources, and tools at the National Genomics Data Center—supporting research in life and health sciences worldwide.
Explore UK public research projects, publications, and funding with Gateway to Research—an easy way to find people, outcomes, and organizations in science.
Explore and download large-scale functional genomics data, browse experiments, and access research tools for studying human and mouse genomics.