DataPerf is a platform for benchmarking and evaluating data quality and performance in machine learning, helping improve AI systems with better datasets.
Benchmark data quality for better AI results
DataPerf is a platform designed to help you measure and compare the quality and performance of datasets used in machine learning projects. Whether you’re building AI models or working with large data collections, DataPerf provides tools and benchmarks that make it easy to evaluate how well your data supports your goals.
The site offers a straightforward way to understand, test, and improve datasets, so you can make informed decisions about which data to use in your machine learning workflows. It’s especially useful for researchers, data scientists, and anyone interested in creating more reliable and effective AI models.
By using DataPerf, you can participate in community-driven benchmarks, see how your datasets stack up, and access resources to boost the impact of your data. It’s a helpful destination for anyone looking to raise the standard of data in AI development.
Websites similar to dataperf.org:
OpenML lets you share datasets, algorithms, and experiments to collaborate and advance machine learning research and analysis together.
Apache Spark is an open-source engine for large-scale data analytics, supporting data engineering, science, and machine learning in multiple languages.
Flyte helps you build, deploy, and manage scalable data and machine learning workflows, making it easy to unify your data, ML, and analytics projects.
OpenSearch is an open source search and analytics suite for finding, visualizing, and analyzing data, with AI and machine learning tools included.
Stan is an open-source platform for Bayesian data analysis and statistical modeling, offering tools, documentation, and a supportive user community.
Apache Mahout is a distributed linear algebra and machine learning platform for building custom algorithms, designed for data scientists and developers.
DVC is an open-source version control tool for data science and machine learning that tracks data, models, and experiments much as Git tracks code.
Voyant Tools is a web-based platform for analyzing and visualizing texts, making it easy to explore word patterns and trends in documents.
Explore tidyr, a tool for reshaping and tidying messy data in R. Learn how to organize, pivot, and clean datasets for easier analysis and visualization.
Create custom data visualizations in JavaScript with D3. Flexible tools for interactive charts and graphics, perfect for developers and data storytellers.
Explore interactive visuals to analyze global health data, track disease trends, and compare risk factors across countries and time periods.
Access, analyze, and visualize global development data with interactive charts, tables, and maps from the World Bank's extensive databases.
The HDF Group offers tools, libraries, and support for managing, sharing, and preserving scientific and engineering data across platforms and environments.
Create elegant data visualizations in R using ggplot2, a flexible system based on the Grammar of Graphics for mapping data to visual elements.
Movebank lets you explore, manage, and share animal tracking data for research and collaboration in wildlife movement and ecology studies worldwide.
CodaLab Worksheets lets you run, share, and reproduce data experiments and research code online, making collaboration and transparency simple.
OPeNDAP offers free, open-source tools to help researchers and data providers access, share, and manage distributed scientific datasets easily.
CAS offers scientific research platforms and data solutions to help researchers accelerate discoveries, manage information, and drive innovation across fields.
Apache Iceberg is an open table format that helps you manage large analytic datasets reliably across popular big data engines like Spark and Hive.
Apache Hudi is an open source data lake platform that lets you efficiently manage, update, and analyze large-scale streaming and batch data in the cloud.
Open source software for epidemiology offering tools to manage, analyze, and share research data for scientific studies and health research projects.
Akvo helps NGOs and governments use data to improve water, agriculture, and climate programs for more effective development work.
HoloViews helps you easily visualize and analyze your data in Python with minimal code, making it simple to create interactive plots and dashboards.
Datashader is a Python tool that quickly creates interactive, large-scale data visualizations. Learn how to install and use it with guides and examples.
Sage Bionetworks helps researchers share, analyze, and reuse biomedical data, accelerating scientific discovery with AI-powered tools and a collaborative platform.
AidData provides data, research, and analysis tools to help policymakers and organizations improve the impact of sustainable development investments worldwide.
OpenMetadata is an open source platform for discovering, managing, and governing data, offering tools for data quality, lineage, and collaboration.
OpenRefine lets you clean, transform, and organize messy data for free. Easily format, enrich, and prepare datasets using this open source tool.
Tidyverse offers a collection of R packages for data science, making data analysis, visualization, and manipulation in R simpler and more consistent.
MonetDB is a high-performance database system designed for fast analytics and data management using standard SQL. Open source and easy to use.
Peroptyx helps you improve AI and machine learning models by providing accurate, real-world data annotation and evaluation services for location-based apps.
Dataloop helps you manage, label, and automate unstructured data, making it easy to build and deploy AI solutions from start to finish.
DataChain offers tools for data management, preprocessing, experiment tracking, and ML model versioning to streamline large-scale AI data workflows.
Explore and visualize high-dimensional data or machine learning embeddings interactively in your browser with TensorFlow’s Embedding Projector.
Find benchmark datasets, data loaders, and evaluators for graph machine learning research, all designed to work with PyTorch models and tools.
Explore benchmark datasets and results for computer vision and machine learning, with a focus on German traffic sign recognition and classification.
Domino Data Lab is an enterprise AI platform that helps data science teams accelerate research, deploy models, and collaborate using trusted tools.
RAPIDS offers open source GPU-accelerated data science libraries, helping you analyze and process data faster with familiar Python APIs.
Element 84 delivers geospatial data processing and software solutions to help organizations analyze, visualize, and use earth data for positive impact.
OpenText Analytics Database offers fast data analysis, machine learning, and AI-powered insights for businesses, with flexible deployment options.