Apache Hive is a distributed data warehouse system for scalable analytics, letting you read, write, and manage big data using SQL on various storage systems.
Analyze massive data sets with SQL
Apache Hive is a powerful open-source system designed for analyzing and managing huge amounts of data. Built on top of Apache Hadoop, it lets you use familiar SQL queries to read, write, and organize data across different storage solutions like S3, HDFS, and more.
With Hive, you can handle petabytes of information efficiently, thanks to its distributed and fault-tolerant architecture. Its central metastore makes it easy to keep track of data and metadata, supporting data-driven decision-making for organizations of any size.
Whether you're building a data lake or need a robust warehouse for business analytics, Hive provides the tools and flexibility you need. The website offers extensive documentation, community resources, and guides to help you get started quickly.
Discover websites similar to hive.apache.org based on shared categories, topics, and features.
Apache Druid is a high-performance analytics database for fast, real-time querying of streaming and batch data at any scale.
Dask is an open-source Python library that helps you run data analysis and machine learning tasks faster by scaling your existing Python tools.
RDF HDT offers a compact binary format and tools for storing, managing, and sharing RDF data efficiently. Find documentation, downloads, and tech resources.
Apache CouchDB is an open-source database that syncs data across devices, scales from big data to mobile, and offers an easy HTTP/JSON API for developers.
MonetDB is a high-performance database system designed for fast analytics and data management using standard SQL. Open source and easy to use.
Apache Cassandra is an open source NoSQL database that helps you manage massive amounts of data reliably and quickly across clouds and on-premises.
Apache Phoenix lets you run fast SQL queries and manage data directly on Apache HBase within the Hadoop ecosystem, making it easy to analyze and update big datasets in real time.
DuckDB is a fast, open source SQL database system for analyzing and transforming data directly on your device, with easy installation and rich features.
Apache Flink lets you process and analyze data streams in real time, offering scalable, stateful computations for data-driven applications.
OpenRefine lets you clean, transform, and organize messy data for free. Easily format, enrich, and prepare datasets using this open source tool.
Tidyverse offers a collection of R packages for data science, making data analysis, visualization, and manipulation in R simpler and more consistent.
Apache Pinot is an open source platform for real-time data analytics, letting you quickly analyze and visualize large datasets for instant insights.
Analyze life science data online with a collaborative platform designed for research and community-driven workflows in bioinformatics and genomics.
Apache Pig lets you analyze large data sets using a simple high-level language, making it easier to process and manage big data efficiently.
Apache Arrow offers a universal columnar data format and tools for fast, multi-language data analytics and seamless data interchange between systems.
Apache Zeppelin is a web-based notebook for interactive data analytics, letting you create collaborative documents using SQL, Scala, Python, R, and more.
Explore pandas, the open source Python library for fast, flexible data analysis and manipulation. Get started with guides, docs, and a helpful community.
Apache Spark is an open-source engine for large-scale data analytics, supporting data engineering, science, and machine learning in multiple languages.
Open-source tool for analyzing and visualizing data across sciences and engineering, supporting everything from large-scale simulations to desktop use.
Galaxy Europe is an open-source platform for accessible, FAIR data analysis with tools, resources, and a strong community for scientific collaboration.
ClickHouse is a fast, open-source database for real-time analytics and reporting using SQL, ideal for business intelligence, ML, and big data tasks.
StarRocks is an open-source database for fast, real-time analytics using SQL, designed to help businesses handle large-scale data easily and efficiently.
Datomic lets you build flexible, distributed systems that store and query all your data history, on your own infrastructure or in the cloud.
InfluxDB is a platform for managing and analyzing time series data, offering fast, flexible database solutions for cloud, on-premises, or edge environments.
ScyllaDB offers a fast, scalable NoSQL database for data-intensive apps, delivering high performance and low latency for businesses and developers.
Chroma is an open-source AI application database with built-in tools, offering a cloud platform for managing and deploying AI data and models.
Weaviate is an AI-native database platform that helps developers build smarter, faster search and data apps with advanced vector and keyword capabilities.
WEKA offers a high-performance data platform for storing, processing, and managing data across cloud and on-premises, powering AI and machine learning workloads.
CARTO lets you analyze, visualize, and build apps with spatial data on the cloud, making advanced location analytics easy for businesses and developers.
Virtuoso lets you connect, manage, and analyze data from multiple sources using open standards, with flexible AI-powered tools for individuals and businesses.
Atlas Device SDK offers tools for building offline-first, cloud-synced apps on mobile, web, desktop, and IoT, with easy data access and sync features.
MariaDB offers enterprise-grade open source database solutions and services for scalable, secure, and reliable data management in modern applications.
Neo4j is a graph database platform for connecting and analyzing complex data, enabling advanced queries, analytics, and AI-powered business solutions.
ObjectBox is a fast, lightweight database that keeps your data synced and secure on devices, even offline. Ideal for mobile, IoT, and edge AI projects.
Cloudera offers a secure hybrid data platform for managing, analyzing, and moving data across clouds and on-premises, with built-in AI and analytics tools.
Percona offers open source database software, support, and managed services for MySQL, PostgreSQL, MongoDB, and MariaDB to help you run databases smoothly.
MySQL offers a powerful cloud-based database platform with AI and analytics features for managing, analyzing, and deploying data-driven applications.
TiDB by PingCAP is an open-source, MySQL-compatible distributed SQL database that offers scalable, fully-managed cloud solutions for modern workloads.
ArcGIS Hub helps you organize people, data, and tools in one cloud platform to support initiatives, share insights, and achieve community goals.
Qdrant is an open-source vector database and search engine that helps you build fast, scalable AI-powered search and recommendation systems.