335 results for “topic:hadoop-hdfs”
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, xDC replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding. Enterprise version is at seaweedfs.com.
More than 2,000 data engineer interview questions.
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Data Engineering Project with Hadoop HDFS and Kafka
Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.
Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Big data projects implemented by Maniram yadav
Big data analysis of a travel website (partial Ctrip data): Hadoop course project (undergraduate coursework level)
By Smart Shaped s.r.l. (https://www.smartshaped.com/)
HokStack - Run Hadoop Stack on Kubernetes
Ansible Playbook For Setup Hadoop HDFS
A fully functional Hadoop YARN cluster as a docker-compose deployment.
Open source data infrastructure platform. Designed for developers, built for speed.
λFS: an elastic, high-performance, serverless-function-based metadata service for large-scale distributed file systems (ACM ASPLOS'23)
Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python
Marathon on yarn
Toy Hadoop cluster combining various SQL-on-Hadoop variants
A Java HDFS client example and a full Kerberos example for calling Hadoop commands directly from Java code or on your local machine.
This is a comprehensive solution for real-time football analytics, leveraging Apache Spark execution on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.
A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
"movies-rating" is a recommendation system project that leverages distributed frameworks, including services such as Hadoop NameNode, Hadoop DataNode, Spark Master, Spark Worker, and Redis.
Mammoth is a container-based Hadoop distributed system log analyzer. Sponsored by Mantech and Naver Cloud Platform.
Instructions on setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04
Docker image builds for Hadoop sandbox.
Installation and configuration of Hadoop on Google Colaboratory
Machine Learning for Forest Fire Prediction using the Hadoop ecosystem and Spark tools (PySpark)