Search results: "topic:hadoop-hdfs"

335 results for “topic:hadoop-hdfs”

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, scaling to billions of files. The blob store has O(1) disk seek and cloud tiering. The Filer supports Cloud Drive, cross-data-center (xDC) replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, and Erasure Coding. An Enterprise version is available at seaweedfs.com.

Go · 31.0k stars · 2.8k forks · Updated just now

Topics: blob-storage, cloud-drive, distributed-file-system, distributed-storage, distributed-systems, erasure-coding, fuse, hadoop-hdfs, hdfs, kubernetes, object-storage, posix, replication, s3, s3-storage, seaweedfs, tiered-file-system

OBenner/data-engineering-interview-questions

More than 2,000 data engineer interview questions.

Python · 1.5k stars · 521 forks · Updated 2 months ago

Topics: airflow, avro, aws, azure, cassandra, data-engineering, data-structures, flink, flume, hadoop, hadoop-hdfs, hbase, hive, impala, interview, interview-questions, kafka, nifi, spark, sql

Morphl-AI/MorphL-Community-Edition

MorphL Community Edition uses big data and machine learning to predict user behavior in digital products and services, with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization.

Python · 259 stars · 29 forks · Updated 6 years ago

Topics: artificial-intelligence, cassandra, conversion-rate-optimization, data-driven-design, front-end-development, hadoop-hdfs, kubernetes, machine-learning, morphl-platform, pipeline, product-development, pyspark, user-experience

linkedin/dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Java · 134 stars · 33 forks · Updated 2 years ago

Topics: hadoop, hadoop-filesystem, hadoop-framework, hadoop-hdfs, hdfs, hdfs-dfs, performance-analysis, performance-metrics, performance-test, performance-testing, scale, scale-up, testing, testing-tools

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

Python · 123 stars · 30 forks · Updated 2 years ago

Topics: data, data-engineer, data-engineering, data-engineering-pipeline, docker, docker-compose, hadoop, hadoop-filesystem, hadoop-hdfs, hdfs, hdfs-client, hdfs-dfs, kafka, kafka-consumer, kafka-producer, kafka-ui, kafkaui, pipline, python, python-hdfs-client

groda/big_data

Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks; most are self-contained and live, ready to run with a click.

Jupyter Notebook · 85 stars · 26 forks · Updated 11 hours ago

Topics: apache-sedona, apache-spark, big-data, bigdata, bigtop, docker, gutenberg-ebooks, hadoop, hadoop-cluster, hadoop-hdfs, hadoop-mapreduce, jupyter-notebook, mapreduce, mapreduce-bash, mrjob, pyspark, spark, spark-sql, testdfsio

IBM/sparksql-for-hbase

Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers.

69 stars · 22 forks · Updated 6 months ago

Topics: apache-spark, hadoop-hdfs, hbase, ibmcode, nosql, sparksql

vim89/datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, including SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Python · 56 stars · 41 forks · Updated 2 years ago

Topics: apache-spark, big-data, data-pipeline, datalake, etl, etl-components, etl-framework, etl-pipeline, hadoop, hadoop-hdfs, hadoop-mapreduce, pyspark, python, python3, spark, spark-sql, xml, xml-parsing

maniram-yadav/Big_DataHadoop_Projects

Big data projects implemented by Maniram Yadav.

PigLatin · 50 stars · 35 forks · Updated 7 years ago

Topics: big-data-analytics, big-data-projects, flume, hadoop, hadoop-hdfs, hadoop-mapreduce, hdfs, hive, mapreduce, pig, pig-latin, spark, sqoop

jarlor/TravelWebsite_BigDataAnalysis (Archived)

Big data analysis of a travel website (partial Ctrip data): a Hadoop course design project (undergraduate coursework level)

Java · 37 stars · 1 fork · Updated 2 years ago

Topics: bigdata, coursework, hadoop-hdfs, java, mapreduce

Smart-Shaped/chaM3Leon

By Smart Shaped s.r.l. (https://www.smartshaped.com/)

Java · 30 stars · 2 forks · Updated 4 months ago

Topics: apache-kafka, apache-sedona, apache-spark, big-data, data-engineering, data-pipeline, data-preprocessing, data-processing, hadoop-hdfs, java, lambda-architecture, machine-learning, maven, mlops, spring-boot

hokstack/hok-helm

HokStack - Run Hadoop Stack on Kubernetes

Shell · 25 stars · 9 forks · Updated 5 years ago

Topics: automation, bigdata, dataops, devops-tools, hadoop, hadoop-cluster, hadoop-hdfs, hdp, kubernetes, operator

SepehrImanian/ansible-hadoop-hdfs

Ansible playbook for setting up Hadoop HDFS

Jinja · 23 stars · 2 forks · Updated 3 years ago

Topics: ansible, ansible-playbook, hadoop, hadoop-hdfs, hdfs

hadoop-sandbox/hadoop-sandbox

A fully functional Hadoop YARN cluster as a docker-compose deployment.

Shell · 23 stars · 5 forks · Updated 2 weeks ago

Topics: docker, docker-compose, hadoop, hadoop-cluster, hadoop-hdfs, hadoop-yarn

torqbit/databox (Archived)

Open source data infrastructure platform. Designed for developers, built for speed.

TypeScript · 22 stars · 4 forks · Updated 3 years ago

Topics: data-ops, hadoop, hadoop-hdfs, kafka, spark

ds2-lab/LambdaFS

λFS: an elastic, high-performance, serverless-function-based metadata service for large-scale distributed file systems (ACM ASPLOS'23)

Java · 14 stars · 2 forks · Updated 11 months ago

Topics: dfs, distributed-file-system, faas, filesystem, hadoop-hdfs, metadata, nuclio, openwhisk, serverless, serverless-computing

lucas91batista/twitter-hashtag-graph

Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python

JavaScript · 14 stars · 0 forks · Updated 3 years ago

Topics: apache-flume, hadoop, hadoop-hdfs, hadoop-mapreduce, neo4j, twitter

PChou/marayarn

Marathon on YARN

Java · 14 stars · 7 forks · Updated 2 years ago

Topics: hadoop-hdfs, marathon, yarn

waltherg/distributable_docker_sql_on_hadoop

Toy Hadoop cluster combining various SQL-on-Hadoop variants

Shell · 13 stars · 4 forks · Updated 8 years ago

Topics: hadoop, hadoop-cluster, hadoop-docker, hadoop-filesystem, hadoop-framework, hadoop-hdfs, hadoop-mapreduce, hbase, hbase-client, hive, hue, impala, presto, spark, sparksql, tez, yarn, yarn-hadoop-cluster, zookeeper, zookeeper-deployment

alagrede/HdfsClient

A Java HDFS client example and a full Kerberos example for calling Hadoop commands directly from Java code or on your local machine.

Java · 13 stars · 10 forks · Updated 8 years ago

Topics: hadoop, hadoop-hdfs, kerberos, kerberos-authentication

Mahmoud-nfz/football-big-data

This is a comprehensive solution for real-time football analytics, leveraging Apache Spark running on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.

TypeScript · 11 stars · 3 forks · Updated 5 months ago

Topics: hadoop, hadoop-hdfs, kafka, nextjs, rethinkdb, search-engine, spark, spark-streaming, t3-stack

Areesha-Tahir/Hadoop-MapReduce-Sentiment-Analysis-Through-Keywords

A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.

Java · 10 stars · 0 forks · Updated 4 years ago

Topics: code, hadoop, hadoop-hdfs, hadoop-mapreduce, java, mapreduce, parallel-computing, parallel-programming, project, sentiment-analysis, ubuntu

mgarralda/hadoop-spark-cluster

Repository containing Docker images for creating a Spark cluster on Hadoop YARN.

Jupyter Notebook · 9 stars · 3 forks · Updated 5 months ago

Topics: hadoop-hdfs, spark, spark-cluster, spark-hadoop, spark-hadoop-docker, spark-yarn-docker

Ren294/Log-Analysis-Project

This project builds a scalable log analytics pipeline that uses the Lambda architecture for real-time and batch processing of NASA server logs.

Python · 9 stars · 3 forks · Updated 1 year ago

Topics: apache-kafka, apache-nifi, apache-spark, big-data, big-data-analytics, cassandra, cassandra-driver, data-engineering, data-science, grafana, hadoop, hadoop-hdfs, hive, powerbi, spark-rdd, spark-sql, spark-streaming

Amir2244/movies-rating

"movies-rating" is a recommendation system project that leverages distributed frameworks, including services such as Hadoop NameNode, Hadoop DataNode, Spark Master, Spark Worker, and Redis.

Java · 9 stars · 0 forks · Updated 7 months ago

Topics: batch-processing, big-data, data-analysis, database, distributed-computing, flink-stream-processing, hadoop-hdfs, mongodb, real-time, spark, vector-database, web

leibniz21c/mammoth

Mammoth is a container-based log analyzer for Hadoop distributed systems. Sponsored by Mantech and Naver Cloud Platform.

Dart · 8 stars · 5 forks · Updated 4 years ago

Topics: adobe-xd, dart, docker, docker-compose, fastapi, flutter-app, hadoop-hdfs, influxdb, log-analyzer, mapreduce, mongodb, msa, python3, yarn

jodth07/hadoop-installation

Instructions on setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04

Shell · 8 stars · 14 forks · Updated 4 years ago

Topics: flume, hadoop, hadoop-ecosystem, hadoop-hdfs, hadoop-installation, installation, kafka, kafka-installation, sbt, sbt-installation, scala, scala-installation, spark, spark-installation

hadoop-sandbox/hadoop-sandbox-images

Docker image builds for Hadoop sandbox.