269 results for “topic:sparksql”
Compile-time Language Integrated Queries for Scala
Geo Spatial Data Analytics on Spark
Real Time Analytics and Data Pipelines based on Spark Streaming
Process Common Crawl data with Python and Spark
Scala examples for learning to use Spark
An ad hoc query service based on the Spark SQL engine.
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with the creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Sparkline BI Accelerator provides fast ad hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Geospatial Raster support for Spark DataFrames
Quill for Scala 3
A prototype big data platform project: the source code for the book Big Data Platform Architecture and Prototype.
Spring-Shiro-Spark is an attempt at integrating Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs... ...
PySpark functions and utilities with examples. Assists the ETL process of data modeling.
A JupyterLab extension providing a SQL formatter, auto-completion and syntax highlighting for Spark SQL and Trino.
Read SparkSQL parquet file as RDD[Protobuf]
Analyzing the safety (311) dataset published by Azure Open Datasets for Chicago, Boston and New York City using SparkR, SparkSQL and Azure Databricks, with visualization using ggplot2 and leaflet. The focus is on descriptive analytics, visualization, clustering, time series forecasting and anomaly detection.
Type-class-based data cleansing library for Apache Spark SQL
New generation opensource data stack
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
ACL Management for Apache Spark SQL with Apache Ranger (already merged into apache/incubator-kyuubi).
Google Spreadsheets datasource for SparkSQL and DataFrames
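A complete set of introductory big data tutorials, starting from the most basic centos and maven. The big data material mainly covers hdfs, mr, yarn, hbase, kafka, scala, sparkcore, sparkstreaming and sparksql. The tutorials include all source code demos along with online documentation.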
This repository contains Spark, MLlib, PySpark and DataFrames projects
Deriving Spark DataFrame schemas from case classes
Demo applications that show how to take offline feature engineering solutions online in one minute with fedb and nativespark
PostgreSQL and GreenPlum Data Source for Apache Spark
Hive-JDBC-Proxy is a high-performance proxy service for HiveServer2 and Spark ThriftServer. It supports load balancing and rule-based forwarding of Hive JDBC Client requests to HiveServer2 and Spark ThriftServer.
A cloud-based SQL engine using Spark, where data is accessible as a JDBC/ODBC data source via Spark ThriftServer.
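Spark 2.x hands-on examples: code samples in Scala and in Java 1.8 lambda style. Covers the core Spark technologies SparkCore, SparkSql and SparkStreaming. Also covers advanced Spark performance tuning, serialization, broadcast variables, operator optimization, JVM tuning, troubleshooting and data skew solutions. Compiled from years of accumulated work experience!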
Notes on research experience with Spark and Flink