Apache Phoenix Joins Cloudera Labs
We are happy to announce the inclusion of Apache Phoenix in Cloudera Labs. Apache Phoenix is an efficient SQL skin for Apache HBase that has created a lot of buzz. Many companies are successfully using...
How-to: Get Started with CDH on OpenStack with Sahara
The recent OpenStack Kilo release adds many features to the Sahara project, which provides a simple means of provisioning an Apache Hadoop (or Spark) cluster on top of OpenStack. This how-to, from...
Deploying Apache Kafka: A Practical FAQ
This post contains answers to common questions about deploying and configuring Apache Kafka as part of a Cloudera-powered enterprise data hub. Cloudera added support for Apache Kafka, the open standard...
What’s New in Cloudera Director 1.5?
Cloudera Director 1.5 is now available; this post describes what’s inside, including a new open source plugin interface. Cloudera Director is the manifestation of Cloudera’s commitment to providing a...
Using Apache Spark for Massively Parallel NLP at TripAdvisor
Thanks to Jeff Palmucci, Director of Machine Learning at TripAdvisor, for permission to republish the following (originally appeared in TripAdvisor’s Engineering/Operations blog). Here at TripAdvisor...
YCSB, the Open Standard for NoSQL Benchmarking, Joins Cloudera Labs
YCSB, the open standard for comparative performance evaluation of data stores, is now available to CDH users for their Apache HBase deployments via new packages from Cloudera Labs. Many factors go into...
Untangling Apache Hadoop YARN, Part 1
In this multipart series, fully explore the tangled ball of thread that is YARN. YARN (Yet Another Resource Negotiator) is the resource management layer for the Apache Hadoop ecosystem. YARN has been...
Meet Cloudera’s Apache Spark Committers
The super-active Apache Spark community is exerting a strong gravitational pull within the Apache Hadoop ecosystem. I recently had the opportunity to ask Cloudera’s Apache Spark committers (Sean Owen,...
How-to: Prepare Unstructured Data in Impala for Analysis
Learn how to build an Impala table around data that comes from non-Impala, or even non-SQL, sources. As data pipelines start to include more aspects such as NoSQL or loosely specified schemas, you...
How-to: Use Apache Solr to Query Indexed Data for Analytics
Bet you didn’t know this: In some cases, Solr offers lightning-fast response times for business-style queries. If you were to ask well-informed technical people about use cases for Solr, the most...
New in Cloudera Enterprise 5.5: Analytics for Metadata Management
Starting in Cloudera Enterprise 5.5, Cloudera Navigator offers interactive visual analytics that help answer important questions about the data that’s in your CDH clusters. The new analytics system in...
Progress Report: Hive-on-Spark Nears Production Readiness
Contributors from Intel, Cloudera, and the rest of the community have been making strong progress on the Hive-on-Spark initiative. This post provides an update. Since its inception about one year ago,...
Apache Phoenix Joins Cloudera Labs
We are happy to announce the inclusion of Apache Phoenix in Cloudera Labs. [Update: A new package for Apache Phoenix 4.5.2 on CDH 5.4.x was released on Nov. 19, 2015.] Apache Phoenix is an efficient...
The New Hadoop Application Architectures Book is Here!
There’s an important new addition coming to the Apache Hadoop book ecosystem. It’s now in early release! We are very happy to announce that the new Apache Hadoop book we have been writing for O’Reilly...
New in CDH 5.1: Document-level Security for Cloudera Search
Cloudera Search now supports fine-grain access control via document-level security provided by Apache Sentry. In my previous blog post, you learned about index-level security in Apache Sentry...
Running CDH 5 on GlusterFS 3.3
The following post was written by Jay Vyas (@jayunit100) and originally published in the Gluster.org Community. I have recently spent some time getting Cloudera’s CDH 5 distribution of Apache Hadoop to...
Apache Kafka for Beginners
When used in the right way and for the right use case, Kafka has unique attributes that make it a highly attractive option for data integration. Apache Kafka is creating a lot of buzz these days. While...
Secrets of Cloudera Support: Using OpenStack to Shorten Time-to-Resolution
Automating the creation of short-lived clusters for testing purposes frees our support engineers to spend more time on customer issues. The first step for any support engineer is often to replicate the...
NoSQL in a Hadoop World
The number of powerful data query tools in the Apache Hadoop ecosystem can be confusing, but understanding a few simple things about your needs usually makes the choice easy. Ah, the good old days. I...