google-hadoop questions

Google Cloud provides connectors for working with Hadoop (https://cloud.google.com/hadoop/google-cloud-storage-connector). Using the connector, I transfer data from HDFS to Google Cloud Storage, e.g. ...
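A minimal Scala sketch of that kind of HDFS-to-GCS copy, assuming the gcs-connector is already registered in core-site.xml; the paths here (hdfs:///user/demo/input, gs://my-bucket/hadoop-input) are hypothetical placeholders:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

    object HdfsToGcsCopy {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()                      // picks up core-site.xml with gs:// registered
        val src  = new Path("hdfs:///user/demo/input")      // hypothetical source directory
        val dst  = new Path("gs://my-bucket/hadoop-input")  // hypothetical destination bucket/prefix
        val srcFs: FileSystem = src.getFileSystem(conf)
        val dstFs: FileSystem = dst.getFileSystem(conf)
        // Copy the directory tree; deleteSource = false, overwrite = true
        FileUtil.copy(srcFs, src, dstFs, dst, false, true, conf)
      }
    }

For large datasets, the same copy is usually done with hadoop distcp rather than a single-process FileUtil.copy.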

My issue may be a result of my misunderstanding of global consistency in Google Cloud Storage, but since I had not experienced this issue until just recently (mid-November) and now it seems easily ...

I am running a Spark job (version 1.2.0), and the input is a folder inside a Google Cloud Storage bucket (i.e. gs://mybucket/folder). When running the job locally on my Mac, I am getting the ...
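For context, reading a gs:// folder from Spark looks the same as reading any other Hadoop path once the connector is installed; a minimal sketch (Spark 1.x API, bucket name taken from the question, application name a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    object GcsInputDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("gcs-input-demo"))
        // gs:// paths resolve through the GCS connector on the driver and executor classpaths
        val lines = sc.textFile("gs://mybucket/folder")
        println(s"line count: ${lines.count()}")
        sc.stop()
      }
    }

The difference such questions usually come down to is environment rather than code: the local machine and the cluster need the same connector jar and credentials configured.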

I am working on a Scala Spark job which needs to use a Java library (youtube/vitess) that depends on newer versions of gRPC (1.01), Guava (19.0), and Protobuf (3.0.0) than are currently provided on ...
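One common workaround for this kind of Guava/Protobuf clash is to shade the conflicting packages into the application jar. A sketch of the relevant build.sbt fragment, assuming the sbt-assembly plugin is in use; the renamed package prefixes are arbitrary:

    // build.sbt fragment (sbt-assembly 0.14.x syntax, hypothetical shade targets)
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**"   -> "shaded.guava.@1").inAll,    // Guava 19.0
      ShadeRule.rename("com.google.protobuf.**" -> "shaded.protobuf.@1").inAll  // Protobuf 3.0.0
    )

With the classes relocated, the cluster's older Guava/Protobuf no longer shadows the versions the library needs.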

From my machine, I've configured the Hadoop core-site.xml to recognize the gs:// scheme and added gcs-connector-1.2.8.jar as a Hadoop lib. I can run hadoop fs -ls gs://mybucket/ and get the expected ...
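The same wiring can be checked programmatically. A small Scala sketch, assuming gcs-connector 1.x property names (fs.gs.impl, fs.gs.project.id) and a hypothetical project id:

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object GsSchemeCheck {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()   // also reads core-site.xml if it is on the classpath
        conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
        conf.set("fs.gs.project.id", "my-gcp-project")        // hypothetical project
        val fs = FileSystem.get(new URI("gs://mybucket/"), conf)
        fs.listStatus(new Path("gs://mybucket/")).foreach(s => println(s.getPath))
      }
    }

If hadoop fs -ls works but code submitted to the cluster does not, the usual culprit is that the connector jar or these properties are missing from the job's classpath/configuration rather than from the local shell's.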

I have written a Spark job on my local machine which reads a file from Google Cloud Storage using the Google Hadoop connector, like gs://storage.googleapis.com/, as mentioned in https://cloud.google.com/...

In normal operation one can provide encryption keys to the Google Cloud Storage API to encrypt a given bucket/blob: https://cloud.google.com/compute/docs/disks/customer-supplied-encryption Is this possible ...

We use the Google BigQuery Spark Connector to import data stored in Parquet files into BigQuery. Using custom tooling we generated a schema file needed by BigQuery and reference that in our import ...

I'm setting up a tiny cluster in GCE to play around with, but although the instances are created, some failures prevent it from working. I'm following the steps in https://cloud.google.com/hadoop/...

I'm getting started with running a Spark cluster on Google Compute Engine, backed by Google Cloud Storage, deployed with bdutil (on the GoogleCloudPlatform GitHub). I am doing this as follows: ....

When using the BigQuery connector to read data from BigQuery, I found that it first copies all data to Google Cloud Storage and then reads it into Spark in parallel, but when reading a big table it takes ...

I am trying to migrate existing data (JSON) in my Hadoop cluster to Google Cloud Storage. I have explored gsutil, and it seems that it is the recommended option to move big data sets to GCS. It seems ...

I have used the google-cloud-storage-connector for Hadoop and am able to run a MapReduce job that takes input from my local HDFS (Hadoop running on my local machine) and places the result in Google Cloud ...
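For reference, the job setup itself only differs in the output path scheme; a Scala sketch using the new MapReduce API, with hypothetical paths and the mapper/reducer setup elided:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    object HdfsInGcsOut {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "hdfs-in-gcs-out")
        // Input stays on the local HDFS, output lands in the bucket served by the GCS connector
        FileInputFormat.addInputPath(job, new Path("hdfs:///user/demo/input"))     // hypothetical path
        FileOutputFormat.setOutputPath(job, new Path("gs://my-bucket/job-output")) // hypothetical path
        // job.setMapperClass(...), job.setReducerClass(...) etc. as in the existing job
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }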

I'm using Hadoop with HDFS 2.7.1.2.4 and Pig 0.15.0.2.4 (Hortonworks HDP 2.4) and trying to use Google Cloud Storage Connector for Spark and Hadoop (bigdata-interop on GitHub). It works correctly when ...

I was wondering if anybody could help me with this issue in deploying a Spark cluster using the bdutil tool. When the total number of cores increases (>= 1024), it fails every time with the ...

I use the following Hive query: hive> INSERT OVERWRITE LOCAL DIRECTORY "gs://Google/Storage/Directory/Path/Name" row format delimited fields terminated by ',' select * from .; I am getting the ...

After I create a Google Cloud-based Hadoop-enabled cluster, I want to change the default bucket to a different one. How can I do that? I can't find the answer in the Google Cloud documentation. Thanks!

I have a reference to the GoogleHadoopFileSystemBase in my java code, and I’m trying to call setTimes(Path p, long mtime, long atime) to modify the timestamp of a file. It doesn’t seem to be working ...
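For reference, the call in question looks like this in Scala; whether the GCS connector actually honours setTimes (or silently ignores it) is exactly what the question is about, so the read-back at the end is the useful part:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object TouchGcsObject {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        val path = new Path("gs://mybucket/some/object")  // hypothetical object
        val fs: FileSystem = path.getFileSystem(conf)     // resolves to the GoogleHadoopFileSystem
        val now = System.currentTimeMillis()
        fs.setTimes(path, now, now)                       // mtime, atime in epoch millis
        // Read the status back to see whether the modification time actually changed
        println(fs.getFileStatus(path).getModificationTime)
      }
    }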

I have a large dataset stored in a BigQuery table and I would like to load it into a PySpark RDD for ETL data processing. I realized that BigQuery supports the Hadoop Input/Output format https://...
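The question mentions PySpark, but the Hadoop InputFormat route is easiest to show in Scala. A hedged sketch, assuming the BigQuery connector's GsonBigQueryInputFormat and BigQueryConfiguration classes are available and using hypothetical project/dataset/bucket names; the connector stages an export in the GCS bucket before Spark reads it:

    import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, GsonBigQueryInputFormat}
    import com.google.gson.JsonObject
    import org.apache.hadoop.io.LongWritable
    import org.apache.spark.{SparkConf, SparkContext}

    object BigQueryToRdd {
      def main(args: Array[String]): Unit = {
        val sc   = new SparkContext(new SparkConf().setAppName("bq-to-rdd"))
        val conf = sc.hadoopConfiguration
        conf.set(BigQueryConfiguration.PROJECT_ID_KEY, "my-gcp-project")  // hypothetical project
        conf.set(BigQueryConfiguration.GCS_BUCKET_KEY, "my-temp-bucket")  // staging bucket for the export
        BigQueryConfiguration.configureBigQueryInput(conf, "my-gcp-project:my_dataset.my_table")
        // Each record arrives as (row offset, JSON object)
        val rows = sc.newAPIHadoopRDD(conf, classOf[GsonBigQueryInputFormat],
          classOf[LongWritable], classOf[JsonObject])
        println(s"rows: ${rows.count()}")
        sc.stop()
      }
    }

In PySpark the same InputFormat can be reached through newAPIHadoopRDD with the class names passed as strings.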

I am trying to run a Hadoop job on Google Compute Engine against our compressed data, which is sitting on Google Cloud Storage. While trying to read the data through SequenceFileInputFormat, I get the ...

How can I use the Google Cloud Platform free trial to test a Hadoop cluster? What are the most important things I should keep in mind if I try this? Will I be charged during the free Google Cloud ...

I'm using Spark on a Google Compute Engine cluster with the Google Cloud Storage connector (instead of HDFS, as recommended), and get a lot of "rate limit" errors, as follows: java.io.IOException: ...

Is there a direct way to address the following error or overall a better way to use Hive to get the join that I need? Output to a stored table isn't a requirement as I can be content with an INSERT ...

I am using Google Compute Engine to run MapReduce jobs on Hadoop (pretty much all default configs). While running the job I get a tracking URL of the form http://PROJECT_NAME:8088/proxy/...

I have a Spark Cluster deployed using bdutil for Google Cloud. I installed a GUI on my driver instance to be able to run IntelliJ from it, so that I can try to run my Spark processes in interactive ...

With bdutil, the latest Spark tarball I can find is for Spark 1.3.1: gs://spark-dist/spark-1.3.1-bin-hadoop2.6.tgz There are a few new DataFrame features in Spark 1.4 that I want to use. Any ...

The original question was about trying to deploy Spark 1.4 on Google Cloud. After downloading it and setting SPARK_HADOOP2_TARBALL_URI='gs://my_bucket/my-images/spark-1.4.1-bin-hadoop2.6.tgz', deployment with ...

After I deleted a Google Cloud Storage directory through the Google Cloud Console (the directory had been generated by an earlier Spark (ver 1.3.1) job), re-running the job always fails, and it seemed the ...

I am running a Hadoop cluster on Google Cloud Platform, using Google Cloud Storage as the backend for persistent data. I am able to ssh to the master node from a remote machine and run hadoop fs commands....

I deployed Spark (1.3.1) with yarn-client on a Hadoop (2.6) cluster using bdutil. By default, the instances are created with ephemeral external IPs, and so far Spark works fine. With some security ...

I am trying to use a file from Google Cloud Storage via FileInputFormat as input for a MapReduce job. The file is in Avro format. As a simple test, I deployed a small Hadoop2 cluster with the bdutil ...

I have just installed the Google Cloud Platform free trial. In order to run MapReduce tasks with Datastore, the docs say to run ./bdutil --upload_files "samples/*" run_command ./test-mr-datastore....

With SparkR, I'm trying, for a PoC, to collect an RDD that I created from text files containing around 4M lines. My Spark cluster is running in Google Cloud, was deployed with bdutil, and is composed of ...

Is it possible to deploy several Hadoop clusters in one Google Cloud project?

I am using Google Cloud Storage with Hadoop 2.3.0 via the GCS connector. I have added GCS.jar to the lib directory of my Hadoop installation and added the path to the GCS connector in the hadoop-env.sh file as: export ...

We have a MapReduce job created to inject data into BigQuery. There is not much filtering in our job, so we'd like to make it a map-only job to make it faster and more efficient. However, ...
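For context, turning an existing job map-only is a one-line change; a sketch with the mapper and output-format setup elided and a hypothetical job name:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.Job

    object MapOnlyBigQueryLoad {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "map-only-bq-load")
        job.setJarByClass(MapOnlyBigQueryLoad.getClass)
        // Zero reducers makes the job map-only: mapper output goes straight to the output format
        job.setNumReduceTasks(0)
        // job.setMapperClass(...), job.setOutputFormatClass(...) etc. stay as in the existing job
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }

Whether the BigQuery output format behaves well without a reduce phase is the open part of the question, not something this sketch settles.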

I've been using bdutil for a year now, with Hadoop and Spark, and it has been quite perfect! Now I've got a little problem trying to get SparkR to work with Google Storage as HDFS. Here is my setup: - ...

I am starting a Google Compute Engine VM from an App Engine application. The start-up scripts for the GCE VM run python scripts which, in turn, make os.system calls to bdutil commands, e.g., os....

When I attempt to launch a Hadoop cluster with the bdutil command, using one of the following: bdutil -b a_hadoop_test -n 1 -P mycluster -e hadoop2_env.sh -i ubuntu-1204 deploy OR bdutil -b ...

I'm trying to install a custom Hadoop implementation (>2.0) on Google Compute Engine using the command line option. The modified parameters of my bdutil_env.sh file are as follows: GCE_IMAGE='ubuntu-...

Is it possible to connect my Hadoop cluster to multiple Google Cloud projects at once? I can easily use any Google Storage bucket in a single Google project via the Google Cloud Storage connector, as ...

I am trying to access a Google Storage bucket from a Hadoop cluster deployed in Google Cloud using the bdutil script. It fails if bucket access is read-only. What I am doing: deploy a cluster with ...

I am testing the scaling of some MapReduce jobs on Google Compute Engine's Hadoop cluster and finding some unexpected results. In short, I've been told this behavior may be explained by having a ...

We are using bdutil 1.1 to deploy a Spark (1.2.0) cluster. However, we are having an issue when we launch our spark script: py4j.protocol.Py4JJavaError: An error occurred while calling o70....

I'm running a standalone application using Apache Spark, and when I load all my data into an RDD as a text file I get the following error: 15/02/27 20:34:40 ERROR Utils: Uncaught exception in thread ...

I have been through most of the questions surrounding this issue on this site, however nothing seems to have helped me. Basically, what I am trying to do is instantiate a Hadoop instance on my VM via the ...

I have begun testing the Google Cloud Storage connector for Hadoop. I am finding it incredibly slow for Hive queries run against it. It seems a single client must scan the entire file system before ...

We are running Hadoop on GCE with HDFS as the default file system, and data input/output from/to GCS. Hadoop version: 1.2.1 Connector version: com.google.cloud.bigdataoss:gcs-connector:1.3.0-hadoop1 ...

I got connectors from https://cloud.google.com/hadoop/datastore-connector But I'm trying to add the datastore-connector (and bigquery-connector too) as a dependency in the pom... I don't know if it ...

I'm piping unstructured event data through Hadoop and want to land it in BigQuery. I have a schema that includes most of the fields, but there are some fields I want to ignore or don't know about. ...
