data-preprocessing's questions - English 1answer

174 data-preprocessing questions.

I have a dataset of raw data in the following format: ...

I did the following steps in my modeling using R: 1) applied the preProcess() function from the caret package and then encoded the data, 2) used SMOTE to balance the data, 3) applied multiple linear regression for ...

I want to perform a sentiment analysis with SentiWordNet on Tweets. The analysis should be on the document level, i.e. I want to classify each Tweet as positive, negative or neutral. However, the ...

I have a data set and need to use a few classification methods to make predictions. I first need to pre-process the data set. France is administratively divided into regions (13), and regions are ...

Is there a way to use single words, e.g. names, as input to a feed-forward neural network? It has to be a feed-forward NN, so I guess I have to implement some sort of pre-processing, like ...

I am now trying to evaluate my model with cross validation. My dataset contains some numeric and nominal attributes. Here, I carry out the following data preprocessing tasks: A. Normalization: Min-...

So I have a set of Tweets with a few columns such as Date and the Tweet itself, plus a few more, but I want to use 2 columns to build my model (Sentiment & Stock Price). Sentiment analysis is performed ...

Example of data set ...

I have a weather parameter (daily volume of inflow for a river in million cubic meters, MCM) time series data as follows: I want to scale this data and feed it to a ...

I have an unbalanced data set that I wish to balance for the purpose of training. Which is the better practice: first balancing the training data and then feature scaling/normalizing, or using the mean and ...

I have daily observations of call volume from 28-01-2017 to 31-08-2018, a little over one and a half years. Call volumes are lowest on Sundays and highest on Mondays, showing a weekly pattern. ...

I am a newbie to Convolutional Neural Networks. For some dataset I trained my ConvNet model and achieved some accuracy. Then I added some filters and one or two ConvNet layers and saw some ...

I have $4$ variables in the data-set, each with more than $50$ levels. I want to include all these variables in my predictive model. How should I handle these categorical variables? If I do ...

When I am given a variable, I usually decide whether to take its logarithm based on gut feeling. Usually I base it on its distribution: if it has a long tail (like salaries, GDP, ...) I use logarithms. ...
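
One way to replace gut feeling with a number is to compare the skewness before and after the transform. The following is only a minimal sketch: the `skewness` helper and the `salaries` sample are made up for illustration and are not taken from the question.

```python
import math

def skewness(values):
    """Population skewness (third standardized moment) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5

# Hypothetical long-tailed sample (made-up numbers, strictly positive).
salaries = [20_000, 22_000, 25_000, 30_000, 35_000, 40_000, 60_000, 250_000]
log_salaries = [math.log(s) for s in salaries]

raw_skew = skewness(salaries)
log_skew = skewness(log_salaries)
print(f"skewness raw: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

A clearly smaller skewness after the transform suggests the logarithm is doing useful work. Note that `math.log` requires strictly positive values, so zeros or negatives need separate handling.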

Even though my question is applicable to all kinds of models, I am asking it in the scope of SVM for now. Assume I have 3 sentences in my training set and 2 sentences in my test set. I would like to ...

I have a dataset with over 2500 product IDs. For each ID I need to remove outliers. I removed them using the condition that everything that lies beyond 1.5 * IQR should be deleted, but it seems that ...
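
For reference, a per-ID version of the 1.5 * IQR rule might look like the sketch below. It assumes the usual Tukey fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR), which may not be exactly what the asker implemented. The function name `remove_outliers_per_id` and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

def remove_outliers_per_id(rows, k=1.5):
    """Keep (id, value) rows whose value lies inside the Tukey fences of its own id."""
    by_id = defaultdict(list)
    for pid, value in rows:
        by_id[pid].append(value)

    fences = {}
    for pid, values in by_id.items():
        if len(values) < 2:
            # Too few points to estimate quartiles: keep everything for this id.
            fences[pid] = (float("-inf"), float("inf"))
            continue
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        fences[pid] = (q1 - k * iqr, q3 + k * iqr)

    return [(pid, v) for pid, v in rows if fences[pid][0] <= v <= fences[pid][1]]

# Made-up example: product "A" has one extreme value, product "B" has none.
rows = [("A", 10), ("A", 11), ("A", 12), ("A", 13), ("A", 100),
        ("B", 200), ("B", 210), ("B", 190), ("B", 205)]
cleaned = remove_outliers_per_id(rows)
print(cleaned)
```

Computing the fences separately for each ID matters here: a value that is extreme for one product can be perfectly normal for another.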

I am used to applying supervised machine learning to at most a few dozen "normal", natural features, like the human-interpretable ones in the Boston House Prices table. I usually try to understand each of them, think ...

I was watching the video from this Stanford course on convolutional neural nets where the professor says (at 28:59) 'we do zero-mean the pixel values in image but we do not normalize the pixel values ...

I have precipitation data which is very sharp in some places and, in my case, not predictable by the LSTM predictive model which I'm currently working on. Here is ...

A quick statistics question. Consider a conveyor belt inside a black box. A test object is pulled through the black box at different (known) speeds whilst gathering sensor data (temperature ...

I am given an $n \times m$ matrix $\mathbf R$ where $n$ is the number of users and $m$ is the number of items. This matrix is usually known as the rating matrix within the recommender systems domain. ...

I am wondering what preprocessing techniques work well for autoencoding audio data? Specifically I have a dataset of ~0.5 second audio samples of people pronouncing digits 0-9 (think an audio version ...

I understand that in standard data cleaning and pre-processing pipelines, we have to make sure that the information from the test set (or what would be the test set after splitting) does not leak into ...
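
The no-leakage rule described above can be sketched as follows, assuming a plain z-score standardization. The helper names `fit_standardizer` and `transform` and the toy numbers are illustrative only.

```python
from statistics import mean, pstdev

def fit_standardizer(train):
    """Estimate mean and standard deviation from the training split only."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # guard against a constant feature
    return mu, sigma

def transform(values, mu, sigma):
    """Apply statistics that were fitted elsewhere; never refit on test data."""
    return [(v - mu) / sigma for v in values]

# Made-up single feature, already split into train and test.
train = [2.0, 4.0, 6.0, 8.0]
test = [10.0, 12.0]

mu, sigma = fit_standardizer(train)          # statistics come from train only
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)     # test reuses the train statistics
print(train_scaled, test_scaled)
```

The test split is transformed with the training mean and standard deviation, so its scaled values need not have zero mean or unit variance. That is expected, not a bug.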

I'm working on a project where I have a dataset for a dynamical system (pendulum) containing a trajectory, energy cost and corresponding control actions (See below). I'm using a generative adversarial ...

I have searched for this question, but no answer was really comprehensive. To my knowledge, linear regression and most clustering algorithms naturally have the assumption that data need to be in ...

I have been reading about the possible tricks to use for improving the performance of Gradient Descent in neural networks. I understand that there are two common methods: 1. Feature Scaling: Scale ...

In this Machine Learning Mastery post we read that in order to predict time series using an LSTM network, it is good to make the data stationary first and then scale it to the interval $(-1, 1)$. In ...
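
The two steps mentioned above can be sketched like this, assuming first-order differencing as the stationarity step and min-max scaling as the rescaling step. This is not necessarily the exact recipe from the post, and all function names and numbers are made up. Min-max scaling maps the extremes onto the endpoints, so the result lies in the closed interval $[-1, 1]$.

```python
def difference(series):
    """First-order differencing: removes a level/trend, does not guarantee stationarity."""
    return [b - a for a, b in zip(series, series[1:])]

def scale(values, lo, hi):
    """Map values linearly so that lo -> -1 and hi -> 1."""
    span = (hi - lo) or 1.0
    return [2 * (v - lo) / span - 1 for v in values]

def unscale(values, lo, hi):
    """Inverse of scale()."""
    span = (hi - lo) or 1.0
    return [(v + 1) * span / 2 + lo for v in values]

# Made-up trending series.
series = [10.0, 12.0, 15.0, 14.0, 18.0, 23.0]

diffs = difference(series)
lo, hi = min(diffs), max(diffs)   # in practice, fit these on the training part only
scaled = scale(diffs, lo, hi)

# Invert both steps to get back to the original scale (needed for forecasts).
restored = [series[0]]
for d in unscale(scaled, lo, hi):
    restored.append(restored[-1] + d)
print(scaled, restored)
```

Both transforms have to be inverted, in reverse order, before model outputs can be compared with the original series.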

My data was shuffled randomly first then I applied a five crop data augmentation. Now my batch went from [8, 3, 256, 256] to ...

I have to write a system that has several data processing steps, each varying in time complexity. I intend to make use of buffer queues and multithreading for each preprocessing step. I have limited ...

I am training a ConvNet to detect different types of stripes in my images. As I am working on astronomical images, my pixel values are flux densities and therefore represent ground truth data. When I ...

It might be a beginner question, but I'm not sure how to normalize my data. Let's suppose I have an NxM matrix with N samples of M dimensions each. If I want to normalize my data I can do it in two ...

So I'm trying to build a neural network to fit an approximating function to a data set. The data set consists of (input, output) pairs where the input features are discrete numbers [0,1,2,...,1027]. ...

I am fairly new to machine learning and especially the math (stats) behind it. I have a question, asked here. I have not only tried the methods suggested in that question, but also with the Orange GUI ...

I have got hourly temperature data from 2012 to 2016 as follows: ...

I am wondering if it is advisable to preprocess the target for survival analysis or regression. Let's say the target is a time-to-event, for example the duration in days for process tasks. I plot a ...

I have a time series of integer values $x \geqslant 0$. I would like to model it using, say, ARIMA, or Holt-Winters. How do I properly preprocess it for the task? I tried log-transform of $x' = x + ...

I have a dataset:

X    Y
123  321
42   24
10   01

As you can see, Y is just the reverse of the input number, nothing special. The issue ...

So I'm having a tough time trying to understand vectors, coming from a CS background. I am reading some books and reading through websites, but it is still a bit too low-level for me. I need a bit more ...

I am trying to perform binary prediction in a problem where the measurements come from objects that are spatially ordered as a grid and there is a physical meaning to the neighborhood on the grid. I ...

I have attached the boxplot of a variable called Fare (of a journey). This is a continuous variable which has outliers. According to some articles on outliers, I learned that any data point that is ...

I am working with a large amount of firm data where every variable is highly skewed as there are a large number of extremely small firms and a small number of huge ones. I am interested in defining ...

Is it possible for the accuracy of Naive Bayes to remain the same even after applying standardisation? I have applied 2 standardisation techniques: Min-Max Scaling (which squishes the range to 0-1 ...
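
A small experiment can make this concrete. The sketch below assumes a Gaussian Naive Bayes without variance smoothing and a per-feature min-max rescaling. The question does not say which variant was used, so treat this as an illustration rather than an answer. Under these assumptions each feature is rescaled by a positive constant, which shifts every class's log-likelihood by the same amount and leaves the predicted class unchanged. All names and data are made up.

```python
import math

def min_max_scale(column):
    """Rescale one feature column linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in column]

def fit_gaussian_nb(X, y):
    """Return {label: (log_prior, [(mean, var), ...])}; no variance smoothing."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for col in zip(*rows):
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col)
            stats.append((mu, var))
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model

def predict(model, x):
    """Pick the label with the highest log posterior for one sample."""
    def log_posterior(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            for v, (mu, var) in zip(x, stats)
        )
    return max(model, key=log_posterior)

# Made-up two-feature data with very different feature ranges.
X = [[1.0, 200.0], [2.0, 220.0], [1.5, 210.0],
     [6.0, 900.0], [7.0, 950.0], [6.5, 1000.0]]
y = [0, 0, 0, 1, 1, 1]

scaled_columns = [min_max_scale(list(col)) for col in zip(*X)]
X_scaled = [list(row) for row in zip(*scaled_columns)]

raw_pred = [predict(fit_gaussian_nb(X, y), x) for x in X]
scaled_pred = [predict(fit_gaussian_nb(X_scaled, y), x) for x in X_scaled]
print(raw_pred == scaled_pred)
```

Library implementations that add a smoothing term to the variances may deviate slightly from this idealized behaviour.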

I am working on classifying a high-dimensional binary sparse dataset. Is it a good idea to measure the distance (via Hamming, Jaccard, etc.) of every negative class observation to every positive one ...

I've studied many questions and answers on the theme of nested cross-validation. I understand why we need it and how I can, after that part, find the optimal hyperparameters and any other things I'm ...

I want to use the baselines code from OpenAI to apply to a power trading setting where I trade energy on a market. My observation space includes several kinds of data, which is why I originally used ...

I have the dataset from here which contains the following features: 'Index', 'Arrival_Time', 'Creation_Time', 'x', 'y', 'z', 'User', 'Model', 'Device', 'gt'. A ...

I want to impute missing values of a dataset for machine learning (knn imputation). Is it better to scale and center the data before the imputation or afterwards? Since the scaling and centering ...

I am training a neural network for audio classification. My inputs are "1-channel images" of size 60x130x1. Surprisingly, I always get better accuracy when training the model with the original data, ...

I have raw data that has about 20 columns (20 features). Ten of them are continuous data and 10 of them are categorical. Some of the categorical data can have like 50 different values (U.S. States). ...

Bottom line up front: is there any reason not to center and scale continuous variables prior to model fitting for the sake of conducting model comparison? I'm conducting a model comparison on a large ...
