问题
I have a long list of variables in a dataset which contains multiple time
channels with different sampling rates, such as time_1
, time_2
, TIME
, Time
, etc. There are also multiple other variables that are dependent on either of these times.
I'd like to list all possible channels that contain 'time' (case-insensitive partial string search within Workspace) and search & match which variable belongs to each item of this time
list, based on the size of the variables and then group them in a structure with the values of the variables for later analysis.
For example:
Name Size Bytes Class
ENGSPD_1 181289x1 1450312 double
Eng_Spd 12500x1 100000 double
Speed 41273x1 330184 double
TIME 41273x1 330184 double
Time 12500x1 100000 double
engine_speed_2 1406x1 11248 double
time_1 181289x1 1450312 double
time_2 1406x1 11248 double
In this case, I have 4 time channels with different names & sizes and 4 speed channels which belong to each of these time channels.
whos
function is case-sensitive and it will only return the name of the variable, rather than the values of the variable.
回答1:
As a preamble I'm going to echo my comment from above and earlier comments from folks here and on your other similar questions:
Please stop trying to manipulate your data this way.
It may have made sense at the beginning but, given the questions you've asked on SO to date, this isn't the first time you've encountered issues trying to pull everything together and if you continue this way it's not going to be the last. This approach is highly error prone, unreliable, and unpredictable. Every step of the process requires you to make assumptions about your data that cannot be guaranteed (size of data matching, variables being present and named predictably, etc.). Rather than trying to come up with creative ways to hack together the data, start over and output your data predictably from the beginning. It may take some time but I guarantee it's going to save time in the future and it will make sense to whoever looks at this in 6 months trying to figure out what is going on.
For example, there is absolutely no significant effort needed to output your variables as:
outputstructure.EngineID.time = sometimeseries;
outputstructure.EngineID.speed = somedata;
Where EngineID can be any valid variable name. This is simple and it links your data together permanently and robustly.
That being said, the following will bring a marginal amount of sanity to your data set:
% Build up a totally amorphous data set
ENGSPD_1 = rand(10, 1);
Eng_Spd = rand(20, 1);
Speed = rand(30, 1);
TIME = rand(30, 1);
Time = rand(20, 1);
engine_speed_2 = rand(5, 1);
time_1 = rand(10, 1);
time_2 = rand(5, 1);
% Identify time and speed variable using regular expressions
% Assumes time variables contain 'time' (case insensitive)
% Assumes speed variables contain 'spd', 'sped', or 'speed' (case insensitive)
timevars = whos('-regexp', '[T|t][I|i][M|m][E|e]');
speedvars = whos('-regexp', '[S|s][P|p][E|e]{0,2}[D|d]');
% Pair timeseries and data arrays together. Data is only coupled if
% the number of rows in the timeseries is exactly the same as the
% number of rows in the data array.
timesizes = vertcat(speedvars(:).size); % Concatenate timeseries sizes
speedsizes = vertcat(timevars(:).size); % Concatenate speed array sizes
% Find intersection and their locations in the structures returned by whos
% By using intersect we only get the data that is matched
[sizes, timeidx, speedidx] = intersect(timesizes(:,1), speedsizes(:,1));
% Preallocate structure
ndata = length(sizes);
groupeddata(ndata).time = [];
groupeddata(ndata).speed = [];
% Unavoidable (without saving/loading data) eval loop :|
for ii = 1:ndata
groupeddata(ii).time = eval('timevars(timeidx(ii)).name');
groupeddata(ii).speed = eval('speedvars(speedidx(ii)).name');
end
A non-eval
method, by request:
ENGSPD_1 = rand(10, 1);
Eng_Spd = rand(20, 1);
Speed = rand(30, 1);
TIME = rand(30, 1);
Time = rand(20, 1);
engine_speed_2 = rand(5, 1);
time_1 = rand(10, 1);
time_2 = rand(5, 1);
save('tmp.mat')
oldworkspace = load('tmp.mat');
varnames = fieldnames(oldworkspace);
timevars = regexpi(varnames, '.*time.*', 'match', 'once');
timevars(cellfun('isempty', timevars)) = [];
speedvars = regexpi(varnames, '.*spe{0,2}d.*', 'match', 'once');
speedvars(cellfun('isempty', speedvars)) = [];
timesizes = zeros(length(timevars), 2);
for ii = 1:length(timevars)
timesizes(ii, :) = size(oldworkspace.(timevars{ii}));
end
speedsizes = zeros(length(speedvars), 2);
for ii = 1:length(speedvars)
speedsizes(ii, :) = size(oldworkspace.(speedvars{ii}));
end
[sizes, timeidx, speedidx] = intersect(timesizes(:,1), speedsizes(:,1));
ndata = length(sizes);
groupeddata(ndata).time = [];
groupeddata(ndata).speed = [];
for ii = 1:ndata
groupeddata(ii).time = oldworkspace.(timevars{timeidx(ii)});
groupeddata(ii).speed = oldworkspace.(speedvars{speedidx(ii)});
end
See this gist for timing.
来源:https://stackoverflow.com/questions/40245465/group-variables-based-on-lengths-of-specific-arrays