How to analyse a sequence of vehicle states?

  • #1
serbring
Hi all,

I have to analyse a dataset containing real-world vehicle trajectories and in particular:
1. The trajectories were classified into states as a function of certain vehicle parameters and location (urban roads, country roads, etc.). Each state is identified by an integer (1, 2, 3, etc.), which also gives me a categorical signal called "state".
2. Portions of the vehicle trajectories were grouped whenever a certain sequence of states occurred, which corresponds to a trip starting from a parking lot, then travelling along an urban road, then a highway stretch, etc. Thus, a sequence might start with a run of 1s, followed by a run of 100 2s, followed by a run of 3s, and so on. This task was carried out by converting the sequence of numbers into a string and by setting a proper regular expression. It is a quick and dirty approach and it took a lot of time to tune the parameters of the regular expression, and it is far from perfect.
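For reference, the string-plus-regex approach described in step 2 might look roughly like the sketch below. The function name `find_trips`, the pattern `1+2{100,}3+` and the restriction to single-digit state codes are assumptions made for illustration, not the exact code used on the dataset.

```python
import re


def find_trips(states, pattern=r"1+2{100,}3+"):
    """Return (start, end) index pairs of sub-sequences matching `pattern`.

    Assumes every state code is a single digit (0-9), so that one character
    in the string corresponds to exactly one sample in `states`.
    """
    if not all(0 <= s <= 9 for s in states):
        raise ValueError("this sketch only handles single-digit state codes")
    text = "".join(str(s) for s in states)
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


# A clean trip: five 1s, 120 2s, ten 3s.
clean = [1] * 5 + [2] * 120 + [3] * 10
print(find_trips(clean))  # [(0, 135)]

# One misclassified sample in the middle of the 2s breaks the match.
noisy = [1] * 5 + [2] * 60 + [4] + [2] * 60 + [3] * 10
print(find_trips(noisy))  # []
```

The second call shows the weakness: a single unexpected code inside a run is enough to make the pattern miss the whole trip.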

The approach is imperfect because the vehicle states may be misclassified (step 1) and because, in real-world conditions, operations are not always carried out in the same way: there may be extra states in between, for example because the driver took a wrong road, and the duration of such an extra state varies. So I need to find a better method, but I have no idea which approach to adopt. Do you have any suggestion to give me?

Thanks!
 
  • #2
It might help to tell us what the purpose is. It's hard to suggest an approach when the desired result is unknown.
 
  • #3
As part of data analysis, it's good to review the data and correct fields or discard questionable data items.

Each dataset has its own rules for validity and you may need to establish them. As an example, you might discard rows that have missing data or if possible fill in the missing info with nominal values.

Say you had a dataset for trains, planes and automobiles, you could validate any speed fields by applying some speed range criteria to identify rows where speeds are too high or too low for the type of vehicle being recorded.
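A minimal sketch of that kind of range check is shown below. The field names `vehicle_type` and `speed_kmh` and the numeric limits are made-up placeholders, not values from any real dataset; you would substitute the rules that fit your own data.

```python
# Hypothetical plausible speed ranges in km/h, per vehicle type.
SPEED_LIMITS_KMH = {
    "train": (0, 350),
    "plane": (0, 1000),
    "automobile": (0, 250),
}


def is_valid_row(row):
    """True if the row has a known vehicle type and a speed inside its range."""
    limits = SPEED_LIMITS_KMH.get(row.get("vehicle_type"))
    speed = row.get("speed_kmh")
    if limits is None or speed is None:
        return False  # missing or unknown data: treat the row as questionable
    low, high = limits
    return low <= speed <= high


def split_valid(rows):
    """Separate rows into (valid, rejected) so rejected rows can be reviewed."""
    valid = [r for r in rows if is_valid_row(r)]
    rejected = [r for r in rows if not is_valid_row(r)]
    return valid, rejected
```

Keeping the rejected rows, rather than silently dropping them, makes it easier to check later whether the rules were too strict.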
 
  • #4
serbring said:
It is a quick and dirty approach and it took a lot of time
This statement seems to contradict itself.

serbring said:
Do you have any suggestion to give me?
Don't use a regular expression.

Implement a parser in an object-oriented language using the state pattern.
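As a rough sketch of what that could look like in Python: each parser state is an object whose `on_sample` method consumes one state code and returns the next parser state. The class names (`Waiting`, `InPhase`, `TripParser`), the `phases` list and the `max_noise` tolerance are all illustrative assumptions, not an established design for this dataset.

```python
class Waiting:
    """Idle state: no trip in progress."""

    def on_sample(self, parser, index, code):
        if code == parser.phases[0]:
            parser.trip_start = index
            parser.last_good = index
            return InPhase(0)
        return self


class InPhase:
    """Inside phase number `k` of the expected trip."""

    def __init__(self, k):
        self.k = k
        self.noise = 0  # consecutive unexpected samples seen so far

    def on_sample(self, parser, index, code):
        phases = parser.phases
        if code == phases[self.k]:  # still in the current phase
            self.noise = 0
            parser.last_good = index
            return self
        if self.k + 1 < len(phases) and code == phases[self.k + 1]:
            parser.last_good = index  # moved on to the next phase
            return InPhase(self.k + 1)
        self.noise += 1
        if self.noise <= parser.max_noise:
            return self  # tolerate a short detour or misclassification
        # Too much noise: keep the trip only if the last phase was reached.
        parser.close(completed=(self.k == len(phases) - 1))
        return Waiting().on_sample(parser, index, code)


class TripParser:
    """Context object: holds the current state and the trips found so far."""

    def __init__(self, phases, max_noise=0):
        self.phases = list(phases)
        self.max_noise = max_noise
        self.trips = []
        self.trip_start = None
        self.last_good = None
        self.state = Waiting()

    def close(self, completed):
        if completed:
            self.trips.append((self.trip_start, self.last_good + 1))
        self.trip_start = None

    def parse(self, codes):
        for index, code in enumerate(codes):
            self.state = self.state.on_sample(self, index, code)
        if isinstance(self.state, InPhase):  # input ended mid-trip
            self.close(completed=(self.state.k == len(self.phases) - 1))
            self.state = Waiting()
        return self.trips


codes = [0, 0, 1, 1, 2, 2, 9, 2, 3, 3, 0, 0]
print(TripParser([1, 2, 3], max_noise=1).parse(codes))  # [(2, 10)]
print(TripParser([1, 2, 3], max_noise=0).parse(codes))  # []
```

The point of the pattern is that rules such as "how much noise is acceptable in this phase" live in the state objects, so they can be changed per phase without rewriting one large pattern.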
 
  • #5
serbring said:
This task was carried out by converting the sequence of numbers into a string and by setting a proper regular expression.
This seems odd. It would seem both easier and faster just to check the sequences of numbers directly.
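One possible way to work on the numbers directly is to run-length encode the state signal and compare the resulting runs against a template. This is only a sketch: the helper names (`run_lengths`, `drop_short_runs`, `matches_template`) and the minimum lengths are assumptions chosen for the example.

```python
from itertools import groupby


def run_lengths(states):
    """Collapse [1, 1, 2, 2, 2] into [(1, 2), (2, 3)]."""
    return [(code, sum(1 for _ in group)) for code, group in groupby(states)]


def drop_short_runs(runs, min_len):
    """Discard runs shorter than `min_len` and merge equal neighbours.

    Short runs are treated as noise (misclassification or a brief detour).
    The discarded samples are not added to the merged run's length.
    """
    merged = []
    for code, length in runs:
        if length < min_len:
            continue
        if merged and merged[-1][0] == code:
            merged[-1] = (code, merged[-1][1] + length)
        else:
            merged.append((code, length))
    return merged


def matches_template(runs, template):
    """`template` is a list of (code, minimum_length) pairs, one per run."""
    if len(runs) != len(template):
        return False
    return all(
        code == want_code and length >= want_min
        for (code, length), (want_code, want_min) in zip(runs, template)
    )


states = [1, 1, 1, 2, 2, 9, 2, 2, 3, 3]
runs = drop_short_runs(run_lengths(states), min_len=2)
print(runs)  # [(1, 3), (2, 4), (3, 2)]
print(matches_template(runs, [(1, 1), (2, 3), (3, 1)]))  # True
```

Working with runs also avoids the single-character limitation of the string approach, since state codes can be any hashable value.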
 
  • #6
I routinely used regular expressions to clean up, simplify, or filter inputs from imperfect sources before passing the inputs on to other algorithms. I considered it to be much easier and more reliable than the alternatives.
 
  • #7
Thanks for your answers. I will reply to all your comments.
FactChecker said:
It might help to tell us what the purpose is. It's hard to suggest an approach when the desired result is unknown.

jedishrfu said:
As part of data analysis, it's good to review the data and correct fields or discard questionable data items.

Each dataset has its own rules for validity and you may need to establish them. As an example, you might discard rows that have missing data or if possible fill in the missing info with nominal values.

Say you had a dataset for trains, planes and automobiles, you could validate any speed fields by applying some speed range criteria to identify rows where speeds are too high or too low for the type of vehicle being recorded.
The data were checked for validity (e.g., rows with no logged vehicle position because the GPS did not get a fix), but of course misclassification may still occur, and it can never be fully eliminated when dealing with a large dataset of real-world data. More specifically on the task, I am doing trip analysis, where a trip is a sequence of vehicle states: the vehicle starts from a point (usually the owner's house, but a different place may occur), then travels on an extra-urban road, then a highway, then an urban road, and finally back along the same path. However, the driver may choose a different road to travel back home, or decide on a detour, and so on. Such changes lead to different sequences of vehicle states, complicating the detection of trips. Once trips are detected, I will calculate features of the trips and analyse them. Hopefully, it is clearer now.


pbuk said:
This statement seems to contradict itself


Don't use a regular expression.

Implement a parser in an object-oriented language using the state pattern.
As reported by @FactChecker, for a quick and flexible pattern, a regular expression is a very quick approach even if it is not the best in terms of efficiency. However, with a large collection of real-world data things get more complicated, so I am now searching for a more advanced approach. I did not know the state pattern, and it might be interesting. I will dig into it but, as far as I understand, it is mostly a convenient way of organising multiple if-statements, so I am not sure it will help in this case.

PeterDonis said:
This seems odd. It would seem both easier and faster just to check the sequences of numbers directly.
If you have a specific method to suggest, that would be great. Thanks.
 
  • #8
serbring said:
More specifically on the task, I am doing trip analysis, where a trip is a sequence of vehicle states: the vehicle starts from a point (usually the owner's house, but a different place may occur), then travels on an extra-urban road, then a highway, then an urban road, and finally back along the same path. However, the driver may choose a different road to travel back home, or decide on a detour, and so on. Such changes lead to different sequences of vehicle states, complicating the detection of trips. Once trips are detected, I will calculate features of the trips and analyse them. Hopefully, it is clearer now.
Are you saying that you are trying to detect round trips by the pattern of speeds, without having position data?
serbring said:
As reported by @FactChecker, for a quick and flexible pattern, a regular expression is a very quick approach even if it is not the best in terms of efficiency.
If the application fits, I think it would be hard to beat the efficiency of the built-in regular expressions. You don't say what language you are using. Python is very popular now, but it can be astonishingly slow. I used Perl a lot for such tasks and was not bothered by any lack of speed. Your job and amount of data may just require long runs. If the execution time is very long (several hours or days), you should look for ways to periodically save things so that you can monitor progress and restart the program where it left off. Things like power "glitches", unplanned system resets, unexpected data inputs that are being handled wrong, etc. can force you to restart the program from the beginning or at some intermediate stage.
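To illustrate the save-and-restart idea, here is a small sketch that writes a JSON checkpoint every so many items and resumes from it on the next run. The function names, the checkpoint layout (`next_index`, `results`) and the default interval are assumptions for the example; results must be JSON-serialisable for this to work.

```python
import json
import os


def save_checkpoint(path, next_index, results):
    """Write the checkpoint to a temp file, then swap it in with os.replace,
    so a crash during the write does not corrupt the previous checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump({"next_index": next_index, "results": results}, fh)
    os.replace(tmp_path, path)


def process_with_checkpoints(items, handle, checkpoint_path, every=1000):
    """Apply `handle` to each item, saving progress every `every` items."""
    start, results = 0, []
    if os.path.exists(checkpoint_path):  # resume a previous run
        with open(checkpoint_path) as fh:
            saved = json.load(fh)
        start, results = saved["next_index"], saved["results"]
    for index in range(start, len(items)):
        results.append(handle(items[index]))
        if (index + 1) % every == 0:
            save_checkpoint(checkpoint_path, index + 1, results)
            print(f"processed {index + 1} of {len(items)} items")
    save_checkpoint(checkpoint_path, len(items), results)
    return results
```

If the run is interrupted, calling `process_with_checkpoints` again with the same `checkpoint_path` picks up at the last saved index instead of starting over; delete the checkpoint file to force a fresh run.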
 
