December 1, 2020
File Reader
Architecture AI Project
File Reader Data Array Creation
The file reader for this project starts with the CSVReader class. This class reads data from .csv files generated by Revit and passes it on to the underlying A* pathing node grid, where the agents use it when pathing. Each row of these .csv files holds 2D coordinate values that locate the data, the data value itself, and a reference ID. The coordinates and the data values are separated out and saved into many data objects within Unity (in a class named DataWithPosition), which prepares the data to be passed on to the A* pathing node grid.
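As a rough illustration of the reading step, here is a minimal Python sketch (the project itself runs in Unity, so this is not the actual CSVReader code). The column order of x, y, value, ID and the field names on DataWithPosition are assumptions made for the example; the real Revit export may be laid out differently.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DataWithPosition:
    # Field names are hypothetical; only the class name comes from the project.
    x: float
    y: float
    value: float
    ref_id: str


def read_data_points(csv_file):
    """Parse rows assumed to be (x, y, value, id) into DataWithPosition objects."""
    points = []
    for row in csv.reader(csv_file):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        try:
            x, y, value = float(row[0]), float(row[1]), float(row[2])
        except ValueError:
            continue  # skip a header row or any non-numeric row
        points.append(DataWithPosition(x, y, value, row[3].strip()))
    return points


# In-memory stand-in for a .csv file exported from Revit.
sample = io.StringIO("0,0,1.5,a1\n0,1,2.0,a2\n1,0,3.25,a3\n")
points = read_data_points(sample)
```

Passing any open text file in place of the io.StringIO object would work the same way.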
While the .csv data is generally consistently spaced by coordinate, it can have many gaps, because no value is assigned anywhere there is a gap in the model. This results in data that is not perfectly rectangular. To make tying this data array in with the pathing grid more consistent, I manually make a rectangular data array to hold all of the data and fill in the missing coordinates with additional data objects that have some default value (usually “0”). This makes it easier to fill the data into the A* pathing grid as the grid is created, because it can simply step through the data list one entry at a time instead of doing any searching.
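The gap-filling idea can be sketched as follows. This is a simplified Python version that stores bare values rather than full data objects; the function name, the (x, y, value) tuple layout, and the known spacing parameter are all assumptions for illustration.

```python
def build_rectangular_grid(points, spacing, default=0.0):
    """Place (x, y, value) points into a full rectangle, defaulting the gaps.

    Returns grid[col][row], so walking it in order goes by x first, then y.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    width = round((max(xs) - min_x) / spacing) + 1
    height = round((max(ys) - min_y) / spacing) + 1

    # Start with every cell set to the default, then overwrite the known ones.
    grid = [[default] * height for _ in range(width)]
    for x, y, value in points:
        col = round((x - min_x) / spacing)
        row = round((y - min_y) / spacing)
        grid[col][row] = value
    return grid


# Four known points on a 3 x 3 area; the other five cells are gaps.
points = [(0.0, 0.0, 5.0), (0.0, 2.0, 7.0), (1.0, 1.0, 3.0), (2.0, 0.0, 9.0)]
grid = build_rectangular_grid(points, spacing=1.0)
```

Because every coordinate now has an entry, the grid can be consumed strictly in order with no lookups.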
Applying the Read Data to the A* Pathing Grid
Data Assumptions:
- In order by coordinates, starting with the x-coordinates
- There is a general consistent distance between the data points
After reading in all the data and creating the foundational array of data, it can then be applied to the node grid. The most basic case reads through the data array as the A* pathing grid is created and assigns values as each node is made. This, however, only makes sense when the resolution of the data being read in and the resolution of the A* pathing grid are similar. If there is a substantially higher density of data points, or a substantially higher density of grid nodes, this no longer works and we need other ways to apply the data.
Data Resolution Cases
This leads to three cases (Cases with respect to data resolution):
- Data Resolution = A* Pathing Grid Resolution
- Data Resolution > A* Pathing Grid Resolution
- Data Resolution < A* Pathing Grid Resolution
(The 3 cases with respect to distance):
- Data Distance = A* Pathing Grid Node Diameter
- Data Distance < A* Pathing Grid Node Diameter
- Data Distance > A* Pathing Grid Node Diameter
The resolution here is the inverse of the spacing: for the data it is the inverse of the distance between data point coordinates, and for the grid it is the inverse of the node diameter. This means the cases can also be checked using the distances directly, but the inequality is reversed (except for the case where they are equal, which stays the same).
Knowing which case is present determines how the data should be applied to the A* pathing nodes. I decided the simplest way to deal with these cases was the following:
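A small Python sketch of checking the case from the two distances is shown here. The function name, the case numbering as integers, and the tolerance used for the "equal" comparison are assumptions for illustration.

```python
import math


def resolution_case(data_distance, node_diameter, tolerance=1e-6):
    """Return 1, 2, or 3 for the three data resolution cases."""
    if math.isclose(data_distance, node_diameter, rel_tol=tolerance):
        return 1  # same resolution: data maps to nodes 1-to-1
    if data_distance < node_diameter:
        return 2  # data is denser than the grid (higher data resolution)
    return 3      # data is sparser than the grid (lower data resolution)
```

Note how the comparison on distance is the reverse of the comparison on resolution: a smaller data distance means a higher data resolution.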
Dealing with the 3 Cases of Data Resolutions
Dealing with the 3 Cases:
- Similar Number of Data Points and A* Nodes: Apply data to the A* pathing nodes 1-to-1
- Substantially More Data Points than A* Nodes: For each A* node, average the data values of all the data points it covers
- Substantially Fewer Data Points than A* Nodes: Apply the data value from each data point to all the A* nodes in the area it covers
The latter two cases require additional information to apply these techniques accurately. I add one more data assumption: whenever the distance between data points differs from the A* node diameter, one of the two distances is evenly divisible by the other. This leads to a useful term that can consistently help with both of these cases.
If (distance between data is divisible by A* node diameter OR A* node diameter is divisible by distance between data)
To keep it somewhat consistent, I created a term called the “Distance Ratio (D)”, which is the node diameter divided by the distance between the data point coordinates. This term can be used as an important data dimension when dealing with array indices for the different data application cases. Since the key is using this as a dimensional property, it needs to be a whole number, which is not the case when the node diameter is less than the distance between data coordinates. In this case, the inverse of “D” can be used to find the dimensional term.
Distance Ratio (D) = Node Diameter / Distance Between Data
Dimensional Ratio (D*)
if (D >= 1) D* = D
if (D < 1) D* = 1 / D
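The two definitions above translate fairly directly into code. This Python sketch also enforces the divisibility assumption by rejecting ratios that are not whole numbers; the function name and the tolerance are assumptions for illustration.

```python
def dimensional_ratio(node_diameter, data_distance, tolerance=1e-6):
    """Return D* as an int, assuming one distance evenly divides the other."""
    d = node_diameter / data_distance          # Distance Ratio (D)
    d_star = d if d >= 1 else 1 / d            # Dimensional Ratio (D*)
    rounded = round(d_star)
    if abs(d_star - rounded) > tolerance:
        raise ValueError("distances are not evenly divisible by one another")
    return rounded
```

For example, a node diameter of 2.0 with a data distance of 0.5 gives D = 4 and D* = 4, while swapping the two gives D = 0.25 and again D* = 4.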
Using Dimensional Ratio for Cases 2 and 3
Case 1 does not need the dimensional ratio whatsoever, but both other cases can use it.
Case 2
For case 2 there are more data points per area than A* nodes, so the A* nodes must average the value of all the data points they cover. These data points can be found using the dimensional ratio. Each A* node covers a number of data points, n, where (n = D* ^ 2). This information makes it much easier and more consistent to find the proper data to average while setting the values during the creation of the A* node grid.
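A minimal Python sketch of the case 2 averaging is shown below, assuming the rectangular data array is stored as data[col][row] and its dimensions are multiples of D* (any leftover rows or columns are simply ignored here). The function name is hypothetical.

```python
def average_blocks(data, d_star):
    """Average each d_star x d_star block of data points into one node value."""
    node_cols = len(data) // d_star
    node_rows = len(data[0]) // d_star
    nodes = [[0.0] * node_rows for _ in range(node_cols)]
    for nc in range(node_cols):
        for nr in range(node_rows):
            total = 0.0
            # Visit the n = D*^2 data points covered by this node.
            for dc in range(d_star):
                for dr in range(d_star):
                    total += data[nc * d_star + dc][nr * d_star + dr]
            nodes[nc][nr] = total / (d_star * d_star)
    return nodes


# A 4 x 4 data array averaged down to a 2 x 2 node grid (D* = 2).
data = [
    [1.0, 3.0, 0.0, 0.0],
    [5.0, 7.0, 0.0, 4.0],
    [2.0, 2.0, 8.0, 8.0],
    [2.0, 2.0, 8.0, 8.0],
]
nodes = average_blocks(data, 2)
```

One thing this makes visible is that default values filled in for gaps are included in the average, so a node that partly covers a gap is pulled toward the default.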
Case 3
For case 3, there are fewer data points than there are A* nodes in each given area. Since this case just applies the same value from a given data point to several A* nodes, it comes down to figuring out which A* nodes each data point should pass its data to. This can be done by expanding the initial data array with identical copies of each data point so that it can then follow the 1-to-1 approach of case 1.
To do this, the dimensional ratio, D*, is used again. The initial data array created from the reading of the .csv file can be modified and expanded. A new 2D data array is created with each dimension (height and width) multiplied by D*. Then each data point passes on all of its information to a square of data points in the new array, where the number of new data points created, n, is such that (n = D* ^ 2).
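The expansion step can be sketched in Python as follows, again assuming a data[col][row] layout holding bare values. The function name is hypothetical.

```python
def expand_grid(data, d_star):
    """Copy each data point into a d_star x d_star square of a larger array."""
    width, height = len(data), len(data[0])
    expanded = [[None] * (height * d_star) for _ in range(width * d_star)]
    for col in range(width):
        for row in range(height):
            # Each source point fills n = D*^2 cells in the new array.
            for dc in range(d_star):
                for dr in range(d_star):
                    expanded[col * d_star + dc][row * d_star + dr] = data[col][row]
    return expanded


# A 2 x 2 data array expanded to 4 x 4 (D* = 2).
data = [[1, 2], [3, 4]]
expanded = expand_grid(data, 2)
```

The expanded array has the same dimensions as the node grid, so it can be read 1-to-1 exactly as in case 1.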
File Reader Data Resolution Difference Summary
This allows us to deal with various types and sizes of data while using various resolutions of A* pathing node systems somewhat consistently. This can be beneficial when passing in many types of data that could have different data resolutions and you want to keep the A* node grid resolution consistent. This also just helps the system perform properly when many types of data are passed through it over the course of several tests.
Unfortunately, the distance between the data points is not determined automatically at this time, so it must be input manually. I initially thought of just finding the difference between the first couple of coordinates to determine the distance, but this would fail in edge cases where some of the data right at the beginning is missing. A better solution could be to randomly select several pairs of coordinates throughout the data, find the difference for each pair, and use the mode of those differences as the data distance. This would work in most cases, and a manual override could then handle the fringe cases.
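One way the sampling idea could look is sketched below in Python. This is only an interpretation of the idea, not something implemented in the project: it samples neighbouring entries in file order (relying on the assumption that the data is ordered by x first), and the function name, sample count, and rounding precision are all assumptions.

```python
import random
from collections import Counter


def estimate_data_distance(coords, samples=50, seed=None, decimals=6):
    """Estimate grid spacing as the mode of differences between sampled neighbours."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(samples):
        i = rng.randrange(len(coords) - 1)
        (x0, y0), (x1, y1) = coords[i], coords[i + 1]
        # Same x means the pair is in one column, so y changed;
        # otherwise the pair straddles a column change, so use x.
        step = abs(y1 - y0) if x1 == x0 else abs(x1 - x0)
        if step > 0:
            diffs.append(round(step, decimals))
    if not diffs:
        raise ValueError("could not find any non-zero coordinate differences")
    return Counter(diffs).most_common(1)[0][0]


# A 10 x 10 set of coordinates at 0.5 spacing with a few gaps,
# including missing points right at the start of the data.
gaps = {(0, 0), (0, 1), (3, 4), (7, 7)}
coords = [(i * 0.5, j * 0.5) for i in range(10) for j in range(10)
          if (i, j) not in gaps]
estimate = estimate_data_distance(coords, samples=200, seed=0)
```

Pairs that span a gap produce a multiple of the true spacing, but as long as gaps are the minority the mode still lands on the real distance.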
Case 3 in particular is definitely not the most efficient approach, but it was a quicker and simpler implementation for now. Ideally it would not need to create another expanded data grid as a reference, and the A* node grid could use some method to know which ones should read from the same data point.
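As one possible way to avoid the expanded grid (an idea sketched here, not something implemented in the project), each A* node could compute which data point it belongs to by integer-dividing its own grid indices by D*. The function name and the data[col][row] layout are assumptions.

```python
def data_value_for_node(data, node_col, node_row, d_star):
    """Look up the data point covering a node without building an expanded array."""
    return data[node_col // d_star][node_row // d_star]


# With D* = 2, nodes (0..1, 0..1) all read from data[0][0], and so on.
data = [[1, 2], [3, 4]]
value = data_value_for_node(data, 3, 3, 2)
```

This trades the extra memory of the expanded array for one integer division per axis at lookup time.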
Next Step
This process could benefit from some of the possible updates mentioned in the “File Reader Data Resolution Difference Summary” section, but most of those should be unnecessary for now. The next steps will look to other parts of the system, which do include some more file reading that could benefit from parts of this process. We need to read in more Revit data to assign data to the model objects themselves.
via Blogger http://stevelilleyschool.blogspot.com/2020/12/architecture-ai-pathing-project-file.html