<p><em>DR. OLCAY SAHIN, personal blog</em></p>

<h1>2020 TRB Annual Meeting</h1>
<p><em>Olcay Sahin, 2019-12-04</em></p>
<p>I am going to be attending the Transportation Research Board (TRB)’s Annual Meeting in January 2020 at the Walter E. Washington Convention Center in Washington, D.C. This year I have one accepted paper as the corresponding author. TRB’s Annual Meeting is one of the largest gatherings of transportation professionals and is expected to attract more than 13,000 participants from around the world.</p>
<h2 id="paper">Paper:</h2>
<p><strong>Sahin, O.</strong>, Cetin, M., Ustun, I. (2020). <em>Empty Platform Semi-Trailer classification using side-fire LIDAR data for supporting Freight Analysis and Planning</em></p>
<p><strong>Lectern Session:</strong></p>
<ul>
<li>Innovations in Data Collection, Analysis, and Fusion to Address Persistent Freight Data Gaps</li>
<li>Standing Committee on Freight Transportation Data (ABJ90)</li>
<li>Tuesday, January 14, 2020, 1:30 PM–3:15 PM</li>
<li>Convention Center, 144A</li>
</ul>
<h2 id="abstract">ABSTRACT</h2>
<p>Empty truck trips constitute an important aspect of commodity-based freight planning and modelling, but this information is generally not available to State DOTs or Metropolitan Planning Organizations (MPOs), since detecting empty trips is a challenge with traditional vehicle sensors. In this study, we propose a method for detecting empty and loaded platform semi-trailers using data from a multi-array LIDAR sensor. From the LIDAR point clouds, 3D profiles of trucks can be generated, and these profiles allow extracting useful information (e.g., body type, empty and loaded platforms). Since only platform semi-trailers’ load is observable from their 3D profiles, we only consider open platform trailers, which constitute 20% of the truck trailer population in the USA. This paper shows how point-cloud data from a 16-beam LIDAR sensor are processed to extract useful information and features to distinguish empty and loaded platform semi-trailers from all other major truck body types (e.g., dry van, container, tank, automobile transport). Several machine learning (ML) models, in particular K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Adaptive Boosting (AdaBoost.M2), and Support Vector Machines (SVM), are implemented on field data collected on a freeway segment that includes over nine thousand trucks. The results show that all major semi-trailers and empty platform semi-trailers can be distinguished with very high accuracies of 99% and 97%, respectively.</p>

<h1>Motor Carrier Management Information System</h1>
<p><em>Olcay Sahin, 2019-11-07</em></p>
<p>RShiny application for downloading and visualizing Motor Carrier Management Information System (MCMIS) data.
More information can be found here: <a href="https://ask.fmcsa.dot.gov/app/mcmiscatalog/d_census_mcmis_doc" target="_blank">link</a></p>
<p>I wanted to have an example of downloading data from a web application powered by RShiny and PostgreSQL.</p>
<p>Using this application, the data can be filtered and downloaded.</p>
<p>I will also include some simple analyses.</p>
<p>The data is open to the public and can be downloaded from this <a href="https://ai.fmcsa.dot.gov/SMS/Tools/Downloads.aspx" target="_blank">link</a>.
The data is updated every month; I downloaded the October 2019 dataset.
If I have time, I will write a PHP script that automatically downloads the data and updates the previously created database.</p>
<h2 id="virtal-private-server-setup">Virtual Private Server Setup</h2>
<p>I rented a low-cost virtual private server from OVH ($4 per month, a cup of coffee!) with the following configuration: 1 vCore, 2GB memory, and 20GB SSD space.</p>
<h2 id="database-setup">Database Setup</h2>
<p>I created a local PostgreSQL database on the server for this data set. I could also read the file on the fly with the data.table library, but reading this data from the file could take some time on this server, which is a low-cost server meant for lightweight work.</p>
<p>So I created a table in the database. You can see the SQL file in the repository.</p>
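<p>As a sketch of what that SQL file might contain, a minimal table definition could look like the following. The column names and types here are illustrative assumptions; the real schema should follow the MCMIS record layout.</p>

```sql
-- Hypothetical sketch of an MCMIS census table.
-- Column names/types are assumptions, not the actual record layout.
CREATE TABLE IF NOT EXISTS mcmis_census (
    dot_number   integer PRIMARY KEY,  -- carrier's USDOT number
    legal_name   text,                 -- carrier legal name
    phy_state    text,                 -- physical address state
    power_units  integer,              -- number of power units
    drivers      integer               -- number of drivers
);
```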
<h2 id="shiny-server-setup">Shiny Server Setup</h2>
<p>Shiny server installation is straightforward. Just follow steps from official RStudio Shiny tutorial: <a href="https://rstudio.com/products/shiny/download-server/ubuntu/" target="_blank">link</a></p>
<h2 id="database-credentials-protection">Database Credentials Protection</h2>
<p>Since I am going to upload this script to a public GitHub repository, I am going to hide my credentials. To do so, I created a .Renviron file and included the necessary credentials as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbname = "name"
dbuser = "user"
dbpass = "password"
dbhost = "Ip address or localhost"
dbport = 5432 #The default PostgreSQL port is 5432. If you have multiple versions, check the port number in the Postgres config.
dbtable = "table name" #Not really necessary unless you have multiple tables in your database.
</code></pre></div></div>
<p>When an application or RStudio starts, this information is loaded into the environment. There are many ways to protect your credentials, explained in detail on RStudio’s manual page: <a href="https://db.rstudio.com/best-practices/managing-credentials/" target="_blank">link</a></p>
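The application itself reads these values in R with Sys.getenv(). For illustration only, here is the same pattern sketched in Python: the variable names mirror the .Renviron file above, and db_config is a hypothetical helper, not part of the app.

```python
import os

def db_config():
    """Collect DB connection settings from environment variables so
    credentials never appear in the committed source. Variable names
    mirror the .Renviron file; defaults are illustrative assumptions."""
    return {
        "dbname": os.environ.get("dbname", ""),
        "user": os.environ.get("dbuser", ""),
        "password": os.environ.get("dbpass", ""),
        "host": os.environ.get("dbhost", "localhost"),
        "port": int(os.environ.get("dbport", "5432")),  # PostgreSQL default
    }
```

The same idea applies regardless of language: the source code only names the variables, and the secrets live in an untracked file or the server environment.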
<h2 id="nginx-web-server">NGINX Web Server</h2>
<p>Shiny Server uses its own port (3838) to serve its applications. However, I don’t want to open any port other than 80, so I set up a proxy in the NGINX web server. This way, you can create a subdomain at your domain name provider (e.g., GoDaddy, Name.com, etc.), point it to the virtual server’s IP address, and NGINX will handle the routing through the proxy configuration.</p>
<p>Anyone who is not familiar with servers and proxies can copy the configuration below. Don’t forget to update it with your own information.</p>
<p>The file below is located at /etc/nginx/sites-available/mcmis:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>server {
    server_name mcmis.olcaysahin.com;
    root /opt/shiny-server/samples/mcmis/;

    gzip on;
    gzip_types text/plain text/css application/xml application/x-javascript;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        proxy_buffering off;
        proxy_read_timeout 300s;
        proxy_pass http://127.0.0.1:3838/mcmis/;
        client_max_body_size 1000m;
    }
}
</code></pre></div></div>
<h2 id="symbolic-link">Symbolic Link</h2>
<p>As you can see from the root location above, the RShiny application is located in a secure location outside of Shiny Server’s default serving directory. Therefore, this location must be linked with the Linux “symbolic link” command. See the example below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ln -s /opt/shiny-server/samples/mcmis/ /srv/shiny-server/
</code></pre></div></div>
<p>Now the application can be seen under my portfolio’s subdomain: <a href="http://mcmis.olcaysahin.com" target="_blank">http://mcmis.olcaysahin.com</a></p>

<h1>RStudio Reticulate Setup</h1>
<p><em>Olcay Sahin, 2019-11-01</em></p>
<p>This code was written in RStudio. The reticulate library makes it possible to run Python code from RStudio.
I copied the example from the reticulate manual. I will post my own examples soon.</p>
<p>R Code:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">reticulate</span><span class="p">)</span><span class="w">
</span><span class="n">use_condaenv</span><span class="p">(</span><span class="s2">"r-reticulate"</span><span class="p">,</span><span class="n">required</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">T</span><span class="p">)</span><span class="w">
</span><span class="n">py_run_string</span><span class="p">(</span><span class="s2">"import os as os"</span><span class="p">)</span><span class="w">
</span><span class="n">py_run_string</span><span class="p">(</span><span class="s2">"os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = '../Local/Continuum/anaconda3/Library/plugins/platforms/'"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>Python Code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kn">import</span> <span class="nn">pandas</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span><span class="mf">2.0</span><span class="p">,</span><span class="mf">0.01</span><span class="p">)</span>
<span class="n">s</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">+</span> <span class="n">np</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">pi</span><span class="o">*</span><span class="n">t</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">s</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<figure class=" ">
<a href="/assets/images/unnamed-chunk-2-1.png">
<img src="/assets/images/unnamed-chunk-2-1.png" alt="Plot" />
</a>
</figure>

<h1>Chicago Ride-share Data Analysis</h1>
<p><em>Olcay Sahin, 2019-10-22</em></p>
<p>I recently wrote some R code to analyze the “rideshare trips” data from the Chicago data portal. The data contain individual trips from origin coordinates to destination coordinates, along with the travel time and cost of each trip and some other information. The goal of using this data is travel demand modeling, where we need to know the zone-to-zone interaction, so I am going to count how many trips have been made within the zones. However, there are some issues with the data set:</p>
<p>(1) Census tract information is given for almost every trip, but in order to have a complete data set, the missing census tracts need to be found. So we need to download the Chicago census tract polygon data to find the missing tracts for those trips. Luckily, this census tract data is also available in the Chicago data portal. As a result, we need to do a spatial match between the census tracts and the data points with missing tracts in the trips data set.</p>
<p>(2) The downloaded data is about 19GB, with around 73M rows, while my computer’s memory is 16GB. To analyze this data I either have to rent AWS or Google Cloud resources, or split the data into chunks and analyze them individually. I could also use big data frameworks, but the data is not big enough to justify renting Hadoop resources either. So I have to use my own computer, and I developed the following R code to handle this big data set and run the analysis on the fly.</p>
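<p>The R code below implements this chunking with data.table’s fread (via skip/nrows), appending each chunk’s result to disk. As a language-neutral illustration of the idea, here is the same pattern sketched in Python; the function, file contents, and column name are assumptions for illustration, not part of the actual analysis.</p>

```python
import csv
import io

def count_trips_by_tract(csv_text, chunk_size=1000):
    """Stream trip rows in fixed-size chunks, aggregating trip counts per
    pickup census tract without holding the whole file in memory."""
    counts = {}

    def flush(rows):
        # Aggregate one chunk, then let it be garbage-collected.
        for row in rows:
            tract = row.get("Pickup_Census_Tract") or "unknown"
            counts[tract] = counts.get(tract, 0) + 1

    reader = csv.DictReader(io.StringIO(csv_text))
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:  # chunk is full: process and discard
            flush(chunk)
            chunk = []
    flush(chunk)  # process the final partial chunk
    return counts
```

<p>In the real setting the chunks come from the 19GB file on disk, and each chunk’s summary is appended to an output file, which is exactly what the R code does.</p>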
<p>Source for the trip data: <a href="https://data.cityofchicago.org/Transportation/Transportation-Network-Providers-Trips/m6dm-c72p" target="_blank">link</a></p>
<p>Source for the census tracts data: <a href="https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Census-Tracts-2010/5jrd-6zik" target="_blank">link</a></p>
<p>I have experience with all of the common data handling libraries. For big data sets, I can suggest the “data.table” library. For spatial analysis I used to work with PostgreSQL and PostGIS, but there is a great recent R library by Edzer Pebesma et al. (2019) called “sf”. It is fast and easy to use, and I don’t have to write spatial SQL code anymore.</p>
<p>Here is the code. It can also be found on my GitHub page: <a href="https://github.com/olcaysah/Chicago-Ride-share-Trips-Analysis" target="_blank">link</a></p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">#' To see how long all process will take time...</span><span class="w">
</span><span class="n">startTime_FullProcess</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Sys.time</span><span class="p">()</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">data.table</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sf</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">stringr</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readr</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">fasttime</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">mapview</span><span class="p">)</span><span class="w">
</span><span class="n">options</span><span class="p">(</span><span class="n">scipen</span><span class="o">=</span><span class="m">999</span><span class="p">)</span><span class="w">
</span><span class="c1">#Read just the column names, to be used for the rest of the reads.</span><span class="w">
</span><span class="n">TNP_ColNames</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fread</span><span class="p">(</span><span class="s1">'Transportation_Network_Providers_-_Trips (1).csv'</span><span class="p">,</span><span class="w"> </span><span class="n">nrows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w">
</span><span class="c1">#Column names have spaces, and these sometimes cause problems. That's why I replaced each space with "_".</span><span class="w">
</span><span class="n">setnames</span><span class="p">(</span><span class="n">TNP_ColNames</span><span class="p">,</span><span class="w"> </span><span class="n">str_replace_all</span><span class="p">(</span><span class="nf">names</span><span class="p">(</span><span class="n">TNP_ColNames</span><span class="p">),</span><span class="w"> </span><span class="n">pattern</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">" "</span><span class="p">,</span><span class="w"> </span><span class="n">replacement</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"_"</span><span class="p">))</span><span class="w">
</span><span class="cd">#' I also wanted to write the results to disk so I don't repeat the same process again.</span><span class="w">
</span><span class="cd">#' For this purpose I can append to a file whenever a chunk of the data is analyzed.</span><span class="w">
</span><span class="cd">#' But first I need to prepare an empty base file for appending.</span><span class="w">
</span><span class="cd">#' I am also adding 2 additional columns for newly found census tracts</span><span class="w">
</span><span class="n">TNP_ColNames_Base</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">TNP_ColNames</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">pickup_census_tract</span><span class="o">=</span><span class="kc">NA</span><span class="p">,</span><span class="n">dropoff_census_tract</span><span class="o">=</span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span><span class="n">fwrite</span><span class="p">(</span><span class="n">TNP_ColNames_Base</span><span class="p">,</span><span class="s1">'TNP_Pickup_Dropoff.csv'</span><span class="p">,</span><span class="n">append</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">F</span><span class="p">)</span><span class="w">
</span><span class="cd">#' Read the census tracts shape file:</span><span class="w">
</span><span class="n">censusTracs</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read_sf</span><span class="p">(</span><span class="s2">"./geo_export_a33b8e6a-8cde-49fa-bca2-5446460ee02b.shp"</span><span class="p">)</span><span class="w">
</span><span class="cd">#' Fix the datum here. Transformation is also needed for the spatial analysis.</span><span class="w">
</span><span class="n">censusTracs</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">st_transform</span><span class="p">(</span><span class="n">censusTracs</span><span class="p">,</span><span class="w"> </span><span class="m">4326</span><span class="p">)</span><span class="w">
</span><span class="cd">#' Let's see the census tracts on our map.</span><span class="w">
</span><span class="n">mapview</span><span class="p">(</span><span class="n">censusTracs</span><span class="p">)</span><span class="w">
</span><span class="cd">#' Let's start the fun part.</span><span class="w">
</span><span class="cd">#' First set the number of chunks for each read.</span><span class="w">
</span><span class="n">nChunks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1e6</span><span class="w">
</span><span class="cd">#' I know we have around 73M rows.</span><span class="w">
</span><span class="cd">#' Let's set the chunks to read from data.</span><span class="w">
</span><span class="n">chunks</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">73e6</span><span class="p">,</span><span class="n">nChunks</span><span class="p">)</span><span class="w">
</span><span class="cd">#' There are many ways to write this loop, but I like lapply, which is faster.</span><span class="w">
</span><span class="cd">#' I use the bind_rows function from dplyr. This should be OK for this work.</span><span class="w">
</span><span class="n">mergedData</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">bind_rows</span><span class="p">(</span><span class="n">lapply</span><span class="p">(</span><span class="n">chunks</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">chunk</span><span class="p">){</span><span class="w">
</span><span class="cd">#' I just want to see how many minutes or seconds it takes to process and analyze each chunk of the data.</span><span class="w">
</span><span class="n">startTime</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Sys.time</span><span class="p">()</span><span class="w">
</span><span class="cd">#' Read given amount of rows:</span><span class="w">
</span><span class="cd">#' I use fread function from data.table library. This is very fast.</span><span class="w">
</span><span class="cd">#' "chunk" variable passed through lapply function</span><span class="w">
</span><span class="n">TNP_Part</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fread</span><span class="p">(</span><span class="s1">'Transportation_Network_Providers_-_Trips (1).csv'</span><span class="p">,</span><span class="w"> </span><span class="n">skip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">chunk</span><span class="m">+1</span><span class="p">),</span><span class="n">nrows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nChunks</span><span class="p">,</span><span class="w"> </span><span class="n">header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">F</span><span class="p">)</span><span class="w">
</span><span class="cd">#' I disabled reading the header names because we are not always reading from the top of the file.</span><span class="w">
</span><span class="cd">#' That's why we need to add the header names ourselves.</span><span class="w">
</span><span class="cd">#' setnames is a function from data.table</span><span class="w">
</span><span class="n">setnames</span><span class="p">(</span><span class="n">TNP_Part</span><span class="p">,</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">TNP_ColNames</span><span class="p">))</span><span class="w">
</span><span class="cd">#' Just check if there are duplicated trips. According to the description, there should not be any.</span><span class="w">
</span><span class="c1">#length(unique(TNP_Part$Trip_ID))</span><span class="w">
</span><span class="cd">#' Let's start with the spatial match with census tracts.</span><span class="w">
</span><span class="cd">#' Let's start with the missing pickups.</span><span class="w">
</span><span class="n">TNP_Pickup</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">TNP_Part</span><span class="p">[,</span><span class="n">.</span><span class="p">(</span><span class="n">Pickup_Centroid_Longitude</span><span class="p">,</span><span class="n">Pickup_Centroid_Latitude</span><span class="p">)]</span><span class="w">
</span><span class="cd">#' Convert the points to geospatial object</span><span class="w">
</span><span class="n">TNP_Pickup</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sf</span><span class="o">::</span><span class="n">st_as_sf</span><span class="p">(</span><span class="n">TNP_Pickup</span><span class="p">,</span><span class="w"> </span><span class="n">coords</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="s2">"Pickup_Centroid_Longitude"</span><span class="p">,</span><span class="s2">"Pickup_Centroid_Latitude"</span><span class="p">),</span><span class="w"> </span><span class="n">crs</span><span class="o">=</span><span class="m">4326</span><span class="p">,</span><span class="w"> </span><span class="n">na.fail</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">F</span><span class="p">)</span><span class="w">
</span><span class="cd">#' Let's use the st_intersects from sf package.</span><span class="w">
</span><span class="cd">#' The result is a list; I want to convert it to a data.frame. Thanks to dplyr for the pipe function.</span><span class="w">
</span><span class="cd">#' This function returns 2 values: one is the row index for pickups and the other is the index of the matching census tract.</span><span class="w">
</span><span class="cd">#' I could filter out the ones that already have census tracts, but I want to compare the given tracts with the found ones.</span><span class="w">
</span><span class="n">pickup_CT</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sf</span><span class="o">::</span><span class="n">st_intersects</span><span class="p">(</span><span class="n">TNP_Pickup</span><span class="p">,</span><span class="w"> </span><span class="n">censusTracs</span><span class="p">,</span><span class="w"> </span><span class="n">sparse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">T</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">as.data.frame</span><span class="p">()</span><span class="w">
</span><span class="cd">#' Set a new column name for new census pickup tracts</span><span class="w">
</span><span class="n">TNP_Part</span><span class="o">$</span><span class="n">pickup_census_tract</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="cd">#' Now let's update the values.</span><span class="w">
</span><span class="n">TNP_Part</span><span class="o">$</span><span class="n">pickup_census_tract</span><span class="p">[</span><span class="n">pickup_CT</span><span class="o">$</span><span class="n">row.id</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">censusTracs</span><span class="o">$</span><span class="n">geoid10</span><span class="p">[</span><span class="n">pickup_CT</span><span class="o">$</span><span class="n">col.id</span><span class="p">]</span><span class="w">
</span><span class="cd">#' Below is the same process for drop-offs</span><span class="w">
</span><span class="n">TNP_Dropoff</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">TNP_Part</span><span class="p">[,</span><span class="n">.</span><span class="p">(</span><span class="n">Dropoff_Centroid_Longitude</span><span class="p">,</span><span class="n">Dropoff_Centroid_Latitude</span><span class="p">)]</span><span class="w">
</span><span class="n">TNP_Dropoff</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sf</span><span class="o">::</span><span class="n">st_as_sf</span><span class="p">(</span><span class="n">TNP_Dropoff</span><span class="p">,</span><span class="w"> </span><span class="n">coords</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="s2">"Dropoff_Centroid_Longitude"</span><span class="p">,</span><span class="s2">"Dropoff_Centroid_Latitude"</span><span class="p">),</span><span class="w"> </span><span class="n">crs</span><span class="o">=</span><span class="m">4326</span><span class="p">,</span><span class="w"> </span><span class="n">na.fail</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">F</span><span class="p">)</span><span class="w">
</span><span class="n">dropoff_CT</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sf</span><span class="o">::</span><span class="n">st_intersects</span><span class="p">(</span><span class="n">TNP_Dropoff</span><span class="p">,</span><span class="w"> </span><span class="n">censusTracs</span><span class="p">,</span><span class="w"> </span><span class="n">sparse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">T</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">as.data.frame</span><span class="p">()</span><span class="w">
</span><span class="n">TNP_Part</span><span class="o">$</span><span class="n">dropoff_census_tract</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="n">TNP_Part</span><span class="o">$</span><span class="n">dropoff_census_tract</span><span class="p">[</span><span class="n">dropoff_CT</span><span class="o">$</span><span class="n">row.id</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">censusTracs</span><span class="o">$</span><span class="n">geoid10</span><span class="p">[</span><span class="n">dropoff_CT</span><span class="o">$</span><span class="n">col.id</span><span class="p">]</span><span class="w">
</span><span class="cd">#' Let's start the data analysis now.</span><span class="w">
</span><span class="cd">#' Never use a for loop here. If you choose to, you will wait forever.</span><span class="w">
</span><span class="cd">#' Take advantage of data.table processing instead.</span><span class="w">
</span><span class="cd">#' The timestamp columns need to be converted from text to timestamps.</span><span class="w">
</span><span class="n">TNP_Part</span><span class="p">[,</span><span class="w"> </span><span class="n">`:=`</span><span class="w"> </span><span class="p">(</span><span class="n">Trip_Start_Timestamp_formatted</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as.POSIXct</span><span class="p">(</span><span class="n">Trip_Start_Timestamp</span><span class="p">,</span><span class="w"> </span><span class="n">format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"%m/%d/%Y %I:%M:%S %p"</span><span class="p">,</span><span class="w"> </span><span class="n">tz</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"UTC"</span><span class="p">),</span><span class="w">
</span><span class="n">Trip_End_Timestamp_formatted</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as.POSIXct</span><span class="p">(</span><span class="n">Trip_End_Timestamp</span><span class="p">,</span><span class="w"> </span><span class="n">format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"%m/%d/%Y %I:%M:%S %p"</span><span class="p">,</span><span class="w"> </span><span class="n">tz</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"UTC"</span><span class="p">))]</span><span class="w">
</span><span class="cd">#' Add dates and times for grouping purposes.</span><span class="w">
</span><span class="n">TNP_Part</span><span class="p">[,</span><span class="w"> </span><span class="n">`:=`</span><span class="w"> </span><span class="p">(</span><span class="n">Trip_Start_Date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as.IDate</span><span class="p">(</span><span class="n">Trip_Start_Timestamp_formatted</span><span class="p">),</span><span class="w"> </span><span class="c1"># Extract the trip start date</span><span class="w">
</span><span class="n">Trip_End_Date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as.IDate</span><span class="p">(</span><span class="n">Trip_End_Timestamp_formatted</span><span class="p">),</span><span class="w"> </span><span class="c1"># Extract the trip end date</span><span class="w">
</span><span class="n">start_hour_of_day</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">hour</span><span class="p">(</span><span class="n">Trip_Start_Timestamp_formatted</span><span class="p">),</span><span class="w"> </span><span class="c1"># Extract the trip start hour</span><span class="w">
</span><span class="n">end_hour_of_day</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">hour</span><span class="p">(</span><span class="n">Trip_End_Timestamp_formatted</span><span class="p">),</span><span class="w"> </span><span class="c1"># Extract the trip end hour</span><span class="w">
</span><span class="n">wday_trip_start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">wday</span><span class="p">(</span><span class="n">Trip_Start_Timestamp_formatted</span><span class="p">),</span><span class="w"> </span><span class="c1"># Extract the trip start weekday</span><span class="w">
</span><span class="n">wday_trip_end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">wday</span><span class="p">(</span><span class="n">Trip_End_Timestamp_formatted</span><span class="p">))]</span><span class="w"> </span><span class="c1"># Extract the trip end weekday</span><span class="w">
</span><span class="cd">#' Thanks, data.table; I owe you a cup of coffee.</span><span class="w">
</span><span class="cd">#' This is incredibly fast.</span><span class="w">
</span><span class="cd">#' Now count the trips for each census tract at whatever resolution you need.</span><span class="w">
</span><span class="n">TNP_Summary</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">TNP_Part</span><span class="p">[,</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">nTrip</span><span class="o">=</span><span class="n">.N</span><span class="p">,</span><span class="w"> </span><span class="c1"># Number of trips</span><span class="w">
</span><span class="n">aveTrip_TT_sec</span><span class="o">=</span><span class="n">mean</span><span class="p">(</span><span class="n">Trip_Seconds</span><span class="p">)),</span><span class="w"> </span><span class="c1"># Average travel time</span><span class="w">
</span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s1">'Trip_End_Date'</span><span class="p">,</span><span class="w">
</span><span class="s1">'end_hour_of_day'</span><span class="p">,</span><span class="w">
</span><span class="s1">'pickup_census_tract'</span><span class="p">,</span><span class="w">
</span><span class="s1">'dropoff_census_tract'</span><span class="p">)]</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">na.omit</span><span class="p">()</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">Sys.time</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">startTime</span><span class="p">)</span><span class="w">
</span><span class="cd">#' If you need to write this chunk to local disk, you can use fwrite with append = TRUE.</span><span class="w">
</span><span class="cd">#' fwrite is a function from the data.table library.</span><span class="w">
</span><span class="c1">#fwrite(TNP_Part,'TNP_Pickup_Dropoff.csv',append = T)</span><span class="w">
</span><span class="cd">#' That's it.</span><span class="w">
</span><span class="cd">#' Now return the analyzed chunk to the bind_rows function, which collects all the chunks into one data frame.</span><span class="w">
</span><span class="cd">#' My memory can now handle this.</span><span class="w">
</span><span class="nf">return</span><span class="p">(</span><span class="n">TNP_Summary</span><span class="p">)</span><span class="w">
</span><span class="p">}))</span><span class="w">
</span><span class="cd">#' Let's see how long the full analysis takes.</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">Sys.time</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">startTime_FullProcess</span><span class="p">)</span><span class="w">
</span><span class="cd">#' Time difference of 1.033505 hours</span><span class="w">
</span><span class="cd">#' Of course, if I had filtered out the records that already have census tracts, it would take much less time.</span><span class="w">
</span><span class="cd">#' Now let's run the final aggregation, then move on to the second part of the analysis.</span><span class="w">
</span><span class="n">finalMergedData</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mergedData</span><span class="p">[,</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">sum</span><span class="o">=</span><span class="nf">sum</span><span class="p">(</span><span class="n">nTrip</span><span class="p">)),</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s1">'Trip_End_Date'</span><span class="p">,</span><span class="w"> </span><span class="s1">'end_hour_of_day'</span><span class="p">,</span><span class="w"> </span><span class="s1">'pickup_census_tract'</span><span class="p">,</span><span class="s1">'dropoff_census_tract'</span><span class="p">)]</span><span class="w">
</span><span class="cd">#' As a final word, this script is memory-friendly and fast.</span><span class="w">
</span><span class="cd">#' I could also run this process in parallel, but the source data is big, so it could crash.</span><span class="w">
</span><span class="cd">#' It is still somewhat slow because of the spatial analysis.</span><span class="w">
</span><span class="cd">#' On my computer (Windows 10, Intel Xeon CPU E5-2687W v2 @ 3.40GHz), RStudio consumes 3.6GB of memory on average.</span><span class="w">
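</span><span class="cd">#' A possible next step (just a sketch with the column names above; not part of the original run):</span><span class="w">
</span><span class="cd">#' cast the long summary into a wide origin-destination matrix with data.table's dcast.</span><span class="w">
</span><span class="c1">#OD_matrix <- dcast(finalMergedData, pickup_census_tract ~ dropoff_census_tract, value.var = 'sum', fun.aggregate = sum, fill = 0)</span><span class="w">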
</span></code></pre></div></div>Olcay SahinI recently wrote some R code to analyze “rideshare trips” data from the Chicago data portal. The data contains individuals’ trips from origin coordinates to destination coordinates, along with the travel time, the cost of the trip, and some other information. The goal of using this data is travel demand modeling, where we need to know the zone-to-zone interaction. I am going to count how many trips have been made between the zones. However, there are some issues with the data set:Data.Table Samples2019-05-21T00:00:00+00:002019-05-21T00:00:00+00:00http://olcaysahin.com/Data.Table-Samples<ul>
<li>In this post I am going to share some useful and handy data.table examples that I have implemented in my code.</li>
</ul>
<p>Data.table cheat sheet <a href="https://s3.amazonaws.com/assets.datacamp.com/blog_assets/datatable_Cheat_Sheet_R.pdf" target="_blank">link</a></p>
<ul>
<li>Read a list of files at light speed:
I had around 10k CSV files with the same format. The code below reads them incredibly fast.
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data_files</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">list.files</span><span class="p">(</span><span class="s1">'../../../../../Data/'</span><span class="p">,</span><span class="w"> </span><span class="n">full.names</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">T</span><span class="p">,</span><span class="w"> </span><span class="n">recursive</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">T</span><span class="p">,</span><span class="w"> </span><span class="n">pattern</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'\\.csv$'</span><span class="p">)</span><span class="w"> </span><span class="c1"># anchor the extension so only .csv files match</span><span class="w">
</span><span class="n">l</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lapply</span><span class="p">(</span><span class="n">data_files</span><span class="p">,</span><span class="w"> </span><span class="n">fread</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">','</span><span class="p">)</span><span class="w"> </span><span class="c1">#Read the files</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbindlist</span><span class="p">(</span><span class="n">l</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">T</span><span class="p">)</span><span class="w">
</span></code></pre></div> </div>
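<p>A handy variant of the same pattern (just a sketch; <code>source_file</code> is a column name I made up): the <code>idcol</code> argument of <code>rbindlist</code> turns the list names into a column, so each row stays traceable to the file it came from.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>names(l) <- basename(data_files)                       # name each chunk after its source file
data <- rbindlist(l, fill = T, idcol = 'source_file')  # the file name becomes a column
</code></pre></div> </div>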
</li>
</ul>Olcay SahinIn this post I am going to share some useful and handy data.table examples that I have implemented in my code.TSSC2019 Signalized Intersections Challenge2019-03-19T00:00:00+00:002019-03-19T00:00:00+00:00http://olcaysahin.com/TSSC2019_Signalized_Intersections_Challenge<ul>
<li>
<p><a href="https://github.com/TSSC2019/Big_Data_Challenge_on_Signalized_Intersections" target="_blank">TSSC2019 - Big Data Challenge on Signalized Intersections</a></p>
<p>Another traffic-data-related challenge. This time the Traffic Signals Systems Committee (AHB25) is organizing a challenge on developing a visualization tool and an algorithm to assist the decision making of the Utah Department of Transportation. My former colleague Ilyas Ustun and I will enter the competition to provide a useful tool to UDOT.</p>
<p>Here is the <a href="https://github.com/TSSC2019/Big_Data_Challenge_on_Signalized_Intersections" target="_blank">link</a> for the competition Github Page.</p>
</li>
</ul>Olcay SahinTSSC2019 - Big Data Challenge on Signalized IntersectionsTRB 20192019-01-02T00:00:00+00:002019-01-02T00:00:00+00:00http://olcaysahin.com/TRB_2019<ul>
<li>
<p>2019 TRB Annual Meeting</p>
<p>I am going to be attending the Transportation Research Board (TRB)’s Annual Meeting in January 2019 at the Walter E. Washington Convention Center, in Washington, D.C. I have two accepted papers at this conference, one as the lead author and the other as a co-author. TRB’s Annual Meeting is one of the largest gatherings of transportation professionals and is expected to attract more than 13,000 participants from around the world.</p>
</li>
<li>
<p>Papers:</p>
</li>
</ul>
<ol>
<li>
<p><strong>Sahin, O.</strong>, R.V. Nezafat, Cetin, M. (2019). <em>Classification of Truck Trailers Based on Side-Fire LIDAR Data</em> In Transportation Research Board 98th Annual Meeting.</p>
<p>Poster presentation:
Monday, 10:15 AM - 12:00 PM
Convention Center, Hall A</p>
</li>
<li>
<p>R.V. Nezafat, <strong>Sahin, O.</strong>, Cetin, M. (2019). <em>A Deep Transfer Learning Approach for Classification of Truck Body Types Based on Side-Fire LIDAR Data</em> In Transportation Research Board 98th Annual Meeting.</p>
<p>Presentation:
Tuesday, 1:30 PM - 3:15 PM
Convention Center, 151B</p>
</li>
</ol>Olcay Sahin2019 TRB Annual MeetingTransfor192018-12-10T00:00:00+00:002018-12-10T00:00:00+00:00http://olcaysahin.com/transfor19<ul>
<li>
<p><a href="https://github.com/TRANSFORABJ70/TRANSFOR19">TRB-ABJ70 Transportation Forecasting Competition</a></p>
<p>My former colleague Ilyas Ustun and I entered the <a href="https://ta.itss-ieee.org/transportation-forecasting-competition-transfor-19-call-for-participation-2019-trb-annual-meeting-workshop/" target="_blank">TRB-ABJ70 Transportation Forecasting Competition</a>. The results will be announced on December 17th.</p>
</li>
</ul>
<p><strong>Update:</strong> According to the results, we did not place among the top 5 entrants. The source code will be made available on the competition’s GitHub page.</p>
<li>
<p><a href="https://smarter-roads-hackathon-vb.devpost.com/">VDOT SmarterRoads Hackaton</a></p>
<p>Old Dominion University’s Transportation Research Institute (TRI) attended the VDOT SmarterRoads Hackathon. We formed 3 teams and successfully completed our projects in the short time given.</p>
<p>My colleague Gulsevi and I developed a <a href="https://devpost.com/software/web-application-of-toll-based-route-guidance" target="_blank">web application for toll-based route guidance</a>. Toll data is parsed in real time from the <a href="http://smarterroads.org" target="_blank">SmarterRoads Data Portal</a>, while <a href="https://developers.google.com/maps/documentation/distance-matrix/intro" target="_blank">travel time</a> and <a href="https://developers.google.com/maps/documentation/directions/intro" target="_blank">direction</a> data are parsed from the <a href="https://cloud.google.com/maps-platform/" target="_blank">Google Maps API</a>.</p>
<p>When the user enters origin and destination locations, the parsed data is analyzed and multiple route options are suggested along with the tolling information. The user can select the desired route option for the destination.</p>
<p>The web application is written in R with the Shiny package. The source code can be found <a href="https://github.com/olcaysah/VDOT_Hackathon" target="_blank">here</a>. A Leaflet map is used for displaying the waypoints of the selected route.</p>
<p>The other teams’ projects are <a href="https://devpost.com/software/smartpave-smart-systematic-pavement-management" target="_blank">SmartPave</a> and <a href="https://devpost.com/software/vi-care" target="_blank">Vi-Care</a>.</p>
<p>Unfortunately, my project was not selected, but I had a great time and gained valuable experience.</p>
<p>In the hackathon, per the submission rules, we used only the provided data feeds. There are actually plenty of resources for data feeds; the list is <a href="https://docs.google.com/document/d/14WyrbsxvVkHfzruC_mCqdrKWr8J-5H-xzsy5mDcR650/edit" target="_blank">here</a>. Download it before it’s gone!</p>
<p>Here is a snapshot of the user interface. I want to note that it still needs to be improved.</p>
<p><img src="http://olcaysahin.com/pix/snapshot.PNG" alt="alt text" title="Toll-based route choice user interface" /></p>
</li>
</ul>Olcay SahinVDOT SmarterRoads HackatonHRBT Overheight Trucks Problem2017-09-10T00:00:00+00:002017-09-10T00:00:00+00:00http://olcaysahin.com/hrbt-overheight<ul>
<li>HRBT Overheight Trucks</li>
</ul>
<p>I have started a new project analyzing overheight truck turnarounds at the Hampton Roads Bridge-Tunnel (HRBT).</p>
<p>More information can be found <a href="https://www.odu.edu/news/2017/12/hrbt_trucks#.Wyla1VVKiUk" target="_blank">here</a>.</p>Olcay SahinHRBT Overheight Trucks