Getting Lighthouse scores from HTTPArchive for sites in India.



This content originally appeared on Modern Web Development with Chrome and was authored by Paul Kinlan

<p>I'm about to go on a short trip to India, and I've been thinking about longer-term developer relations work for Chrome and Web in the region. As with most trips, I like to do a bit of research ahead of time so I can get a better understanding of what the web looks like from the perspective of the country I am visiting.</p>
<p>I've been following a bunch of the updates to <a href="https://httparchive.org/">HTTPArchive</a> over the past couple of months and it's been amazing to see the improvements to the types of data it collects and stores in its <a href="https://github.com/HTTPArchive/legacy.httparchive.org/blob/master/docs/bigquery-gettingstarted.md">BigQuery</a> tables. One specific piece of information that is of massive interest to me is the <a href="https://developers.google.com/web/tools/lighthouse/">Lighthouse</a> data generated on each run of HTTPArchive. I was keen to see if I could use this data to get a snapshot and a high-level understanding of how people might experience the web in the country.</p>
<p>The good news is that it's not too hard to analyse the Lighthouse data in HTTPArchive.</p>
<p>For my needs though, the harder part is to get a lock on what a 'top site' in any given country is, especially when I am thinking about developer relations work that we could and should be doing.</p>
<p>Here is how I broke the problem down. In each country there are many types of developers that build for the web, and personally I tend to bucket them into three groups: those whose current project targets the local market; those that target a foreign market (i.e. building for export); and those that target a global audience.</p>
<p>When I think about the above three groups, it's nearly impossible to work out the intent of a site and the people behind it. 
But there are some heuristics that you can use to at least help you reason about and understand the data.</p>
<p>For my analysis I didn't think I could get a list of the top sites visited by users in India, so I made a simple assumption that '.in' domains are <em>likely</em> to be built for people in India. Focusing on '.in' domains does not give 100% sensitivity or specificity for the question of 'Indian sites' (users all over the world use experiences that aren't locked to their country's TLD), but it seems like a decent measure of the state of Indian sites as a first pass.</p>
<p>This type of analysis turns out to be pretty easy. You open up <a href="https://github.com/HTTPArchive/legacy.httparchive.org/blob/master/docs/bigquery-gettingstarted.md">BigQuery</a>, find the latest table that contains the Lighthouse data ([httparchive:lighthouse.2018_08_01_mobile] in this case), and run the following query.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-sql" data-lang="sql"><span style="color:#66d9ef">SELECT</span>
  url,
  JSON_EXTRACT(report, <span style="color:#e6db74">'$.categories.seo.score'</span>) <span style="color:#66d9ef">AS</span> [seo_score],
  JSON_EXTRACT(report, <span style="color:#e6db74">'$.categories.pwa.score'</span>) <span style="color:#66d9ef">AS</span> [pwa_score],
  JSON_EXTRACT(report, <span style="color:#e6db74">'$.categories.performance.score'</span>) <span style="color:#66d9ef">AS</span> [speed_score],
  JSON_EXTRACT(report, <span style="color:#e6db74">'$.categories.accessibility.score'</span>) <span style="color:#66d9ef">AS</span> [accessibility_score]
<span style="color:#66d9ef">FROM</span>
  [httparchive:lighthouse.<span style="color:#ae81ff">2018</span>_08_01_mobile]
<span style="color:#66d9ef">WHERE</span>
  url <span style="color:#66d9ef">LIKE</span> <span style="color:#e6db74">'%.in/'</span>
</code></pre></div>
<p>The above query is filtered on 
domains ending in '.in', and it returns the Lighthouse score for each of the Lighthouse test categories. The Lighthouse data is stored as a JSON object, so you have to extract the required components using a JSONPath expression (an XPath-like syntax for JSON).</p>
<p>The number of results is pretty large and not of much use to present here, so I pivoted them into a histogram.</p>
<table>
<thead>
<tr> <th>Score Range</th> <th>SEO Score</th> <th>PWA Score</th> <th>Speed Score</th> <th>A11Y Score</th> </tr>
</thead>
<tbody>
<tr> <td>0</td> <td>0</td> <td>46</td> <td>279</td> <td>25</td> </tr>
<tr> <td>0.5</td> <td>84</td> <td>13992</td> <td>6502</td> <td>3973</td> </tr>
<tr> <td>0.7</td> <td>3391</td> <td>1400</td> <td>2222</td> <td>7585</td> </tr>
<tr> <td>0.8</td> <td>1438</td> <td>19</td> <td>1147</td> <td>2374</td> </tr>
<tr> <td>0.9</td> <td>2762</td> <td>9</td> <td>1545</td> <td>1069</td> </tr>
<tr> <td>1</td> <td>7752</td> <td>13</td> <td>3189</td> <td>434</td> </tr>
</tbody>
</table>
<p>Further drill-down and analysis is needed to understand exactly which issues are affecting the scores. However, in some cases, such as the 'PWA Score', I've seen enough site scores in the past to know which issues affect the overall score, and I can already see some of the challenges ahead of us.</p>
<p>Next up: try and find a way to get the sites that Indian users frequent. Hint: it's <a href="https://paul.kinlan.me/crux-topsites-and-lighthouse-scores-for-india/">here</a>.</p>
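<p>The article does not show how the query results were pivoted into the histogram, so here is a minimal Python sketch of one way it could be done. It assumes each value in the 'Score Range' column is the inclusive upper edge of a bin, which is a guess rather than something the article states. The names <code>BIN_EDGES</code>, <code>parse_score</code> and <code>histogram</code> are hypothetical helpers, and the input is assumed to be one column of raw <code>JSON_EXTRACT</code> results (strings, or missing values).</p>

```python
from bisect import bisect_left

# Assumed bin edges, copied from the "Score Range" column of the table.
# Treating them as inclusive upper bounds is an assumption, not the
# author's documented method.
BIN_EDGES = [0.0, 0.5, 0.7, 0.8, 0.9, 1.0]


def parse_score(raw):
    """Convert a raw JSON_EXTRACT value (string or None) to a float, or None."""
    if raw is None or raw == "null":
        return None
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def histogram(raw_scores, edges=BIN_EDGES):
    """Count scores per bin, keyed by the upper edge of each bin."""
    counts = {edge: 0 for edge in edges}
    for raw in raw_scores:
        score = parse_score(raw)
        # Skip missing values and anything above the last edge.
        if score is None or score > edges[-1]:
            continue
        # bisect_left finds the first edge that is >= score.
        counts[edges[bisect_left(edges, score)]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the HTTPArchive data.
    sample = ["0", "0.45", "0.7", "0.71", "1", "null", None]
    for edge, count in histogram(sample).items():
        print(f"{edge}\t{count}")
```

<p>Running the function once per score column (SEO, PWA, speed, accessibility) would give one column of counts each, which is the shape of the table above.</p>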


Paul Kinlan | Sciencx (2018-08-24T08:09:10+00:00) Getting Lighthouse scores from HTTPArchive for sites in India. Retrieved from https://www.scien.cx/2018/08/24/getting-lighthouse-scores-from-httparchive-for-sites-in-india/
