Advertisement
Advertisement


Can you get the number of lines of code from a GitHub repository?


Question

In a GitHub repository you can see “language statistics”, which displays the percentage of the project that’s written in a language. It doesn’t, however, display how many lines of code the project consists of. Often, I want to quickly get an impression of the scale and complexity of a project, and the count of lines of code can give a good first impression. 500 lines of code implies a relatively simple project, 100,000 lines of code implies a very large/complicated project.

So, is it possible to get the lines of code written in the various languages from a GitHub repository, preferably without cloning it?


The question “Count number of lines in a git repository” asks how to count the lines of code in a local Git repository, but:

  1. You have to clone the project, which could be massive. Cloning a project like Wine, for example, takes ages.
  2. You would count lines in files that wouldn’t necessarily be code, like i13n files.
  3. If you count just (for example) Ruby files, you’d potentially miss massive amount of code in other languages, like JavaScript. You’d have to know beforehand which languages the project uses. You’d also have to repeat the count for every language the project uses.

All in all, this is potentially far too time-intensive for “quickly checking the scale of a project”.

2018/10/30
1
443
10/30/2018 4:00:42 AM

Accepted Answer

Not currently possible on Github.com or their API-s

I have talked to customer support and confirmed that this can not be done on github.com. They have passed the suggestion along to the Github team though, so hopefully it will be possible in the future. If so, I'll be sure to edit this answer.

Meanwhile, Rory O'Kane's answer is a brilliant alternative based on cloc and a shallow repo clone.

2017/05/23
35
5/23/2017 12:10:43 PM


You can run something like

git ls-files | xargs wc -l

which will give you the total count →

lines of code

Or use this tool → http://line-count.herokuapp.com/

2020/02/23

There is an extension for Google Chrome browser - GLOC which works for public and private repos.

Counts the number of lines of code of a project from:

  • project detail page
  • user's repositories
  • organization page
  • search results page
  • trending page
  • explore page

enter image description here enter image description here enter image description here enter image description here enter image description here enter image description here enter image description here

2020/01/21

If you go to the graphs/contributors page, you can see a list of all the contributors to the repo and how many lines they've added and removed.

Unless I'm missing something, subtracting the aggregate number of lines deleted from the aggregate number of lines added among all contributors should yield the total number of lines of code in the repo. (EDIT: it turns out I was missing something after all. Take a look at orbitbot's comment for details.)

UPDATE:

This data is also available in GitHub's API. So I wrote a quick script to fetch the data and do the calculation:

'use strict';

function countGithub(repo) {
fetch('https://api.github.com/repos/'+repo+'/stats/contributors')
    .then(response => response.json())
    .then(contributors => contributors
        .map(contributor => contributor.weeks
            .reduce((lineCount, week) => lineCount + week.a - week.d, 0)))
    .then(lineCounts => lineCounts.reduce((lineTotal, lineCount) => lineTotal + lineCount))
    .then(lines => window.alert(lines));
}

countGithub('jquery/jquery'); // or count anything you like

Just paste it in a Chrome DevTools snippet, change the repo and click run.

Disclaimer (thanks to lovasoa):

Take the results of this method with a grain of salt, because for some repos (sorich87/bootstrap-tour) it results in negative values, which might indicate there's something wrong with the data returned from GitHub's API.

UPDATE:

Looks like this method to calculate total line numbers isn't entirely reliable. Take a look at orbitbot's comment for details.

2020/02/01

You can clone just the latest commit using git clone --depth 1 <url> and then perform your own analysis using Linguist, the same software Github uses. That's the only way I know you're going to get lines of code.

Another option is to use the API to list the languages the project uses. It doesn't give them in lines but in bytes. For example...

$ curl https://api.github.com/repos/evalEmpire/perl5i/languages
{
  "Perl": 274835
}

Though take that with a grain of salt, that project includes YAML and JSON which the web site acknowledges but the API does not.

Finally, you can use code search to ask which files match a given language. This example asks which files in perl5i are Perl. https://api.github.com/search/code?q=language:perl+repo:evalEmpire/perl5i. It will not give you lines, and you have to ask for the file size separately using the returned url for each file.

2014/11/12

You can use GitHub API to get the sloc like the following function

function getSloc(repo, tries) {

    //repo is the repo's path
    if (!repo) {
        return Promise.reject(new Error("No repo provided"));
    }

    //GitHub's API may return an empty object the first time it is accessed
    //We can try several times then stop
    if (tries === 0) {
        return Promise.reject(new Error("Too many tries"));
    }

    let url = "https://api.github.com/repos" + repo + "/stats/code_frequency";

    return fetch(url)
        .then(x => x.json())
        .then(x => x.reduce((total, changes) => total + changes[1] + changes[2], 0))
        .catch(err => getSloc(repo, tries - 1));
}

Personally I made an chrome extension which shows the number of SLOC on both github project list and project detail page. You can also set your personal access token to access private repositories and bypass the api rate limit.

You can download from here https://chrome.google.com/webstore/detail/github-sloc/fkjjjamhihnjmihibcmdnianbcbccpnn

Source code is available here https://github.com/martianyi/github-sloc

2017/03/30