Yes, it is a clear question, and one I spent months trying to answer. If you don't want to read the whole post to get the answer, just follow the 4 steps below:
Step 1. Clone the CVEDataFeed repository and set it up by following its guide
> git clone https://github.com/cuongmx/CVEDataFeed.git
Step 2. Create a MongoDB database, using something like mLab or MongoDB Atlas
Step 3. Set up the environment and run the command to import the database from NVD
> python3 cvedatafeed.py importonline
Step 4. Build a frontend to browse all the collections in MongoDB (like https://cvedata.com)
And if you don't want to build a frontend either, please visit my new product, https://cvedata.com
In this post, I will note some key points from the process of building CVEData, whose main goal is to cover all the functions of CVEDetails.
0. A story
I work as a penetration tester and security consultant, so "Docx coding" (reporting) is one of the main tasks I do every day. One of the bug types I like to report is “Using Components with Known Vulnerabilities”, because all I have to do is paste the product's link from cvedetails.com. That is how I became a big fan of CVEDetails. One day, as usual, I pasted the link into my report, sent it to my customer, and went for a coffee. A moment later, my kind customer replied: “Where are my CVEs from 2020?”
Oh man, CVEDetails had not been updated since Nov 2019.
Like any fan following the trend, I first tried to find some alternatives.
Then, like a true fan, I emailed the author (Mr. Serkan Özkan) to ask for an update.
Mr. Serkan Özkan kindly replied and pointed me to another good product (vulniq.com). However, it did not feel familiar to me :(
So, in my email to Mr. Serkan Özkan, I promised to build another site if he discontinued CVEDetails.
I keep my word. So, about a month ago, I started researching CVEDetails. Fortunately, I found 2 keywords that make this task very feasible:
1. NVD Datasource
NVD (National Vulnerability Database - https://nvd.nist.gov/) is the original and fastest-updated data source for CVEs (not cve.mitre.org). The staff at NVD work very hard: they release CVE updates every 2 hours, including holidays (❤). To be clear, CVEDetails, CVEData, or any other CVE site just shows data from NVD in different ways.
To feed data from NVD, you just need to download the JSON files from https://nvd.nist.gov/vuln/data-feeds (all CVE data since 1999). To keep the data up to date, you need to follow the CVE-Modified and CVE-Recent feeds. Note: CVE-Modified and CVE-Recent only hold the most recent 7 days of data, so the update job must run at least once every 7 days.
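The update job can be sketched roughly as follows. This is only a minimal illustration, not the code from CVEDataFeed: it assumes the layout of the legacy NVD JSON feeds (a top-level "CVE_Items" list, the ID under cve.CVE_data_meta.ID, and a "lastModifiedDate" such as "2020-03-01T10:15Z"), and it uses a plain dict as a stand-in for the MongoDB collection. The names parse_ts and merge_feed are mine.

```python
from datetime import datetime


def parse_ts(value):
    # Assumed timestamp format of the legacy feeds, e.g. "2020-03-01T10:15Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%MZ")


def merge_feed(store, feed):
    """Upsert every item of a CVE-Modified/CVE-Recent style feed into `store`.

    `store` is a dict keyed by CVE ID (a stand-in for a database collection).
    An item replaces the stored copy only if its lastModifiedDate is newer.
    Returns the list of CVE IDs that were inserted or updated.
    """
    changed = []
    for item in feed.get("CVE_Items", []):
        cve_id = item["cve"]["CVE_data_meta"]["ID"]
        old = store.get(cve_id)
        is_newer = old is None or (
            parse_ts(item["lastModifiedDate"]) > parse_ts(old["lastModifiedDate"])
        )
        if is_newer:
            store[cve_id] = item
            changed.append(cve_id)
    return changed
```

Running the same feed twice changes nothing the second time, so the job is safe to schedule more often than every 7 days.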
2. CPE Name
CPE (Common Platform Enumeration - https://nvd.nist.gov/products/cpe) is a naming scheme, maintained by NVD, that uniquely identifies systems, software, and packages as a structured string.
For example:
cpe:2.3:o:linux:linux_kernel:2.4.7:*:*:*:*:*:*:* identifies the Linux Kernel product, version 2.4.7, from the vendor Linux, with the type Operating System (o). The 2.3 after cpe is the version of the CPE specification.
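To make the structure concrete, here is a small sketch that splits a CPE 2.3 formatted string into its 11 named attributes. The function name parse_cpe23 is mine, and the escaping is simplified: a colon preceded by a backslash is kept inside its field, but rarer cases such as an escaped backslash right before a colon are not handled.

```python
import re

# The 11 attributes of a CPE 2.3 formatted string, in order.
CPE_FIELDS = [
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
]


def parse_cpe23(cpe):
    """Split a CPE 2.3 formatted string into a dict of its attributes."""
    # Split on ":" unless it is escaped as "\:" (simplified handling).
    parts = re.split(r"(?<!\\):", cpe)
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError("not a CPE 2.3 formatted string: %r" % cpe)
    return dict(zip(CPE_FIELDS, parts[2:]))


info = parse_cpe23("cpe:2.3:o:linux:linux_kernel:2.4.7:*:*:*:*:*:*:*")
print(info["part"], info["vendor"], info["product"], info["version"])
```

With vendor and product extracted like this, a site can group CVEs by product the way CVEDetails does.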
3. Some others
There are some other problems I had to solve:
Vulnerability type
This is a very useful feature of CVEDetails. However, it is not in the original NVD data. According to the author, he matches keywords and CWEs to classify each CVE.
Following the author's guide, I tried some statistical algorithms and arrived at a good set of keywords and CWEs. I compared the original CVEDetails classification against my set, and the results agree about 99% of the time.
The comparison results are below; more details are on GitHub:
#testFilter("exec code",[r"(code|command).*(execution|execute)", r"(execution|execute).*(code|command)"])
#out: 10552/10552
#testFilter("dos",[r"denial of service"])
#out: 8260/8260
#testFilter("overflow",[r"overflow", r"(restrict|crash|invalid|violat|corrupt).*(buffer|stack|heap|memory)", r"(buffer|stack|heap|memory).*(restrict|crash|invalid|violat|corrupt)"])
#out: 5242/5814
#testFilter("priv",[r"(gain|escalat).*privil", r"privil.*(gain|escalat)"])
#out: 1910/1910
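For illustration, the keyword side of this classification could look like the sketch below. It reuses the regular expressions from the results above, but the PATTERNS table, the classify function, and the choice to lowercase the description first are my assumptions here, not the actual testFilter code, and the CWE matching is left out.

```python
import re

# Keyword patterns per vulnerability type, copied from the results above.
PATTERNS = {
    "exec code": [r"(code|command).*(execution|execute)",
                  r"(execution|execute).*(code|command)"],
    "dos": [r"denial of service"],
    "overflow": [r"overflow",
                 r"(restrict|crash|invalid|violat|corrupt).*(buffer|stack|heap|memory)",
                 r"(buffer|stack|heap|memory).*(restrict|crash|invalid|violat|corrupt)"],
    "priv": [r"(gain|escalat).*privil", r"privil.*(gain|escalat)"],
}


def classify(description):
    """Return every vulnerability type whose patterns match the description."""
    text = description.lower()  # assumption: matching is case-insensitive
    return [label for label, patterns in PATTERNS.items()
            if any(re.search(p, text) for p in patterns)]


print(classify("Buffer overflow in foo allows remote attackers to cause "
               "a denial of service or execute arbitrary code."))
```

A single CVE can carry several types at once, which is why the function returns a list rather than one label.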
CVSS v2 or v3
In fact, not all CVEs have only a CVSS2 score, yet CVEDetails shows only the CVSS2 vector string. This limitation cannot fully show the complexity of CVEs. In CVEData, I prioritize CVSS3, and when displaying on the website, I try to convert some CVSS2 fields to CVSS3.
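The "prioritize CVSS3" rule can be sketched like this. It assumes the "impact" layout of the legacy NVD JSON feeds (baseMetricV3.cvssV3 and baseMetricV2.cvssV2, each with baseScore and vectorString); pick_cvss is a name I made up, and the conversion of CVSS2 fields to CVSS3 is not shown.

```python
def pick_cvss(item):
    """Return the CVSS3 score of a CVE item if present, else the CVSS2 one.

    The result is a dict with "version", "score" and "vector",
    or None when the item has no score at all.
    """
    impact = item.get("impact", {})
    v3 = impact.get("baseMetricV3", {}).get("cvssV3")
    if v3:
        return {"version": v3.get("version", "3"),
                "score": v3["baseScore"],
                "vector": v3["vectorString"]}
    v2 = impact.get("baseMetricV2", {}).get("cvssV2")
    if v2:
        return {"version": v2.get("version", "2.0"),
                "score": v2["baseScore"],
                "vector": v2["vectorString"]}
    return None
```

Older CVEs usually fall through to the CVSS2 branch, so the frontend still has something to show for them.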
CVE Name
This is something missing from CVEDetails and other bug data sites. I build it from the CVE ID, the vulnerability type, and the affected vendor and product.
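As a rough idea of how such a name can be assembled, here is a tiny sketch. The exact wording and ordering used on CVEData may differ; build_cve_name and its output format are hypothetical.

```python
def build_cve_name(cve_id, vuln_types, vendor, product):
    """Compose a readable name from the CVE ID, vulnerability types and target.

    The output format is illustrative only.
    """
    kind = ", ".join(vuln_types) if vuln_types else "Vulnerability"
    target = " ".join(part for part in (vendor, product) if part)
    if target:
        return "%s: %s in %s" % (cve_id, kind, target)
    return "%s: %s" % (cve_id, kind)
```

The vendor and product would come from the CPE name, and the types from the keyword and CWE classification.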
4. CVEData architecture
This is a community project, so cost matters. There were 3 criteria for choosing the architecture:
- Full automation, no operations needed
- Good Vendor, Good Infrastructure
- Free or cheap
Detailed configuration:
- Protection, HTTPS: Cloudflare ~ free
- Front-end: Django running on Google App Engine ~ free for 1000 hours/month :-S
- Back-end: Google Cloud Functions triggered by Cloud Scheduler ~ free for 3 jobs
- DB: MongoDB Atlas, free up to 500 MB of data; the total size is about 700 MB, but I have a voucher for 1 year ~ free for 1 year (I hope CVEData lives longer than 1 year :-P)
- Monitor: UptimeRobot ~ free
- Source repo: GitHub
5. Next steps
I know threat intelligence is what's trending now. However, the classic style (like a dictionary) still has its value, at least for me. Going forward, I have some ideas to continue with:
- Build bug trending to catch bug bounty trends $_$
- CVE Awards: best CVE, hottest CVE, voting,…
- Add more data sources to get each CVE's author and build a Hall of Fame for CVEs.
If you enjoy this or have any other ideas, please let me know. Thank you!