[728] GitHub MUSAE Data Set
0. Data ID: 728
1. Data name: GitHub MUSAE Data Set
2. Data source: The University of Edinburgh
3. Time span: up to 2019-12-09
4. Region:
5. Data size: 2.3 MB
6. Data format: csv
7. Description:
A social network of GitHub users with user-level attributes, connectivity data, and a binary target variable. This large social network of GitHub developers was collected from the public API in June 2019. Nodes are developers who have starred at least 10 repositories, and edges are mutual follower relationships between them. The vertex features are extracted from each user's location, starred repositories, employer, and e-mail address. The task associated with the graph is binary node classification: predict whether a GitHub user is a web developer or a machine learning developer. This target feature was derived from the job title of each user.
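The graph structure described above (developers as nodes, mutual-follower edges, one binary label per node) could be loaded roughly as in the following minimal sketch. The column names `id_1`, `id_2`, `id`, `name`, and `ml_target`, and the convention that 1 marks a machine learning developer and 0 a web developer, are assumptions made for illustration and are not stated in this entry; check them against the actual csv headers. The inline toy data stands in for the real files.

```python
import csv
import io
from collections import defaultdict


def load_graph(edges_file, target_file):
    """Build an undirected adjacency map and a node -> label map from two CSV streams."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(edges_file):
        # Assumed edge columns: id_1, id_2 (one mutual-follower pair per row).
        u, v = int(row["id_1"]), int(row["id_2"])
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Assumed target columns: id, name, ml_target (1 = ML developer, 0 = web developer).
    labels = {
        int(row["id"]): int(row["ml_target"])
        for row in csv.DictReader(target_file)
    }
    return dict(adjacency), labels


# Toy stand-ins for the real edge and target files (hypothetical contents).
EDGES_CSV = "id_1,id_2\n0,1\n0,2\n1,2\n2,3\n"
TARGET_CSV = "id,name,ml_target\n0,alice,0\n1,bob,1\n2,carol,0\n3,dave,1\n"

adjacency, labels = load_graph(io.StringIO(EDGES_CSV), io.StringIO(TARGET_CSV))

num_edges = sum(len(neighbors) for neighbors in adjacency.values()) // 2
ml_share = sum(labels.values()) / len(labels)

print(f"{len(labels)} nodes, {num_edges} edges")
print(f"share of nodes labeled 1: {ml_share:.2f}")
```

To use the real data, pass opened file handles (for example `open(path, newline="")`) in place of the `io.StringIO` objects. The class share printed at the end is worth checking before training a classifier, since an imbalanced target makes plain accuracy a misleading metric.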
References:
[1] Benedek Rozemberczki, Carl Allen, and Rik Sarkar. Multi-scale Attributed Node Embedding.