GitHub datagen #107

Open: wants to merge 3 commits into base: master

davetroiano (Contributor) commented Jul 8, 2022

No description provided.

Using the .avsc extension for the schema files is standard, and editing them in IDEs is friendlier with that extension.
@davetroiano davetroiano requested a review from a team as a code owner July 8, 2022 20:34
davetroiano (Contributor, Author) commented Jul 8, 2022

Sample datagen output:

{
   "type":"COMMITS",
   "createdAt":1657053559000,
   "data":{
      "sha":"dqsvg43667u67rn693aj97oo57m046mtk776w1e5",
      "node_id":"BYdX_r_9nAFkUhDZ_kT5ntC_2__7_FG__9_ijXOw",
      "commit":{
         "author":{
            "name":"Matthew Miller",
            "email":"[email protected]",
            "date":1657053559000
         },
         "committer":{
            "name":"GitHub",
            "email":"n\u0011QgNf<pb!(",
            "date":1657053559000
         },
         "message":"Merge pull request #45 from org88/branch-patch-14\\n\\nUpdate",
         "tree":{
            "sha":"345bp1ri36tb45k85t9qobalc27gpl91vg5m1svi",
            "url":"https://api.github.com/repos/org83/ghconnect/git/trees/hrnfxqoi3o3zgs725gs5a1342867y2o4r1lz3u7w"
         },
         "url":"https://api.github.com/repos/org29/ghconnect/git/trees/900x05d122d1mabi37e2nqfzv3s69dbvw0p03q82",
         "comment_count":3,
         "verification":{
            "verified":true,
            "reason":"valid",
            "signature":"-----BEGIN PGP SIGNATURE-----\\npx9rmk0dl1v42wu590eqy0ojq6ft1l93443d043j0590ic5gzlf7n65j891c6nkx9s3jjwoygk338h590l38td302rs4036c6w33ms54pnu0x531p594nvovu0jcocd916dkancz76wm48w1v7qo3k9r3whf7h4yugi9czb596kd9c4no29m5a7p8ka0f9ah4bq3p0pn68pgzvc6c797lo7l86d13d13f639rogm5e9gtc15oc2w6qoqtwnf4j0gc02fibp6f39bd8y70mxg458r07z192i280xa92dse28gqn54c0j2assc00b0nfe51q87sxmhv51876b39n24t3hc722kyv058jp1do14p1t2v61p8q91r3di22b29m34i42wo8jvr547j4zojv18zpd747q7j12daa4531uzeiidlq2c9hq7qd4g4nz5vl3756vlx9ph459bop9n7b9190vf9g86wg487q76aib6c4y65c99417x7rt5rkpayv0x\\n-----END PGP SIGNATURE-----",
            "payload":"tree h91ceqful2y6hb49c51f9328k13nt70579amu3r5\\nparent at9domrzn543i453i328j497cl9f87hm9a287qi6\\nparent 7fhs0x6426i615l9bxv9lll8oeplup8wwi892kxl\\nauthor First Last <> 1657053439 -0400\\ncommitter GitHub <[email protected]> 1657053439 -0400\\n\\nMerge pull request #63 from org27/patch-43\\n\\nUpdate"
         }
      },
      "url":"https://api.github.com/repos/org74/ghconnect/commits/kui90x264125p2ss4m95ox77olkv0e79847049ce",
      "html_url":"https://github.com/org96/ghconnect/commit/gt5wt413066d3bj63dr2j0995h804337bo9k706v",
      "comments_url":"https://api.github.com/repos/org63/ghconnect/commits/xy9wu07q7afl40qx220i49km9m2j206532145yzu/comments",
      "author":{
         "login":"user95",
         "id":829082,
         "node_id":"ulk8B4P4prbaR0R6Aky37dXl4a06ai29oKJYRt73=",
         "avatar_url":"https://avatars.githubusercontent.com/u/tnmhxkol",
         "gravatar_id":"hkxjcxoy",
         "url":"https://api.github.com/users/org64",
         "html_url":"https://github.com/org68",
         "followers_url":"https://api.github.com/users/org83/followers",
         "following_url":"https://api.github.com/users/org56/following{/other_user}",
         "gists_url":"https://api.github.com/users/org17/gists{/gist_id}",
         "starred_url":"https://api.github.com/users/org39/starred{/owner}{/repo}",
         "subscriptions_url":"https://api.github.com/users/org22/subscriptions",
         "organizations_url":"https://api.github.com/users/org74/orgs",
         "repos_url":"https://api.github.com/users/org72/repos",
         "events_url":"https://api.github.com/users/org17/events{/privacy}",
         "received_events_url":"https://api.github.com/users/org78/received_events",
         "type":"User",
         "site_admin":false
      },
      "committer":{
         "login":"user27",
         "id":829082,
         "node_id":"b1c46970cRe92QSTLC3gPm0KfYF3zU4WYpQZCNpE=",
         "avatar_url":"https://avatars.githubusercontent.com/u/lqsgyrcg",
         "gravatar_id":"ovogkmii",
         "url":"https://api.github.com/users/org68",
         "html_url":"https://github.com/org23",
         "followers_url":"https://api.github.com/users/org53/followers",
         "following_url":"https://api.github.com/users/org44/following{/other_user}",
         "gists_url":"https://api.github.com/users/org32/gists{/gist_id}",
         "starred_url":"https://api.github.com/users/org54/starred{/owner}{/repo}",
         "subscriptions_url":"https://api.github.com/users/org44/subscriptions",
         "organizations_url":"https://api.github.com/users/org38/orgs",
         "repos_url":"https://api.github.com/users/org31/repos",
         "events_url":"https://api.github.com/users/org88/events{/privacy}",
         "received_events_url":"https://api.github.com/users/org46/received_events",
         "type":"User",
         "site_admin":false
      },
      "parents":[
         {
            "sha":"81ww1l49o57tt6z17gm39l6p95u9lniqrm6h24cc",
            "url":"https://api.github.com/repos/org72/ghconnect/commits/n99871qgd16qig91kj0383bkae6hw533u6t7518b",
            "html_url":"https://github⚼com/org68/ghconnect/commit/mang3u8pu5163vt4arb30809wh9d576zcjy90jgd"
         },
         {
            "sha":"1h23v3h12ark725q26vyrk2z344lv6o6z7o11q2t",
            "url":"https://api.github.com/repos/org75/ghconnect/commits/xldo72987c2s0ellf30mt2s2jzj6hm7131ynakl2",
            "html_url":"https://github贐com/org49/ghconnect/commit/vax3s14w3ec36u4rubbx70fv7u43mek2z158x05j"
         }
      ]
   },
   "id":"87z1z4lkg4v54pdz3ei41qic0oi4y385sp9700ih"
}

Real commit from the GitHub connector:

{
   "type":"COMMITS",
   "createdAt":1657053439000,
   "data":{
      "sha":"d2dbd57d70e77e4b9409c504d948b8f01a2a7664",
      "node_id":"C_kwDOHnN-T9oAKGQyZGJkNTdkNzBlNzdlNGI5NDA5YzUwNGQ5NDhiOGYwMWEyYTc2NjQ",
      "commit":{
         "author":{
            "name":"Dave Troiano",
            "email":"[email protected]",
            "date":1657053439000
         },
         "committer":{
            "name":"GitHub",
            "email":"[email protected]",
            "date":1657053439000
         },
         "message":"Merge pull request #1 from davetroiano/davetroiano-patch-1\n\nUpdate README.md",
         "tree":{
            "sha":"80b40897a126c3b758f796ab5c863c91f09e2c90",
            "url":"https://api.github.com/repos/davetroiano/ghconnect/git/trees/80b40897a126c3b758f796ab5c863c91f09e2c90"
         },
         "url":"https://api.github.com/repos/davetroiano/ghconnect/git/commits/d2dbd57d70e77e4b9409c504d948b8f01a2a7664",
         "comment_count":0,
         "verification":{
            "verified":true,
            "reason":"valid",
            "signature":"-----BEGIN PGP SIGNATURE-----\n\nwsBcBAABCAAQBQJixKD/CRBK7hj4Ov3rIwAARt8IAA3L0Jp7uq89NDMl4RvRBzEE\n4ShFtqsymTwBfttpa0R8LVXS6G8T+E0sa5QvkzmRoMPPMnpk6v7dzputjW+443sU\nmiEnI5QJN/vk1xelksG4JTchaGC49XHH/7RlMp+pkTEc2lg849TRdLj36Up94QQ3\nTX3PsRDRMRROv/DH7Ffc7z8PeGM3xSP71STvoVAan8QZUr6JaJYG0l7ytQqW5etm\n8dXOtXvXJvGdjuQSOcBp7g4xN8QgfsHUo4E7IcJg4GlF5r05dyS7lH8bnvhMZPBk\nSSORmwbGkCb68skxxy0/+shPddnYM9Lond2ZOh3q+LhjRxFJdt4q0Ftl1AQv6L8=\n=V4pt\n-----END PGP SIGNATURE-----\n",
            "payload":"tree 80b40897a126c3b758f796ab5c863c91f09e2c90\nparent fe5ad67363f714ba343bbfbbe3c1a1d9ec67746f\nparent 80123e4683803c00d240086eb5563578d00bdeae\nauthor Dave Troiano <[email protected]> 1657053439 -0400\ncommitter GitHub <[email protected]> 1657053439 -0400\n\nMerge pull request #1 from davetroiano/davetroiano-patch-1\n\nUpdate README.md"
         }
      },
      "url":"https://api.github.com/repos/davetroiano/ghconnect/commits/d2dbd57d70e77e4b9409c504d948b8f01a2a7664",
      "html_url":"https://github.com/davetroiano/ghconnect/commit/d2dbd57d70e77e4b9409c504d948b8f01a2a7664",
      "comments_url":"https://api.github.com/repos/davetroiano/ghconnect/commits/d2dbd57d70e77e4b9409c504d948b8f01a2a7664/comments",
      "author":{
         "login":"davetroiano",
         "id":4550245,
         "node_id":"MDQ6VXNlcjQ1NTAyNDU=",
         "avatar_url":"https://avatars.githubusercontent.com/u/4550245?v=4",
         "gravatar_id":"",
         "url":"https://api.github.com/users/davetroiano",
         "html_url":"https://github.com/davetroiano",
         "followers_url":"https://api.github.com/users/davetroiano/followers",
         "following_url":"https://api.github.com/users/davetroiano/following{/other_user}",
         "gists_url":"https://api.github.com/users/davetroiano/gists{/gist_id}",
         "starred_url":"https://api.github.com/users/davetroiano/starred{/owner}{/repo}",
         "subscriptions_url":"https://api.github.com/users/davetroiano/subscriptions",
         "organizations_url":"https://api.github.com/users/davetroiano/orgs",
         "repos_url":"https://api.github.com/users/davetroiano/repos",
         "events_url":"https://api.github.com/users/davetroiano/events{/privacy}",
         "received_events_url":"https://api.github.com/users/davetroiano/received_events",
         "type":"User",
         "site_admin":false
      },
      "committer":{
         "login":"web-flow",
         "id":19864447,
         "node_id":"MDQ6VXNlcjE5ODY0NDQ3",
         "avatar_url":"https://avatars.githubusercontent.com/u/19864447?v=4",
         "gravatar_id":"",
         "url":"https://api.github.com/users/web-flow",
         "html_url":"https://github.com/web-flow",
         "followers_url":"https://api.github.com/users/web-flow/followers",
         "following_url":"https://api.github.com/users/web-flow/following{/other_user}",
         "gists_url":"https://api.github.com/users/web-flow/gists{/gist_id}",
         "starred_url":"https://api.github.com/users/web-flow/starred{/owner}{/repo}",
         "subscriptions_url":"https://api.github.com/users/web-flow/subscriptions",
         "organizations_url":"https://api.github.com/users/web-flow/orgs",
         "repos_url":"https://api.github.com/users/web-flow/repos",
         "events_url":"https://api.github.com/users/web-flow/events{/privacy}",
         "received_events_url":"https://api.github.com/users/web-flow/received_events",
         "type":"User",
         "site_admin":false
      },
      "parents":[
         {
            "sha":"fe5ad67363f714ba343bbfbbe3c1a1d9ec67746f",
            "url":"https://api.github.com/repos/davetroiano/ghconnect/commits/fe5ad67363f714ba343bbfbbe3c1a1d9ec67746f",
            "html_url":"https://github.com/davetroiano/ghconnect/commit/fe5ad67363f714ba343bbfbbe3c1a1d9ec67746f"
         },
         {
            "sha":"80123e4683803c00d240086eb5563578d00bdeae",
            "url":"https://api.github.com/repos/davetroiano/ghconnect/commits/80123e4683803c00d240086eb5563578d00bdeae",
            "html_url":"https://github.com/davetroiano/ghconnect/commit/80123e4683803c00d240086eb5563578d00bdeae"
         }
      ]
   },
   "id":"d2dbd57d70e77e4b9409c504d948b8f01a2a7664"
}
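
One way to check that a generated record has the same shape as a real one is to compare the key paths of the two JSON documents. The following is a minimal sketch, not part of this PR: `key_paths` and `shape_diff` are hypothetical helper names, and the two inline records are heavily trimmed stand-ins for the full samples above.

```python
import json


def key_paths(value, prefix=""):
    """Return the set of dotted key paths found in a parsed JSON value."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list):
        # List elements share one path segment, written as "[]".
        for child in value:
            paths |= key_paths(child, f"{prefix}[]")
    return paths


def shape_diff(generated, real):
    """Return (paths missing from generated, paths only in generated)."""
    gen_paths = key_paths(generated)
    real_paths = key_paths(real)
    return sorted(real_paths - gen_paths), sorted(gen_paths - real_paths)


# Trimmed, made-up stand-ins for the two records (assumption for illustration).
generated = json.loads(
    '{"type": "COMMITS", "data": {"sha": "abc", "parents": [{"sha": "p1"}]}}'
)
real = json.loads(
    '{"type": "COMMITS", "data": {"sha": "def", "parents": [{"sha": "p2", "url": "u"}]}}'
)

missing, extra = shape_diff(generated, real)
print("missing from generated:", missing)
print("only in generated:", extra)
```

This compares structure only; it says nothing about whether the generated values are realistic.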

@davetroiano davetroiano force-pushed the github-datagen branch 3 times, most recently from 9829a7e to a15e630 Compare July 11, 2022 18:40

@bbejeck bbejeck left a comment

Great work, @davetroiano! I've looked the PR over and it LGTM.
I like that the user IDs overlap with those in the other generated records; that should make for a rich environment for joins, aggregations, etc.
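
To illustrate the kind of join and aggregation this overlap could enable, here is a small sketch. It rests on assumptions: the `userid` and `regionid` fields on the user records are hypothetical, and matching them against the commit author's `login` is a guess at how the overlap would be used, not something this PR specifies.

```python
from collections import Counter

# Hypothetical commit records, trimmed to the fields used here.
commits = [
    {"data": {"sha": "a1", "author": {"login": "user95"}}},
    {"data": {"sha": "b2", "author": {"login": "user27"}}},
    {"data": {"sha": "c3", "author": {"login": "user95"}}},
]

# Hypothetical user records from another generated dataset.
users = [
    {"userid": "user95", "regionid": "Region_1"},
    {"userid": "user27", "regionid": "Region_2"},
]


def commits_per_region(commits, users):
    """Join commits to users on author login, then count commits per region."""
    region_by_user = {user["userid"]: user["regionid"] for user in users}
    counts = Counter()
    for commit in commits:
        login = commit["data"]["author"]["login"]
        region = region_by_user.get(login)
        if region is not None:  # skip commits with no matching user
            counts[region] += 1
    return dict(counts)


print(commits_per_region(commits, users))
```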

ybyzek (Contributor) commented Jul 20, 2022

@davetroiano maybe confirm with the connect team on the release branching strategy for datagen, but it's possible this needs to be merged into 0.5.x for it to be published in the next release.

@ybyzek ybyzek requested a review from a team July 20, 2022 15:21

@ybyzek ybyzek left a comment

Perhaps the renames from avro to avsc should be a separate PR, because they expand the scope of this PR and might need separate discussion and review.
