{"id":762,"date":"2020-12-06T00:00:00","date_gmt":"2020-12-05T18:30:00","guid":{"rendered":"https:\/\/www.guru99.com\/accessing-internet-data-with-python.html"},"modified":"2024-08-12T16:02:04","modified_gmt":"2024-08-12T10:32:04","slug":"accessing-internet-data-with-python","status":"publish","type":"post","link":"https:\/\/www.guru99.com\/accessing-internet-data-with-python.html","title":{"rendered":"Python Internet Access using Urllib.Request and urlopen()","gt_translate_keys":[{"key":"rendered","format":"text"}]},"content":{"rendered":"<h2>What is urllib?<\/h2>\n<p>urllib is a Python standard-library package for opening and working with URLs. It defines functions and classes that help with URL actions. For example, urllib.request opens and reads URLs, and urllib.parse splits and builds them.<\/p>\n<p>With Python you can also access and retrieve data from the internet, such as XML, HTML and JSON, and work with that data directly. In this tutorial we will see how to retrieve data from the web. As an example, we use the Guru99 YouTube channel URL: we access it with Python, print the HTTP result code, and print the HTML file returned by this URL.<\/p>\n\n<h2>How to Open a URL using urllib<\/h2>\n<p>Before we run the code that connects to Internet data, we need an import statement for the URL library module, &#8220;urllib&#8221;.<\/p>\n<p style=\"text-align:center;\"><a href=\"https:\/\/www.guru99.com\/images\/Pythonnew\/python19_1.png\" data-lasso-id=\"474196\"><img decoding=\"async\" alt=\"Open URL using Urllib\" src=\"https:\/\/www.guru99.com\/images\/Pythonnew\/python19_1.png\" width=\"90%\" class=\"\"><\/a><\/p>\n<ul>\n<li>Import the URL library (urllib2 in Python 2, urllib.request in Python 3)<\/li>\n<li>Define your main function<\/li>\n<li>Declare 
the variable webUrl<\/li>\n<li>Then call the urlopen function from the URL library and store the response it returns in webUrl<\/li>\n<li>The URL we are opening is the Guru99 tutorial channel on YouTube<\/li>\n<li>Next, we print the result code<\/li>\n<li>The result code is retrieved by calling the getcode function on the webUrl variable we created<\/li>\n<li>getcode returns an integer, so we convert it to a string with str() before concatenating it with our string &#8220;result code&#8221;<\/li>\n<li>If the request succeeds, this will be the regular HTTP status code &#8220;200&#8221;, indicating that the HTTP request was processed successfully<\/li>\n<\/ul>\n<h2>How to Get the HTML File from a URL in Python<\/h2>\n<p>You can also read the HTML file by calling the &#8220;read&#8221; function on the response. When you run the code, the HTML will appear in the console.<\/p>\n<p style=\"text-align: center; \"><a href=\"https:\/\/www.guru99.com\/images\/Pythonnew\/python19_2.png\" data-lasso-id=\"474197\"><img decoding=\"async\" alt=\"HTML file from URL in Python\" src=\"https:\/\/www.guru99.com\/images\/Pythonnew\/python19_2.png\" width=\"90%\" class=\"\"><\/a><\/p>\n<ul>\n<li>Call the read function on the webUrl variable<\/li>\n<li>The read function returns the contents of the response (a str in Python 2, a bytes object in Python 3)<\/li>\n<li>Read the entire content of the URL into a variable called data. In Python 3, decode the bytes to text first<\/li>\n<li>Run the code: it will print the data in HTML format<\/li>\n<\/ul>\n<p>Here is the complete code. The urllib2 module exists only in Python 2. In Python 3 its functionality is provided by urllib.request.<\/p>\n<h3>Python 2 Example<\/h3>\n<pre>#\n# read the data from the URL and print it\n#\nimport urllib2\n\n\ndef main():\n    # open a connection to a URL using urllib2\n    webUrl = urllib2.urlopen(\"https:\/\/www.youtube.com\/user\/guru99com\")\n\n    # get the result code and print it\n    print \"result code: \" + str(webUrl.getcode())\n\n    # read the data from the URL and print it\n    data = webUrl.read()\n    print data\n\n\nif __name__ == \"__main__\":\n    main()<\/pre>\n<h3>Python 3 Example<\/h3>\n<pre>#\n# read the data from the URL and print it\n#\nimport urllib.request\n\n\ndef main():\n    # open a connection to a URL using urllib; \"with\" closes it when done\n    with urllib.request.urlopen('https:\/\/www.youtube.com\/user\/guru99com') as webUrl:\n        # get the result code and print it\n        print(\"result code: \" + str(webUrl.getcode()))\n\n        # read the raw bytes, decode them to text using the charset the\n        # server declares (falling back to UTF-8), and print the HTML\n        charset = webUrl.headers.get_content_charset() or 'utf-8'\n        data = webUrl.read().decode(charset)\n        print(data)\n\n\nif __name__ == \"__main__\":\n    main()<\/pre>","protected":false,
"gt_translate_keys":[{"key":"rendered","format":"html"}]},"excerpt":{"rendered":"<p>What is urllib? urllib is a Python module that can be used for opening URLs. It defines functions and classes to help in URL actions. With Python you can also access and retrieve data from the internet like XML, HTML, JSON, etc. You can also use Python to work with this data directly. In this&#8230;<\/p>\n","protected":false,"gt_translate_keys":[{"key":"rendered","format":"html"}]},"author":60,"featured_media":46792,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_blocks_custom_css":"","_kad_blocks_head_custom_js":"","_kad_blocks_body_custom_js":"","_kad_blocks_footer_custom_js":"","_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","footnotes":""},"categories":[40],"tags":[154,146],"coauthors":[501],"class_list":["post-762","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","tag-convertbox-developer","tag-non-amp"],"taxonomy_info":{"category":[{"value":40,"label":"Python"}],"post_tag":[{"value":154,"label":"Convertbox-Developer"},{"value":146,"label":"Non AMP"}]},"featured_image_src_large":["https:\/\/www.guru99.com\/images\/accessing-internet-data.png",650,407,false],"author_info":{"display_name":"Anna 
Blake","author_link":"https:\/\/www.guru99.com\/author\/anna"},"comment_info":0,"category_info":[{"term_id":40,"name":"Python","slug":"python","term_group":0,"term_taxonomy_id":40,"taxonomy":"category","description":"","parent":0,"count":101,"filter":"raw","cat_ID":40,"category_count":101,"category_description":"","cat_name":"Python","category_nicename":"python","category_parent":0}],"tag_info":[{"term_id":154,"name":"Convertbox-Developer","slug":"convertbox-developer","term_group":0,"term_taxonomy_id":154,"taxonomy":"post_tag","description":"","parent":0,"count":834,"filter":"raw"},{"term_id":146,"name":"Non AMP","slug":"non-amp","term_group":0,"term_taxonomy_id":146,"taxonomy":"post_tag","description":"","parent":0,"count":1292,"filter":"raw"}],"gt_translate_keys":[{"key":"link","format":"url"}],"_links":{"self":[{"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/posts\/762","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/users\/60"}],"replies":[{"embeddable":true,"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/comments?post=762"}],"version-history":[{"count":0,"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/posts\/762\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/media\/46792"}],"wp:attachment":[{"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/media?parent=762"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/categories?post=762"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/tags?post=762"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.guru99.com\/wp-json\/wp\/v2\/coauthors?post=762"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}