For a constant Learner & Data Analysts

This blog is for those who see their career and passion in data analysis and data science work. The focus is on concepts, followed by examples in Excel, SAS, R and Python. Happy Learning DataOps :)

Wednesday, October 30, 2019

Mistagging of information when you don't know your data

Finding articles relevant to an entity is an interesting task. It gets complex when an entity is known by various acronyms and short forms, and more complex still when multiple entities have similar names, short names or acronyms.

The whole effort of building a complex web crawling and web scraping framework with Python tools such as Scrapy and Selenium, including tagging and presentation, goes to waste if articles and documents are not entity-tagged properly.

If you search for the South Indian Bank stock on the https://moneycontrol.com/ website today (as of 30 Oct 2019) and go to News & Research, the most recent and relevant articles listed for this stock are not related to South Indian Bank at all. Forget the title; you will not find a mention of South Indian Bank anywhere inside the article. It is actually about a completely different entity with a similar name: Indian Bank.







The moneycontrol website and mobile application are amazing in many respects and are a good source of information for most of us who are active in the share market, but this kind of blunder happens when you do not understand your data well.
Matching articles to an entity needs to be improved, especially in cases like this.

My suggestion to the moneycontrol application-cum-AI architect would be to follow these simple steps while tagging.

  • Tag articles where the entity name matches the title text directly.
  • Tag articles where the entity name matches the body text directly.
  • Tag articles where the entity name matches the title text partially but sufficiently:
    • Complex fuzzy match
    • Match with an acronym
    • Match with another short form
  • Tag articles where the entity name matches the body text partially but sufficiently:
    • Complex fuzzy match
    • Match with an acronym
    • Match with another short form
  • Save the names of the matched and matching entities along with the article IDs, the step that produced the match, etc.
  • Exclude an article from an entity's results if the name that matched it directly matches a different entity (for example, "Indian Bank" found in an article being tagged for South Indian Bank).
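
The steps above can be sketched roughly as follows. This is only an illustration under my own assumptions, not how moneycontrol works: the `ENTITIES` catalogue, the function names, the 0.8 fuzzy threshold and the "shares a word in the name" test for confusable entities are all hypothetical, and the title and body are treated as one text for brevity. It uses only the Python standard library (`re` and `difflib`).

```python
import re
from difflib import SequenceMatcher

# Hypothetical entity catalogue: canonical name -> known acronyms / short forms.
ENTITIES = {
    "South Indian Bank": ["SIB"],
    "Indian Bank": [],
}

def direct_matches(text, entities):
    """Entities whose full name appears in the text. Longer names are checked
    first and masked, so 'Indian Bank' does not fire inside 'South Indian Bank'."""
    remaining, found = text, set()
    for name in sorted(entities, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        if pattern.search(remaining):
            found.add(name)
            remaining = pattern.sub(" ", remaining)
    return found

def partial_match(name, aliases, text, threshold=0.8):
    """Acronym / short-form hit, or a fuzzy hit of the name against any
    window of the same number of words (the threshold is an arbitrary choice)."""
    if any(re.search(r"\b" + re.escape(a) + r"\b", text) for a in aliases):
        return True
    words = re.findall(r"\w+", text.lower())
    n = len(name.split())
    target = name.lower()
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        if SequenceMatcher(None, target, window).ratio() >= threshold:
            return True
    return False

def tag_article(title, body, entities=ENTITIES):
    """Return the set of entity names the article should be tagged with."""
    text = title + " " + body
    direct = direct_matches(text, entities)
    tags = set(direct)
    for name, aliases in entities.items():
        if name in direct or not partial_match(name, aliases, text):
            continue
        # Exclusion step: drop a partial match when a different entity that
        # shares a word in its name has already matched directly.
        name_words = set(name.lower().split())
        confusable = any(name_words & set(d.lower().split()) for d in direct)
        if not confusable:
            tags.add(name)
    return tags

print(tag_article("Indian Bank Q2 net profit rises",
                  "Indian Bank reported higher profit for the quarter."))
```

With this sketch, an article that only talks about Indian Bank is tagged for Indian Bank and not for South Indian Bank, which is the case from the example above. A real pipeline would also store the article ID and the step that produced each match, as the list suggests.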


Posted by Ashutosh at 9:56 PM No comments:

Tuesday, September 3, 2019

AWS Solution Architect Associate Exam (Read Time - 4 Mins)

I passed my AWS Solution Architect Associate exam a couple of months ago. Here are some useful tips.

Before the exam:
  • Go through the official AWS learning library: https://www.aws.training/LearningLibrary. It is entirely free & has the most updated information about AWS services.
  • Complete the official AWS Exam Readiness: AWS Certified Solutions Architect (Associate) - Digital training (Free): https://www.aws.training/learningobject/curriculum?id=20685
  • Read the FAQ of each AWS Service. e.g., https://aws.amazon.com/vpc/faqs/
  • Understand the AWS Well-Architected Framework & read each whitepaper from here: https://aws.amazon.com/architecture/well-architected/
  • Take handwritten notes & make personalized cheat sheets whenever possible.
  • Do plenty of hands-on practice. I used Qwiklabs & it helped me a lot (https://www.qwiklabs.com)
  • You need to understand how each AWS service can be tweaked for Cost, Quality, and Performance. How can you make S3 cheaper? How can you make it more redundant/secure? How can you make it more performant? What about DynamoDB or EBS? EC2? Etc.
  • Take plenty of practice tests; it will give you confidence for the actual exam.

During the exam:
  • Get plenty of rest before the exam day. It's very challenging to maintain concentration for 130 minutes, without any breaks.
  • Read the answers first to understand what to focus on in the question.
  • Read each question twice & make sure you have found the "keywords." It's the part of the question that tells you exactly what they want. e.g., "Which option provides the MOST COST EFFECTIVE solution."
  • If you have no clue at first, eliminate wrong answers, then guess. Mark it for review and revisit it if you have time.


I hope you find this useful. All the best for your exam!
Posted by Ashutosh at 3:37 AM No comments:

Tuesday, August 27, 2019

Apache Spark in Google Colaboratory

This is from my learning notes!!!

1.1    Setting up Spark on Google Colab


Google Colaboratory (Colab) is a great cloud platform for someone starting to learn Python. You can access your practice notebooks from anywhere.

It can also be used to learn Spark. Follow the steps below, and check the file versions and adjust as needed (for example, look for the latest .tgz file).

1.1.1    Install Java, Spark, and Findspark

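# Install a headless Java 8 JDK, which Spark 2.4 runs on
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
# Download and unpack Spark; if this mirror no longer carries 2.4.3, older releases are kept at https://archive.apache.org/dist/spark/
!wget -q http://apache.osuosl.org/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
!tar xf spark-2.4.3-bin-hadoop2.7.tgz
# findspark locates the Spark installation so that pyspark can be imported
!pip install -q findspark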

1.1.2    Set Environment Variables

import os
# Point Spark at the Java and Spark installations from the previous step
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.3-bin-hadoop2.7"
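
Since the version in the paths changes whenever you pick a newer .tgz file, a quick check that both variables point at real directories can save some confusing errors later. The helper below is a small sketch of my own (the name `missing_paths` is hypothetical and not part of Spark or findspark); it only uses the standard library.

```python
import os

def missing_paths(env=os.environ, names=("JAVA_HOME", "SPARK_HOME")):
    """Return the variable names that are unset or do not point at an existing directory."""
    return [name for name in names if not os.path.isdir(env.get(name, ""))]

# An empty list means both directories exist; otherwise fix the listed
# variables (for example, the Spark version in the folder name) before going on.
print(missing_paths())
```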

1.1.3    Start a SparkSession

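import findspark
# Make the pyspark package under SPARK_HOME importable
findspark.init()
from pyspark.sql import SparkSession

# Run Spark locally, using all available cores
spark = SparkSession.builder.master("local[*]").getOrCreate()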

1.1.4    Use Spark!

# Build a 100-row DataFrame with a single "winner" column
df = spark.createDataFrame([{"winner": "Humanity"} for x in range(100)])

# Print the first two rows
df.show(2)


Posted by Ashutosh at 12:42 AM No comments:

Author

Ashutosh
IT professional, researcher, father, husband, son and data analyst. I am building my own datamart of knowledge through life experiences. My performance measure is the satisfaction that I get back. Thanks